Hi folks,
The first part of this series refactors the IOMMU SVA code by assigning
an SVA-type iommu_domain to each shared virtual address space and
replacing the sva_bind/unbind iommu ops with the set/block_dev_pasid
domain ops.
The second part generalizes the existing I/O page fault handling
framework from serving SVA only to serving any domain type. Any driver
or component can handle the I/O page faults for its domain in its own
way by installing an I/O page fault handler.
Thanks to Tony and Zhangfei, this series has been functionally tested
on both Intel and ARM (aarch64) platforms.
This series is also available on github:
https://github.com/LuBaolu/intel-iommu/commits/iommu-sva-refactoring-v10
Please review and suggest.
Best regards,
baolu
Change log:
v10:
- Rebase on next branch of iommu tree.
- Split the attach/detach_device_pasid interfaces and the SVA domain
  extensions into different patches.
- Handle the return error of xa_cmpxchg() gracefully.
- Directly pass mm in as the SVA fault data.
- Rename iopf_handle_group() to iopf_handler().
- Some commit message and code comment refinement.
- Add Tested-by's from Zhangfei and Tony.
v9:
- https://lore.kernel.org/linux-iommu/[email protected]/
- Some minor changes to comments and function names.
- Simplify dev_iommu_get_max_pasids().
v8:
- https://lore.kernel.org/linux-iommu/[email protected]/
- Add support for calculating the max pasids that a device could
consume.
- Replace container_of_safe() with container_of().
- Remove iommu_ops->sva_domain_ops and make sva support through the
generic domain_alloc/free() interfaces.
- [Robin] It would be logical to pass IOMMU_DOMAIN_SVA to the normal
  domain_alloc call, so that driver-internal stuff like context
  descriptors can still be hung off the domain as usual (rather than
  all drivers having to implement some extra internal lookup mechanism
  to handle all the SVA domain ops).
- [Robin] I'd just stick the mm pointer in struct iommu_domain, in a
  union with the fault handler stuff, since those are mutually
  exclusive with SVA.
- https://lore.kernel.org/linux-iommu/[email protected]/
v7:
- https://lore.kernel.org/linux-iommu/[email protected]/
- Remove duplicate array for sva domain.
- Rename detach_dev_pasid to block_dev_pasid.
- Add raw device driver interfaces for iommufd.
- Other misc refinements and patch reorganization.
- Drop "dmaengine: idxd: Separate user and kernel pasid enabling" which
has been picked for dmaengine tree.
v6:
- https://lore.kernel.org/linux-iommu/[email protected]/
- Refine the SVA basic data structures.
Link: https://lore.kernel.org/linux-iommu/YnFv0ps0Ad8v+7uH@myrica/
- Refine arm smmuv3 sva domain allocation.
- Fix a possible lock issue.
Link: https://lore.kernel.org/linux-iommu/YnFydE8j8l7Q4m+b@myrica/
v5:
- https://lore.kernel.org/linux-iommu/[email protected]/
- Address review comments from Jean-Philippe Brucker. Very appreciated!
- Remove redundant pci aliases check in
device_group_immutable_singleton().
- Treat all buses except PCI as static in immutable singleton check.
- As sva_bind/unbind() already guarantee that the sva domain is freed
  only after iopf_queue_flush_dev(), remove the unnecessary domain
  refcount.
- Move domain get() out of the list iteration in iopf_handle_group().
v4:
- https://lore.kernel.org/linux-iommu/[email protected]/
- Solve the overlap with another series and make this series
self-contained.
- There was no objection to the abstraction of the data structures
  during the v3 review. Hence remove the RFC subject prefix.
- Refine the immutable singleton group code according to Kevin's
comments.
v3:
- https://lore.kernel.org/linux-iommu/[email protected]/
- Rework iommu_group_singleton_lockdown() by adding a flag to the group
that positively indicates the group can never have more than one
member, even after hot plug.
- Abstract the data structures used for iommu sva into separate patches
  to make them easier to review.
- I still keep the RFC prefix in this series as the above two
  significant changes need at least another round of review to be
  finalized.
- Several misc refinements.
v2:
- https://lore.kernel.org/linux-iommu/[email protected]/
- Add sva domain life cycle management to avoid race between unbind and
page fault handling.
- Use a single domain for each mm.
- Return a single sva handle for the same binding.
- Add a new helper to meet singleton group requirement.
- Rework the SVA domain allocation for arm smmu v3 driver and move the
pasid_bit initialization to device probe.
- Drop the patch "iommu: Handle IO page faults directly".
- Add mmget_not_zero(mm) in SVA page fault handler.
v1:
- https://lore.kernel.org/linux-iommu/[email protected]/
- Initial post.
Lu Baolu (12):
iommu: Add max_pasids field in struct iommu_device
iommu: Add max_pasids field in struct dev_iommu
iommu: Remove SVM_FLAG_SUPERVISOR_MODE support
iommu: Add attach/detach_dev_pasid iommu interface
iommu: Add IOMMU SVA domain support
iommu/vt-d: Add SVA domain support
arm-smmu-v3/sva: Add SVA domain support
iommu/sva: Refactoring iommu_sva_bind/unbind_device()
iommu: Remove SVA related callbacks from iommu ops
iommu: Prepare IOMMU domain for IOPF
iommu: Per-domain I/O page fault handling
iommu: Rename iommu-sva-lib.{c,h}
include/linux/intel-iommu.h | 12 +-
include/linux/intel-svm.h | 13 -
include/linux/iommu.h | 128 +++++++---
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 19 +-
.../iommu/{iommu-sva-lib.h => iommu-sva.h} | 14 +-
drivers/dma/idxd/cdev.c | 3 +-
drivers/dma/idxd/init.c | 25 +-
.../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 112 +++++---
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 9 +-
drivers/iommu/intel/dmar.c | 7 +
drivers/iommu/intel/iommu.c | 7 +-
drivers/iommu/intel/svm.c | 149 +++++------
drivers/iommu/io-pgfault.c | 77 ++----
drivers/iommu/iommu-sva-lib.c | 71 ------
drivers/iommu/iommu-sva.c | 227 +++++++++++++++++
drivers/iommu/iommu.c | 239 +++++++++++-------
drivers/misc/uacce/uacce.c | 2 +-
drivers/iommu/Makefile | 2 +-
18 files changed, 651 insertions(+), 465 deletions(-)
rename drivers/iommu/{iommu-sva-lib.h => iommu-sva.h} (83%)
delete mode 100644 drivers/iommu/iommu-sva-lib.c
create mode 100644 drivers/iommu/iommu-sva.c
--
2.25.1
Attaching an IOMMU domain to a PASID of a device is a generic operation
for modern IOMMU drivers which support PASID-granular DMA address
translation. Currently visible usage scenarios include (but are not
limited to):
- SVA (Shared Virtual Address)
- kernel DMA with PASID
- hardware-assisted mediated devices
This adds a pair of domain ops for this purpose and adds the interfaces
for device drivers to attach/detach a domain to/from a {device, PASID}
pair.
Some buses, like PCI, route packets without considering the PASID value.
Thus a DMA target address with a PASID might be treated as P2P if the
address falls into the MMIO BAR of another device in the group. To keep
things simple, these interfaces only apply to devices belonging to
singleton groups whose singleton status is immutable in the fabric
(i.e. not affected by hotplug).
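As a rough usage sketch (illustrative only; example_use_pasid() and the
device programming step are hypothetical), a driver would use the new
interfaces like this:

	#include <linux/iommu.h>

	/* Hedged sketch: attach @domain to a PASID of @dev, let the
	 * device issue tagged DMA, then detach again. */
	static int example_use_pasid(struct iommu_domain *domain,
				     struct device *dev, ioasid_t pasid)
	{
		int ret;

		ret = iommu_attach_device_pasid(domain, dev, pasid);
		if (ret)	/* e.g. -EOPNOTSUPP, -EINVAL or -EBUSY */
			return ret;

		/* ... program the device to tag its DMA with @pasid ... */

		iommu_detach_device_pasid(domain, dev, pasid);
		return 0;
	}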
Signed-off-by: Lu Baolu <[email protected]>
Reviewed-by: Jean-Philippe Brucker <[email protected]>
Tested-by: Zhangfei Gao <[email protected]>
Tested-by: Tony Zhu <[email protected]>
---
include/linux/iommu.h | 21 ++++++++++++
drivers/iommu/iommu.c | 75 +++++++++++++++++++++++++++++++++++++++++++
2 files changed, 96 insertions(+)
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index f41eb2b3c7da..f2b5aa7efe43 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -262,6 +262,8 @@ struct iommu_ops {
* struct iommu_domain_ops - domain specific operations
* @attach_dev: attach an iommu domain to a device
* @detach_dev: detach an iommu domain from a device
+ * @set_dev_pasid: set an iommu domain to a PASID of a device
+ * @block_dev_pasid: block a PASID of a device from using an iommu domain
* @map: map a physically contiguous memory region to an iommu domain
* @map_pages: map a physically contiguous set of pages of the same size to
* an iommu domain.
@@ -282,6 +284,10 @@ struct iommu_ops {
struct iommu_domain_ops {
int (*attach_dev)(struct iommu_domain *domain, struct device *dev);
void (*detach_dev)(struct iommu_domain *domain, struct device *dev);
+ int (*set_dev_pasid)(struct iommu_domain *domain, struct device *dev,
+ ioasid_t pasid);
+ void (*block_dev_pasid)(struct iommu_domain *domain, struct device *dev,
+ ioasid_t pasid);
int (*map)(struct iommu_domain *domain, unsigned long iova,
phys_addr_t paddr, size_t size, int prot, gfp_t gfp);
@@ -679,6 +685,10 @@ int iommu_group_claim_dma_owner(struct iommu_group *group, void *owner);
void iommu_group_release_dma_owner(struct iommu_group *group);
bool iommu_group_dma_owner_claimed(struct iommu_group *group);
+int iommu_attach_device_pasid(struct iommu_domain *domain, struct device *dev,
+ ioasid_t pasid);
+void iommu_detach_device_pasid(struct iommu_domain *domain, struct device *dev,
+ ioasid_t pasid);
#else /* CONFIG_IOMMU_API */
struct iommu_ops {};
@@ -1052,6 +1062,17 @@ static inline bool iommu_group_dma_owner_claimed(struct iommu_group *group)
{
return false;
}
+
+static inline int iommu_attach_device_pasid(struct iommu_domain *domain,
+ struct device *dev, ioasid_t pasid)
+{
+ return -ENODEV;
+}
+
+static inline void iommu_detach_device_pasid(struct iommu_domain *domain,
+ struct device *dev, ioasid_t pasid)
+{
+}
#endif /* CONFIG_IOMMU_API */
/**
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 74a0a3ec0907..be48b09371f4 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -39,6 +39,7 @@ struct iommu_group {
struct kobject kobj;
struct kobject *devices_kobj;
struct list_head devices;
+ struct xarray pasid_array;
struct mutex mutex;
void *iommu_data;
void (*iommu_data_release)(void *iommu_data);
@@ -660,6 +661,7 @@ struct iommu_group *iommu_group_alloc(void)
mutex_init(&group->mutex);
INIT_LIST_HEAD(&group->devices);
INIT_LIST_HEAD(&group->entry);
+ xa_init(&group->pasid_array);
ret = ida_alloc(&iommu_group_ida, GFP_KERNEL);
if (ret < 0) {
@@ -3271,3 +3273,76 @@ bool iommu_group_dma_owner_claimed(struct iommu_group *group)
return user;
}
EXPORT_SYMBOL_GPL(iommu_group_dma_owner_claimed);
+
+static bool iommu_group_immutable_singleton(struct iommu_group *group,
+ struct device *dev)
+{
+ int count;
+
+ mutex_lock(&group->mutex);
+ count = iommu_group_device_count(group);
+ mutex_unlock(&group->mutex);
+
+ if (count != 1)
+ return false;
+
+ /*
+ * The PCI device could be considered to be fully isolated if all
+ * devices on the path from the device to the host-PCI bridge are
+ * protected from peer-to-peer DMA by ACS.
+ */
+ if (dev_is_pci(dev))
+ return pci_acs_path_enabled(to_pci_dev(dev), NULL,
+ REQ_ACS_FLAGS);
+
+ /*
+	 * Otherwise, the device came from DT/ACPI; assume it is static, so
+	 * the singleton status can be known from the device count in the group.
+ */
+ return true;
+}
+
+int iommu_attach_device_pasid(struct iommu_domain *domain, struct device *dev,
+ ioasid_t pasid)
+{
+ struct iommu_group *group;
+ void *curr;
+ int ret;
+
+ if (!domain->ops->set_dev_pasid)
+ return -EOPNOTSUPP;
+
+ group = iommu_group_get(dev);
+ if (!group || !iommu_group_immutable_singleton(group, dev)) {
+ iommu_group_put(group);
+ return -EINVAL;
+ }
+
+ mutex_lock(&group->mutex);
+ curr = xa_cmpxchg(&group->pasid_array, pasid, NULL, domain, GFP_KERNEL);
+ if (curr) {
+ ret = xa_err(curr) ? : -EBUSY;
+ goto out_unlock;
+ }
+ ret = domain->ops->set_dev_pasid(domain, dev, pasid);
+ if (ret)
+ xa_erase(&group->pasid_array, pasid);
+out_unlock:
+ mutex_unlock(&group->mutex);
+ iommu_group_put(group);
+
+ return ret;
+}
+
+void iommu_detach_device_pasid(struct iommu_domain *domain, struct device *dev,
+ ioasid_t pasid)
+{
+ struct iommu_group *group = iommu_group_get(dev);
+
+ mutex_lock(&group->mutex);
+ domain->ops->block_dev_pasid(domain, dev, pasid);
+ xa_erase(&group->pasid_array, pasid);
+ mutex_unlock(&group->mutex);
+
+ iommu_group_put(group);
+}
--
2.25.1
The sva iommu_domain represents a hardware page table that the IOMMU
hardware could use for SVA translation. This adds some infrastructure
to support SVA domains in the iommu common layer. It includes:
- Extend the iommu_domain to support a new IOMMU_DOMAIN_SVA domain
  type. The IOMMU drivers that support allocation of the SVA domain
  should provide their own SVA-domain-specific iommu_domain_ops.
- Add a helper to allocate an SVA domain. The iommu_domain_free()
  is still used to free an SVA domain.
Eventually, report_iommu_fault() should be replaced by the new
iommu_report_device_fault(). Leave the existing fault handler with its
existing users; the newly added SVA members exclude it by living in a
union with it.
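As a hedged sketch of the intended calling convention (the real
consumer is the refactored iommu_sva_bind_device() later in this
series; example_sva_attach() below is hypothetical):

	#include <linux/iommu.h>

	static struct iommu_domain *example_sva_attach(struct device *dev,
						       struct mm_struct *mm,
						       ioasid_t pasid)
	{
		struct iommu_domain *domain;

		/* iommu_sva_domain_alloc() grabs a reference on @mm */
		domain = iommu_sva_domain_alloc(dev, mm);
		if (!domain)
			return NULL;

		if (iommu_attach_device_pasid(domain, dev, pasid)) {
			iommu_domain_free(domain); /* drops the @mm reference */
			return NULL;
		}

		return domain;
	}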
Suggested-by: Jean-Philippe Brucker <[email protected]>
Suggested-by: Jason Gunthorpe <[email protected]>
Signed-off-by: Lu Baolu <[email protected]>
Reviewed-by: Jean-Philippe Brucker <[email protected]>
Tested-by: Zhangfei Gao <[email protected]>
Tested-by: Tony Zhu <[email protected]>
---
include/linux/iommu.h | 24 ++++++++++++++++++++++--
drivers/iommu/iommu.c | 20 ++++++++++++++++++++
2 files changed, 42 insertions(+), 2 deletions(-)
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index f2b5aa7efe43..42f0418dc22c 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -64,6 +64,8 @@ struct iommu_domain_geometry {
#define __IOMMU_DOMAIN_PT (1U << 2) /* Domain is identity mapped */
#define __IOMMU_DOMAIN_DMA_FQ (1U << 3) /* DMA-API uses flush queue */
+#define __IOMMU_DOMAIN_SVA (1U << 4) /* Shared process address space */
+
/*
* This are the possible domain-types
*
@@ -77,6 +79,8 @@ struct iommu_domain_geometry {
* certain optimizations for these domains
* IOMMU_DOMAIN_DMA_FQ - As above, but definitely using batched TLB
* invalidation.
+ * IOMMU_DOMAIN_SVA - DMA addresses are shared process address
+ * spaces represented by mm_struct's.
*/
#define IOMMU_DOMAIN_BLOCKED (0U)
#define IOMMU_DOMAIN_IDENTITY (__IOMMU_DOMAIN_PT)
@@ -86,15 +90,23 @@ struct iommu_domain_geometry {
#define IOMMU_DOMAIN_DMA_FQ (__IOMMU_DOMAIN_PAGING | \
__IOMMU_DOMAIN_DMA_API | \
__IOMMU_DOMAIN_DMA_FQ)
+#define IOMMU_DOMAIN_SVA (__IOMMU_DOMAIN_SVA)
struct iommu_domain {
unsigned type;
const struct iommu_domain_ops *ops;
unsigned long pgsize_bitmap; /* Bitmap of page sizes in use */
- iommu_fault_handler_t handler;
- void *handler_token;
struct iommu_domain_geometry geometry;
struct iommu_dma_cookie *iova_cookie;
+ union {
+ struct {
+ iommu_fault_handler_t handler;
+ void *handler_token;
+ };
+ struct { /* IOMMU_DOMAIN_SVA */
+ struct mm_struct *mm;
+ };
+ };
};
static inline bool iommu_is_dma_domain(struct iommu_domain *domain)
@@ -685,6 +697,8 @@ int iommu_group_claim_dma_owner(struct iommu_group *group, void *owner);
void iommu_group_release_dma_owner(struct iommu_group *group);
bool iommu_group_dma_owner_claimed(struct iommu_group *group);
+struct iommu_domain *iommu_sva_domain_alloc(struct device *dev,
+ struct mm_struct *mm);
int iommu_attach_device_pasid(struct iommu_domain *domain, struct device *dev,
ioasid_t pasid);
void iommu_detach_device_pasid(struct iommu_domain *domain, struct device *dev,
@@ -1063,6 +1077,12 @@ static inline bool iommu_group_dma_owner_claimed(struct iommu_group *group)
return false;
}
+static inline struct iommu_domain *
+iommu_sva_domain_alloc(struct device *dev, struct mm_struct *mm)
+{
+ return NULL;
+}
+
static inline int iommu_attach_device_pasid(struct iommu_domain *domain,
struct device *dev, ioasid_t pasid)
{
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index be48b09371f4..10479c5e4d23 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -27,6 +27,7 @@
#include <linux/module.h>
#include <linux/cc_platform.h>
#include <trace/events/iommu.h>
+#include <linux/sched/mm.h>
static struct kset *iommu_group_kset;
static DEFINE_IDA(iommu_group_ida);
@@ -1957,6 +1958,8 @@ EXPORT_SYMBOL_GPL(iommu_domain_alloc);
void iommu_domain_free(struct iommu_domain *domain)
{
+ if (domain->type == IOMMU_DOMAIN_SVA)
+ mmdrop(domain->mm);
iommu_put_dma_cookie(domain);
domain->ops->free(domain);
}
@@ -3274,6 +3277,23 @@ bool iommu_group_dma_owner_claimed(struct iommu_group *group)
}
EXPORT_SYMBOL_GPL(iommu_group_dma_owner_claimed);
+struct iommu_domain *iommu_sva_domain_alloc(struct device *dev,
+ struct mm_struct *mm)
+{
+ const struct iommu_ops *ops = dev_iommu_ops(dev);
+ struct iommu_domain *domain;
+
+ domain = ops->domain_alloc(IOMMU_DOMAIN_SVA);
+ if (!domain)
+ return NULL;
+
+ domain->type = IOMMU_DOMAIN_SVA;
+ mmgrab(mm);
+ domain->mm = mm;
+
+ return domain;
+}
+
static bool iommu_group_immutable_singleton(struct iommu_group *group,
struct device *dev)
{
--
2.25.1
Use this field to save the number of PASIDs that a device is able to
consume. It is a generic attribute of a device; lifting it into the
per-device dev_iommu struct helps to avoid boilerplate code in various
IOMMU drivers.
Signed-off-by: Lu Baolu <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
Tested-by: Zhangfei Gao <[email protected]>
Tested-by: Tony Zhu <[email protected]>
---
include/linux/iommu.h | 2 ++
drivers/iommu/iommu.c | 20 ++++++++++++++++++++
2 files changed, 22 insertions(+)
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 03fbb1b71536..418a1914a041 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -364,6 +364,7 @@ struct iommu_fault_param {
* @fwspec: IOMMU fwspec data
* @iommu_dev: IOMMU device this device is linked to
* @priv: IOMMU Driver private data
+ * @max_pasids: number of PASIDs this device can consume
*
* TODO: migrate other per device data pointers under iommu_dev_data, e.g.
* struct iommu_group *iommu_group;
@@ -375,6 +376,7 @@ struct dev_iommu {
struct iommu_fwspec *fwspec;
struct iommu_device *iommu_dev;
void *priv;
+ u32 max_pasids;
};
int iommu_device_register(struct iommu_device *iommu,
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index cdc86c39954e..0cb0750f61e8 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -20,6 +20,7 @@
#include <linux/idr.h>
#include <linux/err.h>
#include <linux/pci.h>
+#include <linux/pci-ats.h>
#include <linux/bitops.h>
#include <linux/property.h>
#include <linux/fsl/mc.h>
@@ -218,6 +219,24 @@ static void dev_iommu_free(struct device *dev)
kfree(param);
}
+static u32 dev_iommu_get_max_pasids(struct device *dev)
+{
+ u32 max_pasids = 0, bits = 0;
+ int ret;
+
+ if (dev_is_pci(dev)) {
+ ret = pci_max_pasids(to_pci_dev(dev));
+ if (ret > 0)
+ max_pasids = ret;
+ } else {
+ ret = device_property_read_u32(dev, "pasid-num-bits", &bits);
+ if (!ret)
+ max_pasids = 1UL << bits;
+ }
+
+ return min_t(u32, max_pasids, dev->iommu->iommu_dev->max_pasids);
+}
+
static int __iommu_probe_device(struct device *dev, struct list_head *group_list)
{
const struct iommu_ops *ops = dev->bus->iommu_ops;
@@ -243,6 +262,7 @@ static int __iommu_probe_device(struct device *dev, struct list_head *group_list
}
dev->iommu->iommu_dev = iommu_dev;
+ dev->iommu->max_pasids = dev_iommu_get_max_pasids(dev);
group = iommu_group_get_for_dev(dev);
if (IS_ERR(group)) {
--
2.25.1
Use this field to keep the number of PASIDs that an IOMMU hardware
implementation is able to support. This is a generic attribute of an
IOMMU, and lifting it into the per-IOMMU device structure makes it
possible to allocate a PASID for a device without calls into the IOMMU
drivers. Any iommu driver that supports PASID-related features should
set this field before enabling them on the devices.
In the Intel IOMMU driver, intel_iommu_sm is moved under the
CONFIG_INTEL_IOMMU #ifdef so that the pasid_supported() helper can be
used in dmar.c without compilation errors.
Signed-off-by: Lu Baolu <[email protected]>
Reviewed-by: Jean-Philippe Brucker <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
Tested-by: Zhangfei Gao <[email protected]>
Tested-by: Tony Zhu <[email protected]>
---
include/linux/intel-iommu.h | 3 ++-
include/linux/iommu.h | 2 ++
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 1 +
drivers/iommu/intel/dmar.c | 7 +++++++
4 files changed, 12 insertions(+), 1 deletion(-)
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 4f29139bbfc3..e065cbe3c857 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -479,7 +479,6 @@ enum {
#define VTD_FLAG_IRQ_REMAP_PRE_ENABLED (1 << 1)
#define VTD_FLAG_SVM_CAPABLE (1 << 2)
-extern int intel_iommu_sm;
extern spinlock_t device_domain_lock;
#define sm_supported(iommu) (intel_iommu_sm && ecap_smts((iommu)->ecap))
@@ -786,6 +785,7 @@ struct context_entry *iommu_context_addr(struct intel_iommu *iommu, u8 bus,
extern const struct iommu_ops intel_iommu_ops;
#ifdef CONFIG_INTEL_IOMMU
+extern int intel_iommu_sm;
extern int iommu_calculate_agaw(struct intel_iommu *iommu);
extern int iommu_calculate_max_sagaw(struct intel_iommu *iommu);
extern int dmar_disabled;
@@ -802,6 +802,7 @@ static inline int iommu_calculate_max_sagaw(struct intel_iommu *iommu)
}
#define dmar_disabled (1)
#define intel_iommu_enabled (0)
+#define intel_iommu_sm (0)
#endif
static inline const char *decode_prq_descriptor(char *str, size_t size,
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 5e1afe169549..03fbb1b71536 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -318,12 +318,14 @@ struct iommu_domain_ops {
* @list: Used by the iommu-core to keep a list of registered iommus
* @ops: iommu-ops for talking to this iommu
* @dev: struct device for sysfs handling
+ * @max_pasids: number of supported PASIDs
*/
struct iommu_device {
struct list_head list;
const struct iommu_ops *ops;
struct fwnode_handle *fwnode;
struct device *dev;
+ u32 max_pasids;
};
/**
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 88817a3376ef..ae8ec8df47c1 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -3546,6 +3546,7 @@ static int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu)
/* SID/SSID sizes */
smmu->ssid_bits = FIELD_GET(IDR1_SSIDSIZE, reg);
smmu->sid_bits = FIELD_GET(IDR1_SIDSIZE, reg);
+ smmu->iommu.max_pasids = 1UL << smmu->ssid_bits;
/*
* If the SMMU supports fewer bits than would fill a single L2 stream
diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
index 592c1e1a5d4b..6c338888061a 100644
--- a/drivers/iommu/intel/dmar.c
+++ b/drivers/iommu/intel/dmar.c
@@ -1123,6 +1123,13 @@ static int alloc_iommu(struct dmar_drhd_unit *drhd)
raw_spin_lock_init(&iommu->register_lock);
+ /*
+	 * A value of N in the PSS field of the eCap register indicates that
+	 * the hardware supports a PASID field of N+1 bits.
+ */
+ if (pasid_supported(iommu))
+ iommu->iommu.max_pasids = 2UL << ecap_pss(iommu->ecap);
+
/*
* This is only for hotplug; at boot time intel_iommu_enabled won't
* be set yet. When intel_iommu_init() runs, it registers the units
--
2.25.1
The current kernel DMA-with-PASID support is based on SVA with the
SVM_FLAG_SUPERVISOR_MODE flag. The IOMMU driver binds the kernel memory
address space to a PASID of the device, and the device driver programs
the device with kernel virtual addresses (KVA) for DMA access. There
have been security and functional issues with this approach:
- The lack of IOTLB synchronization upon kernel page table updates
  (vmalloc, module/BPF loading, CONFIG_DEBUG_PAGEALLOC, etc.).
- Other than slightly more protection, using kernel virtual addresses
  has little advantage over physical addresses. There are also no use
  cases yet where DMA engines need kernel virtual addresses for
  in-kernel DMA.
This removes SVM_FLAG_SUPERVISOR_MODE support from the IOMMU interface.
Device drivers are instead suggested to handle kernel DMA with PASID
through the kernel DMA APIs.
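A minimal sketch of that suggested replacement, using the regular DMA
API on a kernel buffer (dev, buf and size below are placeholders):

	#include <linux/dma-mapping.h>

	dma_addr_t dma;

	/* Map the kernel buffer for the device instead of binding a
	 * supervisor PASID to the whole kernel address space. */
	dma = dma_map_single(dev, buf, size, DMA_TO_DEVICE);
	if (dma_mapping_error(dev, dma))
		return -ENOMEM;

	/* ... hand @dma to the device and run the transfer ... */

	dma_unmap_single(dev, dma, size, DMA_TO_DEVICE);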
The drvdata parameter of iommu_sva_bind_device() and all callbacks is
not needed anymore. Clean it up as well.
Link: https://lore.kernel.org/linux-iommu/[email protected]/
Signed-off-by: Jacob Pan <[email protected]>
Signed-off-by: Lu Baolu <[email protected]>
Reviewed-by: Jason Gunthorpe <[email protected]>
Reviewed-by: Jean-Philippe Brucker <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
Tested-by: Zhangfei Gao <[email protected]>
Tested-by: Tony Zhu <[email protected]>
---
include/linux/intel-iommu.h | 3 +-
include/linux/intel-svm.h | 13 -----
include/linux/iommu.h | 8 +--
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 5 +-
drivers/dma/idxd/cdev.c | 3 +-
drivers/dma/idxd/init.c | 25 +-------
.../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 3 +-
drivers/iommu/intel/svm.c | 57 +++++--------------
drivers/iommu/iommu.c | 5 +-
drivers/misc/uacce/uacce.c | 2 +-
10 files changed, 26 insertions(+), 98 deletions(-)
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index e065cbe3c857..31e3edc0fc7e 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -738,8 +738,7 @@ struct intel_iommu *device_to_iommu(struct device *dev, u8 *bus, u8 *devfn);
extern void intel_svm_check(struct intel_iommu *iommu);
extern int intel_svm_enable_prq(struct intel_iommu *iommu);
extern int intel_svm_finish_prq(struct intel_iommu *iommu);
-struct iommu_sva *intel_svm_bind(struct device *dev, struct mm_struct *mm,
- void *drvdata);
+struct iommu_sva *intel_svm_bind(struct device *dev, struct mm_struct *mm);
void intel_svm_unbind(struct iommu_sva *handle);
u32 intel_svm_get_pasid(struct iommu_sva *handle);
int intel_svm_page_response(struct device *dev, struct iommu_fault_event *evt,
diff --git a/include/linux/intel-svm.h b/include/linux/intel-svm.h
index 207ef06ba3e1..f9a0d44f6fdb 100644
--- a/include/linux/intel-svm.h
+++ b/include/linux/intel-svm.h
@@ -13,17 +13,4 @@
#define PRQ_RING_MASK ((0x1000 << PRQ_ORDER) - 0x20)
#define PRQ_DEPTH ((0x1000 << PRQ_ORDER) >> 5)
-/*
- * The SVM_FLAG_SUPERVISOR_MODE flag requests a PASID which can be used only
- * for access to kernel addresses. No IOTLB flushes are automatically done
- * for kernel mappings; it is valid only for access to the kernel's static
- * 1:1 mapping of physical memory — not to vmalloc or even module mappings.
- * A future API addition may permit the use of such ranges, by means of an
- * explicit IOTLB flush call (akin to the DMA API's unmap method).
- *
- * It is unlikely that we will ever hook into flush_tlb_kernel_range() to
- * do such IOTLB flushes automatically.
- */
-#define SVM_FLAG_SUPERVISOR_MODE BIT(0)
-
#endif /* __INTEL_SVM_H__ */
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 418a1914a041..f41eb2b3c7da 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -243,8 +243,7 @@ struct iommu_ops {
int (*dev_enable_feat)(struct device *dev, enum iommu_dev_features f);
int (*dev_disable_feat)(struct device *dev, enum iommu_dev_features f);
- struct iommu_sva *(*sva_bind)(struct device *dev, struct mm_struct *mm,
- void *drvdata);
+ struct iommu_sva *(*sva_bind)(struct device *dev, struct mm_struct *mm);
void (*sva_unbind)(struct iommu_sva *handle);
u32 (*sva_get_pasid)(struct iommu_sva *handle);
@@ -669,8 +668,7 @@ int iommu_dev_disable_feature(struct device *dev, enum iommu_dev_features f);
bool iommu_dev_feature_enabled(struct device *dev, enum iommu_dev_features f);
struct iommu_sva *iommu_sva_bind_device(struct device *dev,
- struct mm_struct *mm,
- void *drvdata);
+ struct mm_struct *mm);
void iommu_sva_unbind_device(struct iommu_sva *handle);
u32 iommu_sva_get_pasid(struct iommu_sva *handle);
@@ -1012,7 +1010,7 @@ iommu_dev_disable_feature(struct device *dev, enum iommu_dev_features feat)
}
static inline struct iommu_sva *
-iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, void *drvdata)
+iommu_sva_bind_device(struct device *dev, struct mm_struct *mm)
{
return NULL;
}
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index cd48590ada30..d2ba86470c42 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -754,8 +754,7 @@ bool arm_smmu_master_sva_enabled(struct arm_smmu_master *master);
int arm_smmu_master_enable_sva(struct arm_smmu_master *master);
int arm_smmu_master_disable_sva(struct arm_smmu_master *master);
bool arm_smmu_master_iopf_supported(struct arm_smmu_master *master);
-struct iommu_sva *arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm,
- void *drvdata);
+struct iommu_sva *arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm);
void arm_smmu_sva_unbind(struct iommu_sva *handle);
u32 arm_smmu_sva_get_pasid(struct iommu_sva *handle);
void arm_smmu_sva_notifier_synchronize(void);
@@ -791,7 +790,7 @@ static inline bool arm_smmu_master_iopf_supported(struct arm_smmu_master *master
}
static inline struct iommu_sva *
-arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm, void *drvdata)
+arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm)
{
return ERR_PTR(-ENODEV);
}
diff --git a/drivers/dma/idxd/cdev.c b/drivers/dma/idxd/cdev.c
index c2808fd081d6..66720001ba1c 100644
--- a/drivers/dma/idxd/cdev.c
+++ b/drivers/dma/idxd/cdev.c
@@ -6,7 +6,6 @@
#include <linux/pci.h>
#include <linux/device.h>
#include <linux/sched/task.h>
-#include <linux/intel-svm.h>
#include <linux/io-64-nonatomic-lo-hi.h>
#include <linux/cdev.h>
#include <linux/fs.h>
@@ -100,7 +99,7 @@ static int idxd_cdev_open(struct inode *inode, struct file *filp)
filp->private_data = ctx;
if (device_user_pasid_enabled(idxd)) {
- sva = iommu_sva_bind_device(dev, current->mm, NULL);
+ sva = iommu_sva_bind_device(dev, current->mm);
if (IS_ERR(sva)) {
rc = PTR_ERR(sva);
dev_err(dev, "pasid allocation failed: %d\n", rc);
diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c
index 355fb3ef4cbf..00b437f4f573 100644
--- a/drivers/dma/idxd/init.c
+++ b/drivers/dma/idxd/init.c
@@ -14,7 +14,6 @@
#include <linux/io-64-nonatomic-lo-hi.h>
#include <linux/device.h>
#include <linux/idr.h>
-#include <linux/intel-svm.h>
#include <linux/iommu.h>
#include <uapi/linux/idxd.h>
#include <linux/dmaengine.h>
@@ -466,29 +465,7 @@ static struct idxd_device *idxd_alloc(struct pci_dev *pdev, struct idxd_driver_d
static int idxd_enable_system_pasid(struct idxd_device *idxd)
{
- int flags;
- unsigned int pasid;
- struct iommu_sva *sva;
-
- flags = SVM_FLAG_SUPERVISOR_MODE;
-
- sva = iommu_sva_bind_device(&idxd->pdev->dev, NULL, &flags);
- if (IS_ERR(sva)) {
- dev_warn(&idxd->pdev->dev,
- "iommu sva bind failed: %ld\n", PTR_ERR(sva));
- return PTR_ERR(sva);
- }
-
- pasid = iommu_sva_get_pasid(sva);
- if (pasid == IOMMU_PASID_INVALID) {
- iommu_sva_unbind_device(sva);
- return -ENODEV;
- }
-
- idxd->sva = sva;
- idxd->pasid = pasid;
- dev_dbg(&idxd->pdev->dev, "system pasid: %u\n", pasid);
- return 0;
+ return -EOPNOTSUPP;
}
static void idxd_disable_system_pasid(struct idxd_device *idxd)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
index 1ef7bbb4acf3..f155d406c5d5 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
@@ -367,8 +367,7 @@ __arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm)
return ERR_PTR(ret);
}
-struct iommu_sva *
-arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm, void *drvdata)
+struct iommu_sva *arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm)
{
struct iommu_sva *handle;
struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 7ee37d996e15..d04880a291c3 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -313,8 +313,7 @@ static int pasid_to_svm_sdev(struct device *dev, unsigned int pasid,
return 0;
}
-static int intel_svm_alloc_pasid(struct device *dev, struct mm_struct *mm,
- unsigned int flags)
+static int intel_svm_alloc_pasid(struct device *dev, struct mm_struct *mm)
{
ioasid_t max_pasid = dev_is_pci(dev) ?
pci_max_pasids(to_pci_dev(dev)) : intel_pasid_max_id;
@@ -324,8 +323,7 @@ static int intel_svm_alloc_pasid(struct device *dev, struct mm_struct *mm,
static struct iommu_sva *intel_svm_bind_mm(struct intel_iommu *iommu,
struct device *dev,
- struct mm_struct *mm,
- unsigned int flags)
+ struct mm_struct *mm)
{
struct device_domain_info *info = dev_iommu_priv_get(dev);
unsigned long iflags, sflags;
@@ -341,22 +339,18 @@ static struct iommu_sva *intel_svm_bind_mm(struct intel_iommu *iommu,
svm->pasid = mm->pasid;
svm->mm = mm;
- svm->flags = flags;
INIT_LIST_HEAD_RCU(&svm->devs);
- if (!(flags & SVM_FLAG_SUPERVISOR_MODE)) {
- svm->notifier.ops = &intel_mmuops;
- ret = mmu_notifier_register(&svm->notifier, mm);
- if (ret) {
- kfree(svm);
- return ERR_PTR(ret);
- }
+ svm->notifier.ops = &intel_mmuops;
+ ret = mmu_notifier_register(&svm->notifier, mm);
+ if (ret) {
+ kfree(svm);
+ return ERR_PTR(ret);
}
ret = pasid_private_add(svm->pasid, svm);
if (ret) {
- if (svm->notifier.ops)
- mmu_notifier_unregister(&svm->notifier, mm);
+ mmu_notifier_unregister(&svm->notifier, mm);
kfree(svm);
return ERR_PTR(ret);
}
@@ -391,9 +385,7 @@ static struct iommu_sva *intel_svm_bind_mm(struct intel_iommu *iommu,
}
/* Setup the pasid table: */
- sflags = (flags & SVM_FLAG_SUPERVISOR_MODE) ?
- PASID_FLAG_SUPERVISOR_MODE : 0;
- sflags |= cpu_feature_enabled(X86_FEATURE_LA57) ? PASID_FLAG_FL5LP : 0;
+ sflags = cpu_feature_enabled(X86_FEATURE_LA57) ? PASID_FLAG_FL5LP : 0;
spin_lock_irqsave(&iommu->lock, iflags);
ret = intel_pasid_setup_first_level(iommu, dev, mm->pgd, mm->pasid,
FLPT_DEFAULT_DID, sflags);
@@ -410,8 +402,7 @@ static struct iommu_sva *intel_svm_bind_mm(struct intel_iommu *iommu,
kfree(sdev);
free_svm:
if (list_empty(&svm->devs)) {
- if (svm->notifier.ops)
- mmu_notifier_unregister(&svm->notifier, mm);
+ mmu_notifier_unregister(&svm->notifier, mm);
pasid_private_remove(mm->pasid);
kfree(svm);
}
@@ -767,7 +758,7 @@ static irqreturn_t prq_event_thread(int irq, void *d)
* to unbind the mm while any page faults are outstanding.
*/
svm = pasid_private_find(req->pasid);
- if (IS_ERR_OR_NULL(svm) || (svm->flags & SVM_FLAG_SUPERVISOR_MODE))
+ if (IS_ERR_OR_NULL(svm))
goto bad_req;
}
@@ -818,40 +809,20 @@ static irqreturn_t prq_event_thread(int irq, void *d)
return IRQ_RETVAL(handled);
}
-struct iommu_sva *intel_svm_bind(struct device *dev, struct mm_struct *mm, void *drvdata)
+struct iommu_sva *intel_svm_bind(struct device *dev, struct mm_struct *mm)
{
struct intel_iommu *iommu = device_to_iommu(dev, NULL, NULL);
- unsigned int flags = 0;
struct iommu_sva *sva;
int ret;
- if (drvdata)
- flags = *(unsigned int *)drvdata;
-
- if (flags & SVM_FLAG_SUPERVISOR_MODE) {
- if (!ecap_srs(iommu->ecap)) {
- dev_err(dev, "%s: Supervisor PASID not supported\n",
- iommu->name);
- return ERR_PTR(-EOPNOTSUPP);
- }
-
- if (mm) {
- dev_err(dev, "%s: Supervisor PASID with user provided mm\n",
- iommu->name);
- return ERR_PTR(-EINVAL);
- }
-
- mm = &init_mm;
- }
-
mutex_lock(&pasid_mutex);
- ret = intel_svm_alloc_pasid(dev, mm, flags);
+ ret = intel_svm_alloc_pasid(dev, mm);
if (ret) {
mutex_unlock(&pasid_mutex);
return ERR_PTR(ret);
}
- sva = intel_svm_bind_mm(iommu, dev, mm, flags);
+ sva = intel_svm_bind_mm(iommu, dev, mm);
mutex_unlock(&pasid_mutex);
return sva;
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 0cb0750f61e8..74a0a3ec0907 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2788,7 +2788,6 @@ EXPORT_SYMBOL_GPL(iommu_dev_feature_enabled);
* iommu_sva_bind_device() - Bind a process address space to a device
* @dev: the device
* @mm: the mm to bind, caller must hold a reference to it
- * @drvdata: opaque data pointer to pass to bind callback
*
* Create a bond between device and address space, allowing the device to access
* the mm using the returned PASID. If a bond already exists between @device and
@@ -2801,7 +2800,7 @@ EXPORT_SYMBOL_GPL(iommu_dev_feature_enabled);
* On error, returns an ERR_PTR value.
*/
struct iommu_sva *
-iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, void *drvdata)
+iommu_sva_bind_device(struct device *dev, struct mm_struct *mm)
{
struct iommu_group *group;
struct iommu_sva *handle = ERR_PTR(-EINVAL);
@@ -2826,7 +2825,7 @@ iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, void *drvdata)
if (iommu_group_device_count(group) != 1)
goto out_unlock;
- handle = ops->sva_bind(dev, mm, drvdata);
+ handle = ops->sva_bind(dev, mm);
out_unlock:
mutex_unlock(&group->mutex);
diff --git a/drivers/misc/uacce/uacce.c b/drivers/misc/uacce/uacce.c
index 281c54003edc..3238a867ea51 100644
--- a/drivers/misc/uacce/uacce.c
+++ b/drivers/misc/uacce/uacce.c
@@ -99,7 +99,7 @@ static int uacce_bind_queue(struct uacce_device *uacce, struct uacce_queue *q)
if (!(uacce->flags & UACCE_DEV_SVA))
return 0;
- handle = iommu_sva_bind_device(uacce->parent, current->mm, NULL);
+ handle = iommu_sva_bind_device(uacce->parent, current->mm);
if (IS_ERR(handle))
return PTR_ERR(handle);
--
2.25.1
Tweak the I/O page fault handling framework to route the page faults to
the domain and call the page fault handler retrieved from the domain.
This makes it possible for the I/O page fault handling framework to
serve more usage scenarios, as long as they have an IOMMU domain and
install a page fault handler in it. Some unused functions are also
removed to avoid dead code.
The iommu_get_domain_for_dev_pasid() helper, which retrieves the
attached domain for a {device, PASID} pair, is used here; the page
fault handling framework knows the {device, PASID} reported from the
iommu driver. We have a guarantee that the SVA domain doesn't go away
during IOPF handling, because unbind() won't free the domain until all
the pending page requests have been flushed from the pipeline. The
drivers either call iopf_queue_flush_dev() explicitly, or in the stall
case, the device driver is required to flush all DMAs, including
stalled transactions, before calling unbind().
This also renames iopf_handle_group() to iopf_handler() to avoid
confusion.
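For illustration, a handler installed by a domain owner has this rough
shape (example_iopf_handler() is hypothetical; the signature matches
the iopf_handler member consumed above):

	static enum iommu_page_response_code
	example_iopf_handler(struct iommu_fault *fault, void *data)
	{
		/* @data is the owner's domain->fault_data */
		if (!(fault->prm.flags & IOMMU_FAULT_PAGE_REQUEST_PASID_VALID))
			return IOMMU_PAGE_RESP_INVALID;

		/* ... resolve fault->prm.addr against the owner's mappings ... */
		return IOMMU_PAGE_RESP_SUCCESS;
	}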
Signed-off-by: Lu Baolu <[email protected]>
Reviewed-by: Jean-Philippe Brucker <[email protected]>
Tested-by: Zhangfei Gao <[email protected]>
Tested-by: Tony Zhu <[email protected]>
---
drivers/iommu/io-pgfault.c | 68 +++++---------------------------------
1 file changed, 9 insertions(+), 59 deletions(-)
diff --git a/drivers/iommu/io-pgfault.c b/drivers/iommu/io-pgfault.c
index aee9e033012f..d1c522f4ab34 100644
--- a/drivers/iommu/io-pgfault.c
+++ b/drivers/iommu/io-pgfault.c
@@ -69,69 +69,18 @@ static int iopf_complete_group(struct device *dev, struct iopf_fault *iopf,
return iommu_page_response(dev, &resp);
}
-static enum iommu_page_response_code
-iopf_handle_single(struct iopf_fault *iopf)
-{
- vm_fault_t ret;
- struct mm_struct *mm;
- struct vm_area_struct *vma;
- unsigned int access_flags = 0;
- unsigned int fault_flags = FAULT_FLAG_REMOTE;
- struct iommu_fault_page_request *prm = &iopf->fault.prm;
- enum iommu_page_response_code status = IOMMU_PAGE_RESP_INVALID;
-
- if (!(prm->flags & IOMMU_FAULT_PAGE_REQUEST_PASID_VALID))
- return status;
-
- mm = iommu_sva_find(prm->pasid);
- if (IS_ERR_OR_NULL(mm))
- return status;
-
- mmap_read_lock(mm);
-
- vma = find_extend_vma(mm, prm->addr);
- if (!vma)
- /* Unmapped area */
- goto out_put_mm;
-
- if (prm->perm & IOMMU_FAULT_PERM_READ)
- access_flags |= VM_READ;
-
- if (prm->perm & IOMMU_FAULT_PERM_WRITE) {
- access_flags |= VM_WRITE;
- fault_flags |= FAULT_FLAG_WRITE;
- }
-
- if (prm->perm & IOMMU_FAULT_PERM_EXEC) {
- access_flags |= VM_EXEC;
- fault_flags |= FAULT_FLAG_INSTRUCTION;
- }
-
- if (!(prm->perm & IOMMU_FAULT_PERM_PRIV))
- fault_flags |= FAULT_FLAG_USER;
-
- if (access_flags & ~vma->vm_flags)
- /* Access fault */
- goto out_put_mm;
-
- ret = handle_mm_fault(vma, prm->addr, fault_flags, NULL);
- status = ret & VM_FAULT_ERROR ? IOMMU_PAGE_RESP_INVALID :
- IOMMU_PAGE_RESP_SUCCESS;
-
-out_put_mm:
- mmap_read_unlock(mm);
- mmput(mm);
-
- return status;
-}
-
-static void iopf_handle_group(struct work_struct *work)
+static void iopf_handler(struct work_struct *work)
{
struct iopf_group *group;
+ struct iommu_domain *domain;
struct iopf_fault *iopf, *next;
enum iommu_page_response_code status = IOMMU_PAGE_RESP_SUCCESS;
group = container_of(work, struct iopf_group, work);
+ domain = iommu_get_domain_for_dev_pasid(group->dev,
+ group->last_fault.fault.prm.pasid);
+ if (!domain || !domain->iopf_handler)
+ status = IOMMU_PAGE_RESP_INVALID;
list_for_each_entry_safe(iopf, next, &group->faults, list) {
/*
@@ -139,7 +88,8 @@ static void iopf_handle_group(struct work_struct *work)
* faults in the group if there is an error.
*/
if (status == IOMMU_PAGE_RESP_SUCCESS)
- status = iopf_handle_single(iopf);
+ status = domain->iopf_handler(&iopf->fault,
+ domain->fault_data);
if (!(iopf->fault.prm.flags &
IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE))
@@ -242,7 +192,7 @@ int iommu_queue_iopf(struct iommu_fault *fault, void *cookie)
group->last_fault.fault = *fault;
INIT_LIST_HEAD(&group->faults);
list_add(&group->last_fault.list, &group->faults);
- INIT_WORK(&group->work, iopf_handle_group);
+ INIT_WORK(&group->work, iopf_handler);
/* See if we have partial faults for this group */
list_for_each_entry_safe(iopf, next, &iopf_param->partial, list) {
--
2.25.1
Rename iommu-sva-lib.c[h] to iommu-sva.c[h] as they contain all the
code for the SVA implementation in the iommu core.
Signed-off-by: Lu Baolu <[email protected]>
Reviewed-by: Jean-Philippe Brucker <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
Tested-by: Zhangfei Gao <[email protected]>
Tested-by: Tony Zhu <[email protected]>
---
drivers/iommu/{iommu-sva-lib.h => iommu-sva.h} | 6 +++---
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 2 +-
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 2 +-
drivers/iommu/intel/iommu.c | 2 +-
drivers/iommu/intel/svm.c | 2 +-
drivers/iommu/io-pgfault.c | 2 +-
drivers/iommu/{iommu-sva-lib.c => iommu-sva.c} | 2 +-
drivers/iommu/iommu.c | 2 +-
drivers/iommu/Makefile | 2 +-
9 files changed, 11 insertions(+), 11 deletions(-)
rename drivers/iommu/{iommu-sva-lib.h => iommu-sva.h} (95%)
rename drivers/iommu/{iommu-sva-lib.c => iommu-sva.c} (99%)
diff --git a/drivers/iommu/iommu-sva-lib.h b/drivers/iommu/iommu-sva.h
similarity index 95%
rename from drivers/iommu/iommu-sva-lib.h
rename to drivers/iommu/iommu-sva.h
index 1b3ace4b5863..7215a761b962 100644
--- a/drivers/iommu/iommu-sva-lib.h
+++ b/drivers/iommu/iommu-sva.h
@@ -2,8 +2,8 @@
/*
* SVA library for IOMMU drivers
*/
-#ifndef _IOMMU_SVA_LIB_H
-#define _IOMMU_SVA_LIB_H
+#ifndef _IOMMU_SVA_H
+#define _IOMMU_SVA_H
#include <linux/ioasid.h>
#include <linux/mm_types.h>
@@ -72,4 +72,4 @@ iommu_sva_handle_iopf(struct iommu_fault *fault, void *data)
return IOMMU_PAGE_RESP_INVALID;
}
#endif /* CONFIG_IOMMU_SVA */
-#endif /* _IOMMU_SVA_LIB_H */
+#endif /* _IOMMU_SVA_H */
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
index e36c689f56c5..b33bc592ccfa 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
@@ -10,7 +10,7 @@
#include <linux/slab.h>
#include "arm-smmu-v3.h"
-#include "../../iommu-sva-lib.h"
+#include "../../iommu-sva.h"
#include "../../io-pgtable-arm.h"
struct arm_smmu_mmu_notifier {
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 8b9b78c7a67d..79e8991e9181 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -31,7 +31,7 @@
#include <linux/amba/bus.h>
#include "arm-smmu-v3.h"
-#include "../../iommu-sva-lib.h"
+#include "../../iommu-sva.h"
static bool disable_bypass = true;
module_param(disable_bypass, bool, 0444);
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 37d68eda1889..d16ab6d1cc05 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -27,7 +27,7 @@
#include <linux/tboot.h>
#include "../irq_remapping.h"
-#include "../iommu-sva-lib.h"
+#include "../iommu-sva.h"
#include "pasid.h"
#include "cap_audit.h"
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index db55b06cafdf..58656a93b201 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -25,7 +25,7 @@
#include "pasid.h"
#include "perf.h"
-#include "../iommu-sva-lib.h"
+#include "../iommu-sva.h"
static irqreturn_t prq_event_thread(int irq, void *d);
static void intel_svm_drain_prq(struct device *dev, u32 pasid);
diff --git a/drivers/iommu/io-pgfault.c b/drivers/iommu/io-pgfault.c
index d1c522f4ab34..7a60b123e6b9 100644
--- a/drivers/iommu/io-pgfault.c
+++ b/drivers/iommu/io-pgfault.c
@@ -11,7 +11,7 @@
#include <linux/slab.h>
#include <linux/workqueue.h>
-#include "iommu-sva-lib.h"
+#include "iommu-sva.h"
/**
* struct iopf_queue - IO Page Fault queue
diff --git a/drivers/iommu/iommu-sva-lib.c b/drivers/iommu/iommu-sva.c
similarity index 99%
rename from drivers/iommu/iommu-sva-lib.c
rename to drivers/iommu/iommu-sva.c
index 536d34855c74..21ffbf1ac39e 100644
--- a/drivers/iommu/iommu-sva-lib.c
+++ b/drivers/iommu/iommu-sva.c
@@ -6,7 +6,7 @@
#include <linux/sched/mm.h>
#include <linux/iommu.h>
-#include "iommu-sva-lib.h"
+#include "iommu-sva.h"
static DEFINE_MUTEX(iommu_sva_lock);
static DECLARE_IOASID_SET(iommu_sva_pasid);
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index c6e9c8e82771..8b9350e10d0b 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -29,7 +29,7 @@
#include <trace/events/iommu.h>
#include <linux/sched/mm.h>
-#include "iommu-sva-lib.h"
+#include "iommu-sva.h"
static struct kset *iommu_group_kset;
static DEFINE_IDA(iommu_group_ida);
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 44475a9b3eea..c1763476162b 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -27,6 +27,6 @@ obj-$(CONFIG_FSL_PAMU) += fsl_pamu.o fsl_pamu_domain.o
obj-$(CONFIG_S390_IOMMU) += s390-iommu.o
obj-$(CONFIG_HYPERV_IOMMU) += hyperv-iommu.o
obj-$(CONFIG_VIRTIO_IOMMU) += virtio-iommu.o
-obj-$(CONFIG_IOMMU_SVA) += iommu-sva-lib.o io-pgfault.o
+obj-$(CONFIG_IOMMU_SVA) += iommu-sva.o io-pgfault.o
obj-$(CONFIG_SPRD_IOMMU) += sprd-iommu.o
obj-$(CONFIG_APPLE_DART) += apple-dart.o
--
2.25.1
Add support for SVA domain allocation and provide an SVA-specific
iommu_domain_ops.
Signed-off-by: Lu Baolu <[email protected]>
Reviewed-by: Jean-Philippe Brucker <[email protected]>
Tested-by: Zhangfei Gao <[email protected]>
---
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 6 ++
.../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 69 +++++++++++++++++++
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 3 +
3 files changed, 78 insertions(+)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index d2ba86470c42..96399dd3a67a 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -758,6 +758,7 @@ struct iommu_sva *arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm);
void arm_smmu_sva_unbind(struct iommu_sva *handle);
u32 arm_smmu_sva_get_pasid(struct iommu_sva *handle);
void arm_smmu_sva_notifier_synchronize(void);
+struct iommu_domain *arm_smmu_sva_domain_alloc(void);
#else /* CONFIG_ARM_SMMU_V3_SVA */
static inline bool arm_smmu_sva_supported(struct arm_smmu_device *smmu)
{
@@ -803,5 +804,10 @@ static inline u32 arm_smmu_sva_get_pasid(struct iommu_sva *handle)
}
static inline void arm_smmu_sva_notifier_synchronize(void) {}
+
+static inline struct iommu_domain *arm_smmu_sva_domain_alloc(void)
+{
+ return NULL;
+}
#endif /* CONFIG_ARM_SMMU_V3_SVA */
#endif /* _ARM_SMMU_V3_H */
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
index f155d406c5d5..fc4555dac5b4 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
@@ -549,3 +549,72 @@ void arm_smmu_sva_notifier_synchronize(void)
*/
mmu_notifier_synchronize();
}
+
+static int arm_smmu_sva_set_dev_pasid(struct iommu_domain *domain,
+ struct device *dev, ioasid_t id)
+{
+ int ret = 0;
+ struct mm_struct *mm;
+ struct iommu_sva *handle;
+
+ if (domain->type != IOMMU_DOMAIN_SVA)
+ return -EINVAL;
+
+ mm = domain->mm;
+ if (WARN_ON(!mm))
+ return -ENODEV;
+
+ mutex_lock(&sva_lock);
+ handle = __arm_smmu_sva_bind(dev, mm);
+ if (IS_ERR(handle))
+ ret = PTR_ERR(handle);
+ mutex_unlock(&sva_lock);
+
+ return ret;
+}
+
+static void arm_smmu_sva_block_dev_pasid(struct iommu_domain *domain,
+ struct device *dev, ioasid_t id)
+{
+ struct mm_struct *mm = domain->mm;
+ struct arm_smmu_bond *bond = NULL, *t;
+ struct arm_smmu_master *master = dev_iommu_priv_get(dev);
+
+ mutex_lock(&sva_lock);
+ list_for_each_entry(t, &master->bonds, list) {
+ if (t->mm == mm) {
+ bond = t;
+ break;
+ }
+ }
+
+ if (!WARN_ON(!bond) && refcount_dec_and_test(&bond->refs)) {
+ list_del(&bond->list);
+ arm_smmu_mmu_notifier_put(bond->smmu_mn);
+ kfree(bond);
+ }
+ mutex_unlock(&sva_lock);
+}
+
+static void arm_smmu_sva_domain_free(struct iommu_domain *domain)
+{
+ kfree(domain);
+}
+
+static const struct iommu_domain_ops arm_smmu_sva_domain_ops = {
+ .set_dev_pasid = arm_smmu_sva_set_dev_pasid,
+ .block_dev_pasid = arm_smmu_sva_block_dev_pasid,
+ .free = arm_smmu_sva_domain_free,
+};
+
+struct iommu_domain *arm_smmu_sva_domain_alloc(void)
+{
+ struct iommu_domain *domain;
+
+ domain = kzalloc(sizeof(*domain), GFP_KERNEL);
+ if (!domain)
+ return NULL;
+ domain->ops = &arm_smmu_sva_domain_ops;
+
+ return domain;
+}
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index ae8ec8df47c1..a30b252e2f95 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -1999,6 +1999,9 @@ static struct iommu_domain *arm_smmu_domain_alloc(unsigned type)
{
struct arm_smmu_domain *smmu_domain;
+ if (type == IOMMU_DOMAIN_SVA)
+ return arm_smmu_sva_domain_alloc();
+
if (type != IOMMU_DOMAIN_UNMANAGED &&
type != IOMMU_DOMAIN_DMA &&
type != IOMMU_DOMAIN_DMA_FQ &&
--
2.25.1
These ops have been replaced with the set/block_dev_pasid domain ops.
There's no need for them anymore; remove them to avoid dead code.
Signed-off-by: Lu Baolu <[email protected]>
Reviewed-by: Jean-Philippe Brucker <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
Tested-by: Zhangfei Gao <[email protected]>
Tested-by: Tony Zhu <[email protected]>
---
include/linux/intel-iommu.h | 3 --
include/linux/iommu.h | 7 ---
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 16 ------
.../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 40 ---------------
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 3 --
drivers/iommu/intel/iommu.c | 3 --
drivers/iommu/intel/svm.c | 49 -------------------
7 files changed, 121 deletions(-)
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 9007428a68f1..5bd19c95a926 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -738,9 +738,6 @@ struct intel_iommu *device_to_iommu(struct device *dev, u8 *bus, u8 *devfn);
extern void intel_svm_check(struct intel_iommu *iommu);
extern int intel_svm_enable_prq(struct intel_iommu *iommu);
extern int intel_svm_finish_prq(struct intel_iommu *iommu);
-struct iommu_sva *intel_svm_bind(struct device *dev, struct mm_struct *mm);
-void intel_svm_unbind(struct iommu_sva *handle);
-u32 intel_svm_get_pasid(struct iommu_sva *handle);
int intel_svm_page_response(struct device *dev, struct iommu_fault_event *evt,
struct iommu_page_response *msg);
struct iommu_domain *intel_svm_domain_alloc(void);
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index f59b0ecd3995..ae0cfca064e6 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -227,9 +227,6 @@ struct iommu_iotlb_gather {
* @dev_has/enable/disable_feat: per device entries to check/enable/disable
* iommu specific features.
* @dev_feat_enabled: check enabled feature
- * @sva_bind: Bind process address space to device
- * @sva_unbind: Unbind process address space from device
- * @sva_get_pasid: Get PASID associated to a SVA handle
* @page_response: handle page request response
* @def_domain_type: device default domain type, return value:
* - IOMMU_DOMAIN_IDENTITY: must use an identity domain
@@ -263,10 +260,6 @@ struct iommu_ops {
int (*dev_enable_feat)(struct device *dev, enum iommu_dev_features f);
int (*dev_disable_feat)(struct device *dev, enum iommu_dev_features f);
- struct iommu_sva *(*sva_bind)(struct device *dev, struct mm_struct *mm);
- void (*sva_unbind)(struct iommu_sva *handle);
- u32 (*sva_get_pasid)(struct iommu_sva *handle);
-
int (*page_response)(struct device *dev,
struct iommu_fault_event *evt,
struct iommu_page_response *msg);
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 96399dd3a67a..15dd4c7e6d3a 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -754,9 +754,6 @@ bool arm_smmu_master_sva_enabled(struct arm_smmu_master *master);
int arm_smmu_master_enable_sva(struct arm_smmu_master *master);
int arm_smmu_master_disable_sva(struct arm_smmu_master *master);
bool arm_smmu_master_iopf_supported(struct arm_smmu_master *master);
-struct iommu_sva *arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm);
-void arm_smmu_sva_unbind(struct iommu_sva *handle);
-u32 arm_smmu_sva_get_pasid(struct iommu_sva *handle);
void arm_smmu_sva_notifier_synchronize(void);
struct iommu_domain *arm_smmu_sva_domain_alloc(void);
#else /* CONFIG_ARM_SMMU_V3_SVA */
@@ -790,19 +787,6 @@ static inline bool arm_smmu_master_iopf_supported(struct arm_smmu_master *master
return false;
}
-static inline struct iommu_sva *
-arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm)
-{
- return ERR_PTR(-ENODEV);
-}
-
-static inline void arm_smmu_sva_unbind(struct iommu_sva *handle) {}
-
-static inline u32 arm_smmu_sva_get_pasid(struct iommu_sva *handle)
-{
- return IOMMU_PASID_INVALID;
-}
-
static inline void arm_smmu_sva_notifier_synchronize(void) {}
static inline struct iommu_domain *arm_smmu_sva_domain_alloc(void)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
index fc4555dac5b4..e36c689f56c5 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
@@ -344,11 +344,6 @@ __arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm)
if (!bond)
return ERR_PTR(-ENOMEM);
- /* Allocate a PASID for this mm if necessary */
- ret = iommu_sva_alloc_pasid(mm, 1, (1U << master->ssid_bits) - 1);
- if (ret)
- goto err_free_bond;
-
bond->mm = mm;
bond->sva.dev = dev;
refcount_set(&bond->refs, 1);
@@ -367,41 +362,6 @@ __arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm)
return ERR_PTR(ret);
}
-struct iommu_sva *arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm)
-{
- struct iommu_sva *handle;
- struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
- struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
-
- if (smmu_domain->stage != ARM_SMMU_DOMAIN_S1)
- return ERR_PTR(-EINVAL);
-
- mutex_lock(&sva_lock);
- handle = __arm_smmu_sva_bind(dev, mm);
- mutex_unlock(&sva_lock);
- return handle;
-}
-
-void arm_smmu_sva_unbind(struct iommu_sva *handle)
-{
- struct arm_smmu_bond *bond = sva_to_bond(handle);
-
- mutex_lock(&sva_lock);
- if (refcount_dec_and_test(&bond->refs)) {
- list_del(&bond->list);
- arm_smmu_mmu_notifier_put(bond->smmu_mn);
- kfree(bond);
- }
- mutex_unlock(&sva_lock);
-}
-
-u32 arm_smmu_sva_get_pasid(struct iommu_sva *handle)
-{
- struct arm_smmu_bond *bond = sva_to_bond(handle);
-
- return bond->mm->pasid;
-}
-
bool arm_smmu_sva_supported(struct arm_smmu_device *smmu)
{
unsigned long reg, fld;
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index a30b252e2f95..8b9b78c7a67d 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2855,9 +2855,6 @@ static struct iommu_ops arm_smmu_ops = {
.dev_feat_enabled = arm_smmu_dev_feature_enabled,
.dev_enable_feat = arm_smmu_dev_enable_feature,
.dev_disable_feat = arm_smmu_dev_disable_feature,
- .sva_bind = arm_smmu_sva_bind,
- .sva_unbind = arm_smmu_sva_unbind,
- .sva_get_pasid = arm_smmu_sva_get_pasid,
.page_response = arm_smmu_page_response,
.pgsize_bitmap = -1UL, /* Restricted during device attach */
.owner = THIS_MODULE,
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 993a1ce509a8..37d68eda1889 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -4921,9 +4921,6 @@ const struct iommu_ops intel_iommu_ops = {
.def_domain_type = device_def_domain_type,
.pgsize_bitmap = SZ_4K,
#ifdef CONFIG_INTEL_IOMMU_SVM
- .sva_bind = intel_svm_bind,
- .sva_unbind = intel_svm_unbind,
- .sva_get_pasid = intel_svm_get_pasid,
.page_response = intel_svm_page_response,
#endif
.default_domain_ops = &(const struct iommu_domain_ops) {
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 7d4f9d173013..db55b06cafdf 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -313,14 +313,6 @@ static int pasid_to_svm_sdev(struct device *dev, unsigned int pasid,
return 0;
}
-static int intel_svm_alloc_pasid(struct device *dev, struct mm_struct *mm)
-{
- ioasid_t max_pasid = dev_is_pci(dev) ?
- pci_max_pasids(to_pci_dev(dev)) : intel_pasid_max_id;
-
- return iommu_sva_alloc_pasid(mm, PASID_MIN, max_pasid - 1);
-}
-
static struct iommu_sva *intel_svm_bind_mm(struct intel_iommu *iommu,
struct device *dev,
struct mm_struct *mm)
@@ -809,47 +801,6 @@ static irqreturn_t prq_event_thread(int irq, void *d)
return IRQ_RETVAL(handled);
}
-struct iommu_sva *intel_svm_bind(struct device *dev, struct mm_struct *mm)
-{
- struct intel_iommu *iommu = device_to_iommu(dev, NULL, NULL);
- struct iommu_sva *sva;
- int ret;
-
- mutex_lock(&pasid_mutex);
- ret = intel_svm_alloc_pasid(dev, mm);
- if (ret) {
- mutex_unlock(&pasid_mutex);
- return ERR_PTR(ret);
- }
-
- sva = intel_svm_bind_mm(iommu, dev, mm);
- mutex_unlock(&pasid_mutex);
-
- return sva;
-}
-
-void intel_svm_unbind(struct iommu_sva *sva)
-{
- struct intel_svm_dev *sdev = to_intel_svm_dev(sva);
-
- mutex_lock(&pasid_mutex);
- intel_svm_unbind_mm(sdev->dev, sdev->pasid);
- mutex_unlock(&pasid_mutex);
-}
-
-u32 intel_svm_get_pasid(struct iommu_sva *sva)
-{
- struct intel_svm_dev *sdev;
- u32 pasid;
-
- mutex_lock(&pasid_mutex);
- sdev = to_intel_svm_dev(sva);
- pasid = sdev->pasid;
- mutex_unlock(&pasid_mutex);
-
- return pasid;
-}
-
int intel_svm_page_response(struct device *dev,
struct iommu_fault_event *evt,
struct iommu_page_response *msg)
--
2.25.1
This adds some mechanisms around the iommu_domain so that the I/O page
fault handling framework could route a page fault to the domain and
call the fault handler from it.
Add pointers to the page fault handler and its private data in struct
iommu_domain. The fault handler will be called with the private data
as a parameter once a page fault is routed to the domain. Any kernel
component which owns an iommu domain could install a handler and its
private parameter so that the page fault could be further routed and
handled.
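As a rough sketch of the intended usage (my_ctx, my_iopf_handler() and
my_install_iopf_handler() below are hypothetical names for illustration,
not part of this patch), an owner of an iommu domain would hook into the
framework like this:

	struct my_ctx {
		/* hypothetical owner state used to resolve faults */
	};

	static enum iommu_page_response_code
	my_iopf_handler(struct iommu_fault *fault, void *data)
	{
		struct my_ctx *ctx = data;	/* domain->fault_data */

		if (!ctx)
			return IOMMU_PAGE_RESP_INVALID;

		/* Resolve fault->prm.addr against the owner's state here. */
		return IOMMU_PAGE_RESP_SUCCESS;
	}

	static void my_install_iopf_handler(struct iommu_domain *domain,
					    struct my_ctx *ctx)
	{
		domain->iopf_handler = my_iopf_handler;
		domain->fault_data = ctx;
	}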
This also prepares the SVA implementation to be the first consumer of
the per-domain page fault handling model. The I/O page fault handler
for SVA is copied to the SVA file with mmget_not_zero() added before
mmap_read_lock().
Suggested-by: Jean-Philippe Brucker <[email protected]>
Signed-off-by: Lu Baolu <[email protected]>
Reviewed-by: Jean-Philippe Brucker <[email protected]>
Tested-by: Zhangfei Gao <[email protected]>
Tested-by: Tony Zhu <[email protected]>
---
include/linux/iommu.h | 3 ++
drivers/iommu/iommu-sva-lib.h | 8 +++++
drivers/iommu/io-pgfault.c | 7 +++++
drivers/iommu/iommu-sva-lib.c | 58 +++++++++++++++++++++++++++++++++++
drivers/iommu/iommu.c | 4 +++
5 files changed, 80 insertions(+)
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index ae0cfca064e6..47610f21d451 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -105,6 +105,9 @@ struct iommu_domain {
unsigned long pgsize_bitmap; /* Bitmap of page sizes in use */
struct iommu_domain_geometry geometry;
struct iommu_dma_cookie *iova_cookie;
+ enum iommu_page_response_code (*iopf_handler)(struct iommu_fault *fault,
+ void *data);
+ void *fault_data;
union {
struct {
iommu_fault_handler_t handler;
diff --git a/drivers/iommu/iommu-sva-lib.h b/drivers/iommu/iommu-sva-lib.h
index 8909ea1094e3..1b3ace4b5863 100644
--- a/drivers/iommu/iommu-sva-lib.h
+++ b/drivers/iommu/iommu-sva-lib.h
@@ -26,6 +26,8 @@ int iopf_queue_flush_dev(struct device *dev);
struct iopf_queue *iopf_queue_alloc(const char *name);
void iopf_queue_free(struct iopf_queue *queue);
int iopf_queue_discard_partial(struct iopf_queue *queue);
+enum iommu_page_response_code
+iommu_sva_handle_iopf(struct iommu_fault *fault, void *data);
#else /* CONFIG_IOMMU_SVA */
static inline int iommu_queue_iopf(struct iommu_fault *fault, void *cookie)
@@ -63,5 +65,11 @@ static inline int iopf_queue_discard_partial(struct iopf_queue *queue)
{
return -ENODEV;
}
+
+static inline enum iommu_page_response_code
+iommu_sva_handle_iopf(struct iommu_fault *fault, void *data)
+{
+ return IOMMU_PAGE_RESP_INVALID;
+}
#endif /* CONFIG_IOMMU_SVA */
#endif /* _IOMMU_SVA_LIB_H */
diff --git a/drivers/iommu/io-pgfault.c b/drivers/iommu/io-pgfault.c
index 1df8c1dcae77..aee9e033012f 100644
--- a/drivers/iommu/io-pgfault.c
+++ b/drivers/iommu/io-pgfault.c
@@ -181,6 +181,13 @@ static void iopf_handle_group(struct work_struct *work)
* request completes, outstanding faults will have been dealt with by the time
* the PASID is freed.
*
+ * Any valid page fault will be eventually routed to an iommu domain and the
+ * page fault handler installed there will get called. The users of this
+ * handling framework should guarantee that the iommu domain could only be
+ * freed after the device has stopped generating page faults (or the iommu
+ * hardware has been set to block the page faults) and the pending page faults
+ * have been flushed.
+ *
* Return: 0 on success and <0 on error.
*/
int iommu_queue_iopf(struct iommu_fault *fault, void *cookie)
diff --git a/drivers/iommu/iommu-sva-lib.c b/drivers/iommu/iommu-sva-lib.c
index 751366980232..536d34855c74 100644
--- a/drivers/iommu/iommu-sva-lib.c
+++ b/drivers/iommu/iommu-sva-lib.c
@@ -167,3 +167,61 @@ u32 iommu_sva_get_pasid(struct iommu_sva *handle)
return domain->mm->pasid;
}
EXPORT_SYMBOL_GPL(iommu_sva_get_pasid);
+
+/*
+ * I/O page fault handler for SVA
+ */
+enum iommu_page_response_code
+iommu_sva_handle_iopf(struct iommu_fault *fault, void *data)
+{
+ vm_fault_t ret;
+ struct vm_area_struct *vma;
+ struct mm_struct *mm = data;
+ unsigned int access_flags = 0;
+ unsigned int fault_flags = FAULT_FLAG_REMOTE;
+ struct iommu_fault_page_request *prm = &fault->prm;
+ enum iommu_page_response_code status = IOMMU_PAGE_RESP_INVALID;
+
+ if (!(prm->flags & IOMMU_FAULT_PAGE_REQUEST_PASID_VALID))
+ return status;
+
+ if (IS_ERR_OR_NULL(mm) || !mmget_not_zero(mm))
+ return status;
+
+ mmap_read_lock(mm);
+
+ vma = find_extend_vma(mm, prm->addr);
+ if (!vma)
+ /* Unmapped area */
+ goto out_put_mm;
+
+ if (prm->perm & IOMMU_FAULT_PERM_READ)
+ access_flags |= VM_READ;
+
+ if (prm->perm & IOMMU_FAULT_PERM_WRITE) {
+ access_flags |= VM_WRITE;
+ fault_flags |= FAULT_FLAG_WRITE;
+ }
+
+ if (prm->perm & IOMMU_FAULT_PERM_EXEC) {
+ access_flags |= VM_EXEC;
+ fault_flags |= FAULT_FLAG_INSTRUCTION;
+ }
+
+ if (!(prm->perm & IOMMU_FAULT_PERM_PRIV))
+ fault_flags |= FAULT_FLAG_USER;
+
+ if (access_flags & ~vma->vm_flags)
+ /* Access fault */
+ goto out_put_mm;
+
+ ret = handle_mm_fault(vma, prm->addr, fault_flags, NULL);
+ status = ret & VM_FAULT_ERROR ? IOMMU_PAGE_RESP_INVALID :
+ IOMMU_PAGE_RESP_SUCCESS;
+
+out_put_mm:
+ mmap_read_unlock(mm);
+ mmput(mm);
+
+ return status;
+}
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index e1491eb3c7b6..c6e9c8e82771 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -29,6 +29,8 @@
#include <trace/events/iommu.h>
#include <linux/sched/mm.h>
+#include "iommu-sva-lib.h"
+
static struct kset *iommu_group_kset;
static DEFINE_IDA(iommu_group_ida);
@@ -3199,6 +3201,8 @@ struct iommu_domain *iommu_sva_domain_alloc(struct device *dev,
domain->type = IOMMU_DOMAIN_SVA;
mmgrab(mm);
domain->mm = mm;
+ domain->iopf_handler = iommu_sva_handle_iopf;
+ domain->fault_data = mm;
return domain;
}
--
2.25.1
Add support for SVA domain allocation and provide an SVA-specific
iommu_domain_ops.
Signed-off-by: Lu Baolu <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
Tested-by: Tony Zhu <[email protected]>
---
include/linux/intel-iommu.h | 5 ++++
drivers/iommu/intel/iommu.c | 2 ++
drivers/iommu/intel/svm.c | 49 +++++++++++++++++++++++++++++++++++++
3 files changed, 56 insertions(+)
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 31e3edc0fc7e..9007428a68f1 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -743,6 +743,7 @@ void intel_svm_unbind(struct iommu_sva *handle);
u32 intel_svm_get_pasid(struct iommu_sva *handle);
int intel_svm_page_response(struct device *dev, struct iommu_fault_event *evt,
struct iommu_page_response *msg);
+struct iommu_domain *intel_svm_domain_alloc(void);
struct intel_svm_dev {
struct list_head list;
@@ -768,6 +769,10 @@ struct intel_svm {
};
#else
static inline void intel_svm_check(struct intel_iommu *iommu) {}
+static inline struct iommu_domain *intel_svm_domain_alloc(void)
+{
+ return NULL;
+}
#endif
#ifdef CONFIG_INTEL_IOMMU_DEBUGFS
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 44016594831d..993a1ce509a8 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -4298,6 +4298,8 @@ static struct iommu_domain *intel_iommu_domain_alloc(unsigned type)
return domain;
case IOMMU_DOMAIN_IDENTITY:
return &si_domain->domain;
+ case IOMMU_DOMAIN_SVA:
+ return intel_svm_domain_alloc();
default:
return NULL;
}
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index d04880a291c3..7d4f9d173013 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -931,3 +931,52 @@ int intel_svm_page_response(struct device *dev,
mutex_unlock(&pasid_mutex);
return ret;
}
+
+static int intel_svm_set_dev_pasid(struct iommu_domain *domain,
+ struct device *dev, ioasid_t pasid)
+{
+ struct device_domain_info *info = dev_iommu_priv_get(dev);
+ struct intel_iommu *iommu = info->iommu;
+ struct mm_struct *mm = domain->mm;
+ struct iommu_sva *sva;
+ int ret = 0;
+
+ mutex_lock(&pasid_mutex);
+ sva = intel_svm_bind_mm(iommu, dev, mm);
+ if (IS_ERR(sva))
+ ret = PTR_ERR(sva);
+ mutex_unlock(&pasid_mutex);
+
+ return ret;
+}
+
+static void intel_svm_block_dev_pasid(struct iommu_domain *domain,
+ struct device *dev, ioasid_t pasid)
+{
+ mutex_lock(&pasid_mutex);
+ intel_svm_unbind_mm(dev, pasid);
+ mutex_unlock(&pasid_mutex);
+}
+
+static void intel_svm_domain_free(struct iommu_domain *domain)
+{
+ kfree(to_dmar_domain(domain));
+}
+
+static const struct iommu_domain_ops intel_svm_domain_ops = {
+ .set_dev_pasid = intel_svm_set_dev_pasid,
+ .block_dev_pasid = intel_svm_block_dev_pasid,
+ .free = intel_svm_domain_free,
+};
+
+struct iommu_domain *intel_svm_domain_alloc(void)
+{
+ struct dmar_domain *domain;
+
+ domain = kzalloc(sizeof(*domain), GFP_KERNEL);
+ if (!domain)
+ return NULL;
+ domain->domain.ops = &intel_svm_domain_ops;
+
+ return &domain->domain;
+}
--
2.25.1
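For context, the generic allocation path that reaches the
IOMMU_DOMAIN_SVA case above is the core helper added earlier in the
series. A reconstructed sketch, matching the iommu.c hunk quoted
elsewhere in this thread (the iopf_handler wiring added by the
fault-handling patch is omitted here):

	struct iommu_domain *iommu_sva_domain_alloc(struct device *dev,
						    struct mm_struct *mm)
	{
		const struct iommu_ops *ops = dev_iommu_ops(dev);
		struct iommu_domain *domain;

		/* Dispatches to e.g. intel_svm_domain_alloc() above. */
		domain = ops->domain_alloc(IOMMU_DOMAIN_SVA);
		if (!domain)
			return NULL;

		domain->type = IOMMU_DOMAIN_SVA;
		mmgrab(mm);
		domain->mm = mm;

		return domain;
	}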
The existing iommu SVA interfaces are implemented by calling the SVA
specific iommu ops provided by the IOMMU drivers. There's no need for
any SVA specific ops in the iommu_ops vector anymore as we can achieve
this through the generic attach/detach_dev_pasid domain ops.
This refactors the IOMMU SVA interfaces implementation by using the
set/block_dev_pasid ops and aligns them with the concept of the SVA
iommu domain. Put the new SVA code in the SVA-related file in order
to make it self-contained.
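For reference, the driver-facing flow is unchanged by this refactoring.
A minimal sketch of how a device driver consumes these interfaces
(my_bind_sva() is a hypothetical driver function; programming the PASID
into the device is elided):

	#include <linux/err.h>
	#include <linux/iommu.h>
	#include <linux/sched.h>

	static int my_bind_sva(struct device *dev)
	{
		struct iommu_sva *handle;
		u32 pasid;
		int ret;

		ret = iommu_dev_enable_feature(dev, IOMMU_DEV_FEAT_SVA);
		if (ret)
			return ret;

		handle = iommu_sva_bind_device(dev, current->mm);
		if (IS_ERR(handle)) {
			iommu_dev_disable_feature(dev, IOMMU_DEV_FEAT_SVA);
			return PTR_ERR(handle);
		}

		pasid = iommu_sva_get_pasid(handle);
		/* Program @pasid into the device and issue DMA here. */

		iommu_sva_unbind_device(handle);
		iommu_dev_disable_feature(dev, IOMMU_DEV_FEAT_SVA);
		return 0;
	}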
Signed-off-by: Lu Baolu <[email protected]>
Tested-by: Zhangfei Gao <[email protected]>
Tested-by: Tony Zhu <[email protected]>
---
include/linux/iommu.h | 67 +++++++++++--------
drivers/iommu/iommu-sva-lib.c | 98 ++++++++++++++++++++++++++++
drivers/iommu/iommu.c | 119 ++++++++--------------------------
3 files changed, 165 insertions(+), 119 deletions(-)
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 42f0418dc22c..f59b0ecd3995 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -39,7 +39,6 @@ struct device;
struct iommu_domain;
struct iommu_domain_ops;
struct notifier_block;
-struct iommu_sva;
struct iommu_fault_event;
struct iommu_dma_cookie;
@@ -57,6 +56,14 @@ struct iommu_domain_geometry {
bool force_aperture; /* DMA only allowed in mappable range? */
};
+/**
+ * struct iommu_sva - handle to a device-mm bond
+ */
+struct iommu_sva {
+ struct device *dev;
+ refcount_t users;
+};
+
/* Domain feature flags */
#define __IOMMU_DOMAIN_PAGING (1U << 0) /* Support for iommu_map/unmap */
#define __IOMMU_DOMAIN_DMA_API (1U << 1) /* Domain for use in DMA-API
@@ -105,6 +112,7 @@ struct iommu_domain {
};
struct { /* IOMMU_DOMAIN_SVA */
struct mm_struct *mm;
+ struct iommu_sva bond;
};
};
};
@@ -638,13 +646,6 @@ struct iommu_fwspec {
/* ATS is supported */
#define IOMMU_FWSPEC_PCI_RC_ATS (1 << 0)
-/**
- * struct iommu_sva - handle to a device-mm bond
- */
-struct iommu_sva {
- struct device *dev;
-};
-
int iommu_fwspec_init(struct device *dev, struct fwnode_handle *iommu_fwnode,
const struct iommu_ops *ops);
void iommu_fwspec_free(struct device *dev);
@@ -685,11 +686,6 @@ int iommu_dev_enable_feature(struct device *dev, enum iommu_dev_features f);
int iommu_dev_disable_feature(struct device *dev, enum iommu_dev_features f);
bool iommu_dev_feature_enabled(struct device *dev, enum iommu_dev_features f);
-struct iommu_sva *iommu_sva_bind_device(struct device *dev,
- struct mm_struct *mm);
-void iommu_sva_unbind_device(struct iommu_sva *handle);
-u32 iommu_sva_get_pasid(struct iommu_sva *handle);
-
int iommu_device_use_default_domain(struct device *dev);
void iommu_device_unuse_default_domain(struct device *dev);
@@ -703,6 +699,8 @@ int iommu_attach_device_pasid(struct iommu_domain *domain, struct device *dev,
ioasid_t pasid);
void iommu_detach_device_pasid(struct iommu_domain *domain, struct device *dev,
ioasid_t pasid);
+struct iommu_domain *
+iommu_get_domain_for_dev_pasid(struct device *dev, ioasid_t pasid);
#else /* CONFIG_IOMMU_API */
struct iommu_ops {};
@@ -1033,21 +1031,6 @@ iommu_dev_disable_feature(struct device *dev, enum iommu_dev_features feat)
return -ENODEV;
}
-static inline struct iommu_sva *
-iommu_sva_bind_device(struct device *dev, struct mm_struct *mm)
-{
- return NULL;
-}
-
-static inline void iommu_sva_unbind_device(struct iommu_sva *handle)
-{
-}
-
-static inline u32 iommu_sva_get_pasid(struct iommu_sva *handle)
-{
- return IOMMU_PASID_INVALID;
-}
-
static inline struct iommu_fwspec *dev_iommu_fwspec_get(struct device *dev)
{
return NULL;
@@ -1093,6 +1076,12 @@ static inline void iommu_detach_device_pasid(struct iommu_domain *domain,
struct device *dev, ioasid_t pasid)
{
}
+
+static inline struct iommu_domain *
+iommu_get_domain_for_dev_pasid(struct device *dev, ioasid_t pasid)
+{
+ return NULL;
+}
#endif /* CONFIG_IOMMU_API */
/**
@@ -1118,4 +1107,26 @@ void iommu_debugfs_setup(void);
static inline void iommu_debugfs_setup(void) {}
#endif
+#ifdef CONFIG_IOMMU_SVA
+struct iommu_sva *iommu_sva_bind_device(struct device *dev,
+ struct mm_struct *mm);
+void iommu_sva_unbind_device(struct iommu_sva *handle);
+u32 iommu_sva_get_pasid(struct iommu_sva *handle);
+#else
+static inline struct iommu_sva *
+iommu_sva_bind_device(struct device *dev, struct mm_struct *mm)
+{
+ return NULL;
+}
+
+static inline void iommu_sva_unbind_device(struct iommu_sva *handle)
+{
+}
+
+static inline u32 iommu_sva_get_pasid(struct iommu_sva *handle)
+{
+ return IOMMU_PASID_INVALID;
+}
+#endif /* CONFIG_IOMMU_SVA */
+
#endif /* __LINUX_IOMMU_H */
diff --git a/drivers/iommu/iommu-sva-lib.c b/drivers/iommu/iommu-sva-lib.c
index 106506143896..751366980232 100644
--- a/drivers/iommu/iommu-sva-lib.c
+++ b/drivers/iommu/iommu-sva-lib.c
@@ -4,6 +4,7 @@
*/
#include <linux/mutex.h>
#include <linux/sched/mm.h>
+#include <linux/iommu.h>
#include "iommu-sva-lib.h"
@@ -69,3 +70,100 @@ struct mm_struct *iommu_sva_find(ioasid_t pasid)
return ioasid_find(&iommu_sva_pasid, pasid, __mmget_not_zero);
}
EXPORT_SYMBOL_GPL(iommu_sva_find);
+
+/**
+ * iommu_sva_bind_device() - Bind a process address space to a device
+ * @dev: the device
+ * @mm: the mm to bind, caller must hold a reference to mm_users
+ *
+ * Create a bond between device and address space, allowing the device to access
+ * the mm using the returned PASID. If a bond already exists between @device and
+ * @mm, it is returned and an additional reference is taken. Caller must call
+ * iommu_sva_unbind_device() to release each reference.
+ *
+ * iommu_dev_enable_feature(dev, IOMMU_DEV_FEAT_SVA) must be called first, to
+ * initialize the required SVA features.
+ *
+ * On error, returns an ERR_PTR value.
+ */
+struct iommu_sva *iommu_sva_bind_device(struct device *dev, struct mm_struct *mm)
+{
+ struct iommu_domain *domain;
+ ioasid_t max_pasids;
+ int ret = -EINVAL;
+
+ max_pasids = dev->iommu->max_pasids;
+ if (!max_pasids)
+ return ERR_PTR(-EOPNOTSUPP);
+
+ /* Allocate mm->pasid if necessary. */
+ ret = iommu_sva_alloc_pasid(mm, 1, max_pasids - 1);
+ if (ret)
+ return ERR_PTR(ret);
+
+ mutex_lock(&iommu_sva_lock);
+ /* Search for an existing domain. */
+ domain = iommu_get_domain_for_dev_pasid(dev, mm->pasid);
+ if (domain) {
+ refcount_inc(&domain->bond.users);
+ goto out_success;
+ }
+
+ /* Allocate a new domain and set it on device pasid. */
+ domain = iommu_sva_domain_alloc(dev, mm);
+ if (!domain) {
+ ret = -ENOMEM;
+ goto out_unlock;
+ }
+
+ ret = iommu_attach_device_pasid(domain, dev, mm->pasid);
+ if (ret)
+ goto out_free_domain;
+ domain->bond.dev = dev;
+ refcount_set(&domain->bond.users, 1);
+
+out_success:
+ mutex_unlock(&iommu_sva_lock);
+ return &domain->bond;
+
+out_free_domain:
+ iommu_domain_free(domain);
+out_unlock:
+ mutex_unlock(&iommu_sva_lock);
+
+ return ERR_PTR(ret);
+}
+EXPORT_SYMBOL_GPL(iommu_sva_bind_device);
+
+/**
+ * iommu_sva_unbind_device() - Remove a bond created with iommu_sva_bind_device
+ * @handle: the handle returned by iommu_sva_bind_device()
+ *
+ * Put reference to a bond between device and address space. The device should
+ * not be issuing any more transaction for this PASID. All outstanding page
+ * requests for this PASID must have been flushed to the IOMMU.
+ */
+void iommu_sva_unbind_device(struct iommu_sva *handle)
+{
+ struct device *dev = handle->dev;
+ struct iommu_domain *domain =
+ container_of(handle, struct iommu_domain, bond);
+ ioasid_t pasid = iommu_sva_get_pasid(handle);
+
+ mutex_lock(&iommu_sva_lock);
+ if (refcount_dec_and_test(&domain->bond.users)) {
+ iommu_detach_device_pasid(domain, dev, pasid);
+ iommu_domain_free(domain);
+ }
+ mutex_unlock(&iommu_sva_lock);
+}
+EXPORT_SYMBOL_GPL(iommu_sva_unbind_device);
+
+u32 iommu_sva_get_pasid(struct iommu_sva *handle)
+{
+ struct iommu_domain *domain =
+ container_of(handle, struct iommu_domain, bond);
+
+ return domain->mm->pasid;
+}
+EXPORT_SYMBOL_GPL(iommu_sva_get_pasid);
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 10479c5e4d23..e1491eb3c7b6 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2789,97 +2789,6 @@ bool iommu_dev_feature_enabled(struct device *dev, enum iommu_dev_features feat)
}
EXPORT_SYMBOL_GPL(iommu_dev_feature_enabled);
-/**
- * iommu_sva_bind_device() - Bind a process address space to a device
- * @dev: the device
- * @mm: the mm to bind, caller must hold a reference to it
- *
- * Create a bond between device and address space, allowing the device to access
- * the mm using the returned PASID. If a bond already exists between @device and
- * @mm, it is returned and an additional reference is taken. Caller must call
- * iommu_sva_unbind_device() to release each reference.
- *
- * iommu_dev_enable_feature(dev, IOMMU_DEV_FEAT_SVA) must be called first, to
- * initialize the required SVA features.
- *
- * On error, returns an ERR_PTR value.
- */
-struct iommu_sva *
-iommu_sva_bind_device(struct device *dev, struct mm_struct *mm)
-{
- struct iommu_group *group;
- struct iommu_sva *handle = ERR_PTR(-EINVAL);
- const struct iommu_ops *ops = dev_iommu_ops(dev);
-
- if (!ops->sva_bind)
- return ERR_PTR(-ENODEV);
-
- group = iommu_group_get(dev);
- if (!group)
- return ERR_PTR(-ENODEV);
-
- /* Ensure device count and domain don't change while we're binding */
- mutex_lock(&group->mutex);
-
- /*
- * To keep things simple, SVA currently doesn't support IOMMU groups
- * with more than one device. Existing SVA-capable systems are not
- * affected by the problems that required IOMMU groups (lack of ACS
- * isolation, device ID aliasing and other hardware issues).
- */
- if (iommu_group_device_count(group) != 1)
- goto out_unlock;
-
- handle = ops->sva_bind(dev, mm);
-
-out_unlock:
- mutex_unlock(&group->mutex);
- iommu_group_put(group);
-
- return handle;
-}
-EXPORT_SYMBOL_GPL(iommu_sva_bind_device);
-
-/**
- * iommu_sva_unbind_device() - Remove a bond created with iommu_sva_bind_device
- * @handle: the handle returned by iommu_sva_bind_device()
- *
- * Put reference to a bond between device and address space. The device should
- * not be issuing any more transaction for this PASID. All outstanding page
- * requests for this PASID must have been flushed to the IOMMU.
- */
-void iommu_sva_unbind_device(struct iommu_sva *handle)
-{
- struct iommu_group *group;
- struct device *dev = handle->dev;
- const struct iommu_ops *ops = dev_iommu_ops(dev);
-
- if (!ops->sva_unbind)
- return;
-
- group = iommu_group_get(dev);
- if (!group)
- return;
-
- mutex_lock(&group->mutex);
- ops->sva_unbind(handle);
- mutex_unlock(&group->mutex);
-
- iommu_group_put(group);
-}
-EXPORT_SYMBOL_GPL(iommu_sva_unbind_device);
-
-u32 iommu_sva_get_pasid(struct iommu_sva *handle)
-{
- const struct iommu_ops *ops = dev_iommu_ops(handle->dev);
-
- if (!ops->sva_get_pasid)
- return IOMMU_PASID_INVALID;
-
- return ops->sva_get_pasid(handle);
-}
-EXPORT_SYMBOL_GPL(iommu_sva_get_pasid);
-
/*
* Changes the default domain of an iommu group that has *only* one device
*
@@ -3366,3 +3275,31 @@ void iommu_detach_device_pasid(struct iommu_domain *domain, struct device *dev,
iommu_group_put(group);
}
+
+/*
+ * This is a variant of iommu_get_domain_for_dev(). It returns the existing
+ * domain attached to pasid of a device. It's only for internal use of the
+ * IOMMU subsystem. The caller must take care to avoid any possible
+ * use-after-free case.
+ */
+struct iommu_domain *
+iommu_get_domain_for_dev_pasid(struct device *dev, ioasid_t pasid)
+{
+ struct iommu_domain *domain;
+ struct iommu_group *group;
+
+ if (!pasid_valid(pasid))
+ return NULL;
+
+ group = iommu_group_get(dev);
+ if (!group)
+ return NULL;
+ /*
+ * The xarray protects its internal state with RCU. Hence the domain
+ * obtained is either NULL or fully formed.
+ */
+ domain = xa_load(&group->pasid_array, pasid);
+ iommu_group_put(group);
+
+ return domain;
+}
--
2.25.1
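To see how the pieces meet, the I/O page fault path reworked later in
the series consumes iommu_get_domain_for_dev_pasid() roughly as below.
This is a simplified sketch of the dispatch step only; iopf_dispatch()
is a hypothetical name, and the real logic lives in iopf_handler():

	static enum iommu_page_response_code
	iopf_dispatch(struct device *dev, struct iommu_fault *fault)
	{
		struct iommu_domain *domain;

		if (!(fault->prm.flags & IOMMU_FAULT_PAGE_REQUEST_PASID_VALID))
			return IOMMU_PAGE_RESP_INVALID;

		domain = iommu_get_domain_for_dev_pasid(dev, fault->prm.pasid);
		if (!domain || !domain->iopf_handler)
			return IOMMU_PAGE_RESP_INVALID;

		/* For SVA domains this ends up in iommu_sva_handle_iopf(). */
		return domain->iopf_handler(fault, domain->fault_data);
	}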
On Tue, Jul 05, 2022 at 01:07:06PM +0800, Lu Baolu wrote:
> The existing iommu SVA interfaces are implemented by calling the SVA
> specific iommu ops provided by the IOMMU drivers. There's no need for
> any SVA specific ops in iommu_ops vector anymore as we can achieve
> this through the generic attach/detach_dev_pasid domain ops.
>
> This refactors the IOMMU SVA interfaces implementation by using the
> set/block_dev_pasid ops and aligns them with the concept of the SVA
> iommu domain. Put the new SVA code in the SVA-related file in order
> to make it self-contained.
>
> Signed-off-by: Lu Baolu <[email protected]>
> Tested-by: Zhangfei Gao <[email protected]>
> Tested-by: Tony Zhu <[email protected]>
Reviewed-by: Jean-Philippe Brucker <[email protected]>
> From: Lu Baolu <[email protected]>
> Sent: Tuesday, July 5, 2022 1:07 PM
>
> Attaching an IOMMU domain to a PASID of a device is a generic operation
> for modern IOMMU drivers which support PASID-granular DMA address
> translation. Currently visible usage scenarios include (but are not limited to):
>
> - SVA (Shared Virtual Address)
> - kernel DMA with PASID
> - hardware-assist mediated device
>
> This adds a pair of domain ops for this purpose and adds the interfaces
> for device drivers to attach/detach a domain to/from a {device, PASID}.
> Some buses, like PCI, route packets without considering the PASID value.
> Thus a DMA target address with PASID might be treated as P2P if the
> address falls into the MMIO BAR of other devices in the group. To make
> things simple, these interfaces only apply to devices belonging to the
> singleton groups, and the singleton is immutable in fabric (i.e. not
> affected by hotplug).
>
> Signed-off-by: Lu Baolu <[email protected]>
> Reviewed-by: Jean-Philippe Brucker <[email protected]>
> Tested-by: Zhangfei Gao <[email protected]>
> Tested-by: Tony Zhu <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
> From: Lu Baolu <[email protected]>
> Sent: Tuesday, July 5, 2022 1:07 PM
>
> This adds some mechanisms around the iommu_domain so that the I/O page
> fault handling framework could route a page fault to the domain and
> call the fault handler from it.
>
> Add pointers to the page fault handler and its private data in struct
> iommu_domain. The fault handler will be called with the private data
> as a parameter once a page fault is routed to the domain. Any kernel
> component which owns an iommu domain could install handler and its
> private parameter so that the page fault could be further routed and
> handled.
>
> This also prepares the SVA implementation to be the first consumer of
> the per-domain page fault handling model. The I/O page fault handler
> for SVA is copied to the SVA file with mmget_not_zero() added before
> mmap_read_lock().
>
> Suggested-by: Jean-Philippe Brucker <[email protected]>
> Signed-off-by: Lu Baolu <[email protected]>
> Reviewed-by: Jean-Philippe Brucker <[email protected]>
> Tested-by: Zhangfei Gao <[email protected]>
> Tested-by: Tony Zhu <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
> From: Lu Baolu <[email protected]>
> Sent: Tuesday, July 5, 2022 1:07 PM
>
> Tweak the I/O page fault handling framework to route the page faults to
> the domain and call the page fault handler retrieved from the domain.
> This makes it possible for the I/O page fault handling framework to serve
> more usage scenarios as long as they have an IOMMU domain and install a page
> fault handler in it. Some unused functions are also removed to avoid
> dead code.
>
> The iommu_get_domain_for_dev_pasid() which retrieves attached domain
> for a {device, PASID} pair is used. It will be used by the page fault
> handling framework which knows {device, PASID} reported from the iommu
> driver. We have a guarantee that the SVA domain doesn't go away during
> IOPF handling, because unbind() won't free the domain until all the
> pending page requests have been flushed from the pipeline. The drivers
> either call iopf_queue_flush_dev() explicitly, or in stall case, the
> device driver is required to flush all DMAs including stalled
> transactions before calling unbind().
>
> This also renames iopf_handle_group() to iopf_handler() to avoid
> confusion.
>
> Signed-off-by: Lu Baolu <[email protected]>
> Reviewed-by: Jean-Philippe Brucker <[email protected]>
> Tested-by: Zhangfei Gao <[email protected]>
> Tested-by: Tony Zhu <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
> From: Lu Baolu <[email protected]>
> Sent: Tuesday, July 5, 2022 1:07 PM
>
> +struct iommu_sva *iommu_sva_bind_device(struct device *dev, struct
> mm_struct *mm)
> +{
> + struct iommu_domain *domain;
> + ioasid_t max_pasids;
> + int ret = -EINVAL;
-EINVAL is not used.
Reviewed-by: Kevin Tian <[email protected]>
> From: Lu Baolu <[email protected]>
> Sent: Tuesday, July 5, 2022 1:07 PM
>
> + * IOMMU_DOMAIN_SVA - DMA addresses are shared process address
> + * spaces represented by mm_struct's.
> */
s/address spaces/addresses/
Reviewed-by: Kevin Tian <[email protected]>
On 2022/7/7 09:56, Tian, Kevin wrote:
>> From: Lu Baolu <[email protected]>
>> Sent: Tuesday, July 5, 2022 1:07 PM
>>
>> +struct iommu_sva *iommu_sva_bind_device(struct device *dev, struct
>> mm_struct *mm)
>> +{
>> + struct iommu_domain *domain;
>> + ioasid_t max_pasids;
>> + int ret = -EINVAL;
>
> -EINVAL is not used.
Updated.
>
> Reviewed-by: Kevin Tian <[email protected]>
>
Best regards,
baolu
On 2022/7/7 09:52, Tian, Kevin wrote:
>> From: Lu Baolu <[email protected]>
>> Sent: Tuesday, July 5, 2022 1:07 PM
>>
>> + * IOMMU_DOMAIN_SVA - DMA addresses are shared process address
>> + * spaces represented by mm_struct's.
>> */
>
> s/address spaces/addresses/
Updated.
>
> Reviewed-by: Kevin Tian <[email protected]>
Best regards,
baolu
On Tue, Jul 05, 2022 at 01:06:59PM +0800, Lu Baolu wrote:
> Use this field to keep the number of PASIDs that an IOMMU
> hardware is able to support. This is a generic attribute of an IOMMU
> and lifting it into the per-IOMMU device structure makes it possible
> to allocate a PASID for device without calls into the IOMMU drivers.
> Any iommu driver that supports PASID related features should set this
> field before enabling them on the devices.
>
> In the Intel IOMMU driver, intel_iommu_sm is moved to CONFIG_INTEL_IOMMU
> enclave so that the pasid_supported() helper could be used in dmar.c
> without compilation errors.
>
> Signed-off-by: Lu Baolu <[email protected]>
> Reviewed-by: Jean-Philippe Brucker <[email protected]>
> Reviewed-by: Kevin Tian <[email protected]>
> Tested-by: Zhangfei Gao <[email protected]>
> Tested-by: Tony Zhu <[email protected]>
> ---
> include/linux/intel-iommu.h | 3 ++-
> include/linux/iommu.h | 2 ++
> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 1 +
> drivers/iommu/intel/dmar.c | 7 +++++++
> 4 files changed, 12 insertions(+), 1 deletion(-)
Reviewed-by: Jason Gunthorpe <[email protected]>
Jason
On Tue, Jul 05, 2022 at 01:07:00PM +0800, Lu Baolu wrote:
> Use this field to save the number of PASIDs that a device is able to
> consume. It is a generic attribute of a device and lifting it into the
> per-device dev_iommu struct could help to avoid the boilerplate code
> in various IOMMU drivers.
>
> Signed-off-by: Lu Baolu <[email protected]>
> Reviewed-by: Kevin Tian <[email protected]>
> Tested-by: Zhangfei Gao <[email protected]>
> Tested-by: Tony Zhu <[email protected]>
> ---
> include/linux/iommu.h | 2 ++
> drivers/iommu/iommu.c | 20 ++++++++++++++++++++
> 2 files changed, 22 insertions(+)
Reviewed-by: Jason Gunthorpe <[email protected]>
Jason
On Tue, Jul 05, 2022 at 01:07:02PM +0800, Lu Baolu wrote:
> Attaching an IOMMU domain to a PASID of a device is a generic operation
> for modern IOMMU drivers which support PASID-granular DMA address
> translation. Currently visible usage scenarios include (but are not limited to):
>
> - SVA (Shared Virtual Address)
> - kernel DMA with PASID
> - hardware-assist mediated device
>
> This adds a pair of domain ops for this purpose and adds the interfaces
> for device drivers to attach/detach a domain to/from a {device, PASID}.
> Some buses, like PCI, route packets without considering the PASID
> value.
Below the comments touch on ACS, so this is a bit out of date
> +static bool iommu_group_immutable_singleton(struct iommu_group *group,
> + struct device *dev)
> +{
> + int count;
> +
> + mutex_lock(&group->mutex);
> + count = iommu_group_device_count(group);
> + mutex_unlock(&group->mutex);
> +
> + if (count != 1)
> + return false;
> +
> + /*
> + * The PCI device could be considered to be fully isolated if all
> + * devices on the path from the device to the host-PCI bridge are
> + * protected from peer-to-peer DMA by ACS.
> + */
> + if (dev_is_pci(dev))
> + return pci_acs_path_enabled(to_pci_dev(dev), NULL,
> + REQ_ACS_FLAGS);
You might want to explain what condition causes ACS isolated devices
to share a group in the first place..
> +
> + /*
> + * Otherwise, the device came from DT/ACPI, assume it is static and
> + * then singleton can know from the device count in the group.
> + */
> + return true;
> +}
I would be happier if probe was changed to refuse to add a device to a
group if the group's pasid xarray is not empty, as a protective
measure.
> +int iommu_attach_device_pasid(struct iommu_domain *domain, struct device *dev,
> + ioasid_t pasid)
> +{
> + struct iommu_group *group;
> + void *curr;
> + int ret;
> +
> + if (!domain->ops->set_dev_pasid)
> + return -EOPNOTSUPP;
> +
> + group = iommu_group_get(dev);
> + if (!group || !iommu_group_immutable_singleton(group, dev)) {
> + iommu_group_put(group);
> + return -EINVAL;
goto error below
> + }
> +
> + mutex_lock(&group->mutex);
Just hold the group->mutex a few lines above and don't put locking in
iommu_group_immutable_singleton(), it is clearer
> +void iommu_detach_device_pasid(struct iommu_domain *domain, struct device *dev,
> + ioasid_t pasid)
> +{
> + struct iommu_group *group = iommu_group_get(dev);
> +
> + mutex_lock(&group->mutex);
> + domain->ops->block_dev_pasid(domain, dev, pasid);
I still really dislike this op, it is nonsense to invoke 'block_dev_pasid' on
a domain, it should be on the iommu ops and it should not take in a
domain parameter. This is why I prefer we write it as
domain->ops->set_dev_pasid(group->blocking_domain, dev, pasid);
> + xa_erase(&group->pasid_array, pasid);
It is worth checking that the value returned from xa_erase is domain
and WARN_ON if not, since we are passing domain in..
Jason
On Tue, Jul 05, 2022 at 01:07:04PM +0800, Lu Baolu wrote:
> Add support for SVA domain allocation and provide an SVA-specific
> iommu_domain_ops.
>
> Signed-off-by: Lu Baolu <[email protected]>
> Reviewed-by: Kevin Tian <[email protected]>
> Tested-by: Tony Zhu <[email protected]>
> ---
> include/linux/intel-iommu.h | 5 ++++
> drivers/iommu/intel/iommu.c | 2 ++
> drivers/iommu/intel/svm.c | 49 +++++++++++++++++++++++++++++++++++++
> 3 files changed, 56 insertions(+)
Reviewed-by: Jason Gunthorpe <[email protected]>
Jason
On Tue, Jul 05, 2022 at 01:07:03PM +0800, Lu Baolu wrote:
> The sva iommu_domain represents a hardware pagetable that the IOMMU
> hardware could use for SVA translation. This adds some infrastructure
> to support SVA domain in the iommu common layer. It includes:
>
> - Extend the iommu_domain to support a new IOMMU_DOMAIN_SVA domain
> type. The IOMMU drivers that support allocation of the SVA domain
> should provide its own sva domain specific iommu_domain_ops.
> - Add a helper to allocate an SVA domain. The iommu_domain_free()
> is still used to free an SVA domain.
>
> The report_iommu_fault() should be replaced by the new
> iommu_report_device_fault(). Leave the existing fault handler with the
> existing users and the newly added SVA members exclude it.
>
> Suggested-by: Jean-Philippe Brucker <[email protected]>
> Suggested-by: Jason Gunthorpe <[email protected]>
> Signed-off-by: Lu Baolu <[email protected]>
> Reviewed-by: Jean-Philippe Brucker <[email protected]>
> Tested-by: Zhangfei Gao <[email protected]>
> Tested-by: Tony Zhu <[email protected]>
> ---
> include/linux/iommu.h | 24 ++++++++++++++++++++++--
> drivers/iommu/iommu.c | 20 ++++++++++++++++++++
> 2 files changed, 42 insertions(+), 2 deletions(-)
Reviewed-by: Jason Gunthorpe <[email protected]>
Jason
On Tue, Jul 05, 2022 at 01:07:07PM +0800, Lu Baolu wrote:
> These ops'es have been replaced with the dev_attach/detach_pasid domain
> ops'es. There's no need for them anymore. Remove them to avoid dead
> code.
>
> Signed-off-by: Lu Baolu <[email protected]>
> Reviewed-by: Jean-Philippe Brucker <[email protected]>
> Reviewed-by: Kevin Tian <[email protected]>
> Tested-by: Zhangfei Gao <[email protected]>
> Tested-by: Tony Zhu <[email protected]>
> ---
> include/linux/intel-iommu.h | 3 --
> include/linux/iommu.h | 7 ---
> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 16 ------
> .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 40 ---------------
> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 3 --
> drivers/iommu/intel/iommu.c | 3 --
> drivers/iommu/intel/svm.c | 49 -------------------
> 7 files changed, 121 deletions(-)
Reviewed-by: Jason Gunthorpe <[email protected]>
Jason
On Tue, Jul 05, 2022 at 01:07:09PM +0800, Lu Baolu wrote:
> Tweak the I/O page fault handling framework to route the page faults to
> the domain and call the page fault handler retrieved from the domain.
> This makes it possible for the I/O page fault handling framework to serve
> more usage scenarios as long as they have an IOMMU domain and install a page
> fault handler in it. Some unused functions are also removed to avoid
> dead code.
>
> The iommu_get_domain_for_dev_pasid() which retrieves attached domain
> for a {device, PASID} pair is used. It will be used by the page fault
> handling framework which knows {device, PASID} reported from the iommu
> driver. We have a guarantee that the SVA domain doesn't go away during
> IOPF handling, because unbind() won't free the domain until all the
> pending page requests have been flushed from the pipeline. The drivers
> either call iopf_queue_flush_dev() explicitly, or in stall case, the
> device driver is required to flush all DMAs including stalled
> transactions before calling unbind().
>
> This also renames iopf_handle_group() to iopf_handler() to avoid
> confusion.
>
> Signed-off-by: Lu Baolu <[email protected]>
> Reviewed-by: Jean-Philippe Brucker <[email protected]>
> Tested-by: Zhangfei Gao <[email protected]>
> Tested-by: Tony Zhu <[email protected]>
> ---
> drivers/iommu/io-pgfault.c | 68 +++++---------------------------------
> 1 file changed, 9 insertions(+), 59 deletions(-)
Reviewed-by: Jason Gunthorpe <[email protected]>
Jason
On Tue, Jul 05, 2022 at 01:07:10PM +0800, Lu Baolu wrote:
> Rename iommu-sva-lib.c[h] to iommu-sva.c[h] as it contains all code
> for SVA implementation in iommu core.
>
> Signed-off-by: Lu Baolu <[email protected]>
> Reviewed-by: Jean-Philippe Brucker <[email protected]>
> Reviewed-by: Kevin Tian <[email protected]>
> Tested-by: Zhangfei Gao <[email protected]>
> Tested-by: Tony Zhu <[email protected]>
> ---
> drivers/iommu/{iommu-sva-lib.h => iommu-sva.h} | 6 +++---
> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 2 +-
> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 2 +-
> drivers/iommu/intel/iommu.c | 2 +-
> drivers/iommu/intel/svm.c | 2 +-
> drivers/iommu/io-pgfault.c | 2 +-
> drivers/iommu/{iommu-sva-lib.c => iommu-sva.c} | 2 +-
> drivers/iommu/iommu.c | 2 +-
> drivers/iommu/Makefile | 2 +-
> 9 files changed, 11 insertions(+), 11 deletions(-)
> rename drivers/iommu/{iommu-sva-lib.h => iommu-sva.h} (95%)
> rename drivers/iommu/{iommu-sva-lib.c => iommu-sva.c} (99%)
Reviewed-by: Jason Gunthorpe <[email protected]>
Jason
On Tue, Jul 05, 2022 at 01:07:05PM +0800, Lu Baolu wrote:
> Add support for SVA domain allocation and provide an SVA-specific
> iommu_domain_ops.
>
> Signed-off-by: Lu Baolu <[email protected]>
> Reviewed-by: Jean-Philippe Brucker <[email protected]>
> Tested-by: Zhangfei Gao <[email protected]>
> ---
> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 6 ++
> .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 69 +++++++++++++++++++
> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 3 +
> 3 files changed, 78 insertions(+)
>
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> index d2ba86470c42..96399dd3a67a 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> @@ -758,6 +758,7 @@ struct iommu_sva *arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm);
> void arm_smmu_sva_unbind(struct iommu_sva *handle);
> u32 arm_smmu_sva_get_pasid(struct iommu_sva *handle);
> void arm_smmu_sva_notifier_synchronize(void);
> +struct iommu_domain *arm_smmu_sva_domain_alloc(void);
> #else /* CONFIG_ARM_SMMU_V3_SVA */
> static inline bool arm_smmu_sva_supported(struct arm_smmu_device *smmu)
> {
> @@ -803,5 +804,10 @@ static inline u32 arm_smmu_sva_get_pasid(struct iommu_sva *handle)
> }
>
> static inline void arm_smmu_sva_notifier_synchronize(void) {}
> +
> +static inline struct iommu_domain *arm_smmu_sva_domain_alloc(void)
> +{
> + return NULL;
> +}
> #endif /* CONFIG_ARM_SMMU_V3_SVA */
> #endif /* _ARM_SMMU_V3_H */
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
> index f155d406c5d5..fc4555dac5b4 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
> @@ -549,3 +549,72 @@ void arm_smmu_sva_notifier_synchronize(void)
> */
> mmu_notifier_synchronize();
> }
> +
> +static int arm_smmu_sva_set_dev_pasid(struct iommu_domain *domain,
> + struct device *dev, ioasid_t id)
> +{
> + int ret = 0;
> + struct mm_struct *mm;
> + struct iommu_sva *handle;
> +
> + if (domain->type != IOMMU_DOMAIN_SVA)
> + return -EINVAL;
Not needed, this function is only called from the sva ops, other
domain types are impossible, we don't need sanity tests in drivers
> + mm = domain->mm;
> + if (WARN_ON(!mm))
> + return -ENODEV;
Also guaranteed by core code, don't need sanity tests
> +static void arm_smmu_sva_block_dev_pasid(struct iommu_domain *domain,
> + struct device *dev, ioasid_t id)
> +{
> + struct mm_struct *mm = domain->mm;
> + struct arm_smmu_bond *bond = NULL, *t;
> + struct arm_smmu_master *master = dev_iommu_priv_get(dev);
> +
> + mutex_lock(&sva_lock);
> + list_for_each_entry(t, &master->bonds, list) {
> + if (t->mm == mm) {
> + bond = t;
> + break;
This doesn't seem like what I would expect, the domain should be used
as the key in these data structures, not the mm.
> index ae8ec8df47c1..a30b252e2f95 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -1999,6 +1999,9 @@ static struct iommu_domain *arm_smmu_domain_alloc(unsigned type)
> {
> struct arm_smmu_domain *smmu_domain;
>
> + if (type == IOMMU_DOMAIN_SVA)
> + return arm_smmu_sva_domain_alloc();
If no drivers are sharing any code with their other alloc paths perhaps we
should have a dedicated op for SVA?
Jason
On Tue, Jul 05, 2022 at 01:07:08PM +0800, Lu Baolu wrote:
> This adds some mechanisms around the iommu_domain so that the I/O page
> fault handling framework could route a page fault to the domain and
> call the fault handler from it.
>
> Add pointers to the page fault handler and its private data in struct
> iommu_domain. The fault handler will be called with the private data
> as a parameter once a page fault is routed to the domain. Any kernel
> component which owns an iommu domain could install handler and its
> private parameter so that the page fault could be further routed and
> handled.
>
> This also prepares the SVA implementation to be the first consumer of
> the per-domain page fault handling model. The I/O page fault handler
> for SVA is copied to the SVA file with mmget_not_zero() added before
> mmap_read_lock().
>
> Suggested-by: Jean-Philippe Brucker <[email protected]>
> Signed-off-by: Lu Baolu <[email protected]>
> Reviewed-by: Jean-Philippe Brucker <[email protected]>
> Tested-by: Zhangfei Gao <[email protected]>
> Tested-by: Tony Zhu <[email protected]>
> ---
> include/linux/iommu.h | 3 ++
> drivers/iommu/iommu-sva-lib.h | 8 +++++
> drivers/iommu/io-pgfault.c | 7 +++++
> drivers/iommu/iommu-sva-lib.c | 58 +++++++++++++++++++++++++++++++++++
> drivers/iommu/iommu.c | 4 +++
> 5 files changed, 80 insertions(+)
>
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index ae0cfca064e6..47610f21d451 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -105,6 +105,9 @@ struct iommu_domain {
> unsigned long pgsize_bitmap; /* Bitmap of page sizes in use */
> struct iommu_domain_geometry geometry;
> struct iommu_dma_cookie *iova_cookie;
> + enum iommu_page_response_code (*iopf_handler)(struct iommu_fault *fault,
> + void *data);
> + void *fault_data;
> union {
> struct {
> iommu_fault_handler_t handler;
Why do we need two fault callbacks? The only difference is that one is
recoverable and the other is not, right?
Can we run both down the same op?
> +/*
> + * I/O page fault handler for SVA
> + */
> +enum iommu_page_response_code
> +iommu_sva_handle_iopf(struct iommu_fault *fault, void *data)
> +{
> + vm_fault_t ret;
> + struct vm_area_struct *vma;
> + struct mm_struct *mm = data;
> + unsigned int access_flags = 0;
> + unsigned int fault_flags = FAULT_FLAG_REMOTE;
> + struct iommu_fault_page_request *prm = &fault->prm;
> + enum iommu_page_response_code status = IOMMU_PAGE_RESP_INVALID;
> +
> + if (!(prm->flags & IOMMU_FAULT_PAGE_REQUEST_PASID_VALID))
> + return status;
> +
> + if (IS_ERR_OR_NULL(mm) || !mmget_not_zero(mm))
Do not use IS_ERR_OR_NULL. mm should never be NULL here since the
fault handler should have been removed from the domain before the
fault_data is changed.
Jason
On Tue, Jul 05, 2022 at 01:07:06PM +0800, Lu Baolu wrote:
> The existing iommu SVA interfaces are implemented by calling the SVA
> specific iommu ops provided by the IOMMU drivers. There's no need for
> any SVA specific ops in iommu_ops vector anymore as we can achieve
> this through the generic attach/detach_dev_pasid domain ops.
>
> This refactors the IOMMU SVA interfaces implementation by using the
> set/block_dev_pasid ops and aligns them with the concept of the SVA
> iommu domain. Put the new SVA code in the SVA-related file in order
> to make it self-contained.
>
> Signed-off-by: Lu Baolu <[email protected]>
> Tested-by: Zhangfei Gao <[email protected]>
> Tested-by: Tony Zhu <[email protected]>
> ---
> include/linux/iommu.h | 67 +++++++++++--------
> drivers/iommu/iommu-sva-lib.c | 98 ++++++++++++++++++++++++++++
> drivers/iommu/iommu.c | 119 ++++++++--------------------------
> 3 files changed, 165 insertions(+), 119 deletions(-)
>
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 42f0418dc22c..f59b0ecd3995 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -39,7 +39,6 @@ struct device;
> struct iommu_domain;
> struct iommu_domain_ops;
> struct notifier_block;
> -struct iommu_sva;
> struct iommu_fault_event;
> struct iommu_dma_cookie;
>
> @@ -57,6 +56,14 @@ struct iommu_domain_geometry {
> bool force_aperture; /* DMA only allowed in mappable range? */
> };
>
> +/**
> + * struct iommu_sva - handle to a device-mm bond
> + */
> +struct iommu_sva {
> + struct device *dev;
> + refcount_t users;
> +};
> +
> /* Domain feature flags */
> #define __IOMMU_DOMAIN_PAGING (1U << 0) /* Support for iommu_map/unmap */
> #define __IOMMU_DOMAIN_DMA_API (1U << 1) /* Domain for use in DMA-API
> @@ -105,6 +112,7 @@ struct iommu_domain {
> };
> struct { /* IOMMU_DOMAIN_SVA */
> struct mm_struct *mm;
> + struct iommu_sva bond;
We can't store a single struct device inside a domain; this is not
laid out right.
The API is really refcounting the PASID:
> +struct iommu_sva *iommu_sva_bind_device(struct device *dev,
> + struct mm_struct *mm);
> +void iommu_sva_unbind_device(struct iommu_sva *handle);
So what you need to do is store that 'iommu_sva' in the group's PASID
xarray.
The bind logic would be:

	sva = xa_load(group->pasid, mm->pasid)
	if (sva) {
		refcount_inc(sva->users)
		return sva
	}
	sva = kalloc
	sva->domain = domain
	xa_store(group->pasid, sva);
Jason
Hi Jason,
Thank you for reviewing this series.
On 2022/7/23 22:11, Jason Gunthorpe wrote:
> On Tue, Jul 05, 2022 at 01:07:02PM +0800, Lu Baolu wrote:
>> Attaching an IOMMU domain to a PASID of a device is a generic operation
>> for modern IOMMU drivers which support PASID-granular DMA address
>> translation. Currently visible usage scenarios include (but are not limited to):
>>
>> - SVA (Shared Virtual Address)
>> - kernel DMA with PASID
>> - hardware-assist mediated device
>>
>> This adds a pair of domain ops for this purpose and adds the interfaces
>> for device drivers to attach/detach a domain to/from a {device, PASID}.
>> Some buses, like PCI, route packets without considering the PASID
>> value.
> Below the comments touch on ACS, so this is a bit out of date
>
>> +static bool iommu_group_immutable_singleton(struct iommu_group *group,
>> + struct device *dev)
>> +{
>> + int count;
>> +
>> + mutex_lock(&group->mutex);
>> + count = iommu_group_device_count(group);
>> + mutex_unlock(&group->mutex);
>> +
>> + if (count != 1)
>> + return false;
>> +
>> + /*
>> + * The PCI device could be considered to be fully isolated if all
>> + * devices on the path from the device to the host-PCI bridge are
>> + * protected from peer-to-peer DMA by ACS.
>> + */
>> + if (dev_is_pci(dev))
>> + return pci_acs_path_enabled(to_pci_dev(dev), NULL,
>> + REQ_ACS_FLAGS);
> You might want to explain what condition causes ACS isolated devices
> to share a group in the first place..
>
How about rephrasing this part of the commit message like below:
Some buses, like PCI, route packets without considering the PASID value.
Thus a DMA target address with PASID might be treated as P2P if the
address falls into the MMIO BAR of other devices in the group. To make
things simple, these interfaces only apply to devices belonging to
singleton groups.
Considering that the PCI bus supports hot-plug, even if a device boots
with a singleton group, a later hot-added device could still share the
group, which breaks the singleton group assumption. In order to avoid
this situation, this interface requires that ACS is enabled on all
devices on the path from the device to the host-PCI bridge.
Best regards,
baolu
On 2022/7/23 22:11, Jason Gunthorpe wrote:
>> +
>> + /*
>> + * Otherwise, the device came from DT/ACPI, assume it is static and
>> + * then singleton can know from the device count in the group.
>> + */
>> + return true;
>> +}
> I would be happier if probe was changed to refuse to add a device to a
> group if the group's pasid xarray is not empty, as a protective
> measure.
Agreed. I will add the code below.
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 047898666b9f..e43cb6776087 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -895,6 +895,14 @@ int iommu_group_add_device(struct iommu_group *group, struct device *dev)
 	int ret, i = 0;
 	struct group_device *device;
 
+	/*
+	 * The iommu_attach_device_pasid() requires a singleton group.
+	 * Refuse to add a device into it if this assumption has been
+	 * made.
+	 */
+	if (!xa_empty(&group->pasid_array))
+		return -EBUSY;
+
 	device = kzalloc(sizeof(*device), GFP_KERNEL);
 	if (!device)
 		return -ENOMEM;
>
>> +int iommu_attach_device_pasid(struct iommu_domain *domain, struct device *dev,
>> + ioasid_t pasid)
>> +{
>> + struct iommu_group *group;
>> + void *curr;
>> + int ret;
>> +
>> + if (!domain->ops->set_dev_pasid)
>> + return -EOPNOTSUPP;
>> +
>> + group = iommu_group_get(dev);
>> + if (!group || !iommu_group_immutable_singleton(group, dev)) {
>> + iommu_group_put(group);
>> + return -EINVAL;
> goto error below
>
>> + }
>> +
>> + mutex_lock(&group->mutex);
> Just hold the group->mutex a few lines above and don't put locking in
> iommu_group_immutable_singleton(), it is clearer
Agreed with both comments. iommu_attach_device_pasid() looks like below
after the update.
int iommu_attach_device_pasid(struct iommu_domain *domain,
			      struct device *dev, ioasid_t pasid)
{
	struct iommu_group *group;
	int ret = -EINVAL;
	void *curr;

	if (!domain->ops->set_dev_pasid)
		return -EOPNOTSUPP;

	group = iommu_group_get(dev);
	if (!group)
		return -ENODEV;

	mutex_lock(&group->mutex);
	if (!iommu_group_immutable_singleton(group, dev))
		goto out_unlock;

	curr = xa_cmpxchg(&group->pasid_array, pasid, NULL, domain, GFP_KERNEL);
	if (curr) {
		ret = xa_err(curr) ? : -EBUSY;
		goto out_unlock;
	}

	ret = domain->ops->set_dev_pasid(domain, dev, pasid);
	if (ret)
		xa_erase(&group->pasid_array, pasid);

out_unlock:
	mutex_unlock(&group->mutex);
	iommu_group_put(group);

	return ret;
}
Best regards,
baolu
On 2022/7/23 22:11, Jason Gunthorpe wrote:
>> + xa_erase(&group->pasid_array, pasid);
> It is worth checking that the value returned from xa_erase is domain
> and WARN_ON if not, since we are passing domain in..
Yes, will do like this:
WARN_ON(xa_erase(&group->pasid_array, pasid) != domain);
Best regards,
baolu
On 2022/7/23 22:11, Jason Gunthorpe wrote:
>> +void iommu_detach_device_pasid(struct iommu_domain *domain, struct device *dev,
>> + ioasid_t pasid)
>> +{
>> + struct iommu_group *group = iommu_group_get(dev);
>> +
>> + mutex_lock(&group->mutex);
>> + domain->ops->block_dev_pasid(domain, dev, pasid);
> I still really dislike this op, it is nonsense to invoke 'block_dev_pasid' on
> a domain, it should be on the iommu ops and it should not take in a
> domain parameter. This is why I prefer we write it as
>
> domain->ops->set_dev_pasid(group->blocking_domain, dev, pasid);
>
I originally planned to refactor this after both the Intel and ARM
SMMUv3 drivers have real blocking domain support. After revisiting this,
it seems that the only difficulty is how to check whether a domain is a
blocking domain. I am going to use the check below:
+ /*
+ * Detach the domain if a blocking domain is set. Check the
+ * right domain type once the IOMMU driver supports a real
+ * blocking domain.
+ */
+ if (!domain || domain->type == IOMMU_DOMAIN_UNMANAGED) {
Does this work for you?
The incremental changes look like below:
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 6633c7b040b8..9f8748b51630 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -890,22 +890,23 @@ static int intel_svm_set_dev_pasid(struct iommu_domain *domain,
int ret = 0;
mutex_lock(&pasid_mutex);
- sva = intel_svm_bind_mm(iommu, dev, mm);
- if (IS_ERR(sva))
- ret = PTR_ERR(sva);
+ /*
+ * Detach the domain if a blocking domain is set. Check the
+ * right domain type once the IOMMU driver supports a real
+ * blocking domain.
+ */
+ if (!domain || domain->type == IOMMU_DOMAIN_UNMANAGED) {
+ intel_svm_unbind_mm(dev, pasid);
+ } else {
+ sva = intel_svm_bind_mm(iommu, dev, mm);
+ if (IS_ERR(sva))
+ ret = PTR_ERR(sva);
+ }
mutex_unlock(&pasid_mutex);
return ret;
}
-static void intel_svm_block_dev_pasid(struct iommu_domain *domain,
- struct device *dev, ioasid_t pasid)
-{
- mutex_lock(&pasid_mutex);
- intel_svm_unbind_mm(dev, pasid);
- mutex_unlock(&pasid_mutex);
-}
-
static void intel_svm_domain_free(struct iommu_domain *domain)
{
kfree(to_dmar_domain(domain));
@@ -913,7 +914,6 @@ static void intel_svm_domain_free(struct iommu_domain *domain)
static const struct iommu_domain_ops intel_svm_domain_ops = {
.set_dev_pasid = intel_svm_set_dev_pasid,
- .block_dev_pasid = intel_svm_block_dev_pasid,
.free = intel_svm_domain_free,
};
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index efe6a58eee48..a7f7a611fcce 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -3262,7 +3262,7 @@ void iommu_detach_device_pasid(struct iommu_domain *domain, struct device *dev,
struct iommu_group *group = iommu_group_get(dev);
mutex_lock(&group->mutex);
- domain->ops->block_dev_pasid(domain, dev, pasid);
+ domain->ops->set_dev_pasid(group->blocking_domain, dev, pasid);
WARN_ON(xa_erase(&group->pasid_array, pasid) != domain);
mutex_unlock(&group->mutex);
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index ba6543f4c6a2..c52dccb86460 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -283,7 +283,6 @@ struct iommu_ops {
* @attach_dev: attach an iommu domain to a device
* @detach_dev: detach an iommu domain from a device
* @set_dev_pasid: set an iommu domain to a pasid of device
- * @block_dev_pasid: block pasid of device from using iommu domain
* @map: map a physically contiguous memory region to an iommu domain
* @map_pages: map a physically contiguous set of pages of the same size to
* an iommu domain.
@@ -306,8 +305,6 @@ struct iommu_domain_ops {
void (*detach_dev)(struct iommu_domain *domain, struct device *dev);
int (*set_dev_pasid)(struct iommu_domain *domain, struct device *dev,
ioasid_t pasid);
- void (*block_dev_pasid)(struct iommu_domain *domain, struct device *dev,
- ioasid_t pasid);
int (*map)(struct iommu_domain *domain, unsigned long iova,
phys_addr_t paddr, size_t size, int prot, gfp_t gfp);
Best regards,
baolu
On 2022/7/23 22:20, Jason Gunthorpe wrote:
> On Tue, Jul 05, 2022 at 01:07:05PM +0800, Lu Baolu wrote:
>> Add support for SVA domain allocation and provide an SVA-specific
>> iommu_domain_ops.
>>
>> Signed-off-by: Lu Baolu <[email protected]>
>> Reviewed-by: Jean-Philippe Brucker <[email protected]>
>> Tested-by: Zhangfei Gao <[email protected]>
>> ---
>> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 6 ++
>> .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 69 +++++++++++++++++++
>> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 3 +
>> 3 files changed, 78 insertions(+)
>>
>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
>> index d2ba86470c42..96399dd3a67a 100644
>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
>> @@ -758,6 +758,7 @@ struct iommu_sva *arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm);
>> void arm_smmu_sva_unbind(struct iommu_sva *handle);
>> u32 arm_smmu_sva_get_pasid(struct iommu_sva *handle);
>> void arm_smmu_sva_notifier_synchronize(void);
>> +struct iommu_domain *arm_smmu_sva_domain_alloc(void);
>> #else /* CONFIG_ARM_SMMU_V3_SVA */
>> static inline bool arm_smmu_sva_supported(struct arm_smmu_device *smmu)
>> {
>> @@ -803,5 +804,10 @@ static inline u32 arm_smmu_sva_get_pasid(struct iommu_sva *handle)
>> }
>>
>> static inline void arm_smmu_sva_notifier_synchronize(void) {}
>> +
>> +static inline struct iommu_domain *arm_smmu_sva_domain_alloc(void)
>> +{
>> + return NULL;
>> +}
>> #endif /* CONFIG_ARM_SMMU_V3_SVA */
>> #endif /* _ARM_SMMU_V3_H */
>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
>> index f155d406c5d5..fc4555dac5b4 100644
>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
>> @@ -549,3 +549,72 @@ void arm_smmu_sva_notifier_synchronize(void)
>> */
>> mmu_notifier_synchronize();
>> }
>> +
>> +static int arm_smmu_sva_set_dev_pasid(struct iommu_domain *domain,
>> + struct device *dev, ioasid_t id)
>> +{
>> + int ret = 0;
>> + struct mm_struct *mm;
>> + struct iommu_sva *handle;
>> +
>> + if (domain->type != IOMMU_DOMAIN_SVA)
>> + return -EINVAL;
>
> Not needed, this function is only called from the sva ops, other
> domain types are impossible, we don't need sanity tests in drivers
>
>> + mm = domain->mm;
>> + if (WARN_ON(!mm))
>> + return -ENODEV;
>
> Also guaranteed by core code, don't need sanity tests
Above two updated. Thanks!
>
>> +static void arm_smmu_sva_block_dev_pasid(struct iommu_domain *domain,
>> + struct device *dev, ioasid_t id)
>> +{
>> + struct mm_struct *mm = domain->mm;
>> + struct arm_smmu_bond *bond = NULL, *t;
>> + struct arm_smmu_master *master = dev_iommu_priv_get(dev);
>> +
>> + mutex_lock(&sva_lock);
>> + list_for_each_entry(t, &master->bonds, list) {
>> + if (t->mm == mm) {
>> + bond = t;
>> + break;
>
> This doesn't seem like what I would expect, the domain should be used
> as the key in these data structures, not the mm.
Both the Intel and arm-smmu-v3 SVA code have room for cleanup. I've
discussed this with Jean. We will clean up and refactor the individual
drivers in a separate series.
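A rough sketch of that cleanup (hypothetical; it assumes a domain
pointer is added to struct arm_smmu_bond so the lookup is keyed by the
domain rather than the mm):

	list_for_each_entry(t, &master->bonds, list) {
		if (t->domain == domain) {	/* hypothetical member */
			bond = t;
			break;
		}
	}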
>
>> index ae8ec8df47c1..a30b252e2f95 100644
>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> @@ -1999,6 +1999,9 @@ static struct iommu_domain *arm_smmu_domain_alloc(unsigned type)
>> {
>> struct arm_smmu_domain *smmu_domain;
>>
>> + if (type == IOMMU_DOMAIN_SVA)
>> + return arm_smmu_sva_domain_alloc();
>
> If no drivers are sharing any code with their other alloc paths perhaps we
> should have a dedicated op for SVA?
AFAICS, Robin is refactoring the domain allocation interfaces. How about
leaving this until we finalize the interface?
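If we do end up with a dedicated op, it might be shaped like this (a
hypothetical sketch, not a committed interface), as a new member of
struct iommu_ops:

	struct iommu_domain *(*domain_alloc_sva)(struct device *dev,
						 struct mm_struct *mm);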
Best regards,
baolu
On 2022/7/23 22:26, Jason Gunthorpe wrote:
> On Tue, Jul 05, 2022 at 01:07:06PM +0800, Lu Baolu wrote:
>> The existing iommu SVA interfaces are implemented by calling the SVA
>> specific iommu ops provided by the IOMMU drivers. There's no need for
>> any SVA specific ops in iommu_ops vector anymore as we can achieve
>> this through the generic attach/detach_dev_pasid domain ops.
>>
>> This refactors the IOMMU SVA interfaces implementation by using the
>> set/block_pasid_dev ops and align them with the concept of the SVA
>> iommu domain. Put the new SVA code in the sva related file in order
>> to make it self-contained.
>>
>> Signed-off-by: Lu Baolu <[email protected]>
>> Tested-by: Zhangfei Gao <[email protected]>
>> Tested-by: Tony Zhu <[email protected]>
>> ---
>> include/linux/iommu.h | 67 +++++++++++--------
>> drivers/iommu/iommu-sva-lib.c | 98 ++++++++++++++++++++++++++++
>> drivers/iommu/iommu.c | 119 ++++++++--------------------------
>> 3 files changed, 165 insertions(+), 119 deletions(-)
>>
>> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
>> index 42f0418dc22c..f59b0ecd3995 100644
>> --- a/include/linux/iommu.h
>> +++ b/include/linux/iommu.h
>> @@ -39,7 +39,6 @@ struct device;
>> struct iommu_domain;
>> struct iommu_domain_ops;
>> struct notifier_block;
>> -struct iommu_sva;
>> struct iommu_fault_event;
>> struct iommu_dma_cookie;
>>
>> @@ -57,6 +56,14 @@ struct iommu_domain_geometry {
>> bool force_aperture; /* DMA only allowed in mappable range? */
>> };
>>
>> +/**
>> + * struct iommu_sva - handle to a device-mm bond
>> + */
>> +struct iommu_sva {
>> + struct device *dev;
>> + refcount_t users;
>> +};
>> +
>> /* Domain feature flags */
>> #define __IOMMU_DOMAIN_PAGING (1U << 0) /* Support for iommu_map/unmap */
>> #define __IOMMU_DOMAIN_DMA_API (1U << 1) /* Domain for use in DMA-API
>> @@ -105,6 +112,7 @@ struct iommu_domain {
>> };
>> struct { /* IOMMU_DOMAIN_SVA */
>> struct mm_struct *mm;
>> + struct iommu_sva bond;
>
> We can't store a single struct device inside a domain, this is not
> laid out right.
Yes, agreed.
>
> The API is really refcounting the PASID:
>
>> +struct iommu_sva *iommu_sva_bind_device(struct device *dev,
>> + struct mm_struct *mm);
>> +void iommu_sva_unbind_device(struct iommu_sva *handle);
>
> So what you need to do is store that 'iommu_sva' in the group's PASID
> xarray.
>
> The bind logic would be
>
> sva = xa_load(group->pasid, mm->pasid)
> if (sva)
> refcount_inc(sva->users)
> return sva
> sva = kalloc
> sva->domain = domain
> xa_store(group->pasid, sva);
Thanks for the suggestion. It makes a lot of sense to me.
Furthermore, I'd like to separate the generic data from the caller-
specific things because the group->pasid_array should also be able to
serve other use cases. Hence, the attach/detach_device_pasid interfaces
might be changed as below:
/* Collection of per-pasid IOMMU data */
struct group_pasid {
struct iommu_domain *domain;
void *priv;
};
/*
* iommu_attach_device_pasid() - Attach a domain to pasid of device
* @domain: the iommu domain.
* @dev: the attached device.
* @pasid: the pasid of the device.
* @data: private data, NULL if not needed.
*
* Return: 0 on success, or an error.
*/
int iommu_attach_device_pasid(struct iommu_domain *domain, struct device *dev,
			      ioasid_t pasid, void *data)
{
struct iommu_group *group;
struct group_pasid *param;
int ret = -EINVAL;
void *curr;
if (!domain->ops->set_dev_pasid)
return -EOPNOTSUPP;
group = iommu_group_get(dev);
if (!group)
return -ENODEV;
param = kzalloc(sizeof(*param), GFP_KERNEL);
if (!param) {
iommu_group_put(group);
return -ENOMEM;
}
param->domain = domain;
param->priv = data;
mutex_lock(&group->mutex);
if (!iommu_group_immutable_singleton(group, dev))
goto out_unlock;
curr = xa_cmpxchg(&group->pasid_array, pasid, NULL, param, GFP_KERNEL);
if (curr) {
ret = xa_err(curr) ? : -EBUSY;
goto out_unlock;
}
ret = domain->ops->set_dev_pasid(domain, dev, pasid);
if (ret)
xa_erase(&group->pasid_array, pasid);
out_unlock:
mutex_unlock(&group->mutex);
iommu_group_put(group);
if (ret)
kfree(param);
return ret;
}
/*
* iommu_detach_device_pasid() - Detach the domain from pasid of device
* @domain: the iommu domain.
* @dev: the attached device.
* @pasid: the pasid of the device.
*
* The @domain must have been attached to @pasid of the @dev with
* iommu_attach_device_pasid().
*/
void iommu_detach_device_pasid(struct iommu_domain *domain, struct device *dev,
			       ioasid_t pasid)
{
struct iommu_group *group = iommu_group_get(dev);
struct group_pasid *param;
mutex_lock(&group->mutex);
domain->ops->set_dev_pasid(group->blocking_domain, dev, pasid);
param = xa_erase(&group->pasid_array, pasid);
WARN_ON(!param || param->domain != domain);
mutex_unlock(&group->mutex);
iommu_group_put(group);
kfree(param);
}
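For example, the SVA path could then use these interfaces roughly as
below (a sketch, assuming the mm is passed as the private data):

	/* bind: attach the SVA domain to the pasid, with mm as priv data */
	ret = iommu_attach_device_pasid(domain, dev, mm->pasid, mm);
	if (ret)
		return ret;

	/* unbind: detach again later */
	iommu_detach_device_pasid(domain, dev, mm->pasid);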
Does this look right to you?
Best regards,
baolu
On 2022/7/23 22:33, Jason Gunthorpe wrote:
> On Tue, Jul 05, 2022 at 01:07:08PM +0800, Lu Baolu wrote:
>> This adds some mechanisms around the iommu_domain so that the I/O page
>> fault handling framework could route a page fault to the domain and
>> call the fault handler from it.
>>
>> Add pointers to the page fault handler and its private data in struct
>> iommu_domain. The fault handler will be called with the private data
>> as a parameter once a page fault is routed to the domain. Any kernel
>> component which owns an iommu domain could install handler and its
>> private parameter so that the page fault could be further routed and
>> handled.
>>
>> This also prepares the SVA implementation to be the first consumer of
>> the per-domain page fault handling model. The I/O page fault handler
>> for SVA is copied to the SVA file with mmget_not_zero() added before
>> mmap_read_lock().
>>
>> Suggested-by: Jean-Philippe Brucker <[email protected]>
>> Signed-off-by: Lu Baolu <[email protected]>
>> Reviewed-by: Jean-Philippe Brucker <[email protected]>
>> Tested-by: Zhangfei Gao <[email protected]>
>> Tested-by: Tony Zhu <[email protected]>
>> ---
>> include/linux/iommu.h | 3 ++
>> drivers/iommu/iommu-sva-lib.h | 8 +++++
>> drivers/iommu/io-pgfault.c | 7 +++++
>> drivers/iommu/iommu-sva-lib.c | 58 +++++++++++++++++++++++++++++++++++
>> drivers/iommu/iommu.c | 4 +++
>> 5 files changed, 80 insertions(+)
>>
>> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
>> index ae0cfca064e6..47610f21d451 100644
>> --- a/include/linux/iommu.h
>> +++ b/include/linux/iommu.h
>> @@ -105,6 +105,9 @@ struct iommu_domain {
>> unsigned long pgsize_bitmap; /* Bitmap of page sizes in use */
>> struct iommu_domain_geometry geometry;
>> struct iommu_dma_cookie *iova_cookie;
>> + enum iommu_page_response_code (*iopf_handler)(struct iommu_fault *fault,
>> + void *data);
>> + void *fault_data;
>> union {
>> struct {
>> iommu_fault_handler_t handler;
>
> Why do we need two falut callbacks? The only difference is that one is
> recoverable and the other is not, right?
>
> Can we run both down the same op?
The iommu_fault_handler_t is for report_iommu_fault() which could be
replaced with the newer iommu_report_device_fault().
https://lore.kernel.org/linux-iommu/Yo4Nw9QyllT1RZbd@myrica/
>
>> +/*
>> + * I/O page fault handler for SVA
>> + */
>> +enum iommu_page_response_code
>> +iommu_sva_handle_iopf(struct iommu_fault *fault, void *data)
>> +{
>> + vm_fault_t ret;
>> + struct vm_area_struct *vma;
>> + struct mm_struct *mm = data;
>> + unsigned int access_flags = 0;
>> + unsigned int fault_flags = FAULT_FLAG_REMOTE;
>> + struct iommu_fault_page_request *prm = &fault->prm;
>> + enum iommu_page_response_code status = IOMMU_PAGE_RESP_INVALID;
>> +
>> + if (!(prm->flags & IOMMU_FAULT_PAGE_REQUEST_PASID_VALID))
>> + return status;
>> +
>> + if (IS_ERR_OR_NULL(mm) || !mmget_not_zero(mm))
>
> Do not use IS_ERR_ON_NULL. mm should never be null here since the
> fault handler should have been removed from the domain before the
> fault_data is changed.
Yes. Updated.
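The updated check simply becomes (sketch):

	/* mm is guaranteed valid by the core while the handler is
	 * installed, so only the refcount grab can fail. */
	if (!mmget_not_zero(mm))
		return status;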
Best regards,
baolu
On Sun, Jul 24, 2022 at 09:48:15PM +0800, Baolu Lu wrote:
> /*
> * iommu_detach_device_pasid() - Detach the domain from pasid of device
> * @domain: the iommu domain.
> * @dev: the attached device.
> * @pasid: the pasid of the device.
> *
> * The @domain must have been attached to @pasid of the @dev with
> * iommu_detach_device_pasid().
> */
> void iommu_detach_device_pasid(struct iommu_domain *domain, struct device
> *dev,
> ioasid_t pasid)
> {
> struct iommu_group *group = iommu_group_get(dev);
> struct group_pasid *param;
>
> mutex_lock(&group->mutex);
> domain->ops->set_dev_pasid(group->blocking_domain, dev, pasid);
Please also pass the old domain to this detach() function, so that the
IOMMU driver doesn't have to keep track of them internally.
In addition to clearing contexts, detach() also needs to invalidate TLBs,
and for that the SMMU driver needs to know the old ASID (!= PASID) that
was used by the context descriptor.
Thanks,
Jean
> From: Baolu Lu <[email protected]>
> Sent: Sunday, July 24, 2022 9:48 PM
> >
> > The API is really refcounting the PASID:
> >
> >> +struct iommu_sva *iommu_sva_bind_device(struct device *dev,
> >> + struct mm_struct *mm);
> >> +void iommu_sva_unbind_device(struct iommu_sva *handle);
> >
> > So what you need to do is store that 'iommu_sva' in the group's PASID
> > xarray.
> >
> > The bind logic would be
> >
> > sva = xa_load(group->pasid, mm->pasid)
> > if (sva)
> > refcount_inc(sva->users)
> > return sva
> > sva = kalloc
> > sva->domain = domain
> > xa_store(group->pasid, sva);
>
> Thanks for the suggestion. It makes a lot of sense to me.
>
> Furthermore, I'd like to separate the generic data from the caller-
> specific things because the group->pasid_array should also be able to
> serve other usages. Hence, the attach/detach_device_pasid interfaces
> might be changed like below:
>
> /* Collection of per-pasid IOMMU data */
> struct group_pasid {
> struct iommu_domain *domain;
> void *priv;
> };
>
Is there any reason why pasid refcnt is sva specific and needs to be
in a priv field?
> From: Baolu Lu <[email protected]>
> Sent: Sunday, July 24, 2022 5:14 PM
>
> On 2022/7/23 22:11, Jason Gunthorpe wrote:
> >> +void iommu_detach_device_pasid(struct iommu_domain *domain,
> struct device *dev,
> >> + ioasid_t pasid)
> >> +{
> >> + struct iommu_group *group = iommu_group_get(dev);
> >> +
> >> + mutex_lock(&group->mutex);
> >> + domain->ops->block_dev_pasid(domain, dev, pasid);
> > I still really dislike this OP, it is nonsense to invoke 'block_dev_pasid' on
> > a domain, it should be on the iommu ops and it should not take in a
> > domain parameter. This is why I prefer we write it as
> >
> > domain->ops->set_dev_pasid(group->blocking_domain, dev, pasid);
> >
>
> I originally plan to refactor this after both Intel and ARM SMMUv3
> drivers have real blocking domain supports. After revisiting this, it
> seems that the only difficulty is how to check whether a domain is a
> blocking domain. I am going to use below checking code:
>
> + /*
> + * Detach the domain if a blocking domain is set. Check the
> + * right domain type once the IOMMU driver supports a real
> + * blocking domain.
> + */
> + if (!domain || domain->type == IOMMU_DOMAIN_UNMANAGED) {
>
> Does this work for you?
>
Or you can call __iommu_group_alloc_blocking_domain() in the sva
path and then just check whether the domain is equal to
group->blocking_domain here.
> From: Jean-Philippe Brucker <[email protected]>
> Sent: Monday, July 25, 2022 3:39 PM
>
> On Sun, Jul 24, 2022 at 09:48:15PM +0800, Baolu Lu wrote:
> > /*
> > * iommu_detach_device_pasid() - Detach the domain from pasid of device
> > * @domain: the iommu domain.
> > * @dev: the attached device.
> > * @pasid: the pasid of the device.
> > *
> > * The @domain must have been attached to @pasid of the @dev with
> > * iommu_detach_device_pasid().
> > */
> > void iommu_detach_device_pasid(struct iommu_domain *domain, struct
> device
> > *dev,
> > ioasid_t pasid)
> > {
> > struct iommu_group *group = iommu_group_get(dev);
> > struct group_pasid *param;
> >
> > mutex_lock(&group->mutex);
> > domain->ops->set_dev_pasid(group->blocking_domain, dev, pasid);
>
> Please also pass the old domain to this detach() function, so that the
> IOMMU driver doesn't have to keep track of them internally.
The old domain is already tracked in the group->pasid_array and can
be retrieved using [dev, pasid].
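In code terms, roughly:

	domain = xa_load(&group->pasid_array, pasid);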
>
> In addition to clearing contexts, detach() also needs to invalidate TLBs,
> and for that the SMMU driver needs to know the old ASID (!= PASID) that
> was used by the context descriptor.
>
Presumably both ASID and context descriptor are SMMU internal
knowledge. What exact information is required from the core API
and how is it done today?
On Mon, Jul 25, 2022 at 08:02:05AM +0000, Tian, Kevin wrote:
> > From: Jean-Philippe Brucker <[email protected]>
> > Sent: Monday, July 25, 2022 3:39 PM
> >
> > On Sun, Jul 24, 2022 at 09:48:15PM +0800, Baolu Lu wrote:
> > > /*
> > > * iommu_detach_device_pasid() - Detach the domain from pasid of device
> > > * @domain: the iommu domain.
> > > * @dev: the attached device.
> > > * @pasid: the pasid of the device.
> > > *
> > > * The @domain must have been attached to @pasid of the @dev with
> > > * iommu_detach_device_pasid().
> > > */
> > > void iommu_detach_device_pasid(struct iommu_domain *domain, struct
> > device
> > > *dev,
> > > ioasid_t pasid)
> > > {
> > > struct iommu_group *group = iommu_group_get(dev);
> > > struct group_pasid *param;
> > >
> > > mutex_lock(&group->mutex);
> > > domain->ops->set_dev_pasid(group->blocking_domain, dev, pasid);
> >
> > Please also pass the old domain to this detach() function, so that the
> > IOMMU driver doesn't have to keep track of them internally.
>
> The old domain is already tracked in group->pasid_xarray and can
> be retrieved using [dev, pasid].
Ah yes, I can use that. Something explicit would help avoid breaking the
driver next time the core changes.
>
> >
> > In addition to clearing contexts, detach() also needs to invalidate TLBs,
> > and for that the SMMU driver needs to know the old ASID (!= PASID) that
> > was used by the context descriptor.
> >
>
> Presumably both ASID and context descriptor are SMMU internal
> knowledge. What exact information is required from the core API
> and how is it done today?
Today the SMMU driver keeps track of bonds, but the goal of this series is
to move that to the core.
Thanks,
Jean
Hi Jean,
On 2022/7/25 15:39, Jean-Philippe Brucker wrote:
> On Sun, Jul 24, 2022 at 09:48:15PM +0800, Baolu Lu wrote:
>> /*
>> * iommu_detach_device_pasid() - Detach the domain from pasid of device
>> * @domain: the iommu domain.
>> * @dev: the attached device.
>> * @pasid: the pasid of the device.
>> *
>> * The @domain must have been attached to @pasid of the @dev with
>> * iommu_detach_device_pasid().
>> */
>> void iommu_detach_device_pasid(struct iommu_domain *domain, struct device
>> *dev,
>> ioasid_t pasid)
>> {
>> struct iommu_group *group = iommu_group_get(dev);
>> struct group_pasid *param;
>>
>> mutex_lock(&group->mutex);
>> domain->ops->set_dev_pasid(group->blocking_domain, dev, pasid);
> Please also pass the old domain to this detach() function, so that the
> IOMMU driver doesn't have to keep track of them internally.
The iommu core provides the interface to retrieve the attached domain
with a {device, pasid} pair. Therefore, in the smmuv3 driver, the
set_dev_pasid op could do something like this:
+static int arm_smmu_sva_set_dev_pasid(struct iommu_domain *domain,
+ struct device *dev, ioasid_t id)
+{
+ int ret = 0;
+ struct mm_struct *mm;
+ struct iommu_sva *handle;
+
+ /*
+ * Detach the domain if a blocking domain is set. Check the
+ * right domain type once the IOMMU driver supports a real
+ * blocking domain.
+ */
+ if (!domain || domain->type == IOMMU_DOMAIN_UNMANAGED) {
+ struct pasid_iommu *param;
+
+ param = iommu_device_pasid_param(dev, id);
+ if (!param || !param->domain)
+ return -EINVAL;
+ arm_smmu_sva_block_dev_pasid(param->domain, dev, id);
+
+ return 0;
+ }
+
+ mm = domain->mm;
+ mutex_lock(&sva_lock);
+ handle = __arm_smmu_sva_bind(dev, mm);
+ if (IS_ERR(handle))
+ ret = PTR_ERR(handle);
+ mutex_unlock(&sva_lock);
+
+ return ret;
+}
The check of "(!domain || domain->type == IOMMU_DOMAIN_UNMANAGED)" looks
odd, but could get cleaned up after a real blocking domain is added.
Then, we can simply check "domain->type == IOMMU_DOMAIN_BLOCKING".
> In addition to clearing contexts, detach() also needs to invalidate TLBs,
> and for that the SMMU driver needs to know the old ASID (!= PASID) that
> was used by the context descriptor.
Best regards,
baolu
On Mon, Jul 25, 2022 at 05:33:05PM +0800, Baolu Lu wrote:
> Hi Jean,
>
> On 2022/7/25 15:39, Jean-Philippe Brucker wrote:
> > On Sun, Jul 24, 2022 at 09:48:15PM +0800, Baolu Lu wrote:
> > > /*
> > > * iommu_detach_device_pasid() - Detach the domain from pasid of device
> > > * @domain: the iommu domain.
> > > * @dev: the attached device.
> > > * @pasid: the pasid of the device.
> > > *
> > > * The @domain must have been attached to @pasid of the @dev with
> > > * iommu_detach_device_pasid().
> > > */
> > > void iommu_detach_device_pasid(struct iommu_domain *domain, struct device
> > > *dev,
> > > ioasid_t pasid)
> > > {
> > > struct iommu_group *group = iommu_group_get(dev);
> > > struct group_pasid *param;
> > >
> > > mutex_lock(&group->mutex);
> > > domain->ops->set_dev_pasid(group->blocking_domain, dev, pasid);
> > Please also pass the old domain to this detach() function, so that the
> > IOMMU driver doesn't have to keep track of them internally.
>
> The iommu core provides the interface to retrieve attached domain with a
> {device, pasid} pair. Therefore in the smmuv3 driver, the set_dev_pasid
> could do like this:
Thanks for the example, yes I can do something like this. I maintain that
attach+detach is clearer, but as long as it can be made to work, fine by
me
Thanks,
Jean
>
> +static int arm_smmu_sva_set_dev_pasid(struct iommu_domain *domain,
> + struct device *dev, ioasid_t id)
> +{
> + int ret = 0;
> + struct mm_struct *mm;
> + struct iommu_sva *handle;
> +
> + /*
> + * Detach the domain if a blocking domain is set. Check the
> + * right domain type once the IOMMU driver supports a real
> + * blocking domain.
> + */
> + if (!domain || domain->type == IOMMU_DOMAIN_UNMANAGED) {
> + struct pasid_iommu *param;
> +
> + param = iommu_device_pasid_param(dev, id);
> + if (!param || !param->domain)
> + return -EINVAL;
> + arm_smmu_sva_block_dev_pasid(param->domain, dev, id);
> +
> + return 0;
> + }
> +
> + mm = domain->mm;
> + mutex_lock(&sva_lock);
> + handle = __arm_smmu_sva_bind(dev, mm);
> + if (IS_ERR(handle))
> + ret = PTR_ERR(handle);
> + mutex_unlock(&sva_lock);
> +
> + return ret;
> +}
>
> The check of "(!domain || domain->type == IOMMU_DOMAIN_UNMANAGED)" looks
> odd, but could get cleaned up after a real blocking domain is added.
> Then, we can simply check "domain->type == IOMMU_DOMAIN_BLOCKING".
On 2022/7/25 15:46, Tian, Kevin wrote:
>> From: Baolu Lu <[email protected]>
>> Sent: Sunday, July 24, 2022 5:14 PM
>>
>> On 2022/7/23 22:11, Jason Gunthorpe wrote:
>>>> +void iommu_detach_device_pasid(struct iommu_domain *domain,
>> struct device *dev,
>>>> + ioasid_t pasid)
>>>> +{
>>>> + struct iommu_group *group = iommu_group_get(dev);
>>>> +
>>>> + mutex_lock(&group->mutex);
>>>> + domain->ops->block_dev_pasid(domain, dev, pasid);
>>> I still really dislike this OP, it is nonsense to invoke 'block_dev_pasid' on
>>> a domain, it should be on the iommu ops and it should not take in a
>>> domain parameter. This is why I prefer we write it as
>>>
>>> domain->ops->set_dev_pasid(group->blocking_domain, dev, pasid);
>>>
>>
>> I originally plan to refactor this after both Intel and ARM SMMUv3
>> drivers have real blocking domain supports. After revisiting this, it
>> seems that the only difficulty is how to check whether a domain is a
>> blocking domain. I am going to use below checking code:
>>
>> + /*
>> + * Detach the domain if a blocking domain is set. Check the
>> + * right domain type once the IOMMU driver supports a real
>> + * blocking domain.
>> + */
>> + if (!domain || domain->type == IOMMU_DOMAIN_UNMANAGED) {
>>
>> Does this work for you?
>>
>
> Or you can call __iommu_group_alloc_blocking_domain() in the sva
> path and then just check whether the domain is equal to
> group->blocking_domain here.
The above check is in the IOMMU driver, where group->blocking_domain is
not visible. I once thought about having something like

	struct iommu_domain *iommu_group_blocking_domain(struct iommu_group *group)

to return group->blocking_domain. But it looks redundant.
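The whole helper would just be (sketch):

	struct iommu_domain *
	iommu_group_blocking_domain(struct iommu_group *group)
	{
		return group->blocking_domain;
	}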
Best regards,
baolu
On 2022/7/25 15:50, Tian, Kevin wrote:
>> From: Baolu Lu <[email protected]>
>> Sent: Sunday, July 24, 2022 9:48 PM
>>>
>>> The API is really refcounting the PASID:
>>>
>>>> +struct iommu_sva *iommu_sva_bind_device(struct device *dev,
>>>> + struct mm_struct *mm);
>>>> +void iommu_sva_unbind_device(struct iommu_sva *handle);
>>>
>>> So what you need to do is store that 'iommu_sva' in the group's PASID
>>> xarray.
>>>
>>> The bind logic would be
>>>
>>> sva = xa_load(group->pasid, mm->pasid)
>>> if (sva)
>>> refcount_inc(sva->users)
>>> return sva
>>> sva = kalloc
>>> sva->domain = domain
>>> xa_store(group->pasid, sva);
>>
>> Thanks for the suggestion. It makes a lot of sense to me.
>>
>> Furthermore, I'd like to separate the generic data from the caller-
>> specific things because the group->pasid_array should also be able to
>> serve other usages. Hence, the attach/detach_device_pasid interfaces
>> might be changed like below:
>>
>> /* Collection of per-pasid IOMMU data */
>> struct group_pasid {
>> struct iommu_domain *domain;
>> void *priv;
>> };
>>
>
> Is there any reason why pasid refcnt is sva specific and needs to be
> in a priv field?
I am going to store the iommu_sva data which represents the bind
relationship between device and domain.
Best regards,
baolu
On Sun, Jul 24, 2022 at 03:03:16PM +0800, Baolu Lu wrote:
> How about rephrasing this part of commit message like below:
>
> Some buses, like PCI, route packets without considering the PASID value.
> Thus a DMA target address with PASID might be treated as P2P if the
> address falls into the MMIO BAR of other devices in the group. To make
> things simple, these interfaces only apply to devices belonging to the
> singleton groups.
> Considering that the PCI bus supports hot-plug, even a device boots with
> a singleton group, a later hot-added device is still possible to share
> the group, which breaks the singleton group assumption. In order to
> avoid this situation, this interface requires that the ACS is enabled on
> all devices on the path from the device to the host-PCI bridge.
But ACS directly fixes the routing issue above
This entire explanation can be recast as saying we block PASID
attachment in all cases where the PCI fabric is routing based on
address. ACS disables that.
Not sure it even has anything to do with hotplug or singleton??
Jason
On Mon, Jul 25, 2022 at 06:22:06PM +0800, Baolu Lu wrote:
> On 2022/7/25 15:50, Tian, Kevin wrote:
> > > From: Baolu Lu <[email protected]>
> > > Sent: Sunday, July 24, 2022 9:48 PM
> > > >
> > > > The API is really refcounting the PASID:
> > > >
> > > > > +struct iommu_sva *iommu_sva_bind_device(struct device *dev,
> > > > > + struct mm_struct *mm);
> > > > > +void iommu_sva_unbind_device(struct iommu_sva *handle);
> > > >
> > > > So what you need to do is store that 'iommu_sva' in the group's PASID
> > > > xarray.
> > > >
> > > > The bind logic would be
> > > >
> > > > sva = xa_load(group->pasid, mm->pasid)
> > > > if (sva)
> > > > refcount_inc(sva->users)
> > > > return sva
> > > > sva = kalloc
> > > > sva->domain = domain
> > > > xa_store(group->pasid, sva);
> > >
> > > Thanks for the suggestion. It makes a lot of sense to me.
> > >
> > > Furthermore, I'd like to separate the generic data from the caller-
> > > specific things because the group->pasid_array should also be able to
> > > serve other usages. Hence, the attach/detach_device_pasid interfaces
> > > might be changed like below:
> > >
> > > /* Collection of per-pasid IOMMU data */
> > > struct group_pasid {
> > > struct iommu_domain *domain;
> > > void *priv;
> > > };
> > >
> >
> > Is there any reason why pasid refcnt is sva specific and needs to be
> > in a priv field?
>
> I am going to store the iommu_sva data which represents the bind
> relationship between device and domain.
Why do you need that?
If you are starting at the pasid xarray then you already know the
group/device, so we don't need to store it again.
The only thing needed is the refcount so just store a refcount in this
structure and be done with it. If someone needs to add something later
then we can use a union or something, but right now adding an untagged
void * is bad.
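Something like (sketch):

	/* Collection of per-pasid IOMMU data */
	struct group_pasid {
		struct iommu_domain *domain;
		refcount_t users;
	};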
Jason
On Sun, Jul 24, 2022 at 10:04:50PM +0800, Baolu Lu wrote:
> > Why do we need two falut callbacks? The only difference is that one is
> > recoverable and the other is not, right?
> >
> > Can we run both down the same op?
>
> The iommu_fault_handler_t is for report_iommu_fault() which could be
> replaced with the newer iommu_report_device_fault().
>
> https://lore.kernel.org/linux-iommu/Yo4Nw9QyllT1RZbd@myrica/
Okay, sounds like a good future project, maybe add a fixme comment or
something so others understand.
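e.g. something like (sketch):

	iommu_fault_handler_t handler;	/* FIXME: legacy report_iommu_fault()
					 * path, to be replaced by
					 * iommu_report_device_fault() */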
Jason
On Mon, Jul 25, 2022 at 10:52:40AM +0100, Jean-Philippe Brucker wrote:
> > The iommu core provides the interface to retrieve attached domain with a
> > {device, pasid} pair. Therefore in the smmuv3 driver, the set_dev_pasid
> > could do like this:
>
> Thanks for the example, yes I can do something like this. I maintain that
> attach+detach is clearer, but as long as it can be made to work, fine by
> me
Except it is not clearer, because there isn't actually a detach in
our model - many things already got messed up in the non-pasid case
because of this confusing assumption.
We have only a "set" operation and set moves between any two domain
configurations.
You don't need to call attach/detach pairs, just repeated attaches,
which is how the normal path works. detach is called in the legacy
flow for the NULL domain
So, creating a pair invites the wrong idea that they actually are a
pair.
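In code terms (sketch, matching the proposal above):

	/* "attach" */
	domain->ops->set_dev_pasid(domain, dev, pasid);
	/* "detach" is just another set, targeting the blocking domain */
	domain->ops->set_dev_pasid(group->blocking_domain, dev, pasid);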
> > The check of "(!domain || domain->type == IOMMU_DOMAIN_UNMANAGED)" looks
> > odd, but could get cleaned up after a real blocking domain is added.
> > Then, we can simply check "domain->type == IOMMU_DOMAIN_BLOCKING".
So this is probably a good enough reason not to do it yet, though it
would be nice to get a proper blocking domain concept in the SMMU
driver to support VFIO, it could be done later.
Jason
On 2022/7/25 22:40, Jason Gunthorpe wrote:
> On Sun, Jul 24, 2022 at 03:03:16PM +0800, Baolu Lu wrote:
>
>> How about rephrasing this part of commit message like below:
>>
>> Some buses, like PCI, route packets without considering the PASID value.
>> Thus a DMA target address with PASID might be treated as P2P if the
>> address falls into the MMIO BAR of other devices in the group. To make
>> things simple, these interfaces only apply to devices belonging to the
>> singleton groups.
>
>
>> Considering that the PCI bus supports hot-plug, even a device boots with
>> a singleton group, a later hot-added device is still possible to share
>> the group, which breaks the singleton group assumption. In order to
>> avoid this situation, this interface requires that the ACS is enabled on
>> all devices on the path from the device to the host-PCI bridge.
>
> But ACS directly fixes the routing issue above
>
> This entire explanation can be recast as saying we block PASID
> attachment in all cases where the PCI fabric is routing based on
> address. ACS disables that.
>
> Not sure it even has anything to do with hotplug or singleton??
Yes, agreed. I polished this patch like below. Does it look good to you?
iommu: Add attach/detach_dev_pasid iommu interface
Attaching an IOMMU domain to a PASID of a device is a generic operation
for modern IOMMU drivers which support PASID-granular DMA address
translation. Currently visible usage scenarios include (but are not limited to):
- SVA (Shared Virtual Address)
- kernel DMA with PASID
- hardware-assist mediated device
This adds a pair of domain ops for this purpose and adds the interfaces
for device drivers to attach/detach a domain to/from a {device, PASID}.
The PCI bus routes packets without considering the PASID value. Thus a
DMA target address with PASID might be treated as P2P if the address
falls into the MMIO BAR of other devices in the group. This blocks the
PASID attachment in all cases where the PCI fabric is routing based on
address. The ACS disables that.
[...]
---
drivers/iommu/iommu.c | 70 +++++++++++++++++++++++++++++++++++++++++++
include/linux/iommu.h | 18 +++++++++++
2 files changed, 88 insertions(+)
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 63fc4317cb47..493db6e9302f 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -39,6 +39,7 @@ struct iommu_group {
struct kobject kobj;
struct kobject *devices_kobj;
struct list_head devices;
+ struct xarray pasid_array;
struct mutex mutex;
void *iommu_data;
void (*iommu_data_release)(void *iommu_data);
@@ -663,6 +664,7 @@ struct iommu_group *iommu_group_alloc(void)
mutex_init(&group->mutex);
INIT_LIST_HEAD(&group->devices);
INIT_LIST_HEAD(&group->entry);
+ xa_init(&group->pasid_array);
ret = ida_alloc(&iommu_group_ida, GFP_KERNEL);
if (ret < 0) {
@@ -3254,3 +3256,71 @@ bool iommu_group_dma_owner_claimed(struct iommu_group *group)
return user;
}
EXPORT_SYMBOL_GPL(iommu_group_dma_owner_claimed);
+
+/*
+ * iommu_attach_device_pasid() - Attach a domain to pasid of device
+ * @domain: the iommu domain.
+ * @dev: the attached device.
+ * @pasid: the pasid of the device.
+ *
+ * Return: 0 on success, or an error.
+ */
+int iommu_attach_device_pasid(struct iommu_domain *domain,
+ struct device *dev, ioasid_t pasid)
+{
+ struct iommu_group *group;
+ void *curr;
+ int ret;
+
+ if (!domain->ops->set_dev_pasid)
+ return -EOPNOTSUPP;
+
+ /*
+ * Block PASID attachment in all cases where the PCI fabric is
+ * routing based on address. ACS disables it.
+ */
+ if (dev_is_pci(dev) &&
+ !pci_acs_path_enabled(to_pci_dev(dev), NULL, REQ_ACS_FLAGS))
+ return -ENODEV;
+
+ group = iommu_group_get(dev);
+ if (!group)
+ return -ENODEV;
+
+ mutex_lock(&group->mutex);
+ curr = xa_cmpxchg(&group->pasid_array, pasid, NULL, domain, GFP_KERNEL);
+ if (curr) {
+ ret = xa_err(curr) ? : -EBUSY;
+ goto out_unlock;
+ }
+ ret = domain->ops->set_dev_pasid(domain, dev, pasid);
+ if (ret)
+ xa_erase(&group->pasid_array, pasid);
+out_unlock:
+ mutex_unlock(&group->mutex);
+ iommu_group_put(group);
+
+ return ret;
+}
+
+/*
+ * iommu_detach_device_pasid() - Detach the domain from pasid of device
+ * @domain: the iommu domain.
+ * @dev: the attached device.
+ * @pasid: the pasid of the device.
+ *
+ * The @domain must have been attached to @pasid of the @dev with
+ * iommu_attach_device_pasid().
+ */
+void iommu_detach_device_pasid(struct iommu_domain *domain, struct device *dev,
+ ioasid_t pasid)
+{
+ struct iommu_group *group = iommu_group_get(dev);
+
+ mutex_lock(&group->mutex);
+ domain->ops->set_dev_pasid(group->blocking_domain, dev, pasid);
+ WARN_ON(xa_erase(&group->pasid_array, pasid) != domain);
+ mutex_unlock(&group->mutex);
+
+ iommu_group_put(group);
+}
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 2f237c3cd680..2c385e6d4b1a 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -266,6 +266,7 @@ struct iommu_ops {
* struct iommu_domain_ops - domain specific operations
* @attach_dev: attach an iommu domain to a device
* @detach_dev: detach an iommu domain from a device
+ * @set_dev_pasid: set an iommu domain to a pasid of device
* @map: map a physically contiguous memory region to an iommu domain
* @map_pages: map a physically contiguous set of pages of the same size to
*             an iommu domain.
@@ -286,6 +287,8 @@ struct iommu_ops {
struct iommu_domain_ops {
int (*attach_dev)(struct iommu_domain *domain, struct device *dev);
void (*detach_dev)(struct iommu_domain *domain, struct device *dev);
+ int (*set_dev_pasid)(struct iommu_domain *domain, struct device *dev,
+ ioasid_t pasid);
int (*map)(struct iommu_domain *domain, unsigned long iova,
phys_addr_t paddr, size_t size, int prot, gfp_t gfp);
@@ -680,6 +683,10 @@ int iommu_group_claim_dma_owner(struct iommu_group *group, void *owner);
void iommu_group_release_dma_owner(struct iommu_group *group);
bool iommu_group_dma_owner_claimed(struct iommu_group *group);
+int iommu_attach_device_pasid(struct iommu_domain *domain,
+ struct device *dev, ioasid_t pasid);
+void iommu_detach_device_pasid(struct iommu_domain *domain,
+ struct device *dev, ioasid_t pasid);
#else /* CONFIG_IOMMU_API */
struct iommu_ops {};
@@ -1047,6 +1054,17 @@ static inline bool iommu_group_dma_owner_claimed(struct iommu_group *group)
{
return false;
}
+
+static inline int iommu_attach_device_pasid(struct iommu_domain *domain,
+ struct device *dev, ioasid_t pasid)
+{
+ return -ENODEV;
+}
+
+static inline void iommu_detach_device_pasid(struct iommu_domain *domain,
+ struct device *dev, ioasid_t pasid)
+{
+}
#endif /* CONFIG_IOMMU_API */
/**
Best regards,
baolu
On 2022/7/25 22:47, Jason Gunthorpe wrote:
> On Mon, Jul 25, 2022 at 06:22:06PM +0800, Baolu Lu wrote:
>> On 2022/7/25 15:50, Tian, Kevin wrote:
>>>> From: Baolu Lu <[email protected]>
>>>> Sent: Sunday, July 24, 2022 9:48 PM
>>>>>
>>>>> The API is really refcounting the PASID:
>>>>>
>>>>>> +struct iommu_sva *iommu_sva_bind_device(struct device *dev,
>>>>>> + struct mm_struct *mm);
>>>>>> +void iommu_sva_unbind_device(struct iommu_sva *handle);
>>>>>
>>>>> So what you need to do is store that 'iommu_sva' in the group's PASID
>>>>> xarray.
>>>>>
>>>>> The bind logic would be
>>>>>
>>>>> sva = xa_load(group->pasid, mm->pasid)
>>>>> if (sva)
>>>>> refcount_inc(sva->users)
>>>>> return sva
>>>>> sva = kalloc
>>>>> sva->domain = domain
>>>>> xa_store(group->pasid, sva);
>>>>
>>>> Thanks for the suggestion. It makes a lot of sense to me.
>>>>
>>>> Furthermore, I'd like to separate the generic data from the caller-
>>>> specific things because the group->pasid_array should also be able to
>>>> serve other usages. Hence, the attach/detach_device_pasid interfaces
>>>> might be changed like below:
>>>>
>>>> /* Collection of per-pasid IOMMU data */
>>>> struct group_pasid {
>>>> struct iommu_domain *domain;
>>>> void *priv;
>>>> };
>>>>
>>>
>>> Is there any reason why pasid refcnt is sva specific and needs to be
>>> in a priv field?
>>
>> I am going to store the iommu_sva data which represents the bind
>> relationship between device and domain.
>
> Why do you need that?
>
> If you are starting at the pasid xarray then you already know the
> group/device, so we don't need to store it again.
>
> The only thing needed is the refcount so just store a refcount in this
> structure and be done with it. If someone needs to add something later
> then we can use a union or something, but right now adding an untagged
> void * is bad.
Fair enough. I will update it accordingly.
Best regards,
baolu
On Tue, Jul 26, 2022 at 02:23:26PM +0800, Baolu Lu wrote:
> On 2022/7/25 22:40, Jason Gunthorpe wrote:
> > On Sun, Jul 24, 2022 at 03:03:16PM +0800, Baolu Lu wrote:
> >
> > > How about rephrasing this part of commit message like below:
> > >
> > > Some buses, like PCI, route packets without considering the PASID value.
> > > Thus a DMA target address with PASID might be treated as P2P if the
> > > address falls into the MMIO BAR of other devices in the group. To make
> > > things simple, these interfaces only apply to devices belonging to the
> > > singleton groups.
> >
> > > Considering that the PCI bus supports hot-plug, even a device boots with
> > > a singleton group, a later hot-added device is still possible to share
> > > the group, which breaks the singleton group assumption. In order to
> > > avoid this situation, this interface requires that the ACS is enabled on
> > > all devices on the path from the device to the host-PCI bridge.
> >
> > But ACS directly fixes the routing issue above
> >
> > This entire explanation can be recast as saying we block PASID
> > attachment in all cases where the PCI fabric is routing based on
> > address. ACS disables that.
> >
> > Not sure it even has anything to do with hotplug or singleton??
>
> Yes, agreed. I polished this patch like below. Does it look good to you?
>
> iommu: Add attach/detach_dev_pasid iommu interface
>
> Attaching an IOMMU domain to a PASID of a device is a generic operation
> for modern IOMMU drivers which support PASID-granular DMA address
> translation. Currently visible usage scenarios include (but not limited):
>
> - SVA (Shared Virtual Address)
> - kernel DMA with PASID
> - hardware-assist mediated device
>
> This adds a pair of domain ops for this purpose and adds the interfaces
> for device drivers to attach/detach a domain to/from a {device,
> PASID}.
> The PCI bus routes packets without considering the PASID value.
More like:
Some configurations of the PCI fabric will route device originated TLP
packets based on memory address, and these configurations are
incompatible with PASID as the PASID packets form a distinct address
space. For instance any configuration where switches are present
without ACS is incompatible with PASID.
> + /*
> + * Block PASID attachment in all cases where the PCI fabric is
> + * routing based on address. ACS disables it.
> + */
> + if (dev_is_pci(dev) &&
> + !pci_acs_path_enabled(to_pci_dev(dev), NULL, REQ_ACS_FLAGS))
> + return -ENODEV;
I would probably still put this in a function just to be clear, and
probably even a PCI layer function 'pci_is_pasid_supported' that
clearly indicates that the fabric path can route a PASID packet
without mis-routing it.
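Such a helper might be as simple as (a sketch; the name is a
suggestion, not an existing PCI API):

	static bool pci_is_pasid_supported(struct pci_dev *pdev)
	{
		/* PASID TLPs must not be address-routed anywhere on the
		 * path to the root complex; ACS on every switch along
		 * the way ensures that. */
		return pci_acs_path_enabled(pdev, NULL, REQ_ACS_FLAGS);
	}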
If the fabric routes PASID properly then groups are not an issue - all
agree on this?
Jason
> From: Jason Gunthorpe <[email protected]>
> Sent: Tuesday, July 26, 2022 9:57 PM
>
> On Tue, Jul 26, 2022 at 02:23:26PM +0800, Baolu Lu wrote:
> > On 2022/7/25 22:40, Jason Gunthorpe wrote:
> > > On Sun, Jul 24, 2022 at 03:03:16PM +0800, Baolu Lu wrote:
> > >
> > > > How about rephrasing this part of commit message like below:
> > > >
> > > > Some buses, like PCI, route packets without considering the PASID value.
> > > > Thus a DMA target address with PASID might be treated as P2P if the
> > > > address falls into the MMIO BAR of other devices in the group. To make
> > > > things simple, these interfaces only apply to devices belonging to the
> > > > singleton groups.
> > >
> > > > Considering that the PCI bus supports hot-plug, even a device boots
> with
> > > > a singleton group, a later hot-added device is still possible to share
> > > > the group, which breaks the singleton group assumption. In order to
> > > > avoid this situation, this interface requires that the ACS is enabled on
> > > > all devices on the path from the device to the host-PCI bridge.
> > >
> > > But ACS directly fixes the routing issue above
> > >
> > > This entire explanation can be recast as saying we block PASID
> > > attachment in all cases where the PCI fabric is routing based on
> > > address. ACS disables that.
> > >
> > > Not sure it even has anything to do with hotplug or singleton??
> >
> > Yes, agreed. I polished this patch like below. Does it look good to you?
> >
> > iommu: Add attach/detach_dev_pasid iommu interface
> >
> > Attaching an IOMMU domain to a PASID of a device is a generic operation
> > for modern IOMMU drivers which support PASID-granular DMA address
> > translation. Currently visible usage scenarios include (but not limited):
> >
> > - SVA (Shared Virtual Address)
> > - kernel DMA with PASID
> > - hardware-assist mediated device
> >
> > This adds a pair of domain ops for this purpose and adds the interfaces
> > for device drivers to attach/detach a domain to/from a {device,
> > PASID}.
>
> > The PCI bus routes packets without considering the PASID value.
>
> More like:
>
> Some configurations of the PCI fabric will route device originated TLP
> packets based on memory address, and these configurations are
> incompatible with PASID as the PASID packets form a distinct address
> space. For instance any configuration where switches are present
> without ACS is incompatible with PASID.
This description reads like ACS enables PASID-based routing...
In reality the PCI fabric always routes TLPs based on memory address.
ACS just provides a way to redirect the packet to the RC, with or
without PASID.
So it's simply that PASID requires such redirection, hence ACS, because
only the RC/IOMMU understands PASID and the related address space.
>
> > + /*
> > + * Block PASID attachment in all cases where the PCI fabric is
> > + * routing based on address. ACS disables it.
> > + */
> > + if (dev_is_pci(dev) &&
> > + !pci_acs_path_enabled(to_pci_dev(dev), NULL, REQ_ACS_FLAGS))
> > + return -ENODEV;
>
> I would probably still put this in a function just to be clear, and
> probably even a PCI layer funcion 'pci_is_pasid_supported' that
> clearly indicates that the fabric path can route a PASID packet
> without mis-routing it.
But there is not a single line in the above check related to PASID...
>
> If the fabric routes PASID properly then groups are not an issue - all
> agree on this?
>
IMHO if the fabric can route PASID properly, and according to the above
once such redirection is available it applies to both non-PASID and
PASID TLPs, then the group will be a singleton in the first place.
Is there a real-world example where the fabric can route PASID
properly for a multi-device group?
Thanks
Kevin
On Wed, Jul 27, 2022 at 03:20:25AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <[email protected]>
> > Sent: Tuesday, July 26, 2022 9:57 PM
> >
> > On Tue, Jul 26, 2022 at 02:23:26PM +0800, Baolu Lu wrote:
> > > On 2022/7/25 22:40, Jason Gunthorpe wrote:
> > > > On Sun, Jul 24, 2022 at 03:03:16PM +0800, Baolu Lu wrote:
> > > >
> > > > > How about rephrasing this part of commit message like below:
> > > > >
> > > > > Some buses, like PCI, route packets without considering the PASID value.
> > > > > Thus a DMA target address with PASID might be treated as P2P if the
> > > > > address falls into the MMIO BAR of other devices in the group. To make
> > > > > things simple, these interfaces only apply to devices belonging to the
> > > > > singleton groups.
> > > >
> > > > > Considering that the PCI bus supports hot-plug, even a device boots
> > with
> > > > > a singleton group, a later hot-added device is still possible to share
> > > > > the group, which breaks the singleton group assumption. In order to
> > > > > avoid this situation, this interface requires that the ACS is enabled on
> > > > > all devices on the path from the device to the host-PCI bridge.
> > > >
> > > > But ACS directly fixes the routing issue above
> > > >
> > > > This entire explanation can be recast as saying we block PASID
> > > > attachment in all cases where the PCI fabric is routing based on
> > > > address. ACS disables that.
> > > >
> > > > Not sure it even has anything to do with hotplug or singleton??
> > >
> > > Yes, agreed. I polished this patch like below. Does it look good to you?
> > >
> > > iommu: Add attach/detach_dev_pasid iommu interface
> > >
> > > Attaching an IOMMU domain to a PASID of a device is a generic operation
> > > for modern IOMMU drivers which support PASID-granular DMA address
> > > translation. Currently visible usage scenarios include (but not limited):
> > >
> > > - SVA (Shared Virtual Address)
> > > - kernel DMA with PASID
> > > - hardware-assist mediated device
> > >
> > > This adds a pair of domain ops for this purpose and adds the interfaces
> > > for device drivers to attach/detach a domain to/from a {device,
> > > PASID}.
> >
> > > The PCI bus routes packets without considering the PASID value.
> >
> > More like:
> >
> > Some configurations of the PCI fabric will route device originated TLP
> > packets based on memory address, and these configurations are
> > incompatible with PASID as the PASID packets form a distinct address
> > space. For instance any configuration where switches are present
> > without ACS is incompatible with PASID.
>
> This description reads like ACS enables PASID-based routing...
Well, that is kind of what it is.
> In reality PCI fabric always route TLP based on memory address.
> ACS just provides a way to redirect the packet to RC, with or
> without PASID.
Always except in all the cases it doesn't, like ACS :)
> > > + * Block PASID attachment in all cases where the PCI fabric is
> > > + * routing based on address. ACS disables it.
> > > + */
> > > + if (dev_is_pci(dev) &&
> > > + !pci_acs_path_enabled(to_pci_dev(dev), NULL, REQ_ACS_FLAGS))
> > > + return -ENODEV;
> >
> > I would probably still put this in a function just to be clear, and
> > probably even a PCI layer funcion 'pci_is_pasid_supported' that
> > clearly indicates that the fabric path can route a PASID packet
> > without mis-routing it.
>
> But there is no single line in above check related to PASID...
The question to answer here is if the device/fabric supports PASID,
and on PCI that requires ACS on any switches. IMHO that is a PCI layer
question and perhaps we shouldn't even succeed pci_enable_pasid() if
ACS isn't on.
Then we don't need this weirdo check in the core iommu code at all.
Jason
On 2022/7/26 21:57, Jason Gunthorpe wrote:
> On Tue, Jul 26, 2022 at 02:23:26PM +0800, Baolu Lu wrote:
>> On 2022/7/25 22:40, Jason Gunthorpe wrote:
>>> On Sun, Jul 24, 2022 at 03:03:16PM +0800, Baolu Lu wrote:
>>>
>>>> How about rephrasing this part of commit message like below:
>>>>
>>>> Some buses, like PCI, route packets without considering the PASID value.
>>>> Thus a DMA target address with PASID might be treated as P2P if the
>>>> address falls into the MMIO BAR of other devices in the group. To make
>>>> things simple, these interfaces only apply to devices belonging to the
>>>> singleton groups.
>>>
>>>> Considering that the PCI bus supports hot-plug, even a device boots with
>>>> a singleton group, a later hot-added device is still possible to share
>>>> the group, which breaks the singleton group assumption. In order to
>>>> avoid this situation, this interface requires that the ACS is enabled on
>>>> all devices on the path from the device to the host-PCI bridge.
>>>
>>> But ACS directly fixes the routing issue above
>>>
>>> This entire explanation can be recast as saying we block PASID
>>> attachment in all cases where the PCI fabric is routing based on
>>> address. ACS disables that.
>>>
>>> Not sure it even has anything to do with hotplug or singleton??
>>
>> Yes, agreed. I polished this patch like below. Does it look good to you?
>>
>> iommu: Add attach/detach_dev_pasid iommu interface
>>
>> Attaching an IOMMU domain to a PASID of a device is a generic operation
>> for modern IOMMU drivers which support PASID-granular DMA address
>> translation. Currently visible usage scenarios include (but not limited):
>>
>> - SVA (Shared Virtual Address)
>> - kernel DMA with PASID
>> - hardware-assist mediated device
>>
>> This adds a pair of domain ops for this purpose and adds the interfaces
>> for device drivers to attach/detach a domain to/from a {device,
>> PASID}.
>
>> The PCI bus routes packets without considering the PASID value.
>
> More like:
>
> Some configurations of the PCI fabric will route device originated TLP
> packets based on memory address, and these configurations are
> incompatible with PASID as the PASID packets form a distinct address
> space. For instance any configuration where switches are present
> without ACS is incompatible with PASID.
This description reads as more accurate and professional. Thank you! I will
update the patch with this.
>
>> + /*
>> + * Block PASID attachment in all cases where the PCI fabric is
>> + * routing based on address. ACS disables it.
>> + */
>> + if (dev_is_pci(dev) &&
>> + !pci_acs_path_enabled(to_pci_dev(dev), NULL, REQ_ACS_FLAGS))
>> + return -ENODEV;
>
> I would probably still put this in a function just to be clear, and
> probably even a PCI layer funcion 'pci_is_pasid_supported' that
> clearly indicates that the fabric path can route a PASID packet
> without mis-routing it.
Fair enough. Let's keep this check in the iommu core for now and leave
the possible move into the PCI subsystem as a future task.
>
> If the fabric routes PASID properly then groups are not an issue - all
> agree on this?
I still think the singleton group is required, but it's not related to
the PCI fabric routing discussed here.
We have a single array for PASIDs in the iommu group. All devices
sitting in the group would have to share a single PASID namespace.
However, both the IOMMU hardware translation structures and the device
drivers can only adapt to a per-device PASID namespace. Hence, it's
reasonable to require a singleton group.
Best regards,
baolu
> From: Jason Gunthorpe <[email protected]>
> Sent: Wednesday, July 27, 2022 7:54 PM
>
> On Wed, Jul 27, 2022 at 03:20:25AM +0000, Tian, Kevin wrote:
> > > From: Jason Gunthorpe <[email protected]>
> > > Sent: Tuesday, July 26, 2022 9:57 PM
> > >
> > > On Tue, Jul 26, 2022 at 02:23:26PM +0800, Baolu Lu wrote:
> > > > On 2022/7/25 22:40, Jason Gunthorpe wrote:
> > > > > On Sun, Jul 24, 2022 at 03:03:16PM +0800, Baolu Lu wrote:
> > > > >
> > > > + * Block PASID attachment in all cases where the PCI fabric is
> > > > + * routing based on address. ACS disables it.
> > > > + */
> > > > + if (dev_is_pci(dev) &&
> > > > + !pci_acs_path_enabled(to_pci_dev(dev), NULL, REQ_ACS_FLAGS))
> > > > + return -ENODEV;
> > >
> > > I would probably still put this in a function just to be clear, and
> > > probably even a PCI layer funcion 'pci_is_pasid_supported' that
> > > clearly indicates that the fabric path can route a PASID packet
> > > without mis-routing it.
> >
> > But there is no single line in above check related to PASID...
>
> The question to answer here is if the device/fabric supports PASID,
> and on PCI that requires ACS on any switches. IMHO that is a PCI layer
> question and perhaps we shouldn't even succeed pci_enable_pasid() if
> ACS isn't on.
Yes, this sounds like a better approach than inventing another function
for the iommu core to check.
>
> Then we don't need this weirdo check in the core iommu code at all.
>
and then we could also move group->pasid_array to device->pasid_array
with this approach. Though the end result doesn't change, i.e. still
only a singleton group can enable pasid, the iommu core can just stick
to the per-device manner now.
On 2022/7/28 10:44, Baolu Lu wrote:
>>
>> If the fabric routes PASID properly then groups are not an issue - all
>> agree on this?
>
> I still think the singleton group is required, but it's not related to
> the PCI fabric routing discussed here.
>
> We have a single array for PASIDs in the iommu group. All devices
> sitting in the group should share a single PASID namespace. However both
> the translation structures for IOMMU hardware or the device drivers can
> only adapt to per-device PASID namespace. Hence, it's reasonable to
> require the singleton group.
Further, conceptually, we cannot support pasid attach/detach on multi-
device groups. If multiple devices cannot be isolated from each other,
it is difficult to ensure that their pasid spaces are isolated as well.
Therefore, it is wrong to attach a domain to the pasid of a single
device in such a group; all devices in the group must share a domain.
Best regards,
baolu
On Thu, Jul 28, 2022 at 03:06:47AM +0000, Tian, Kevin wrote:
> > Then we don't need this weirdo check in the core iommu code at all.
>
> and then we could also move group->pasid_array to device->pasid_array
> with this approach. Though the end result doesn't change i.e. still only
> the singleton group can enable pasid the iommu core can just stick to
> the device manner now.
I don't see why, the group is still logically the unit of attachment
in the iommu area, and if we have a multi-device group it just means
we iterate over all the devices in the group when doing pasid set, no
different than a RID.
Jason
> From: Jason Gunthorpe <[email protected]>
> Sent: Thursday, July 28, 2022 8:00 PM
>
> On Thu, Jul 28, 2022 at 03:06:47AM +0000, Tian, Kevin wrote:
>
> > > Then we don't need this weirdo check in the core iommu code at all.
> >
> > and then we could also move group->pasid_array to device->pasid_array
> > with this approach. Though the end result doesn't change i.e. still only
> > the singleton group can enable pasid the iommu core can just stick to
> > the device manner now.
>
> I don't see why, the group is still logically the unit of attachment
> in the iommu area, and if we have a multi-device group it just means
> we iterate over all the devices in the group when doing pasid set, no
> different than a RID.
Probably I overthought this.
To enable PASID in a multi-device group, one prerequisite is to reserve
the P2P ranges of the group in the related address space (let's assume
there is a way to do that reservation). In this case, even without ACS
in the switch port, all DMA requests from the group can still be routed
upstream.
Then for a group created due to lack of ACS, it looks like we can still
have per-device PASID tables in the group.
But for a group created due to RID aliasing, e.g. a PCI bridge, the
PASID table has to be shared by the entire group. So yes, from this
angle, leaving one table per group is the simpler thing to do,
especially when it's unclear whether there is real demand to enable
PASID for a multi-device group.
Thanks
Kevin
On 2022/7/28 19:59, Jason Gunthorpe wrote:
> On Thu, Jul 28, 2022 at 03:06:47AM +0000, Tian, Kevin wrote:
>
>>> Then we don't need this weirdo check in the core iommu code at all.
>>
>> and then we could also move group->pasid_array to device->pasid_array
>> with this approach. Though the end result doesn't change i.e. still only
>> the singleton group can enable pasid the iommu core can just stick to
>> the device manner now.
>
> I don't see why, the group is still logically the unit of attachment
> in the iommu area, and if we have a multi-device group it just means
> we iterate over all the devices in the group when doing pasid set, no
> different than a RID.
Okay. Based on the discussions in this thread, this patch will evolve to
look like below. Any comments or concerns?
[PATCH 04/12] iommu: Add attach/detach_dev_pasid iommu interface
Attaching an IOMMU domain to a PASID of a device is a generic operation
for modern IOMMU drivers which support PASID-granular DMA address
translation. Currently visible usage scenarios include (but are not
limited to):
- SVA (Shared Virtual Address)
- kernel DMA with PASID
- hardware-assisted mediated device
This adds a set_dev_pasid domain op for this purpose and also adds some
interfaces for device drivers to attach/detach/retrieve a domain for a
PASID of a device.
Multiple devices in an iommu group cannot be isolated from each other,
so it's also difficult to ensure that their pasid address spaces are
isolated without any hardware guarantee. To make things simple, this
starts the PASID attach/detach support with the devices belonging to
singleton groups.
Some configurations of the PCI fabric will route device-originated TLP
packets based on memory address. These configurations are
incompatible with PASID as the PASID packets form a distinct address
space. For instance, any configuration where switches are present
without ACS enabled is incompatible. In the future, we can further
discuss the possibility of moving this restriction into PCI subsystem
and making it a prerequisite of pci_enable_pasid().
[--tags removed --]
---
include/linux/iommu.h | 26 ++++++++
drivers/iommu/iommu.c | 146 ++++++++++++++++++++++++++++++++++++++++++
2 files changed, 172 insertions(+)
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 2f237c3cd680..f1e8953b1e2e 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -266,6 +266,7 @@ struct iommu_ops {
* struct iommu_domain_ops - domain specific operations
* @attach_dev: attach an iommu domain to a device
* @detach_dev: detach an iommu domain from a device
+ * @set_dev_pasid: set an iommu domain to a pasid of device
* @map: map a physically contiguous memory region to an iommu domain
* @map_pages: map a physically contiguous set of pages of the same size to
* an iommu domain.
@@ -286,6 +287,8 @@ struct iommu_ops {
struct iommu_domain_ops {
int (*attach_dev)(struct iommu_domain *domain, struct device *dev);
void (*detach_dev)(struct iommu_domain *domain, struct device *dev);
+ int (*set_dev_pasid)(struct iommu_domain *domain, struct device *dev,
+ ioasid_t pasid);
int (*map)(struct iommu_domain *domain, unsigned long iova,
phys_addr_t paddr, size_t size, int prot, gfp_t gfp);
@@ -680,6 +683,12 @@ int iommu_group_claim_dma_owner(struct iommu_group *group, void *owner);
void iommu_group_release_dma_owner(struct iommu_group *group);
bool iommu_group_dma_owner_claimed(struct iommu_group *group);
+int iommu_attach_device_pasid(struct iommu_domain *domain,
+ struct device *dev, ioasid_t pasid);
+void iommu_detach_device_pasid(struct iommu_domain *domain,
+ struct device *dev, ioasid_t pasid);
+struct iommu_domain *
+iommu_get_domain_for_dev_pasid(struct device *dev, ioasid_t pasid);
#else /* CONFIG_IOMMU_API */
struct iommu_ops {};
@@ -1047,6 +1056,23 @@ static inline bool iommu_group_dma_owner_claimed(struct iommu_group *group)
{
return false;
}
+
+static inline int iommu_attach_device_pasid(struct iommu_domain *domain,
+ struct device *dev, ioasid_t pasid)
+{
+ return -ENODEV;
+}
+
+static inline void iommu_detach_device_pasid(struct iommu_domain *domain,
+ struct device *dev, ioasid_t pasid)
+{
+}
+
+static inline struct iommu_domain *
+iommu_get_domain_for_dev_pasid(struct device *dev, ioasid_t pasid)
+{
+ return NULL;
+}
#endif /* CONFIG_IOMMU_API */
/**
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 63fc4317cb47..46b8bffddc4d 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -39,6 +39,7 @@ struct iommu_group {
struct kobject kobj;
struct kobject *devices_kobj;
struct list_head devices;
+ struct xarray pasid_array;
struct mutex mutex;
void *iommu_data;
void (*iommu_data_release)(void *iommu_data);
@@ -663,6 +664,7 @@ struct iommu_group *iommu_group_alloc(void)
mutex_init(&group->mutex);
INIT_LIST_HEAD(&group->devices);
INIT_LIST_HEAD(&group->entry);
+ xa_init(&group->pasid_array);
ret = ida_alloc(&iommu_group_ida, GFP_KERNEL);
if (ret < 0) {
@@ -890,6 +892,14 @@ int iommu_group_add_device(struct iommu_group *group, struct device *dev)
int ret, i = 0;
struct group_device *device;
+ /*
+ * The iommu_attach_device_pasid() requires a singleton group. Refuse
+ * to add another device into the group once any PASID attachment
+ * relies on that assumption.
+ */
+ if (!xa_empty(&group->pasid_array))
+ return -EBUSY;
+
device = kzalloc(sizeof(*device), GFP_KERNEL);
if (!device)
return -ENOMEM;
@@ -3254,3 +3264,139 @@ bool iommu_group_dma_owner_claimed(struct iommu_group *group)
return user;
}
EXPORT_SYMBOL_GPL(iommu_group_dma_owner_claimed);
+
+/*
+ * Check the viability of PASID attach/detach on a @dev of a @group.
+ *
+ * Basically we don't support pasid attach/detach on multi-device groups.
+ * Multiple devices in an iommu group cannot be isolated from each other,
+ * so it's also difficult to ensure that their pasid address spaces are
+ * isolated without any hardware guarantee. To make things simple, let's
+ * start the PASID attach/detach support with the devices belonging to
+ * singleton groups.
+ *
+ * Some configurations of the PCI fabric will route device-originated TLP
+ * packets based on memory address, and these configurations are
+ * incompatible with PASID as the PASID packets form a distinct address
+ * space. For instance, any configuration where switches are present
+ * without ACS enabled is incompatible.
+ */
+static bool iommu_group_device_pasid_viable(struct iommu_group *group,
+ struct device *dev)
+{
+ int count;
+
+ count = iommu_group_device_count(group);
+ if (count != 1)
+ return false;
+
+ /*
+ * Block PASID attachment in cases where the PCI fabric is
+ * routing based on address. PCI/ACS disables that.
+ */
+ if (dev_is_pci(dev))
+ return pci_acs_path_enabled(to_pci_dev(dev), NULL,
+ REQ_ACS_FLAGS);
+
+ /*
+ * Otherwise, the device came from DT/ACPI; assume it is static, so
+ * whether it is a singleton can be told from the device count in the group.
+ */
+ return true;
+}
+
+/*
+ * iommu_attach_device_pasid() - Attach a domain to pasid of device
+ * @domain: the iommu domain.
+ * @dev: the attached device.
+ * @pasid: the pasid of the device.
+ *
+ * Return: 0 on success, or an error.
+ */
+int iommu_attach_device_pasid(struct iommu_domain *domain,
+ struct device *dev, ioasid_t pasid)
+{
+ struct iommu_group *group;
+ int ret = -ENODEV;
+ void *curr;
+
+ if (!domain->ops->set_dev_pasid)
+ return -EOPNOTSUPP;
+
+ group = iommu_group_get(dev);
+ if (!group)
+ return -ENODEV;
+
+ mutex_lock(&group->mutex);
+ if (!iommu_group_device_pasid_viable(group, dev))
+ goto out_unlock;
+ curr = xa_cmpxchg(&group->pasid_array, pasid, NULL, domain, GFP_KERNEL);
+ if (curr) {
+ ret = xa_err(curr) ? : -EBUSY;
+ goto out_unlock;
+ }
+ ret = domain->ops->set_dev_pasid(domain, dev, pasid);
+ if (ret)
+ xa_erase(&group->pasid_array, pasid);
+out_unlock:
+ mutex_unlock(&group->mutex);
+ iommu_group_put(group);
+
+ return ret;
+}
+
+/*
+ * iommu_detach_device_pasid() - Detach the domain from pasid of device
+ * @domain: the iommu domain.
+ * @dev: the attached device.
+ * @pasid: the pasid of the device.
+ *
+ * The @domain must have been attached to @pasid of the @dev with
+ * iommu_attach_device_pasid().
+ */
+void iommu_detach_device_pasid(struct iommu_domain *domain, struct device *dev,
+ ioasid_t pasid)
+{
+ struct iommu_group *group = iommu_group_get(dev);
+
+ mutex_lock(&group->mutex);
+ domain->ops->set_dev_pasid(group->blocking_domain, dev, pasid);
+ WARN_ON(xa_erase(&group->pasid_array, pasid) != domain);
+ mutex_unlock(&group->mutex);
+
+ iommu_group_put(group);
+}
+
+/*
+ * iommu_get_domain_for_dev_pasid() - Retrieve domain for @pasid of @dev
+ * @dev: the queried device
+ * @pasid: the pasid of the device
+ *
+ * This is a variant of iommu_get_domain_for_dev(). It returns the existing
+ * domain attached to pasid of a device. It's only for internal use of the
+ * IOMMU subsystem. The caller must take care to avoid any possible
+ * use-after-free case.
+ *
+ * Return: attached domain on success, NULL otherwise.
+ */
+struct iommu_domain *
+iommu_get_domain_for_dev_pasid(struct device *dev, ioasid_t pasid)
+{
+ struct iommu_domain *domain;
+ struct iommu_group *group;
+
+ if (!pasid_valid(pasid))
+ return NULL;
+
+ group = iommu_group_get(dev);
+ if (!group)
+ return NULL;
+ /*
+ * The xarray protects its internal state with RCU. Hence the domain
+ * obtained is either NULL or fully formed.
+ */
+ domain = xa_load(&group->pasid_array, pasid);
+ iommu_group_put(group);
+
+ return domain;
+}
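For completeness, here is a rough sketch of how a device driver could
consume these interfaces; the domain allocation and the PASID value
below are only placeholders:

	struct iommu_domain *domain;
	ioasid_t pasid = 10;	/* hypothetical driver-managed PASID */
	int ret;

	domain = iommu_domain_alloc(dev->bus);
	if (!domain)
		return -ENOMEM;

	/* Attach the domain to one PASID of the device. */
	ret = iommu_attach_device_pasid(domain, dev, pasid);
	if (ret) {
		iommu_domain_free(domain);
		return ret;
	}

	/* ... issue PASID-tagged DMA mapped through the domain ... */

	/* The same domain/pasid pair must be passed back on teardown. */
	iommu_detach_device_pasid(domain, dev, pasid);
	iommu_domain_free(domain);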
Best regards,
baolu
On 2022/7/29 10:56, Tian, Kevin wrote:
>> +static bool iommu_group_device_pasid_viable(struct iommu_group *group,
>> + struct device *dev)
>> +{
>> + int count;
>> +
>> + count = iommu_group_device_count(group);
>> + if (count != 1)
>> + return false;
>> +
>> + /*
>> + * Block PASID attachment in cases where the PCI fabric is
>> + * routing based on address. PCI/ACS disables that.
>> + */
>> + if (dev_is_pci(dev))
>> + return pci_acs_path_enabled(to_pci_dev(dev), NULL,
>> + REQ_ACS_FLAGS);
> I think we are leaning toward doing above check in pci_enable_pasid().
> Then no singleton check inside iommu core.
The iommu grouping also considers other things, like PCI alias. There
are many calls of pci_add_dma_alias() in drivers/pci/quirks.c.
Therefore, I believe that pci_acs_path_enabled() returning true doesn't
guarantee a singleton group.
>
> Presumably similar check can be done in DT/ACPI path of enabling pasid?
>
I can't find the pasid (or anything similar) enabling interfaces for
DT or ACPI. Are they device specific?
Best regards,
baolu
> From: Baolu Lu <[email protected]>
> Sent: Friday, July 29, 2022 10:49 AM
>
> On 2022/7/28 19:59, Jason Gunthorpe wrote:
> > On Thu, Jul 28, 2022 at 03:06:47AM +0000, Tian, Kevin wrote:
> >
> >>> Then we don't need this weirdo check in the core iommu code at all.
> >>
> >> and then we could also move group->pasid_array to device->pasid_array
> >> with this approach. Though the end result doesn't change i.e. still only
> >> the singleton group can enable pasid, the iommu core can just stick to
> >> the device manner now.
> >
> > I don't see why, the group is still logically the unit of attachment
> > in the iommu area, and if we have a multi-device group it just means
> > we iterate over all the devices in the group when doing pasid set, no
> > different than a RID.
>
> Okay. Based on the discussions in this thread, this patch will evolve to
> look like below. Any comments or concerns?
>
...
> +static bool iommu_group_device_pasid_viable(struct iommu_group *group,
> + struct device *dev)
> +{
> + int count;
> +
> + count = iommu_group_device_count(group);
> + if (count != 1)
> + return false;
> +
> + /*
> + * Block PASID attachment in cases where the PCI fabric is
> + * routing based on address. PCI/ACS disables that.
> + */
> + if (dev_is_pci(dev))
> + return pci_acs_path_enabled(to_pci_dev(dev), NULL,
> + REQ_ACS_FLAGS);
I think we are leaning toward doing above check in pci_enable_pasid().
Then no singleton check inside iommu core.
Presumably similar check can be done in DT/ACPI path of enabling pasid?
> +
> + /*
> + * Otherwise, the device came from DT/ACPI, assume it is static and
> + * then singleton can know from the device count in the group.
> + */
> + return true;
> +}
> From: Baolu Lu <[email protected]>
> Sent: Friday, July 29, 2022 11:21 AM
>
> On 2022/7/29 10:56, Tian, Kevin wrote:
> >> +static bool iommu_group_device_pasid_viable(struct iommu_group
> *group,
> >> + struct device *dev)
> >> +{
> >> + int count;
> >> +
> >> + count = iommu_group_device_count(group);
> >> + if (count != 1)
> >> + return false;
> >> +
> >> + /*
> >> + * Block PASID attachment in cases where the PCI fabric is
> >> + * routing based on address. PCI/ACS disables that.
> >> + */
> >> + if (dev_is_pci(dev))
> >> + return pci_acs_path_enabled(to_pci_dev(dev), NULL,
> >> + REQ_ACS_FLAGS);
> > I think we are leaning toward doing above check in pci_enable_pasid().
> > Then no singleton check inside iommu core.
>
> The iommu grouping also considers other things, like PCI alias. There
> are many calls of pci_add_dma_alias() in drivers/pci/quirks.c.
> Therefore, I believe that pci_acs_path_enabled() returning true doesn't
>> guarantee a singleton group.
Is there an actual problem with sharing a PASID table between aliasing RIDs?
As long as ACS is enabled, the device is isolated from other devices
in the fabric. DMA aliases don't change that fact and there is no p2p
between aliasing RIDs.
>
> >
> > Presumably similar check can be done in DT/ACPI path of enabling pasid?
> >
>
> I can't find the pasid (or anything similar) enabling interfaces for
> DT or ACPI. Are they device specific?
>
Looks like only PCI PASID is supported so far, in the Intel/ARM/AMD
drivers alike. If other buses support PASID one day, then an ACS
equivalent can also be checked in their PASID enabling APIs.
On Fri, Jul 29, 2022 at 02:51:02AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <[email protected]>
> > Sent: Thursday, July 28, 2022 8:00 PM
> >
> > On Thu, Jul 28, 2022 at 03:06:47AM +0000, Tian, Kevin wrote:
> >
> > > > Then we don't need this weirdo check in the core iommu code at all.
> > >
> > > and then we could also move group->pasid_array to device->pasid_array
> > > with this approach. Though the end result doesn't change i.e. still only
> > > the singleton group can enable pasid, the iommu core can just stick to
> > > the device manner now.
> >
> > I don't see why, the group is still logically the unit of attachment
> > in the iommu area, and if we have a multi-device group it just means
> > we iterate over all the devices in the group when doing pasid set, no
> > different than a RID.
>
> Probably I overthought this.
>
> To enable PASID in a multi-device group one prerequisite is to reserve
> P2P ranges of the group in the related address space (let's assume
> there is a way to do that reservation).
No, that isn't the requirement - the only requirement is that every TLP
marked with a PASID is routed to the host bridge and only the host
bridge.
ACS achieves this universally, that is what it means in PCI.
We should not even think about supporting PASID in environments where
there is address routing present because it will never work properly
(eg SVA is a complete no-go)
> But for a group created due to a RID mess, e.g. a PCI bridge, the PASID
> table has to be shared by the entire group.
A legacy PCI bridge will be without ACS so it already fails the ACS
test. No issue.
The RID issue is that we can't reliably tell the source apart in a
group - so all the RIDs in a group have to be considered as the same
RID, and mapped to the same PASID table.
But that is the only restriction of a group we have left, because the
'iommu doesn't isolate all traffic' restriction is defined not to
exist if PASID is supported.
> So yes, from this angle leaving one table per group is a simpler
> thing to do, especially when it's unclear whether there is real
> demand to enable PASID for multi-device group.
Except it is confusing, complicated and unnecessary. Treating PASID of
multi-device groups the same as everything else is logically simple.
Jason
On 2022/7/29 12:22, Tian, Kevin wrote:
>> From: Baolu Lu <[email protected]>
>> Sent: Friday, July 29, 2022 11:21 AM
>>
>> On 2022/7/29 10:56, Tian, Kevin wrote:
>>>> +static bool iommu_group_device_pasid_viable(struct iommu_group
>> *group,
>>>> + struct device *dev)
>>>> +{
>>>> + int count;
>>>> +
>>>> + count = iommu_group_device_count(group);
>>>> + if (count != 1)
>>>> + return false;
>>>> +
>>>> + /*
>>>> + * Block PASID attachment in cases where the PCI fabric is
>>>> + * routing based on address. PCI/ACS disables that.
>>>> + */
>>>> + if (dev_is_pci(dev))
>>>> + return pci_acs_path_enabled(to_pci_dev(dev), NULL,
>>>> + REQ_ACS_FLAGS);
>>> I think we are leaning toward doing above check in pci_enable_pasid().
>>> Then no singleton check inside iommu core.
>>
>> The iommu grouping also considers other things, like PCI alias. There
>> are many calls of pci_add_dma_alias() in drivers/pci/quirks.c.
>> Therefore, I believe that pci_acs_path_enabled() returning true doesn't
>> guarantees a singleton group.
>
> Is there an actual problem with sharing a PASID table between aliasing RIDs?
> As long as ACS is enabled, the device is isolated from other devices
> in the fabric. DMA aliases don't change that fact and there is no p2p
> between aliasing RIDs.
Yes. Agreed.
At present, the visible PASID use cases only occur on singleton
groups, so we can start to support it in this simple situation. In the
future, if multi-device groups need to support pasid, we can simply
apply the domain to the PASIDs of all devices in the group.
>
>>
>>>
>>> Presumably similar check can be done in DT/ACPI path of enabling pasid?
>>>
>>
>> I can't find the pasid (or anything similar) enabling interfaces for
>> DT or ACPI. Are they device specific?
>>
>
> Looks like only PCI PASID is supported so far, in the Intel/ARM/AMD
> drivers alike. If other buses support PASID one day, then an ACS
> equivalent can also be checked in their PASID enabling APIs.
Yes. Fair enough.
Best regards,
baolu
On 2022/7/29 20:22, Jason Gunthorpe wrote:
> The RID issue is that we can't reliably tell the source apart in a
> group - so all the RIDs in a group have to be considered as the same
> RID, and mapped to the same PASID table.
>
> But that is the only restriction of a group we have left, because the
> 'iommu doesn't isolate all traffic' restriction is defined not to
> exist if PASID is supported.
Got it. Thank you for the guidance.
>
>> So yes, from this angle leaving one table per group is a simpler
>> thing to do, especially when it's unclear whether there is real
>> demand to enable PASID for multi-device group.
> Except it is confusing, complicated and unnecessary. Treating PASID of
> multi-device groups the same as everything else is logically simple.
Yes. Considering that current PASID use cases occur only on singleton
groups, let's make things simple and start our PASID attachment support
with singleton groups.
Best regards,
baolu
On 2022/7/5 13:07, Lu Baolu wrote:
> Use this field to save the number of PASIDs that a device is able to
> consume. It is a generic attribute of a device and lifting it into the
> per-device dev_iommu struct could help to avoid the boilerplate code
> in various IOMMU drivers.
>
> Signed-off-by: Lu Baolu <[email protected]>
> Reviewed-by: Kevin Tian <[email protected]>
> Tested-by: Zhangfei Gao <[email protected]>
> Tested-by: Tony Zhu <[email protected]>
> ---
> include/linux/iommu.h | 2 ++
> drivers/iommu/iommu.c | 20 ++++++++++++++++++++
> 2 files changed, 22 insertions(+)
Reviewed-by: Yi Liu <[email protected]>
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 03fbb1b71536..418a1914a041 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -364,6 +364,7 @@ struct iommu_fault_param {
> * @fwspec: IOMMU fwspec data
> * @iommu_dev: IOMMU device this device is linked to
> * @priv: IOMMU Driver private data
> + * @max_pasids: number of PASIDs this device can consume
> *
> * TODO: migrate other per device data pointers under iommu_dev_data, e.g.
> * struct iommu_group *iommu_group;
> @@ -375,6 +376,7 @@ struct dev_iommu {
> struct iommu_fwspec *fwspec;
> struct iommu_device *iommu_dev;
> void *priv;
> + u32 max_pasids;
> };
>
> int iommu_device_register(struct iommu_device *iommu,
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index cdc86c39954e..0cb0750f61e8 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -20,6 +20,7 @@
> #include <linux/idr.h>
> #include <linux/err.h>
> #include <linux/pci.h>
> +#include <linux/pci-ats.h>
> #include <linux/bitops.h>
> #include <linux/property.h>
> #include <linux/fsl/mc.h>
> @@ -218,6 +219,24 @@ static void dev_iommu_free(struct device *dev)
> kfree(param);
> }
>
> +static u32 dev_iommu_get_max_pasids(struct device *dev)
> +{
> + u32 max_pasids = 0, bits = 0;
> + int ret;
> +
> + if (dev_is_pci(dev)) {
> + ret = pci_max_pasids(to_pci_dev(dev));
> + if (ret > 0)
> + max_pasids = ret;
> + } else {
> + ret = device_property_read_u32(dev, "pasid-num-bits", &bits);
> + if (!ret)
> + max_pasids = 1UL << bits;
> + }
> +
> + return min_t(u32, max_pasids, dev->iommu->iommu_dev->max_pasids);
> +}
> +
> static int __iommu_probe_device(struct device *dev, struct list_head *group_list)
> {
> const struct iommu_ops *ops = dev->bus->iommu_ops;
> @@ -243,6 +262,7 @@ static int __iommu_probe_device(struct device *dev, struct list_head *group_list
> }
>
> dev->iommu->iommu_dev = iommu_dev;
> + dev->iommu->max_pasids = dev_iommu_get_max_pasids(dev);
>
> group = iommu_group_get_for_dev(dev);
> if (IS_ERR(group)) {
--
Regards,
Yi Liu
On 2022/7/5 13:07, Lu Baolu wrote:
> The current kernel DMA with PASID support is based on the SVA with a flag
> SVM_FLAG_SUPERVISOR_MODE. The IOMMU driver binds the kernel memory address
> space to a PASID of the device. The device driver programs the device with
> kernel virtual address (KVA) for DMA access. There have been security and
> functional issues with this approach:
>
> - The lack of IOTLB synchronization upon kernel page table updates.
> (vmalloc, module/BPF loading, CONFIG_DEBUG_PAGEALLOC etc.)
> - Other than slight more protection, using kernel virtual address (KVA)
> has little advantage over physical address. There are also no use
> cases yet where DMA engines need kernel virtual addresses for in-kernel
> DMA.
>
> This removes SVM_FLAG_SUPERVISOR_MODE support from the IOMMU interface.
> The device drivers are suggested to handle kernel DMA with PASID through
> the kernel DMA APIs.
>
> The drvdata parameter in iommu_sva_bind_device() and all callbacks is not
> needed anymore. Cleanup them as well.
>
> Link: https://lore.kernel.org/linux-iommu/[email protected]/
> Signed-off-by: Jacob Pan <[email protected]>
> Signed-off-by: Lu Baolu <[email protected]>
> Reviewed-by: Jason Gunthorpe <[email protected]>
> Reviewed-by: Jean-Philippe Brucker <[email protected]>
> Reviewed-by: Kevin Tian <[email protected]>
> Tested-by: Zhangfei Gao <[email protected]>
> Tested-by: Tony Zhu <[email protected]>
> ---
> include/linux/intel-iommu.h | 3 +-
> include/linux/intel-svm.h | 13 -----
> include/linux/iommu.h | 8 +--
> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 5 +-
> drivers/dma/idxd/cdev.c | 3 +-
> drivers/dma/idxd/init.c | 25 +-------
> .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 3 +-
> drivers/iommu/intel/svm.c | 57 +++++--------------
> drivers/iommu/iommu.c | 5 +-
> drivers/misc/uacce/uacce.c | 2 +-
> 10 files changed, 26 insertions(+), 98 deletions(-)
>
> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> index e065cbe3c857..31e3edc0fc7e 100644
> --- a/include/linux/intel-iommu.h
> +++ b/include/linux/intel-iommu.h
> @@ -738,8 +738,7 @@ struct intel_iommu *device_to_iommu(struct device *dev, u8 *bus, u8 *devfn);
> extern void intel_svm_check(struct intel_iommu *iommu);
> extern int intel_svm_enable_prq(struct intel_iommu *iommu);
> extern int intel_svm_finish_prq(struct intel_iommu *iommu);
> -struct iommu_sva *intel_svm_bind(struct device *dev, struct mm_struct *mm,
> - void *drvdata);
> +struct iommu_sva *intel_svm_bind(struct device *dev, struct mm_struct *mm);
> void intel_svm_unbind(struct iommu_sva *handle);
> u32 intel_svm_get_pasid(struct iommu_sva *handle);
> int intel_svm_page_response(struct device *dev, struct iommu_fault_event *evt,
> diff --git a/include/linux/intel-svm.h b/include/linux/intel-svm.h
> index 207ef06ba3e1..f9a0d44f6fdb 100644
> --- a/include/linux/intel-svm.h
> +++ b/include/linux/intel-svm.h
> @@ -13,17 +13,4 @@
> #define PRQ_RING_MASK ((0x1000 << PRQ_ORDER) - 0x20)
> #define PRQ_DEPTH ((0x1000 << PRQ_ORDER) >> 5)
>
> -/*
> - * The SVM_FLAG_SUPERVISOR_MODE flag requests a PASID which can be used only
> - * for access to kernel addresses. No IOTLB flushes are automatically done
> - * for kernel mappings; it is valid only for access to the kernel's static
> - * 1:1 mapping of physical memory — not to vmalloc or even module mappings.
> - * A future API addition may permit the use of such ranges, by means of an
> - * explicit IOTLB flush call (akin to the DMA API's unmap method).
> - *
> - * It is unlikely that we will ever hook into flush_tlb_kernel_range() to
> - * do such IOTLB flushes automatically.
> - */
> -#define SVM_FLAG_SUPERVISOR_MODE BIT(0)
> -
> #endif /* __INTEL_SVM_H__ */
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 418a1914a041..f41eb2b3c7da 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -243,8 +243,7 @@ struct iommu_ops {
> int (*dev_enable_feat)(struct device *dev, enum iommu_dev_features f);
> int (*dev_disable_feat)(struct device *dev, enum iommu_dev_features f);
>
> - struct iommu_sva *(*sva_bind)(struct device *dev, struct mm_struct *mm,
> - void *drvdata);
> + struct iommu_sva *(*sva_bind)(struct device *dev, struct mm_struct *mm);
> void (*sva_unbind)(struct iommu_sva *handle);
> u32 (*sva_get_pasid)(struct iommu_sva *handle);
>
> @@ -669,8 +668,7 @@ int iommu_dev_disable_feature(struct device *dev, enum iommu_dev_features f);
> bool iommu_dev_feature_enabled(struct device *dev, enum iommu_dev_features f);
>
> struct iommu_sva *iommu_sva_bind_device(struct device *dev,
> - struct mm_struct *mm,
> - void *drvdata);
> + struct mm_struct *mm);
> void iommu_sva_unbind_device(struct iommu_sva *handle);
> u32 iommu_sva_get_pasid(struct iommu_sva *handle);
>
> @@ -1012,7 +1010,7 @@ iommu_dev_disable_feature(struct device *dev, enum iommu_dev_features feat)
> }
>
> static inline struct iommu_sva *
> -iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, void *drvdata)
> +iommu_sva_bind_device(struct device *dev, struct mm_struct *mm)
> {
> return NULL;
> }
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> index cd48590ada30..d2ba86470c42 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> @@ -754,8 +754,7 @@ bool arm_smmu_master_sva_enabled(struct arm_smmu_master *master);
> int arm_smmu_master_enable_sva(struct arm_smmu_master *master);
> int arm_smmu_master_disable_sva(struct arm_smmu_master *master);
> bool arm_smmu_master_iopf_supported(struct arm_smmu_master *master);
> -struct iommu_sva *arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm,
> - void *drvdata);
> +struct iommu_sva *arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm);
> void arm_smmu_sva_unbind(struct iommu_sva *handle);
> u32 arm_smmu_sva_get_pasid(struct iommu_sva *handle);
> void arm_smmu_sva_notifier_synchronize(void);
> @@ -791,7 +790,7 @@ static inline bool arm_smmu_master_iopf_supported(struct arm_smmu_master *master
> }
>
> static inline struct iommu_sva *
> -arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm, void *drvdata)
> +arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm)
> {
> return ERR_PTR(-ENODEV);
> }
> diff --git a/drivers/dma/idxd/cdev.c b/drivers/dma/idxd/cdev.c
> index c2808fd081d6..66720001ba1c 100644
> --- a/drivers/dma/idxd/cdev.c
> +++ b/drivers/dma/idxd/cdev.c
> @@ -6,7 +6,6 @@
> #include <linux/pci.h>
> #include <linux/device.h>
> #include <linux/sched/task.h>
> -#include <linux/intel-svm.h>
> #include <linux/io-64-nonatomic-lo-hi.h>
> #include <linux/cdev.h>
> #include <linux/fs.h>
> @@ -100,7 +99,7 @@ static int idxd_cdev_open(struct inode *inode, struct file *filp)
> filp->private_data = ctx;
>
> if (device_user_pasid_enabled(idxd)) {
> - sva = iommu_sva_bind_device(dev, current->mm, NULL);
> + sva = iommu_sva_bind_device(dev, current->mm);
> if (IS_ERR(sva)) {
> rc = PTR_ERR(sva);
> dev_err(dev, "pasid allocation failed: %d\n", rc);
> diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c
> index 355fb3ef4cbf..00b437f4f573 100644
> --- a/drivers/dma/idxd/init.c
> +++ b/drivers/dma/idxd/init.c
> @@ -14,7 +14,6 @@
> #include <linux/io-64-nonatomic-lo-hi.h>
> #include <linux/device.h>
> #include <linux/idr.h>
> -#include <linux/intel-svm.h>
> #include <linux/iommu.h>
> #include <uapi/linux/idxd.h>
> #include <linux/dmaengine.h>
> @@ -466,29 +465,7 @@ static struct idxd_device *idxd_alloc(struct pci_dev *pdev, struct idxd_driver_d
>
> static int idxd_enable_system_pasid(struct idxd_device *idxd)
> {
> - int flags;
> - unsigned int pasid;
> - struct iommu_sva *sva;
> -
> - flags = SVM_FLAG_SUPERVISOR_MODE;
> -
> - sva = iommu_sva_bind_device(&idxd->pdev->dev, NULL, &flags);
> - if (IS_ERR(sva)) {
> - dev_warn(&idxd->pdev->dev,
> - "iommu sva bind failed: %ld\n", PTR_ERR(sva));
> - return PTR_ERR(sva);
> - }
> -
> - pasid = iommu_sva_get_pasid(sva);
> - if (pasid == IOMMU_PASID_INVALID) {
> - iommu_sva_unbind_device(sva);
> - return -ENODEV;
> - }
> -
> - idxd->sva = sva;
> - idxd->pasid = pasid;
> - dev_dbg(&idxd->pdev->dev, "system pasid: %u\n", pasid);
> - return 0;
> + return -EOPNOTSUPP;
this makes it an always-failing call, right? Will it break any
existing idxd usage?
> }
>
> static void idxd_disable_system_pasid(struct idxd_device *idxd)
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
> index 1ef7bbb4acf3..f155d406c5d5 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
> @@ -367,8 +367,7 @@ __arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm)
> return ERR_PTR(ret);
> }
>
> -struct iommu_sva *
> -arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm, void *drvdata)
> +struct iommu_sva *arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm)
> {
> struct iommu_sva *handle;
> struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
> diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
> index 7ee37d996e15..d04880a291c3 100644
> --- a/drivers/iommu/intel/svm.c
> +++ b/drivers/iommu/intel/svm.c
> @@ -313,8 +313,7 @@ static int pasid_to_svm_sdev(struct device *dev, unsigned int pasid,
> return 0;
> }
>
> -static int intel_svm_alloc_pasid(struct device *dev, struct mm_struct *mm,
> - unsigned int flags)
> +static int intel_svm_alloc_pasid(struct device *dev, struct mm_struct *mm)
> {
> ioasid_t max_pasid = dev_is_pci(dev) ?
> pci_max_pasids(to_pci_dev(dev)) : intel_pasid_max_id;
> @@ -324,8 +323,7 @@ static int intel_svm_alloc_pasid(struct device *dev, struct mm_struct *mm,
>
> static struct iommu_sva *intel_svm_bind_mm(struct intel_iommu *iommu,
It would be great to see a cleanup renaming the svm terms in the intel
iommu driver to sva. :-)
> struct device *dev,
> - struct mm_struct *mm,
> - unsigned int flags)
> + struct mm_struct *mm)
> {
> struct device_domain_info *info = dev_iommu_priv_get(dev);
> unsigned long iflags, sflags;
> @@ -341,22 +339,18 @@ static struct iommu_sva *intel_svm_bind_mm(struct intel_iommu *iommu,
>
> svm->pasid = mm->pasid;
> svm->mm = mm;
> - svm->flags = flags;
> INIT_LIST_HEAD_RCU(&svm->devs);
>
> - if (!(flags & SVM_FLAG_SUPERVISOR_MODE)) {
> - svm->notifier.ops = &intel_mmuops;
> - ret = mmu_notifier_register(&svm->notifier, mm);
> - if (ret) {
> - kfree(svm);
> - return ERR_PTR(ret);
> - }
> + svm->notifier.ops = &intel_mmuops;
> + ret = mmu_notifier_register(&svm->notifier, mm);
> + if (ret) {
> + kfree(svm);
> + return ERR_PTR(ret);
> }
>
> ret = pasid_private_add(svm->pasid, svm);
> if (ret) {
> - if (svm->notifier.ops)
> - mmu_notifier_unregister(&svm->notifier, mm);
> + mmu_notifier_unregister(&svm->notifier, mm);
> kfree(svm);
> return ERR_PTR(ret);
> }
> @@ -391,9 +385,7 @@ static struct iommu_sva *intel_svm_bind_mm(struct intel_iommu *iommu,
> }
>
> /* Setup the pasid table: */
> - sflags = (flags & SVM_FLAG_SUPERVISOR_MODE) ?
> - PASID_FLAG_SUPERVISOR_MODE : 0;
> - sflags |= cpu_feature_enabled(X86_FEATURE_LA57) ? PASID_FLAG_FL5LP : 0;
> + sflags = cpu_feature_enabled(X86_FEATURE_LA57) ? PASID_FLAG_FL5LP : 0;
> spin_lock_irqsave(&iommu->lock, iflags);
> ret = intel_pasid_setup_first_level(iommu, dev, mm->pgd, mm->pasid,
> FLPT_DEFAULT_DID, sflags);
> @@ -410,8 +402,7 @@ static struct iommu_sva *intel_svm_bind_mm(struct intel_iommu *iommu,
> kfree(sdev);
> free_svm:
> if (list_empty(&svm->devs)) {
> - if (svm->notifier.ops)
> - mmu_notifier_unregister(&svm->notifier, mm);
> + mmu_notifier_unregister(&svm->notifier, mm);
> pasid_private_remove(mm->pasid);
> kfree(svm);
> }
> @@ -767,7 +758,7 @@ static irqreturn_t prq_event_thread(int irq, void *d)
> * to unbind the mm while any page faults are outstanding.
> */
> svm = pasid_private_find(req->pasid);
> - if (IS_ERR_OR_NULL(svm) || (svm->flags & SVM_FLAG_SUPERVISOR_MODE))
> + if (IS_ERR_OR_NULL(svm))
> goto bad_req;
> }
>
> @@ -818,40 +809,20 @@ static irqreturn_t prq_event_thread(int irq, void *d)
> return IRQ_RETVAL(handled);
> }
>
> -struct iommu_sva *intel_svm_bind(struct device *dev, struct mm_struct *mm, void *drvdata)
> +struct iommu_sva *intel_svm_bind(struct device *dev, struct mm_struct *mm)
> {
> struct intel_iommu *iommu = device_to_iommu(dev, NULL, NULL);
> - unsigned int flags = 0;
> struct iommu_sva *sva;
> int ret;
>
> - if (drvdata)
> - flags = *(unsigned int *)drvdata;
> -
> - if (flags & SVM_FLAG_SUPERVISOR_MODE) {
> - if (!ecap_srs(iommu->ecap)) {
> - dev_err(dev, "%s: Supervisor PASID not supported\n",
> - iommu->name);
> - return ERR_PTR(-EOPNOTSUPP);
> - }
> -
> - if (mm) {
> - dev_err(dev, "%s: Supervisor PASID with user provided mm\n",
> - iommu->name);
> - return ERR_PTR(-EINVAL);
> - }
> -
> - mm = &init_mm;
> - }
> -
> mutex_lock(&pasid_mutex);
> - ret = intel_svm_alloc_pasid(dev, mm, flags);
> + ret = intel_svm_alloc_pasid(dev, mm);
> if (ret) {
> mutex_unlock(&pasid_mutex);
> return ERR_PTR(ret);
> }
>
> - sva = intel_svm_bind_mm(iommu, dev, mm, flags);
> + sva = intel_svm_bind_mm(iommu, dev, mm);
> mutex_unlock(&pasid_mutex);
>
> return sva;
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 0cb0750f61e8..74a0a3ec0907 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -2788,7 +2788,6 @@ EXPORT_SYMBOL_GPL(iommu_dev_feature_enabled);
> * iommu_sva_bind_device() - Bind a process address space to a device
> * @dev: the device
> * @mm: the mm to bind, caller must hold a reference to it
> - * @drvdata: opaque data pointer to pass to bind callback
> *
> * Create a bond between device and address space, allowing the device to access
> * the mm using the returned PASID. If a bond already exists between @device and
> @@ -2801,7 +2800,7 @@ EXPORT_SYMBOL_GPL(iommu_dev_feature_enabled);
> * On error, returns an ERR_PTR value.
> */
> struct iommu_sva *
> -iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, void *drvdata)
> +iommu_sva_bind_device(struct device *dev, struct mm_struct *mm)
> {
> struct iommu_group *group;
> struct iommu_sva *handle = ERR_PTR(-EINVAL);
> @@ -2826,7 +2825,7 @@ iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, void *drvdata)
> if (iommu_group_device_count(group) != 1)
> goto out_unlock;
>
> - handle = ops->sva_bind(dev, mm, drvdata);
> + handle = ops->sva_bind(dev, mm);
>
> out_unlock:
> mutex_unlock(&group->mutex);
> diff --git a/drivers/misc/uacce/uacce.c b/drivers/misc/uacce/uacce.c
> index 281c54003edc..3238a867ea51 100644
> --- a/drivers/misc/uacce/uacce.c
> +++ b/drivers/misc/uacce/uacce.c
> @@ -99,7 +99,7 @@ static int uacce_bind_queue(struct uacce_device *uacce, struct uacce_queue *q)
> if (!(uacce->flags & UACCE_DEV_SVA))
> return 0;
>
> - handle = iommu_sva_bind_device(uacce->parent, current->mm, NULL);
> + handle = iommu_sva_bind_device(uacce->parent, current->mm);
> if (IS_ERR(handle))
> return PTR_ERR(handle);
>
--
Regards,
Yi Liu
On 2022/7/5 13:06, Lu Baolu wrote:
> Use this field to keep the number of supported PASIDs that an IOMMU
> hardware is able to support. This is a generic attribute of an IOMMU
A nit: it should be the max pasid value an IOMMU hardware can support
instead of the number of PASIDs, right?
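For example, if I read the eCap PSS comment in the patch correctly:
PSS = 19 means a 20-bit PASID field, so max_pasids = 2UL << 19 = 2^20 =
1048576 distinct PASIDs, while the largest PASID value itself would be
2^20 - 1 = 1048575.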
Reviewed-by: Yi Liu <[email protected]>
> and lifting it into the per-IOMMU device structure makes it possible
> to allocate a PASID for device without calls into the IOMMU drivers.
> Any iommu driver that supports PASID related features should set this
> field before enabling them on the devices.
>
> In the Intel IOMMU driver, intel_iommu_sm is moved to CONFIG_INTEL_IOMMU
> enclave so that the pasid_supported() helper could be used in dmar.c
> without compilation errors.
>
> Signed-off-by: Lu Baolu <[email protected]>
> Reviewed-by: Jean-Philippe Brucker <[email protected]>
> Reviewed-by: Kevin Tian <[email protected]>
> Tested-by: Zhangfei Gao <[email protected]>
> Tested-by: Tony Zhu <[email protected]>
> ---
> include/linux/intel-iommu.h | 3 ++-
> include/linux/iommu.h | 2 ++
> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 1 +
> drivers/iommu/intel/dmar.c | 7 +++++++
> 4 files changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> index 4f29139bbfc3..e065cbe3c857 100644
> --- a/include/linux/intel-iommu.h
> +++ b/include/linux/intel-iommu.h
> @@ -479,7 +479,6 @@ enum {
> #define VTD_FLAG_IRQ_REMAP_PRE_ENABLED (1 << 1)
> #define VTD_FLAG_SVM_CAPABLE (1 << 2)
>
> -extern int intel_iommu_sm;
> extern spinlock_t device_domain_lock;
>
> #define sm_supported(iommu) (intel_iommu_sm && ecap_smts((iommu)->ecap))
> @@ -786,6 +785,7 @@ struct context_entry *iommu_context_addr(struct intel_iommu *iommu, u8 bus,
> extern const struct iommu_ops intel_iommu_ops;
>
> #ifdef CONFIG_INTEL_IOMMU
> +extern int intel_iommu_sm;
> extern int iommu_calculate_agaw(struct intel_iommu *iommu);
> extern int iommu_calculate_max_sagaw(struct intel_iommu *iommu);
> extern int dmar_disabled;
> @@ -802,6 +802,7 @@ static inline int iommu_calculate_max_sagaw(struct intel_iommu *iommu)
> }
> #define dmar_disabled (1)
> #define intel_iommu_enabled (0)
> +#define intel_iommu_sm (0)
> #endif
>
> static inline const char *decode_prq_descriptor(char *str, size_t size,
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 5e1afe169549..03fbb1b71536 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -318,12 +318,14 @@ struct iommu_domain_ops {
> * @list: Used by the iommu-core to keep a list of registered iommus
> * @ops: iommu-ops for talking to this iommu
> * @dev: struct device for sysfs handling
> + * @max_pasids: number of supported PASIDs
> */
> struct iommu_device {
> struct list_head list;
> const struct iommu_ops *ops;
> struct fwnode_handle *fwnode;
> struct device *dev;
> + u32 max_pasids;
> };
>
> /**
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index 88817a3376ef..ae8ec8df47c1 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -3546,6 +3546,7 @@ static int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu)
> /* SID/SSID sizes */
> smmu->ssid_bits = FIELD_GET(IDR1_SSIDSIZE, reg);
> smmu->sid_bits = FIELD_GET(IDR1_SIDSIZE, reg);
> + smmu->iommu.max_pasids = 1UL << smmu->ssid_bits;
>
> /*
> * If the SMMU supports fewer bits than would fill a single L2 stream
> diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
> index 592c1e1a5d4b..6c338888061a 100644
> --- a/drivers/iommu/intel/dmar.c
> +++ b/drivers/iommu/intel/dmar.c
> @@ -1123,6 +1123,13 @@ static int alloc_iommu(struct dmar_drhd_unit *drhd)
>
> raw_spin_lock_init(&iommu->register_lock);
>
> + /*
> + * A value of N in PSS field of eCap register indicates hardware
> + * supports PASID field of N+1 bits.
> + */
> + if (pasid_supported(iommu))
> + iommu->iommu.max_pasids = 2UL << ecap_pss(iommu->ecap);
> +
> /*
> * This is only for hotplug; at boot time intel_iommu_enabled won't
> * be set yet. When intel_iommu_init() runs, it registers the units
--
Regards,
Yi Liu
On 2022/7/5 13:07, Lu Baolu wrote:
> Attaching an IOMMU domain to a PASID of a device is a generic operation
> for modern IOMMU drivers which support PASID-granular DMA address
> translation. Currently visible usage scenarios include (but not limited):
s/visible/known/? "known" seems better. up to you. :-)
>
> - SVA (Shared Virtual Address)
> - kernel DMA with PASID
> - hardware-assist mediated device
>
> This adds a pair of domain ops for this purpose and adds the interfaces
> for device drivers to attach/detach a domain to/from a {device, PASID}.
> Some buses, like PCI, route packets without considering the PASID value.
> Thus a DMA target address with PASID might be treated as P2P if the
> address falls into the MMIO BAR of other devices in the group. To make
> things simple, these interfaces only apply to devices belonging to the
> singleton groups, and the singleton is immutable in fabric (i.e. not
> affected by hotplug).
>
> Signed-off-by: Lu Baolu <[email protected]>
> Reviewed-by: Jean-Philippe Brucker <[email protected]>
> Tested-by: Zhangfei Gao <[email protected]>
> Tested-by: Tony Zhu <[email protected]>
> ---
> include/linux/iommu.h | 21 ++++++++++++
> drivers/iommu/iommu.c | 75 +++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 96 insertions(+)
Reviewed-by: Yi Liu <[email protected]>
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index f41eb2b3c7da..f2b5aa7efe43 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -262,6 +262,8 @@ struct iommu_ops {
> * struct iommu_domain_ops - domain specific operations
> * @attach_dev: attach an iommu domain to a device
> * @detach_dev: detach an iommu domain from a device
> + * @set_dev_pasid: set an iommu domain to a pasid of device
> + * @block_dev_pasid: block pasid of device from using iommu domain
> * @map: map a physically contiguous memory region to an iommu domain
> * @map_pages: map a physically contiguous set of pages of the same size to
> * an iommu domain.
> @@ -282,6 +284,10 @@ struct iommu_ops {
> struct iommu_domain_ops {
> int (*attach_dev)(struct iommu_domain *domain, struct device *dev);
> void (*detach_dev)(struct iommu_domain *domain, struct device *dev);
> + int (*set_dev_pasid)(struct iommu_domain *domain, struct device *dev,
> + ioasid_t pasid);
> + void (*block_dev_pasid)(struct iommu_domain *domain, struct device *dev,
> + ioasid_t pasid);
>
> int (*map)(struct iommu_domain *domain, unsigned long iova,
> phys_addr_t paddr, size_t size, int prot, gfp_t gfp);
> @@ -679,6 +685,10 @@ int iommu_group_claim_dma_owner(struct iommu_group *group, void *owner);
> void iommu_group_release_dma_owner(struct iommu_group *group);
> bool iommu_group_dma_owner_claimed(struct iommu_group *group);
>
> +int iommu_attach_device_pasid(struct iommu_domain *domain, struct device *dev,
> + ioasid_t pasid);
> +void iommu_detach_device_pasid(struct iommu_domain *domain, struct device *dev,
> + ioasid_t pasid);
> #else /* CONFIG_IOMMU_API */
>
> struct iommu_ops {};
> @@ -1052,6 +1062,17 @@ static inline bool iommu_group_dma_owner_claimed(struct iommu_group *group)
> {
> return false;
> }
> +
> +static inline int iommu_attach_device_pasid(struct iommu_domain *domain,
> + struct device *dev, ioasid_t pasid)
> +{
> + return -ENODEV;
> +}
> +
> +static inline void iommu_detach_device_pasid(struct iommu_domain *domain,
> + struct device *dev, ioasid_t pasid)
> +{
> +}
> #endif /* CONFIG_IOMMU_API */
>
> /**
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 74a0a3ec0907..be48b09371f4 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -39,6 +39,7 @@ struct iommu_group {
> struct kobject kobj;
> struct kobject *devices_kobj;
> struct list_head devices;
> + struct xarray pasid_array;
> struct mutex mutex;
> void *iommu_data;
> void (*iommu_data_release)(void *iommu_data);
> @@ -660,6 +661,7 @@ struct iommu_group *iommu_group_alloc(void)
> mutex_init(&group->mutex);
> INIT_LIST_HEAD(&group->devices);
> INIT_LIST_HEAD(&group->entry);
> + xa_init(&group->pasid_array);
>
> ret = ida_alloc(&iommu_group_ida, GFP_KERNEL);
> if (ret < 0) {
> @@ -3271,3 +3273,76 @@ bool iommu_group_dma_owner_claimed(struct iommu_group *group)
> return user;
> }
> EXPORT_SYMBOL_GPL(iommu_group_dma_owner_claimed);
> +
> +static bool iommu_group_immutable_singleton(struct iommu_group *group,
> + struct device *dev)
> +{
> + int count;
> +
> + mutex_lock(&group->mutex);
> + count = iommu_group_device_count(group);
> + mutex_unlock(&group->mutex);
> +
> + if (count != 1)
> + return false;
> +
> + /*
> + * The PCI device could be considered to be fully isolated if all
> + * devices on the path from the device to the host-PCI bridge are
> + * protected from peer-to-peer DMA by ACS.
> + */
> + if (dev_is_pci(dev))
> + return pci_acs_path_enabled(to_pci_dev(dev), NULL,
> + REQ_ACS_FLAGS);
> +
> + /*
> + * Otherwise, the device came from DT/ACPI, assume it is static and
> + * then singleton can know from the device count in the group.
> + */
> + return true;
> +}
> +
> +int iommu_attach_device_pasid(struct iommu_domain *domain, struct device *dev,
> + ioasid_t pasid)
> +{
> + struct iommu_group *group;
> + void *curr;
> + int ret;
> +
> + if (!domain->ops->set_dev_pasid)
> + return -EOPNOTSUPP;
> +
> + group = iommu_group_get(dev);
> + if (!group || !iommu_group_immutable_singleton(group, dev)) {
> + iommu_group_put(group);
> + return -EINVAL;
> + }
> +
> + mutex_lock(&group->mutex);
> + curr = xa_cmpxchg(&group->pasid_array, pasid, NULL, domain, GFP_KERNEL);
> + if (curr) {
> + ret = xa_err(curr) ? : -EBUSY;
> + goto out_unlock;
> + }
> + ret = domain->ops->set_dev_pasid(domain, dev, pasid);
> + if (ret)
> + xa_erase(&group->pasid_array, pasid);
> +out_unlock:
> + mutex_unlock(&group->mutex);
> + iommu_group_put(group);
> +
> + return ret;
> +}
> +
> +void iommu_detach_device_pasid(struct iommu_domain *domain, struct device *dev,
> + ioasid_t pasid)
> +{
> + struct iommu_group *group = iommu_group_get(dev);
> +
> + mutex_lock(&group->mutex);
> + domain->ops->block_dev_pasid(domain, dev, pasid);
> + xa_erase(&group->pasid_array, pasid);
> + mutex_unlock(&group->mutex);
> +
> + iommu_group_put(group);
> +}
--
Regards,
Yi Liu
On 2022/7/5 13:07, Lu Baolu wrote:
> Add support for SVA domain allocation and provide an SVA-specific
> iommu_domain_ops.
>
> Signed-off-by: Lu Baolu <[email protected]>
> Reviewed-by: Kevin Tian <[email protected]>
> Tested-by: Tony Zhu <[email protected]>
> ---
> include/linux/intel-iommu.h | 5 ++++
> drivers/iommu/intel/iommu.c | 2 ++
> drivers/iommu/intel/svm.c | 49 +++++++++++++++++++++++++++++++++++++
> 3 files changed, 56 insertions(+)
Reviewed-by: Yi Liu <[email protected]>
> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> index 31e3edc0fc7e..9007428a68f1 100644
> --- a/include/linux/intel-iommu.h
> +++ b/include/linux/intel-iommu.h
> @@ -743,6 +743,7 @@ void intel_svm_unbind(struct iommu_sva *handle);
> u32 intel_svm_get_pasid(struct iommu_sva *handle);
> int intel_svm_page_response(struct device *dev, struct iommu_fault_event *evt,
> struct iommu_page_response *msg);
> +struct iommu_domain *intel_svm_domain_alloc(void);
>
> struct intel_svm_dev {
> struct list_head list;
> @@ -768,6 +769,10 @@ struct intel_svm {
> };
> #else
> static inline void intel_svm_check(struct intel_iommu *iommu) {}
> +static inline struct iommu_domain *intel_svm_domain_alloc(void)
> +{
> + return NULL;
> +}
> #endif
>
> #ifdef CONFIG_INTEL_IOMMU_DEBUGFS
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index 44016594831d..993a1ce509a8 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -4298,6 +4298,8 @@ static struct iommu_domain *intel_iommu_domain_alloc(unsigned type)
> return domain;
> case IOMMU_DOMAIN_IDENTITY:
> return &si_domain->domain;
> + case IOMMU_DOMAIN_SVA:
> + return intel_svm_domain_alloc();
> default:
> return NULL;
> }
> diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
> index d04880a291c3..7d4f9d173013 100644
> --- a/drivers/iommu/intel/svm.c
> +++ b/drivers/iommu/intel/svm.c
> @@ -931,3 +931,52 @@ int intel_svm_page_response(struct device *dev,
> mutex_unlock(&pasid_mutex);
> return ret;
> }
> +
> +static int intel_svm_set_dev_pasid(struct iommu_domain *domain,
> + struct device *dev, ioasid_t pasid)
> +{
> + struct device_domain_info *info = dev_iommu_priv_get(dev);
> + struct intel_iommu *iommu = info->iommu;
> + struct mm_struct *mm = domain->mm;
> + struct iommu_sva *sva;
> + int ret = 0;
> +
> + mutex_lock(&pasid_mutex);
> + sva = intel_svm_bind_mm(iommu, dev, mm);
> + if (IS_ERR(sva))
> + ret = PTR_ERR(sva);
> + mutex_unlock(&pasid_mutex);
> +
> + return ret;
> +}
> +
> +static void intel_svm_block_dev_pasid(struct iommu_domain *domain,
> + struct device *dev, ioasid_t pasid)
> +{
> + mutex_lock(&pasid_mutex);
> + intel_svm_unbind_mm(dev, pasid);
> + mutex_unlock(&pasid_mutex);
> +}
> +
> +static void intel_svm_domain_free(struct iommu_domain *domain)
> +{
> + kfree(to_dmar_domain(domain));
> +}
> +
> +static const struct iommu_domain_ops intel_svm_domain_ops = {
> + .set_dev_pasid = intel_svm_set_dev_pasid,
> + .block_dev_pasid = intel_svm_block_dev_pasid,
> + .free = intel_svm_domain_free,
> +};
> +
> +struct iommu_domain *intel_svm_domain_alloc(void)
> +{
> + struct dmar_domain *domain;
> +
> + domain = kzalloc(sizeof(*domain), GFP_KERNEL);
> + if (!domain)
> + return NULL;
> + domain->domain.ops = &intel_svm_domain_ops;
> +
> + return &domain->domain;
> +}
--
Regards,
Yi Liu
On 2022/7/5 13:07, Lu Baolu wrote:
> These ops'es have been replaced with the dev_attach/detach_pasid domain
> ops'es. There's no need for them anymore. Remove them to avoid dead
> code.
Nit: it is replaced by set/block_dev_pasid, isn't it?
Reviewed-by: Yi Liu <[email protected]>
> Signed-off-by: Lu Baolu <[email protected]>
> Reviewed-by: Jean-Philippe Brucker <[email protected]>
> Reviewed-by: Kevin Tian <[email protected]>
> Tested-by: Zhangfei Gao <[email protected]>
> Tested-by: Tony Zhu <[email protected]>
> ---
> include/linux/intel-iommu.h | 3 --
> include/linux/iommu.h | 7 ---
> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 16 ------
> .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 40 ---------------
> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 3 --
> drivers/iommu/intel/iommu.c | 3 --
> drivers/iommu/intel/svm.c | 49 -------------------
> 7 files changed, 121 deletions(-)
>
> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> index 9007428a68f1..5bd19c95a926 100644
> --- a/include/linux/intel-iommu.h
> +++ b/include/linux/intel-iommu.h
> @@ -738,9 +738,6 @@ struct intel_iommu *device_to_iommu(struct device *dev, u8 *bus, u8 *devfn);
> extern void intel_svm_check(struct intel_iommu *iommu);
> extern int intel_svm_enable_prq(struct intel_iommu *iommu);
> extern int intel_svm_finish_prq(struct intel_iommu *iommu);
> -struct iommu_sva *intel_svm_bind(struct device *dev, struct mm_struct *mm);
> -void intel_svm_unbind(struct iommu_sva *handle);
> -u32 intel_svm_get_pasid(struct iommu_sva *handle);
> int intel_svm_page_response(struct device *dev, struct iommu_fault_event *evt,
> struct iommu_page_response *msg);
> struct iommu_domain *intel_svm_domain_alloc(void);
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index f59b0ecd3995..ae0cfca064e6 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -227,9 +227,6 @@ struct iommu_iotlb_gather {
> * @dev_has/enable/disable_feat: per device entries to check/enable/disable
> * iommu specific features.
> * @dev_feat_enabled: check enabled feature
> - * @sva_bind: Bind process address space to device
> - * @sva_unbind: Unbind process address space from device
> - * @sva_get_pasid: Get PASID associated to a SVA handle
> * @page_response: handle page request response
> * @def_domain_type: device default domain type, return value:
> * - IOMMU_DOMAIN_IDENTITY: must use an identity domain
> @@ -263,10 +260,6 @@ struct iommu_ops {
> int (*dev_enable_feat)(struct device *dev, enum iommu_dev_features f);
> int (*dev_disable_feat)(struct device *dev, enum iommu_dev_features f);
>
> - struct iommu_sva *(*sva_bind)(struct device *dev, struct mm_struct *mm);
> - void (*sva_unbind)(struct iommu_sva *handle);
> - u32 (*sva_get_pasid)(struct iommu_sva *handle);
> -
> int (*page_response)(struct device *dev,
> struct iommu_fault_event *evt,
> struct iommu_page_response *msg);
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> index 96399dd3a67a..15dd4c7e6d3a 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> @@ -754,9 +754,6 @@ bool arm_smmu_master_sva_enabled(struct arm_smmu_master *master);
> int arm_smmu_master_enable_sva(struct arm_smmu_master *master);
> int arm_smmu_master_disable_sva(struct arm_smmu_master *master);
> bool arm_smmu_master_iopf_supported(struct arm_smmu_master *master);
> -struct iommu_sva *arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm);
> -void arm_smmu_sva_unbind(struct iommu_sva *handle);
> -u32 arm_smmu_sva_get_pasid(struct iommu_sva *handle);
> void arm_smmu_sva_notifier_synchronize(void);
> struct iommu_domain *arm_smmu_sva_domain_alloc(void);
> #else /* CONFIG_ARM_SMMU_V3_SVA */
> @@ -790,19 +787,6 @@ static inline bool arm_smmu_master_iopf_supported(struct arm_smmu_master *master
> return false;
> }
>
> -static inline struct iommu_sva *
> -arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm)
> -{
> - return ERR_PTR(-ENODEV);
> -}
> -
> -static inline void arm_smmu_sva_unbind(struct iommu_sva *handle) {}
> -
> -static inline u32 arm_smmu_sva_get_pasid(struct iommu_sva *handle)
> -{
> - return IOMMU_PASID_INVALID;
> -}
> -
> static inline void arm_smmu_sva_notifier_synchronize(void) {}
>
> static inline struct iommu_domain *arm_smmu_sva_domain_alloc(void)
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
> index fc4555dac5b4..e36c689f56c5 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
> @@ -344,11 +344,6 @@ __arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm)
> if (!bond)
> return ERR_PTR(-ENOMEM);
>
> - /* Allocate a PASID for this mm if necessary */
> - ret = iommu_sva_alloc_pasid(mm, 1, (1U << master->ssid_bits) - 1);
> - if (ret)
> - goto err_free_bond;
> -
> bond->mm = mm;
> bond->sva.dev = dev;
> refcount_set(&bond->refs, 1);
> @@ -367,41 +362,6 @@ __arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm)
> return ERR_PTR(ret);
> }
>
> -struct iommu_sva *arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm)
> -{
> - struct iommu_sva *handle;
> - struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
> - struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
> -
> - if (smmu_domain->stage != ARM_SMMU_DOMAIN_S1)
> - return ERR_PTR(-EINVAL);
> -
> - mutex_lock(&sva_lock);
> - handle = __arm_smmu_sva_bind(dev, mm);
> - mutex_unlock(&sva_lock);
> - return handle;
> -}
> -
> -void arm_smmu_sva_unbind(struct iommu_sva *handle)
> -{
> - struct arm_smmu_bond *bond = sva_to_bond(handle);
> -
> - mutex_lock(&sva_lock);
> - if (refcount_dec_and_test(&bond->refs)) {
> - list_del(&bond->list);
> - arm_smmu_mmu_notifier_put(bond->smmu_mn);
> - kfree(bond);
> - }
> - mutex_unlock(&sva_lock);
> -}
> -
> -u32 arm_smmu_sva_get_pasid(struct iommu_sva *handle)
> -{
> - struct arm_smmu_bond *bond = sva_to_bond(handle);
> -
> - return bond->mm->pasid;
> -}
> -
> bool arm_smmu_sva_supported(struct arm_smmu_device *smmu)
> {
> unsigned long reg, fld;
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index a30b252e2f95..8b9b78c7a67d 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -2855,9 +2855,6 @@ static struct iommu_ops arm_smmu_ops = {
> .dev_feat_enabled = arm_smmu_dev_feature_enabled,
> .dev_enable_feat = arm_smmu_dev_enable_feature,
> .dev_disable_feat = arm_smmu_dev_disable_feature,
> - .sva_bind = arm_smmu_sva_bind,
> - .sva_unbind = arm_smmu_sva_unbind,
> - .sva_get_pasid = arm_smmu_sva_get_pasid,
> .page_response = arm_smmu_page_response,
> .pgsize_bitmap = -1UL, /* Restricted during device attach */
> .owner = THIS_MODULE,
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index 993a1ce509a8..37d68eda1889 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -4921,9 +4921,6 @@ const struct iommu_ops intel_iommu_ops = {
> .def_domain_type = device_def_domain_type,
> .pgsize_bitmap = SZ_4K,
> #ifdef CONFIG_INTEL_IOMMU_SVM
> - .sva_bind = intel_svm_bind,
> - .sva_unbind = intel_svm_unbind,
> - .sva_get_pasid = intel_svm_get_pasid,
> .page_response = intel_svm_page_response,
> #endif
> .default_domain_ops = &(const struct iommu_domain_ops) {
> diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
> index 7d4f9d173013..db55b06cafdf 100644
> --- a/drivers/iommu/intel/svm.c
> +++ b/drivers/iommu/intel/svm.c
> @@ -313,14 +313,6 @@ static int pasid_to_svm_sdev(struct device *dev, unsigned int pasid,
> return 0;
> }
>
> -static int intel_svm_alloc_pasid(struct device *dev, struct mm_struct *mm)
> -{
> - ioasid_t max_pasid = dev_is_pci(dev) ?
> - pci_max_pasids(to_pci_dev(dev)) : intel_pasid_max_id;
> -
> - return iommu_sva_alloc_pasid(mm, PASID_MIN, max_pasid - 1);
> -}
> -
> static struct iommu_sva *intel_svm_bind_mm(struct intel_iommu *iommu,
> struct device *dev,
> struct mm_struct *mm)
> @@ -809,47 +801,6 @@ static irqreturn_t prq_event_thread(int irq, void *d)
> return IRQ_RETVAL(handled);
> }
>
> -struct iommu_sva *intel_svm_bind(struct device *dev, struct mm_struct *mm)
> -{
> - struct intel_iommu *iommu = device_to_iommu(dev, NULL, NULL);
> - struct iommu_sva *sva;
> - int ret;
> -
> - mutex_lock(&pasid_mutex);
> - ret = intel_svm_alloc_pasid(dev, mm);
> - if (ret) {
> - mutex_unlock(&pasid_mutex);
> - return ERR_PTR(ret);
> - }
> -
> - sva = intel_svm_bind_mm(iommu, dev, mm);
> - mutex_unlock(&pasid_mutex);
> -
> - return sva;
> -}
> -
> -void intel_svm_unbind(struct iommu_sva *sva)
> -{
> - struct intel_svm_dev *sdev = to_intel_svm_dev(sva);
> -
> - mutex_lock(&pasid_mutex);
> - intel_svm_unbind_mm(sdev->dev, sdev->pasid);
> - mutex_unlock(&pasid_mutex);
> -}
> -
> -u32 intel_svm_get_pasid(struct iommu_sva *sva)
> -{
> - struct intel_svm_dev *sdev;
> - u32 pasid;
> -
> - mutex_lock(&pasid_mutex);
> - sdev = to_intel_svm_dev(sva);
> - pasid = sdev->pasid;
> - mutex_unlock(&pasid_mutex);
> -
> - return pasid;
> -}
> -
> int intel_svm_page_response(struct device *dev,
> struct iommu_fault_event *evt,
> struct iommu_page_response *msg)
--
Regards,
Yi Liu
On 2022/7/5 13:07, Lu Baolu wrote:
> The existing iommu SVA interfaces are implemented by calling the SVA
> specific iommu ops provided by the IOMMU drivers. There's no need for
> any SVA specific ops in iommu_ops vector anymore as we can achieve
> this through the generic attach/detach_dev_pasid domain ops.
s/"attach/detach_dev_pasid"/"set/block_pasid_dev"/
>
> This refactors the IOMMU SVA interfaces implementation by using the
> set/block_pasid_dev ops and align them with the concept of the SVA
> iommu domain. Put the new SVA code in the sva related file in order
> to make it self-contained.
>
> Signed-off-by: Lu Baolu <[email protected]>
> Tested-by: Zhangfei Gao <[email protected]>
> Tested-by: Tony Zhu <[email protected]>
> ---
> include/linux/iommu.h | 67 +++++++++++--------
> drivers/iommu/iommu-sva-lib.c | 98 ++++++++++++++++++++++++++++
> drivers/iommu/iommu.c | 119 ++++++++--------------------------
> 3 files changed, 165 insertions(+), 119 deletions(-)
>
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 42f0418dc22c..f59b0ecd3995 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -39,7 +39,6 @@ struct device;
> struct iommu_domain;
> struct iommu_domain_ops;
> struct notifier_block;
> -struct iommu_sva;
> struct iommu_fault_event;
> struct iommu_dma_cookie;
>
> @@ -57,6 +56,14 @@ struct iommu_domain_geometry {
> bool force_aperture; /* DMA only allowed in mappable range? */
> };
>
> +/**
> + * struct iommu_sva - handle to a device-mm bond
> + */
> +struct iommu_sva {
> + struct device *dev;
> + refcount_t users;
> +};
> +
> /* Domain feature flags */
> #define __IOMMU_DOMAIN_PAGING (1U << 0) /* Support for iommu_map/unmap */
> #define __IOMMU_DOMAIN_DMA_API (1U << 1) /* Domain for use in DMA-API
> @@ -105,6 +112,7 @@ struct iommu_domain {
> };
> struct { /* IOMMU_DOMAIN_SVA */
> struct mm_struct *mm;
> + struct iommu_sva bond;
> };
> };
> };
> @@ -638,13 +646,6 @@ struct iommu_fwspec {
> /* ATS is supported */
> #define IOMMU_FWSPEC_PCI_RC_ATS (1 << 0)
>
> -/**
> - * struct iommu_sva - handle to a device-mm bond
> - */
> -struct iommu_sva {
> - struct device *dev;
> -};
> -
> int iommu_fwspec_init(struct device *dev, struct fwnode_handle *iommu_fwnode,
> const struct iommu_ops *ops);
> void iommu_fwspec_free(struct device *dev);
> @@ -685,11 +686,6 @@ int iommu_dev_enable_feature(struct device *dev, enum iommu_dev_features f);
> int iommu_dev_disable_feature(struct device *dev, enum iommu_dev_features f);
> bool iommu_dev_feature_enabled(struct device *dev, enum iommu_dev_features f);
>
> -struct iommu_sva *iommu_sva_bind_device(struct device *dev,
> - struct mm_struct *mm);
> -void iommu_sva_unbind_device(struct iommu_sva *handle);
> -u32 iommu_sva_get_pasid(struct iommu_sva *handle);
> -
> int iommu_device_use_default_domain(struct device *dev);
> void iommu_device_unuse_default_domain(struct device *dev);
>
> @@ -703,6 +699,8 @@ int iommu_attach_device_pasid(struct iommu_domain *domain, struct device *dev,
> ioasid_t pasid);
> void iommu_detach_device_pasid(struct iommu_domain *domain, struct device *dev,
> ioasid_t pasid);
> +struct iommu_domain *
> +iommu_get_domain_for_dev_pasid(struct device *dev, ioasid_t pasid);
> #else /* CONFIG_IOMMU_API */
>
> struct iommu_ops {};
> @@ -1033,21 +1031,6 @@ iommu_dev_disable_feature(struct device *dev, enum iommu_dev_features feat)
> return -ENODEV;
> }
>
> -static inline struct iommu_sva *
> -iommu_sva_bind_device(struct device *dev, struct mm_struct *mm)
> -{
> - return NULL;
> -}
> -
> -static inline void iommu_sva_unbind_device(struct iommu_sva *handle)
> -{
> -}
> -
> -static inline u32 iommu_sva_get_pasid(struct iommu_sva *handle)
> -{
> - return IOMMU_PASID_INVALID;
> -}
> -
> static inline struct iommu_fwspec *dev_iommu_fwspec_get(struct device *dev)
> {
> return NULL;
> @@ -1093,6 +1076,12 @@ static inline void iommu_detach_device_pasid(struct iommu_domain *domain,
> struct device *dev, ioasid_t pasid)
> {
> }
> +
> +static inline struct iommu_domain *
> +iommu_get_domain_for_dev_pasid(struct device *dev, ioasid_t pasid)
> +{
> + return NULL;
> +}
> #endif /* CONFIG_IOMMU_API */
>
> /**
> @@ -1118,4 +1107,26 @@ void iommu_debugfs_setup(void);
> static inline void iommu_debugfs_setup(void) {}
> #endif
>
> +#ifdef CONFIG_IOMMU_SVA
> +struct iommu_sva *iommu_sva_bind_device(struct device *dev,
> + struct mm_struct *mm);
> +void iommu_sva_unbind_device(struct iommu_sva *handle);
> +u32 iommu_sva_get_pasid(struct iommu_sva *handle);
> +#else
> +static inline struct iommu_sva *
> +iommu_sva_bind_device(struct device *dev, struct mm_struct *mm)
> +{
> + return NULL;
> +}
> +
> +static inline void iommu_sva_unbind_device(struct iommu_sva *handle)
> +{
> +}
> +
> +static inline u32 iommu_sva_get_pasid(struct iommu_sva *handle)
> +{
> + return IOMMU_PASID_INVALID;
> +}
> +#endif /* CONFIG_IOMMU_SVA */
> +
> #endif /* __LINUX_IOMMU_H */
> diff --git a/drivers/iommu/iommu-sva-lib.c b/drivers/iommu/iommu-sva-lib.c
> index 106506143896..751366980232 100644
> --- a/drivers/iommu/iommu-sva-lib.c
> +++ b/drivers/iommu/iommu-sva-lib.c
> @@ -4,6 +4,7 @@
> */
> #include <linux/mutex.h>
> #include <linux/sched/mm.h>
> +#include <linux/iommu.h>
>
> #include "iommu-sva-lib.h"
>
> @@ -69,3 +70,100 @@ struct mm_struct *iommu_sva_find(ioasid_t pasid)
> return ioasid_find(&iommu_sva_pasid, pasid, __mmget_not_zero);
> }
> EXPORT_SYMBOL_GPL(iommu_sva_find);
> +
> +/**
> + * iommu_sva_bind_device() - Bind a process address space to a device
> + * @dev: the device
> + * @mm: the mm to bind, caller must hold a reference to mm_users
> + *
> + * Create a bond between device and address space, allowing the device to access
> + * the mm using the returned PASID. If a bond already exists between @device and
> + * @mm, it is returned and an additional reference is taken. Caller must call
> + * iommu_sva_unbind_device() to release each reference.
> + *
> + * iommu_dev_enable_feature(dev, IOMMU_DEV_FEAT_SVA) must be called first, to
> + * initialize the required SVA features.
> + *
> + * On error, returns an ERR_PTR value.
> + */
> +struct iommu_sva *iommu_sva_bind_device(struct device *dev, struct mm_struct *mm)
> +{
> + struct iommu_domain *domain;
> + ioasid_t max_pasids;
> + int ret = -EINVAL;
> +
> + max_pasids = dev->iommu->max_pasids;
> + if (!max_pasids)
> + return ERR_PTR(-EOPNOTSUPP);
> +
> + /* Allocate mm->pasid if necessary. */
> + ret = iommu_sva_alloc_pasid(mm, 1, max_pasids - 1);
> + if (ret)
> + return ERR_PTR(ret);
> +
> + mutex_lock(&iommu_sva_lock);
> + /* Search for an existing domain. */
> + domain = iommu_get_domain_for_dev_pasid(dev, mm->pasid);
> + if (domain) {
> + refcount_inc(&domain->bond.users);
> + goto out_success;
> + }
> +
> + /* Allocate a new domain and set it on device pasid. */
> + domain = iommu_sva_domain_alloc(dev, mm);
> + if (!domain) {
> + ret = -ENOMEM;
> + goto out_unlock;
> + }
> +
> + ret = iommu_attach_device_pasid(domain, dev, mm->pasid);
> + if (ret)
> + goto out_free_domain;
> + domain->bond.dev = dev;
> + refcount_set(&domain->bond.users, 1);
> +
> +out_success:
> + mutex_unlock(&iommu_sva_lock);
> + return &domain->bond;
> +
> +out_free_domain:
> + iommu_domain_free(domain);
> +out_unlock:
> + mutex_unlock(&iommu_sva_lock);
> +
> + return ERR_PTR(ret);
> +}
> +EXPORT_SYMBOL_GPL(iommu_sva_bind_device);
> +
> +/**
> + * iommu_sva_unbind_device() - Remove a bond created with iommu_sva_bind_device
> + * @handle: the handle returned by iommu_sva_bind_device()
> + *
> + * Put reference to a bond between device and address space. The device should
> + * not be issuing any more transaction for this PASID. All outstanding page
> + * requests for this PASID must have been flushed to the IOMMU.
> + */
> +void iommu_sva_unbind_device(struct iommu_sva *handle)
> +{
> + struct device *dev = handle->dev;
> + struct iommu_domain *domain =
> + container_of(handle, struct iommu_domain, bond);
> + ioasid_t pasid = iommu_sva_get_pasid(handle);
> +
> + mutex_lock(&iommu_sva_lock);
> + if (refcount_dec_and_test(&domain->bond.users)) {
> + iommu_detach_device_pasid(domain, dev, pasid);
> + iommu_domain_free(domain);
> + }
> + mutex_unlock(&iommu_sva_lock);
> +}
> +EXPORT_SYMBOL_GPL(iommu_sva_unbind_device);
> +
> +u32 iommu_sva_get_pasid(struct iommu_sva *handle)
> +{
> + struct iommu_domain *domain =
> + container_of(handle, struct iommu_domain, bond);
> +
> + return domain->mm->pasid;
> +}
> +EXPORT_SYMBOL_GPL(iommu_sva_get_pasid);
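For readers new to this interface, a minimal driver-side sketch of how
the refactored API above is meant to be consumed (hypothetical driver
code with error unwinding trimmed; only the iommu_* calls are from this
series):

	struct iommu_sva *handle;
	u32 pasid;
	int ret;

	ret = iommu_dev_enable_feature(dev, IOMMU_DEV_FEAT_SVA);
	if (ret)
		return ret;

	/* current->mm satisfies the "caller holds mm_users" rule. */
	handle = iommu_sva_bind_device(dev, current->mm);
	if (IS_ERR(handle))
		return PTR_ERR(handle);

	pasid = iommu_sva_get_pasid(handle);
	/* ... program @pasid into the device's submission path ... */

	/* Teardown: quiesce DMA for @pasid, then drop the reference. */
	iommu_sva_unbind_device(handle);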
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 10479c5e4d23..e1491eb3c7b6 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -2789,97 +2789,6 @@ bool iommu_dev_feature_enabled(struct device *dev, enum iommu_dev_features feat)
> }
> EXPORT_SYMBOL_GPL(iommu_dev_feature_enabled);
>
> -/**
> - * iommu_sva_bind_device() - Bind a process address space to a device
> - * @dev: the device
> - * @mm: the mm to bind, caller must hold a reference to it
> - *
> - * Create a bond between device and address space, allowing the device to access
> - * the mm using the returned PASID. If a bond already exists between @device and
> - * @mm, it is returned and an additional reference is taken. Caller must call
> - * iommu_sva_unbind_device() to release each reference.
> - *
> - * iommu_dev_enable_feature(dev, IOMMU_DEV_FEAT_SVA) must be called first, to
> - * initialize the required SVA features.
> - *
> - * On error, returns an ERR_PTR value.
> - */
> -struct iommu_sva *
> -iommu_sva_bind_device(struct device *dev, struct mm_struct *mm)
> -{
> - struct iommu_group *group;
> - struct iommu_sva *handle = ERR_PTR(-EINVAL);
> - const struct iommu_ops *ops = dev_iommu_ops(dev);
> -
> - if (!ops->sva_bind)
> - return ERR_PTR(-ENODEV);
> -
> - group = iommu_group_get(dev);
> - if (!group)
> - return ERR_PTR(-ENODEV);
> -
> - /* Ensure device count and domain don't change while we're binding */
> - mutex_lock(&group->mutex);
> -
> - /*
> - * To keep things simple, SVA currently doesn't support IOMMU groups
> - * with more than one device. Existing SVA-capable systems are not
> - * affected by the problems that required IOMMU groups (lack of ACS
> - * isolation, device ID aliasing and other hardware issues).
> - */
> - if (iommu_group_device_count(group) != 1)
> - goto out_unlock;
> -
> - handle = ops->sva_bind(dev, mm);
> -
> -out_unlock:
> - mutex_unlock(&group->mutex);
> - iommu_group_put(group);
> -
> - return handle;
> -}
> -EXPORT_SYMBOL_GPL(iommu_sva_bind_device);
> -
> -/**
> - * iommu_sva_unbind_device() - Remove a bond created with iommu_sva_bind_device
> - * @handle: the handle returned by iommu_sva_bind_device()
> - *
> - * Put reference to a bond between device and address space. The device should
> - * not be issuing any more transaction for this PASID. All outstanding page
> - * requests for this PASID must have been flushed to the IOMMU.
> - */
> -void iommu_sva_unbind_device(struct iommu_sva *handle)
> -{
> - struct iommu_group *group;
> - struct device *dev = handle->dev;
> - const struct iommu_ops *ops = dev_iommu_ops(dev);
> -
> - if (!ops->sva_unbind)
> - return;
> -
> - group = iommu_group_get(dev);
> - if (!group)
> - return;
> -
> - mutex_lock(&group->mutex);
> - ops->sva_unbind(handle);
> - mutex_unlock(&group->mutex);
> -
> - iommu_group_put(group);
> -}
> -EXPORT_SYMBOL_GPL(iommu_sva_unbind_device);
> -
> -u32 iommu_sva_get_pasid(struct iommu_sva *handle)
> -{
> - const struct iommu_ops *ops = dev_iommu_ops(handle->dev);
> -
> - if (!ops->sva_get_pasid)
> - return IOMMU_PASID_INVALID;
> -
> - return ops->sva_get_pasid(handle);
> -}
> -EXPORT_SYMBOL_GPL(iommu_sva_get_pasid);
> -
> /*
> * Changes the default domain of an iommu group that has *only* one device
> *
> @@ -3366,3 +3275,31 @@ void iommu_detach_device_pasid(struct iommu_domain *domain, struct device *dev,
>
> iommu_group_put(group);
> }
> +
> +/*
> + * This is a variant of iommu_get_domain_for_dev(). It returns the existing
> + * domain attached to pasid of a device. It's only for internal use of the
> + * IOMMU subsystem. The caller must take care to avoid any possible
> + * use-after-free case.
> + */
> +struct iommu_domain *
> +iommu_get_domain_for_dev_pasid(struct device *dev, ioasid_t pasid)
> +{
> + struct iommu_domain *domain;
> + struct iommu_group *group;
> +
> + if (!pasid_valid(pasid))
> + return NULL;
> +
> + group = iommu_group_get(dev);
> + if (!group)
> + return NULL;
> + /*
> + * The xarray protects its internal state with RCU. Hence the domain
> + * obtained is either NULL or fully formed.
> + */
> + domain = xa_load(&group->pasid_array, pasid);
> + iommu_group_put(group);
> +
> + return domain;
> +}
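To make the use-after-free caveat concrete: the lookup is only safe
when the caller serializes against unbind, which frees the domain under
the same lock. A sketch of the pattern iommu_sva_bind_device() follows:

	mutex_lock(&iommu_sva_lock);
	domain = iommu_get_domain_for_dev_pasid(dev, mm->pasid);
	if (domain) {
		/* Holding iommu_sva_lock keeps iommu_sva_unbind_device()
		 * from freeing the domain between lookup and get. */
		refcount_inc(&domain->bond.users);
	}
	mutex_unlock(&iommu_sva_lock);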
--
Regards,
Yi Liu
On 2022/7/5 13:07, Lu Baolu wrote:
> This adds some mechanisms around the iommu_domain so that the I/O page
> fault handling framework could route a page fault to the domain and
> call the fault handler from it.
>
> Add pointers to the page fault handler and its private data in struct
> iommu_domain. The fault handler will be called with the private data
> as a parameter once a page fault is routed to the domain. Any kernel
> component which owns an iommu domain could install handler and its
> private parameter so that the page fault could be further routed and
> handled.
>
> This also prepares the SVA implementation to be the first consumer of
> the per-domain page fault handling model. The I/O page fault handler
> for SVA is copied to the SVA file with mmget_not_zero() added before
> mmap_read_lock().
>
> Suggested-by: Jean-Philippe Brucker <[email protected]>
> Signed-off-by: Lu Baolu <[email protected]>
> Reviewed-by: Jean-Philippe Brucker <[email protected]>
> Tested-by: Zhangfei Gao <[email protected]>
> Tested-by: Tony Zhu <[email protected]>
> ---
> include/linux/iommu.h | 3 ++
> drivers/iommu/iommu-sva-lib.h | 8 +++++
> drivers/iommu/io-pgfault.c | 7 +++++
> drivers/iommu/iommu-sva-lib.c | 58 +++++++++++++++++++++++++++++++++++
> drivers/iommu/iommu.c | 4 +++
> 5 files changed, 80 insertions(+)
>
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index ae0cfca064e6..47610f21d451 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -105,6 +105,9 @@ struct iommu_domain {
> unsigned long pgsize_bitmap; /* Bitmap of page sizes in use */
> struct iommu_domain_geometry geometry;
> struct iommu_dma_cookie *iova_cookie;
> + enum iommu_page_response_code (*iopf_handler)(struct iommu_fault *fault,
> + void *data);
> + void *fault_data;
> union {
> struct {
> iommu_fault_handler_t handler;
> diff --git a/drivers/iommu/iommu-sva-lib.h b/drivers/iommu/iommu-sva-lib.h
> index 8909ea1094e3..1b3ace4b5863 100644
> --- a/drivers/iommu/iommu-sva-lib.h
> +++ b/drivers/iommu/iommu-sva-lib.h
> @@ -26,6 +26,8 @@ int iopf_queue_flush_dev(struct device *dev);
> struct iopf_queue *iopf_queue_alloc(const char *name);
> void iopf_queue_free(struct iopf_queue *queue);
> int iopf_queue_discard_partial(struct iopf_queue *queue);
> +enum iommu_page_response_code
> +iommu_sva_handle_iopf(struct iommu_fault *fault, void *data);
>
> #else /* CONFIG_IOMMU_SVA */
> static inline int iommu_queue_iopf(struct iommu_fault *fault, void *cookie)
> @@ -63,5 +65,11 @@ static inline int iopf_queue_discard_partial(struct iopf_queue *queue)
> {
> return -ENODEV;
> }
> +
> +static inline enum iommu_page_response_code
> +iommu_sva_handle_iopf(struct iommu_fault *fault, void *data)
> +{
> + return IOMMU_PAGE_RESP_INVALID;
> +}
> #endif /* CONFIG_IOMMU_SVA */
> #endif /* _IOMMU_SVA_LIB_H */
> diff --git a/drivers/iommu/io-pgfault.c b/drivers/iommu/io-pgfault.c
> index 1df8c1dcae77..aee9e033012f 100644
> --- a/drivers/iommu/io-pgfault.c
> +++ b/drivers/iommu/io-pgfault.c
> @@ -181,6 +181,13 @@ static void iopf_handle_group(struct work_struct *work)
> * request completes, outstanding faults will have been dealt with by the time
> * the PASID is freed.
> *
> + * Any valid page fault will be eventually routed to an iommu domain and the
> + * page fault handler installed there will get called. The users of this
> + * handling framework should guarantee that the iommu domain could only be
> + * freed after the device has stopped generating page faults (or the iommu
> + * hardware has been set to block the page faults) and the pending page faults
> + * have been flushed.
> + *
> * Return: 0 on success and <0 on error.
> */
> int iommu_queue_iopf(struct iommu_fault *fault, void *cookie)
> diff --git a/drivers/iommu/iommu-sva-lib.c b/drivers/iommu/iommu-sva-lib.c
> index 751366980232..536d34855c74 100644
> --- a/drivers/iommu/iommu-sva-lib.c
> +++ b/drivers/iommu/iommu-sva-lib.c
> @@ -167,3 +167,61 @@ u32 iommu_sva_get_pasid(struct iommu_sva *handle)
> return domain->mm->pasid;
> }
> EXPORT_SYMBOL_GPL(iommu_sva_get_pasid);
> +
> +/*
> + * I/O page fault handler for SVA
> + */
> +enum iommu_page_response_code
> +iommu_sva_handle_iopf(struct iommu_fault *fault, void *data)
> +{
> + vm_fault_t ret;
> + struct vm_area_struct *vma;
> + struct mm_struct *mm = data;
> + unsigned int access_flags = 0;
> + unsigned int fault_flags = FAULT_FLAG_REMOTE;
> + struct iommu_fault_page_request *prm = &fault->prm;
> + enum iommu_page_response_code status = IOMMU_PAGE_RESP_INVALID;
> +
> + if (!(prm->flags & IOMMU_FAULT_PAGE_REQUEST_PASID_VALID))
> + return status;
> +
> + if (IS_ERR_OR_NULL(mm) || !mmget_not_zero(mm))
is it possible for mm to be ERR or NULL here? The mm lifetime should
have been guaranteed by the mmgrab() in iommu_sva_domain_alloc(). If it
happens, it is probably a coding issue. :-)
> + return status;
> +
> + mmap_read_lock(mm);
> +
> + vma = find_extend_vma(mm, prm->addr);
> + if (!vma)
> + /* Unmapped area */
> + goto out_put_mm;
> +
> + if (prm->perm & IOMMU_FAULT_PERM_READ)
> + access_flags |= VM_READ;
> +
> + if (prm->perm & IOMMU_FAULT_PERM_WRITE) {
> + access_flags |= VM_WRITE;
> + fault_flags |= FAULT_FLAG_WRITE;
> + }
> +
> + if (prm->perm & IOMMU_FAULT_PERM_EXEC) {
> + access_flags |= VM_EXEC;
> + fault_flags |= FAULT_FLAG_INSTRUCTION;
> + }
> +
> + if (!(prm->perm & IOMMU_FAULT_PERM_PRIV))
> + fault_flags |= FAULT_FLAG_USER;
> +
> + if (access_flags & ~vma->vm_flags)
> + /* Access fault */
> + goto out_put_mm;
> +
> + ret = handle_mm_fault(vma, prm->addr, fault_flags, NULL);
> + status = ret & VM_FAULT_ERROR ? IOMMU_PAGE_RESP_INVALID :
> + IOMMU_PAGE_RESP_SUCCESS;
> +
> +out_put_mm:
> + mmap_read_unlock(mm);
> + mmput(mm);
> +
> + return status;
> +}
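On the IS_ERR_OR_NULL/mmget_not_zero() question above: the mmgrab() in
iommu_sva_domain_alloc() only pins mm_count, which keeps struct
mm_struct itself from being freed; it does not keep the address space
usable. mm_users may already have dropped to zero if the process has
exited, so the fault path must take its own mm_users reference before
walking VMAs. The pattern, as I read it:

	/*
	 * mmgrab()/mmdrop():        pin mm_count; struct mm_struct stays
	 *                           allocated, but page tables may be gone.
	 * mmget_not_zero()/mmput(): pin mm_users; address space stays
	 *                           usable for handle_mm_fault().
	 */
	if (!mmget_not_zero(mm))	/* mm is exiting, fail the fault */
		return IOMMU_PAGE_RESP_INVALID;
	mmap_read_lock(mm);
	/* ... find_extend_vma() + handle_mm_fault() ... */
	mmap_read_unlock(mm);
	mmput(mm);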
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index e1491eb3c7b6..c6e9c8e82771 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -29,6 +29,8 @@
> #include <trace/events/iommu.h>
> #include <linux/sched/mm.h>
>
> +#include "iommu-sva-lib.h"
> +
> static struct kset *iommu_group_kset;
> static DEFINE_IDA(iommu_group_ida);
>
> @@ -3199,6 +3201,8 @@ struct iommu_domain *iommu_sva_domain_alloc(struct device *dev,
> domain->type = IOMMU_DOMAIN_SVA;
> mmgrab(mm);
> domain->mm = mm;
> + domain->iopf_handler = iommu_sva_handle_iopf;
> + domain->fault_data = mm;
>
> return domain;
> }
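As a sketch of the generic model described in the commit message, a
(hypothetical) non-SVA consumer that owns a domain would install its
handler roughly like this; only iopf_handler, fault_data and the
response codes are from this patch, everything my_* is made up:

	static enum iommu_page_response_code
	my_iopf_handler(struct iommu_fault *fault, void *data)
	{
		struct my_ctx *ctx = data;	/* private fault_data */

		/* Resolve fault->prm.addr for fault->prm.pasid ... */
		return IOMMU_PAGE_RESP_SUCCESS;
	}

	/* After allocating the domain and before attaching it: */
	domain->iopf_handler = my_iopf_handler;
	domain->fault_data = ctx;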
--
Regards,
Yi Liu
On 2022/7/5 13:07, Lu Baolu wrote:
> The existing iommu SVA interfaces are implemented by calling the SVA
> specific iommu ops provided by the IOMMU drivers. There's no need for
> any SVA specific ops in iommu_ops vector anymore as we can achieve
> this through the generic attach/detach_dev_pasid domain ops.
>
> This refactors the IOMMU SVA interfaces implementation by using the
> set/block_pasid_dev ops and align them with the concept of the SVA
> iommu domain. Put the new SVA code in the sva related file in order
> to make it self-contained.
>
> Signed-off-by: Lu Baolu <[email protected]>
> Tested-by: Zhangfei Gao <[email protected]>
> Tested-by: Tony Zhu <[email protected]>
> ---
> [...]
> +struct iommu_sva *iommu_sva_bind_device(struct device *dev, struct mm_struct *mm)
> +{
> + struct iommu_domain *domain;
> + ioasid_t max_pasids;
> + int ret = -EINVAL;
> +
> + max_pasids = dev->iommu->max_pasids;
> + if (!max_pasids)
> + return ERR_PTR(-EOPNOTSUPP);
> +
> + /* Allocate mm->pasid if necessary. */
> + ret = iommu_sva_alloc_pasid(mm, 1, max_pasids - 1);
do we want to call mmgrab() before iommu_sva_alloc_pasid() to
avoid using mm without any reference? In your current code,
mmgrab() is called in iommu_sva_domain_alloc().
> + if (ret)
> + return ERR_PTR(ret);
> +
> + mutex_lock(&iommu_sva_lock);
> + /* Search for an existing domain. */
> + domain = iommu_get_domain_for_dev_pasid(dev, mm->pasid);
> + if (domain) {
> + refcount_inc(&domain->bond.users);
> + goto out_success;
> + }
> +
> + /* Allocate a new domain and set it on device pasid. */
> + domain = iommu_sva_domain_alloc(dev, mm);
> + if (!domain) {
> + ret = -ENOMEM;
> + goto out_unlock;
> + }
> +
> + ret = iommu_attach_device_pasid(domain, dev, mm->pasid);
> + if (ret)
> + goto out_free_domain;
> + domain->bond.dev = dev;
> + refcount_set(&domain->bond.users, 1);
> +
> +out_success:
> + mutex_unlock(&iommu_sva_lock);
> + return &domain->bond;
> +
> +out_free_domain:
> + iommu_domain_free(domain);
> +out_unlock:
> + mutex_unlock(&iommu_sva_lock);
> +
> + return ERR_PTR(ret);
> +}
> +EXPORT_SYMBOL_GPL(iommu_sva_bind_device);
> +
> +/**
> + * iommu_sva_unbind_device() - Remove a bond created with iommu_sva_bind_device
> + * @handle: the handle returned by iommu_sva_bind_device()
> + *
> + * Put reference to a bond between device and address space. The device should
> + * not be issuing any more transaction for this PASID. All outstanding page
> + * requests for this PASID must have been flushed to the IOMMU.
> + */
> +void iommu_sva_unbind_device(struct iommu_sva *handle)
> +{
> + struct device *dev = handle->dev;
> + struct iommu_domain *domain =
> + container_of(handle, struct iommu_domain, bond);
> + ioasid_t pasid = iommu_sva_get_pasid(handle);
> +
> + mutex_lock(&iommu_sva_lock);
> + if (refcount_dec_and_test(&domain->bond.users)) {
> + iommu_detach_device_pasid(domain, dev, pasid);
> + iommu_domain_free(domain);
> + }
> + mutex_unlock(&iommu_sva_lock);
> +}
> +EXPORT_SYMBOL_GPL(iommu_sva_unbind_device);
> +
> +u32 iommu_sva_get_pasid(struct iommu_sva *handle)
> +{
> + struct iommu_domain *domain =
> + container_of(handle, struct iommu_domain, bond);
> +
> + return domain->mm->pasid;
> +}
> +EXPORT_SYMBOL_GPL(iommu_sva_get_pasid);
> [...]
--
Regards,
Yi Liu
On 2022/7/31 20:01, Yi Liu wrote:
> On 2022/7/5 13:07, Lu Baolu wrote:
>> The current kernel DMA with PASID support is based on the SVA with a flag
>> SVM_FLAG_SUPERVISOR_MODE. The IOMMU driver binds the kernel memory
>> address
>> space to a PASID of the device. The device driver programs the device
>> with
>> kernel virtual address (KVA) for DMA access. There have been security and
>> functional issues with this approach:
>>
>> - The lack of IOTLB synchronization upon kernel page table updates.
>> (vmalloc, module/BPF loading, CONFIG_DEBUG_PAGEALLOC etc.)
>> - Other than slight more protection, using kernel virtual address (KVA)
>> has little advantage over physical address. There are also no use
>> cases yet where DMA engines need kernel virtual addresses for
>> in-kernel
>> DMA.
>>
>> This removes SVM_FLAG_SUPERVISOR_MODE support from the IOMMU interface.
>> The device drivers are suggested to handle kernel DMA with PASID through
>> the kernel DMA APIs.
>>
>> The drvdata parameter in iommu_sva_bind_device() and all callbacks is not
>> needed anymore. Cleanup them as well.
>>
>> Link:
>> https://lore.kernel.org/linux-iommu/[email protected]/
>> Signed-off-by: Jacob Pan <[email protected]>
>> Signed-off-by: Lu Baolu <[email protected]>
>> Reviewed-by: Jason Gunthorpe <[email protected]>
>> Reviewed-by: Jean-Philippe Brucker <[email protected]>
>> Reviewed-by: Kevin Tian <[email protected]>
>> Tested-by: Zhangfei Gao <[email protected]>
>> Tested-by: Tony Zhu <[email protected]>
>> ---
>> include/linux/intel-iommu.h | 3 +-
>> include/linux/intel-svm.h | 13 -----
>> include/linux/iommu.h | 8 +--
>> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 5 +-
>> drivers/dma/idxd/cdev.c | 3 +-
>> drivers/dma/idxd/init.c | 25 +-------
>> .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 3 +-
>> drivers/iommu/intel/svm.c | 57 +++++--------------
>> drivers/iommu/iommu.c | 5 +-
>> drivers/misc/uacce/uacce.c | 2 +-
>> 10 files changed, 26 insertions(+), 98 deletions(-)
>>
>> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
>> index e065cbe3c857..31e3edc0fc7e 100644
>> --- a/include/linux/intel-iommu.h
>> +++ b/include/linux/intel-iommu.h
>> @@ -738,8 +738,7 @@ struct intel_iommu *device_to_iommu(struct device
>> *dev, u8 *bus, u8 *devfn);
>> extern void intel_svm_check(struct intel_iommu *iommu);
>> extern int intel_svm_enable_prq(struct intel_iommu *iommu);
>> extern int intel_svm_finish_prq(struct intel_iommu *iommu);
>> -struct iommu_sva *intel_svm_bind(struct device *dev, struct mm_struct
>> *mm,
>> - void *drvdata);
>> +struct iommu_sva *intel_svm_bind(struct device *dev, struct mm_struct
>> *mm);
>> void intel_svm_unbind(struct iommu_sva *handle);
>> u32 intel_svm_get_pasid(struct iommu_sva *handle);
>> int intel_svm_page_response(struct device *dev, struct
>> iommu_fault_event *evt,
>> diff --git a/include/linux/intel-svm.h b/include/linux/intel-svm.h
>> index 207ef06ba3e1..f9a0d44f6fdb 100644
>> --- a/include/linux/intel-svm.h
>> +++ b/include/linux/intel-svm.h
>> @@ -13,17 +13,4 @@
>> #define PRQ_RING_MASK ((0x1000 << PRQ_ORDER) - 0x20)
>> #define PRQ_DEPTH ((0x1000 << PRQ_ORDER) >> 5)
>> -/*
>> - * The SVM_FLAG_SUPERVISOR_MODE flag requests a PASID which can be
>> used only
>> - * for access to kernel addresses. No IOTLB flushes are automatically
>> done
>> - * for kernel mappings; it is valid only for access to the kernel's
>> static
>> - * 1:1 mapping of physical memory — not to vmalloc or even module
>> mappings.
>> - * A future API addition may permit the use of such ranges, by means
>> of an
>> - * explicit IOTLB flush call (akin to the DMA API's unmap method).
>> - *
>> - * It is unlikely that we will ever hook into
>> flush_tlb_kernel_range() to
>> - * do such IOTLB flushes automatically.
>> - */
>> -#define SVM_FLAG_SUPERVISOR_MODE BIT(0)
>> -
>> #endif /* __INTEL_SVM_H__ */
>> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
>> index 418a1914a041..f41eb2b3c7da 100644
>> --- a/include/linux/iommu.h
>> +++ b/include/linux/iommu.h
>> @@ -243,8 +243,7 @@ struct iommu_ops {
>> int (*dev_enable_feat)(struct device *dev, enum
>> iommu_dev_features f);
>> int (*dev_disable_feat)(struct device *dev, enum
>> iommu_dev_features f);
>> - struct iommu_sva *(*sva_bind)(struct device *dev, struct
>> mm_struct *mm,
>> - void *drvdata);
>> + struct iommu_sva *(*sva_bind)(struct device *dev, struct
>> mm_struct *mm);
>> void (*sva_unbind)(struct iommu_sva *handle);
>> u32 (*sva_get_pasid)(struct iommu_sva *handle);
>> @@ -669,8 +668,7 @@ int iommu_dev_disable_feature(struct device *dev,
>> enum iommu_dev_features f);
>> bool iommu_dev_feature_enabled(struct device *dev, enum
>> iommu_dev_features f);
>> struct iommu_sva *iommu_sva_bind_device(struct device *dev,
>> - struct mm_struct *mm,
>> - void *drvdata);
>> + struct mm_struct *mm);
>> void iommu_sva_unbind_device(struct iommu_sva *handle);
>> u32 iommu_sva_get_pasid(struct iommu_sva *handle);
>> @@ -1012,7 +1010,7 @@ iommu_dev_disable_feature(struct device *dev,
>> enum iommu_dev_features feat)
>> }
>> static inline struct iommu_sva *
>> -iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, void
>> *drvdata)
>> +iommu_sva_bind_device(struct device *dev, struct mm_struct *mm)
>> {
>> return NULL;
>> }
>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
>> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
>> index cd48590ada30..d2ba86470c42 100644
>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
>> @@ -754,8 +754,7 @@ bool arm_smmu_master_sva_enabled(struct
>> arm_smmu_master *master);
>> int arm_smmu_master_enable_sva(struct arm_smmu_master *master);
>> int arm_smmu_master_disable_sva(struct arm_smmu_master *master);
>> bool arm_smmu_master_iopf_supported(struct arm_smmu_master *master);
>> -struct iommu_sva *arm_smmu_sva_bind(struct device *dev, struct
>> mm_struct *mm,
>> - void *drvdata);
>> +struct iommu_sva *arm_smmu_sva_bind(struct device *dev, struct
>> mm_struct *mm);
>> void arm_smmu_sva_unbind(struct iommu_sva *handle);
>> u32 arm_smmu_sva_get_pasid(struct iommu_sva *handle);
>> void arm_smmu_sva_notifier_synchronize(void);
>> @@ -791,7 +790,7 @@ static inline bool
>> arm_smmu_master_iopf_supported(struct arm_smmu_master *master
>> }
>> static inline struct iommu_sva *
>> -arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm, void
>> *drvdata)
>> +arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm)
>> {
>> return ERR_PTR(-ENODEV);
>> }
>> diff --git a/drivers/dma/idxd/cdev.c b/drivers/dma/idxd/cdev.c
>> index c2808fd081d6..66720001ba1c 100644
>> --- a/drivers/dma/idxd/cdev.c
>> +++ b/drivers/dma/idxd/cdev.c
>> @@ -6,7 +6,6 @@
>> #include <linux/pci.h>
>> #include <linux/device.h>
>> #include <linux/sched/task.h>
>> -#include <linux/intel-svm.h>
>> #include <linux/io-64-nonatomic-lo-hi.h>
>> #include <linux/cdev.h>
>> #include <linux/fs.h>
>> @@ -100,7 +99,7 @@ static int idxd_cdev_open(struct inode *inode,
>> struct file *filp)
>> filp->private_data = ctx;
>> if (device_user_pasid_enabled(idxd)) {
>> - sva = iommu_sva_bind_device(dev, current->mm, NULL);
>> + sva = iommu_sva_bind_device(dev, current->mm);
>> if (IS_ERR(sva)) {
>> rc = PTR_ERR(sva);
>> dev_err(dev, "pasid allocation failed: %d\n", rc);
>> diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c
>> index 355fb3ef4cbf..00b437f4f573 100644
>> --- a/drivers/dma/idxd/init.c
>> +++ b/drivers/dma/idxd/init.c
>> @@ -14,7 +14,6 @@
>> #include <linux/io-64-nonatomic-lo-hi.h>
>> #include <linux/device.h>
>> #include <linux/idr.h>
>> -#include <linux/intel-svm.h>
>> #include <linux/iommu.h>
>> #include <uapi/linux/idxd.h>
>> #include <linux/dmaengine.h>
>> @@ -466,29 +465,7 @@ static struct idxd_device *idxd_alloc(struct
>> pci_dev *pdev, struct idxd_driver_d
>> static int idxd_enable_system_pasid(struct idxd_device *idxd)
>> {
>> - int flags;
>> - unsigned int pasid;
>> - struct iommu_sva *sva;
>> -
>> - flags = SVM_FLAG_SUPERVISOR_MODE;
>> -
>> - sva = iommu_sva_bind_device(&idxd->pdev->dev, NULL, &flags);
>> - if (IS_ERR(sva)) {
>> - dev_warn(&idxd->pdev->dev,
>> - "iommu sva bind failed: %ld\n", PTR_ERR(sva));
>> - return PTR_ERR(sva);
>> - }
>> -
>> - pasid = iommu_sva_get_pasid(sva);
>> - if (pasid == IOMMU_PASID_INVALID) {
>> - iommu_sva_unbind_device(sva);
>> - return -ENODEV;
>> - }
>> -
>> - idxd->sva = sva;
>> - idxd->pasid = pasid;
>> - dev_dbg(&idxd->pdev->dev, "system pasid: %u\n", pasid);
>> - return 0;
>> + return -EOPNOTSUPP;
>
> this makes it an always-failing call, right? Will it break any
> existing idxd usage?
The existing implementation is problematic. The right solution is to
attach the default domain to a PASID of the device and handle kernel
DMA through the normal kernel DMA APIs.
Jacob has already posted his v2 on the mailing list.
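For the idxd case, my understanding of that direction, sketched with
this series' interfaces (whether the default domain can be attached to
a PASID this way is an assumption about the follow-up work, and pasid
here is a hypothetical driver-allocated value):

	/* Attach the device's default DMA-API domain to a PASID. */
	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
	int ret = iommu_attach_device_pasid(domain, dev, pasid);
	if (ret)
		return ret;
	/*
	 * DMA for that PASID then uses the normal DMA APIs, e.g.
	 * dma_map_single()/dma_unmap_single(); no SVA bind and no
	 * kernel virtual addresses are involved.
	 */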
>
>> }
>> static void idxd_disable_system_pasid(struct idxd_device *idxd)
>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
>> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
>> index 1ef7bbb4acf3..f155d406c5d5 100644
>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
>> @@ -367,8 +367,7 @@ __arm_smmu_sva_bind(struct device *dev, struct
>> mm_struct *mm)
>> return ERR_PTR(ret);
>> }
>> -struct iommu_sva *
>> -arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm, void
>> *drvdata)
>> +struct iommu_sva *arm_smmu_sva_bind(struct device *dev, struct
>> mm_struct *mm)
>> {
>> struct iommu_sva *handle;
>> struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
>> diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
>> index 7ee37d996e15..d04880a291c3 100644
>> --- a/drivers/iommu/intel/svm.c
>> +++ b/drivers/iommu/intel/svm.c
>> @@ -313,8 +313,7 @@ static int pasid_to_svm_sdev(struct device *dev,
>> unsigned int pasid,
>> return 0;
>> }
>> -static int intel_svm_alloc_pasid(struct device *dev, struct mm_struct
>> *mm,
>> - unsigned int flags)
>> +static int intel_svm_alloc_pasid(struct device *dev, struct mm_struct
>> *mm)
>> {
>> ioasid_t max_pasid = dev_is_pci(dev) ?
>> pci_max_pasids(to_pci_dev(dev)) : intel_pasid_max_id;
>> @@ -324,8 +323,7 @@ static int intel_svm_alloc_pasid(struct device
>> *dev, struct mm_struct *mm,
>> static struct iommu_sva *intel_svm_bind_mm(struct intel_iommu *iommu,
>
> it would be great to see a cleanup that renames the svm terms in the
> intel iommu driver to sva. :-)
SVM is the term used in the Intel VT-d spec, which predates the term SVA.
It makes sense to make the naming consistent anyway. :-)
>> struct device *dev,
>> - struct mm_struct *mm,
>> - unsigned int flags)
>> + struct mm_struct *mm)
>> {
>> struct device_domain_info *info = dev_iommu_priv_get(dev);
>> unsigned long iflags, sflags;
>> @@ -341,22 +339,18 @@ static struct iommu_sva
>> *intel_svm_bind_mm(struct intel_iommu *iommu,
>> svm->pasid = mm->pasid;
>> svm->mm = mm;
>> - svm->flags = flags;
>> INIT_LIST_HEAD_RCU(&svm->devs);
>> - if (!(flags & SVM_FLAG_SUPERVISOR_MODE)) {
>> - svm->notifier.ops = &intel_mmuops;
>> - ret = mmu_notifier_register(&svm->notifier, mm);
>> - if (ret) {
>> - kfree(svm);
>> - return ERR_PTR(ret);
>> - }
>> + svm->notifier.ops = &intel_mmuops;
>> + ret = mmu_notifier_register(&svm->notifier, mm);
>> + if (ret) {
>> + kfree(svm);
>> + return ERR_PTR(ret);
>> }
>> ret = pasid_private_add(svm->pasid, svm);
>> if (ret) {
>> - if (svm->notifier.ops)
>> - mmu_notifier_unregister(&svm->notifier, mm);
>> + mmu_notifier_unregister(&svm->notifier, mm);
>> kfree(svm);
>> return ERR_PTR(ret);
>> }
>> @@ -391,9 +385,7 @@ static struct iommu_sva *intel_svm_bind_mm(struct
>> intel_iommu *iommu,
>> }
>> /* Setup the pasid table: */
>> - sflags = (flags & SVM_FLAG_SUPERVISOR_MODE) ?
>> - PASID_FLAG_SUPERVISOR_MODE : 0;
>> - sflags |= cpu_feature_enabled(X86_FEATURE_LA57) ?
>> PASID_FLAG_FL5LP : 0;
>> + sflags = cpu_feature_enabled(X86_FEATURE_LA57) ? PASID_FLAG_FL5LP
>> : 0;
>> spin_lock_irqsave(&iommu->lock, iflags);
>> ret = intel_pasid_setup_first_level(iommu, dev, mm->pgd, mm->pasid,
>> FLPT_DEFAULT_DID, sflags);
>> @@ -410,8 +402,7 @@ static struct iommu_sva *intel_svm_bind_mm(struct
>> intel_iommu *iommu,
>> kfree(sdev);
>> free_svm:
>> if (list_empty(&svm->devs)) {
>> - if (svm->notifier.ops)
>> - mmu_notifier_unregister(&svm->notifier, mm);
>> + mmu_notifier_unregister(&svm->notifier, mm);
>> pasid_private_remove(mm->pasid);
>> kfree(svm);
>> }
>> @@ -767,7 +758,7 @@ static irqreturn_t prq_event_thread(int irq, void *d)
>> * to unbind the mm while any page faults are outstanding.
>> */
>> svm = pasid_private_find(req->pasid);
>> - if (IS_ERR_OR_NULL(svm) || (svm->flags &
>> SVM_FLAG_SUPERVISOR_MODE))
>> + if (IS_ERR_OR_NULL(svm))
>> goto bad_req;
>> }
>> @@ -818,40 +809,20 @@ static irqreturn_t prq_event_thread(int irq,
>> void *d)
>> return IRQ_RETVAL(handled);
>> }
>> -struct iommu_sva *intel_svm_bind(struct device *dev, struct mm_struct
>> *mm, void *drvdata)
>> +struct iommu_sva *intel_svm_bind(struct device *dev, struct mm_struct
>> *mm)
>> {
>> struct intel_iommu *iommu = device_to_iommu(dev, NULL, NULL);
>> - unsigned int flags = 0;
>> struct iommu_sva *sva;
>> int ret;
>> - if (drvdata)
>> - flags = *(unsigned int *)drvdata;
>> -
>> - if (flags & SVM_FLAG_SUPERVISOR_MODE) {
>> - if (!ecap_srs(iommu->ecap)) {
>> - dev_err(dev, "%s: Supervisor PASID not supported\n",
>> - iommu->name);
>> - return ERR_PTR(-EOPNOTSUPP);
>> - }
>> -
>> - if (mm) {
>> - dev_err(dev, "%s: Supervisor PASID with user provided mm\n",
>> - iommu->name);
>> - return ERR_PTR(-EINVAL);
>> - }
>> -
>> - mm = &init_mm;
>> - }
>> -
>> mutex_lock(&pasid_mutex);
>> - ret = intel_svm_alloc_pasid(dev, mm, flags);
>> + ret = intel_svm_alloc_pasid(dev, mm);
>> if (ret) {
>> mutex_unlock(&pasid_mutex);
>> return ERR_PTR(ret);
>> }
>> - sva = intel_svm_bind_mm(iommu, dev, mm, flags);
>> + sva = intel_svm_bind_mm(iommu, dev, mm);
>> mutex_unlock(&pasid_mutex);
>> return sva;
>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>> index 0cb0750f61e8..74a0a3ec0907 100644
>> --- a/drivers/iommu/iommu.c
>> +++ b/drivers/iommu/iommu.c
>> @@ -2788,7 +2788,6 @@ EXPORT_SYMBOL_GPL(iommu_dev_feature_enabled);
>> * iommu_sva_bind_device() - Bind a process address space to a device
>> * @dev: the device
>> * @mm: the mm to bind, caller must hold a reference to it
>> - * @drvdata: opaque data pointer to pass to bind callback
>> *
>> * Create a bond between device and address space, allowing the
>> device to access
>> * the mm using the returned PASID. If a bond already exists between
>> @device and
>> @@ -2801,7 +2800,7 @@ EXPORT_SYMBOL_GPL(iommu_dev_feature_enabled);
>> * On error, returns an ERR_PTR value.
>> */
>> struct iommu_sva *
>> -iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, void
>> *drvdata)
>> +iommu_sva_bind_device(struct device *dev, struct mm_struct *mm)
>> {
>> struct iommu_group *group;
>> struct iommu_sva *handle = ERR_PTR(-EINVAL);
>> @@ -2826,7 +2825,7 @@ iommu_sva_bind_device(struct device *dev, struct
>> mm_struct *mm, void *drvdata)
>> if (iommu_group_device_count(group) != 1)
>> goto out_unlock;
>> - handle = ops->sva_bind(dev, mm, drvdata);
>> + handle = ops->sva_bind(dev, mm);
>> out_unlock:
>> mutex_unlock(&group->mutex);
>> diff --git a/drivers/misc/uacce/uacce.c b/drivers/misc/uacce/uacce.c
>> index 281c54003edc..3238a867ea51 100644
>> --- a/drivers/misc/uacce/uacce.c
>> +++ b/drivers/misc/uacce/uacce.c
>> @@ -99,7 +99,7 @@ static int uacce_bind_queue(struct uacce_device
>> *uacce, struct uacce_queue *q)
>> if (!(uacce->flags & UACCE_DEV_SVA))
>> return 0;
>> - handle = iommu_sva_bind_device(uacce->parent, current->mm, NULL);
>> + handle = iommu_sva_bind_device(uacce->parent, current->mm);
>> if (IS_ERR(handle))
>> return PTR_ERR(handle);
>
Best regards,
baolu
On 2022/7/31 20:55, Yi Liu wrote:
> On 2022/7/5 13:07, Lu Baolu wrote:
>> The existing iommu SVA interfaces are implemented by calling the SVA
>> specific iommu ops provided by the IOMMU drivers. There's no need for
>> any SVA specific ops in iommu_ops vector anymore as we can achieve
>> this through the generic attach/detach_dev_pasid domain ops.
>>
>> This refactors the IOMMU SVA interfaces implementation by using the
>> set/block_pasid_dev ops and align them with the concept of the SVA
>> iommu domain. Put the new SVA code in the sva related file in order
>> to make it self-contained.
>>
>> Signed-off-by: Lu Baolu <[email protected]>
>> Tested-by: Zhangfei Gao <[email protected]>
>> Tested-by: Tony Zhu <[email protected]>
>> ---
>> [...]
>> +struct iommu_sva *iommu_sva_bind_device(struct device *dev, struct
>> mm_struct *mm)
>> +{
>> + struct iommu_domain *domain;
>> + ioasid_t max_pasids;
>> + int ret = -EINVAL;
>> +
>> + max_pasids = dev->iommu->max_pasids;
>> + if (!max_pasids)
>> + return ERR_PTR(-EOPNOTSUPP);
>> +
>> + /* Allocate mm->pasid if necessary. */
>> + ret = iommu_sva_alloc_pasid(mm, 1, max_pasids - 1);
>
> do we want to call mmgrab() before iommu_sva_alloc_pasid() to
> avoid using mm without any reference? In your current code,
> mmgrab() is called in iommu_sva_domain_alloc().
As the comment on this API states, the caller "must hold a reference to
mm_users".
Best regards,
baolu
On 2022/7/31 20:36, Yi Liu wrote:
> On 2022/7/5 13:07, Lu Baolu wrote:
>> The existing iommu SVA interfaces are implemented by calling the SVA
>> specific iommu ops provided by the IOMMU drivers. There's no need for
>> any SVA specific ops in iommu_ops vector anymore as we can achieve
>> this through the generic attach/detach_dev_pasid domain ops.
>
> s/"attach/detach_dev_pasid"/"set/block_pasid_dev"/
Updated. By the way, as discussed, block_pasid_dev will be dropped from
the next version onwards. It actually sets the group's blocking domain.
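A sketch of that direction, with the helper and field names assumed
from the reworked patch later in this thread:

/*
 * "Blocking" a PASID is just attaching the group's blocking domain to
 * it; no dedicated block_pasid_dev op is needed. Names are assumed
 * from the patch below.
 */
static void example_block_dev_pasid(struct iommu_group *group, ioasid_t pasid)
{
	__iommu_attach_group_pasid(group->blocking_domain, group, pasid);
}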
Best regards,
baolu
On 2022/7/31 20:50, Yi Liu wrote:
> On 2022/7/5 13:07, Lu Baolu wrote:
>> This adds some mechanisms around the iommu_domain so that the I/O page
>> fault handling framework could route a page fault to the domain and
>> call the fault handler from it.
>>
>> Add pointers to the page fault handler and its private data in struct
>> iommu_domain. The fault handler will be called with the private data
>> as a parameter once a page fault is routed to the domain. Any kernel
>> component which owns an iommu domain could install a handler and its
>> private parameter so that the page fault could be further routed and
>> handled.
>>
>> This also prepares the SVA implementation to be the first consumer of
>> the per-domain page fault handling model. The I/O page fault handler
>> for SVA is copied to the SVA file with mmget_not_zero() added before
>> mmap_read_lock().
>>
>> Suggested-by: Jean-Philippe Brucker <[email protected]>
>> Signed-off-by: Lu Baolu <[email protected]>
>> Reviewed-by: Jean-Philippe Brucker <[email protected]>
>> Tested-by: Zhangfei Gao <[email protected]>
>> Tested-by: Tony Zhu <[email protected]>
>> ---
>> include/linux/iommu.h | 3 ++
>> drivers/iommu/iommu-sva-lib.h | 8 +++++
>> drivers/iommu/io-pgfault.c | 7 +++++
>> drivers/iommu/iommu-sva-lib.c | 58 +++++++++++++++++++++++++++++++++++
>> drivers/iommu/iommu.c | 4 +++
>> 5 files changed, 80 insertions(+)
>>
>> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
>> index ae0cfca064e6..47610f21d451 100644
>> --- a/include/linux/iommu.h
>> +++ b/include/linux/iommu.h
>> @@ -105,6 +105,9 @@ struct iommu_domain {
>> unsigned long pgsize_bitmap; /* Bitmap of page sizes in use */
>> struct iommu_domain_geometry geometry;
>> struct iommu_dma_cookie *iova_cookie;
>> + enum iommu_page_response_code (*iopf_handler)(struct iommu_fault *fault,
>> + void *data);
>> + void *fault_data;
>> union {
>> struct {
>> iommu_fault_handler_t handler;
>> diff --git a/drivers/iommu/iommu-sva-lib.h b/drivers/iommu/iommu-sva-lib.h
>> index 8909ea1094e3..1b3ace4b5863 100644
>> --- a/drivers/iommu/iommu-sva-lib.h
>> +++ b/drivers/iommu/iommu-sva-lib.h
>> @@ -26,6 +26,8 @@ int iopf_queue_flush_dev(struct device *dev);
>> struct iopf_queue *iopf_queue_alloc(const char *name);
>> void iopf_queue_free(struct iopf_queue *queue);
>> int iopf_queue_discard_partial(struct iopf_queue *queue);
>> +enum iommu_page_response_code
>> +iommu_sva_handle_iopf(struct iommu_fault *fault, void *data);
>> #else /* CONFIG_IOMMU_SVA */
>> static inline int iommu_queue_iopf(struct iommu_fault *fault, void *cookie)
>> @@ -63,5 +65,11 @@ static inline int iopf_queue_discard_partial(struct iopf_queue *queue)
>> {
>> return -ENODEV;
>> }
>> +
>> +static inline enum iommu_page_response_code
>> +iommu_sva_handle_iopf(struct iommu_fault *fault, void *data)
>> +{
>> + return IOMMU_PAGE_RESP_INVALID;
>> +}
>> #endif /* CONFIG_IOMMU_SVA */
>> #endif /* _IOMMU_SVA_LIB_H */
>> diff --git a/drivers/iommu/io-pgfault.c b/drivers/iommu/io-pgfault.c
>> index 1df8c1dcae77..aee9e033012f 100644
>> --- a/drivers/iommu/io-pgfault.c
>> +++ b/drivers/iommu/io-pgfault.c
>> @@ -181,6 +181,13 @@ static void iopf_handle_group(struct work_struct *work)
>> * request completes, outstanding faults will have been dealt with by the time
>> * the PASID is freed.
>> *
>> + * Any valid page fault will be eventually routed to an iommu domain and the
>> + * page fault handler installed there will get called. The users of this
>> + * handling framework should guarantee that the iommu domain could only be
>> + * freed after the device has stopped generating page faults (or the iommu
>> + * hardware has been set to block the page faults) and the pending page faults
>> + * have been flushed.
>> + *
>> * Return: 0 on success and <0 on error.
>> */
>> int iommu_queue_iopf(struct iommu_fault *fault, void *cookie)
>> diff --git a/drivers/iommu/iommu-sva-lib.c b/drivers/iommu/iommu-sva-lib.c
>> index 751366980232..536d34855c74 100644
>> --- a/drivers/iommu/iommu-sva-lib.c
>> +++ b/drivers/iommu/iommu-sva-lib.c
>> @@ -167,3 +167,61 @@ u32 iommu_sva_get_pasid(struct iommu_sva *handle)
>> return domain->mm->pasid;
>> }
>> EXPORT_SYMBOL_GPL(iommu_sva_get_pasid);
>> +
>> +/*
>> + * I/O page fault handler for SVA
>> + */
>> +enum iommu_page_response_code
>> +iommu_sva_handle_iopf(struct iommu_fault *fault, void *data)
>> +{
>> + vm_fault_t ret;
>> + struct vm_area_struct *vma;
>> + struct mm_struct *mm = data;
>> + unsigned int access_flags = 0;
>> + unsigned int fault_flags = FAULT_FLAG_REMOTE;
>> + struct iommu_fault_page_request *prm = &fault->prm;
>> + enum iommu_page_response_code status = IOMMU_PAGE_RESP_INVALID;
>> +
>> + if (!(prm->flags & IOMMU_FAULT_PAGE_REQUEST_PASID_VALID))
>> + return status;
>> +
>> + if (IS_ERR_OR_NULL(mm) || !mmget_not_zero(mm))
>
> is it possible to be ERR or NULL? The mm life cycle should have been
> guaranteed by the mmgrab() in iommu_sva_domain_alloc(). Perhaps coding
> issue if it happens. :-)
Updated. Thanks!
Best regards,
baolu
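To make the per-domain fault handling model above concrete, here is a
minimal sketch of a domain owner installing a handler through the
iopf_handler/fault_data fields from the diff; the context structure and
handler body are hypothetical:

struct my_ctx {
	unsigned long base;		/* placeholder private state */
};

/* Hypothetical per-domain I/O page fault handler. */
static enum iommu_page_response_code
my_handle_iopf(struct iommu_fault *fault, void *data)
{
	struct my_ctx *ctx = data;	/* the fault_data installed below */

	/* Resolve fault->prm.addr against ctx's address space here. */
	return ctx ? IOMMU_PAGE_RESP_SUCCESS : IOMMU_PAGE_RESP_INVALID;
}

static void my_install_iopf_handler(struct iommu_domain *domain,
				    struct my_ctx *ctx)
{
	domain->iopf_handler = my_handle_iopf;
	domain->fault_data = ctx;
}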
On 2022/7/5 13:07, Lu Baolu wrote:
> The sva iommu_domain represents a hardware pagetable that the IOMMU
> hardware could use for SVA translation. This adds some infrastructure
> to support SVA domain in the iommu common layer. It includes:
>
> - Extend the iommu_domain to support a new IOMMU_DOMAIN_SVA domain
> type. The IOMMU drivers that support allocation of the SVA domain
> should provide their own sva domain specific iommu_domain_ops.
> - Add a helper to allocate an SVA domain. The iommu_domain_free()
> is still used to free an SVA domain.
>
> The report_iommu_fault() should be replaced by the new
> iommu_report_device_fault(). Leave the existing fault handler to its
> existing users; the newly added SVA members exclude it.
>
> Suggested-by: Jean-Philippe Brucker <[email protected]>
> Suggested-by: Jason Gunthorpe <[email protected]>
> Signed-off-by: Lu Baolu <[email protected]>
> Reviewed-by: Jean-Philippe Brucker <[email protected]>
> Tested-by: Zhangfei Gao <[email protected]>
> Tested-by: Tony Zhu <[email protected]>
> ---
> include/linux/iommu.h | 24 ++++++++++++++++++++++--
> drivers/iommu/iommu.c | 20 ++++++++++++++++++++
> 2 files changed, 42 insertions(+), 2 deletions(-)
Reviewed-by: Yi Liu <[email protected]>
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index f2b5aa7efe43..42f0418dc22c 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -64,6 +64,8 @@ struct iommu_domain_geometry {
> #define __IOMMU_DOMAIN_PT (1U << 2) /* Domain is identity mapped */
> #define __IOMMU_DOMAIN_DMA_FQ (1U << 3) /* DMA-API uses flush queue */
>
> +#define __IOMMU_DOMAIN_SVA (1U << 4) /* Shared process address space */
> +
> /*
> * This are the possible domain-types
> *
> @@ -77,6 +79,8 @@ struct iommu_domain_geometry {
> * certain optimizations for these domains
> * IOMMU_DOMAIN_DMA_FQ - As above, but definitely using batched TLB
> * invalidation.
> + * IOMMU_DOMAIN_SVA - DMA addresses are shared process address
> + * spaces represented by mm_struct's.
> */
> #define IOMMU_DOMAIN_BLOCKED (0U)
> #define IOMMU_DOMAIN_IDENTITY (__IOMMU_DOMAIN_PT)
> @@ -86,15 +90,23 @@ struct iommu_domain_geometry {
> #define IOMMU_DOMAIN_DMA_FQ (__IOMMU_DOMAIN_PAGING | \
> __IOMMU_DOMAIN_DMA_API | \
> __IOMMU_DOMAIN_DMA_FQ)
> +#define IOMMU_DOMAIN_SVA (__IOMMU_DOMAIN_SVA)
>
> struct iommu_domain {
> unsigned type;
> const struct iommu_domain_ops *ops;
> unsigned long pgsize_bitmap; /* Bitmap of page sizes in use */
> - iommu_fault_handler_t handler;
> - void *handler_token;
> struct iommu_domain_geometry geometry;
> struct iommu_dma_cookie *iova_cookie;
> + union {
> + struct {
> + iommu_fault_handler_t handler;
> + void *handler_token;
> + };
> + struct { /* IOMMU_DOMAIN_SVA */
> + struct mm_struct *mm;
> + };
> + };
> };
>
> static inline bool iommu_is_dma_domain(struct iommu_domain *domain)
> @@ -685,6 +697,8 @@ int iommu_group_claim_dma_owner(struct iommu_group *group, void *owner);
> void iommu_group_release_dma_owner(struct iommu_group *group);
> bool iommu_group_dma_owner_claimed(struct iommu_group *group);
>
> +struct iommu_domain *iommu_sva_domain_alloc(struct device *dev,
> + struct mm_struct *mm);
> int iommu_attach_device_pasid(struct iommu_domain *domain, struct device *dev,
> ioasid_t pasid);
> void iommu_detach_device_pasid(struct iommu_domain *domain, struct device *dev,
> @@ -1063,6 +1077,12 @@ static inline bool iommu_group_dma_owner_claimed(struct iommu_group *group)
> return false;
> }
>
> +static inline struct iommu_domain *
> +iommu_sva_domain_alloc(struct device *dev, struct mm_struct *mm)
> +{
> + return NULL;
> +}
> +
> static inline int iommu_attach_device_pasid(struct iommu_domain *domain,
> struct device *dev, ioasid_t pasid)
> {
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index be48b09371f4..10479c5e4d23 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -27,6 +27,7 @@
> #include <linux/module.h>
> #include <linux/cc_platform.h>
> #include <trace/events/iommu.h>
> +#include <linux/sched/mm.h>
>
> static struct kset *iommu_group_kset;
> static DEFINE_IDA(iommu_group_ida);
> @@ -1957,6 +1958,8 @@ EXPORT_SYMBOL_GPL(iommu_domain_alloc);
>
> void iommu_domain_free(struct iommu_domain *domain)
> {
> + if (domain->type == IOMMU_DOMAIN_SVA)
> + mmdrop(domain->mm);
> iommu_put_dma_cookie(domain);
> domain->ops->free(domain);
> }
> @@ -3274,6 +3277,23 @@ bool iommu_group_dma_owner_claimed(struct iommu_group *group)
> }
> EXPORT_SYMBOL_GPL(iommu_group_dma_owner_claimed);
>
> +struct iommu_domain *iommu_sva_domain_alloc(struct device *dev,
> + struct mm_struct *mm)
> +{
> + const struct iommu_ops *ops = dev_iommu_ops(dev);
> + struct iommu_domain *domain;
> +
> + domain = ops->domain_alloc(IOMMU_DOMAIN_SVA);
> + if (!domain)
> + return NULL;
> +
> + domain->type = IOMMU_DOMAIN_SVA;
> + mmgrab(mm);
> + domain->mm = mm;
> +
> + return domain;
> +}
> +
> static bool iommu_group_immutable_singleton(struct iommu_group *group,
> struct device *dev)
> {
--
Regards,
Yi Liu
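For reference, a minimal sketch of consuming the helper added by this
patch; the surrounding function is hypothetical:

/*
 * Allocate an SVA domain and free it again. iommu_sva_domain_alloc()
 * takes an mm reference via mmgrab(), and iommu_domain_free() drops it
 * via mmdrop(), matching the diff above.
 */
static int example_sva_domain(struct device *dev, struct mm_struct *mm)
{
	struct iommu_domain *domain;

	domain = iommu_sva_domain_alloc(dev, mm);
	if (!domain)
		return -ENOMEM;

	/* ... attach the domain to a PASID of dev and use it ... */

	iommu_domain_free(domain);	/* mmdrop() happens in here */
	return 0;
}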
Hi Yi,
Thanks for reviewing my series.
On 2022/7/31 19:54, Yi Liu wrote:
> On 2022/7/5 13:06, Lu Baolu wrote:
>> Use this field to keep the number of PASIDs that an IOMMU
>> hardware is able to support. This is a generic attribute of an IOMMU
>
> a nit. it should be the max pasid value an IOMMU hardware can support
> instead of number of PASIDs. right?
More accurately, it's the "maximum number of PASIDs supported by the
IOMMU hardware".
Best regards,
baolu
Hi Jason,
On 7/26/22 9:57 PM, Jason Gunthorpe wrote:
>> + /*
>> + * Block PASID attachment in all cases where the PCI fabric is
>> + * routing based on address. ACS disables it.
>> + */
>> + if (dev_is_pci(dev) &&
>> + !pci_acs_path_enabled(to_pci_dev(dev), NULL, REQ_ACS_FLAGS))
>> + return -ENODEV;
> I would probably still put this in a function just to be clear, and
> probably even a PCI layer function 'pci_is_pasid_supported' that
> clearly indicates that the fabric path can route a PASID packet
> without mis-routing it.
I am fine with putting the above in a function to make it clear. But I
am hesitant to move this part of the logic into the PCI layer.
From the perspective of IOMMU, TLPs with PASID prefix form distinct
address spaces, so it's reasonable to require ACS protection on the
upstream path.
But the PCI spec doesn't require this. The interfaces defined in
drivers/pci/ats.c should work even when the IOMMU is disabled.
> If the fabric routes PASID properly then groups are not an issue - all
> agree on this?
Yes, agreed. The iommu groups are not an issue any more. But just like
iommu_attach_device(), if multiple devices share a group, there must be
some mechanism to make sure that device drivers are aware of this fact
and only attach a shared domain to any PASID of those devices.
Otherwise, the iommu_attach/detach_dev_pasid() might be misused.
Considering that all existing PASID use cases are singleton group
cases, perhaps we can start our support from the simple singleton group
case?
Best regards,
baolu
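For illustration, a sketch of factoring the quoted check into a named
helper as Jason suggests; whether it belongs in the IOMMU core or the
PCI layer was still being discussed, and the helper name below is
illustrative only:

/*
 * PASID-prefixed TLPs form distinct address spaces, so the whole
 * upstream path must route by RID, not by address; ACS on the path
 * guarantees that. Helper name is hypothetical.
 */
static bool dev_pasid_routing_secure(struct device *dev)
{
	if (!dev_is_pci(dev))
		return true;

	return pci_acs_path_enabled(to_pci_dev(dev), NULL, REQ_ACS_FLAGS);
}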
On Tue, Aug 02, 2022 at 10:19:08AM +0800, Baolu Lu wrote:
> Hi Jason,
>
> On 7/26/22 9:57 PM, Jason Gunthorpe wrote:
> > > + /*
> > > + * Block PASID attachment in all cases where the PCI fabric is
> > > + * routing based on address. ACS disables it.
> > > + */
> > > + if (dev_is_pci(dev) &&
> > > + !pci_acs_path_enabled(to_pci_dev(dev), NULL, REQ_ACS_FLAGS))
> > > + return -ENODEV;
> > I would probably still put this in a function just to be clear, and
> > probably even a PCI layer function 'pci_is_pasid_supported' that
> > clearly indicates that the fabric path can route a PASID packet
> > without mis-routing it.
>
> I am fine with putting the above in a function to make it clear. But I
> am hesitant to move this part of the logic into the PCI layer.
>
> From the perspective of IOMMU, TLPs with PASID prefix form distinct
> address spaces, so it's reasonable to require ACS protection on the
> upstream path.
>
> But the PCI spec doesn't require this. The interfaces defined in
> drivers/pci/ats.c should work even when the IOMMU is disabled.
No, I don't think so, that is useless.
PCI SIG has given a bunch of tools, and it is up to the system
software to figure out how to use them.
There is no reasonable case where Linux would want PASID and broken
fabric routing - so just block it at the PCI layer.
> Yes, agreed. The iommu groups are not an issue any more. But just like
> iommu_attach_device(), if multiple devices share a group, there must be
> some mechanism to make sure that device drivers are aware of this fact
> and only attach a shared domain to any PASID of those devices.
> Otherwise, the iommu_attach/detach_dev_pasid() might be misused.
I think it is the same as the existing attach logic for groups, with
the sharing, ownership and everything else. No change for pasid.
> Considering that all existing PASID use cases are singleton group
> cases, perhaps we can start our support from the simple singleton
> group case?
Don't make confusing unnecessary special cases please.
Jason
On 2022/8/2 20:37, Jason Gunthorpe wrote:
> On Tue, Aug 02, 2022 at 10:19:08AM +0800, Baolu Lu wrote:
>> Hi Jason,
>>
>> On 7/26/22 9:57 PM, Jason Gunthorpe wrote:
>>>> + /*
>>>> + * Block PASID attachment in all cases where the PCI fabric is
>>>> + * routing based on address. ACS disables it.
>>>> + */
>>>> + if (dev_is_pci(dev) &&
>>>> + !pci_acs_path_enabled(to_pci_dev(dev), NULL, REQ_ACS_FLAGS))
>>>> + return -ENODEV;
>>> I would probably still put this in a function just to be clear, and
>>> probably even a PCI layer function 'pci_is_pasid_supported' that
>>> clearly indicates that the fabric path can route a PASID packet
>>> without mis-routing it.
>>
>> I am fine with putting the above in a function to make it clear. But
>> I am hesitant to move this part of the logic into the PCI layer.
>>
>> From the perspective of IOMMU, TLPs with PASID prefix form distinct
>> address spaces, so it's reasonable to require ACS protection on the
>> upstream path.
>>
>> But the PCI spec doesn't require this. The interfaces defined in
>> drivers/pci/ats.c should work even when the IOMMU is disabled.
>
> No, I don't think so, that is useless.
>
> PCI SIG has given a bunch of tools, and it is up to the system
> software to figure out how to use them.
>
> There is no reasonable case where Linux would want PASID and broken
> fabric routing - so just block it at the PCI layer.
Okay. I will follow. Thank you for the confirmation.
>
>> Yes, agreed. The iommu groups are not an issue any more. But just like
>> iommu_attach_device(), if multiple devices share a group, there must be
>> some mechanism to make sure that device drivers are aware of this fact
>> and only attach a shared domain to any PASID of those devices.
>> Otherwise, the iommu_attach/detach_dev_pasid() might be misused.
>
> I think it is the same as the existing attach logic for groups, with
> the sharing, ownership and everything else. No change for pasid.
Agreed. This is a complete scheme. I updated this patch accordingly. Can
you please give it a quick review?
[PATCH 04/12] iommu: Add attach/detach_dev_pasid iommu interfaces
Attaching an IOMMU domain to a PASID of a device is a generic operation
for modern IOMMU drivers which support PASID-granular DMA address
translation. Currently visible usage scenarios include (but are not limited to):
- SVA (Shared Virtual Address)
- kernel DMA with PASID
- hardware-assist mediated device
This adds set_dev_pasid domain ops for this purpose and also adds some
interfaces for device drivers to attach/detach a domain to/from a PASID
of a device.
The device drivers should use the interfaces below to claim the ownership
of a device before attaching a domain to it, and release the ownership
after detaching the domain.
int iommu_device_claim_pasid_owner(struct device *dev,
ioasid_t pasid, void *owner)
void iommu_device_release_pasid_owner(struct device *dev,
ioasid_t pasid)
After the ownership is claimed successfully, the device drivers can use
the interfaces below for domain attaching and detaching. The owner token
passed to iommu_attach_device_pasid() must match the one used to claim
the ownership.
int iommu_attach_device_pasid(struct iommu_domain *domain,
struct device *dev, ioasid_t pasid,
void *owner)
void iommu_detach_device_pasid(struct iommu_domain *domain,
struct device *dev, ioasid_t pasid)
This also adds the interface below to retrieve the domain that has been
attached to a PASID of the device. This is only for use in the IOMMU
subsystem. For example, the I/O page fault handling framework could use
it to get the domain from a {device, PASID} pair reported by hardware,
and the IOMMU device drivers could use it to get the existing domain
when a blocking domain is about to be set.
struct iommu_domain *iommu_get_domain_for_dev_pasid(struct device *dev,
                                                    ioasid_t pasid);
[--tags skipped--]
---
drivers/iommu/iommu.c | 227 ++++++++++++++++++++++++++++++++++++++++++
include/linux/iommu.h | 36 +++++++
2 files changed, 263 insertions(+)
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 63fc4317cb47..fd105441ca4d 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -39,6 +39,7 @@ struct iommu_group {
struct kobject kobj;
struct kobject *devices_kobj;
struct list_head devices;
+ struct xarray pasid_array;
struct mutex mutex;
void *iommu_data;
void (*iommu_data_release)(void *iommu_data);
@@ -58,6 +59,13 @@ struct group_device {
char *name;
};
+struct group_pasid {
+ struct iommu_domain *domain;
+ unsigned int attach_cnt;
+ unsigned int owner_cnt;
+ void *owner;
+};
+
struct iommu_group_attribute {
struct attribute attr;
ssize_t (*show)(struct iommu_group *group, char *buf);
@@ -663,6 +671,7 @@ struct iommu_group *iommu_group_alloc(void)
mutex_init(&group->mutex);
INIT_LIST_HEAD(&group->devices);
INIT_LIST_HEAD(&group->entry);
+ xa_init(&group->pasid_array);
ret = ida_alloc(&iommu_group_ida, GFP_KERNEL);
if (ret < 0) {
@@ -3254,3 +3263,221 @@ bool iommu_group_dma_owner_claimed(struct iommu_group *group)
return user;
}
EXPORT_SYMBOL_GPL(iommu_group_dma_owner_claimed);
+
+static int __iommu_attach_group_pasid(struct iommu_domain *domain,
+ struct iommu_group *group,
+ ioasid_t pasid)
+{
+ struct group_device *device;
+ int ret = 0;
+
+ if (!domain->ops->set_dev_pasid)
+ return -EOPNOTSUPP;
+
+ list_for_each_entry(device, &group->devices, list) {
+ ret = domain->ops->set_dev_pasid(domain, device->dev, pasid);
+ if (ret)
+ break;
+ }
+
+ return ret;
+}
+
+/**
+ * iommu_device_claim_pasid_owner() - Set ownership of a pasid on device
+ * @dev: the device.
+ * @pasid: the pasid of the device.
+ * @owner: caller specified pointer. Used for exclusive ownership.
+ *
+ * Return 0 if it is allowed, otherwise an error.
+ */
+int iommu_device_claim_pasid_owner(struct device *dev, ioasid_t pasid, void *owner)
+{
+ struct iommu_group *group = iommu_group_get(dev);
+ struct group_pasid *group_pasid;
+ void *curr;
+ int ret;
+
+ if (!group)
+ return -ENODEV;
+
+ mutex_lock(&group->mutex);
+ group_pasid = xa_load(&group->pasid_array, pasid);
+ if (group_pasid) {
+ if (group_pasid->owner != owner) {
+ ret = -EBUSY;
+ goto err_unlock;
+ }
+ group_pasid->owner_cnt++;
+ goto out;
+ }
+
+ group_pasid = kzalloc(sizeof(*group_pasid), GFP_KERNEL);
+ if (!group_pasid) {
+ ret = -ENOMEM;
+ goto err_unlock;
+ }
+
+ group_pasid->owner = owner;
+ group_pasid->owner_cnt = 1;
+ curr = xa_store(&group->pasid_array, pasid, group_pasid, GFP_KERNEL);
+ if (curr) {
+ ret = xa_err(curr) ? : -EBUSY;
+ goto err_free;
+ }
+out:
+ mutex_unlock(&group->mutex);
+ iommu_group_put(group);
+
+ return 0;
+
+err_free:
+ kfree(group_pasid);
+err_unlock:
+ mutex_unlock(&group->mutex);
+ iommu_group_put(group);
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(iommu_device_claim_pasid_owner);
+
+/**
+ * iommu_device_release_pasid_owner() - Release ownership of a pasid on device
+ * @dev: the device.
+ * @pasid: the pasid of the device.
+ *
+ * Release the pasid ownership claimed by iommu_device_claim_pasid_owner().
+ */
+void iommu_device_release_pasid_owner(struct device *dev, ioasid_t pasid)
+{
+ struct iommu_group *group = iommu_group_get(dev);
+ struct group_pasid *group_pasid;
+
+ if (!group)
+ return;
+
+ mutex_lock(&group->mutex);
+ group_pasid = xa_load(&group->pasid_array, pasid);
+ if (WARN_ON(!group_pasid))
+ goto unlock_out;
+
+ if (--group_pasid->owner_cnt == 0) {
+ if (WARN_ON(group_pasid->attach_cnt))
+ goto unlock_out;
+ xa_erase(&group->pasid_array, pasid);
+ kfree(group_pasid);
+ }
+
+unlock_out:
+ mutex_unlock(&group->mutex);
+ iommu_group_put(group);
+}
+EXPORT_SYMBOL_GPL(iommu_device_release_pasid_owner);
+
+/**
+ * iommu_attach_device_pasid() - Attach a domain to pasid of device
+ * @domain: the iommu domain.
+ * @dev: the attached device.
+ * @pasid: the pasid of the device.
+ * @owner: the ownership token.
+ *
+ * Return: 0 on success, or an error.
+ */
+int iommu_attach_device_pasid(struct iommu_domain *domain,
+ struct device *dev, ioasid_t pasid,
+ void *owner)
+{
+ struct iommu_group *group = iommu_group_get(dev);
+ struct group_pasid *group_pasid;
+ int ret = 0;
+
+ if (!group)
+ return -ENODEV;
+
+ mutex_lock(&group->mutex);
+ group_pasid = xa_load(&group->pasid_array, pasid);
+ if (!group_pasid || group_pasid->owner != owner) {
+ ret = -EPERM;
+ goto unlock_out;
+ }
+
+ if (group_pasid->domain && group_pasid->domain != domain) {
+ ret = -EBUSY;
+ goto unlock_out;
+ }
+
+ if (!group_pasid->attach_cnt) {
+ ret = __iommu_attach_group_pasid(domain, group, pasid);
+ if (ret)
+ __iommu_attach_group_pasid(group->blocking_domain,
+ group, pasid);
+ }
+
+ if (!ret) {
+ group_pasid->domain = domain;
+ group_pasid->attach_cnt++;
+ }
+
+unlock_out:
+ mutex_unlock(&group->mutex);
+ iommu_group_put(group);
+ return ret;
+}
+EXPORT_SYMBOL_GPL(iommu_attach_device_pasid);
+
+/**
+ * iommu_detach_device_pasid() - Detach the domain from pasid of device
+ * @domain: the iommu domain.
+ * @dev: the attached device.
+ * @pasid: the pasid of the device.
+ *
+ * The @domain must have been attached to @pasid of the @dev with
+ * iommu_attach_device_pasid().
+ */
+void iommu_detach_device_pasid(struct iommu_domain *domain, struct device *dev,
+ ioasid_t pasid)
+{
+ struct iommu_group *group = iommu_group_get(dev);
+ struct group_pasid *group_pasid;
+
+ if (!group)
+ return;
+
+ mutex_lock(&group->mutex);
+ group_pasid = xa_load(&group->pasid_array, pasid);
+ if (!WARN_ON(!group_pasid || group_pasid->domain != domain) &&
+ --group_pasid->attach_cnt == 0) {
+ __iommu_attach_group_pasid(group->blocking_domain, group, pasid);
+ group_pasid->domain = NULL;
+ }
+ mutex_unlock(&group->mutex);
+
+ iommu_group_put(group);
+}
+EXPORT_SYMBOL_GPL(iommu_detach_device_pasid);
+
+/**
+ * iommu_get_domain_for_dev_pasid() - Retrieve domain for @pasid of @dev
+ * @dev: the queried device
+ * @pasid: the pasid of the device
+ *
+ * This is a variant of iommu_get_domain_for_dev(). It returns the existing
+ * domain attached to pasid of a device. It's only for internal use of the
+ * IOMMU subsystem. The caller must take care to avoid any possible
+ * use-after-free case.
+ *
+ * Return: attached domain on success, NULL otherwise.
+ */
+struct iommu_domain *
+iommu_get_domain_for_dev_pasid(struct device *dev, ioasid_t pasid)
+{
+ struct group_pasid *group_pasid;
+ struct iommu_group *group;
+
+ if (!pasid_valid(pasid))
+ return NULL;
+
+ group = iommu_group_get(dev);
+ if (!group)
+ return NULL;
+ /*
+ * The xarray protects its internal state with RCU. Hence the domain
+ * obtained is either NULL or fully formed.
+ */
+ group_pasid = xa_load(&group->pasid_array, pasid);
+ iommu_group_put(group);
+
+ return group_pasid ? group_pasid->domain : NULL;
+}
+EXPORT_SYMBOL_GPL(iommu_get_domain_for_dev_pasid);
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 2f237c3cd680..437980c54bb6 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -266,6 +266,7 @@ struct iommu_ops {
* struct iommu_domain_ops - domain specific operations
* @attach_dev: attach an iommu domain to a device
* @detach_dev: detach an iommu domain from a device
+ * @set_dev_pasid: set an iommu domain to a pasid of device
* @map: map a physically contiguous memory region to an iommu domain
* @map_pages: map a physically contiguous set of pages of the same size to
* an iommu domain.
@@ -286,6 +287,8 @@ struct iommu_ops {
struct iommu_domain_ops {
int (*attach_dev)(struct iommu_domain *domain, struct device *dev);
void (*detach_dev)(struct iommu_domain *domain, struct device *dev);
+ int (*set_dev_pasid)(struct iommu_domain *domain, struct device *dev,
+ ioasid_t pasid);
int (*map)(struct iommu_domain *domain, unsigned long iova,
phys_addr_t paddr, size_t size, int prot, gfp_t gfp);
@@ -680,6 +683,16 @@ int iommu_group_claim_dma_owner(struct iommu_group *group, void *owner);
void iommu_group_release_dma_owner(struct iommu_group *group);
bool iommu_group_dma_owner_claimed(struct iommu_group *group);
+int iommu_device_claim_pasid_owner(struct device *dev,
+ ioasid_t pasid, void *owner);
+void iommu_device_release_pasid_owner(struct device *dev, ioasid_t pasid);
+int iommu_attach_device_pasid(struct iommu_domain *domain,
+ struct device *dev, ioasid_t pasid,
+ void *owner);
+void iommu_detach_device_pasid(struct iommu_domain *domain,
+ struct device *dev, ioasid_t pasid);
+struct iommu_domain *
+iommu_get_domain_for_dev_pasid(struct device *dev, ioasid_t pasid);
#else /* CONFIG_IOMMU_API */
struct iommu_ops {};
@@ -1047,6 +1060,29 @@ static inline bool iommu_group_dma_owner_claimed(struct iommu_group *group)
{
return false;
}
+
+static inline int
+iommu_device_claim_pasid_owner(struct device *dev, ioasid_t pasid, void *owner)
+{
+ return -ENODEV;
+}
+
+static inline void
+iommu_device_release_pasid_owner(struct device *dev, ioasid_t pasid)
+{
+}
+
+static inline int iommu_attach_device_pasid(struct iommu_domain *domain,
+ struct device *dev, ioasid_t pasid,
+ void *owner)
+{
+ return -ENODEV;
+}
+
+static inline void iommu_detach_device_pasid(struct iommu_domain *domain,
+ struct device *dev, ioasid_t pasid)
+{
+}
#endif /* CONFIG_IOMMU_API */
Best regards,
baolu
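For clarity, a sketch of the driver-side call sequence implied by this
version of the patch (note the pasid-owner interfaces are dropped later
in this thread); the surrounding function is hypothetical:

static int example_use_pasid(struct iommu_domain *domain, struct device *dev,
			     ioasid_t pasid, void *owner)
{
	int ret;

	/* Claim exclusive ownership of the pasid first. */
	ret = iommu_device_claim_pasid_owner(dev, pasid, owner);
	if (ret)
		return ret;

	/* Attach with the same owner token used for the claim. */
	ret = iommu_attach_device_pasid(domain, dev, pasid, owner);
	if (ret) {
		iommu_device_release_pasid_owner(dev, pasid);
		return ret;
	}

	/* ... issue PASID-tagged DMA ... */

	iommu_detach_device_pasid(domain, dev, pasid);
	iommu_device_release_pasid_owner(dev, pasid);
	return 0;
}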
On Wed, Aug 03, 2022 at 09:07:35PM +0800, Baolu Lu wrote:
> +/**
> + * iommu_device_claim_pasid_owner() - Set ownership of a pasid on device
> + * @dev: the device.
> + * @pasid: the pasid of the device.
> + * @owner: caller specified pointer. Used for exclusive ownership.
> + *
> + * Return 0 if it is allowed, otherwise an error.
> + */
> +int iommu_device_claim_pasid_owner(struct device *dev, ioasid_t pasid, void *owner)
I don't see a use case for a special "pasid owner"
PASID is no different from normal DMA. If the calling driver already
has the proper ownership of the device/group then it is fine for that
driver to use any kind of IOMMU attachment, RID, PASID, whatever. It
doesn't matter *how* the attachment is made.
Remember the series that got dropped about converting all the drivers
to the new ownership scheme? That is how it should work - ownership
and domain attach are two different operations and do not get mixed
confusingly together. (and are you going to repost that series? It
would be great to get it done)
Jason
> From: Jason Gunthorpe <[email protected]>
> Sent: Thursday, August 4, 2022 3:03 AM
>
> On Wed, Aug 03, 2022 at 09:07:35PM +0800, Baolu Lu wrote:
> > +/**
> > + * iommu_device_claim_pasid_owner() - Set ownership of a pasid on device
> > + * @dev: the device.
> > + * @pasid: the pasid of the device.
> > + * @owner: caller specified pointer. Used for exclusive ownership.
> > + *
> > + * Return 0 if it is allowed, otherwise an error.
> > + */
> > +int iommu_device_claim_pasid_owner(struct device *dev, ioasid_t pasid, void *owner)
>
> I don't see a use case for a special "pasid owner"
>
> PASID is no different from normal DMA. If the calling driver already
> has the proper ownership of the device/group then it is fine for that
> driver to use any kind of IOMMU attachment, RID, PASID, whatever. It
> doesn't matter *how* the attachment is made.
>
and pasid already has an alloc/free interface, which implies
an ownership model.
On 2022/8/4 3:03, Jason Gunthorpe wrote:
> On Wed, Aug 03, 2022 at 09:07:35PM +0800, Baolu Lu wrote:
>> +/**
>> + * iommu_device_claim_pasid_owner() - Set ownership of a pasid on device
>> + * @dev: the device.
>> + * @pasid: the pasid of the device.
>> + * @owner: caller specified pointer. Used for exclusive ownership.
>> + *
>> + * Return 0 if it is allowed, otherwise an error.
>> + */
>> +int iommu_device_claim_pasid_owner(struct device *dev, ioasid_t pasid, void *owner)
>
> I don't see a use case for a special "pasid owner"
>
> PASID is no different from normal DMA. If the calling driver already
> has the proper ownership of the device/group then it is fine for that
> driver to use any kind of IOMMU attachment, RID, PASID, whatever. It
> doesn't matter *how* the attachment is made.
Agreed again.
The Linux kernel manages a device at the device driver level, and all
PASIDs are managed by a device driver. There is really no need to manage
ownership at the PASID level. The current DMA ownership mechanism can
manage the exclusions between kernel drivers and user space drivers.
Sorry for overthinking it.
> Remember the series that got dropped about converting all the drivers
> to the new ownership scheme? That is how it should work - ownership
> and domain attach are two different operations and do not get mixed
> confusingly together. (and are you going to repost that series? It
> would be great to get it done)
Yes, of course. I also have some other pending tasks: lock-free page
table traversal, driver ATS interfaces, an ownership scheme for kernel
drivers, blocking domain improvements, etc. These are enough to keep
me busy for a while. :-) If anyone in the community is interested in
taking on any of these tasks, I will be grateful.
Best regards,
baolu