2022-01-24 18:03:03

by Lu Baolu

[permalink] [raw]
Subject: [PATCH 0/7] iommu cleanup and refactoring

Hi,

The guest pasid and aux-domain related code are dead code in current
iommu subtree. As we have reached a consensus that all these features
should be based on the new iommufd framework (which is under active
development), the first part of this series removes and cleanups all
the dead code.

The second part of this series refactors the iommu_domain by moving all
domain-specific ops from iommu_ops to a new domain_ops. This makes an
iommu_domain self-contained and represent the abstraction of an I/O
translation table in the IOMMU subsystem. With different type of
iommu_domain providing different set of ops, it's easier to support more
types of I/O translation tables.

Please help to review and comment.

Best regards,
baolu

Lu Baolu (7):
iommu/vt-d: Remove guest pasid related callbacks
iommu: Remove guest pasid related interfaces and definitions
iommu/vt-d: Remove aux-domain related callbacks
iommu: Remove aux-domain related interfaces and iommu_ops
drm/nouveau/device: Get right pgsize_bitmap of iommu_domain
iommu: Use right way to retrieve iommu_ops
iommu: Add iommu_domain::domain_ops

include/linux/intel-iommu.h | 27 -
include/linux/intel-svm.h | 12 -
include/linux/iommu.h | 174 ++----
drivers/iommu/intel/pasid.h | 4 -
include/uapi/linux/iommu.h | 181 ------
.../drm/nouveau/nvkm/engine/device/tegra.c | 2 +-
drivers/iommu/amd/iommu.c | 21 +-
drivers/iommu/apple-dart.c | 24 +-
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 22 +-
drivers/iommu/arm/arm-smmu/arm-smmu.c | 23 +-
drivers/iommu/arm/arm-smmu/qcom_iommu.c | 17 +-
drivers/iommu/exynos-iommu.c | 17 +-
drivers/iommu/fsl_pamu_domain.c | 13 +-
drivers/iommu/intel/debugfs.c | 3 +-
drivers/iommu/intel/iommu.c | 542 +-----------------
drivers/iommu/intel/pasid.c | 161 ------
drivers/iommu/intel/svm.c | 209 -------
drivers/iommu/iommu.c | 296 +---------
drivers/iommu/ipmmu-vmsa.c | 21 +-
drivers/iommu/msm_iommu.c | 17 +-
drivers/iommu/mtk_iommu.c | 24 +-
drivers/iommu/mtk_iommu_v1.c | 19 +-
drivers/iommu/omap-iommu.c | 15 +-
drivers/iommu/rockchip-iommu.c | 17 +-
drivers/iommu/s390-iommu.c | 15 +-
drivers/iommu/sprd-iommu.c | 19 +-
drivers/iommu/sun50i-iommu.c | 18 +-
drivers/iommu/tegra-gart.c | 15 +-
drivers/iommu/tegra-smmu.c | 16 +-
drivers/iommu/virtio-iommu.c | 18 +-
30 files changed, 331 insertions(+), 1631 deletions(-)

--
2.25.1


2022-01-24 18:03:10

by Lu Baolu

[permalink] [raw]
Subject: [PATCH 4/7] iommu: Remove aux-domain related interfaces and iommu_ops

The aux-domain related interfaces and iommu_ops are not referenced
anywhere in the tree. We've also reached a consensus to redesign it
based the new iommufd framework. Remove them to avoid dead code.

Signed-off-by: Lu Baolu <[email protected]>
---
include/linux/iommu.h | 29 ---------------------------
drivers/iommu/iommu.c | 46 -------------------------------------------
2 files changed, 75 deletions(-)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index d1e9c3d73966..aa5486243892 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -144,7 +144,6 @@ struct iommu_resv_region {

/**
* enum iommu_dev_features - Per device IOMMU features
- * @IOMMU_DEV_FEAT_AUX: Auxiliary domain feature
* @IOMMU_DEV_FEAT_SVA: Shared Virtual Addresses
* @IOMMU_DEV_FEAT_IOPF: I/O Page Faults such as PRI or Stall. Generally
* enabling %IOMMU_DEV_FEAT_SVA requires
@@ -157,7 +156,6 @@ struct iommu_resv_region {
* iommu_dev_has_feature(), and enable it using iommu_dev_enable_feature().
*/
enum iommu_dev_features {
- IOMMU_DEV_FEAT_AUX,
IOMMU_DEV_FEAT_SVA,
IOMMU_DEV_FEAT_IOPF,
};
@@ -223,8 +221,6 @@ struct iommu_iotlb_gather {
* @dev_has/enable/disable_feat: per device entries to check/enable/disable
* iommu specific features.
* @dev_feat_enabled: check enabled feature
- * @aux_attach/detach_dev: aux-domain specific attach/detach entries.
- * @aux_get_pasid: get the pasid given an aux-domain
* @sva_bind: Bind process address space to device
* @sva_unbind: Unbind process address space from device
* @sva_get_pasid: Get PASID associated to a SVA handle
@@ -285,11 +281,6 @@ struct iommu_ops {
int (*dev_enable_feat)(struct device *dev, enum iommu_dev_features f);
int (*dev_disable_feat)(struct device *dev, enum iommu_dev_features f);

- /* Aux-domain specific attach/detach entries */
- int (*aux_attach_dev)(struct iommu_domain *domain, struct device *dev);
- void (*aux_detach_dev)(struct iommu_domain *domain, struct device *dev);
- int (*aux_get_pasid)(struct iommu_domain *domain, struct device *dev);
-
struct iommu_sva *(*sva_bind)(struct device *dev, struct mm_struct *mm,
void *drvdata);
void (*sva_unbind)(struct iommu_sva *handle);
@@ -656,9 +647,6 @@ void iommu_release_device(struct device *dev);
int iommu_dev_enable_feature(struct device *dev, enum iommu_dev_features f);
int iommu_dev_disable_feature(struct device *dev, enum iommu_dev_features f);
bool iommu_dev_feature_enabled(struct device *dev, enum iommu_dev_features f);
-int iommu_aux_attach_device(struct iommu_domain *domain, struct device *dev);
-void iommu_aux_detach_device(struct iommu_domain *domain, struct device *dev);
-int iommu_aux_get_pasid(struct iommu_domain *domain, struct device *dev);

struct iommu_sva *iommu_sva_bind_device(struct device *dev,
struct mm_struct *mm,
@@ -1003,23 +991,6 @@ iommu_dev_disable_feature(struct device *dev, enum iommu_dev_features feat)
return -ENODEV;
}

-static inline int
-iommu_aux_attach_device(struct iommu_domain *domain, struct device *dev)
-{
- return -ENODEV;
-}
-
-static inline void
-iommu_aux_detach_device(struct iommu_domain *domain, struct device *dev)
-{
-}
-
-static inline int
-iommu_aux_get_pasid(struct iommu_domain *domain, struct device *dev)
-{
- return -ENODEV;
-}
-
static inline struct iommu_sva *
iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, void *drvdata)
{
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 5672beb061ea..5230c6d90ece 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2746,8 +2746,6 @@ EXPORT_SYMBOL_GPL(iommu_dev_enable_feature);

/*
* The device drivers should do the necessary cleanups before calling this.
- * For example, before disabling the aux-domain feature, the device driver
- * should detach all aux-domains. Otherwise, this will return -EBUSY.
*/
int iommu_dev_disable_feature(struct device *dev, enum iommu_dev_features feat)
{
@@ -2775,50 +2773,6 @@ bool iommu_dev_feature_enabled(struct device *dev, enum iommu_dev_features feat)
}
EXPORT_SYMBOL_GPL(iommu_dev_feature_enabled);

-/*
- * Aux-domain specific attach/detach.
- *
- * Only works if iommu_dev_feature_enabled(dev, IOMMU_DEV_FEAT_AUX) returns
- * true. Also, as long as domains are attached to a device through this
- * interface, any tries to call iommu_attach_device() should fail
- * (iommu_detach_device() can't fail, so we fail when trying to re-attach).
- * This should make us safe against a device being attached to a guest as a
- * whole while there are still pasid users on it (aux and sva).
- */
-int iommu_aux_attach_device(struct iommu_domain *domain, struct device *dev)
-{
- int ret = -ENODEV;
-
- if (domain->ops->aux_attach_dev)
- ret = domain->ops->aux_attach_dev(domain, dev);
-
- if (!ret)
- trace_attach_device_to_domain(dev);
-
- return ret;
-}
-EXPORT_SYMBOL_GPL(iommu_aux_attach_device);
-
-void iommu_aux_detach_device(struct iommu_domain *domain, struct device *dev)
-{
- if (domain->ops->aux_detach_dev) {
- domain->ops->aux_detach_dev(domain, dev);
- trace_detach_device_from_domain(dev);
- }
-}
-EXPORT_SYMBOL_GPL(iommu_aux_detach_device);
-
-int iommu_aux_get_pasid(struct iommu_domain *domain, struct device *dev)
-{
- int ret = -ENODEV;
-
- if (domain->ops->aux_get_pasid)
- ret = domain->ops->aux_get_pasid(domain, dev);
-
- return ret;
-}
-EXPORT_SYMBOL_GPL(iommu_aux_get_pasid);
-
/**
* iommu_sva_bind_device() - Bind a process address space to a device
* @dev: the device
--
2.25.1

2022-01-24 18:03:11

by Lu Baolu

[permalink] [raw]
Subject: [PATCH 5/7] drm/nouveau/device: Get right pgsize_bitmap of iommu_domain

The supported page sizes of an iommu_domain are saved in the pgsize_bitmap
field. Retrieve the value from the right place.

Signed-off-by: Lu Baolu <[email protected]>
Reviewed-by: Robin Murphy <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c b/drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c
index d0d52c1d4aee..992cc285f2fe 100644
--- a/drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c
+++ b/drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c
@@ -133,7 +133,7 @@ nvkm_device_tegra_probe_iommu(struct nvkm_device_tegra *tdev)
* or equal to the system's PAGE_SIZE, with a preference if
* both are equal.
*/
- pgsize_bitmap = tdev->iommu.domain->ops->pgsize_bitmap;
+ pgsize_bitmap = tdev->iommu.domain->pgsize_bitmap;
if (pgsize_bitmap & PAGE_SIZE) {
tdev->iommu.pgshift = PAGE_SHIFT;
} else {
--
2.25.1

2022-01-24 18:03:13

by Lu Baolu

[permalink] [raw]
Subject: [PATCH 6/7] iommu: Use right way to retrieve iommu_ops

The common iommu_ops is hooked to both device and domain. When a helper
has both device and domain pointer, the way to get the iommu_ops looks
messy in iommu core. This sorts out the way to get iommu_ops. The device
related helpers go through device pointer, while the domain related ones
go through domain pointer.

Signed-off-by: Lu Baolu <[email protected]>
---
include/linux/iommu.h | 8 ++++++++
drivers/iommu/iommu.c | 25 ++++++++++++++-----------
2 files changed, 22 insertions(+), 11 deletions(-)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index aa5486243892..111b3e9c79bb 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -385,6 +385,14 @@ static inline void iommu_iotlb_gather_init(struct iommu_iotlb_gather *gather)
};
}

+static inline const struct iommu_ops *dev_iommu_ops_get(struct device *dev)
+{
+ if (dev && dev->iommu && dev->iommu->iommu_dev)
+ return dev->iommu->iommu_dev->ops;
+
+ return NULL;
+}
+
#define IOMMU_BUS_NOTIFY_PRIORITY 0
#define IOMMU_GROUP_NOTIFY_ADD_DEVICE 1 /* Device added */
#define IOMMU_GROUP_NOTIFY_DEL_DEVICE 2 /* Pre Device removed */
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 5230c6d90ece..6631e2ea44df 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -764,6 +764,7 @@ EXPORT_SYMBOL_GPL(iommu_group_set_name);
static int iommu_create_device_direct_mappings(struct iommu_group *group,
struct device *dev)
{
+ const struct iommu_ops *ops = dev_iommu_ops_get(dev);
struct iommu_domain *domain = group->default_domain;
struct iommu_resv_region *entry;
struct list_head mappings;
@@ -785,8 +786,8 @@ static int iommu_create_device_direct_mappings(struct iommu_group *group,
dma_addr_t start, end, addr;
size_t map_size = 0;

- if (domain->ops->apply_resv_region)
- domain->ops->apply_resv_region(dev, domain, entry);
+ if (ops->apply_resv_region)
+ ops->apply_resv_region(dev, domain, entry);

start = ALIGN(entry->start, pg_size);
end = ALIGN(entry->start + entry->length, pg_size);
@@ -831,8 +832,10 @@ static int iommu_create_device_direct_mappings(struct iommu_group *group,
static bool iommu_is_attach_deferred(struct iommu_domain *domain,
struct device *dev)
{
- if (domain->ops->is_attach_deferred)
- return domain->ops->is_attach_deferred(domain, dev);
+ const struct iommu_ops *ops = dev_iommu_ops_get(dev);
+
+ if (ops->is_attach_deferred)
+ return ops->is_attach_deferred(domain, dev);

return false;
}
@@ -1251,10 +1254,10 @@ int iommu_page_response(struct device *dev,
struct iommu_fault_event *evt;
struct iommu_fault_page_request *prm;
struct dev_iommu *param = dev->iommu;
+ const struct iommu_ops *ops = dev_iommu_ops_get(dev);
bool has_pasid = msg->flags & IOMMU_PAGE_RESP_PASID_VALID;
- struct iommu_domain *domain = iommu_get_domain_for_dev(dev);

- if (!domain || !domain->ops->page_response)
+ if (!ops || !ops->page_response)
return -ENODEV;

if (!param || !param->fault_param)
@@ -1295,7 +1298,7 @@ int iommu_page_response(struct device *dev,
msg->pasid = 0;
}

- ret = domain->ops->page_response(dev, evt, msg);
+ ret = ops->page_response(dev, evt, msg);
list_del(&evt->list);
kfree(evt);
break;
@@ -1758,10 +1761,10 @@ static int __iommu_group_dma_attach(struct iommu_group *group)

static int iommu_group_do_probe_finalize(struct device *dev, void *data)
{
- struct iommu_domain *domain = data;
+ const struct iommu_ops *ops = dev_iommu_ops_get(dev);

- if (domain->ops->probe_finalize)
- domain->ops->probe_finalize(dev);
+ if (ops->probe_finalize)
+ ops->probe_finalize(dev);

return 0;
}
@@ -2020,7 +2023,7 @@ EXPORT_SYMBOL_GPL(iommu_attach_device);

int iommu_deferred_attach(struct device *dev, struct iommu_domain *domain)
{
- const struct iommu_ops *ops = domain->ops;
+ const struct iommu_ops *ops = dev_iommu_ops_get(dev);

if (ops->is_attach_deferred && ops->is_attach_deferred(domain, dev))
return __iommu_attach_device(domain, dev);
--
2.25.1

2022-01-24 18:03:15

by Lu Baolu

[permalink] [raw]
Subject: [PATCH 7/7] iommu: Add iommu_domain::domain_ops

Add a domain specific callback set, domain_ops, for vendor iommu driver
to provide domain specific operations. Move domain-specific callbacks
from iommu_ops to the domain_ops and hook them when a domain is allocated.

Signed-off-by: Lu Baolu <[email protected]>
---
include/linux/iommu.h | 93 ++++++++++++---------
drivers/iommu/amd/iommu.c | 21 +++--
drivers/iommu/apple-dart.c | 24 ++++--
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 22 +++--
drivers/iommu/arm/arm-smmu/arm-smmu.c | 23 +++--
drivers/iommu/arm/arm-smmu/qcom_iommu.c | 17 ++--
drivers/iommu/exynos-iommu.c | 17 ++--
drivers/iommu/fsl_pamu_domain.c | 13 ++-
drivers/iommu/intel/iommu.c | 25 ++++--
drivers/iommu/iommu.c | 15 ++--
drivers/iommu/ipmmu-vmsa.c | 21 +++--
drivers/iommu/msm_iommu.c | 17 ++--
drivers/iommu/mtk_iommu.c | 24 ++++--
drivers/iommu/mtk_iommu_v1.c | 19 +++--
drivers/iommu/omap-iommu.c | 15 ++--
drivers/iommu/rockchip-iommu.c | 17 ++--
drivers/iommu/s390-iommu.c | 15 ++--
drivers/iommu/sprd-iommu.c | 19 +++--
drivers/iommu/sun50i-iommu.c | 18 ++--
drivers/iommu/tegra-gart.c | 15 ++--
drivers/iommu/tegra-smmu.c | 16 ++--
drivers/iommu/virtio-iommu.c | 18 ++--
22 files changed, 305 insertions(+), 179 deletions(-)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 111b3e9c79bb..33c5c0e5c465 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -88,7 +88,7 @@ struct iommu_domain_geometry {

struct iommu_domain {
unsigned type;
- const struct iommu_ops *ops;
+ const struct domain_ops *ops;
unsigned long pgsize_bitmap; /* Bitmap of page sizes in use */
iommu_fault_handler_t handler;
void *handler_token;
@@ -192,26 +192,11 @@ struct iommu_iotlb_gather {
* struct iommu_ops - iommu ops and capabilities
* @capable: check capability
* @domain_alloc: allocate iommu domain
- * @domain_free: free iommu domain
- * @attach_dev: attach device to an iommu domain
- * @detach_dev: detach device from an iommu domain
- * @map: map a physically contiguous memory region to an iommu domain
- * @map_pages: map a physically contiguous set of pages of the same size to
- * an iommu domain.
- * @unmap: unmap a physically contiguous memory region from an iommu domain
- * @unmap_pages: unmap a number of pages of the same size from an iommu domain
- * @flush_iotlb_all: Synchronously flush all hardware TLBs for this domain
- * @iotlb_sync_map: Sync mappings created recently using @map to the hardware
- * @iotlb_sync: Flush all queued ranges from the hardware TLBs and empty flush
- * queue
- * @iova_to_phys: translate iova to physical address
* @probe_device: Add device to iommu driver handling
* @release_device: Remove device from iommu driver handling
* @probe_finalize: Do final setup work after the device is added to an IOMMU
* group and attached to the groups domain
* @device_group: find iommu group for a particular device
- * @enable_nesting: Enable nesting
- * @set_pgtable_quirks: Set io page table quirks (IO_PGTABLE_QUIRK_*)
* @get_resv_regions: Request list of reserved regions for a device
* @put_resv_regions: Free list of reserved regions for a device
* @apply_resv_region: Temporary helper call-back for iova reserved ranges
@@ -237,33 +222,11 @@ struct iommu_ops {

/* Domain allocation and freeing by the iommu driver */
struct iommu_domain *(*domain_alloc)(unsigned iommu_domain_type);
- void (*domain_free)(struct iommu_domain *);

- int (*attach_dev)(struct iommu_domain *domain, struct device *dev);
- void (*detach_dev)(struct iommu_domain *domain, struct device *dev);
- int (*map)(struct iommu_domain *domain, unsigned long iova,
- phys_addr_t paddr, size_t size, int prot, gfp_t gfp);
- int (*map_pages)(struct iommu_domain *domain, unsigned long iova,
- phys_addr_t paddr, size_t pgsize, size_t pgcount,
- int prot, gfp_t gfp, size_t *mapped);
- size_t (*unmap)(struct iommu_domain *domain, unsigned long iova,
- size_t size, struct iommu_iotlb_gather *iotlb_gather);
- size_t (*unmap_pages)(struct iommu_domain *domain, unsigned long iova,
- size_t pgsize, size_t pgcount,
- struct iommu_iotlb_gather *iotlb_gather);
- void (*flush_iotlb_all)(struct iommu_domain *domain);
- void (*iotlb_sync_map)(struct iommu_domain *domain, unsigned long iova,
- size_t size);
- void (*iotlb_sync)(struct iommu_domain *domain,
- struct iommu_iotlb_gather *iotlb_gather);
- phys_addr_t (*iova_to_phys)(struct iommu_domain *domain, dma_addr_t iova);
struct iommu_device *(*probe_device)(struct device *dev);
void (*release_device)(struct device *dev);
void (*probe_finalize)(struct device *dev);
struct iommu_group *(*device_group)(struct device *dev);
- int (*enable_nesting)(struct iommu_domain *domain);
- int (*set_pgtable_quirks)(struct iommu_domain *domain,
- unsigned long quirks);

/* Request/Free a list of reserved regions for a device */
void (*get_resv_regions)(struct device *dev, struct list_head *list);
@@ -296,6 +259,60 @@ struct iommu_ops {
struct module *owner;
};

+/**
+ * struct domain_ops - per-domain ops
+ * @attach_dev: attach an iommu domain to a device
+ * @detach_dev: detach an iommu domain from a device
+ * @map: map a physically contiguous memory region to an iommu domain
+ * @map_pages: map a physically contiguous set of pages of the same size to
+ * an iommu domain.
+ * @unmap: unmap a physically contiguous memory region from an iommu domain
+ * @unmap_pages: unmap a number of pages of the same size from an iommu domain
+ * @flush_iotlb_all: Synchronously flush all hardware TLBs for this domain
+ * @iotlb_sync_map: Sync mappings created recently using @map to the hardware
+ * @iotlb_sync: Flush all queued ranges from the hardware TLBs and empty flush
+ * queue
+ * @iova_to_phys: translate iova to physical address
+ * @enable_nesting: Enable nesting
+ * @set_pgtable_quirks: Set io page table quirks (IO_PGTABLE_QUIRK_*)
+ * @release: Release the domain after use.
+ */
+struct domain_ops {
+ int (*attach_dev)(struct iommu_domain *domain, struct device *dev);
+ void (*detach_dev)(struct iommu_domain *domain, struct device *dev);
+
+ int (*map)(struct iommu_domain *domain, unsigned long iova,
+ phys_addr_t paddr, size_t size, int prot, gfp_t gfp);
+ int (*map_pages)(struct iommu_domain *domain, unsigned long iova,
+ phys_addr_t paddr, size_t pgsize, size_t pgcount,
+ int prot, gfp_t gfp, size_t *mapped);
+ size_t (*unmap)(struct iommu_domain *domain, unsigned long iova,
+ size_t size, struct iommu_iotlb_gather *iotlb_gather);
+ size_t (*unmap_pages)(struct iommu_domain *domain, unsigned long iova,
+ size_t pgsize, size_t pgcount,
+ struct iommu_iotlb_gather *iotlb_gather);
+
+ void (*flush_iotlb_all)(struct iommu_domain *domain);
+ void (*iotlb_sync_map)(struct iommu_domain *domain, unsigned long iova,
+ size_t size);
+ void (*iotlb_sync)(struct iommu_domain *domain,
+ struct iommu_iotlb_gather *iotlb_gather);
+
+ phys_addr_t (*iova_to_phys)(struct iommu_domain *domain, dma_addr_t iova);
+
+ int (*enable_nesting)(struct iommu_domain *domain);
+ int (*set_pgtable_quirks)(struct iommu_domain *domain,
+ unsigned long quirks);
+
+ void (*release)(struct iommu_domain *domain);
+};
+
+static inline void iommu_domain_set_ops(struct iommu_domain *domain,
+ const struct domain_ops *ops)
+{
+ domain->ops = ops;
+}
+
/**
* struct iommu_device - IOMMU core representation of one IOMMU hardware
* instance
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 461f1844ed1f..1d25dc2e0de9 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -74,6 +74,7 @@ LIST_HEAD(acpihid_map);
* if iommu=pt passed on kernel cmd line.
*/
const struct iommu_ops amd_iommu_ops;
+static const struct domain_ops amd_domain_ops;

static ATOMIC_NOTIFIER_HEAD(ppr_notifier);
int amd_iommu_max_glx_val = -1;
@@ -1977,6 +1978,7 @@ static struct iommu_domain *amd_iommu_domain_alloc(unsigned type)
domain->domain.geometry.aperture_start = 0;
domain->domain.geometry.aperture_end = ~0ULL;
domain->domain.geometry.force_aperture = true;
+ iommu_domain_set_ops(&domain->domain, &amd_domain_ops);

return &domain->domain;
}
@@ -2269,13 +2271,6 @@ static int amd_iommu_def_domain_type(struct device *dev)
const struct iommu_ops amd_iommu_ops = {
.capable = amd_iommu_capable,
.domain_alloc = amd_iommu_domain_alloc,
- .domain_free = amd_iommu_domain_free,
- .attach_dev = amd_iommu_attach_device,
- .detach_dev = amd_iommu_detach_device,
- .map = amd_iommu_map,
- .iotlb_sync_map = amd_iommu_iotlb_sync_map,
- .unmap = amd_iommu_unmap,
- .iova_to_phys = amd_iommu_iova_to_phys,
.probe_device = amd_iommu_probe_device,
.release_device = amd_iommu_release_device,
.probe_finalize = amd_iommu_probe_finalize,
@@ -2284,9 +2279,19 @@ const struct iommu_ops amd_iommu_ops = {
.put_resv_regions = generic_iommu_put_resv_regions,
.is_attach_deferred = amd_iommu_is_attach_deferred,
.pgsize_bitmap = AMD_IOMMU_PGSIZES,
+ .def_domain_type = amd_iommu_def_domain_type,
+};
+
+static const struct domain_ops amd_domain_ops = {
+ .attach_dev = amd_iommu_attach_device,
+ .detach_dev = amd_iommu_detach_device,
+ .map = amd_iommu_map,
+ .iotlb_sync_map = amd_iommu_iotlb_sync_map,
+ .unmap = amd_iommu_unmap,
+ .iova_to_phys = amd_iommu_iova_to_phys,
.flush_iotlb_all = amd_iommu_flush_iotlb_all,
.iotlb_sync = amd_iommu_iotlb_sync,
- .def_domain_type = amd_iommu_def_domain_type,
+ .release = amd_iommu_domain_free,
};

/*****************************************************************************
diff --git a/drivers/iommu/apple-dart.c b/drivers/iommu/apple-dart.c
index 565ef5598811..bf40970ec977 100644
--- a/drivers/iommu/apple-dart.c
+++ b/drivers/iommu/apple-dart.c
@@ -185,6 +185,7 @@ struct apple_dart_master_cfg {

static struct platform_driver apple_dart_driver;
static const struct iommu_ops apple_dart_iommu_ops;
+static const struct domain_ops apple_dart_domain_ops;

static struct apple_dart_domain *to_dart_domain(struct iommu_domain *dom)
{
@@ -589,6 +590,8 @@ static struct iommu_domain *apple_dart_domain_alloc(unsigned int type)
if (type == IOMMU_DOMAIN_IDENTITY || type == IOMMU_DOMAIN_BLOCKED)
dart_domain->finalized = true;

+ iommu_domain_set_ops(&dart_domain->domain, &apple_dart_domain_ops);
+
return &dart_domain->domain;
}

@@ -765,15 +768,6 @@ static void apple_dart_get_resv_regions(struct device *dev,

static const struct iommu_ops apple_dart_iommu_ops = {
.domain_alloc = apple_dart_domain_alloc,
- .domain_free = apple_dart_domain_free,
- .attach_dev = apple_dart_attach_dev,
- .detach_dev = apple_dart_detach_dev,
- .map_pages = apple_dart_map_pages,
- .unmap_pages = apple_dart_unmap_pages,
- .flush_iotlb_all = apple_dart_flush_iotlb_all,
- .iotlb_sync = apple_dart_iotlb_sync,
- .iotlb_sync_map = apple_dart_iotlb_sync_map,
- .iova_to_phys = apple_dart_iova_to_phys,
.probe_device = apple_dart_probe_device,
.release_device = apple_dart_release_device,
.device_group = apple_dart_device_group,
@@ -784,6 +778,18 @@ static const struct iommu_ops apple_dart_iommu_ops = {
.pgsize_bitmap = -1UL, /* Restricted during dart probe */
};

+static const struct domain_ops apple_dart_domain_ops = {
+ .attach_dev = apple_dart_attach_dev,
+ .detach_dev = apple_dart_detach_dev,
+ .map_pages = apple_dart_map_pages,
+ .unmap_pages = apple_dart_unmap_pages,
+ .flush_iotlb_all = apple_dart_flush_iotlb_all,
+ .iotlb_sync = apple_dart_iotlb_sync,
+ .iotlb_sync_map = apple_dart_iotlb_sync_map,
+ .iova_to_phys = apple_dart_iova_to_phys,
+ .release = apple_dart_domain_free,
+};
+
static irqreturn_t apple_dart_irq(int irq, void *dev)
{
struct apple_dart *dart = dev;
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 6dc6d8b6b368..4461f7d35625 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -1981,6 +1981,8 @@ static const struct iommu_flush_ops arm_smmu_flush_ops = {
.tlb_add_page = arm_smmu_tlb_inv_page_nosync,
};

+static const struct domain_ops arm_smmu_domain_ops;
+
/* IOMMU API */
static bool arm_smmu_capable(enum iommu_cap cap)
{
@@ -2017,6 +2019,7 @@ static struct iommu_domain *arm_smmu_domain_alloc(unsigned type)
INIT_LIST_HEAD(&smmu_domain->devices);
spin_lock_init(&smmu_domain->devices_lock);
INIT_LIST_HEAD(&smmu_domain->mmu_notifiers);
+ iommu_domain_set_ops(&smmu_domain->domain, &arm_smmu_domain_ops);

return &smmu_domain->domain;
}
@@ -2841,17 +2844,9 @@ static int arm_smmu_dev_disable_feature(struct device *dev,
static struct iommu_ops arm_smmu_ops = {
.capable = arm_smmu_capable,
.domain_alloc = arm_smmu_domain_alloc,
- .domain_free = arm_smmu_domain_free,
- .attach_dev = arm_smmu_attach_dev,
- .map_pages = arm_smmu_map_pages,
- .unmap_pages = arm_smmu_unmap_pages,
- .flush_iotlb_all = arm_smmu_flush_iotlb_all,
- .iotlb_sync = arm_smmu_iotlb_sync,
- .iova_to_phys = arm_smmu_iova_to_phys,
.probe_device = arm_smmu_probe_device,
.release_device = arm_smmu_release_device,
.device_group = arm_smmu_device_group,
- .enable_nesting = arm_smmu_enable_nesting,
.of_xlate = arm_smmu_of_xlate,
.get_resv_regions = arm_smmu_get_resv_regions,
.put_resv_regions = generic_iommu_put_resv_regions,
@@ -2867,6 +2862,17 @@ static struct iommu_ops arm_smmu_ops = {
.owner = THIS_MODULE,
};

+static const struct domain_ops arm_smmu_domain_ops = {
+ .attach_dev = arm_smmu_attach_dev,
+ .map_pages = arm_smmu_map_pages,
+ .unmap_pages = arm_smmu_unmap_pages,
+ .flush_iotlb_all = arm_smmu_flush_iotlb_all,
+ .iotlb_sync = arm_smmu_iotlb_sync,
+ .iova_to_phys = arm_smmu_iova_to_phys,
+ .enable_nesting = arm_smmu_enable_nesting,
+ .release = arm_smmu_domain_free,
+};
+
/* Probing and initialisation functions */
static int arm_smmu_init_one_queue(struct arm_smmu_device *smmu,
struct arm_smmu_queue *q,
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index 4bc75c4ce402..b59c7a0df781 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -91,6 +91,7 @@ static struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom)

static struct platform_driver arm_smmu_driver;
static struct iommu_ops arm_smmu_ops;
+static const struct domain_ops arm_smmu_domain_ops;

#ifdef CONFIG_ARM_SMMU_LEGACY_DT_BINDINGS
static int arm_smmu_bus_init(struct iommu_ops *ops);
@@ -888,6 +889,7 @@ static struct iommu_domain *arm_smmu_domain_alloc(unsigned type)

mutex_init(&smmu_domain->init_mutex);
spin_lock_init(&smmu_domain->cb_lock);
+ iommu_domain_set_ops(&smmu_domain->domain, &arm_smmu_domain_ops);

return &smmu_domain->domain;
}
@@ -1583,19 +1585,10 @@ static int arm_smmu_def_domain_type(struct device *dev)
static struct iommu_ops arm_smmu_ops = {
.capable = arm_smmu_capable,
.domain_alloc = arm_smmu_domain_alloc,
- .domain_free = arm_smmu_domain_free,
- .attach_dev = arm_smmu_attach_dev,
- .map_pages = arm_smmu_map_pages,
- .unmap_pages = arm_smmu_unmap_pages,
- .flush_iotlb_all = arm_smmu_flush_iotlb_all,
- .iotlb_sync = arm_smmu_iotlb_sync,
- .iova_to_phys = arm_smmu_iova_to_phys,
.probe_device = arm_smmu_probe_device,
.release_device = arm_smmu_release_device,
.probe_finalize = arm_smmu_probe_finalize,
.device_group = arm_smmu_device_group,
- .enable_nesting = arm_smmu_enable_nesting,
- .set_pgtable_quirks = arm_smmu_set_pgtable_quirks,
.of_xlate = arm_smmu_of_xlate,
.get_resv_regions = arm_smmu_get_resv_regions,
.put_resv_regions = generic_iommu_put_resv_regions,
@@ -1604,6 +1597,18 @@ static struct iommu_ops arm_smmu_ops = {
.owner = THIS_MODULE,
};

+static const struct domain_ops arm_smmu_domain_ops = {
+ .attach_dev = arm_smmu_attach_dev,
+ .map_pages = arm_smmu_map_pages,
+ .unmap_pages = arm_smmu_unmap_pages,
+ .flush_iotlb_all = arm_smmu_flush_iotlb_all,
+ .iotlb_sync = arm_smmu_iotlb_sync,
+ .iova_to_phys = arm_smmu_iova_to_phys,
+ .set_pgtable_quirks = arm_smmu_set_pgtable_quirks,
+ .enable_nesting = arm_smmu_enable_nesting,
+ .release = arm_smmu_domain_free,
+};
+
static void arm_smmu_device_reset(struct arm_smmu_device *smmu)
{
int i;
diff --git a/drivers/iommu/arm/arm-smmu/qcom_iommu.c b/drivers/iommu/arm/arm-smmu/qcom_iommu.c
index b91874cb6cf3..4469e4758dbb 100644
--- a/drivers/iommu/arm/arm-smmu/qcom_iommu.c
+++ b/drivers/iommu/arm/arm-smmu/qcom_iommu.c
@@ -78,6 +78,7 @@ static struct qcom_iommu_domain *to_qcom_iommu_domain(struct iommu_domain *dom)
}

static const struct iommu_ops qcom_iommu_ops;
+static const struct domain_ops qcom_domain_ops;

static struct qcom_iommu_dev * to_iommu(struct device *dev)
{
@@ -336,6 +337,7 @@ static struct iommu_domain *qcom_iommu_domain_alloc(unsigned type)

mutex_init(&qcom_domain->init_mutex);
spin_lock_init(&qcom_domain->pgtbl_lock);
+ iommu_domain_set_ops(&qcom_domain->domain, &qcom_domain_ops);

return &qcom_domain->domain;
}
@@ -590,7 +592,14 @@ static int qcom_iommu_of_xlate(struct device *dev, struct of_phandle_args *args)
static const struct iommu_ops qcom_iommu_ops = {
.capable = qcom_iommu_capable,
.domain_alloc = qcom_iommu_domain_alloc,
- .domain_free = qcom_iommu_domain_free,
+ .probe_device = qcom_iommu_probe_device,
+ .release_device = qcom_iommu_release_device,
+ .device_group = generic_device_group,
+ .of_xlate = qcom_iommu_of_xlate,
+ .pgsize_bitmap = SZ_4K | SZ_64K | SZ_1M | SZ_16M,
+};
+
+static const struct domain_ops qcom_domain_ops = {
.attach_dev = qcom_iommu_attach_dev,
.detach_dev = qcom_iommu_detach_dev,
.map = qcom_iommu_map,
@@ -598,11 +607,7 @@ static const struct iommu_ops qcom_iommu_ops = {
.flush_iotlb_all = qcom_iommu_flush_iotlb_all,
.iotlb_sync = qcom_iommu_iotlb_sync,
.iova_to_phys = qcom_iommu_iova_to_phys,
- .probe_device = qcom_iommu_probe_device,
- .release_device = qcom_iommu_release_device,
- .device_group = generic_device_group,
- .of_xlate = qcom_iommu_of_xlate,
- .pgsize_bitmap = SZ_4K | SZ_64K | SZ_1M | SZ_16M,
+ .release = qcom_iommu_domain_free,
};

static int qcom_iommu_sec_ptbl_init(struct device *dev)
diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c
index 939ffa768986..b69e8b52f3c4 100644
--- a/drivers/iommu/exynos-iommu.c
+++ b/drivers/iommu/exynos-iommu.c
@@ -563,6 +563,7 @@ static void sysmmu_tlb_invalidate_entry(struct sysmmu_drvdata *data,
}

static const struct iommu_ops exynos_iommu_ops;
+static const struct domain_ops exynos_domain_ops;

static int exynos_sysmmu_probe(struct platform_device *pdev)
{
@@ -767,6 +768,7 @@ static struct iommu_domain *exynos_iommu_domain_alloc(unsigned type)
domain->domain.geometry.aperture_start = 0;
domain->domain.geometry.aperture_end = ~0UL;
domain->domain.geometry.force_aperture = true;
+ iommu_domain_set_ops(&domain->domain, &exynos_domain_ops);

return &domain->domain;

@@ -1309,12 +1311,6 @@ static int exynos_iommu_of_xlate(struct device *dev,

static const struct iommu_ops exynos_iommu_ops = {
.domain_alloc = exynos_iommu_domain_alloc,
- .domain_free = exynos_iommu_domain_free,
- .attach_dev = exynos_iommu_attach_device,
- .detach_dev = exynos_iommu_detach_device,
- .map = exynos_iommu_map,
- .unmap = exynos_iommu_unmap,
- .iova_to_phys = exynos_iommu_iova_to_phys,
.device_group = generic_device_group,
.probe_device = exynos_iommu_probe_device,
.release_device = exynos_iommu_release_device,
@@ -1322,6 +1318,15 @@ static const struct iommu_ops exynos_iommu_ops = {
.of_xlate = exynos_iommu_of_xlate,
};

+static const struct domain_ops exynos_domain_ops = {
+ .attach_dev = exynos_iommu_attach_device,
+ .detach_dev = exynos_iommu_detach_device,
+ .map = exynos_iommu_map,
+ .unmap = exynos_iommu_unmap,
+ .iova_to_phys = exynos_iommu_iova_to_phys,
+ .release = exynos_iommu_domain_free,
+};
+
static int __init exynos_iommu_init(void)
{
struct device_node *np;
diff --git a/drivers/iommu/fsl_pamu_domain.c b/drivers/iommu/fsl_pamu_domain.c
index a47f47307109..0caae87d1471 100644
--- a/drivers/iommu/fsl_pamu_domain.c
+++ b/drivers/iommu/fsl_pamu_domain.c
@@ -20,6 +20,7 @@ static DEFINE_SPINLOCK(iommu_lock);
static struct kmem_cache *fsl_pamu_domain_cache;
static struct kmem_cache *iommu_devinfo_cache;
static DEFINE_SPINLOCK(device_domain_lock);
+static const struct domain_ops fsl_pamu_domain_ops;

struct iommu_device pamu_iommu; /* IOMMU core code handle */

@@ -210,6 +211,7 @@ static struct iommu_domain *fsl_pamu_domain_alloc(unsigned type)
dma_domain->iommu_domain. geometry.aperture_start = 0;
dma_domain->iommu_domain.geometry.aperture_end = (1ULL << 36) - 1;
dma_domain->iommu_domain.geometry.force_aperture = true;
+ iommu_domain_set_ops(&dma_domain->iommu_domain, &fsl_pamu_domain_ops);

return &dma_domain->iommu_domain;
}
@@ -453,15 +455,18 @@ static void fsl_pamu_release_device(struct device *dev)
static const struct iommu_ops fsl_pamu_ops = {
.capable = fsl_pamu_capable,
.domain_alloc = fsl_pamu_domain_alloc,
- .domain_free = fsl_pamu_domain_free,
- .attach_dev = fsl_pamu_attach_device,
- .detach_dev = fsl_pamu_detach_device,
- .iova_to_phys = fsl_pamu_iova_to_phys,
.probe_device = fsl_pamu_probe_device,
.release_device = fsl_pamu_release_device,
.device_group = fsl_pamu_device_group,
};

+static const struct domain_ops fsl_pamu_domain_ops = {
+ .attach_dev = fsl_pamu_attach_device,
+ .detach_dev = fsl_pamu_detach_device,
+ .iova_to_phys = fsl_pamu_iova_to_phys,
+ .release = fsl_pamu_domain_free,
+};
+
int __init pamu_domain_init(void)
{
int ret = 0;
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 2b5f4e57a8bb..1a5ea7c23e4b 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -385,6 +385,7 @@ int for_each_device_domain(int (*fn)(struct device_domain_info *info,
}

const struct iommu_ops intel_iommu_ops;
+static const struct domain_ops intel_domain_ops;

static bool translation_pre_enabled(struct intel_iommu *iommu)
{
@@ -2772,6 +2773,8 @@ static int __init si_domain_init(int hw)
return -EFAULT;
}

+ iommu_domain_set_ops(&si_domain->domain, &intel_domain_ops);
+
if (hw)
return 0;

@@ -4586,6 +4589,7 @@ static struct iommu_domain *intel_iommu_domain_alloc(unsigned type)
domain->geometry.aperture_end =
__DOMAIN_MAX_ADDR(dmar_domain->gaw);
domain->geometry.force_aperture = true;
+ iommu_domain_set_ops(domain, &intel_domain_ops);

return domain;
case IOMMU_DOMAIN_IDENTITY:
@@ -5093,15 +5097,6 @@ static void intel_iommu_iotlb_sync_map(struct iommu_domain *domain,
const struct iommu_ops intel_iommu_ops = {
.capable = intel_iommu_capable,
.domain_alloc = intel_iommu_domain_alloc,
- .domain_free = intel_iommu_domain_free,
- .attach_dev = intel_iommu_attach_device,
- .detach_dev = intel_iommu_detach_device,
- .map_pages = intel_iommu_map_pages,
- .unmap_pages = intel_iommu_unmap_pages,
- .iotlb_sync_map = intel_iommu_iotlb_sync_map,
- .flush_iotlb_all = intel_flush_iotlb_all,
- .iotlb_sync = intel_iommu_tlb_sync,
- .iova_to_phys = intel_iommu_iova_to_phys,
.probe_device = intel_iommu_probe_device,
.probe_finalize = intel_iommu_probe_finalize,
.release_device = intel_iommu_release_device,
@@ -5121,6 +5116,18 @@ const struct iommu_ops intel_iommu_ops = {
#endif
};

+static const struct domain_ops intel_domain_ops = {
+ .attach_dev = intel_iommu_attach_device,
+ .detach_dev = intel_iommu_detach_device,
+ .map_pages = intel_iommu_map_pages,
+ .unmap_pages = intel_iommu_unmap_pages,
+ .iotlb_sync_map = intel_iommu_iotlb_sync_map,
+ .flush_iotlb_all = intel_flush_iotlb_all,
+ .iotlb_sync = intel_iommu_tlb_sync,
+ .iova_to_phys = intel_iommu_iova_to_phys,
+ .release = intel_iommu_domain_free,
+};
+
static void quirk_iommu_igfx(struct pci_dev *dev)
{
if (risky_device(dev))
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 6631e2ea44df..b44e5ab12b8a 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1954,7 +1954,6 @@ static struct iommu_domain *__iommu_domain_alloc(struct bus_type *bus,
if (!domain)
return NULL;

- domain->ops = bus->iommu_ops;
domain->type = type;
/* Assume all sizes by default; the driver may override this later */
domain->pgsize_bitmap = bus->iommu_ops->pgsize_bitmap;
@@ -1975,7 +1974,7 @@ EXPORT_SYMBOL_GPL(iommu_domain_alloc);
void iommu_domain_free(struct iommu_domain *domain)
{
iommu_put_dma_cookie(domain);
- domain->ops->domain_free(domain);
+ domain->ops->release(domain);
}
EXPORT_SYMBOL_GPL(iommu_domain_free);

@@ -2248,7 +2247,7 @@ static int __iommu_map_pages(struct iommu_domain *domain, unsigned long iova,
phys_addr_t paddr, size_t size, int prot,
gfp_t gfp, size_t *mapped)
{
- const struct iommu_ops *ops = domain->ops;
+ const struct domain_ops *ops = domain->ops;
size_t pgsize, count;
int ret;

@@ -2271,7 +2270,7 @@ static int __iommu_map_pages(struct iommu_domain *domain, unsigned long iova,
static int __iommu_map(struct iommu_domain *domain, unsigned long iova,
phys_addr_t paddr, size_t size, int prot, gfp_t gfp)
{
- const struct iommu_ops *ops = domain->ops;
+ const struct domain_ops *ops = domain->ops;
unsigned long orig_iova = iova;
unsigned int min_pagesz;
size_t orig_size = size;
@@ -2331,7 +2330,7 @@ static int __iommu_map(struct iommu_domain *domain, unsigned long iova,
static int _iommu_map(struct iommu_domain *domain, unsigned long iova,
phys_addr_t paddr, size_t size, int prot, gfp_t gfp)
{
- const struct iommu_ops *ops = domain->ops;
+ const struct domain_ops *ops = domain->ops;
int ret;

ret = __iommu_map(domain, iova, paddr, size, prot, gfp);
@@ -2360,7 +2359,7 @@ static size_t __iommu_unmap_pages(struct iommu_domain *domain,
unsigned long iova, size_t size,
struct iommu_iotlb_gather *iotlb_gather)
{
- const struct iommu_ops *ops = domain->ops;
+ const struct domain_ops *ops = domain->ops;
size_t pgsize, count;

pgsize = iommu_pgsize(domain, iova, iova, size, &count);
@@ -2373,7 +2372,7 @@ static size_t __iommu_unmap(struct iommu_domain *domain,
unsigned long iova, size_t size,
struct iommu_iotlb_gather *iotlb_gather)
{
- const struct iommu_ops *ops = domain->ops;
+ const struct domain_ops *ops = domain->ops;
size_t unmapped_page, unmapped = 0;
unsigned long orig_iova = iova;
unsigned int min_pagesz;
@@ -2449,7 +2448,7 @@ static ssize_t __iommu_map_sg(struct iommu_domain *domain, unsigned long iova,
struct scatterlist *sg, unsigned int nents, int prot,
gfp_t gfp)
{
- const struct iommu_ops *ops = domain->ops;
+ const struct domain_ops *ops = domain->ops;
size_t len = 0, mapped = 0;
phys_addr_t start;
unsigned int i = 0;
diff --git a/drivers/iommu/ipmmu-vmsa.c b/drivers/iommu/ipmmu-vmsa.c
index ca752bdc710f..6b92130181c3 100644
--- a/drivers/iommu/ipmmu-vmsa.c
+++ b/drivers/iommu/ipmmu-vmsa.c
@@ -566,6 +566,7 @@ static irqreturn_t ipmmu_irq(int irq, void *dev)
/* -----------------------------------------------------------------------------
* IOMMU Operations
*/
+static const struct domain_ops ipmmu_domain_ops;

static struct iommu_domain *ipmmu_domain_alloc(unsigned type)
{
@@ -579,6 +580,7 @@ static struct iommu_domain *ipmmu_domain_alloc(unsigned type)
return NULL;

mutex_init(&domain->mutex);
+ iommu_domain_set_ops(&domain->io_domain, &ipmmu_domain_ops);

return &domain->io_domain;
}
@@ -868,14 +870,6 @@ static struct iommu_group *ipmmu_find_group(struct device *dev)

static const struct iommu_ops ipmmu_ops = {
.domain_alloc = ipmmu_domain_alloc,
- .domain_free = ipmmu_domain_free,
- .attach_dev = ipmmu_attach_device,
- .detach_dev = ipmmu_detach_device,
- .map = ipmmu_map,
- .unmap = ipmmu_unmap,
- .flush_iotlb_all = ipmmu_flush_iotlb_all,
- .iotlb_sync = ipmmu_iotlb_sync,
- .iova_to_phys = ipmmu_iova_to_phys,
.probe_device = ipmmu_probe_device,
.release_device = ipmmu_release_device,
.probe_finalize = ipmmu_probe_finalize,
@@ -885,6 +879,17 @@ static const struct iommu_ops ipmmu_ops = {
.of_xlate = ipmmu_of_xlate,
};

+static const struct domain_ops ipmmu_domain_ops = {
+ .attach_dev = ipmmu_attach_device,
+ .detach_dev = ipmmu_detach_device,
+ .map = ipmmu_map,
+ .unmap = ipmmu_unmap,
+ .flush_iotlb_all = ipmmu_flush_iotlb_all,
+ .iotlb_sync = ipmmu_iotlb_sync,
+ .iova_to_phys = ipmmu_iova_to_phys,
+ .release = ipmmu_domain_free,
+};
+
/* -----------------------------------------------------------------------------
* Probe/remove and init
*/
diff --git a/drivers/iommu/msm_iommu.c b/drivers/iommu/msm_iommu.c
index 3a38352b603f..757e46c84044 100644
--- a/drivers/iommu/msm_iommu.c
+++ b/drivers/iommu/msm_iommu.c
@@ -36,6 +36,7 @@ __asm__ __volatile__ ( \
static DEFINE_SPINLOCK(msm_iommu_lock);
static LIST_HEAD(qcom_iommu_devices);
static struct iommu_ops msm_iommu_ops;
+static const struct domain_ops msm_domain_ops;

struct msm_priv {
struct list_head list_attached;
@@ -318,6 +319,7 @@ static struct iommu_domain *msm_iommu_domain_alloc(unsigned type)
priv->domain.geometry.aperture_start = 0;
priv->domain.geometry.aperture_end = (1ULL << 32) - 1;
priv->domain.geometry.force_aperture = true;
+ iommu_domain_set_ops(&priv->domain, &msm_domain_ops);

return &priv->domain;

@@ -674,7 +676,14 @@ irqreturn_t msm_iommu_fault_handler(int irq, void *dev_id)
static struct iommu_ops msm_iommu_ops = {
.capable = msm_iommu_capable,
.domain_alloc = msm_iommu_domain_alloc,
- .domain_free = msm_iommu_domain_free,
+ .probe_device = msm_iommu_probe_device,
+ .release_device = msm_iommu_release_device,
+ .device_group = generic_device_group,
+ .pgsize_bitmap = MSM_IOMMU_PGSIZES,
+ .of_xlate = qcom_iommu_of_xlate,
+};
+
+static const struct domain_ops msm_domain_ops = {
.attach_dev = msm_iommu_attach_dev,
.detach_dev = msm_iommu_detach_dev,
.map = msm_iommu_map,
@@ -688,11 +697,7 @@ static struct iommu_ops msm_iommu_ops = {
.iotlb_sync = NULL,
.iotlb_sync_map = msm_iommu_sync_map,
.iova_to_phys = msm_iommu_iova_to_phys,
- .probe_device = msm_iommu_probe_device,
- .release_device = msm_iommu_release_device,
- .device_group = generic_device_group,
- .pgsize_bitmap = MSM_IOMMU_PGSIZES,
- .of_xlate = qcom_iommu_of_xlate,
+ .release = msm_iommu_domain_free,
};

static int msm_iommu_probe(struct platform_device *pdev)
diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index 25b834104790..45c05899cfd9 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -131,6 +131,7 @@ struct mtk_iommu_domain {
};

static const struct iommu_ops mtk_iommu_ops;
+static const struct domain_ops mtk_domain_ops;

static int mtk_iommu_hw_init(const struct mtk_iommu_data *data);

@@ -440,6 +441,8 @@ static struct iommu_domain *mtk_iommu_domain_alloc(unsigned type)
if (!dom)
return NULL;

+ iommu_domain_set_ops(&dom->domain, &mtk_domain_ops);
+
return &dom->domain;
}

@@ -658,15 +661,6 @@ static void mtk_iommu_get_resv_regions(struct device *dev,

static const struct iommu_ops mtk_iommu_ops = {
.domain_alloc = mtk_iommu_domain_alloc,
- .domain_free = mtk_iommu_domain_free,
- .attach_dev = mtk_iommu_attach_device,
- .detach_dev = mtk_iommu_detach_device,
- .map = mtk_iommu_map,
- .unmap = mtk_iommu_unmap,
- .flush_iotlb_all = mtk_iommu_flush_iotlb_all,
- .iotlb_sync = mtk_iommu_iotlb_sync,
- .iotlb_sync_map = mtk_iommu_sync_map,
- .iova_to_phys = mtk_iommu_iova_to_phys,
.probe_device = mtk_iommu_probe_device,
.release_device = mtk_iommu_release_device,
.device_group = mtk_iommu_device_group,
@@ -677,6 +671,18 @@ static const struct iommu_ops mtk_iommu_ops = {
.owner = THIS_MODULE,
};

+static const struct domain_ops mtk_domain_ops = {
+ .attach_dev = mtk_iommu_attach_device,
+ .detach_dev = mtk_iommu_detach_device,
+ .map = mtk_iommu_map,
+ .unmap = mtk_iommu_unmap,
+ .flush_iotlb_all = mtk_iommu_flush_iotlb_all,
+ .iotlb_sync = mtk_iommu_iotlb_sync,
+ .iotlb_sync_map = mtk_iommu_sync_map,
+ .iova_to_phys = mtk_iommu_iova_to_phys,
+ .release = mtk_iommu_domain_free,
+};
+
static int mtk_iommu_hw_init(const struct mtk_iommu_data *data)
{
u32 regval;
diff --git a/drivers/iommu/mtk_iommu_v1.c b/drivers/iommu/mtk_iommu_v1.c
index be22fcf988ce..10a9d6be42ff 100644
--- a/drivers/iommu/mtk_iommu_v1.c
+++ b/drivers/iommu/mtk_iommu_v1.c
@@ -87,6 +87,8 @@
*/
#define M2701_IOMMU_PGT_SIZE SZ_4M

+static const struct domain_ops mtk_domain_ops;
+
struct mtk_iommu_domain {
spinlock_t pgtlock; /* lock for page table */
struct iommu_domain domain;
@@ -246,6 +248,8 @@ static struct iommu_domain *mtk_iommu_domain_alloc(unsigned type)
if (!dom)
return NULL;

+ iommu_domain_set_ops(&dom->domain, &mtk_domain_ops);
+
return &dom->domain;
}

@@ -514,12 +518,6 @@ static int mtk_iommu_hw_init(const struct mtk_iommu_data *data)

static const struct iommu_ops mtk_iommu_ops = {
.domain_alloc = mtk_iommu_domain_alloc,
- .domain_free = mtk_iommu_domain_free,
- .attach_dev = mtk_iommu_attach_device,
- .detach_dev = mtk_iommu_detach_device,
- .map = mtk_iommu_map,
- .unmap = mtk_iommu_unmap,
- .iova_to_phys = mtk_iommu_iova_to_phys,
.probe_device = mtk_iommu_probe_device,
.probe_finalize = mtk_iommu_probe_finalize,
.release_device = mtk_iommu_release_device,
@@ -529,6 +527,15 @@ static const struct iommu_ops mtk_iommu_ops = {
.owner = THIS_MODULE,
};

+static const struct domain_ops mtk_domain_ops = {
+ .attach_dev = mtk_iommu_attach_device,
+ .detach_dev = mtk_iommu_detach_device,
+ .map = mtk_iommu_map,
+ .unmap = mtk_iommu_unmap,
+ .iova_to_phys = mtk_iommu_iova_to_phys,
+ .release = mtk_iommu_domain_free,
+};
+
static const struct of_device_id mtk_iommu_of_ids[] = {
{ .compatible = "mediatek,mt2701-m4u", },
{}
diff --git a/drivers/iommu/omap-iommu.c b/drivers/iommu/omap-iommu.c
index 91749654fd49..ecd9cb589a31 100644
--- a/drivers/iommu/omap-iommu.c
+++ b/drivers/iommu/omap-iommu.c
@@ -33,6 +33,7 @@
#include "omap-iommu.h"

static const struct iommu_ops omap_iommu_ops;
+static const struct domain_ops omap_domain_ops;

#define to_iommu(dev) ((struct omap_iommu *)dev_get_drvdata(dev))

@@ -1582,6 +1583,7 @@ static struct iommu_domain *omap_iommu_domain_alloc(unsigned type)
omap_domain->domain.geometry.aperture_start = 0;
omap_domain->domain.geometry.aperture_end = (1ULL << 32) - 1;
omap_domain->domain.geometry.force_aperture = true;
+ iommu_domain_set_ops(&omap_domain->domain, &omap_domain_ops);

return &omap_domain->domain;
}
@@ -1734,16 +1736,19 @@ static struct iommu_group *omap_iommu_device_group(struct device *dev)

static const struct iommu_ops omap_iommu_ops = {
.domain_alloc = omap_iommu_domain_alloc,
- .domain_free = omap_iommu_domain_free,
+ .probe_device = omap_iommu_probe_device,
+ .release_device = omap_iommu_release_device,
+ .device_group = omap_iommu_device_group,
+ .pgsize_bitmap = OMAP_IOMMU_PGSIZES,
+};
+
+static const struct domain_ops omap_domain_ops = {
.attach_dev = omap_iommu_attach_dev,
.detach_dev = omap_iommu_detach_dev,
.map = omap_iommu_map,
.unmap = omap_iommu_unmap,
.iova_to_phys = omap_iommu_iova_to_phys,
- .probe_device = omap_iommu_probe_device,
- .release_device = omap_iommu_release_device,
- .device_group = omap_iommu_device_group,
- .pgsize_bitmap = OMAP_IOMMU_PGSIZES,
+ .release = omap_iommu_domain_free,
};

static int __init omap_iommu_init(void)
diff --git a/drivers/iommu/rockchip-iommu.c b/drivers/iommu/rockchip-iommu.c
index 7f23ad61c094..0261c044ded1 100644
--- a/drivers/iommu/rockchip-iommu.c
+++ b/drivers/iommu/rockchip-iommu.c
@@ -124,6 +124,7 @@ struct rk_iommudata {

static struct device *dma_dev;
static const struct rk_iommu_ops *rk_ops;
+static const struct domain_ops rk_domain_ops;

static inline void rk_table_flush(struct rk_iommu_domain *dom, dma_addr_t dma,
unsigned int count)
@@ -1096,6 +1097,7 @@ static struct iommu_domain *rk_iommu_domain_alloc(unsigned type)
rk_domain->domain.geometry.aperture_start = 0;
rk_domain->domain.geometry.aperture_end = DMA_BIT_MASK(32);
rk_domain->domain.geometry.force_aperture = true;
+ iommu_domain_set_ops(&rk_domain->domain, &rk_domain_ops);

return &rk_domain->domain;

@@ -1187,19 +1189,22 @@ static int rk_iommu_of_xlate(struct device *dev,

static const struct iommu_ops rk_iommu_ops = {
.domain_alloc = rk_iommu_domain_alloc,
- .domain_free = rk_iommu_domain_free,
- .attach_dev = rk_iommu_attach_device,
- .detach_dev = rk_iommu_detach_device,
- .map = rk_iommu_map,
- .unmap = rk_iommu_unmap,
.probe_device = rk_iommu_probe_device,
.release_device = rk_iommu_release_device,
- .iova_to_phys = rk_iommu_iova_to_phys,
.device_group = rk_iommu_device_group,
.pgsize_bitmap = RK_IOMMU_PGSIZE_BITMAP,
.of_xlate = rk_iommu_of_xlate,
};

+static const struct domain_ops rk_domain_ops = {
+ .attach_dev = rk_iommu_attach_device,
+ .detach_dev = rk_iommu_detach_device,
+ .map = rk_iommu_map,
+ .unmap = rk_iommu_unmap,
+ .iova_to_phys = rk_iommu_iova_to_phys,
+ .release = rk_iommu_domain_free,
+};
+
static int rk_iommu_probe(struct platform_device *pdev)
{
struct device *dev = &pdev->dev;
diff --git a/drivers/iommu/s390-iommu.c b/drivers/iommu/s390-iommu.c
index 50860ebdd087..70011eedef7c 100644
--- a/drivers/iommu/s390-iommu.c
+++ b/drivers/iommu/s390-iommu.c
@@ -20,6 +20,7 @@
#define S390_IOMMU_PGSIZES (~0xFFFUL)

static const struct iommu_ops s390_iommu_ops;
+static const struct domain_ops s390_domain_ops;

struct s390_domain {
struct iommu_domain domain;
@@ -71,6 +72,7 @@ static struct iommu_domain *s390_domain_alloc(unsigned domain_type)
spin_lock_init(&s390_domain->dma_table_lock);
spin_lock_init(&s390_domain->list_lock);
INIT_LIST_HEAD(&s390_domain->devices);
+ iommu_domain_set_ops(&s390_domain->domain, &s390_domain_ops);

return &s390_domain->domain;
}
@@ -363,16 +365,19 @@ void zpci_destroy_iommu(struct zpci_dev *zdev)
static const struct iommu_ops s390_iommu_ops = {
.capable = s390_iommu_capable,
.domain_alloc = s390_domain_alloc,
- .domain_free = s390_domain_free,
+ .probe_device = s390_iommu_probe_device,
+ .release_device = s390_iommu_release_device,
+ .device_group = generic_device_group,
+ .pgsize_bitmap = S390_IOMMU_PGSIZES,
+};
+
+static const struct domain_ops s390_domain_ops = {
.attach_dev = s390_iommu_attach_device,
.detach_dev = s390_iommu_detach_device,
.map = s390_iommu_map,
.unmap = s390_iommu_unmap,
.iova_to_phys = s390_iommu_iova_to_phys,
- .probe_device = s390_iommu_probe_device,
- .release_device = s390_iommu_release_device,
- .device_group = generic_device_group,
- .pgsize_bitmap = S390_IOMMU_PGSIZES,
+ .release = s390_domain_free,
};

static int __init s390_iommu_init(void)
diff --git a/drivers/iommu/sprd-iommu.c b/drivers/iommu/sprd-iommu.c
index 27ac818b0354..9fee9eafbff8 100644
--- a/drivers/iommu/sprd-iommu.c
+++ b/drivers/iommu/sprd-iommu.c
@@ -81,6 +81,7 @@ struct sprd_iommu_domain {
};

static const struct iommu_ops sprd_iommu_ops;
+static const struct domain_ops sprd_domain_ops;

static struct sprd_iommu_domain *to_sprd_domain(struct iommu_domain *dom)
{
@@ -147,6 +148,7 @@ static struct iommu_domain *sprd_iommu_domain_alloc(unsigned int domain_type)

dom->domain.geometry.aperture_start = 0;
dom->domain.geometry.aperture_end = SZ_256M - 1;
+ iommu_domain_set_ops(&dom->domain, &sprd_domain_ops);

return &dom->domain;
}
@@ -416,7 +418,15 @@ static int sprd_iommu_of_xlate(struct device *dev, struct of_phandle_args *args)

static const struct iommu_ops sprd_iommu_ops = {
.domain_alloc = sprd_iommu_domain_alloc,
- .domain_free = sprd_iommu_domain_free,
+ .probe_device = sprd_iommu_probe_device,
+ .release_device = sprd_iommu_release_device,
+ .device_group = sprd_iommu_device_group,
+ .of_xlate = sprd_iommu_of_xlate,
+ .pgsize_bitmap = ~0UL << SPRD_IOMMU_PAGE_SHIFT,
+ .owner = THIS_MODULE,
+};
+
+static const struct domain_ops sprd_domain_ops = {
.attach_dev = sprd_iommu_attach_device,
.detach_dev = sprd_iommu_detach_device,
.map = sprd_iommu_map,
@@ -424,12 +434,7 @@ static const struct iommu_ops sprd_iommu_ops = {
.iotlb_sync_map = sprd_iommu_sync_map,
.iotlb_sync = sprd_iommu_sync,
.iova_to_phys = sprd_iommu_iova_to_phys,
- .probe_device = sprd_iommu_probe_device,
- .release_device = sprd_iommu_release_device,
- .device_group = sprd_iommu_device_group,
- .of_xlate = sprd_iommu_of_xlate,
- .pgsize_bitmap = ~0UL << SPRD_IOMMU_PAGE_SHIFT,
- .owner = THIS_MODULE,
+ .release = sprd_iommu_domain_free,
};

static const struct of_device_id sprd_iommu_of_match[] = {
diff --git a/drivers/iommu/sun50i-iommu.c b/drivers/iommu/sun50i-iommu.c
index 92997021e188..7b5617695183 100644
--- a/drivers/iommu/sun50i-iommu.c
+++ b/drivers/iommu/sun50i-iommu.c
@@ -596,6 +596,8 @@ static phys_addr_t sun50i_iommu_iova_to_phys(struct iommu_domain *domain,
sun50i_iova_get_page_offset(iova);
}

+static const struct domain_ops sun50i_domain_ops;
+
static struct iommu_domain *sun50i_iommu_domain_alloc(unsigned type)
{
struct sun50i_iommu_domain *sun50i_domain;
@@ -619,6 +621,7 @@ static struct iommu_domain *sun50i_iommu_domain_alloc(unsigned type)
sun50i_domain->domain.geometry.aperture_start = 0;
sun50i_domain->domain.geometry.aperture_end = DMA_BIT_MASK(32);
sun50i_domain->domain.geometry.force_aperture = true;
+ iommu_domain_set_ops(&sun50i_domain->domain, &sun50i_domain_ops);

return &sun50i_domain->domain;

@@ -760,19 +763,22 @@ static int sun50i_iommu_of_xlate(struct device *dev,

static const struct iommu_ops sun50i_iommu_ops = {
.pgsize_bitmap = SZ_4K,
- .attach_dev = sun50i_iommu_attach_device,
- .detach_dev = sun50i_iommu_detach_device,
.device_group = sun50i_iommu_device_group,
.domain_alloc = sun50i_iommu_domain_alloc,
- .domain_free = sun50i_iommu_domain_free,
+ .of_xlate = sun50i_iommu_of_xlate,
+ .probe_device = sun50i_iommu_probe_device,
+ .release_device = sun50i_iommu_release_device,
+};
+
+static const struct domain_ops sun50i_domain_ops = {
+ .attach_dev = sun50i_iommu_attach_device,
+ .detach_dev = sun50i_iommu_detach_device,
.flush_iotlb_all = sun50i_iommu_flush_iotlb_all,
.iotlb_sync = sun50i_iommu_iotlb_sync,
.iova_to_phys = sun50i_iommu_iova_to_phys,
.map = sun50i_iommu_map,
- .of_xlate = sun50i_iommu_of_xlate,
- .probe_device = sun50i_iommu_probe_device,
- .release_device = sun50i_iommu_release_device,
.unmap = sun50i_iommu_unmap,
+ .release = sun50i_iommu_domain_free,
};

static void sun50i_iommu_report_fault(struct sun50i_iommu *iommu,
diff --git a/drivers/iommu/tegra-gart.c b/drivers/iommu/tegra-gart.c
index 6a358f92c7e5..cada6934c664 100644
--- a/drivers/iommu/tegra-gart.c
+++ b/drivers/iommu/tegra-gart.c
@@ -49,6 +49,7 @@ struct gart_device {
static struct gart_device *gart_handle; /* unique for a system */

static bool gart_debug;
+static const struct domain_ops gart_domain_ops;

/*
* Any interaction between any block on PPSB and a block on APB or AHB
@@ -153,6 +154,7 @@ static struct iommu_domain *gart_iommu_domain_alloc(unsigned type)
domain->geometry.aperture_start = gart_handle->iovmm_base;
domain->geometry.aperture_end = gart_handle->iovmm_end - 1;
domain->geometry.force_aperture = true;
+ iommu_domain_set_ops(domain, &gart_domain_ops);
}

return domain;
@@ -278,19 +280,22 @@ static void gart_iommu_sync(struct iommu_domain *domain,
static const struct iommu_ops gart_iommu_ops = {
.capable = gart_iommu_capable,
.domain_alloc = gart_iommu_domain_alloc,
- .domain_free = gart_iommu_domain_free,
- .attach_dev = gart_iommu_attach_dev,
- .detach_dev = gart_iommu_detach_dev,
.probe_device = gart_iommu_probe_device,
.release_device = gart_iommu_release_device,
.device_group = generic_device_group,
+ .pgsize_bitmap = GART_IOMMU_PGSIZES,
+ .of_xlate = gart_iommu_of_xlate,
+};
+
+static const struct domain_ops gart_domain_ops = {
+ .attach_dev = gart_iommu_attach_dev,
+ .detach_dev = gart_iommu_detach_dev,
.map = gart_iommu_map,
.unmap = gart_iommu_unmap,
.iova_to_phys = gart_iommu_iova_to_phys,
- .pgsize_bitmap = GART_IOMMU_PGSIZES,
- .of_xlate = gart_iommu_of_xlate,
.iotlb_sync_map = gart_iommu_sync_map,
.iotlb_sync = gart_iommu_sync,
+ .release = gart_iommu_domain_free,
};

int tegra_gart_suspend(struct gart_device *gart)
diff --git a/drivers/iommu/tegra-smmu.c b/drivers/iommu/tegra-smmu.c
index e900e3c46903..e9ea0454ba9d 100644
--- a/drivers/iommu/tegra-smmu.c
+++ b/drivers/iommu/tegra-smmu.c
@@ -62,6 +62,8 @@ struct tegra_smmu_as {
u32 attr;
};

+static const struct domain_ops tegra_smmu_domain_ops;
+
static struct tegra_smmu_as *to_smmu_as(struct iommu_domain *dom)
{
return container_of(dom, struct tegra_smmu_as, domain);
@@ -317,6 +319,7 @@ static struct iommu_domain *tegra_smmu_domain_alloc(unsigned type)
as->domain.geometry.aperture_start = 0;
as->domain.geometry.aperture_end = 0xffffffff;
as->domain.geometry.force_aperture = true;
+ iommu_domain_set_ops(&as->domain, &tegra_smmu_domain_ops);

return &as->domain;
}
@@ -969,17 +972,20 @@ static int tegra_smmu_of_xlate(struct device *dev,
static const struct iommu_ops tegra_smmu_ops = {
.capable = tegra_smmu_capable,
.domain_alloc = tegra_smmu_domain_alloc,
- .domain_free = tegra_smmu_domain_free,
- .attach_dev = tegra_smmu_attach_dev,
- .detach_dev = tegra_smmu_detach_dev,
.probe_device = tegra_smmu_probe_device,
.release_device = tegra_smmu_release_device,
.device_group = tegra_smmu_device_group,
+ .of_xlate = tegra_smmu_of_xlate,
+ .pgsize_bitmap = SZ_4K,
+};
+
+static const struct domain_ops tegra_smmu_domain_ops = {
+ .attach_dev = tegra_smmu_attach_dev,
+ .detach_dev = tegra_smmu_detach_dev,
.map = tegra_smmu_map,
.unmap = tegra_smmu_unmap,
.iova_to_phys = tegra_smmu_iova_to_phys,
- .of_xlate = tegra_smmu_of_xlate,
- .pgsize_bitmap = SZ_4K,
+ .release = tegra_smmu_domain_free,
};

static void tegra_smmu_ahb_enable(void)
diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index f2aa34f57454..538409592c6a 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -98,6 +98,8 @@ struct viommu_event {
};
};

+static const struct domain_ops viommu_domain_ops;
+
#define to_viommu_domain(domain) \
container_of(domain, struct viommu_domain, domain)

@@ -652,6 +654,7 @@ static struct iommu_domain *viommu_domain_alloc(unsigned type)
mutex_init(&vdomain->mutex);
spin_lock_init(&vdomain->mappings_lock);
vdomain->mappings = RB_ROOT_CACHED;
+ iommu_domain_set_ops(&vdomain->domain, &viommu_domain_ops);

return &vdomain->domain;
}
@@ -1008,12 +1011,6 @@ static int viommu_of_xlate(struct device *dev, struct of_phandle_args *args)

static struct iommu_ops viommu_ops = {
.domain_alloc = viommu_domain_alloc,
- .domain_free = viommu_domain_free,
- .attach_dev = viommu_attach_dev,
- .map = viommu_map,
- .unmap = viommu_unmap,
- .iova_to_phys = viommu_iova_to_phys,
- .iotlb_sync = viommu_iotlb_sync,
.probe_device = viommu_probe_device,
.probe_finalize = viommu_probe_finalize,
.release_device = viommu_release_device,
@@ -1024,6 +1021,15 @@ static struct iommu_ops viommu_ops = {
.owner = THIS_MODULE,
};

+static const struct domain_ops viommu_domain_ops = {
+ .attach_dev = viommu_attach_dev,
+ .map = viommu_map,
+ .unmap = viommu_unmap,
+ .iova_to_phys = viommu_iova_to_phys,
+ .iotlb_sync = viommu_iotlb_sync,
+ .release = viommu_domain_free,
+};
+
static int viommu_init_vqs(struct viommu_dev *viommu)
{
struct virtio_device *vdev = dev_to_virtio(viommu->dev);
--
2.25.1

2022-01-24 18:03:37

by Lu Baolu

[permalink] [raw]
Subject: [PATCH 3/7] iommu/vt-d: Remove aux-domain related callbacks

The aux-domain related callbacks are not called in the tree. Remove them
to avoid dead code.

Signed-off-by: Lu Baolu <[email protected]>
---
include/linux/intel-iommu.h | 17 --
drivers/iommu/intel/debugfs.c | 3 +-
drivers/iommu/intel/iommu.c | 309 +---------------------------------
3 files changed, 3 insertions(+), 326 deletions(-)

diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index beaacb589abd..5cfda90b2cca 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -542,7 +542,6 @@ struct dmar_domain {
u8 iommu_snooping: 1; /* indicate snooping control feature */

struct list_head devices; /* all devices' list */
- struct list_head subdevices; /* all subdevices' list */
struct iova_domain iovad; /* iova's that belong to this domain */

struct dma_pte *pgd; /* virtual address */
@@ -557,11 +556,6 @@ struct dmar_domain {
2 == 1GiB, 3 == 512GiB, 4 == 1TiB */
u64 max_addr; /* maximum mapped address */

- u32 default_pasid; /*
- * The default pasid used for non-SVM
- * traffic on mediated devices.
- */
-
struct iommu_domain domain; /* generic domain data structure for
iommu core */
};
@@ -614,21 +608,11 @@ struct intel_iommu {
void *perf_statistic;
};

-/* Per subdevice private data */
-struct subdev_domain_info {
- struct list_head link_phys; /* link to phys device siblings */
- struct list_head link_domain; /* link to domain siblings */
- struct device *pdev; /* physical device derived from */
- struct dmar_domain *domain; /* aux-domain */
- int users; /* user count */
-};
-
/* PCI domain-device relationship */
struct device_domain_info {
struct list_head link; /* link to domain siblings */
struct list_head global; /* link to global list */
struct list_head table; /* link to pasid table */
- struct list_head subdevices; /* subdevices sibling */
u32 segment; /* PCI segment number */
u8 bus; /* PCI bus number */
u8 devfn; /* PCI devfn number */
@@ -639,7 +623,6 @@ struct device_domain_info {
u8 pri_enabled:1;
u8 ats_supported:1;
u8 ats_enabled:1;
- u8 auxd_enabled:1; /* Multiple domains per device */
u8 ats_qdep;
struct device *dev; /* it's NULL for PCIe-to-PCI bridge */
struct intel_iommu *iommu; /* IOMMU used by this device */
diff --git a/drivers/iommu/intel/debugfs.c b/drivers/iommu/intel/debugfs.c
index 62e23ff3c987..db7a0ca73626 100644
--- a/drivers/iommu/intel/debugfs.c
+++ b/drivers/iommu/intel/debugfs.c
@@ -351,8 +351,7 @@ static int show_device_domain_translation(struct device *dev, void *data)
if (!domain)
return 0;

- seq_printf(m, "Device %s with pasid %d @0x%llx\n",
- dev_name(dev), domain->default_pasid,
+ seq_printf(m, "Device %s @0x%llx\n", dev_name(dev),
(u64)virt_to_phys(domain->pgd));
seq_puts(m, "IOVA_PFN\t\tPML5E\t\t\tPML4E\t\t\tPDPE\t\t\tPDE\t\t\tPTE\n");

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 4f9d95067c8f..2b5f4e57a8bb 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -1573,18 +1573,6 @@ static void domain_update_iotlb(struct dmar_domain *domain)
break;
}

- if (!has_iotlb_device) {
- struct subdev_domain_info *sinfo;
-
- list_for_each_entry(sinfo, &domain->subdevices, link_domain) {
- info = get_domain_info(sinfo->pdev);
- if (info && info->ats_enabled) {
- has_iotlb_device = true;
- break;
- }
- }
- }
-
domain->has_iotlb_device = has_iotlb_device;
}

@@ -1682,7 +1670,6 @@ static void iommu_flush_dev_iotlb(struct dmar_domain *domain,
{
unsigned long flags;
struct device_domain_info *info;
- struct subdev_domain_info *sinfo;

if (!domain->has_iotlb_device)
return;
@@ -1691,27 +1678,9 @@ static void iommu_flush_dev_iotlb(struct dmar_domain *domain,
list_for_each_entry(info, &domain->devices, link)
__iommu_flush_dev_iotlb(info, addr, mask);

- list_for_each_entry(sinfo, &domain->subdevices, link_domain) {
- info = get_domain_info(sinfo->pdev);
- __iommu_flush_dev_iotlb(info, addr, mask);
- }
spin_unlock_irqrestore(&device_domain_lock, flags);
}

-static void domain_flush_piotlb(struct intel_iommu *iommu,
- struct dmar_domain *domain,
- u64 addr, unsigned long npages, bool ih)
-{
- u16 did = domain->iommu_did[iommu->seq_id];
-
- if (domain->default_pasid)
- qi_flush_piotlb(iommu, did, domain->default_pasid,
- addr, npages, ih);
-
- if (!list_empty(&domain->devices))
- qi_flush_piotlb(iommu, did, PASID_RID2PASID, addr, npages, ih);
-}
-
static void iommu_flush_iotlb_psi(struct intel_iommu *iommu,
struct dmar_domain *domain,
unsigned long pfn, unsigned int pages,
@@ -1727,7 +1696,7 @@ static void iommu_flush_iotlb_psi(struct intel_iommu *iommu,
ih = 1 << 6;

if (domain_use_first_level(domain)) {
- domain_flush_piotlb(iommu, domain, addr, pages, ih);
+ qi_flush_piotlb(iommu, did, PASID_RID2PASID, addr, pages, ih);
} else {
/*
* Fallback to domain selective flush if no PSI support or
@@ -1776,7 +1745,7 @@ static void intel_flush_iotlb_all(struct iommu_domain *domain)
u16 did = dmar_domain->iommu_did[iommu->seq_id];

if (domain_use_first_level(dmar_domain))
- domain_flush_piotlb(iommu, dmar_domain, 0, -1, 0);
+ qi_flush_piotlb(iommu, did, PASID_RID2PASID, 0, -1, 0);
else
iommu->flush.flush_iotlb(iommu, did, 0, 0,
DMA_TLB_DSI_FLUSH);
@@ -1983,7 +1952,6 @@ static struct dmar_domain *alloc_domain(unsigned int type)
domain->flags |= DOMAIN_FLAG_USE_FIRST_LEVEL;
domain->has_iotlb_device = false;
INIT_LIST_HEAD(&domain->devices);
- INIT_LIST_HEAD(&domain->subdevices);

return domain;
}
@@ -2676,8 +2644,6 @@ static struct dmar_domain *dmar_insert_one_dev_info(struct intel_iommu *iommu,
info->domain = domain;
info->iommu = iommu;
info->pasid_table = NULL;
- info->auxd_enabled = 0;
- INIT_LIST_HEAD(&info->subdevices);

if (dev && dev_is_pci(dev)) {
struct pci_dev *pdev = to_pci_dev(info->dev);
@@ -4637,183 +4603,6 @@ static void intel_iommu_domain_free(struct iommu_domain *domain)
domain_exit(to_dmar_domain(domain));
}

-/*
- * Check whether a @domain could be attached to the @dev through the
- * aux-domain attach/detach APIs.
- */
-static inline bool
-is_aux_domain(struct device *dev, struct iommu_domain *domain)
-{
- struct device_domain_info *info = get_domain_info(dev);
-
- return info && info->auxd_enabled &&
- domain->type == IOMMU_DOMAIN_UNMANAGED;
-}
-
-static inline struct subdev_domain_info *
-lookup_subdev_info(struct dmar_domain *domain, struct device *dev)
-{
- struct subdev_domain_info *sinfo;
-
- if (!list_empty(&domain->subdevices)) {
- list_for_each_entry(sinfo, &domain->subdevices, link_domain) {
- if (sinfo->pdev == dev)
- return sinfo;
- }
- }
-
- return NULL;
-}
-
-static int auxiliary_link_device(struct dmar_domain *domain,
- struct device *dev)
-{
- struct device_domain_info *info = get_domain_info(dev);
- struct subdev_domain_info *sinfo = lookup_subdev_info(domain, dev);
-
- assert_spin_locked(&device_domain_lock);
- if (WARN_ON(!info))
- return -EINVAL;
-
- if (!sinfo) {
- sinfo = kzalloc(sizeof(*sinfo), GFP_ATOMIC);
- if (!sinfo)
- return -ENOMEM;
- sinfo->domain = domain;
- sinfo->pdev = dev;
- list_add(&sinfo->link_phys, &info->subdevices);
- list_add(&sinfo->link_domain, &domain->subdevices);
- }
-
- return ++sinfo->users;
-}
-
-static int auxiliary_unlink_device(struct dmar_domain *domain,
- struct device *dev)
-{
- struct device_domain_info *info = get_domain_info(dev);
- struct subdev_domain_info *sinfo = lookup_subdev_info(domain, dev);
- int ret;
-
- assert_spin_locked(&device_domain_lock);
- if (WARN_ON(!info || !sinfo || sinfo->users <= 0))
- return -EINVAL;
-
- ret = --sinfo->users;
- if (!ret) {
- list_del(&sinfo->link_phys);
- list_del(&sinfo->link_domain);
- kfree(sinfo);
- }
-
- return ret;
-}
-
-static int aux_domain_add_dev(struct dmar_domain *domain,
- struct device *dev)
-{
- int ret;
- unsigned long flags;
- struct intel_iommu *iommu;
-
- iommu = device_to_iommu(dev, NULL, NULL);
- if (!iommu)
- return -ENODEV;
-
- if (domain->default_pasid <= 0) {
- u32 pasid;
-
- /* No private data needed for the default pasid */
- pasid = ioasid_alloc(NULL, PASID_MIN,
- pci_max_pasids(to_pci_dev(dev)) - 1,
- NULL);
- if (pasid == INVALID_IOASID) {
- pr_err("Can't allocate default pasid\n");
- return -ENODEV;
- }
- domain->default_pasid = pasid;
- }
-
- spin_lock_irqsave(&device_domain_lock, flags);
- ret = auxiliary_link_device(domain, dev);
- if (ret <= 0)
- goto link_failed;
-
- /*
- * Subdevices from the same physical device can be attached to the
- * same domain. For such cases, only the first subdevice attachment
- * needs to go through the full steps in this function. So if ret >
- * 1, just goto out.
- */
- if (ret > 1)
- goto out;
-
- /*
- * iommu->lock must be held to attach domain to iommu and setup the
- * pasid entry for second level translation.
- */
- spin_lock(&iommu->lock);
- ret = domain_attach_iommu(domain, iommu);
- if (ret)
- goto attach_failed;
-
- /* Setup the PASID entry for mediated devices: */
- if (domain_use_first_level(domain))
- ret = domain_setup_first_level(iommu, domain, dev,
- domain->default_pasid);
- else
- ret = intel_pasid_setup_second_level(iommu, domain, dev,
- domain->default_pasid);
- if (ret)
- goto table_failed;
-
- spin_unlock(&iommu->lock);
-out:
- spin_unlock_irqrestore(&device_domain_lock, flags);
-
- return 0;
-
-table_failed:
- domain_detach_iommu(domain, iommu);
-attach_failed:
- spin_unlock(&iommu->lock);
- auxiliary_unlink_device(domain, dev);
-link_failed:
- spin_unlock_irqrestore(&device_domain_lock, flags);
- if (list_empty(&domain->subdevices) && domain->default_pasid > 0)
- ioasid_put(domain->default_pasid);
-
- return ret;
-}
-
-static void aux_domain_remove_dev(struct dmar_domain *domain,
- struct device *dev)
-{
- struct device_domain_info *info;
- struct intel_iommu *iommu;
- unsigned long flags;
-
- if (!is_aux_domain(dev, &domain->domain))
- return;
-
- spin_lock_irqsave(&device_domain_lock, flags);
- info = get_domain_info(dev);
- iommu = info->iommu;
-
- if (!auxiliary_unlink_device(domain, dev)) {
- spin_lock(&iommu->lock);
- intel_pasid_tear_down_entry(iommu, dev,
- domain->default_pasid, false);
- domain_detach_iommu(domain, iommu);
- spin_unlock(&iommu->lock);
- }
-
- spin_unlock_irqrestore(&device_domain_lock, flags);
-
- if (list_empty(&domain->subdevices) && domain->default_pasid > 0)
- ioasid_put(domain->default_pasid);
-}
-
static int prepare_domain_attach_device(struct iommu_domain *domain,
struct device *dev)
{
@@ -4866,9 +4655,6 @@ static int intel_iommu_attach_device(struct iommu_domain *domain,
return -EPERM;
}

- if (is_aux_domain(dev, domain))
- return -EPERM;
-
/* normally dev is not mapped */
if (unlikely(domain_context_mapped(dev))) {
struct dmar_domain *old_domain;
@@ -4885,33 +4671,12 @@ static int intel_iommu_attach_device(struct iommu_domain *domain,
return domain_add_dev_info(to_dmar_domain(domain), dev);
}

-static int intel_iommu_aux_attach_device(struct iommu_domain *domain,
- struct device *dev)
-{
- int ret;
-
- if (!is_aux_domain(dev, domain))
- return -EPERM;
-
- ret = prepare_domain_attach_device(domain, dev);
- if (ret)
- return ret;
-
- return aux_domain_add_dev(to_dmar_domain(domain), dev);
-}
-
static void intel_iommu_detach_device(struct iommu_domain *domain,
struct device *dev)
{
dmar_remove_one_dev_info(dev);
}

-static void intel_iommu_aux_detach_device(struct iommu_domain *domain,
- struct device *dev)
-{
- aux_domain_remove_dev(to_dmar_domain(domain), dev);
-}
-
static int intel_iommu_map(struct iommu_domain *domain,
unsigned long iova, phys_addr_t hpa,
size_t size, int iommu_prot, gfp_t gfp)
@@ -5205,46 +4970,6 @@ static struct iommu_group *intel_iommu_device_group(struct device *dev)
return generic_device_group(dev);
}

-static int intel_iommu_enable_auxd(struct device *dev)
-{
- struct device_domain_info *info;
- struct intel_iommu *iommu;
- unsigned long flags;
- int ret;
-
- iommu = device_to_iommu(dev, NULL, NULL);
- if (!iommu || dmar_disabled)
- return -EINVAL;
-
- if (!sm_supported(iommu) || !pasid_supported(iommu))
- return -EINVAL;
-
- ret = intel_iommu_enable_pasid(iommu, dev);
- if (ret)
- return -ENODEV;
-
- spin_lock_irqsave(&device_domain_lock, flags);
- info = get_domain_info(dev);
- info->auxd_enabled = 1;
- spin_unlock_irqrestore(&device_domain_lock, flags);
-
- return 0;
-}
-
-static int intel_iommu_disable_auxd(struct device *dev)
-{
- struct device_domain_info *info;
- unsigned long flags;
-
- spin_lock_irqsave(&device_domain_lock, flags);
- info = get_domain_info(dev);
- if (!WARN_ON(!info))
- info->auxd_enabled = 0;
- spin_unlock_irqrestore(&device_domain_lock, flags);
-
- return 0;
-}
-
static int intel_iommu_enable_sva(struct device *dev)
{
struct device_domain_info *info = get_domain_info(dev);
@@ -5301,9 +5026,6 @@ static int
intel_iommu_dev_enable_feat(struct device *dev, enum iommu_dev_features feat)
{
switch (feat) {
- case IOMMU_DEV_FEAT_AUX:
- return intel_iommu_enable_auxd(dev);
-
case IOMMU_DEV_FEAT_IOPF:
return intel_iommu_enable_iopf(dev);

@@ -5319,9 +5041,6 @@ static int
intel_iommu_dev_disable_feat(struct device *dev, enum iommu_dev_features feat)
{
switch (feat) {
- case IOMMU_DEV_FEAT_AUX:
- return intel_iommu_disable_auxd(dev);
-
case IOMMU_DEV_FEAT_IOPF:
return 0;

@@ -5333,26 +5052,6 @@ intel_iommu_dev_disable_feat(struct device *dev, enum iommu_dev_features feat)
}
}

-static bool
-intel_iommu_dev_feat_enabled(struct device *dev, enum iommu_dev_features feat)
-{
- struct device_domain_info *info = get_domain_info(dev);
-
- if (feat == IOMMU_DEV_FEAT_AUX)
- return scalable_mode_support() && info && info->auxd_enabled;
-
- return false;
-}
-
-static int
-intel_iommu_aux_get_pasid(struct iommu_domain *domain, struct device *dev)
-{
- struct dmar_domain *dmar_domain = to_dmar_domain(domain);
-
- return dmar_domain->default_pasid > 0 ?
- dmar_domain->default_pasid : -EINVAL;
-}
-
static bool intel_iommu_is_attach_deferred(struct iommu_domain *domain,
struct device *dev)
{
@@ -5397,9 +5096,6 @@ const struct iommu_ops intel_iommu_ops = {
.domain_free = intel_iommu_domain_free,
.attach_dev = intel_iommu_attach_device,
.detach_dev = intel_iommu_detach_device,
- .aux_attach_dev = intel_iommu_aux_attach_device,
- .aux_detach_dev = intel_iommu_aux_detach_device,
- .aux_get_pasid = intel_iommu_aux_get_pasid,
.map_pages = intel_iommu_map_pages,
.unmap_pages = intel_iommu_unmap_pages,
.iotlb_sync_map = intel_iommu_iotlb_sync_map,
@@ -5412,7 +5108,6 @@ const struct iommu_ops intel_iommu_ops = {
.get_resv_regions = intel_iommu_get_resv_regions,
.put_resv_regions = generic_iommu_put_resv_regions,
.device_group = intel_iommu_device_group,
- .dev_feat_enabled = intel_iommu_dev_feat_enabled,
.dev_enable_feat = intel_iommu_dev_enable_feat,
.dev_disable_feat = intel_iommu_dev_disable_feat,
.is_attach_deferred = intel_iommu_is_attach_deferred,
--
2.25.1

2022-01-24 18:58:04

by Christoph Hellwig

[permalink] [raw]

2022-01-24 18:58:21

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 5/7] drm/nouveau/device: Get right pgsize_bitmap of iommu_domain

On Mon, Jan 24, 2022 at 03:11:00PM +0800, Lu Baolu wrote:
> The supported page sizes of an iommu_domain are saved in the pgsize_bitmap
> field. Retrieve the value from the right place.
>
> Signed-off-by: Lu Baolu <[email protected]>
> Reviewed-by: Robin Murphy <[email protected]>
> Link: https://lore.kernel.org/r/[email protected]

Looks good:

Reviewed-by: Christoph Hellwig <[email protected]>

Wouldn't it mke sense to remove the pgsize_bitmap in struct iommu_ops
and initialize the domain field in the domain_alloc methods? Or am I
missing something?

2022-01-24 18:58:26

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 6/7] iommu: Use right way to retrieve iommu_ops

On Mon, Jan 24, 2022 at 03:11:01PM +0800, Lu Baolu wrote:
> The common iommu_ops is hooked to both device and domain. When a helper
> has both device and domain pointer, the way to get the iommu_ops looks
> messy in iommu core. This sorts out the way to get iommu_ops. The device
> related helpers go through device pointer, while the domain related ones
> go through domain pointer.

Ugg. This really sounds like we should have a different structures for
each set of ops?

2022-01-24 18:58:46

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 7/7] iommu: Add iommu_domain::domain_ops

On Mon, Jan 24, 2022 at 03:11:02PM +0800, Lu Baolu wrote:
> Add a domain specific callback set, domain_ops, for vendor iommu driver
> to provide domain specific operations. Move domain-specific callbacks
> from iommu_ops to the domain_ops and hook them when a domain is allocated.

Ah, that's what I meant earlier. Perfect!

Nit: I don't think vendor is the right term here.

Maybe something like:

iommut: split struct iommu_ops

Move the domain specific operations out of struct iommu_ops into a new
structure that only has domain specific operations. This solves
the problem of needing to know if the method vector for a given
operation needs to be retreived from the device or th domain.

> +struct domain_ops {

This needs to keep an iommu in the name, i.e. iommu_domain_ops.

> + phys_addr_t (*iova_to_phys)(struct iommu_domain *domain, dma_addr_t iova);

Overly long line.

> +static inline void iommu_domain_set_ops(struct iommu_domain *domain,
> + const struct domain_ops *ops)
> +{
> + domain->ops = ops;
> +}

Do we really need this helper?

> +static const struct domain_ops amd_domain_ops;

Can we avoid these forward declarations or would that cause too much
churn?

2022-01-24 19:01:27

by Tian, Kevin

[permalink] [raw]
Subject: RE: [PATCH 0/7] iommu cleanup and refactoring

> From: Lu Baolu <[email protected]>
> Sent: Monday, January 24, 2022 3:11 PM
>
> Hi,
>
> The guest pasid and aux-domain related code are dead code in current
> iommu subtree. As we have reached a consensus that all these features
> should be based on the new iommufd framework (which is under active
> development), the first part of this series removes and cleanups all
> the dead code.
>
> The second part of this series refactors the iommu_domain by moving all
> domain-specific ops from iommu_ops to a new domain_ops. This makes an
> iommu_domain self-contained and represent the abstraction of an I/O
> translation table in the IOMMU subsystem. With different type of
> iommu_domain providing different set of ops, it's easier to support more
> types of I/O translation tables.

You may want to give more background on this end goal. In general there
are four IOPT types in iommufd discussions:

1) The one currently tracked by iommu_domain, with a map/unmap semantics
2) The one managed by mm and shared to iommu via sva_bind/unbind ops
3) The one managed by userspace and bound to iommu via iommufd (require nesting)
4) The one managed by KVM (e.g. EPT) and shared to iommu via a TBD interface

Currently only 1) and 2) are supported. 2) is supported via sva specific
structures, tracked outside of iommu_domain.

Following current direction 3) and 4) will end up to have separate tracking
structures too (some existing bits for 3 are removed by this series).

And this series is preparatory for extending the domain concept to be a
general object for all IOPT types. And it is one important design direction
to be aligned in the iommu community.

Thanks
Kevin

2022-01-24 19:01:31

by Tian, Kevin

[permalink] [raw]
Subject: RE: [PATCH 7/7] iommu: Add iommu_domain::domain_ops

> From: Lu Baolu <[email protected]>
> Sent: Monday, January 24, 2022 3:11 PM
> +/**
> + * struct domain_ops - per-domain ops
> + * @attach_dev: attach an iommu domain to a device
> + * @detach_dev: detach an iommu domain from a device

What is the criteria about whether an op should be iommu_ops or domain_ops
when it requires both domain and device pointers like above two (and future
PASID-based attach)?

Other examples include:
@apply_resv_region
@is_attach_deferred

Thanks
Kevin

2022-01-24 19:01:44

by Tian, Kevin

[permalink] [raw]
Subject: RE: [PATCH 6/7] iommu: Use right way to retrieve iommu_ops

> From: Lu Baolu <[email protected]>
> Sent: Monday, January 24, 2022 3:11 PM
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index aa5486243892..111b3e9c79bb 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -385,6 +385,14 @@ static inline void iommu_iotlb_gather_init(struct
> iommu_iotlb_gather *gather)
> };
> }
>
> +static inline const struct iommu_ops *dev_iommu_ops_get(struct device
> *dev)

dev_get_iommu_ops or just dev_iommu_ops?

> +{
> + if (dev && dev->iommu && dev->iommu->iommu_dev)
> + return dev->iommu->iommu_dev->ops;
> +
> + return NULL;
> +}

2022-01-24 19:02:26

by Jean-Philippe Brucker

[permalink] [raw]
Subject: Re: [PATCH 7/7] iommu: Add iommu_domain::domain_ops

On Mon, Jan 24, 2022 at 09:58:18AM +0000, Tian, Kevin wrote:
> > From: Lu Baolu <[email protected]>
> > Sent: Monday, January 24, 2022 3:11 PM
> > +/**
> > + * struct domain_ops - per-domain ops
> > + * @attach_dev: attach an iommu domain to a device
> > + * @detach_dev: detach an iommu domain from a device
>
> What is the criteria about whether an op should be iommu_ops or domain_ops
> when it requires both domain and device pointers like above two (and future
> PASID-based attach)?
>
> Other examples include:
> @apply_resv_region
> @is_attach_deferred

Could attach_dev() be an IOMMU op? So a driver could set the domain ops
in attach_dev() rather than domain_alloc(). That would allow to install
map()/unmap() ops that are tailored for the device's IOMMU, which we don't
know at domain_alloc() time. I'm thinking about a guest that has both
physical and virtual endpoints, which would ideally use different kinds of
domain ops to support both efficiently (caching mode vs page tables)

Thanks,
Jean

2022-01-24 19:34:43

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH 7/7] iommu: Add iommu_domain::domain_ops

On Mon, Jan 24, 2022 at 10:16:07AM +0000, Jean-Philippe Brucker wrote:
> On Mon, Jan 24, 2022 at 09:58:18AM +0000, Tian, Kevin wrote:
> > > From: Lu Baolu <[email protected]>
> > > Sent: Monday, January 24, 2022 3:11 PM
> > > +/**
> > > + * struct domain_ops - per-domain ops
> > > + * @attach_dev: attach an iommu domain to a device
> > > + * @detach_dev: detach an iommu domain from a device
> >
> > What is the criteria about whether an op should be iommu_ops or domain_ops
> > when it requires both domain and device pointers like above two (and future
> > PASID-based attach)?
> >
> > Other examples include:
> > @apply_resv_region
> > @is_attach_deferred
>
> Could attach_dev() be an IOMMU op? So a driver could set the domain ops
> in attach_dev() rather than domain_alloc(). That would allow to install
> map()/unmap() ops that are tailored for the device's IOMMU, which we don't
> know at domain_alloc() time.

I think we should be moving toward 'domain_alloc' returning the
correct domain and the way the driver implements the domain shouldn't
change after that.

> I'm thinking about a guest that has both physical and virtual
> endpoints, which would ideally use different kinds of domain ops to
> support both efficiently (caching mode vs page tables)

In this case shouldn't domain_alloc() reached from the struct device
already do the correct thing?

Jason

2022-01-24 19:40:49

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH 7/7] iommu: Add iommu_domain::domain_ops

On Mon, Jan 24, 2022 at 09:58:18AM +0000, Tian, Kevin wrote:
> > From: Lu Baolu <[email protected]>
> > Sent: Monday, January 24, 2022 3:11 PM
> > +/**
> > + * struct domain_ops - per-domain ops
> > + * @attach_dev: attach an iommu domain to a device
> > + * @detach_dev: detach an iommu domain from a device
>
> What is the criteria about whether an op should be iommu_ops or domain_ops
> when it requires both domain and device pointers like above two (and future
> PASID-based attach)?
>
> Other examples include:
> @apply_resv_region

For apply_resv_region the 'dev' argument is really selecting a device
that is already attached to the domain, so it should be in the domain
ops.

> @is_attach_deferred

Only two drivers implement this and neither use the domain
argument. Remove the domain argument and keep it in the iommu_ops

Jason

2022-01-24 19:41:51

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH 7/7] iommu: Add iommu_domain::domain_ops

On Mon, Jan 24, 2022 at 01:37:21AM -0800, Christoph Hellwig wrote:
> On Mon, Jan 24, 2022 at 03:11:02PM +0800, Lu Baolu wrote:
> > Add a domain specific callback set, domain_ops, for vendor iommu driver
> > to provide domain specific operations. Move domain-specific callbacks
> > from iommu_ops to the domain_ops and hook them when a domain is allocated.
>
> Ah, that's what I meant earlier. Perfect!
>
> Nit: I don't think vendor is the right term here.

+1 please don't use the confusing 'vendor' in the kernel, if you feel
you want to write 'vendor' writing 'device driver' is usually a good
choice..

This whole series is nice, the few small notes aside, thanks for
making it!

Jason

2022-01-24 19:43:39

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH 6/7] iommu: Use right way to retrieve iommu_ops

On Mon, Jan 24, 2022 at 03:11:01PM +0800, Lu Baolu wrote:
> The common iommu_ops is hooked to both device and domain. When a helper
> has both device and domain pointer, the way to get the iommu_ops looks
> messy in iommu core. This sorts out the way to get iommu_ops. The device
> related helpers go through device pointer, while the domain related ones
> go through domain pointer.
>
> Signed-off-by: Lu Baolu <[email protected]>
> include/linux/iommu.h | 8 ++++++++
> drivers/iommu/iommu.c | 25 ++++++++++++++-----------
> 2 files changed, 22 insertions(+), 11 deletions(-)
>
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index aa5486243892..111b3e9c79bb 100644
> +++ b/include/linux/iommu.h
> @@ -385,6 +385,14 @@ static inline void iommu_iotlb_gather_init(struct iommu_iotlb_gather *gather)
> };
> }
>
> +static inline const struct iommu_ops *dev_iommu_ops_get(struct device *dev)
> +{
> + if (dev && dev->iommu && dev->iommu->iommu_dev)
> + return dev->iommu->iommu_dev->ops;
> +
> + return NULL;

What is the purpose of this helper?

> + const struct iommu_ops *ops = dev_iommu_ops_get(dev);
> struct iommu_domain *domain = group->default_domain;
> struct iommu_resv_region *entry;
> struct list_head mappings;
> @@ -785,8 +786,8 @@ static int iommu_create_device_direct_mappings(struct iommu_group *group,
> dma_addr_t start, end, addr;
> size_t map_size = 0;
>
> - if (domain->ops->apply_resv_region)
> - domain->ops->apply_resv_region(dev, domain, entry);
> + if (ops->apply_resv_region)
> + ops->apply_resv_region(dev, domain, entry);

Here we call it and don't check for NULL? So why did we check the
interior pointers in the helper?

> @@ -831,8 +832,10 @@ static int iommu_create_device_direct_mappings(struct iommu_group *group,
> static bool iommu_is_attach_deferred(struct iommu_domain *domain,
> struct device *dev)
> {
> - if (domain->ops->is_attach_deferred)
> - return domain->ops->is_attach_deferred(domain, dev);
> + const struct iommu_ops *ops = dev_iommu_ops_get(dev);
> +
> + if (ops->is_attach_deferred)
> + return ops->is_attach_deferred(domain, dev);

Same here, at least return false if ops is null..

> @@ -1251,10 +1254,10 @@ int iommu_page_response(struct device *dev,
> struct iommu_fault_event *evt;
> struct iommu_fault_page_request *prm;
> struct dev_iommu *param = dev->iommu;
> + const struct iommu_ops *ops = dev_iommu_ops_get(dev);
> bool has_pasid = msg->flags & IOMMU_PAGE_RESP_PASID_VALID;
> - struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
>
> - if (!domain || !domain->ops->page_response)
> + if (!ops || !ops->page_response)
> return -ENODEV;
>
> if (!param || !param->fault_param)
> @@ -1295,7 +1298,7 @@ int iommu_page_response(struct device *dev,
> msg->pasid = 0;
> }
>
> - ret = domain->ops->page_response(dev, evt, msg);
> + ret = ops->page_response(dev, evt, msg);
> list_del(&evt->list);
> kfree(evt);
> break;

Feels weird that page_response is not connected to a domain, the fault
originated from a domain after all. I would say this op should be
moved to the domain and the caller should provide the a pointer to the
domain that originated the fault.

Ideally since only some domain's will be configured to handle faults
at all - domains that can't do this should have a NULL page_response
op, even if other domains created by the same device driver could
handle page_response..

> @@ -1758,10 +1761,10 @@ static int __iommu_group_dma_attach(struct iommu_group *group)
>
> static int iommu_group_do_probe_finalize(struct device *dev, void *data)
> {
> - struct iommu_domain *domain = data;
> + const struct iommu_ops *ops = dev_iommu_ops_get(dev);
>
> - if (domain->ops->probe_finalize)
> - domain->ops->probe_finalize(dev);
> + if (ops->probe_finalize)
> + ops->probe_finalize(dev);

This is an oddball one too, it is finishing setting up the default
domain for a device? Several drivers seem to recover the default
domain in their implementations..

Jason

2022-01-24 19:44:43

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH 0/7] iommu cleanup and refactoring

On Mon, Jan 24, 2022 at 09:46:26AM +0000, Tian, Kevin wrote:
> > From: Lu Baolu <[email protected]>
> > Sent: Monday, January 24, 2022 3:11 PM
> >
> > Hi,
> >
> > The guest pasid and aux-domain related code are dead code in current
> > iommu subtree. As we have reached a consensus that all these features
> > should be based on the new iommufd framework (which is under active
> > development), the first part of this series removes and cleanups all
> > the dead code.
> >
> > The second part of this series refactors the iommu_domain by moving all
> > domain-specific ops from iommu_ops to a new domain_ops. This makes an
> > iommu_domain self-contained and represent the abstraction of an I/O
> > translation table in the IOMMU subsystem. With different type of
> > iommu_domain providing different set of ops, it's easier to support more
> > types of I/O translation tables.
>
> You may want to give more background on this end goal. In general there
> are four IOPT types in iommufd discussions:
>
> 1) The one currently tracked by iommu_domain, with a map/unmap semantics
> 2) The one managed by mm and shared to iommu via sva_bind/unbind ops
> 3) The one managed by userspace and bound to iommu via iommufd (require nesting)
> 4) The one managed by KVM (e.g. EPT) and shared to iommu via a TBD interface

Yes, at least from an iommufd perspective I'd like to see one struct
for all of these types, mainly so we can have a uniform alloc, attach
and detatch flow for all io page table types.

If we want to use the iommu_domain, or make iommu_domain a sub-class
of a new struct, can be determined as we go along.

Regardless, I think this cleanup stands on its own. Moving the ops and
purging the dead code is clearly the right thing to do.

Thanks,
Jason

2022-01-24 19:48:50

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH 7/7] iommu: Add iommu_domain::domain_ops

On Mon, Jan 24, 2022 at 03:11:02PM +0800, Lu Baolu wrote:
> - int (*enable_nesting)(struct iommu_domain *domain);

Perhaps for another series, but enable_nesting looks like dead code
too, AFAICT.

Or at the very least I can't figure how out VFIO_TYPE1_NESTING_IOMMU
is supposed to work, or find any implementation of it in userspace.

8 years after it was merged in commit f5c9ecebaf2a ("vfio/iommu_type1:
add new VFIO_TYPE1_NESTING_IOMMU IOMMU type")

The commit comment says 'which are nested with the existing
translation installed by VFIO.' but it doesn't explain how the newly
created domain knows what its parent nest is..

Lu, there is an implementation in the Intel driver here, is it usable
at all?

Jason

2022-01-25 08:37:56

by Robin Murphy

[permalink] [raw]
Subject: Re: [PATCH 6/7] iommu: Use right way to retrieve iommu_ops

On 2022-01-24 07:11, Lu Baolu wrote:
> The common iommu_ops is hooked to both device and domain. When a helper
> has both device and domain pointer, the way to get the iommu_ops looks
> messy in iommu core. This sorts out the way to get iommu_ops. The device
> related helpers go through device pointer, while the domain related ones
> go through domain pointer.
>
> Signed-off-by: Lu Baolu <[email protected]>
> ---
> include/linux/iommu.h | 8 ++++++++
> drivers/iommu/iommu.c | 25 ++++++++++++++-----------
> 2 files changed, 22 insertions(+), 11 deletions(-)
>
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index aa5486243892..111b3e9c79bb 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -385,6 +385,14 @@ static inline void iommu_iotlb_gather_init(struct iommu_iotlb_gather *gather)
> };
> }
>
> +static inline const struct iommu_ops *dev_iommu_ops_get(struct device *dev)
> +{
> + if (dev && dev->iommu && dev->iommu->iommu_dev)
> + return dev->iommu->iommu_dev->ops;
> +
> + return NULL;

This probably warrants at least a WARN, but it's arguable to just assume
that valid ops must be installed if iommu_probe_device() has succeeded.
The device ops are essentially for internal use within the IOMMU
subsystem itself, so we should be able to trust ourselves not to misuse
the helper.

> +}
> +
> #define IOMMU_BUS_NOTIFY_PRIORITY 0
> #define IOMMU_GROUP_NOTIFY_ADD_DEVICE 1 /* Device added */
> #define IOMMU_GROUP_NOTIFY_DEL_DEVICE 2 /* Pre Device removed */
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 5230c6d90ece..6631e2ea44df 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -764,6 +764,7 @@ EXPORT_SYMBOL_GPL(iommu_group_set_name);
> static int iommu_create_device_direct_mappings(struct iommu_group *group,
> struct device *dev)
> {
> + const struct iommu_ops *ops = dev_iommu_ops_get(dev);
> struct iommu_domain *domain = group->default_domain;
> struct iommu_resv_region *entry;
> struct list_head mappings;
> @@ -785,8 +786,8 @@ static int iommu_create_device_direct_mappings(struct iommu_group *group,
> dma_addr_t start, end, addr;
> size_t map_size = 0;
>
> - if (domain->ops->apply_resv_region)
> - domain->ops->apply_resv_region(dev, domain, entry);
> + if (ops->apply_resv_region)
> + ops->apply_resv_region(dev, domain, entry);

Strictly I think this was a domain op, as it was about reserving the
IOVA range in the given DMA domain. Also taking the domain as an
argument is a bit of a giveaway. However it's now just dead code either
way since there are no remaining implementations, and no reason for any
new ones.

>
> start = ALIGN(entry->start, pg_size);
> end = ALIGN(entry->start + entry->length, pg_size);
> @@ -831,8 +832,10 @@ static int iommu_create_device_direct_mappings(struct iommu_group *group,
> static bool iommu_is_attach_deferred(struct iommu_domain *domain,
> struct device *dev)
> {
> - if (domain->ops->is_attach_deferred)
> - return domain->ops->is_attach_deferred(domain, dev);
> + const struct iommu_ops *ops = dev_iommu_ops_get(dev);
> +
> + if (ops->is_attach_deferred)
> + return ops->is_attach_deferred(domain, dev);

Similarly if this takes a domain as its first argument then it's de
facto a domain method. However, I'd concur that logically it *is* a
device op, so let's drop that (unused) domain argument if we're cleaning up.

Maybe there's even an argument for factoring this out to a standard flag
in dev_iommu rather than an op at all?

The others covered here look OK - we can blame PCI for page response
being weirdly device-centric - however could we also convert all the
feasible instances of dev->bus->iommu_ops to dev_iommu_ops() as well?
(Subtly implying that I'm also not a fan of having "_get" in the name
for a non-refcounted lookup...) Obviously iommu_probe_device() and
iommmu_domain_alloc() still need bus ops at this point, but I'm working
on that... :)

Thanks,
Robin.

> return false;
> }
> @@ -1251,10 +1254,10 @@ int iommu_page_response(struct device *dev,
> struct iommu_fault_event *evt;
> struct iommu_fault_page_request *prm;
> struct dev_iommu *param = dev->iommu;
> + const struct iommu_ops *ops = dev_iommu_ops_get(dev);
> bool has_pasid = msg->flags & IOMMU_PAGE_RESP_PASID_VALID;
> - struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
>
> - if (!domain || !domain->ops->page_response)
> + if (!ops || !ops->page_response)
> return -ENODEV;
>
> if (!param || !param->fault_param)
> @@ -1295,7 +1298,7 @@ int iommu_page_response(struct device *dev,
> msg->pasid = 0;
> }
>
> - ret = domain->ops->page_response(dev, evt, msg);
> + ret = ops->page_response(dev, evt, msg);
> list_del(&evt->list);
> kfree(evt);
> break;
> @@ -1758,10 +1761,10 @@ static int __iommu_group_dma_attach(struct iommu_group *group)
>
> static int iommu_group_do_probe_finalize(struct device *dev, void *data)
> {
> - struct iommu_domain *domain = data;
> + const struct iommu_ops *ops = dev_iommu_ops_get(dev);
>
> - if (domain->ops->probe_finalize)
> - domain->ops->probe_finalize(dev);
> + if (ops->probe_finalize)
> + ops->probe_finalize(dev);
>
> return 0;
> }
> @@ -2020,7 +2023,7 @@ EXPORT_SYMBOL_GPL(iommu_attach_device);
>
> int iommu_deferred_attach(struct device *dev, struct iommu_domain *domain)
> {
> - const struct iommu_ops *ops = domain->ops;
> + const struct iommu_ops *ops = dev_iommu_ops_get(dev);
>
> if (ops->is_attach_deferred && ops->is_attach_deferred(domain, dev))
> return __iommu_attach_device(domain, dev);

2022-01-25 08:39:07

by Robin Murphy

[permalink] [raw]
Subject: Re: [PATCH 7/7] iommu: Add iommu_domain::domain_ops

On 2022-01-24 07:11, Lu Baolu wrote:
> Add a domain specific callback set, domain_ops, for vendor iommu driver
> to provide domain specific operations. Move domain-specific callbacks
> from iommu_ops to the domain_ops and hook them when a domain is allocated.

I think it would make a lot of sense for iommu_domain_ops to be a
substructure *within* iommu_ops. That way we can save most of the driver
boilerplate here by not needing extra variables everywhere, and letting
iommu_domain_alloc() still do a default initialisation like "domain->ops
= bus->iommu_ops->domain_ops;"

I do like the future possibility of trying to flatten some of the
io-pgtable indirection by having the SMMU drivers subsequently swizzle
in alternate domain ops once they've picked a format, though. That was a
bit too clunky to achieve at the whole iommu_ops level when I
experimented with it years ago, but now it might well be worth another
try...

One other comment below.

> Signed-off-by: Lu Baolu <[email protected]>
> ---
> include/linux/iommu.h | 93 ++++++++++++---------
> drivers/iommu/amd/iommu.c | 21 +++--
> drivers/iommu/apple-dart.c | 24 ++++--
> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 22 +++--
> drivers/iommu/arm/arm-smmu/arm-smmu.c | 23 +++--
> drivers/iommu/arm/arm-smmu/qcom_iommu.c | 17 ++--
> drivers/iommu/exynos-iommu.c | 17 ++--
> drivers/iommu/fsl_pamu_domain.c | 13 ++-
> drivers/iommu/intel/iommu.c | 25 ++++--
> drivers/iommu/iommu.c | 15 ++--
> drivers/iommu/ipmmu-vmsa.c | 21 +++--
> drivers/iommu/msm_iommu.c | 17 ++--
> drivers/iommu/mtk_iommu.c | 24 ++++--
> drivers/iommu/mtk_iommu_v1.c | 19 +++--
> drivers/iommu/omap-iommu.c | 15 ++--
> drivers/iommu/rockchip-iommu.c | 17 ++--
> drivers/iommu/s390-iommu.c | 15 ++--
> drivers/iommu/sprd-iommu.c | 19 +++--
> drivers/iommu/sun50i-iommu.c | 18 ++--
> drivers/iommu/tegra-gart.c | 15 ++--
> drivers/iommu/tegra-smmu.c | 16 ++--
> drivers/iommu/virtio-iommu.c | 18 ++--
> 22 files changed, 305 insertions(+), 179 deletions(-)
>
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 111b3e9c79bb..33c5c0e5c465 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -88,7 +88,7 @@ struct iommu_domain_geometry {
>
> struct iommu_domain {
> unsigned type;
> - const struct iommu_ops *ops;
> + const struct domain_ops *ops;
> unsigned long pgsize_bitmap; /* Bitmap of page sizes in use */
> iommu_fault_handler_t handler;
> void *handler_token;
> @@ -192,26 +192,11 @@ struct iommu_iotlb_gather {
> * struct iommu_ops - iommu ops and capabilities
> * @capable: check capability
> * @domain_alloc: allocate iommu domain
> - * @domain_free: free iommu domain
> - * @attach_dev: attach device to an iommu domain
> - * @detach_dev: detach device from an iommu domain
> - * @map: map a physically contiguous memory region to an iommu domain
> - * @map_pages: map a physically contiguous set of pages of the same size to
> - * an iommu domain.
> - * @unmap: unmap a physically contiguous memory region from an iommu domain
> - * @unmap_pages: unmap a number of pages of the same size from an iommu domain
> - * @flush_iotlb_all: Synchronously flush all hardware TLBs for this domain
> - * @iotlb_sync_map: Sync mappings created recently using @map to the hardware
> - * @iotlb_sync: Flush all queued ranges from the hardware TLBs and empty flush
> - * queue
> - * @iova_to_phys: translate iova to physical address
> * @probe_device: Add device to iommu driver handling
> * @release_device: Remove device from iommu driver handling
> * @probe_finalize: Do final setup work after the device is added to an IOMMU
> * group and attached to the groups domain
> * @device_group: find iommu group for a particular device
> - * @enable_nesting: Enable nesting
> - * @set_pgtable_quirks: Set io page table quirks (IO_PGTABLE_QUIRK_*)
> * @get_resv_regions: Request list of reserved regions for a device
> * @put_resv_regions: Free list of reserved regions for a device
> * @apply_resv_region: Temporary helper call-back for iova reserved ranges
> @@ -237,33 +222,11 @@ struct iommu_ops {
>
> /* Domain allocation and freeing by the iommu driver */
> struct iommu_domain *(*domain_alloc)(unsigned iommu_domain_type);
> - void (*domain_free)(struct iommu_domain *);
>
> - int (*attach_dev)(struct iommu_domain *domain, struct device *dev);
> - void (*detach_dev)(struct iommu_domain *domain, struct device *dev);
> - int (*map)(struct iommu_domain *domain, unsigned long iova,
> - phys_addr_t paddr, size_t size, int prot, gfp_t gfp);
> - int (*map_pages)(struct iommu_domain *domain, unsigned long iova,
> - phys_addr_t paddr, size_t pgsize, size_t pgcount,
> - int prot, gfp_t gfp, size_t *mapped);
> - size_t (*unmap)(struct iommu_domain *domain, unsigned long iova,
> - size_t size, struct iommu_iotlb_gather *iotlb_gather);
> - size_t (*unmap_pages)(struct iommu_domain *domain, unsigned long iova,
> - size_t pgsize, size_t pgcount,
> - struct iommu_iotlb_gather *iotlb_gather);
> - void (*flush_iotlb_all)(struct iommu_domain *domain);
> - void (*iotlb_sync_map)(struct iommu_domain *domain, unsigned long iova,
> - size_t size);
> - void (*iotlb_sync)(struct iommu_domain *domain,
> - struct iommu_iotlb_gather *iotlb_gather);
> - phys_addr_t (*iova_to_phys)(struct iommu_domain *domain, dma_addr_t iova);
> struct iommu_device *(*probe_device)(struct device *dev);
> void (*release_device)(struct device *dev);
> void (*probe_finalize)(struct device *dev);
> struct iommu_group *(*device_group)(struct device *dev);
> - int (*enable_nesting)(struct iommu_domain *domain);
> - int (*set_pgtable_quirks)(struct iommu_domain *domain,
> - unsigned long quirks);
>
> /* Request/Free a list of reserved regions for a device */
> void (*get_resv_regions)(struct device *dev, struct list_head *list);
> @@ -296,6 +259,60 @@ struct iommu_ops {
> struct module *owner;
> };
>
> +/**
> + * struct domain_ops - per-domain ops
> + * @attach_dev: attach an iommu domain to a device
> + * @detach_dev: detach an iommu domain from a device
> + * @map: map a physically contiguous memory region to an iommu domain
> + * @map_pages: map a physically contiguous set of pages of the same size to
> + * an iommu domain.
> + * @unmap: unmap a physically contiguous memory region from an iommu domain
> + * @unmap_pages: unmap a number of pages of the same size from an iommu domain
> + * @flush_iotlb_all: Synchronously flush all hardware TLBs for this domain
> + * @iotlb_sync_map: Sync mappings created recently using @map to the hardware
> + * @iotlb_sync: Flush all queued ranges from the hardware TLBs and empty flush
> + * queue
> + * @iova_to_phys: translate iova to physical address
> + * @enable_nesting: Enable nesting
> + * @set_pgtable_quirks: Set io page table quirks (IO_PGTABLE_QUIRK_*)
> + * @release: Release the domain after use.

Not sure about the name change here - it's still logically a "free" that
matches an "alloc", we're not doing anything clever like refcounting or
devres, which is what I'd tend to associate with a "release" function
for an allocated resource.

Robin.

> + */
> +struct domain_ops {
> + int (*attach_dev)(struct iommu_domain *domain, struct device *dev);
> + void (*detach_dev)(struct iommu_domain *domain, struct device *dev);
> +
> + int (*map)(struct iommu_domain *domain, unsigned long iova,
> + phys_addr_t paddr, size_t size, int prot, gfp_t gfp);
> + int (*map_pages)(struct iommu_domain *domain, unsigned long iova,
> + phys_addr_t paddr, size_t pgsize, size_t pgcount,
> + int prot, gfp_t gfp, size_t *mapped);
> + size_t (*unmap)(struct iommu_domain *domain, unsigned long iova,
> + size_t size, struct iommu_iotlb_gather *iotlb_gather);
> + size_t (*unmap_pages)(struct iommu_domain *domain, unsigned long iova,
> + size_t pgsize, size_t pgcount,
> + struct iommu_iotlb_gather *iotlb_gather);
> +
> + void (*flush_iotlb_all)(struct iommu_domain *domain);
> + void (*iotlb_sync_map)(struct iommu_domain *domain, unsigned long iova,
> + size_t size);
> + void (*iotlb_sync)(struct iommu_domain *domain,
> + struct iommu_iotlb_gather *iotlb_gather);
> +
> + phys_addr_t (*iova_to_phys)(struct iommu_domain *domain, dma_addr_t iova);
> +
> + int (*enable_nesting)(struct iommu_domain *domain);
> + int (*set_pgtable_quirks)(struct iommu_domain *domain,
> + unsigned long quirks);
> +
> + void (*release)(struct iommu_domain *domain);
> +};

2022-01-25 08:39:08

by Tian, Kevin

[permalink] [raw]
Subject: RE: [PATCH 7/7] iommu: Add iommu_domain::domain_ops

> From: Jason Gunthorpe via iommu
> Sent: Tuesday, January 25, 2022 1:18 AM
>
> On Mon, Jan 24, 2022 at 09:58:18AM +0000, Tian, Kevin wrote:
> > > From: Lu Baolu <[email protected]>
> > > Sent: Monday, January 24, 2022 3:11 PM
> > > +/**
> > > + * struct domain_ops - per-domain ops
> > > + * @attach_dev: attach an iommu domain to a device
> > > + * @detach_dev: detach an iommu domain from a device
> >
> > What is the criteria about whether an op should be iommu_ops or
> domain_ops
> > when it requires both domain and device pointers like above two (and
> future
> > PASID-based attach)?
> >
> > Other examples include:
> > @apply_resv_region
>
> For apply_resv_region the 'dev' argument is really selecting a device
> that is already attached to the domain, so it should be in the domain
> ops.

Looks this one can be just removed. No iommu driver implements today.

>
> > @is_attach_deferred
>
> Only two drivers implement this and neither use the domain
> argument. Remove the domain argument and keep it in the iommu_ops
>

Yes, especially when none of the two drivers actually uses domain.

Generally it makes sense to me by putting all attach related callbacks in
iommu_ops, since they are all for building connections between device
and domain while such connection is device specific (just like how a
vfio device is connected to iommufd).

This in concept also applies to @enable_nesting which you mentioned
in another thread. Nesting is a per-device configuration between multiple
domains attached by a same device.

Thanks
Kevin

2022-01-25 08:42:49

by Tian, Kevin

[permalink] [raw]
Subject: RE: [PATCH 0/7] iommu cleanup and refactoring

> From: Jason Gunthorpe <[email protected]>
> Sent: Tuesday, January 25, 2022 1:44 AM
>
> On Mon, Jan 24, 2022 at 09:46:26AM +0000, Tian, Kevin wrote:
> > > From: Lu Baolu <[email protected]>
> > > Sent: Monday, January 24, 2022 3:11 PM
> > >
> > > Hi,
> > >
> > > The guest pasid and aux-domain related code are dead code in current
> > > iommu subtree. As we have reached a consensus that all these features
> > > should be based on the new iommufd framework (which is under active
> > > development), the first part of this series removes and cleanups all
> > > the dead code.
> > >
> > > The second part of this series refactors the iommu_domain by moving all
> > > domain-specific ops from iommu_ops to a new domain_ops. This makes
> an
> > > iommu_domain self-contained and represent the abstraction of an I/O
> > > translation table in the IOMMU subsystem. With different type of
> > > iommu_domain providing different set of ops, it's easier to support more
> > > types of I/O translation tables.
> >
> > You may want to give more background on this end goal. In general there
> > are four IOPT types in iommufd discussions:
> >
> > 1) The one currently tracked by iommu_domain, with a map/unmap
> semantics
> > 2) The one managed by mm and shared to iommu via sva_bind/unbind ops
> > 3) The one managed by userspace and bound to iommu via iommufd
> (require nesting)
> > 4) The one managed by KVM (e.g. EPT) and shared to iommu via a TBD
> interface
>
> Yes, at least from an iommufd perspective I'd like to see one struct
> for all of these types, mainly so we can have a uniform alloc, attach
> and detatch flow for all io page table types.
>
> If we want to use the iommu_domain, or make iommu_domain a sub-class
> of a new struct, can be determined as we go along.
>
> Regardless, I think this cleanup stands on its own. Moving the ops and
> purging the dead code is clearly the right thing to do.
>

Indeed!

2022-01-25 08:46:45

by Tian, Kevin

[permalink] [raw]
Subject: RE: [PATCH 6/7] iommu: Use right way to retrieve iommu_ops

> From: Jason Gunthorpe via iommu
> Sent: Tuesday, January 25, 2022 1:37 AM
> > @@ -1295,7 +1298,7 @@ int iommu_page_response(struct device *dev,
> > msg->pasid = 0;
> > }
> >
> > - ret = domain->ops->page_response(dev, evt, msg);
> > + ret = ops->page_response(dev, evt, msg);
> > list_del(&evt->list);
> > kfree(evt);
> > break;
>
> Feels weird that page_response is not connected to a domain, the fault
> originated from a domain after all. I would say this op should be
> moved to the domain and the caller should provide the a pointer to the
> domain that originated the fault.
>

In concept yes.

But currently the entire sva path is not associated with domain. This was
one mistake as we discussed in the cover letter. Before extending iommu
domain to cover CPU page tables we may have to leave it in iommu_ops
given this series is just for cleanup...

Thanks
Kevin

2022-01-25 08:47:06

by Robin Murphy

[permalink] [raw]
Subject: Re: [PATCH 6/7] iommu: Use right way to retrieve iommu_ops

On 2022-01-25 01:11, Tian, Kevin wrote:
>> From: Jason Gunthorpe via iommu
>> Sent: Tuesday, January 25, 2022 1:37 AM
>>> @@ -1295,7 +1298,7 @@ int iommu_page_response(struct device *dev,
>>> msg->pasid = 0;
>>> }
>>>
>>> - ret = domain->ops->page_response(dev, evt, msg);
>>> + ret = ops->page_response(dev, evt, msg);
>>> list_del(&evt->list);
>>> kfree(evt);
>>> break;
>>
>> Feels weird that page_response is not connected to a domain, the fault
>> originated from a domain after all. I would say this op should be
>> moved to the domain and the caller should provide the a pointer to the
>> domain that originated the fault.
>>
>
> In concept yes.

Not even that, really. It's true that the "fault" itself is logically
associated with the domain, but we never see that - the ATS request and
response which encapsulate it all happen automatically on the PCI side.
It's the endpoint that then decides to handle ATS translation failure
via PRI, so all we actually get is a page request message from a
RID/PASID, which most definitely represents the "device" (and in fact
having to work backwards from there to figure out which domain/context
it is currently attached to can be a bit of a pain). Similarly the
response is a message directly back to the device itself - an operation
on a domain may (or may not) have happened off the back of receiving the
initial request, but even if the content of the response is to reflect
that, the operation of responding is clearly focused on the device.

I fully agree that it's a weird-looking model, but that's how PCI SIG
made it - and no IOMMU architecture seems to have tried to wrap it up in
anything nicer either - so I don't see that we'd gain much from trying
to pretend otherwise :)

Robin.

> But currently the entire sva path is not associated with domain. This was
> one mistake as we discussed in the cover letter. Before extending iommu
> domain to cover CPU page tables we may have to leave it in iommu_ops
> given this series is just for cleanup...
>
> Thanks
> Kevin

2022-01-25 08:47:22

by Tian, Kevin

[permalink] [raw]
Subject: RE: [PATCH 6/7] iommu: Use right way to retrieve iommu_ops

> From: Robin Murphy <[email protected]>
> Sent: Tuesday, January 25, 2022 9:52 AM
>
> On 2022-01-25 01:11, Tian, Kevin wrote:
> >> From: Jason Gunthorpe via iommu
> >> Sent: Tuesday, January 25, 2022 1:37 AM
> >>> @@ -1295,7 +1298,7 @@ int iommu_page_response(struct device *dev,
> >>> msg->pasid = 0;
> >>> }
> >>>
> >>> - ret = domain->ops->page_response(dev, evt, msg);
> >>> + ret = ops->page_response(dev, evt, msg);
> >>> list_del(&evt->list);
> >>> kfree(evt);
> >>> break;
> >>
> >> Feels weird that page_response is not connected to a domain, the fault
> >> originated from a domain after all. I would say this op should be
> >> moved to the domain and the caller should provide the a pointer to the
> >> domain that originated the fault.
> >>
> >
> > In concept yes.
>
> Not even that, really. It's true that the "fault" itself is logically
> associated with the domain, but we never see that - the ATS request and
> response which encapsulate it all happen automatically on the PCI side.
> It's the endpoint that then decides to handle ATS translation failure
> via PRI, so all we actually get is a page request message from a
> RID/PASID, which most definitely represents the "device" (and in fact
> having to work backwards from there to figure out which domain/context
> it is currently attached to can be a bit of a pain). Similarly the
> response is a message directly back to the device itself - an operation
> on a domain may (or may not) have happened off the back of receiving the
> initial request, but even if the content of the response is to reflect
> that, the operation of responding is clearly focused on the device.
>
> I fully agree that it's a weird-looking model, but that's how PCI SIG
> made it - and no IOMMU architecture seems to have tried to wrap it up in
> anything nicer either - so I don't see that we'd gain much from trying
> to pretend otherwise :)
>

I think the point here is that although page requests are received
per device from the wire the low level iommu driver should convert
those requests into domain-wide requests (with RID/PASID recorded
as private data in the request) which then can be handled by domain
ops in iommu core. Once a domain-wide request is completed by
the iommu core, the low level iommu driver then retrieves RID/PASID
information from private data of the completed request and triggers
page response per RID/PASID in bus specific way.

Does it sound reasonable?

Thanks
Kevin

2022-01-25 08:48:32

by Lu Baolu

[permalink] [raw]
Subject: Re: [PATCH 5/7] drm/nouveau/device: Get right pgsize_bitmap of iommu_domain

On 1/24/22 5:29 PM, Christoph Hellwig wrote:
> On Mon, Jan 24, 2022 at 03:11:00PM +0800, Lu Baolu wrote:
>> The supported page sizes of an iommu_domain are saved in the pgsize_bitmap
>> field. Retrieve the value from the right place.
>>
>> Signed-off-by: Lu Baolu <[email protected]>
>> Reviewed-by: Robin Murphy <[email protected]>
>> Link: https://lore.kernel.org/r/[email protected]
>
> Looks good:
>
> Reviewed-by: Christoph Hellwig <[email protected]>
>
> Wouldn't it mke sense to remove the pgsize_bitmap in struct iommu_ops
> and initialize the domain field in the domain_alloc methods? Or am I
> missing something?
>

It looks reasonable to me. The pgsize_bitmap is an attribute of a domain
that provides the map/unmap interfaces. It could be moved out of the
iommu_ops structure.

Best regards,
baolu

2022-01-25 08:50:40

by Lu Baolu

[permalink] [raw]
Subject: Re: [PATCH 6/7] iommu: Use right way to retrieve iommu_ops

On 1/24/22 5:48 PM, Tian, Kevin wrote:
>> From: Lu Baolu <[email protected]>
>> Sent: Monday, January 24, 2022 3:11 PM
>> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
>> index aa5486243892..111b3e9c79bb 100644
>> --- a/include/linux/iommu.h
>> +++ b/include/linux/iommu.h
>> @@ -385,6 +385,14 @@ static inline void iommu_iotlb_gather_init(struct
>> iommu_iotlb_gather *gather)
>> };
>> }
>>
>> +static inline const struct iommu_ops *dev_iommu_ops_get(struct device
>> *dev)
>
> dev_get_iommu_ops or just dev_iommu_ops?

dev_iommu_ops() looks better.

Best regards,
baolu

2022-01-25 08:51:04

by Lu Baolu

[permalink] [raw]
Subject: Re: [PATCH 6/7] iommu: Use right way to retrieve iommu_ops

On 1/24/22 5:32 PM, Christoph Hellwig wrote:
> On Mon, Jan 24, 2022 at 03:11:01PM +0800, Lu Baolu wrote:
>> The common iommu_ops is hooked to both device and domain. When a helper
>> has both device and domain pointer, the way to get the iommu_ops looks
>> messy in iommu core. This sorts out the way to get iommu_ops. The device
>> related helpers go through device pointer, while the domain related ones
>> go through domain pointer.
>
> Ugg. This really sounds like we should have a different structures for
> each set of ops?
>

Yes. Do this in the following patch.

Best regards,
baolu

2022-01-25 08:53:26

by Lu Baolu

[permalink] [raw]
Subject: Re: [PATCH 6/7] iommu: Use right way to retrieve iommu_ops

On 1/25/22 10:08 AM, Tian, Kevin wrote:
>> From: Robin Murphy <[email protected]>
>> Sent: Tuesday, January 25, 2022 9:52 AM
>>
>> On 2022-01-25 01:11, Tian, Kevin wrote:
>>>> From: Jason Gunthorpe via iommu
>>>> Sent: Tuesday, January 25, 2022 1:37 AM
>>>>> @@ -1295,7 +1298,7 @@ int iommu_page_response(struct device *dev,
>>>>> msg->pasid = 0;
>>>>> }
>>>>>
>>>>> - ret = domain->ops->page_response(dev, evt, msg);
>>>>> + ret = ops->page_response(dev, evt, msg);
>>>>> list_del(&evt->list);
>>>>> kfree(evt);
>>>>> break;
>>>>
>>>> Feels weird that page_response is not connected to a domain, the fault
>>>> originated from a domain after all. I would say this op should be
>>>> moved to the domain and the caller should provide the a pointer to the
>>>> domain that originated the fault.
>>>>
>>>
>>> In concept yes.
>>
>> Not even that, really. It's true that the "fault" itself is logically
>> associated with the domain, but we never see that - the ATS request and
>> response which encapsulate it all happen automatically on the PCI side.
>> It's the endpoint that then decides to handle ATS translation failure
>> via PRI, so all we actually get is a page request message from a
>> RID/PASID, which most definitely represents the "device" (and in fact
>> having to work backwards from there to figure out which domain/context
>> it is currently attached to can be a bit of a pain). Similarly the
>> response is a message directly back to the device itself - an operation
>> on a domain may (or may not) have happened off the back of receiving the
>> initial request, but even if the content of the response is to reflect
>> that, the operation of responding is clearly focused on the device.
>>
>> I fully agree that it's a weird-looking model, but that's how PCI SIG
>> made it - and no IOMMU architecture seems to have tried to wrap it up in
>> anything nicer either - so I don't see that we'd gain much from trying
>> to pretend otherwise :)
>>
>
> I think the point here is that although page requests are received
> per device from the wire the low level iommu driver should convert
> those requests into domain-wide requests (with RID/PASID recorded
> as private data in the request) which then can be handled by domain
> ops in iommu core. Once a domain-wide request is completed by
> the iommu core, the low level iommu driver then retrieves RID/PASID
> information from private data of the completed request and triggers
> page response per RID/PASID in bus specific way.

I also have a pending series to associate the sva with an iommu domain
and make the existing I/O page fault framework generic (vs. sva
specific). Perhaps we can discuss the page fault handle/response there
with the real code.

>
> Does it sound reasonable?
>
> Thanks
> Kevin
>

Best regards,
baolu

2022-01-25 08:56:27

by Lu Baolu

[permalink] [raw]
Subject: Re: [PATCH 6/7] iommu: Use right way to retrieve iommu_ops

On 1/25/22 1:36 AM, Jason Gunthorpe wrote:
> On Mon, Jan 24, 2022 at 03:11:01PM +0800, Lu Baolu wrote:
>> The common iommu_ops is hooked to both device and domain. When a helper
>> has both device and domain pointer, the way to get the iommu_ops looks
>> messy in iommu core. This sorts out the way to get iommu_ops. The device
>> related helpers go through device pointer, while the domain related ones
>> go through domain pointer.
>>
>> Signed-off-by: Lu Baolu <[email protected]>
>> include/linux/iommu.h | 8 ++++++++
>> drivers/iommu/iommu.c | 25 ++++++++++++++-----------
>> 2 files changed, 22 insertions(+), 11 deletions(-)
>>
>> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
>> index aa5486243892..111b3e9c79bb 100644
>> +++ b/include/linux/iommu.h
>> @@ -385,6 +385,14 @@ static inline void iommu_iotlb_gather_init(struct iommu_iotlb_gather *gather)
>> };
>> }
>>
>> +static inline const struct iommu_ops *dev_iommu_ops_get(struct device *dev)
>> +{
>> + if (dev && dev->iommu && dev->iommu->iommu_dev)
>> + return dev->iommu->iommu_dev->ops;
>> +
>> + return NULL;
>
> What is the purpose of this helper?

Get the iommu_ops from a device pointer. Just want to avoid duplicate
code in various functions.

>
>> + const struct iommu_ops *ops = dev_iommu_ops_get(dev);
>> struct iommu_domain *domain = group->default_domain;
>> struct iommu_resv_region *entry;
>> struct list_head mappings;
>> @@ -785,8 +786,8 @@ static int iommu_create_device_direct_mappings(struct iommu_group *group,
>> dma_addr_t start, end, addr;
>> size_t map_size = 0;
>>
>> - if (domain->ops->apply_resv_region)
>> - domain->ops->apply_resv_region(dev, domain, entry);
>> + if (ops->apply_resv_region)
>> + ops->apply_resv_region(dev, domain, entry);
>
> Here we call it and don't check for NULL? So why did we check the
> interior pointers in the helper?

Yes. Should check.

>
>> @@ -831,8 +832,10 @@ static int iommu_create_device_direct_mappings(struct iommu_group *group,
>> static bool iommu_is_attach_deferred(struct iommu_domain *domain,
>> struct device *dev)
>> {
>> - if (domain->ops->is_attach_deferred)
>> - return domain->ops->is_attach_deferred(domain, dev);
>> + const struct iommu_ops *ops = dev_iommu_ops_get(dev);
>> +
>> + if (ops->is_attach_deferred)
>> + return ops->is_attach_deferred(domain, dev);
>
> Same here, at least return false if ops is null..

Yes.

>
>> @@ -1251,10 +1254,10 @@ int iommu_page_response(struct device *dev,
>> struct iommu_fault_event *evt;
>> struct iommu_fault_page_request *prm;
>> struct dev_iommu *param = dev->iommu;
>> + const struct iommu_ops *ops = dev_iommu_ops_get(dev);
>> bool has_pasid = msg->flags & IOMMU_PAGE_RESP_PASID_VALID;
>> - struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
>>
>> - if (!domain || !domain->ops->page_response)
>> + if (!ops || !ops->page_response)
>> return -ENODEV;
>>
>> if (!param || !param->fault_param)
>> @@ -1295,7 +1298,7 @@ int iommu_page_response(struct device *dev,
>> msg->pasid = 0;
>> }
>>
>> - ret = domain->ops->page_response(dev, evt, msg);
>> + ret = ops->page_response(dev, evt, msg);
>> list_del(&evt->list);
>> kfree(evt);
>> break;
>
> Feels weird that page_response is not connected to a domain, the fault
> originated from a domain after all. I would say this op should be
> moved to the domain and the caller should provide the a pointer to the
> domain that originated the fault.
>
> Ideally since only some domain's will be configured to handle faults
> at all - domains that can't do this should have a NULL page_response
> op, even if other domains created by the same device driver could
> handle page_response..
>
>> @@ -1758,10 +1761,10 @@ static int __iommu_group_dma_attach(struct iommu_group *group)
>>
>> static int iommu_group_do_probe_finalize(struct device *dev, void *data)
>> {
>> - struct iommu_domain *domain = data;
>> + const struct iommu_ops *ops = dev_iommu_ops_get(dev);
>>
>> - if (domain->ops->probe_finalize)
>> - domain->ops->probe_finalize(dev);
>> + if (ops->probe_finalize)
>> + ops->probe_finalize(dev);
>
> This is an oddball one too, it is finishing setting up the default
> domain for a device? Several drivers seem to recover the default
> domain in their implementations..

In order to avoid default domain type (DMA or IDENDITY) conflict among
devices in a same iommu_group, the probe process is divided into two
phases, one for determining the default domain type and the other for
allocating the default domain and attaching it to the device.

ops->probe_finalize() is called to tell the vendor iommu driver that
the device probe is done. Normally the vendor iommu driver could set the
dma ops in this callback.

>
> Jason
>

Best regards,
baolu

2022-01-25 08:57:31

by Lu Baolu

[permalink] [raw]
Subject: Re: [PATCH 6/7] iommu: Use right way to retrieve iommu_ops


On 1/25/22 8:20 AM, Robin Murphy wrote:
> On 2022-01-24 07:11, Lu Baolu wrote:
>> The common iommu_ops is hooked to both device and domain. When a helper
>> has both device and domain pointer, the way to get the iommu_ops looks
>> messy in iommu core. This sorts out the way to get iommu_ops. The device
>> related helpers go through device pointer, while the domain related ones
>> go through domain pointer.
>>
>> Signed-off-by: Lu Baolu <[email protected]>
>> ---
>>   include/linux/iommu.h |  8 ++++++++
>>   drivers/iommu/iommu.c | 25 ++++++++++++++-----------
>>   2 files changed, 22 insertions(+), 11 deletions(-)
>>
>> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
>> index aa5486243892..111b3e9c79bb 100644
>> --- a/include/linux/iommu.h
>> +++ b/include/linux/iommu.h
>> @@ -385,6 +385,14 @@ static inline void iommu_iotlb_gather_init(struct
>> iommu_iotlb_gather *gather)
>>       };
>>   }
>> +static inline const struct iommu_ops *dev_iommu_ops_get(struct device
>> *dev)
>> +{
>> +    if (dev && dev->iommu && dev->iommu->iommu_dev)
>> +        return dev->iommu->iommu_dev->ops;
>> +
>> +    return NULL;
>
> This probably warrants at least a WARN, but it's arguable to just assume
> that valid ops must be installed if iommu_probe_device() has succeeded.
> The device ops are essentially for internal use within the IOMMU
> subsystem itself, so we should be able to trust ourselves not to misuse
> the helper.

I agree that we could add a WARN() here. The expectation is that every
device going through the IOMMU interfaces or helpers should have been
probed by iommu_probe_device().

>
>> +}
>> +
>>   #define IOMMU_BUS_NOTIFY_PRIORITY        0
>>   #define IOMMU_GROUP_NOTIFY_ADD_DEVICE        1 /* Device added */
>>   #define IOMMU_GROUP_NOTIFY_DEL_DEVICE        2 /* Pre Device removed */
>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>> index 5230c6d90ece..6631e2ea44df 100644
>> --- a/drivers/iommu/iommu.c
>> +++ b/drivers/iommu/iommu.c
>> @@ -764,6 +764,7 @@ EXPORT_SYMBOL_GPL(iommu_group_set_name);
>>   static int iommu_create_device_direct_mappings(struct iommu_group
>> *group,
>>                              struct device *dev)
>>   {
>> +    const struct iommu_ops *ops = dev_iommu_ops_get(dev);
>>       struct iommu_domain *domain = group->default_domain;
>>       struct iommu_resv_region *entry;
>>       struct list_head mappings;
>> @@ -785,8 +786,8 @@ static int
>> iommu_create_device_direct_mappings(struct iommu_group *group,
>>           dma_addr_t start, end, addr;
>>           size_t map_size = 0;
>> -        if (domain->ops->apply_resv_region)
>> -            domain->ops->apply_resv_region(dev, domain, entry);
>> +        if (ops->apply_resv_region)
>> +            ops->apply_resv_region(dev, domain, entry);
>
> Strictly I think this was a domain op, as it was about reserving the
> IOVA range in the given DMA domain. Also taking the domain as an
> argument is a bit of a giveaway. However it's now just dead code either
> way since there are no remaining implementations, and no reason for any
> new ones.

This callback is a dead code. I will cleanup it.

$ git grep apply_resv_region
drivers/iommu/iommu.c: if (ops->apply_resv_region)
drivers/iommu/iommu.c: ops->apply_resv_region(dev,
domain, entry);
include/linux/iommu.h: * @apply_resv_region: Temporary helper call-back
for iova reserved ranges
include/linux/iommu.h: void (*apply_resv_region)(struct device *dev,

>
>>           start = ALIGN(entry->start, pg_size);
>>           end   = ALIGN(entry->start + entry->length, pg_size);
>> @@ -831,8 +832,10 @@ static int
>> iommu_create_device_direct_mappings(struct iommu_group *group,
>>   static bool iommu_is_attach_deferred(struct iommu_domain *domain,
>>                        struct device *dev)
>>   {
>> -    if (domain->ops->is_attach_deferred)
>> -        return domain->ops->is_attach_deferred(domain, dev);
>> +    const struct iommu_ops *ops = dev_iommu_ops_get(dev);
>> +
>> +    if (ops->is_attach_deferred)
>> +        return ops->is_attach_deferred(domain, dev);
>
> Similarly if this takes a domain as its first argument then it's de
> facto a domain method. However, I'd concur that logically it *is* a
> device op, so let's drop that (unused) domain argument if we're cleaning
> up.
>
> Maybe there's even an argument for factoring this out to a standard flag
> in dev_iommu rather than an op at all?

Make it part of dev_iommu looks more attractive. Let me check how many
efforts will it take. If a lot of changes required, maybe we can remove
@domain in this series and then switch it to a dev_iommu flag in a
separated series.

>
> The others covered here look OK - we can blame PCI for page response
> being weirdly device-centric - however could we also convert all the
> feasible instances of dev->bus->iommu_ops to dev_iommu_ops() as well?

Sure.

> (Subtly implying that I'm also not a fan of having "_get" in the name
> for a non-refcounted lookup...) Obviously iommu_probe_device() and
> iommmu_domain_alloc() still need bus ops at this point, but I'm working
> on that... :)

Thanks and glad to know that.

>
> Thanks,
> Robin.

Best regards,
baolu

2022-01-25 08:57:33

by Lu Baolu

[permalink] [raw]
Subject: Re: [PATCH 7/7] iommu: Add iommu_domain::domain_ops

On 1/24/22 5:37 PM, Christoph Hellwig wrote:
> On Mon, Jan 24, 2022 at 03:11:02PM +0800, Lu Baolu wrote:
>> Add a domain specific callback set, domain_ops, for vendor iommu driver
>> to provide domain specific operations. Move domain-specific callbacks
>> from iommu_ops to the domain_ops and hook them when a domain is allocated.
>
> Ah, that's what I meant earlier. Perfect!
>
> Nit: I don't think vendor is the right term here.
>
> Maybe something like:
>
> iommut: split struct iommu_ops
>
> Move the domain specific operations out of struct iommu_ops into a new
> structure that only has domain specific operations. This solves
> the problem of needing to know if the method vector for a given
> operation needs to be retreived from the device or th domain.

Sure. Will use above description.

>
>> +struct domain_ops {
>
> This needs to keep an iommu in the name, i.e. iommu_domain_ops.

Sure.

>
>> + phys_addr_t (*iova_to_phys)(struct iommu_domain *domain, dma_addr_t iova);
>
> Overly long line.

./scripts/checkpatch.pl --strict *.patch

didn't give me a WARN or CHECK. I will make it short anyway.

>
>> +static inline void iommu_domain_set_ops(struct iommu_domain *domain,
>> + const struct domain_ops *ops)
>> +{
>> + domain->ops = ops;
>> +}
>
> Do we really need this helper?

Unnecessary. I can set the pointer directly in the drivers.

>
>> +static const struct domain_ops amd_domain_ops;
>
> Can we avoid these forward declarations or would that cause too much
> churn?
>

I don't like this either. But it's common to put the ops at the bottom
of the file in almost all iommu drivers.

Best regards,
baolu

2022-01-25 08:57:33

by Lu Baolu

[permalink] [raw]
Subject: Re: [PATCH 7/7] iommu: Add iommu_domain::domain_ops

On 1/25/22 1:24 AM, Jason Gunthorpe wrote:
> On Mon, Jan 24, 2022 at 01:37:21AM -0800, Christoph Hellwig wrote:
>> On Mon, Jan 24, 2022 at 03:11:02PM +0800, Lu Baolu wrote:
>>> Add a domain specific callback set, domain_ops, for vendor iommu driver
>>> to provide domain specific operations. Move domain-specific callbacks
>>> from iommu_ops to the domain_ops and hook them when a domain is allocated.
>>
>> Ah, that's what I meant earlier. Perfect!
>>
>> Nit: I don't think vendor is the right term here.
>
> +1 please don't use the confusing 'vendor' in the kernel, if you feel
> you want to write 'vendor' writing 'device driver' is usually a good
> choice..

Sure!

>
> This whole series is nice, the few small notes aside, thanks for
> making it!

Thank you!

>
> Jason
>

Best regards,
baolu

2022-01-25 08:59:07

by Lu Baolu

[permalink] [raw]
Subject: Re: [PATCH 7/7] iommu: Add iommu_domain::domain_ops

On 1/25/22 1:55 AM, Jason Gunthorpe wrote:
> On Mon, Jan 24, 2022 at 03:11:02PM +0800, Lu Baolu wrote:
>> - int (*enable_nesting)(struct iommu_domain *domain);
>
>
> Lu, there is an implementation in the Intel driver here, is it usable
> at all?

It's useless and I have cleaned it up in this series.

>
> Jason
>

Best regards,
baolu

2022-01-25 08:59:14

by Lu Baolu

[permalink] [raw]
Subject: Re: [PATCH 7/7] iommu: Add iommu_domain::domain_ops

On 1/24/22 5:58 PM, Tian, Kevin wrote:
>> From: Lu Baolu <[email protected]>
>> Sent: Monday, January 24, 2022 3:11 PM
>> +/**
>> + * struct domain_ops - per-domain ops
>> + * @attach_dev: attach an iommu domain to a device
>> + * @detach_dev: detach an iommu domain from a device
>
> What is the criteria about whether an op should be iommu_ops or domain_ops
> when it requires both domain and device pointers like above two (and future
> PASID-based attach)?

Generally ops belong to iommu_ops if they are only device oriented, and
domain_ops if they are only domain oriented. But it's really hard to
judge when both device and domain are involved. Good question. :-)

Perhaps we should leave the attach/detach interfaces in iommu_ops since
they are related to device capabilities. For example, some devices
support attach_dev_pasid, while others not.

>
> Other examples include:
> @apply_resv_region

This will be deprecated.

> @is_attach_deferred

Should be at least device centric (domain doesn't play any role here).
Further step is to save the is_attach_deferred at a flag in dev_iommu
and remove the ops (as Robin suggested).

>
> Thanks
> Kevin
>

Best regards,
baolu

2022-01-25 09:06:40

by Lu Baolu

[permalink] [raw]
Subject: Re: [PATCH 7/7] iommu: Add iommu_domain::domain_ops

On 1/25/22 8:57 AM, Robin Murphy wrote:
> On 2022-01-24 07:11, Lu Baolu wrote:
>> Add a domain specific callback set, domain_ops, for vendor iommu driver
>> to provide domain specific operations. Move domain-specific callbacks
>> from iommu_ops to the domain_ops and hook them when a domain is
>> allocated.
>
> I think it would make a lot of sense for iommu_domain_ops to be a
> substructure *within* iommu_ops. That way we can save most of the driver
> boilerplate here by not needing extra variables everywhere, and letting
> iommu_domain_alloc() still do a default initialisation like "domain->ops
> = bus->iommu_ops->domain_ops;"

In the new model, iommu_domain_ops and iommu_ops are not 1:1 mapped.
For example, a PASID-capable IOMMU could support DMA domain (which
supports map/unmap), SVA domain (which does not), and others in the
future. Different type of domain has its own domain_ops.

>
> I do like the future possibility of trying to flatten some of the
> io-pgtable indirection by having the SMMU drivers subsequently swizzle
> in alternate domain ops once they've picked a format, though. That was a
> bit too clunky to achieve at the whole iommu_ops level when I
> experimented with it years ago, but now it might well be worth another
> try...
>
> One other comment below.
>
>> Signed-off-by: Lu Baolu <[email protected]>
>> ---
>>   include/linux/iommu.h                       | 93 ++++++++++++---------
>>   drivers/iommu/amd/iommu.c                   | 21 +++--
>>   drivers/iommu/apple-dart.c                  | 24 ++++--
>>   drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 22 +++--
>>   drivers/iommu/arm/arm-smmu/arm-smmu.c       | 23 +++--
>>   drivers/iommu/arm/arm-smmu/qcom_iommu.c     | 17 ++--
>>   drivers/iommu/exynos-iommu.c                | 17 ++--
>>   drivers/iommu/fsl_pamu_domain.c             | 13 ++-
>>   drivers/iommu/intel/iommu.c                 | 25 ++++--
>>   drivers/iommu/iommu.c                       | 15 ++--
>>   drivers/iommu/ipmmu-vmsa.c                  | 21 +++--
>>   drivers/iommu/msm_iommu.c                   | 17 ++--
>>   drivers/iommu/mtk_iommu.c                   | 24 ++++--
>>   drivers/iommu/mtk_iommu_v1.c                | 19 +++--
>>   drivers/iommu/omap-iommu.c                  | 15 ++--
>>   drivers/iommu/rockchip-iommu.c              | 17 ++--
>>   drivers/iommu/s390-iommu.c                  | 15 ++--
>>   drivers/iommu/sprd-iommu.c                  | 19 +++--
>>   drivers/iommu/sun50i-iommu.c                | 18 ++--
>>   drivers/iommu/tegra-gart.c                  | 15 ++--
>>   drivers/iommu/tegra-smmu.c                  | 16 ++--
>>   drivers/iommu/virtio-iommu.c                | 18 ++--
>>   22 files changed, 305 insertions(+), 179 deletions(-)
>>
>> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
>> index 111b3e9c79bb..33c5c0e5c465 100644
>> --- a/include/linux/iommu.h
>> +++ b/include/linux/iommu.h
>> @@ -88,7 +88,7 @@ struct iommu_domain_geometry {
>>   struct iommu_domain {
>>       unsigned type;
>> -    const struct iommu_ops *ops;
>> +    const struct domain_ops *ops;
>>       unsigned long pgsize_bitmap;    /* Bitmap of page sizes in use */
>>       iommu_fault_handler_t handler;
>>       void *handler_token;
>> @@ -192,26 +192,11 @@ struct iommu_iotlb_gather {
>>    * struct iommu_ops - iommu ops and capabilities
>>    * @capable: check capability
>>    * @domain_alloc: allocate iommu domain
>> - * @domain_free: free iommu domain
>> - * @attach_dev: attach device to an iommu domain
>> - * @detach_dev: detach device from an iommu domain
>> - * @map: map a physically contiguous memory region to an iommu domain
>> - * @map_pages: map a physically contiguous set of pages of the same
>> size to
>> - *             an iommu domain.
>> - * @unmap: unmap a physically contiguous memory region from an iommu
>> domain
>> - * @unmap_pages: unmap a number of pages of the same size from an
>> iommu domain
>> - * @flush_iotlb_all: Synchronously flush all hardware TLBs for this
>> domain
>> - * @iotlb_sync_map: Sync mappings created recently using @map to the
>> hardware
>> - * @iotlb_sync: Flush all queued ranges from the hardware TLBs and
>> empty flush
>> - *            queue
>> - * @iova_to_phys: translate iova to physical address
>>    * @probe_device: Add device to iommu driver handling
>>    * @release_device: Remove device from iommu driver handling
>>    * @probe_finalize: Do final setup work after the device is added to
>> an IOMMU
>>    *                  group and attached to the groups domain
>>    * @device_group: find iommu group for a particular device
>> - * @enable_nesting: Enable nesting
>> - * @set_pgtable_quirks: Set io page table quirks (IO_PGTABLE_QUIRK_*)
>>    * @get_resv_regions: Request list of reserved regions for a device
>>    * @put_resv_regions: Free list of reserved regions for a device
>>    * @apply_resv_region: Temporary helper call-back for iova reserved
>> ranges
>> @@ -237,33 +222,11 @@ struct iommu_ops {
>>       /* Domain allocation and freeing by the iommu driver */
>>       struct iommu_domain *(*domain_alloc)(unsigned iommu_domain_type);
>> -    void (*domain_free)(struct iommu_domain *);
>> -    int (*attach_dev)(struct iommu_domain *domain, struct device *dev);
>> -    void (*detach_dev)(struct iommu_domain *domain, struct device *dev);
>> -    int (*map)(struct iommu_domain *domain, unsigned long iova,
>> -           phys_addr_t paddr, size_t size, int prot, gfp_t gfp);
>> -    int (*map_pages)(struct iommu_domain *domain, unsigned long iova,
>> -             phys_addr_t paddr, size_t pgsize, size_t pgcount,
>> -             int prot, gfp_t gfp, size_t *mapped);
>> -    size_t (*unmap)(struct iommu_domain *domain, unsigned long iova,
>> -             size_t size, struct iommu_iotlb_gather *iotlb_gather);
>> -    size_t (*unmap_pages)(struct iommu_domain *domain, unsigned long
>> iova,
>> -                  size_t pgsize, size_t pgcount,
>> -                  struct iommu_iotlb_gather *iotlb_gather);
>> -    void (*flush_iotlb_all)(struct iommu_domain *domain);
>> -    void (*iotlb_sync_map)(struct iommu_domain *domain, unsigned long
>> iova,
>> -                   size_t size);
>> -    void (*iotlb_sync)(struct iommu_domain *domain,
>> -               struct iommu_iotlb_gather *iotlb_gather);
>> -    phys_addr_t (*iova_to_phys)(struct iommu_domain *domain,
>> dma_addr_t iova);
>>       struct iommu_device *(*probe_device)(struct device *dev);
>>       void (*release_device)(struct device *dev);
>>       void (*probe_finalize)(struct device *dev);
>>       struct iommu_group *(*device_group)(struct device *dev);
>> -    int (*enable_nesting)(struct iommu_domain *domain);
>> -    int (*set_pgtable_quirks)(struct iommu_domain *domain,
>> -                  unsigned long quirks);
>>       /* Request/Free a list of reserved regions for a device */
>>       void (*get_resv_regions)(struct device *dev, struct list_head
>> *list);
>> @@ -296,6 +259,60 @@ struct iommu_ops {
>>       struct module *owner;
>>   };
>> +/**
>> + * struct domain_ops - per-domain ops
>> + * @attach_dev: attach an iommu domain to a device
>> + * @detach_dev: detach an iommu domain from a device
>> + * @map: map a physically contiguous memory region to an iommu domain
>> + * @map_pages: map a physically contiguous set of pages of the same
>> size to
>> + *             an iommu domain.
>> + * @unmap: unmap a physically contiguous memory region from an iommu
>> domain
>> + * @unmap_pages: unmap a number of pages of the same size from an
>> iommu domain
>> + * @flush_iotlb_all: Synchronously flush all hardware TLBs for this
>> domain
>> + * @iotlb_sync_map: Sync mappings created recently using @map to the
>> hardware
>> + * @iotlb_sync: Flush all queued ranges from the hardware TLBs and
>> empty flush
>> + *            queue
>> + * @iova_to_phys: translate iova to physical address
>> + * @enable_nesting: Enable nesting
>> + * @set_pgtable_quirks: Set io page table quirks (IO_PGTABLE_QUIRK_*)
>> + * @release: Release the domain after use.
>
> Not sure about the name change here - it's still logically a "free" that
> matches an "alloc", we're not doing anything clever like refcounting or
> devres, which is what I'd tend to associate with a "release" function
> for an allocated resource.

Yes. "free" is better.

>
> Robin.

Best regards,
baolu

2022-01-25 17:00:09

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH 7/7] iommu: Add iommu_domain::domain_ops

On Tue, Jan 25, 2022 at 12:59:14PM +0800, Lu Baolu wrote:
> On 1/24/22 5:58 PM, Tian, Kevin wrote:
> > > From: Lu Baolu <[email protected]>
> > > Sent: Monday, January 24, 2022 3:11 PM
> > > +/**
> > > + * struct domain_ops - per-domain ops
> > > + * @attach_dev: attach an iommu domain to a device
> > > + * @detach_dev: detach an iommu domain from a device
> >
> > What is the criteria about whether an op should be iommu_ops or domain_ops
> > when it requires both domain and device pointers like above two (and future
> > PASID-based attach)?
>
> Generally ops belong to iommu_ops if they are only device oriented, and
> domain_ops if they are only domain oriented. But it's really hard to
> judge when both device and domain are involved. Good question. :-)
>
> Perhaps we should leave the attach/detach interfaces in iommu_ops since
> they are related to device capabilities. For example, some devices
> support attach_dev_pasid, while others not.

Isn't the attach process for something like SVA or the KVM version
actually rather different code?

Ideally the function pointers should help minimize case statements/etc
inside the driver..

Or stated another way, I expect drivers will have different structs
container_of'ing back to the iommu_domain (ie intel_iommu_domain_sva,
say). Any code that needs to work differently depending on the struct
should have an op in the domain so it can sanely container_of without
a mess.

Jason

2022-01-25 20:16:30

by Robin Murphy

[permalink] [raw]
Subject: Re: [PATCH 7/7] iommu: Add iommu_domain::domain_ops

On 2022-01-25 06:27, Lu Baolu wrote:
> On 1/25/22 8:57 AM, Robin Murphy wrote:
>> On 2022-01-24 07:11, Lu Baolu wrote:
>>> Add a domain specific callback set, domain_ops, for vendor iommu driver
>>> to provide domain specific operations. Move domain-specific callbacks
>>> from iommu_ops to the domain_ops and hook them when a domain is
>>> allocated.
>>
>> I think it would make a lot of sense for iommu_domain_ops to be a
>> substructure *within* iommu_ops. That way we can save most of the
>> driver boilerplate here by not needing extra variables everywhere, and
>> letting iommu_domain_alloc() still do a default initialisation like
>> "domain->ops = bus->iommu_ops->domain_ops;"
>
> In the new model, iommu_domain_ops and iommu_ops are not 1:1 mapped.
> For example, a PASID-capable IOMMU could support DMA domain (which
> supports map/unmap), SVA domain (which does not), and others in the
> future. Different type of domain has its own domain_ops.

Sure, it's clear that that's the direction in which this is headed, and
as I say I'm quite excited about that. However there are a couple of
points I think are really worth considering:

Where it's just about which operations are valid for which domains, it's
even simpler for the core interface wrappers to validate the domain
type, rather than forcing drivers to implement multiple ops structures
purely for the sake of having different callbacks populated. We already
have this in places, e.g. where iommu_map() checks for
__IOMMU_DOMAIN_PAGING.

Paging domains are also effectively the baseline level of IOMMU API
functionality. All drivers support them, and for the majority of drivers
it's all they will ever support. Those drivers really don't benefit from
any of the churn and boilerplate in this patch as-is, and it's so easy
to compromise with a couple of lines of core code to handle the common
case by default when the driver *isn't* one of the handful which ever
actually cares to install their own per-domain ops. Consider how much
cleaner this patch would look if the typical driver diff could be
something completely minimal like this:

----->8-----
diff --git a/drivers/iommu/mtk_iommu_v1.c b/drivers/iommu/mtk_iommu_v1.c
index be22fcf988ce..6aff493e37ee 100644
--- a/drivers/iommu/mtk_iommu_v1.c
+++ b/drivers/iommu/mtk_iommu_v1.c
@@ -514,12 +514,14 @@ static int mtk_iommu_hw_init(const struct
mtk_iommu_data *data)

static const struct iommu_ops mtk_iommu_ops = {
.domain_alloc = mtk_iommu_domain_alloc,
- .domain_free = mtk_iommu_domain_free,
- .attach_dev = mtk_iommu_attach_device,
- .detach_dev = mtk_iommu_detach_device,
- .map = mtk_iommu_map,
- .unmap = mtk_iommu_unmap,
- .iova_to_phys = mtk_iommu_iova_to_phys,
+ .domain_ops = &(const struct iommu_domain_ops){
+ .domain_free = mtk_iommu_domain_free,
+ .attach_dev = mtk_iommu_attach_device,
+ .detach_dev = mtk_iommu_detach_device,
+ .map = mtk_iommu_map,
+ .unmap = mtk_iommu_unmap,
+ .iova_to_phys = mtk_iommu_iova_to_phys,
+ }
.probe_device = mtk_iommu_probe_device,
.probe_finalize = mtk_iommu_probe_finalize,
.release_device = mtk_iommu_release_device,

-----8<-----

And of course I have to shy away from calling it default_domain_ops,
since although it's logically default ops for domains, it is not
specifically ops for default domains, sigh... :)

Cheers,
Robin.

2022-01-25 22:32:27

by Robin Murphy

[permalink] [raw]
Subject: Re: [PATCH 0/7] iommu cleanup and refactoring

On 2022-01-24 17:44, Jason Gunthorpe wrote:
> On Mon, Jan 24, 2022 at 09:46:26AM +0000, Tian, Kevin wrote:
>>> From: Lu Baolu <[email protected]>
>>> Sent: Monday, January 24, 2022 3:11 PM
>>>
>>> Hi,
>>>
>>> The guest pasid and aux-domain related code are dead code in current
>>> iommu subtree. As we have reached a consensus that all these features
>>> should be based on the new iommufd framework (which is under active
>>> development), the first part of this series removes and cleanups all
>>> the dead code.
>>>
>>> The second part of this series refactors the iommu_domain by moving all
>>> domain-specific ops from iommu_ops to a new domain_ops. This makes an
>>> iommu_domain self-contained and represent the abstraction of an I/O
>>> translation table in the IOMMU subsystem. With different type of
>>> iommu_domain providing different set of ops, it's easier to support more
>>> types of I/O translation tables.
>>
>> You may want to give more background on this end goal. In general there
>> are four IOPT types in iommufd discussions:
>>
>> 1) The one currently tracked by iommu_domain, with a map/unmap semantics
>> 2) The one managed by mm and shared to iommu via sva_bind/unbind ops
>> 3) The one managed by userspace and bound to iommu via iommufd (require nesting)
>> 4) The one managed by KVM (e.g. EPT) and shared to iommu via a TBD interface
>
> Yes, at least from an iommufd perspective I'd like to see one struct
> for all of these types, mainly so we can have a uniform alloc, attach
> and detatch flow for all io page table types.

Agreed, certainly an IOMMU_DOMAIN_SVA type that can both encapsulate the
mm and effectively replace iommu_sva seems like a logical and fairly
small next step. We already have the paradigm of different domain types
supporting different ops, so initially an SVA domain would simply allow
bind/unbind rather than attach/detach/map/unmap.

It might then further be possible to hide SVA bind/unbind behind the
attach/detach interface, but AFAICS we'd still need distinct flows for
attaching/binding the whole device vs. attaching/binding to a PASID,
since they are fundamentally different things in their own right, and
the ideal API should give us the orthogonality to also bind a device to
an SVA domain without PASID (e.g. for KVM stage 2, or userspace
assignment of simpler fault/stall-tolerant devices), or attach PASIDs to
regular iommu_domains.

That distinction could of course be consolidated by flipping to the
other approach of explicitly allocating the PASID first, then wrapping
it in a struct device that could then be passed through the same
attach/detach interfaces and distinguished internally, but although I
still have a fondness for that approach I know I'm about the only one :)

Cheers,
Robin.

> If we want to use the iommu_domain, or make iommu_domain a sub-class
> of a new struct, can be determined as we go along.
>
> Regardless, I think this cleanup stands on its own. Moving the ops and
> purging the dead code is clearly the right thing to do.
>
> Thanks,
> Jason

2022-01-25 22:47:11

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH 7/7] iommu: Add iommu_domain::domain_ops

On Tue, Jan 25, 2022 at 02:23:52PM +0000, Robin Murphy wrote:
> On 2022-01-25 06:27, Lu Baolu wrote:

> Where it's just about which operations are valid for which domains, it's
> even simpler for the core interface wrappers to validate the domain type,
> rather than forcing drivers to implement multiple ops structures purely for
> the sake of having different callbacks populated. We already have this in
> places, e.g. where iommu_map() checks for __IOMMU_DOMAIN_PAGING.

In my experience it is usually much clearer to directly test the op
for NULL to know if a feature is supported than to invent flags to do
the same test. eg ops->map/etc == NULL means no paging.

I think we should not be afraid to have multiple ops in drivers for
things that are actually different in the driver. This is usually a
net win vs tying to handle all the cases with different 'if' flows.

eg identity domains and others really would ideally eventually have a
NULL ops for map/unmap too.

The 'type' should conceptually be part of the ops, not the mutable
struct - but we don't have to get there all at once.

> Paging domains are also effectively the baseline level of IOMMU API
> functionality. All drivers support them, and for the majority of drivers
> it's all they will ever support. Those drivers really don't benefit from any
> of the churn and boilerplate in this patch as-is, and it's so easy to
> compromise with a couple of lines of core code to handle the common case by
> default when the driver *isn't* one of the handful which ever actually cares
> to install their own per-domain ops. Consider how much cleaner this patch
> would look if the typical driver diff could be something completely minimal
> like this:

It is clever, but I'm not sure if hoisting a single assignment out of
the driver is worth the small long term complexity of having different
driver flows?

Jason

2022-01-25 22:48:32

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH 0/7] iommu cleanup and refactoring

On Tue, Jan 25, 2022 at 02:48:02PM +0000, Robin Murphy wrote:

> Agreed, certainly an IOMMU_DOMAIN_SVA type that can both encapsulate the mm
> and effectively replace iommu_sva seems like a logical and fairly small next
> step. We already have the paradigm of different domain types supporting
> different ops, so initially an SVA domain would simply allow bind/unbind
> rather than attach/detach/map/unmap.

I hope we can quickly get to a PASID enabled generic attach/detach
scheme - we really need this to do the uAPI part of this interface.

> they are fundamentally different things in their own right, and the ideal
> API should give us the orthogonality to also bind a device to an SVA domain
> without PASID (e.g. for KVM stage 2, or userspace assignment of simpler
> fault/stall-tolerant devices), or attach PASIDs to regular iommu_domains.

Yes, these are orthogonal things. A iommu driver that supports PASID
ideally should support PASID enabled attach/detatch for every
iommu_domain type it supports.

SVA should not be entangled with PASID beyond that SVA is often used
with PASID - a SVA iommu_domain should be fully usable with a RID too.

I'm hoping to see the core iommu code provide some simplified "SVA"
API that under the covers creates a SVA domain and then does a normal
PASID attach using the global PASID in the mm_struct - the
driver should not care what, or even if, PASID is used for a SVA
domain.

Jason

2022-01-26 15:19:58

by Lu Baolu

[permalink] [raw]
Subject: Re: [PATCH 0/7] iommu cleanup and refactoring

On 1/25/22 11:16 PM, Jason Gunthorpe wrote:
> On Tue, Jan 25, 2022 at 02:48:02PM +0000, Robin Murphy wrote:
>
>> Agreed, certainly an IOMMU_DOMAIN_SVA type that can both encapsulate the mm
>> and effectively replace iommu_sva seems like a logical and fairly small next
>> step. We already have the paradigm of different domain types supporting
>> different ops, so initially an SVA domain would simply allow bind/unbind
>> rather than attach/detach/map/unmap.
>
> I hope we can quickly get to a PASID enabled generic attach/detach
> scheme - we really need this to do the uAPI part of this interface.

Agreed. Jacob is working on kernel DMA with PASID. He needs such
interfaces as well. I have worked out an implementation for vt-d driver.
It could be post for review inside Jacob's series for kernel DMA with
PASID.

>
>> they are fundamentally different things in their own right, and the ideal
>> API should give us the orthogonality to also bind a device to an SVA domain
>> without PASID (e.g. for KVM stage 2, or userspace assignment of simpler
>> fault/stall-tolerant devices), or attach PASIDs to regular iommu_domains.
>
> Yes, these are orthogonal things. A iommu driver that supports PASID
> ideally should support PASID enabled attach/detatch for every
> iommu_domain type it supports.
>
> SVA should not be entangled with PASID beyond that SVA is often used
> with PASID - a SVA iommu_domain should be fully usable with a RID too.

The prototype of PASID enabled attach/detach ops could look like:

int (*attach_dev_pasid)(struct iommu_domain *domain,
struct device *dev, ioasid_t id);
void (*detach_dev_pasid)(struct iommu_domain *domain,
struct device *dev, ioasid_t id);

But the iommu driver should implement different callbacks for

1) attaching an IOMMU DMA domain to a PASID on device;
- kernel DMA with PASID
- mdev-like device passthrough
- etc.
2) attaching a CPU-shared domain to a PASID on device;
- SVA
- guest PASID
- etc.

>
> I'm hoping to see the core iommu code provide some simplified "SVA"
> API that under the covers creates a SVA domain and then does a normal
> PASID attach using the global PASID in the mm_struct - the
> driver should not care what, or even if, PASID is used for a SVA
> domain.
>
> Jason
>

Best regards,
baolu

2022-01-26 20:50:31

by Jean-Philippe Brucker

[permalink] [raw]
Subject: Re: [PATCH 7/7] iommu: Add iommu_domain::domain_ops

On Mon, Jan 24, 2022 at 12:33:02PM -0400, Jason Gunthorpe wrote:
> On Mon, Jan 24, 2022 at 10:16:07AM +0000, Jean-Philippe Brucker wrote:
> > On Mon, Jan 24, 2022 at 09:58:18AM +0000, Tian, Kevin wrote:
> > > > From: Lu Baolu <[email protected]>
> > > > Sent: Monday, January 24, 2022 3:11 PM
> > > > +/**
> > > > + * struct domain_ops - per-domain ops
> > > > + * @attach_dev: attach an iommu domain to a device
> > > > + * @detach_dev: detach an iommu domain from a device
> > >
> > > What is the criteria about whether an op should be iommu_ops or domain_ops
> > > when it requires both domain and device pointers like above two (and future
> > > PASID-based attach)?
> > >
> > > Other examples include:
> > > @apply_resv_region
> > > @is_attach_deferred
> >
> > Could attach_dev() be an IOMMU op? So a driver could set the domain ops
> > in attach_dev() rather than domain_alloc(). That would allow to install
> > map()/unmap() ops that are tailored for the device's IOMMU, which we don't
> > know at domain_alloc() time.
>
> I think we should be moving toward 'domain_alloc' returning the
> correct domain and the way the driver implements the domain shouldn't
> change after that.
>
> > I'm thinking about a guest that has both physical and virtual
> > endpoints, which would ideally use different kinds of domain ops to
> > support both efficiently (caching mode vs page tables)
>
> In this case shouldn't domain_alloc() reached from the struct device
> already do the correct thing?

Sure, if we can finalise the domains before attach that could also clean
up the drivers a bit.

Thanks,
Jean

2022-01-26 21:20:29

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH 0/7] iommu cleanup and refactoring

On Wed, Jan 26, 2022 at 09:51:36AM +0800, Lu Baolu wrote:

> > > they are fundamentally different things in their own right, and the ideal
> > > API should give us the orthogonality to also bind a device to an SVA domain
> > > without PASID (e.g. for KVM stage 2, or userspace assignment of simpler
> > > fault/stall-tolerant devices), or attach PASIDs to regular iommu_domains.
> >
> > Yes, these are orthogonal things. A iommu driver that supports PASID
> > ideally should support PASID enabled attach/detatch for every
> > iommu_domain type it supports.
> >
> > SVA should not be entangled with PASID beyond that SVA is often used
> > with PASID - a SVA iommu_domain should be fully usable with a RID too.
>
> The prototype of PASID enabled attach/detach ops could look like:
>
> int (*attach_dev_pasid)(struct iommu_domain *domain,
> struct device *dev, ioasid_t id);
> void (*detach_dev_pasid)(struct iommu_domain *domain,
> struct device *dev, ioasid_t id);

It seems reasonable and straightforward to me..

These would be domain ops?

> But the iommu driver should implement different callbacks for
>
> 1) attaching an IOMMU DMA domain to a PASID on device;
> - kernel DMA with PASID
> - mdev-like device passthrough
> - etc.
> 2) attaching a CPU-shared domain to a PASID on device;
> - SVA
> - guest PASID
> - etc.

But this you mean domain->ops would be different? Seems fine, up to
the driver.

I'd hope to see some flow like:

domain = device->bus->iommu_ops->alloc_sva_domain(dev)
domain->ops->attach_dev_pasid(domain, dev, current->pasid)

To duplicate the current SVA APIs

Jason

2022-01-26 21:28:16

by Robin Murphy

[permalink] [raw]
Subject: Re: [PATCH 0/7] iommu cleanup and refactoring

On 2022-01-26 13:27, Jason Gunthorpe via iommu wrote:
> On Wed, Jan 26, 2022 at 09:51:36AM +0800, Lu Baolu wrote:
>
>>>> they are fundamentally different things in their own right, and the ideal
>>>> API should give us the orthogonality to also bind a device to an SVA domain
>>>> without PASID (e.g. for KVM stage 2, or userspace assignment of simpler
>>>> fault/stall-tolerant devices), or attach PASIDs to regular iommu_domains.
>>>
>>> Yes, these are orthogonal things. A iommu driver that supports PASID
>>> ideally should support PASID enabled attach/detatch for every
>>> iommu_domain type it supports.
>>>
>>> SVA should not be entangled with PASID beyond that SVA is often used
>>> with PASID - a SVA iommu_domain should be fully usable with a RID too.
>>
>> The prototype of PASID enabled attach/detach ops could look like:
>>
>> int (*attach_dev_pasid)(struct iommu_domain *domain,
>> struct device *dev, ioasid_t id);
>> void (*detach_dev_pasid)(struct iommu_domain *domain,
>> struct device *dev, ioasid_t id);
>
> It seems reasonable and straightforward to me..
>
> These would be domain ops?
>
>> But the iommu driver should implement different callbacks for
>>
>> 1) attaching an IOMMU DMA domain to a PASID on device;
>> - kernel DMA with PASID
>> - mdev-like device passthrough
>> - etc.
>> 2) attaching a CPU-shared domain to a PASID on device;
>> - SVA
>> - guest PASID
>> - etc.
>
> But this you mean domain->ops would be different? Seems fine, up to
> the driver.

Indeed, it makes little practical difference whether the driver provides
distinct sets of ops or just common callbacks which switch on the domain
type internally. I expect that's largely going to come down to personal
preference and how much internal driver code is common between the paths.

> I'd hope to see some flow like:
>
> domain = device->bus->iommu_ops->alloc_sva_domain(dev)
> domain->ops->attach_dev_pasid(domain, dev, current->pasid)
>
> To duplicate the current SVA APIs

+1 - I'd concur that attach/detach belong as domain ops in general.
There's quite a nice logical split forming here where domain ops are the
ones that make sense for external subsystems and endpoint drivers to use
too, while device ops, with the sole exception of domain_alloc, are
IOMMU API internals. By that reasoning, attach/bind/etc. are firmly in
the former category.

Thanks,
Robin.

2022-02-08 08:02:03

by Lu Baolu

[permalink] [raw]
Subject: Re: [PATCH 0/7] iommu cleanup and refactoring

On 1/24/22 3:10 PM, Lu Baolu wrote:
> Hi,
>
> The guest pasid and aux-domain related code are dead code in current
> iommu subtree. As we have reached a consensus that all these features
> should be based on the new iommufd framework (which is under active
> development), the first part of this series removes and cleanups all
> the dead code.
>
> The second part of this series refactors the iommu_domain by moving all
> domain-specific ops from iommu_ops to a new domain_ops. This makes an
> iommu_domain self-contained and represent the abstraction of an I/O
> translation table in the IOMMU subsystem. With different type of
> iommu_domain providing different set of ops, it's easier to support more
> types of I/O translation tables.
>
> Please help to review and comment.

Thank you all for the great comments. A new version of this series has
been posted.

https://lore.kernel.org/linux-iommu/[email protected]/

Please check whether I missed anything.

Best regards,
baolu