2024-05-27 04:07:55

by Lu Baolu

Subject: [PATCH v6 00/10] IOMMUFD: Deliver IO page faults to user space

This series implements the functionality of delivering IO page faults to
user space through the IOMMUFD framework. One feasible use case is
nested translation. Nested translation is a hardware feature that
supports two-stage translation tables for IOMMU. The second-stage
translation table is managed by the host VMM, while the first-stage
translation table is owned by user space. This allows user space to
control the IOMMU mappings for its devices.

When an IO page fault occurs on the first-stage translation table, the
IOMMU hardware can deliver the page fault to user space through the
IOMMUFD framework. User space can then handle the page fault and deliver
the response back to the device through IOMMUFD. This allows user space to
implement its own IO page fault handling policies.

A user space application that is capable of handling IO page faults should
allocate a fault object and bind it to any domain for which it is willing
to handle the generated faults. On a successful return from fault object
allocation, the user can retrieve and respond to page faults by reading
from or writing to the returned file descriptor (FD).
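
As a rough illustration of the intended flow (not part of this series'
documentation), a user space handler built on the uapi added by this series
could look like the sketch below. The iommufd file descriptor, the device
binding and the hwpt setup are assumed to already exist, and error handling
is omitted:

  #include <sys/ioctl.h>
  #include <unistd.h>
  #include <linux/iommufd.h>

  /* Allocate a fault object and service a single fault; checks omitted. */
  static void handle_one_fault(int iommufd)
  {
          struct iommu_fault_alloc cmd = { .size = sizeof(cmd) };
          struct iommu_hwpt_pgfault pgfault;
          struct iommu_hwpt_page_response resp = {};

          /* The kernel returns an object ID and a file descriptor. */
          ioctl(iommufd, IOMMU_FAULT_QUEUE_ALLOC, &cmd);
          /*
           * cmd.out_fault_id is later used as the fault_id of an
           * iopf-capable hwpt allocation (IOMMU_HWPT_FAULT_ID_VALID).
           */

          /* Read one fault group (a single-fault group for brevity). */
          read(cmd.out_fault_fd, &pgfault, sizeof(pgfault));
          /* ... resolve the fault, e.g. fix up the stage-1 mapping ... */

          /* Respond through the same FD, echoing the cookie. */
          resp.cookie = pgfault.cookie;
          resp.code = IOMMUFD_PAGE_RESP_SUCCESS;
          write(cmd.out_fault_fd, &resp, sizeof(resp));
  }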

The iommu selftest framework has been updated to test the IO page fault
delivery and response functionality.

The series and related patches are available on GitHub:
https://github.com/LuBaolu/intel-iommu/commits/iommufd-io-pgfault-delivery-v6

Change log:
v6:
- Refine the attach handle code by shifting the handle allocation to
the caller. The caller will then provide the allocated handle to the
domain attachment interfaces.
- Add reference counter in iommufd_fault_iopf_enable/disable() helpers.
- Fix the return values of fault FD's read/write fops.
- Add the IOMMU_CAP_USER_IOASID_TABLE capability and check it before
rolling back to the RID attach_handle.
- Move the iopf respond queue from iommufd device to iommufd fault.
- Disallow PRI enablement on SR-IOV VF devices.
- Miscellaneous cleanup.

v5: https://lore.kernel.org/linux-iommu/[email protected]/
- Removed attach handle reference count from the core. Drivers will now
synchronize their use of handles and domain attach/detach.
- Automatically responds to all outstanding faults in hwpt detach or
replace paths.
- Supports getting a domain-type specific attach handle.
- Reorganized the series by changing the patch order.
- Miscellaneous cleanup.

v4: https://lore.kernel.org/linux-iommu/[email protected]/
- Add the iommu domain attachment handle to replace the iopf-specific
domain attachment interfaces introduced in the previous v3.
- Replace the iommu_sva with iommu domain attachment handle.
- Refine some fields in the fault and response message encoding
according to feedback collected during v3 review period.
- Refine and fix some problems in the fault FD implementation.
- Miscellaneous cleanup.

v3: https://lore.kernel.org/linux-iommu/[email protected]/
- Add iopf domain attach/detach/replace interfaces to manage the
reference counters of hwpt and device, ensuring that both can only be
destroyed after all outstanding IOPFs have been responded to.
- Relocate the fault handling file descriptor from hwpt to a fault
object to enable a single fault handling object to be utilized
across multiple domains.
- Miscellaneous cleanup and performance improvements.

v2: https://lore.kernel.org/linux-iommu/[email protected]/
- Move all iommu refactoring patches into a separate series and discuss
them in a different thread. The latest patch series [v6] is available at
https://lore.kernel.org/linux-iommu/[email protected]/
- We discussed the timeout of the pending page fault messages. We
agreed that we shouldn't apply any timeout policy for the page fault
handling in user space.
https://lore.kernel.org/linux-iommu/20230616113232.GA84678@myrica/
- Jason suggested that we adopt a simple file descriptor interface for
reading and responding to I/O page requests, so that user space
applications can improve performance using io_uring.
https://lore.kernel.org/linux-iommu/[email protected]/

v1: https://lore.kernel.org/linux-iommu/[email protected]/

Lu Baolu (10):
iommu: Introduce domain attachment handle
iommu: Remove sva handle list
iommu: Add attach handle to struct iopf_group
iommu: Extend domain attach group with handle support
iommufd: Add fault and response message definitions
iommufd: Add iommufd fault object
iommufd: Fault-capable hwpt attach/detach/replace
iommufd: Associate fault object with iommufd_hw_pgtable
iommufd/selftest: Add IOPF support for mock device
iommufd/selftest: Add coverage for IOPF test

include/linux/iommu.h | 42 +-
drivers/iommu/iommu-priv.h | 11 +
drivers/iommu/iommufd/iommufd_private.h | 51 ++
drivers/iommu/iommufd/iommufd_test.h | 8 +
include/uapi/linux/iommufd.h | 122 +++++
tools/testing/selftests/iommu/iommufd_utils.h | 84 +++-
drivers/dma/idxd/init.c | 2 +-
drivers/iommu/io-pgfault.c | 61 +--
drivers/iommu/iommu-sva.c | 37 +-
drivers/iommu/iommu.c | 178 +++++--
drivers/iommu/iommufd/device.c | 16 +-
drivers/iommu/iommufd/fault.c | 435 ++++++++++++++++++
drivers/iommu/iommufd/hw_pagetable.c | 41 +-
drivers/iommu/iommufd/main.c | 6 +
drivers/iommu/iommufd/selftest.c | 64 +++
tools/testing/selftests/iommu/iommufd.c | 18 +
.../selftests/iommu/iommufd_fail_nth.c | 2 +-
drivers/iommu/iommufd/Makefile | 1 +
18 files changed, 1060 insertions(+), 119 deletions(-)
create mode 100644 drivers/iommu/iommufd/fault.c

--
2.34.1



2024-05-27 04:08:10

by Lu Baolu

Subject: [PATCH v6 01/10] iommu: Introduce domain attachment handle

Currently, when attaching a domain to a device or its PASID, the domain is
stored within the iommu group. It can be retrieved for use during the
window between attachment and detachment.

With new features introduced, there's a need to store more information
than just a domain pointer. This information essentially represents the
association between a domain and a device. For example, the SVA code
already has a custom struct iommu_sva which represents a bond between an
SVA domain and a PASID of a device. Looking forward, the IOMMUFD needs
a place to store the iommufd_device pointer in the core, so that the
device object ID can be quickly retrieved in the critical fault handling
path.

Introduce a domain attachment handle that explicitly represents the
attachment relationship between a domain and a device or its PASID.
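
As an illustration (hypothetical consumer names, mirroring what the SVA
conversion below and the later IOMMUFD patches do), a component embeds the
handle in its own per-attachment structure and recovers it with
container_of() in the fault path:

  struct my_attach_data {                 /* hypothetical consumer struct */
          struct iommu_attach_handle handle;
          void *private;                  /* whatever the consumer needs */
  };

  /* Fault path: turn the stored handle back into the consumer struct. */
  static struct my_attach_data *to_my_data(struct iommu_attach_handle *handle)
  {
          return container_of(handle, struct my_attach_data, handle);
  }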

Co-developed-by: Jason Gunthorpe <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
Signed-off-by: Lu Baolu <[email protected]>
---
include/linux/iommu.h | 18 +++++++++++++++---
drivers/dma/idxd/init.c | 2 +-
drivers/iommu/iommu-sva.c | 13 ++++++++-----
drivers/iommu/iommu.c | 23 +++++++++++++----------
4 files changed, 37 insertions(+), 19 deletions(-)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 7bc8dff7cf6d..b89404acacd6 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -989,12 +989,22 @@ struct iommu_fwspec {
/* ATS is supported */
#define IOMMU_FWSPEC_PCI_RC_ATS (1 << 0)

+/*
+ * An iommu attach handle represents a relationship between an iommu domain
+ * and a PASID or RID of a device. It is allocated and managed by the component
+ * that manages the domain and is stored in the iommu group during the time the
+ * domain is attached.
+ */
+struct iommu_attach_handle {
+ struct iommu_domain *domain;
+};
+
/**
* struct iommu_sva - handle to a device-mm bond
*/
struct iommu_sva {
+ struct iommu_attach_handle handle;
struct device *dev;
- struct iommu_domain *domain;
struct list_head handle_item;
refcount_t users;
};
@@ -1052,7 +1062,8 @@ int iommu_device_claim_dma_owner(struct device *dev, void *owner);
void iommu_device_release_dma_owner(struct device *dev);

int iommu_attach_device_pasid(struct iommu_domain *domain,
- struct device *dev, ioasid_t pasid);
+ struct device *dev, ioasid_t pasid,
+ struct iommu_attach_handle *handle);
void iommu_detach_device_pasid(struct iommu_domain *domain,
struct device *dev, ioasid_t pasid);
struct iommu_domain *
@@ -1388,7 +1399,8 @@ static inline int iommu_device_claim_dma_owner(struct device *dev, void *owner)
}

static inline int iommu_attach_device_pasid(struct iommu_domain *domain,
- struct device *dev, ioasid_t pasid)
+ struct device *dev, ioasid_t pasid,
+ struct iommu_attach_handle *handle)
{
return -ENODEV;
}
diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c
index a7295943fa22..385c488c9cd1 100644
--- a/drivers/dma/idxd/init.c
+++ b/drivers/dma/idxd/init.c
@@ -584,7 +584,7 @@ static int idxd_enable_system_pasid(struct idxd_device *idxd)
* DMA domain is owned by the driver, it should support all valid
* types such as DMA-FQ, identity, etc.
*/
- ret = iommu_attach_device_pasid(domain, dev, pasid);
+ ret = iommu_attach_device_pasid(domain, dev, pasid, NULL);
if (ret) {
dev_err(dev, "failed to attach device pasid %d, domain type %d",
pasid, domain->type);
diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
index 18a35e798b72..0fb923254062 100644
--- a/drivers/iommu/iommu-sva.c
+++ b/drivers/iommu/iommu-sva.c
@@ -99,7 +99,9 @@ struct iommu_sva *iommu_sva_bind_device(struct device *dev, struct mm_struct *mm

/* Search for an existing domain. */
list_for_each_entry(domain, &mm->iommu_mm->sva_domains, next) {
- ret = iommu_attach_device_pasid(domain, dev, iommu_mm->pasid);
+ handle->handle.domain = domain;
+ ret = iommu_attach_device_pasid(domain, dev, iommu_mm->pasid,
+ &handle->handle);
if (!ret) {
domain->users++;
goto out;
@@ -113,7 +115,9 @@ struct iommu_sva *iommu_sva_bind_device(struct device *dev, struct mm_struct *mm
goto out_free_handle;
}

- ret = iommu_attach_device_pasid(domain, dev, iommu_mm->pasid);
+ handle->handle.domain = domain;
+ ret = iommu_attach_device_pasid(domain, dev, iommu_mm->pasid,
+ &handle->handle);
if (ret)
goto out_free_domain;
domain->users = 1;
@@ -124,7 +128,6 @@ struct iommu_sva *iommu_sva_bind_device(struct device *dev, struct mm_struct *mm
list_add(&handle->handle_item, &mm->iommu_mm->sva_handles);
mutex_unlock(&iommu_sva_lock);
handle->dev = dev;
- handle->domain = domain;
return handle;

out_free_domain:
@@ -147,7 +150,7 @@ EXPORT_SYMBOL_GPL(iommu_sva_bind_device);
*/
void iommu_sva_unbind_device(struct iommu_sva *handle)
{
- struct iommu_domain *domain = handle->domain;
+ struct iommu_domain *domain = handle->handle.domain;
struct iommu_mm_data *iommu_mm = domain->mm->iommu_mm;
struct device *dev = handle->dev;

@@ -170,7 +173,7 @@ EXPORT_SYMBOL_GPL(iommu_sva_unbind_device);

u32 iommu_sva_get_pasid(struct iommu_sva *handle)
{
- struct iommu_domain *domain = handle->domain;
+ struct iommu_domain *domain = handle->handle.domain;

return mm_get_enqcmd_pasid(domain->mm);
}
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 9df7cc75c1bc..efd281ddfa63 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -3352,16 +3352,17 @@ static void __iommu_remove_group_pasid(struct iommu_group *group,
* @domain: the iommu domain.
* @dev: the attached device.
* @pasid: the pasid of the device.
+ * @handle: the attach handle.
*
* Return: 0 on success, or an error.
*/
int iommu_attach_device_pasid(struct iommu_domain *domain,
- struct device *dev, ioasid_t pasid)
+ struct device *dev, ioasid_t pasid,
+ struct iommu_attach_handle *handle)
{
/* Caller must be a probed driver on dev */
struct iommu_group *group = dev->iommu_group;
struct group_device *device;
- void *curr;
int ret;

if (!domain->ops->set_dev_pasid)
@@ -3382,11 +3383,9 @@ int iommu_attach_device_pasid(struct iommu_domain *domain,
}
}

- curr = xa_cmpxchg(&group->pasid_array, pasid, NULL, domain, GFP_KERNEL);
- if (curr) {
- ret = xa_err(curr) ? : -EBUSY;
+ ret = xa_insert(&group->pasid_array, pasid, handle, GFP_KERNEL);
+ if (ret)
goto out_unlock;
- }

ret = __iommu_set_group_pasid(domain, group, pasid);
if (ret)
@@ -3414,7 +3413,7 @@ void iommu_detach_device_pasid(struct iommu_domain *domain, struct device *dev,

mutex_lock(&group->mutex);
__iommu_remove_group_pasid(group, pasid, domain);
- WARN_ON(xa_erase(&group->pasid_array, pasid) != domain);
+ xa_erase(&group->pasid_array, pasid);
mutex_unlock(&group->mutex);
}
EXPORT_SYMBOL_GPL(iommu_detach_device_pasid);
@@ -3439,15 +3438,19 @@ struct iommu_domain *iommu_get_domain_for_dev_pasid(struct device *dev,
{
/* Caller must be a probed driver on dev */
struct iommu_group *group = dev->iommu_group;
- struct iommu_domain *domain;
+ struct iommu_attach_handle *handle;
+ struct iommu_domain *domain = NULL;

if (!group)
return NULL;

xa_lock(&group->pasid_array);
- domain = xa_load(&group->pasid_array, pasid);
+ handle = xa_load(&group->pasid_array, pasid);
+ if (handle)
+ domain = handle->domain;
+
if (type && domain && domain->type != type)
- domain = ERR_PTR(-EBUSY);
+ domain = NULL;
xa_unlock(&group->pasid_array);

return domain;
--
2.34.1


2024-05-27 04:08:22

by Lu Baolu

Subject: [PATCH v6 02/10] iommu: Remove sva handle list

The struct iommu_sva represents an association between an SVA domain and
a PASID of a device. It's stored in the iommu group's pasid array and also
tracked by a list in the per-mm data structure. Remove the duplicate
tracking of iommu_sva by eliminating the list.

Signed-off-by: Lu Baolu <[email protected]>
---
include/linux/iommu.h | 2 --
drivers/iommu/iommu-priv.h | 3 +++
drivers/iommu/iommu-sva.c | 21 ++++++++++++---------
drivers/iommu/iommu.c | 31 +++++++++++++++++++++++++++++++
4 files changed, 46 insertions(+), 11 deletions(-)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index b89404acacd6..823fa3bcc2c6 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -1005,14 +1005,12 @@ struct iommu_attach_handle {
struct iommu_sva {
struct iommu_attach_handle handle;
struct device *dev;
- struct list_head handle_item;
refcount_t users;
};

struct iommu_mm_data {
u32 pasid;
struct list_head sva_domains;
- struct list_head sva_handles;
};

int iommu_fwspec_init(struct device *dev, struct fwnode_handle *iommu_fwnode,
diff --git a/drivers/iommu/iommu-priv.h b/drivers/iommu/iommu-priv.h
index 5f731d994803..f1536a5ebb0d 100644
--- a/drivers/iommu/iommu-priv.h
+++ b/drivers/iommu/iommu-priv.h
@@ -28,4 +28,7 @@ void iommu_device_unregister_bus(struct iommu_device *iommu,
const struct bus_type *bus,
struct notifier_block *nb);

+struct iommu_attach_handle *iommu_attach_handle_get(struct iommu_group *group,
+ ioasid_t pasid,
+ unsigned int type);
#endif /* __LINUX_IOMMU_PRIV_H */
diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
index 0fb923254062..e85f4ccc9dcc 100644
--- a/drivers/iommu/iommu-sva.c
+++ b/drivers/iommu/iommu-sva.c
@@ -41,7 +41,6 @@ static struct iommu_mm_data *iommu_alloc_mm_data(struct mm_struct *mm, struct de
}
iommu_mm->pasid = pasid;
INIT_LIST_HEAD(&iommu_mm->sva_domains);
- INIT_LIST_HEAD(&iommu_mm->sva_handles);
/*
* Make sure the write to mm->iommu_mm is not reordered in front of
* initialization to iommu_mm fields. If it does, readers may see a
@@ -69,11 +68,16 @@ static struct iommu_mm_data *iommu_alloc_mm_data(struct mm_struct *mm, struct de
*/
struct iommu_sva *iommu_sva_bind_device(struct device *dev, struct mm_struct *mm)
{
+ struct iommu_group *group = dev->iommu_group;
+ struct iommu_attach_handle *attach_handle;
struct iommu_mm_data *iommu_mm;
struct iommu_domain *domain;
struct iommu_sva *handle;
int ret;

+ if (!group)
+ return ERR_PTR(-ENODEV);
+
mutex_lock(&iommu_sva_lock);

/* Allocate mm->pasid if necessary. */
@@ -83,12 +87,13 @@ struct iommu_sva *iommu_sva_bind_device(struct device *dev, struct mm_struct *mm
goto out_unlock;
}

- list_for_each_entry(handle, &mm->iommu_mm->sva_handles, handle_item) {
- if (handle->dev == dev) {
- refcount_inc(&handle->users);
- mutex_unlock(&iommu_sva_lock);
- return handle;
- }
+ /* A bond already exists, just take a reference. */
+ attach_handle = iommu_attach_handle_get(group, iommu_mm->pasid, IOMMU_DOMAIN_SVA);
+ if (!IS_ERR(attach_handle)) {
+ handle = container_of(attach_handle, struct iommu_sva, handle);
+ refcount_inc(&handle->users);
+ mutex_unlock(&iommu_sva_lock);
+ return handle;
}

handle = kzalloc(sizeof(*handle), GFP_KERNEL);
@@ -125,7 +130,6 @@ struct iommu_sva *iommu_sva_bind_device(struct device *dev, struct mm_struct *mm

out:
refcount_set(&handle->users, 1);
- list_add(&handle->handle_item, &mm->iommu_mm->sva_handles);
mutex_unlock(&iommu_sva_lock);
handle->dev = dev;
return handle;
@@ -159,7 +163,6 @@ void iommu_sva_unbind_device(struct iommu_sva *handle)
mutex_unlock(&iommu_sva_lock);
return;
}
- list_del(&handle->handle_item);

iommu_detach_device_pasid(domain, dev, iommu_mm->pasid);
if (--domain->users == 0) {
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index efd281ddfa63..0263814cba6b 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -3483,3 +3483,34 @@ void iommu_free_global_pasid(ioasid_t pasid)
ida_free(&iommu_global_pasid_ida, pasid);
}
EXPORT_SYMBOL_GPL(iommu_free_global_pasid);
+
+/**
+ * iommu_attach_handle_get - Return the attach handle
+ * @group: the iommu group that domain was attached to
+ * @pasid: the pasid within the group
+ * @type: matched domain type, 0 for any match
+ *
+ * Return handle or ERR_PTR(-ENOENT) on none, ERR_PTR(-EBUSY) on mismatch.
+ *
+ * Return the attach handle to the caller. The life cycle of an iommu attach
+ * handle is from the time when the domain is attached to the time when the
+ * domain is detached. Callers are required to synchronize the call of
+ * iommu_attach_handle_get() with domain attachment and detachment. The attach
+ * handle can only be used during its life cycle.
+ */
+struct iommu_attach_handle *
+iommu_attach_handle_get(struct iommu_group *group, ioasid_t pasid, unsigned int type)
+{
+ struct iommu_attach_handle *handle;
+
+ xa_lock(&group->pasid_array);
+ handle = xa_load(&group->pasid_array, pasid);
+ if (!handle)
+ handle = ERR_PTR(-ENOENT);
+ else if (type && handle->domain->type != type)
+ handle = ERR_PTR(-EBUSY);
+ xa_unlock(&group->pasid_array);
+
+ return handle;
+}
+EXPORT_SYMBOL_NS_GPL(iommu_attach_handle_get, IOMMUFD_INTERNAL);
--
2.34.1


2024-05-27 04:08:41

by Lu Baolu

[permalink] [raw]
Subject: [PATCH v6 03/10] iommu: Add attach handle to struct iopf_group

Previously, the domain that a page fault targets was stored in an
iopf_group, which represents a minimal set of page faults. With the
introduction of the attach handle, replace the domain with the handle
so that the fault handler can obtain more information as needed
when handling the faults.

iommu_report_device_fault() is currently used for SVA page faults and
handles the page fault entirely within the kernel. The domain is retrieved
with iommu_get_domain_for_dev_pasid() if the pasid in the fault message
is valid. This doesn't work in the IOMMUFD case: when the pasid table of
a device is wholly managed by user space, there is no domain attached to
the PASID of the device, and all page faults are forwarded through a
NESTING domain attached to the RID.

Add a new IOMMU capability flag, IOMMU_CAP_USER_IOASID_TABLE, which
indicates whether the IOMMU driver supports user-managed PASID tables. In
the iopf delivery path, if no attach handle is found for the iopf PASID,
fall back to the RID domain when the IOMMU driver supports this capability.

iommu_get_domain_for_dev_pasid() is no longer used and can be removed.

Signed-off-by: Lu Baolu <[email protected]>
---
include/linux/iommu.h | 18 +++++-------
drivers/iommu/io-pgfault.c | 59 +++++++++++++++++++++-----------------
drivers/iommu/iommu-sva.c | 3 +-
drivers/iommu/iommu.c | 39 -------------------------
4 files changed, 41 insertions(+), 78 deletions(-)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 823fa3bcc2c6..4067ebdd6232 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -127,7 +127,7 @@ struct iopf_group {
/* list node for iommu_fault_param::faults */
struct list_head pending_node;
struct work_struct work;
- struct iommu_domain *domain;
+ struct iommu_attach_handle *attach_handle;
/* The device's fault data parameter. */
struct iommu_fault_param *fault_param;
};
@@ -249,6 +249,12 @@ enum iommu_cap {
*/
IOMMU_CAP_DEFERRED_FLUSH,
IOMMU_CAP_DIRTY_TRACKING, /* IOMMU supports dirty tracking */
+ /*
+ * IOMMU driver supports user-managed IOASID table. There is no
+ * user domain for each PASID and the I/O page faults are forwarded
+ * through the user domain attached to the device RID.
+ */
+ IOMMU_CAP_USER_IOASID_TABLE,
};

/* These are the possible reserved region types */
@@ -1064,9 +1070,6 @@ int iommu_attach_device_pasid(struct iommu_domain *domain,
struct iommu_attach_handle *handle);
void iommu_detach_device_pasid(struct iommu_domain *domain,
struct device *dev, ioasid_t pasid);
-struct iommu_domain *
-iommu_get_domain_for_dev_pasid(struct device *dev, ioasid_t pasid,
- unsigned int type);
ioasid_t iommu_alloc_global_pasid(struct device *dev);
void iommu_free_global_pasid(ioasid_t pasid);
#else /* CONFIG_IOMMU_API */
@@ -1408,13 +1411,6 @@ static inline void iommu_detach_device_pasid(struct iommu_domain *domain,
{
}

-static inline struct iommu_domain *
-iommu_get_domain_for_dev_pasid(struct device *dev, ioasid_t pasid,
- unsigned int type)
-{
- return NULL;
-}
-
static inline ioasid_t iommu_alloc_global_pasid(struct device *dev)
{
return IOMMU_PASID_INVALID;
diff --git a/drivers/iommu/io-pgfault.c b/drivers/iommu/io-pgfault.c
index 06d78fcc79fd..c62fcb67ef02 100644
--- a/drivers/iommu/io-pgfault.c
+++ b/drivers/iommu/io-pgfault.c
@@ -59,30 +59,6 @@ void iopf_free_group(struct iopf_group *group)
}
EXPORT_SYMBOL_GPL(iopf_free_group);

-static struct iommu_domain *get_domain_for_iopf(struct device *dev,
- struct iommu_fault *fault)
-{
- struct iommu_domain *domain;
-
- if (fault->prm.flags & IOMMU_FAULT_PAGE_REQUEST_PASID_VALID) {
- domain = iommu_get_domain_for_dev_pasid(dev, fault->prm.pasid, 0);
- if (IS_ERR(domain))
- domain = NULL;
- } else {
- domain = iommu_get_domain_for_dev(dev);
- }
-
- if (!domain || !domain->iopf_handler) {
- dev_warn_ratelimited(dev,
- "iopf (pasid %d) without domain attached or handler installed\n",
- fault->prm.pasid);
-
- return NULL;
- }
-
- return domain;
-}
-
/* Non-last request of a group. Postpone until the last one. */
static int report_partial_fault(struct iommu_fault_param *fault_param,
struct iommu_fault *fault)
@@ -206,20 +182,49 @@ void iommu_report_device_fault(struct device *dev, struct iopf_fault *evt)
if (group == &abort_group)
goto err_abort;

- group->domain = get_domain_for_iopf(dev, fault);
- if (!group->domain)
+ if (fault->prm.flags & IOMMU_FAULT_PAGE_REQUEST_PASID_VALID) {
+ group->attach_handle = iommu_attach_handle_get(dev->iommu_group,
+ fault->prm.pasid,
+ 0);
+ if (IS_ERR(group->attach_handle)) {
+ if (!device_iommu_capable(dev, IOMMU_CAP_USER_IOASID_TABLE))
+ goto err_abort;
+
+ /*
+ * The iommu driver for this device supports user-
+ * managed PASID table. Therefore page faults for
+ * any PASID should go through the NESTING domain
+ * attached to the device RID.
+ */
+ group->attach_handle =
+ iommu_attach_handle_get(dev->iommu_group,
+ IOMMU_NO_PASID,
+ IOMMU_DOMAIN_NESTED);
+ if (IS_ERR(group->attach_handle))
+ goto err_abort;
+ }
+ } else {
+ group->attach_handle =
+ iommu_attach_handle_get(dev->iommu_group, IOMMU_NO_PASID, 0);
+ if (IS_ERR(group->attach_handle))
+ goto err_abort;
+ }
+
+ if (!group->attach_handle->domain->iopf_handler)
goto err_abort;

/*
* On success iopf_handler must call iopf_group_response() and
* iopf_free_group()
*/
- if (group->domain->iopf_handler(group))
+ if (group->attach_handle->domain->iopf_handler(group))
goto err_abort;

return;

err_abort:
+ dev_warn_ratelimited(dev, "iopf with pasid %d aborted\n",
+ fault->prm.pasid);
iopf_group_response(group, IOMMU_PAGE_RESP_FAILURE);
if (group == &abort_group)
__iopf_free_group(group);
diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
index e85f4ccc9dcc..36d2862941de 100644
--- a/drivers/iommu/iommu-sva.c
+++ b/drivers/iommu/iommu-sva.c
@@ -265,7 +265,8 @@ static void iommu_sva_handle_iopf(struct work_struct *work)
if (status != IOMMU_PAGE_RESP_SUCCESS)
break;

- status = iommu_sva_handle_mm(&iopf->fault, group->domain->mm);
+ status = iommu_sva_handle_mm(&iopf->fault,
+ group->attach_handle->domain->mm);
}

iopf_group_response(group, status);
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 0263814cba6b..c506185a2fad 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -3418,45 +3418,6 @@ void iommu_detach_device_pasid(struct iommu_domain *domain, struct device *dev,
}
EXPORT_SYMBOL_GPL(iommu_detach_device_pasid);

-/*
- * iommu_get_domain_for_dev_pasid() - Retrieve domain for @pasid of @dev
- * @dev: the queried device
- * @pasid: the pasid of the device
- * @type: matched domain type, 0 for any match
- *
- * This is a variant of iommu_get_domain_for_dev(). It returns the existing
- * domain attached to pasid of a device. Callers must hold a lock around this
- * function, and both iommu_attach/detach_dev_pasid() whenever a domain of
- * type is being manipulated. This API does not internally resolve races with
- * attach/detach.
- *
- * Return: attached domain on success, NULL otherwise.
- */
-struct iommu_domain *iommu_get_domain_for_dev_pasid(struct device *dev,
- ioasid_t pasid,
- unsigned int type)
-{
- /* Caller must be a probed driver on dev */
- struct iommu_group *group = dev->iommu_group;
- struct iommu_attach_handle *handle;
- struct iommu_domain *domain = NULL;
-
- if (!group)
- return NULL;
-
- xa_lock(&group->pasid_array);
- handle = xa_load(&group->pasid_array, pasid);
- if (handle)
- domain = handle->domain;
-
- if (type && domain && domain->type != type)
- domain = NULL;
- xa_unlock(&group->pasid_array);
-
- return domain;
-}
-EXPORT_SYMBOL_GPL(iommu_get_domain_for_dev_pasid);
-
ioasid_t iommu_alloc_global_pasid(struct device *dev)
{
int ret;
--
2.34.1


2024-05-27 04:09:12

by Lu Baolu

Subject: [PATCH v6 06/10] iommufd: Add iommufd fault object

An iommufd fault object provides an interface for delivering I/O page
faults to user space. These objects are created and destroyed by user
space, and they can be associated with or dissociated from hardware page
table objects during page table allocation or destruction.

User space interacts with the fault object through a file interface. This
interface offers a straightforward and efficient way for user space to
handle page faults. It allows user space to read fault messages
sequentially and respond to them by writing to the same file. The file
interface supports reading messages in poll mode, so it's recommended that
user space applications use io_uring to enhance read and write efficiency.
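
For example, a minimal (and purely illustrative) poll-based loop over the
fault FD might look as follows; handle_pending_faults() is a hypothetical
helper that performs the read()/write() message exchange described above:

  #include <poll.h>

  struct pollfd pfd = { .fd = fault_fd, .events = POLLIN };

  /* Block until at least one fault group is deliverable, then drain it. */
  while (poll(&pfd, 1, -1) > 0) {
          if (pfd.revents & POLLIN)
                  handle_pending_faults(fault_fd);
  }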

A fault object can be associated with any iopf-capable iommufd_hw_pgtable
during the pgtable's allocation. All I/O page faults triggered by devices
when accessing the I/O addresses of an iommufd_hw_pgtable are routed
through the fault object to user space. Similarly, user space's responses
to these page faults are routed back to the iommu device driver through
the same fault object.

Signed-off-by: Lu Baolu <[email protected]>
---
include/linux/iommu.h | 4 +
drivers/iommu/iommufd/iommufd_private.h | 30 ++++
include/uapi/linux/iommufd.h | 18 ++
drivers/iommu/io-pgfault.c | 2 +
drivers/iommu/iommufd/fault.c | 227 ++++++++++++++++++++++++
drivers/iommu/iommufd/main.c | 6 +
drivers/iommu/iommufd/Makefile | 1 +
7 files changed, 288 insertions(+)
create mode 100644 drivers/iommu/iommufd/fault.c

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 4067ebdd6232..16b3a2da91ef 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -124,12 +124,16 @@ struct iopf_fault {
struct iopf_group {
struct iopf_fault last_fault;
struct list_head faults;
+ size_t fault_count;
/* list node for iommu_fault_param::faults */
struct list_head pending_node;
struct work_struct work;
struct iommu_attach_handle *attach_handle;
/* The device's fault data parameter. */
struct iommu_fault_param *fault_param;
+ /* Used by handler provider to hook the group on its own lists. */
+ struct list_head node;
+ u32 cookie;
};

/**
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 991f864d1f9b..c8a4519f1405 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -128,6 +128,7 @@ enum iommufd_object_type {
IOMMUFD_OBJ_HWPT_NESTED,
IOMMUFD_OBJ_IOAS,
IOMMUFD_OBJ_ACCESS,
+ IOMMUFD_OBJ_FAULT,
#ifdef CONFIG_IOMMUFD_TEST
IOMMUFD_OBJ_SELFTEST,
#endif
@@ -426,6 +427,35 @@ void iopt_remove_access(struct io_pagetable *iopt,
u32 iopt_access_list_id);
void iommufd_access_destroy_object(struct iommufd_object *obj);

+/*
+ * An iommufd_fault object represents an interface to deliver I/O page faults
+ * to the user space. These objects are created/destroyed by the user space and
+ * associated with hardware page table objects during page-table allocation.
+ */
+struct iommufd_fault {
+ struct iommufd_object obj;
+ struct iommufd_ctx *ictx;
+ struct file *filep;
+
+ /* The lists of outstanding faults protected by below mutex. */
+ struct mutex mutex;
+ struct list_head deliver;
+ struct xarray response;
+
+ struct wait_queue_head wait_queue;
+};
+
+struct iommufd_attach_handle {
+ struct iommu_attach_handle handle;
+ struct iommufd_device *idev;
+};
+
+/* Convert an iommu attach handle to iommufd handle. */
+#define to_iommufd_handle(hdl) container_of(hdl, struct iommufd_attach_handle, handle)
+
+int iommufd_fault_alloc(struct iommufd_ucmd *ucmd);
+void iommufd_fault_destroy(struct iommufd_object *obj);
+
#ifdef CONFIG_IOMMUFD_TEST
int iommufd_test(struct iommufd_ucmd *ucmd);
void iommufd_selftest_destroy(struct iommufd_object *obj);
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index 2f34d66436fb..eba452d4344e 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -50,6 +50,7 @@ enum {
IOMMUFD_CMD_HWPT_SET_DIRTY_TRACKING,
IOMMUFD_CMD_HWPT_GET_DIRTY_BITMAP,
IOMMUFD_CMD_HWPT_INVALIDATE,
+ IOMMUFD_CMD_FAULT_QUEUE_ALLOC,
};

/**
@@ -788,4 +789,21 @@ struct iommu_hwpt_page_response {
__u32 cookie;
__u32 reserved;
};
+
+/**
+ * struct iommu_fault_alloc - ioctl(IOMMU_FAULT_QUEUE_ALLOC)
+ * @size: sizeof(struct iommu_fault_alloc)
+ * @flags: Must be 0
+ * @out_fault_id: The ID of the new FAULT
+ * @out_fault_fd: The fd of the new FAULT
+ *
+ * Explicitly allocate a fault handling object.
+ */
+struct iommu_fault_alloc {
+ __u32 size;
+ __u32 flags;
+ __u32 out_fault_id;
+ __u32 out_fault_fd;
+};
+#define IOMMU_FAULT_QUEUE_ALLOC _IO(IOMMUFD_TYPE, IOMMUFD_CMD_FAULT_QUEUE_ALLOC)
#endif
diff --git a/drivers/iommu/io-pgfault.c b/drivers/iommu/io-pgfault.c
index c62fcb67ef02..a629d8a93614 100644
--- a/drivers/iommu/io-pgfault.c
+++ b/drivers/iommu/io-pgfault.c
@@ -110,6 +110,8 @@ static struct iopf_group *iopf_group_alloc(struct iommu_fault_param *iopf_param,
list_add(&group->pending_node, &iopf_param->faults);
mutex_unlock(&iopf_param->lock);

+ group->fault_count = list_count_nodes(&group->faults);
+
return group;
}

diff --git a/drivers/iommu/iommufd/fault.c b/drivers/iommu/iommufd/fault.c
new file mode 100644
index 000000000000..d0dafe761075
--- /dev/null
+++ b/drivers/iommu/iommufd/fault.c
@@ -0,0 +1,227 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright (C) 2024 Intel Corporation
+ */
+#define pr_fmt(fmt) "iommufd: " fmt
+
+#include <linux/file.h>
+#include <linux/fs.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/iommufd.h>
+#include <linux/poll.h>
+#include <linux/anon_inodes.h>
+#include <uapi/linux/iommufd.h>
+
+#include "../iommu-priv.h"
+#include "iommufd_private.h"
+
+void iommufd_fault_destroy(struct iommufd_object *obj)
+{
+ struct iommufd_fault *fault = container_of(obj, struct iommufd_fault, obj);
+ struct iopf_group *group, *next;
+
+ /*
+ * The iommufd object's reference count is zero at this point.
+ * We can be confident that no other threads are currently
+ * accessing this pointer. Therefore, acquiring the mutex here
+ * is unnecessary.
+ */
+ list_for_each_entry_safe(group, next, &fault->deliver, node) {
+ list_del(&group->node);
+ iopf_group_response(group, IOMMU_PAGE_RESP_INVALID);
+ iopf_free_group(group);
+ }
+}
+
+static void iommufd_compose_fault_message(struct iommu_fault *fault,
+ struct iommu_hwpt_pgfault *hwpt_fault,
+ struct iommufd_device *idev,
+ u32 cookie)
+{
+ hwpt_fault->size = sizeof(*hwpt_fault);
+ hwpt_fault->flags = fault->prm.flags;
+ hwpt_fault->dev_id = idev->obj.id;
+ hwpt_fault->pasid = fault->prm.pasid;
+ hwpt_fault->grpid = fault->prm.grpid;
+ hwpt_fault->perm = fault->prm.perm;
+ hwpt_fault->addr = fault->prm.addr;
+ hwpt_fault->length = 0;
+ hwpt_fault->cookie = cookie;
+}
+
+static ssize_t iommufd_fault_fops_read(struct file *filep, char __user *buf,
+ size_t count, loff_t *ppos)
+{
+ size_t fault_size = sizeof(struct iommu_hwpt_pgfault);
+ struct iommufd_fault *fault = filep->private_data;
+ struct iommu_hwpt_pgfault data;
+ struct iommufd_device *idev;
+ struct iopf_group *group;
+ struct iopf_fault *iopf;
+ size_t done = 0;
+ int rc = 0;
+
+ if (*ppos || count % fault_size)
+ return -ESPIPE;
+
+ mutex_lock(&fault->mutex);
+ while (!list_empty(&fault->deliver) && count > done) {
+ group = list_first_entry(&fault->deliver,
+ struct iopf_group, node);
+
+ if (group->fault_count * fault_size > count - done)
+ break;
+
+ rc = xa_alloc(&fault->response, &group->cookie, group,
+ xa_limit_32b, GFP_KERNEL);
+ if (rc)
+ break;
+
+ idev = to_iommufd_handle(group->attach_handle)->idev;
+ list_for_each_entry(iopf, &group->faults, list) {
+ iommufd_compose_fault_message(&iopf->fault,
+ &data, idev,
+ group->cookie);
+ rc = copy_to_user(buf + done, &data, fault_size);
+ if (rc) {
+ xa_erase(&fault->response, group->cookie);
+ break;
+ }
+ done += fault_size;
+ }
+
+ list_del(&group->node);
+ }
+ mutex_unlock(&fault->mutex);
+
+ return done == 0 ? rc : done;
+}
+
+static ssize_t iommufd_fault_fops_write(struct file *filep, const char __user *buf,
+ size_t count, loff_t *ppos)
+{
+ size_t response_size = sizeof(struct iommu_hwpt_page_response);
+ struct iommufd_fault *fault = filep->private_data;
+ struct iommu_hwpt_page_response response;
+ struct iopf_group *group;
+ size_t done = 0;
+ int rc = 0;
+
+ if (*ppos || count % response_size)
+ return -ESPIPE;
+
+ mutex_lock(&fault->mutex);
+ while (count > done) {
+ rc = copy_from_user(&response, buf + done, response_size);
+ if (rc)
+ break;
+
+ group = xa_erase(&fault->response, response.cookie);
+ if (!group) {
+ rc = -EINVAL;
+ break;
+ }
+
+ iopf_group_response(group, response.code);
+ iopf_free_group(group);
+ done += response_size;
+ }
+ mutex_unlock(&fault->mutex);
+
+ return done == 0 ? rc : done;
+}
+
+static __poll_t iommufd_fault_fops_poll(struct file *filep,
+ struct poll_table_struct *wait)
+{
+ struct iommufd_fault *fault = filep->private_data;
+ __poll_t pollflags = EPOLLOUT;
+
+ poll_wait(filep, &fault->wait_queue, wait);
+ mutex_lock(&fault->mutex);
+ if (!list_empty(&fault->deliver))
+ pollflags |= EPOLLIN | EPOLLRDNORM;
+ mutex_unlock(&fault->mutex);
+
+ return pollflags;
+}
+
+static int iommufd_fault_fops_release(struct inode *inode, struct file *filep)
+{
+ struct iommufd_fault *fault = filep->private_data;
+
+ iommufd_ctx_put(fault->ictx);
+ refcount_dec(&fault->obj.users);
+ return 0;
+}
+
+static const struct file_operations iommufd_fault_fops = {
+ .owner = THIS_MODULE,
+ .open = nonseekable_open,
+ .read = iommufd_fault_fops_read,
+ .write = iommufd_fault_fops_write,
+ .poll = iommufd_fault_fops_poll,
+ .release = iommufd_fault_fops_release,
+ .llseek = no_llseek,
+};
+
+int iommufd_fault_alloc(struct iommufd_ucmd *ucmd)
+{
+ struct iommu_fault_alloc *cmd = ucmd->cmd;
+ struct iommufd_fault *fault;
+ struct file *filep;
+ int fdno;
+ int rc;
+
+ if (cmd->flags)
+ return -EOPNOTSUPP;
+
+ fault = iommufd_object_alloc(ucmd->ictx, fault, IOMMUFD_OBJ_FAULT);
+ if (IS_ERR(fault))
+ return PTR_ERR(fault);
+
+ fault->ictx = ucmd->ictx;
+ INIT_LIST_HEAD(&fault->deliver);
+ xa_init_flags(&fault->response, XA_FLAGS_ALLOC1);
+ mutex_init(&fault->mutex);
+ init_waitqueue_head(&fault->wait_queue);
+
+ filep = anon_inode_getfile("[iommufd-pgfault]", &iommufd_fault_fops,
+ fault, O_RDWR);
+ if (IS_ERR(filep)) {
+ rc = PTR_ERR(filep);
+ goto out_abort;
+ }
+
+ refcount_inc(&fault->obj.users);
+ iommufd_ctx_get(fault->ictx);
+ fault->filep = filep;
+
+ fdno = get_unused_fd_flags(O_CLOEXEC);
+ if (fdno < 0) {
+ rc = fdno;
+ goto out_fput;
+ }
+
+ cmd->out_fault_id = fault->obj.id;
+ cmd->out_fault_fd = fdno;
+
+ rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
+ if (rc)
+ goto out_put_fdno;
+ iommufd_object_finalize(ucmd->ictx, &fault->obj);
+
+ fd_install(fdno, fault->filep);
+
+ return 0;
+out_put_fdno:
+ put_unused_fd(fdno);
+out_fput:
+ fput(filep);
+ refcount_dec(&fault->obj.users);
+ iommufd_ctx_put(fault->ictx);
+out_abort:
+ iommufd_object_abort_and_destroy(ucmd->ictx, &fault->obj);
+
+ return rc;
+}
diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
index 39b32932c61e..961b2949c06f 100644
--- a/drivers/iommu/iommufd/main.c
+++ b/drivers/iommu/iommufd/main.c
@@ -332,6 +332,7 @@ union ucmd_buffer {
struct iommu_ioas_unmap unmap;
struct iommu_option option;
struct iommu_vfio_ioas vfio_ioas;
+ struct iommu_fault_alloc fault;
#ifdef CONFIG_IOMMUFD_TEST
struct iommu_test_cmd test;
#endif
@@ -381,6 +382,8 @@ static const struct iommufd_ioctl_op iommufd_ioctl_ops[] = {
val64),
IOCTL_OP(IOMMU_VFIO_IOAS, iommufd_vfio_ioas, struct iommu_vfio_ioas,
__reserved),
+ IOCTL_OP(IOMMU_FAULT_QUEUE_ALLOC, iommufd_fault_alloc, struct iommu_fault_alloc,
+ out_fault_fd),
#ifdef CONFIG_IOMMUFD_TEST
IOCTL_OP(IOMMU_TEST_CMD, iommufd_test, struct iommu_test_cmd, last),
#endif
@@ -513,6 +516,9 @@ static const struct iommufd_object_ops iommufd_object_ops[] = {
.destroy = iommufd_hwpt_nested_destroy,
.abort = iommufd_hwpt_nested_abort,
},
+ [IOMMUFD_OBJ_FAULT] = {
+ .destroy = iommufd_fault_destroy,
+ },
#ifdef CONFIG_IOMMUFD_TEST
[IOMMUFD_OBJ_SELFTEST] = {
.destroy = iommufd_selftest_destroy,
diff --git a/drivers/iommu/iommufd/Makefile b/drivers/iommu/iommufd/Makefile
index 34b446146961..b94a74366eed 100644
--- a/drivers/iommu/iommufd/Makefile
+++ b/drivers/iommu/iommufd/Makefile
@@ -6,6 +6,7 @@ iommufd-y := \
ioas.o \
main.o \
pages.o \
+ fault.o \
vfio_compat.o

iommufd-$(CONFIG_IOMMUFD_TEST) += selftest.o
--
2.34.1


2024-05-27 04:09:26

by Lu Baolu

Subject: [PATCH v6 05/10] iommufd: Add fault and response message definitions

iommu_hwpt_pgfaults represent fault messages that user space can
retrieve. Multiple iommu_hwpt_pgfaults might be put in an iopf group,
with the IOMMU_PGFAULT_FLAGS_LAST_PAGE flag set only for the last
iommu_hwpt_pgfault.

An iommu_hwpt_page_response is a response message that user space
should send to the kernel after it finishes handling a group of fault
messages. The @dev_id, @pasid, and @grpid fields in the message
identify an outstanding iopf group for a device. The @cookie field,
which matches the cookie field of the last fault in the group, will
be used by the kernel to look up the pending message.
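
A rough user space sketch of these semantics (assuming fault_fd comes from
the fault object introduced later in this series; error handling omitted):

  struct iommu_hwpt_pgfault msgs[32];
  struct iommu_hwpt_page_response resp = {};
  ssize_t bytes = read(fault_fd, msgs, sizeof(msgs));

  for (int i = 0; i < bytes / (ssize_t)sizeof(msgs[0]); i++) {
          /* ... service msgs[i].addr with msgs[i].perm permissions ... */
          if (!(msgs[i].flags & IOMMU_PGFAULT_FLAGS_LAST_PAGE))
                  continue;
          /* One response per group, echoing the last fault's cookie. */
          resp.cookie = msgs[i].cookie;
          resp.code = IOMMUFD_PAGE_RESP_SUCCESS;
          write(fault_fd, &resp, sizeof(resp));
  }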

Signed-off-by: Lu Baolu <[email protected]>
---
include/uapi/linux/iommufd.h | 96 ++++++++++++++++++++++++++++++++++++
1 file changed, 96 insertions(+)

diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index 1dfeaa2e649e..2f34d66436fb 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -692,4 +692,100 @@ struct iommu_hwpt_invalidate {
__u32 __reserved;
};
#define IOMMU_HWPT_INVALIDATE _IO(IOMMUFD_TYPE, IOMMUFD_CMD_HWPT_INVALIDATE)
+
+/**
+ * enum iommu_hwpt_pgfault_flags - flags for struct iommu_hwpt_pgfault
+ * @IOMMU_PGFAULT_FLAGS_PASID_VALID: The pasid field of the fault data is
+ * valid.
+ * @IOMMU_PGFAULT_FLAGS_LAST_PAGE: It's the last fault of a fault group.
+ */
+enum iommu_hwpt_pgfault_flags {
+ IOMMU_PGFAULT_FLAGS_PASID_VALID = (1 << 0),
+ IOMMU_PGFAULT_FLAGS_LAST_PAGE = (1 << 1),
+};
+
+/**
+ * enum iommu_hwpt_pgfault_perm - perm bits for struct iommu_hwpt_pgfault
+ * @IOMMU_PGFAULT_PERM_READ: request for read permission
+ * @IOMMU_PGFAULT_PERM_WRITE: request for write permission
+ * @IOMMU_PGFAULT_PERM_EXEC: (PCIE 10.4.1) request with a PASID that has the
+ * Execute Requested bit set in PASID TLP Prefix.
+ * @IOMMU_PGFAULT_PERM_PRIV: (PCIE 10.4.1) request with a PASID that has the
+ * Privileged Mode Requested bit set in PASID TLP
+ * Prefix.
+ */
+enum iommu_hwpt_pgfault_perm {
+ IOMMU_PGFAULT_PERM_READ = (1 << 0),
+ IOMMU_PGFAULT_PERM_WRITE = (1 << 1),
+ IOMMU_PGFAULT_PERM_EXEC = (1 << 2),
+ IOMMU_PGFAULT_PERM_PRIV = (1 << 3),
+};
+
+/**
+ * struct iommu_hwpt_pgfault - iommu page fault data
+ * @size: sizeof(struct iommu_hwpt_pgfault)
+ * @flags: Combination of enum iommu_hwpt_pgfault_flags
+ * @dev_id: id of the originated device
+ * @pasid: Process Address Space ID
+ * @grpid: Page Request Group Index
+ * @perm: Combination of enum iommu_hwpt_pgfault_perm
+ * @addr: Fault address
+ * @length: a hint of how much data the requestor is expecting to fetch. For
+ * example, if the PRI initiator knows it is going to do a 10MB
+ * transfer, it could fill in 10MB and the OS could pre-fault in
+ * 10MB of IOVA. It defaults to 0 if there's no such hint.
+ * @cookie: kernel-managed cookie identifying a group of fault messages. The
+ * cookie number encoded in the last page fault of the group should
+ * be echoed back in the response message.
+ */
+struct iommu_hwpt_pgfault {
+ __u32 size;
+ __u32 flags;
+ __u32 dev_id;
+ __u32 pasid;
+ __u32 grpid;
+ __u32 perm;
+ __u64 addr;
+ __u32 length;
+ __u32 cookie;
+};
+
+/**
+ * enum iommufd_page_response_code - Return status of fault handlers
+ * @IOMMUFD_PAGE_RESP_SUCCESS: Fault has been handled and the page tables
+ * populated, retry the access. This is the
+ * "Success" defined in PCI 10.4.2.1.
+ * @IOMMUFD_PAGE_RESP_INVALID: Could not handle this fault, don't retry the
+ * access. This is the "Invalid Request" in PCI
+ * 10.4.2.1.
+ * @IOMMUFD_PAGE_RESP_FAILURE: General error. Drop all subsequent faults from
+ * this device if possible. This is the "Response
+ * Failure" in PCI 10.4.2.1.
+ */
+enum iommufd_page_response_code {
+ IOMMUFD_PAGE_RESP_SUCCESS = 0,
+ IOMMUFD_PAGE_RESP_INVALID,
+ IOMMUFD_PAGE_RESP_FAILURE,
+};
+
+/**
+ * struct iommu_hwpt_page_response - IOMMU page fault response
+ * @size: sizeof(struct iommu_hwpt_page_response)
+ * @flags: Must be set to 0
+ * @dev_id: device ID of target device for the response
+ * @pasid: Process Address Space ID
+ * @grpid: Page Request Group Index
+ * @code: One of response code in enum iommufd_page_response_code.
+ * @cookie: The kernel-managed cookie reported in the fault message.
+ */
+struct iommu_hwpt_page_response {
+ __u32 size;
+ __u32 flags;
+ __u32 dev_id;
+ __u32 pasid;
+ __u32 grpid;
+ __u32 code;
+ __u32 cookie;
+ __u32 reserved;
+};
#endif
--
2.34.1


2024-05-27 04:09:37

by Lu Baolu

Subject: [PATCH v6 07/10] iommufd: Fault-capable hwpt attach/detach/replace

Add iopf-capable hw page table attach/detach/replace helpers. The pointer
to iommufd_device is stored in the domain attachment handle, so that it
can be echoed back in the iopf_group.

The iopf-capable hw page tables can only be attached to devices that
support the IOMMU_DEV_FEAT_IOPF feature. On the first attachment of an
iopf-capable hw_pagetable to the device, the IOPF feature is enabled on
the device. Similarly, after the last iopf-capable hwpt is detached from
the device, the IOPF feature is disabled on the device.

The current implementation allows a replacement between iopf-capable and
non-iopf-capable hw page tables. This matches the nested translation use
case, where a parent domain is attached by default and can then be
replaced with a nested user domain with iopf support.

Signed-off-by: Lu Baolu <[email protected]>
---
drivers/iommu/iommufd/iommufd_private.h | 12 ++
drivers/iommu/iommufd/device.c | 16 +-
drivers/iommu/iommufd/fault.c | 191 ++++++++++++++++++++++++
3 files changed, 216 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index c8a4519f1405..ba89c86e1af7 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -293,6 +293,7 @@ int iommufd_check_iova_range(struct io_pagetable *iopt,
struct iommufd_hw_pagetable {
struct iommufd_object obj;
struct iommu_domain *domain;
+ struct iommufd_fault *fault;
};

struct iommufd_hwpt_paging {
@@ -396,6 +397,9 @@ struct iommufd_device {
/* always the physical device */
struct device *dev;
bool enforce_cache_coherency;
+ /* protect iopf_enabled counter */
+ struct mutex iopf_lock;
+ unsigned int iopf_enabled;
};

static inline struct iommufd_device *
@@ -456,6 +460,14 @@ struct iommufd_attach_handle {
int iommufd_fault_alloc(struct iommufd_ucmd *ucmd);
void iommufd_fault_destroy(struct iommufd_object *obj);

+int iommufd_fault_domain_attach_dev(struct iommufd_hw_pagetable *hwpt,
+ struct iommufd_device *idev);
+void iommufd_fault_domain_detach_dev(struct iommufd_hw_pagetable *hwpt,
+ struct iommufd_device *idev);
+int iommufd_fault_domain_replace_dev(struct iommufd_device *idev,
+ struct iommufd_hw_pagetable *hwpt,
+ struct iommufd_hw_pagetable *old);
+
#ifdef CONFIG_IOMMUFD_TEST
int iommufd_test(struct iommufd_ucmd *ucmd);
void iommufd_selftest_destroy(struct iommufd_object *obj);
diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index 873630c111c1..63681d79b72d 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -215,6 +215,7 @@ struct iommufd_device *iommufd_device_bind(struct iommufd_ctx *ictx,
refcount_inc(&idev->obj.users);
/* igroup refcount moves into iommufd_device */
idev->igroup = igroup;
+ mutex_init(&idev->iopf_lock);

/*
* If the caller fails after this success it must call
@@ -376,7 +377,10 @@ int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt,
* attachment.
*/
if (list_empty(&idev->igroup->device_list)) {
- rc = iommu_attach_group(hwpt->domain, idev->igroup->group);
+ if (hwpt->fault)
+ rc = iommufd_fault_domain_attach_dev(hwpt, idev);
+ else
+ rc = iommu_attach_group(hwpt->domain, idev->igroup->group);
if (rc)
goto err_unresv;
idev->igroup->hwpt = hwpt;
@@ -402,7 +406,10 @@ iommufd_hw_pagetable_detach(struct iommufd_device *idev)
mutex_lock(&idev->igroup->lock);
list_del(&idev->group_item);
if (list_empty(&idev->igroup->device_list)) {
- iommu_detach_group(hwpt->domain, idev->igroup->group);
+ if (hwpt->fault)
+ iommufd_fault_domain_detach_dev(hwpt, idev);
+ else
+ iommu_detach_group(hwpt->domain, idev->igroup->group);
idev->igroup->hwpt = NULL;
}
if (hwpt_is_paging(hwpt))
@@ -497,7 +504,10 @@ iommufd_device_do_replace(struct iommufd_device *idev,
goto err_unlock;
}

- rc = iommu_group_replace_domain(igroup->group, hwpt->domain);
+ if (old_hwpt->fault || hwpt->fault)
+ rc = iommufd_fault_domain_replace_dev(idev, hwpt, old_hwpt);
+ else
+ rc = iommu_group_replace_domain(igroup->group, hwpt->domain);
if (rc)
goto err_unresv;

diff --git a/drivers/iommu/iommufd/fault.c b/drivers/iommu/iommufd/fault.c
index d0dafe761075..94dde1f57cfc 100644
--- a/drivers/iommu/iommufd/fault.c
+++ b/drivers/iommu/iommufd/fault.c
@@ -8,6 +8,7 @@
#include <linux/module.h>
#include <linux/mutex.h>
#include <linux/iommufd.h>
+#include <linux/pci.h>
#include <linux/poll.h>
#include <linux/anon_inodes.h>
#include <uapi/linux/iommufd.h>
@@ -15,6 +16,196 @@
#include "../iommu-priv.h"
#include "iommufd_private.h"

+static int iommufd_fault_iopf_enable(struct iommufd_device *idev)
+{
+ struct device *dev = idev->dev;
+ int ret;
+
+ /*
+ * Once we turn on PCI/PRI support for VF, the response failure code
+ * could not be forwarded to the hardware due to PRI being a shared
+ * resource between PF and VFs. There is no coordination for this
+ * shared capability. This waits for a vPRI reset to recover.
+ */
+ if (dev_is_pci(dev) && to_pci_dev(dev)->is_virtfn)
+ return -EINVAL;
+
+ mutex_lock(&idev->iopf_lock);
+ /* Device iopf has already been on. */
+ if (++idev->iopf_enabled > 1) {
+ mutex_unlock(&idev->iopf_lock);
+ return 0;
+ }
+
+ ret = iommu_dev_enable_feature(dev, IOMMU_DEV_FEAT_IOPF);
+ if (ret)
+ --idev->iopf_enabled;
+ mutex_unlock(&idev->iopf_lock);
+
+ return ret;
+}
+
+static void iommufd_fault_iopf_disable(struct iommufd_device *idev)
+{
+ mutex_lock(&idev->iopf_lock);
+ if (!WARN_ON(idev->iopf_enabled == 0)) {
+ if (--idev->iopf_enabled == 0)
+ iommu_dev_disable_feature(idev->dev, IOMMU_DEV_FEAT_IOPF);
+ }
+ mutex_unlock(&idev->iopf_lock);
+}
+
+static int __fault_domain_attach_dev(struct iommufd_hw_pagetable *hwpt,
+ struct iommufd_device *idev)
+{
+ struct iommufd_attach_handle *handle;
+ int ret;
+
+ handle = kzalloc(sizeof(*handle), GFP_KERNEL);
+ if (!handle)
+ return -ENOMEM;
+
+ handle->handle.domain = hwpt->domain;
+ handle->idev = idev;
+ ret = iommu_attach_group_handle(hwpt->domain, idev->igroup->group,
+ &handle->handle);
+ if (ret)
+ kfree(handle);
+
+ return ret;
+}
+
+int iommufd_fault_domain_attach_dev(struct iommufd_hw_pagetable *hwpt,
+ struct iommufd_device *idev)
+{
+ int ret;
+
+ if (!hwpt->fault)
+ return -EINVAL;
+
+ ret = iommufd_fault_iopf_enable(idev);
+ if (ret)
+ return ret;
+
+ ret = __fault_domain_attach_dev(hwpt, idev);
+ if (ret)
+ iommufd_fault_iopf_disable(idev);
+
+ return ret;
+}
+
+static void iommufd_auto_response_faults(struct iommufd_hw_pagetable *hwpt,
+ struct iommufd_attach_handle *handle)
+{
+ struct iommufd_fault *fault = hwpt->fault;
+ struct iopf_group *group, *next;
+ unsigned long index;
+
+ if (!fault)
+ return;
+
+ mutex_lock(&fault->mutex);
+ list_for_each_entry_safe(group, next, &fault->deliver, node) {
+ if (group->attach_handle != &handle->handle)
+ continue;
+ list_del(&group->node);
+ iopf_group_response(group, IOMMU_PAGE_RESP_INVALID);
+ iopf_free_group(group);
+ }
+
+ xa_for_each(&fault->response, index, group) {
+ if (group->attach_handle != &handle->handle)
+ continue;
+ xa_erase(&fault->response, index);
+ iopf_group_response(group, IOMMU_PAGE_RESP_INVALID);
+ iopf_free_group(group);
+ }
+ mutex_unlock(&fault->mutex);
+}
+
+static struct iommufd_attach_handle *
+iommufd_device_get_attach_handle(struct iommufd_device *idev)
+{
+ struct iommu_attach_handle *handle;
+
+ handle = iommu_attach_handle_get(idev->igroup->group, IOMMU_NO_PASID, 0);
+ if (!handle)
+ return NULL;
+
+ return to_iommufd_handle(handle);
+}
+
+void iommufd_fault_domain_detach_dev(struct iommufd_hw_pagetable *hwpt,
+ struct iommufd_device *idev)
+{
+ struct iommufd_attach_handle *handle;
+
+ handle = iommufd_device_get_attach_handle(idev);
+ iommu_detach_group_handle(hwpt->domain, idev->igroup->group);
+ iommufd_auto_response_faults(hwpt, handle);
+ iommufd_fault_iopf_disable(idev);
+ kfree(handle);
+}
+
+static int __fault_domain_replace_dev(struct iommufd_device *idev,
+ struct iommufd_hw_pagetable *hwpt,
+ struct iommufd_hw_pagetable *old)
+{
+ struct iommufd_attach_handle *handle, *curr = NULL;
+ int ret;
+
+ if (old->fault)
+ curr = iommufd_device_get_attach_handle(idev);
+
+ if (hwpt->fault) {
+ handle = kzalloc(sizeof(*handle), GFP_KERNEL);
+ if (!handle)
+ return -ENOMEM;
+
+ handle->handle.domain = hwpt->domain;
+ handle->idev = idev;
+ ret = iommu_replace_group_handle(idev->igroup->group,
+ hwpt->domain, &handle->handle);
+ } else {
+ ret = iommu_replace_group_handle(idev->igroup->group,
+ hwpt->domain, NULL);
+ }
+
+ if (!ret && curr) {
+ iommufd_auto_response_faults(old, curr);
+ kfree(curr);
+ }
+
+ return ret;
+}
+
+int iommufd_fault_domain_replace_dev(struct iommufd_device *idev,
+ struct iommufd_hw_pagetable *hwpt,
+ struct iommufd_hw_pagetable *old)
+{
+ bool iopf_off = !hwpt->fault && old->fault;
+ bool iopf_on = hwpt->fault && !old->fault;
+ int ret;
+
+ if (iopf_on) {
+ ret = iommufd_fault_iopf_enable(idev);
+ if (ret)
+ return ret;
+ }
+
+ ret = __fault_domain_replace_dev(idev, hwpt, old);
+ if (ret) {
+ if (iopf_on)
+ iommufd_fault_iopf_disable(idev);
+ return ret;
+ }
+
+ if (iopf_off)
+ iommufd_fault_iopf_disable(idev);
+
+ return 0;
+}
+
void iommufd_fault_destroy(struct iommufd_object *obj)
{
struct iommufd_fault *fault = container_of(obj, struct iommufd_fault, obj);
--
2.34.1


2024-05-27 04:09:57

by Lu Baolu

Subject: [PATCH v6 08/10] iommufd: Associate fault object with iommufd_hw_pgtable

When allocating a user iommufd_hw_pagetable, user space is allowed to
associate a fault object with the hw_pagetable by specifying the fault
object ID in the page table allocation data and setting the
IOMMU_HWPT_FAULT_ID_VALID flag bit.

On a successful return from hwpt allocation, the user can retrieve and
respond to page faults by reading and writing the file interface of the
fault object.
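
For illustration, the association at allocation time might look roughly
like this, where fault_id is the out_fault_id returned by the fault object
allocation and the other fields are filled in as for any nested hwpt
(dev_id, parent_hwpt_id and the driver-specific data are assumed to exist):

  struct iommu_hwpt_alloc cmd = {
          .size = sizeof(cmd),
          .flags = IOMMU_HWPT_FAULT_ID_VALID,
          .dev_id = dev_id,              /* bound device */
          .pt_id = parent_hwpt_id,       /* nesting parent hwpt */
          .fault_id = fault_id,
          /* .data_type/.data_len/.data_uptr as required by the driver */
  };

  ioctl(iommufd, IOMMU_HWPT_ALLOC, &cmd);
  /* On success, cmd.out_hwpt_id refers to an iopf-capable hwpt. */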

Once a fault object has been associated with a hwpt, the hwpt is
iopf-capable, indicated by hwpt->fault being non-NULL. Attaching,
detaching, or replacing an iopf-capable hwpt on a RID or PASID differs
from the non-iopf-capable case.

Signed-off-by: Lu Baolu <[email protected]>
---
drivers/iommu/iommufd/iommufd_private.h | 9 ++++++
include/uapi/linux/iommufd.h | 8 +++++
drivers/iommu/iommufd/fault.c | 17 ++++++++++
drivers/iommu/iommufd/hw_pagetable.c | 41 +++++++++++++++++++------
4 files changed, 66 insertions(+), 9 deletions(-)

diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index ba89c86e1af7..db50881e76f6 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -457,8 +457,17 @@ struct iommufd_attach_handle {
/* Convert an iommu attach handle to iommufd handle. */
#define to_iommufd_handle(hdl) container_of(hdl, struct iommufd_attach_handle, handle)

+static inline struct iommufd_fault *
+iommufd_get_fault(struct iommufd_ucmd *ucmd, u32 id)
+{
+ return container_of(iommufd_get_object(ucmd->ictx, id,
+ IOMMUFD_OBJ_FAULT),
+ struct iommufd_fault, obj);
+}
+
int iommufd_fault_alloc(struct iommufd_ucmd *ucmd);
void iommufd_fault_destroy(struct iommufd_object *obj);
+int iommufd_fault_iopf_handler(struct iopf_group *group);

int iommufd_fault_domain_attach_dev(struct iommufd_hw_pagetable *hwpt,
struct iommufd_device *idev);
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index eba452d4344e..5391db3a7180 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -357,10 +357,13 @@ struct iommu_vfio_ioas {
* the parent HWPT in a nesting configuration.
* @IOMMU_HWPT_ALLOC_DIRTY_TRACKING: Dirty tracking support for device IOMMU is
* enforced on device attachment
+ * @IOMMU_HWPT_FAULT_ID_VALID: The fault_id field of hwpt allocation data is
+ * valid.
*/
enum iommufd_hwpt_alloc_flags {
IOMMU_HWPT_ALLOC_NEST_PARENT = 1 << 0,
IOMMU_HWPT_ALLOC_DIRTY_TRACKING = 1 << 1,
+ IOMMU_HWPT_FAULT_ID_VALID = 1 << 2,
};

/**
@@ -412,6 +415,9 @@ enum iommu_hwpt_data_type {
* @data_type: One of enum iommu_hwpt_data_type
* @data_len: Length of the type specific data
* @data_uptr: User pointer to the type specific data
+ * @fault_id: The ID of IOMMUFD_FAULT object. Valid only if flags field of
+ * IOMMU_HWPT_FAULT_ID_VALID is set.
+ * @__reserved2: Padding to 64-bit alignment. Must be 0.
*
* Explicitly allocate a hardware page table object. This is the same object
* type that is returned by iommufd_device_attach() and represents the
@@ -442,6 +448,8 @@ struct iommu_hwpt_alloc {
__u32 data_type;
__u32 data_len;
__aligned_u64 data_uptr;
+ __u32 fault_id;
+ __u32 __reserved2;
};
#define IOMMU_HWPT_ALLOC _IO(IOMMUFD_TYPE, IOMMUFD_CMD_HWPT_ALLOC)

diff --git a/drivers/iommu/iommufd/fault.c b/drivers/iommu/iommufd/fault.c
index 94dde1f57cfc..dd07e3b1b4c1 100644
--- a/drivers/iommu/iommufd/fault.c
+++ b/drivers/iommu/iommufd/fault.c
@@ -416,3 +416,20 @@ int iommufd_fault_alloc(struct iommufd_ucmd *ucmd)

return rc;
}
+
+int iommufd_fault_iopf_handler(struct iopf_group *group)
+{
+ struct iommufd_hw_pagetable *hwpt;
+ struct iommufd_fault *fault;
+
+ hwpt = group->attach_handle->domain->fault_data;
+ fault = hwpt->fault;
+
+ mutex_lock(&fault->mutex);
+ list_add_tail(&group->node, &fault->deliver);
+ mutex_unlock(&fault->mutex);
+
+ wake_up_interruptible(&fault->wait_queue);
+
+ return 0;
+}
diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
index 33d142f8057d..14a32bf80549 100644
--- a/drivers/iommu/iommufd/hw_pagetable.c
+++ b/drivers/iommu/iommufd/hw_pagetable.c
@@ -8,6 +8,15 @@
#include "../iommu-priv.h"
#include "iommufd_private.h"

+static void __iommufd_hwpt_destroy(struct iommufd_hw_pagetable *hwpt)
+{
+ if (hwpt->domain)
+ iommu_domain_free(hwpt->domain);
+
+ if (hwpt->fault)
+ iommufd_put_object(hwpt->fault->ictx, &hwpt->fault->obj);
+}
+
void iommufd_hwpt_paging_destroy(struct iommufd_object *obj)
{
struct iommufd_hwpt_paging *hwpt_paging =
@@ -22,9 +31,7 @@ void iommufd_hwpt_paging_destroy(struct iommufd_object *obj)
hwpt_paging->common.domain);
}

- if (hwpt_paging->common.domain)
- iommu_domain_free(hwpt_paging->common.domain);
-
+ __iommufd_hwpt_destroy(&hwpt_paging->common);
refcount_dec(&hwpt_paging->ioas->obj.users);
}

@@ -49,9 +56,7 @@ void iommufd_hwpt_nested_destroy(struct iommufd_object *obj)
struct iommufd_hwpt_nested *hwpt_nested =
container_of(obj, struct iommufd_hwpt_nested, common.obj);

- if (hwpt_nested->common.domain)
- iommu_domain_free(hwpt_nested->common.domain);
-
+ __iommufd_hwpt_destroy(&hwpt_nested->common);
refcount_dec(&hwpt_nested->parent->common.obj.users);
}

@@ -213,7 +218,8 @@ iommufd_hwpt_nested_alloc(struct iommufd_ctx *ictx,
struct iommufd_hw_pagetable *hwpt;
int rc;

- if (flags || !user_data->len || !ops->domain_alloc_user)
+ if ((flags & ~IOMMU_HWPT_FAULT_ID_VALID) ||
+ !user_data->len || !ops->domain_alloc_user)
return ERR_PTR(-EOPNOTSUPP);
if (parent->auto_domain || !parent->nest_parent)
return ERR_PTR(-EINVAL);
@@ -227,7 +233,8 @@ iommufd_hwpt_nested_alloc(struct iommufd_ctx *ictx,
refcount_inc(&parent->common.obj.users);
hwpt_nested->parent = parent;

- hwpt->domain = ops->domain_alloc_user(idev->dev, flags,
+ hwpt->domain = ops->domain_alloc_user(idev->dev,
+ flags & ~IOMMU_HWPT_FAULT_ID_VALID,
parent->common.domain, user_data);
if (IS_ERR(hwpt->domain)) {
rc = PTR_ERR(hwpt->domain);
@@ -308,13 +315,29 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd)
goto out_put_pt;
}

+ if (cmd->flags & IOMMU_HWPT_FAULT_ID_VALID) {
+ struct iommufd_fault *fault;
+
+ fault = iommufd_get_fault(ucmd, cmd->fault_id);
+ if (IS_ERR(fault)) {
+ rc = PTR_ERR(fault);
+ goto out_hwpt;
+ }
+ hwpt->fault = fault;
+ hwpt->domain->iopf_handler = iommufd_fault_iopf_handler;
+ hwpt->domain->fault_data = hwpt;
+ }
+
cmd->out_hwpt_id = hwpt->obj.id;
rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
if (rc)
- goto out_hwpt;
+ goto out_put_fault;
iommufd_object_finalize(ucmd->ictx, &hwpt->obj);
goto out_unlock;

+out_put_fault:
+ if (cmd->flags & IOMMU_HWPT_FAULT_ID_VALID)
+ iommufd_put_object(ucmd->ictx, &hwpt->fault->obj);
out_hwpt:
iommufd_object_abort_and_destroy(ucmd->ictx, &hwpt->obj);
out_unlock:
--
2.34.1


2024-05-27 04:10:29

by Lu Baolu

[permalink] [raw]
Subject: [PATCH v6 10/10] iommufd/selftest: Add coverage for IOPF test

Extend the selftest tool to add coverage for IOPF handling. This
includes the following tests:

- Allocating and destroying an iommufd fault object.
- Allocating and destroying an IOPF-capable HWPT.
- Attaching/detaching/replacing an IOPF-capable HWPT on a device.
- Triggering an IOPF on the mock device.
- Retrieving and responding to the IOPF through the file interface.

Signed-off-by: Lu Baolu <[email protected]>
---
tools/testing/selftests/iommu/iommufd_utils.h | 84 +++++++++++++++++--
tools/testing/selftests/iommu/iommufd.c | 18 ++++
.../selftests/iommu/iommufd_fail_nth.c | 2 +-
3 files changed, 98 insertions(+), 6 deletions(-)

diff --git a/tools/testing/selftests/iommu/iommufd_utils.h b/tools/testing/selftests/iommu/iommufd_utils.h
index 8d2b46b2114d..cb913ec89f96 100644
--- a/tools/testing/selftests/iommu/iommufd_utils.h
+++ b/tools/testing/selftests/iommu/iommufd_utils.h
@@ -153,7 +153,7 @@ static int _test_cmd_mock_domain_replace(int fd, __u32 stdev_id, __u32 pt_id,
EXPECT_ERRNO(_errno, _test_cmd_mock_domain_replace(self->fd, stdev_id, \
pt_id, NULL))

-static int _test_cmd_hwpt_alloc(int fd, __u32 device_id, __u32 pt_id,
+static int _test_cmd_hwpt_alloc(int fd, __u32 device_id, __u32 pt_id, __u32 ft_id,
__u32 flags, __u32 *hwpt_id, __u32 data_type,
void *data, size_t data_len)
{
@@ -165,6 +165,7 @@ static int _test_cmd_hwpt_alloc(int fd, __u32 device_id, __u32 pt_id,
.data_type = data_type,
.data_len = data_len,
.data_uptr = (uint64_t)data,
+ .fault_id = ft_id,
};
int ret;

@@ -177,24 +178,30 @@ static int _test_cmd_hwpt_alloc(int fd, __u32 device_id, __u32 pt_id,
}

#define test_cmd_hwpt_alloc(device_id, pt_id, flags, hwpt_id) \
- ASSERT_EQ(0, _test_cmd_hwpt_alloc(self->fd, device_id, pt_id, flags, \
+ ASSERT_EQ(0, _test_cmd_hwpt_alloc(self->fd, device_id, pt_id, 0, flags, \
hwpt_id, IOMMU_HWPT_DATA_NONE, NULL, \
0))
#define test_err_hwpt_alloc(_errno, device_id, pt_id, flags, hwpt_id) \
EXPECT_ERRNO(_errno, _test_cmd_hwpt_alloc( \
- self->fd, device_id, pt_id, flags, \
+ self->fd, device_id, pt_id, 0, flags, \
hwpt_id, IOMMU_HWPT_DATA_NONE, NULL, 0))

#define test_cmd_hwpt_alloc_nested(device_id, pt_id, flags, hwpt_id, \
data_type, data, data_len) \
- ASSERT_EQ(0, _test_cmd_hwpt_alloc(self->fd, device_id, pt_id, flags, \
+ ASSERT_EQ(0, _test_cmd_hwpt_alloc(self->fd, device_id, pt_id, 0, flags, \
hwpt_id, data_type, data, data_len))
#define test_err_hwpt_alloc_nested(_errno, device_id, pt_id, flags, hwpt_id, \
data_type, data, data_len) \
EXPECT_ERRNO(_errno, \
- _test_cmd_hwpt_alloc(self->fd, device_id, pt_id, flags, \
+ _test_cmd_hwpt_alloc(self->fd, device_id, pt_id, 0, flags, \
hwpt_id, data_type, data, data_len))

+#define test_cmd_hwpt_alloc_iopf(device_id, pt_id, fault_id, flags, hwpt_id, \
+ data_type, data, data_len) \
+ ASSERT_EQ(0, _test_cmd_hwpt_alloc(self->fd, device_id, pt_id, fault_id, \
+ flags, hwpt_id, data_type, data, \
+ data_len))
+
#define test_cmd_hwpt_check_iotlb(hwpt_id, iotlb_id, expected) \
({ \
struct iommu_test_cmd test_cmd = { \
@@ -684,3 +691,70 @@ static int _test_cmd_get_hw_info(int fd, __u32 device_id, void *data,

#define test_cmd_get_hw_capabilities(device_id, caps, mask) \
ASSERT_EQ(0, _test_cmd_get_hw_info(self->fd, device_id, NULL, 0, &caps))
+
+static int _test_ioctl_fault_alloc(int fd, __u32 *fault_id, __u32 *fault_fd)
+{
+ struct iommu_fault_alloc cmd = {
+ .size = sizeof(cmd),
+ };
+ int ret;
+
+ ret = ioctl(fd, IOMMU_FAULT_QUEUE_ALLOC, &cmd);
+ if (ret)
+ return ret;
+ *fault_id = cmd.out_fault_id;
+ *fault_fd = cmd.out_fault_fd;
+ return 0;
+}
+
+#define test_ioctl_fault_alloc(fault_id, fault_fd) \
+ ({ \
+ ASSERT_EQ(0, _test_ioctl_fault_alloc(self->fd, fault_id, \
+ fault_fd)); \
+ ASSERT_NE(0, *(fault_id)); \
+ ASSERT_NE(0, *(fault_fd)); \
+ })
+
+static int _test_cmd_trigger_iopf(int fd, __u32 device_id, __u32 fault_fd)
+{
+ struct iommu_test_cmd trigger_iopf_cmd = {
+ .size = sizeof(trigger_iopf_cmd),
+ .op = IOMMU_TEST_OP_TRIGGER_IOPF,
+ .trigger_iopf = {
+ .dev_id = device_id,
+ .pasid = 0x1,
+ .grpid = 0x2,
+ .perm = IOMMU_PGFAULT_PERM_READ | IOMMU_PGFAULT_PERM_WRITE,
+ .addr = 0xdeadbeaf,
+ },
+ };
+ struct iommu_hwpt_page_response response = {
+ .size = sizeof(struct iommu_hwpt_page_response),
+ .dev_id = device_id,
+ .pasid = 0x1,
+ .grpid = 0x2,
+ .code = 0,
+ };
+ struct iommu_hwpt_pgfault fault = {};
+ ssize_t bytes;
+ int ret;
+
+ ret = ioctl(fd, _IOMMU_TEST_CMD(IOMMU_TEST_OP_TRIGGER_IOPF), &trigger_iopf_cmd);
+ if (ret)
+ return ret;
+
+ bytes = read(fault_fd, &fault, sizeof(fault));
+ if (bytes <= 0)
+ return -EIO;
+
+ response.cookie = fault.cookie;
+
+ bytes = write(fault_fd, &response, sizeof(response));
+ if (bytes <= 0)
+ return -EIO;
+
+ return 0;
+}
+
+#define test_cmd_trigger_iopf(device_id, fault_fd) \
+ ASSERT_EQ(0, _test_cmd_trigger_iopf(self->fd, device_id, fault_fd))
diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c
index edf1c99c9936..5b0169875a4d 100644
--- a/tools/testing/selftests/iommu/iommufd.c
+++ b/tools/testing/selftests/iommu/iommufd.c
@@ -279,6 +279,9 @@ TEST_F(iommufd_ioas, alloc_hwpt_nested)
uint32_t parent_hwpt_id = 0;
uint32_t parent_hwpt_id_not_work = 0;
uint32_t test_hwpt_id = 0;
+ uint32_t iopf_hwpt_id;
+ uint32_t fault_id;
+ uint32_t fault_fd;

if (self->device_id) {
/* Negative tests */
@@ -326,6 +329,7 @@ TEST_F(iommufd_ioas, alloc_hwpt_nested)
sizeof(data));

/* Allocate two nested hwpts sharing one common parent hwpt */
+ test_ioctl_fault_alloc(&fault_id, &fault_fd);
test_cmd_hwpt_alloc_nested(self->device_id, parent_hwpt_id, 0,
&nested_hwpt_id[0],
IOMMU_HWPT_DATA_SELFTEST, &data,
@@ -334,6 +338,10 @@ TEST_F(iommufd_ioas, alloc_hwpt_nested)
&nested_hwpt_id[1],
IOMMU_HWPT_DATA_SELFTEST, &data,
sizeof(data));
+ test_cmd_hwpt_alloc_iopf(self->device_id, parent_hwpt_id, fault_id,
+ IOMMU_HWPT_FAULT_ID_VALID, &iopf_hwpt_id,
+ IOMMU_HWPT_DATA_SELFTEST, &data,
+ sizeof(data));
test_cmd_hwpt_check_iotlb_all(nested_hwpt_id[0],
IOMMU_TEST_IOTLB_DEFAULT);
test_cmd_hwpt_check_iotlb_all(nested_hwpt_id[1],
@@ -504,14 +512,24 @@ TEST_F(iommufd_ioas, alloc_hwpt_nested)
_test_ioctl_destroy(self->fd, nested_hwpt_id[1]));
test_ioctl_destroy(nested_hwpt_id[0]);

+ /* Switch from nested_hwpt_id[1] to iopf_hwpt_id */
+ test_cmd_mock_domain_replace(self->stdev_id, iopf_hwpt_id);
+ EXPECT_ERRNO(EBUSY,
+ _test_ioctl_destroy(self->fd, iopf_hwpt_id));
+ /* Trigger an IOPF on the device */
+ test_cmd_trigger_iopf(self->device_id, fault_fd);
+
/* Detach from nested_hwpt_id[1] and destroy it */
test_cmd_mock_domain_replace(self->stdev_id, parent_hwpt_id);
test_ioctl_destroy(nested_hwpt_id[1]);
+ test_ioctl_destroy(iopf_hwpt_id);

/* Detach from the parent hw_pagetable and destroy it */
test_cmd_mock_domain_replace(self->stdev_id, self->ioas_id);
test_ioctl_destroy(parent_hwpt_id);
test_ioctl_destroy(parent_hwpt_id_not_work);
+ close(fault_fd);
+ test_ioctl_destroy(fault_id);
} else {
test_err_hwpt_alloc(ENOENT, self->device_id, self->ioas_id, 0,
&parent_hwpt_id);
diff --git a/tools/testing/selftests/iommu/iommufd_fail_nth.c b/tools/testing/selftests/iommu/iommufd_fail_nth.c
index f590417cd67a..c5d5e69452b0 100644
--- a/tools/testing/selftests/iommu/iommufd_fail_nth.c
+++ b/tools/testing/selftests/iommu/iommufd_fail_nth.c
@@ -615,7 +615,7 @@ TEST_FAIL_NTH(basic_fail_nth, device)
if (_test_cmd_get_hw_info(self->fd, idev_id, &info, sizeof(info), NULL))
return -1;

- if (_test_cmd_hwpt_alloc(self->fd, idev_id, ioas_id, 0, &hwpt_id,
+ if (_test_cmd_hwpt_alloc(self->fd, idev_id, ioas_id, 0, 0, &hwpt_id,
IOMMU_HWPT_DATA_NONE, 0, 0))
return -1;

--
2.34.1


2024-05-27 06:41:03

by Lu Baolu

[permalink] [raw]
Subject: [PATCH v6 09/10] iommufd/selftest: Add IOPF support for mock device

Extend the selftest mock device to support generating and responding to
an IOPF. Also add an ioctl interface for userspace applications to trigger
an IOPF on the mock device. This allows userspace applications to test
IOMMUFD's handling of IOPFs without having to rely on any real hardware.

Signed-off-by: Lu Baolu <[email protected]>
---
drivers/iommu/iommufd/iommufd_test.h | 8 ++++
drivers/iommu/iommufd/selftest.c | 64 ++++++++++++++++++++++++++++
2 files changed, 72 insertions(+)

diff --git a/drivers/iommu/iommufd/iommufd_test.h b/drivers/iommu/iommufd/iommufd_test.h
index e854d3f67205..acbbba1c6671 100644
--- a/drivers/iommu/iommufd/iommufd_test.h
+++ b/drivers/iommu/iommufd/iommufd_test.h
@@ -22,6 +22,7 @@ enum {
IOMMU_TEST_OP_MOCK_DOMAIN_FLAGS,
IOMMU_TEST_OP_DIRTY,
IOMMU_TEST_OP_MD_CHECK_IOTLB,
+ IOMMU_TEST_OP_TRIGGER_IOPF,
};

enum {
@@ -127,6 +128,13 @@ struct iommu_test_cmd {
__u32 id;
__u32 iotlb;
} check_iotlb;
+ struct {
+ __u32 dev_id;
+ __u32 pasid;
+ __u32 grpid;
+ __u32 perm;
+ __u64 addr;
+ } trigger_iopf;
};
__u32 last;
};
diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c
index 7a2199470f31..431eea438226 100644
--- a/drivers/iommu/iommufd/selftest.c
+++ b/drivers/iommu/iommufd/selftest.c
@@ -494,6 +494,7 @@ static bool mock_domain_capable(struct device *dev, enum iommu_cap cap)

switch (cap) {
case IOMMU_CAP_CACHE_COHERENCY:
+ case IOMMU_CAP_USER_IOASID_TABLE:
return true;
case IOMMU_CAP_DIRTY_TRACKING:
return !(mdev->flags & MOCK_FLAGS_DEVICE_NO_DIRTY);
@@ -504,6 +505,8 @@ static bool mock_domain_capable(struct device *dev, enum iommu_cap cap)
return false;
}

+static struct iopf_queue *mock_iommu_iopf_queue;
+
static struct iommu_device mock_iommu_device = {
};

@@ -514,6 +517,29 @@ static struct iommu_device *mock_probe_device(struct device *dev)
return &mock_iommu_device;
}

+static void mock_domain_page_response(struct device *dev, struct iopf_fault *evt,
+ struct iommu_page_response *msg)
+{
+}
+
+static int mock_dev_enable_feat(struct device *dev, enum iommu_dev_features feat)
+{
+ if (feat != IOMMU_DEV_FEAT_IOPF || !mock_iommu_iopf_queue)
+ return -ENODEV;
+
+ return iopf_queue_add_device(mock_iommu_iopf_queue, dev);
+}
+
+static int mock_dev_disable_feat(struct device *dev, enum iommu_dev_features feat)
+{
+ if (feat != IOMMU_DEV_FEAT_IOPF || !mock_iommu_iopf_queue)
+ return -ENODEV;
+
+ iopf_queue_remove_device(mock_iommu_iopf_queue, dev);
+
+ return 0;
+}
+
static const struct iommu_ops mock_ops = {
/*
* IOMMU_DOMAIN_BLOCKED cannot be returned from def_domain_type()
@@ -529,6 +555,9 @@ static const struct iommu_ops mock_ops = {
.capable = mock_domain_capable,
.device_group = generic_device_group,
.probe_device = mock_probe_device,
+ .page_response = mock_domain_page_response,
+ .dev_enable_feat = mock_dev_enable_feat,
+ .dev_disable_feat = mock_dev_disable_feat,
.default_domain_ops =
&(struct iommu_domain_ops){
.free = mock_domain_free,
@@ -1375,6 +1404,31 @@ static int iommufd_test_dirty(struct iommufd_ucmd *ucmd, unsigned int mockpt_id,
return rc;
}

+static int iommufd_test_trigger_iopf(struct iommufd_ucmd *ucmd,
+ struct iommu_test_cmd *cmd)
+{
+ struct iopf_fault event = { };
+ struct iommufd_device *idev;
+
+ idev = iommufd_get_device(ucmd, cmd->trigger_iopf.dev_id);
+ if (IS_ERR(idev))
+ return PTR_ERR(idev);
+
+ event.fault.prm.flags = IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE;
+ if (cmd->trigger_iopf.pasid != IOMMU_NO_PASID)
+ event.fault.prm.flags |= IOMMU_FAULT_PAGE_REQUEST_PASID_VALID;
+ event.fault.type = IOMMU_FAULT_PAGE_REQ;
+ event.fault.prm.addr = cmd->trigger_iopf.addr;
+ event.fault.prm.pasid = cmd->trigger_iopf.pasid;
+ event.fault.prm.grpid = cmd->trigger_iopf.grpid;
+ event.fault.prm.perm = cmd->trigger_iopf.perm;
+
+ iommu_report_device_fault(idev->dev, &event);
+ iommufd_put_object(ucmd->ictx, &idev->obj);
+
+ return 0;
+}
+
void iommufd_selftest_destroy(struct iommufd_object *obj)
{
struct selftest_obj *sobj = container_of(obj, struct selftest_obj, obj);
@@ -1450,6 +1504,8 @@ int iommufd_test(struct iommufd_ucmd *ucmd)
cmd->dirty.page_size,
u64_to_user_ptr(cmd->dirty.uptr),
cmd->dirty.flags);
+ case IOMMU_TEST_OP_TRIGGER_IOPF:
+ return iommufd_test_trigger_iopf(ucmd, cmd);
default:
return -EOPNOTSUPP;
}
@@ -1491,6 +1547,9 @@ int __init iommufd_test_init(void)
&iommufd_mock_bus_type.nb);
if (rc)
goto err_sysfs;
+
+ mock_iommu_iopf_queue = iopf_queue_alloc("mock-iopfq");
+
return 0;

err_sysfs:
@@ -1506,6 +1565,11 @@ int __init iommufd_test_init(void)

void iommufd_test_exit(void)
{
+ if (mock_iommu_iopf_queue) {
+ iopf_queue_free(mock_iommu_iopf_queue);
+ mock_iommu_iopf_queue = NULL;
+ }
+
iommu_device_sysfs_remove(&mock_iommu_device);
iommu_device_unregister_bus(&mock_iommu_device,
&iommufd_mock_bus_type.bus,
--
2.34.1


2024-05-27 08:01:51

by Lu Baolu

[permalink] [raw]
Subject: [PATCH v6 04/10] iommu: Extend domain attach group with handle support

Unlike the SVA case where each PASID of a device has an SVA domain
attached to it, the I/O page faults are handled by the fault handler
of the SVA domain. The I/O page faults for a user page table might
be handled by the domain attached to RID or the domain attached to
the PASID, depending on whether the PASID table is managed by user
space or kernel. As a result, there is a need for the domain attach
group interfaces to have attach handle support. The attach handle
will be forwarded to the fault handler of the user domain.

Add some variants of the domain attaching group interfaces so that they
could support the attach handle and export them for use in IOMMUFD.

Signed-off-by: Lu Baolu <[email protected]>
---
drivers/iommu/iommu-priv.h | 8 +++
drivers/iommu/iommu.c | 99 ++++++++++++++++++++++++++++++++++++++
2 files changed, 107 insertions(+)

diff --git a/drivers/iommu/iommu-priv.h b/drivers/iommu/iommu-priv.h
index f1536a5ebb0d..c37801c32f33 100644
--- a/drivers/iommu/iommu-priv.h
+++ b/drivers/iommu/iommu-priv.h
@@ -31,4 +31,12 @@ void iommu_device_unregister_bus(struct iommu_device *iommu,
struct iommu_attach_handle *iommu_attach_handle_get(struct iommu_group *group,
ioasid_t pasid,
unsigned int type);
+int iommu_attach_group_handle(struct iommu_domain *domain,
+ struct iommu_group *group,
+ struct iommu_attach_handle *handle);
+void iommu_detach_group_handle(struct iommu_domain *domain,
+ struct iommu_group *group);
+int iommu_replace_group_handle(struct iommu_group *group,
+ struct iommu_domain *new_domain,
+ struct iommu_attach_handle *handle);
#endif /* __LINUX_IOMMU_PRIV_H */
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index c506185a2fad..78e29fc97236 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -3475,3 +3475,102 @@ iommu_attach_handle_get(struct iommu_group *group, ioasid_t pasid, unsigned int
return handle;
}
EXPORT_SYMBOL_NS_GPL(iommu_attach_handle_get, IOMMUFD_INTERNAL);
+
+/**
+ * iommu_attach_group_handle - Attach an IOMMU domain to an IOMMU group
+ * @domain: IOMMU domain to attach
+ * @group: IOMMU group that will be attached
+ * @handle: attach handle
+ *
+ * Returns 0 on success and error code on failure.
+ *
+ * This is a variant of iommu_attach_group(). It allows the caller to provide
+ * an attach handle and use it when the domain is attached. This is currently
+ * used by IOMMUFD to deliver the I/O page faults.
+ */
+int iommu_attach_group_handle(struct iommu_domain *domain,
+ struct iommu_group *group,
+ struct iommu_attach_handle *handle)
+{
+ int ret;
+
+ mutex_lock(&group->mutex);
+ ret = xa_insert(&group->pasid_array, IOMMU_NO_PASID, handle, GFP_KERNEL);
+ if (ret)
+ goto err_unlock;
+
+ ret = __iommu_attach_group(domain, group);
+ if (ret)
+ goto err_erase;
+ mutex_unlock(&group->mutex);
+
+ return 0;
+err_erase:
+ xa_erase(&group->pasid_array, IOMMU_NO_PASID);
+err_unlock:
+ mutex_unlock(&group->mutex);
+ return ret;
+}
+EXPORT_SYMBOL_NS_GPL(iommu_attach_group_handle, IOMMUFD_INTERNAL);
+
+/**
+ * iommu_detach_group_handle - Detach an IOMMU domain from an IOMMU group
+ * @domain: IOMMU domain to detach
+ * @group: IOMMU group that will be detached
+ *
+ * Detach the specified IOMMU domain from the specified IOMMU group.
+ * It must be used in conjunction with iommu_attach_group_handle().
+ */
+void iommu_detach_group_handle(struct iommu_domain *domain,
+ struct iommu_group *group)
+{
+ mutex_lock(&group->mutex);
+ __iommu_group_set_core_domain(group);
+ xa_erase(&group->pasid_array, IOMMU_NO_PASID);
+ mutex_unlock(&group->mutex);
+}
+EXPORT_SYMBOL_NS_GPL(iommu_detach_group_handle, IOMMUFD_INTERNAL);
+
+/**
+ * iommu_replace_group_handle - replace the domain that a group is attached to
+ * @group: IOMMU group that will be attached to the new domain
+ * @new_domain: new IOMMU domain to replace with
+ * @handle: attach handle
+ *
+ * This is a variant of iommu_group_replace_domain(). It allows the caller to
+ * provide an attach handle for the new domain and use it when the domain is
+ * attached.
+ */
+int iommu_replace_group_handle(struct iommu_group *group,
+ struct iommu_domain *new_domain,
+ struct iommu_attach_handle *handle)
+{
+ struct iommu_domain *old_domain = group->domain;
+ void *curr;
+ int ret;
+
+ if (!new_domain)
+ return -EINVAL;
+
+ mutex_lock(&group->mutex);
+ ret = __iommu_group_set_domain(group, new_domain);
+ if (ret)
+ goto err_unlock;
+ xa_erase(&group->pasid_array, IOMMU_NO_PASID);
+ if (handle) {
+ curr = xa_store(&group->pasid_array, IOMMU_NO_PASID, handle, GFP_KERNEL);
+ if (xa_err(curr)) {
+ ret = xa_err(curr);
+ goto err_restore;
+ }
+ }
+ mutex_unlock(&group->mutex);
+
+ return 0;
+err_restore:
+ __iommu_group_set_domain(group, old_domain);
+err_unlock:
+ mutex_unlock(&group->mutex);
+ return ret;
+}
+EXPORT_SYMBOL_NS_GPL(iommu_replace_group_handle, IOMMUFD_INTERNAL);
--
2.34.1


2024-06-05 08:17:34

by Tian, Kevin

[permalink] [raw]
Subject: RE: [PATCH v6 02/10] iommu: Remove sva handle list

> From: Lu Baolu <[email protected]>
> Sent: Monday, May 27, 2024 12:05 PM
>
> @@ -69,11 +68,16 @@ static struct iommu_mm_data
> *iommu_alloc_mm_data(struct mm_struct *mm, struct de
> */
> struct iommu_sva *iommu_sva_bind_device(struct device *dev, struct
> mm_struct *mm)
> {
> + struct iommu_group *group = dev->iommu_group;
> + struct iommu_attach_handle *attach_handle;
> struct iommu_mm_data *iommu_mm;
> struct iommu_domain *domain;
> struct iommu_sva *handle;

it's confusing to have both 'handle' and 'attach_handle' in one function.

Clearer to rename 'handle' as 'sva'.

> int ret;
>
> + if (!group)
> + return ERR_PTR(-ENODEV);
> +
> mutex_lock(&iommu_sva_lock);
>
> /* Allocate mm->pasid if necessary. */
> @@ -83,12 +87,13 @@ struct iommu_sva *iommu_sva_bind_device(struct
> device *dev, struct mm_struct *mm
> goto out_unlock;
> }
>
> - list_for_each_entry(handle, &mm->iommu_mm->sva_handles,
> handle_item) {
> - if (handle->dev == dev) {
> - refcount_inc(&handle->users);
> - mutex_unlock(&iommu_sva_lock);
> - return handle;
> - }
> + /* A bond already exists, just take a reference`. */
> + attach_handle = iommu_attach_handle_get(group, iommu_mm-
> >pasid, IOMMU_DOMAIN_SVA);
> + if (!IS_ERR(attach_handle)) {
> + handle = container_of(attach_handle, struct iommu_sva,
> handle);
> + refcount_inc(&handle->users);
> + mutex_unlock(&iommu_sva_lock);
> + return handle;
> }

It's counter-intuitive to move forward when an error is returned.

e.g. if it's -EBUSY, indicating the pasid is already used for another type,
then the following attempts shouldn't be tried.

Probably we should have iommu_attach_handle_get() return NULL
instead of -ENOENT when the entry is free? Then:

attach_handle = iommu_attach_handle_get();
if (IS_ERR(attach_handle)) {
ret = PTR_ERR(attach_handle);
goto out_unlock;
} else if (attach_handle) {
/* matched and increase handle->users */
}

/* free entry falls through */

But then there is one potential issue with the design that 'handle'
can be optional in iommu_attach_device_pasid(). In that case
xa_load() returns NULL and we cannot differentiate a really unused
PASID from one which has been attached w/o a handle.

Does that suggest that having the caller always provide a handle
makes more sense?


2024-06-05 08:20:14

by Tian, Kevin

[permalink] [raw]
Subject: RE: [PATCH v6 01/10] iommu: Introduce domain attachment handle

> From: Lu Baolu <[email protected]>
> Sent: Monday, May 27, 2024 12:05 PM
>
> @@ -99,7 +99,9 @@ struct iommu_sva *iommu_sva_bind_device(struct device
> *dev, struct mm_struct *mm
>
> /* Search for an existing domain. */
> list_for_each_entry(domain, &mm->iommu_mm->sva_domains, next) {
> - ret = iommu_attach_device_pasid(domain, dev, iommu_mm-
> >pasid);
> + handle->handle.domain = domain;
> + ret = iommu_attach_device_pasid(domain, dev, iommu_mm-
> >pasid,
> + &handle->handle);

move the setting of handle.domain into the helper?

> @@ -3382,11 +3383,9 @@ int iommu_attach_device_pasid(struct
> iommu_domain *domain,
> }
> }
>
> - curr = xa_cmpxchg(&group->pasid_array, pasid, NULL, domain,
> GFP_KERNEL);
> - if (curr) {
> - ret = xa_err(curr) ? : -EBUSY;
> + ret = xa_insert(&group->pasid_array, pasid, handle, GFP_KERNEL);
> + if (ret)
> goto out_unlock;
> - }

this leads to a slightly different implication compared to the old code.

the domain pointer was always tracked in the old code but now the handle
is optional.

if the iommu core only needs to know whether a pasid has been attached (as
in this helper), it still works, as xa_insert() will mark the entry as
reserved if handle==NULL and the next xa_insert() at the same index will
fail due to the entry being reserved.

But if a certain path (other than iopf) in the iommu core needs to know
the exact domain pointer then this change breaks it.

Anyway some explanation is welcome here for why this change is safe.

> @@ -3414,7 +3413,7 @@ void iommu_detach_device_pasid(struct
> iommu_domain *domain, struct device *dev,
>
> mutex_lock(&group->mutex);
> __iommu_remove_group_pasid(group, pasid, domain);
> - WARN_ON(xa_erase(&group->pasid_array, pasid) != domain);
> + xa_erase(&group->pasid_array, pasid);

if the entry is valid do we want to keep the WARN_ON() upon handle->domain?

2024-06-05 08:29:04

by Tian, Kevin

[permalink] [raw]
Subject: RE: [PATCH v6 03/10] iommu: Add attach handle to struct iopf_group

> From: Lu Baolu <[email protected]>
> Sent: Monday, May 27, 2024 12:05 PM
>
> @@ -249,6 +249,12 @@ enum iommu_cap {
> */
> IOMMU_CAP_DEFERRED_FLUSH,
> IOMMU_CAP_DIRTY_TRACKING, /* IOMMU supports dirty
> tracking */
> + /*
> + * IOMMU driver supports user-managed IOASID table. There is no
> + * user domain for each PASID and the I/O page faults are forwarded
> + * through the user domain attached to the device RID.
> + */
> + IOMMU_CAP_USER_IOASID_TABLE,
> };

Given all other context is around PASID, let's just call it USER_PASID_TABLE.

btw this goes differently from your plan in [1], which tried to introduce
different nesting types between Intel and other vendors.

I guess the reason might be that you want to avoid getting the handle
for the RID on the Intel platform in case of failing to find the handle
for the faulting PASID, and to save a new domain type.

This looks fine to me but should be explained.

[1] https://lore.kernel.org/linux-iommu/[email protected]/


2024-06-05 08:31:53

by Tian, Kevin

[permalink] [raw]
Subject: RE: [PATCH v6 05/10] iommufd: Add fault and response message definitions

> From: Lu Baolu <[email protected]>
> Sent: Monday, May 27, 2024 12:05 PM
>
> +
> +/**
> + * struct iommu_hwpt_page_response - IOMMU page fault response
> + * @size: sizeof(struct iommu_hwpt_page_response)
> + * @flags: Must be set to 0
> + * @dev_id: device ID of target device for the response
> + * @pasid: Process Address Space ID
> + * @grpid: Page Request Group Index
> + * @code: One of response code in enum iommufd_page_response_code.
> + * @cookie: The kernel-managed cookie reported in the fault message.
> + */
> +struct iommu_hwpt_page_response {
> + __u32 size;
> + __u32 flags;
> + __u32 dev_id;
> + __u32 pasid;
> + __u32 grpid;
> + __u32 code;
> + __u32 cookie;
> + __u32 reserved;
> +};

With the response queue being per fault object we don't need all the
fields here, e.g. dev_id, pasid, etc. The cookie is sufficient.
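
i.e. conceptually it could shrink to something like this (just to
illustrate the point, not a concrete layout proposal):

	struct iommu_hwpt_page_response {
		__u32 size;
		__u32 flags;
		__u32 code;
		__u32 cookie;
	};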

2024-06-06 05:35:52

by Lu Baolu

[permalink] [raw]
Subject: Re: [PATCH v6 01/10] iommu: Introduce domain attachment handle

On 6/5/24 4:02 PM, Tian, Kevin wrote:
>> From: Lu Baolu <[email protected]>
>> Sent: Monday, May 27, 2024 12:05 PM
>>
>> @@ -99,7 +99,9 @@ struct iommu_sva *iommu_sva_bind_device(struct device
>> *dev, struct mm_struct *mm
>>
>> /* Search for an existing domain. */
>> list_for_each_entry(domain, &mm->iommu_mm->sva_domains, next) {
>> - ret = iommu_attach_device_pasid(domain, dev, iommu_mm-
>>> pasid);
>> + handle->handle.domain = domain;
>> + ret = iommu_attach_device_pasid(domain, dev, iommu_mm-
>>> pasid,
>> + &handle->handle);
>
> move the setting of handle.domain into the helper?

Yes.

>
>> @@ -3382,11 +3383,9 @@ int iommu_attach_device_pasid(struct
>> iommu_domain *domain,
>> }
>> }
>>
>> - curr = xa_cmpxchg(&group->pasid_array, pasid, NULL, domain,
>> GFP_KERNEL);
>> - if (curr) {
>> - ret = xa_err(curr) ? : -EBUSY;
>> + ret = xa_insert(&group->pasid_array, pasid, handle, GFP_KERNEL);
>> + if (ret)
>> goto out_unlock;
>> - }
>
> this leads to a slightly different implication comparing to the old code.
>
> the domain pointer was always tracked in the old code but now the handle
> is optional.
>
> if iommu core only needs to know whether a pasid has been attached (as
> in this helper), it still works as xa_insert() will mark the entry as reserved
> if handle==NULL and next xa_insert() at the same index will fail due to
> the entry being reserved.
>
> But if certain path (other than iopf) in the iommu core needs to know
> the exact domain pointer then this change breaks it.

The iommu core should not fetch the domain pointer in paths other than
attach/detach/replace. There is currently no reference counter for an
iommu domain, hence fetching the domain for other purposes will
potentially lead to a use-after-free issue.

>
> Anyway some explanation is welcomed here for why this change is safe.
>
>> @@ -3414,7 +3413,7 @@ void iommu_detach_device_pasid(struct
>> iommu_domain *domain, struct device *dev,
>>
>> mutex_lock(&group->mutex);
>> __iommu_remove_group_pasid(group, pasid, domain);
>> - WARN_ON(xa_erase(&group->pasid_array, pasid) != domain);
>> + xa_erase(&group->pasid_array, pasid);
>
> if the entry is valid do we want to keep the WARN_ON() upon handle->domain?

The domain pointer has already been passed to the iommu driver in
__iommu_remove_group_pasid(). Therefore, the check for the pointer's
validity should be performed before calling
__iommu_remove_group_pasid(). Hence, in my view, using WARN_ON() around
xa_erase() is not very useful.
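
If we really wanted that sanity check, it would have to sit before the
driver callback instead, along the lines of (untested, just to show the
ordering; 'handle' is the attach handle stored in the pasid_array):

	handle = xa_load(&group->pasid_array, pasid);
	if (handle)
		WARN_ON(handle->domain != domain);
	__iommu_remove_group_pasid(group, pasid, domain);
	xa_erase(&group->pasid_array, pasid);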

Best regards,
baolu

2024-06-06 06:08:54

by Lu Baolu

[permalink] [raw]
Subject: Re: [PATCH v6 02/10] iommu: Remove sva handle list

On 6/5/24 4:15 PM, Tian, Kevin wrote:
>> From: Lu Baolu <[email protected]>
>> Sent: Monday, May 27, 2024 12:05 PM
>>
>> @@ -69,11 +68,16 @@ static struct iommu_mm_data
>> *iommu_alloc_mm_data(struct mm_struct *mm, struct de
>> */
>> struct iommu_sva *iommu_sva_bind_device(struct device *dev, struct
>> mm_struct *mm)
>> {
>> + struct iommu_group *group = dev->iommu_group;
>> + struct iommu_attach_handle *attach_handle;
>> struct iommu_mm_data *iommu_mm;
>> struct iommu_domain *domain;
>> struct iommu_sva *handle;
>
> it's confusing to have both 'handle' and 'attach_handle' in one function.
>
> Clearer to rename 'handle' as 'sva'.

Yes. This could be cleaned up in a separate patch. All sva handles in
iommu-sva.c should be converted if we decide to do that.

>
>> int ret;
>>
>> + if (!group)
>> + return ERR_PTR(-ENODEV);
>> +
>> mutex_lock(&iommu_sva_lock);
>>
>> /* Allocate mm->pasid if necessary. */
>> @@ -83,12 +87,13 @@ struct iommu_sva *iommu_sva_bind_device(struct
>> device *dev, struct mm_struct *mm
>> goto out_unlock;
>> }
>>
>> - list_for_each_entry(handle, &mm->iommu_mm->sva_handles,
>> handle_item) {
>> - if (handle->dev == dev) {
>> - refcount_inc(&handle->users);
>> - mutex_unlock(&iommu_sva_lock);
>> - return handle;
>> - }
>> + /* A bond already exists, just take a reference`. */
>> + attach_handle = iommu_attach_handle_get(group, iommu_mm-
>>> pasid, IOMMU_DOMAIN_SVA);
>> + if (!IS_ERR(attach_handle)) {
>> + handle = container_of(attach_handle, struct iommu_sva,
>> handle);
>> + refcount_inc(&handle->users);
>> + mutex_unlock(&iommu_sva_lock);
>> + return handle;
>> }
>
> It's counter-intuitive to move forward when an error is returned.
>
> e.g. if it's -EBUSY indicating the pasid already used for another type then
> following attempts shouldn't been tried.
>
> probably we should have iommu_attach_handle_get() return NULL
> instead of -ENOENT when the entry is free? then:
>
> attach_handle = iommu_attach_handle_get();
> if (IS_ERR(attach_handle)) {
> ret = PTR_ERR(attach_handle);
> goto out_unlock;
> } else if (attach_handle) {
> /* matched and increase handle->users */
> }
>
> /* free entry falls through */
> But then there is one potential issue with the design that 'handle'
> can be optional in iommu_attach_device_pasid(). In that case
> xa_load returns NULL then we cannot differentiate a real unused
> PASID vs. one which has been attached w/o an handle.

The PASID should be allocated exclusively. This means that once a PASID
is assigned to A, it shouldn't be assigned to B at the same time. If a
single PASID is used for multiple purposes, it's likely a bug in the
system.

So the logic of iommu_attach_handle_get() here is: has an SVA domain
already been installed for this PASID? If so, just reuse it. Otherwise,
try to install a new SVA domain.

> Does it suggest that having the caller to always provide a handle
> makes more sense?

I'm neutral on this since only sva bind and the iopf delivery path
currently require an attach handle.

Best regards,
baolu

2024-06-06 06:20:25

by Lu Baolu

[permalink] [raw]
Subject: Re: [PATCH v6 03/10] iommu: Add attach handle to struct iopf_group

On 6/5/24 4:23 PM, Tian, Kevin wrote:
>> From: Lu Baolu <[email protected]>
>> Sent: Monday, May 27, 2024 12:05 PM
>>
>> @@ -249,6 +249,12 @@ enum iommu_cap {
>> */
>> IOMMU_CAP_DEFERRED_FLUSH,
>> IOMMU_CAP_DIRTY_TRACKING, /* IOMMU supports dirty
>> tracking */
>> + /*
>> + * IOMMU driver supports user-managed IOASID table. There is no
>> + * user domain for each PASID and the I/O page faults are forwarded
>> + * through the user domain attached to the device RID.
>> + */
>> + IOMMU_CAP_USER_IOASID_TABLE,
>> };
>
> Given all other context are around PASID let's just call it as USER_PASID_TABLE.
>
> btw this goes differently from your plan in [1] which tried to introduce
> different nesting types between Intel and other vendors.
>
> I guess the reason might be that you want to avoid getting the handle
> for RID on Intel platform in case of failing to find the handle for the
> faulting PASID. and save a new domain type.
>
> this looks fine to me but should be explained.

Yeah! I was considering this from two aspects and chose this simple
solution in the end.

- If we choose to add a new domain type, we need to change iommufd, the
iommu core and the individual drivers. That seems too complicated for
addressing a small issue here.

- Fundamentally, this is a hardware implementation difference, hence using
the existing per-device iommu capability interface sounds more
reasonable.

>
> [1] https://lore.kernel.org/linux-iommu/[email protected]/
>

Best regards,
baolu

2024-06-06 06:29:59

by Lu Baolu

[permalink] [raw]
Subject: Re: [PATCH v6 05/10] iommufd: Add fault and response message definitions

On 6/5/24 4:28 PM, Tian, Kevin wrote:
>> From: Lu Baolu <[email protected]>
>> Sent: Monday, May 27, 2024 12:05 PM
>>
>> +
>> +/**
>> + * struct iommu_hwpt_page_response - IOMMU page fault response
>> + * @size: sizeof(struct iommu_hwpt_page_response)
>> + * @flags: Must be set to 0
>> + * @dev_id: device ID of target device for the response
>> + * @pasid: Process Address Space ID
>> + * @grpid: Page Request Group Index
>> + * @code: One of response code in enum iommufd_page_response_code.
>> + * @cookie: The kernel-managed cookie reported in the fault message.
>> + */
>> +struct iommu_hwpt_page_response {
>> + __u32 size;
>> + __u32 flags;
>> + __u32 dev_id;
>> + __u32 pasid;
>> + __u32 grpid;
>> + __u32 code;
>> + __u32 cookie;
>> + __u32 reserved;
>> +};
>
> with the response queue per fault object we don't need all fields here,
> e.g. dev_id, pasid, etc. Cookie is sufficient.

I prefer not to mix the definition of the user API data with the kernel
driver implementation. The kernel driver may change in the future, but
the user API will remain stable for a long time.

I am neutral about whether we could remove the above fields, but I need all
maintainers to agree on this, given that this has undergone five rounds
of review. :-)

Best regards,
baolu

2024-06-07 09:17:38

by Tian, Kevin

[permalink] [raw]
Subject: RE: [PATCH v6 06/10] iommufd: Add iommufd fault object

> From: Lu Baolu <[email protected]>
> Sent: Monday, May 27, 2024 12:05 PM
>
> +static ssize_t iommufd_fault_fops_read(struct file *filep, char __user *buf,
> + size_t count, loff_t *ppos)
> +{
> + size_t fault_size = sizeof(struct iommu_hwpt_pgfault);
> + struct iommufd_fault *fault = filep->private_data;
> + struct iommu_hwpt_pgfault data;
> + struct iommufd_device *idev;
> + struct iopf_group *group;
> + struct iopf_fault *iopf;
> + size_t done = 0;
> + int rc = 0;
> +
> + if (*ppos || count % fault_size)
> + return -ESPIPE;

the man page says:

"If count is zero, read() returns zero and has no other results."

> +
> + mutex_lock(&fault->mutex);
> + while (!list_empty(&fault->deliver) && count > done) {
> + group = list_first_entry(&fault->deliver,
> + struct iopf_group, node);
> +
> + if (group->fault_count * fault_size > count - done)
> + break;
> +
> + rc = xa_alloc(&fault->response, &group->cookie, group,
> + xa_limit_32b, GFP_KERNEL);
> + if (rc)
> + break;
> +
> + idev = to_iommufd_handle(group->attach_handle)->idev;
> + list_for_each_entry(iopf, &group->faults, list) {
> + iommufd_compose_fault_message(&iopf->fault,
> + &data, idev,
> + group->cookie);
> + rc = copy_to_user(buf + done, &data, fault_size);
> + if (rc) {

'rc' should be converted to -EFAULT.

> + xa_erase(&fault->response, group->cookie);
> + break;
> + }
> + done += fault_size;
> + }
> +
> + list_del(&group->node);
> + }
> + mutex_unlock(&fault->mutex);
> +
> + return done == 0 ? rc : done;

again this doesn't match the manual:

"On error, -1 is returned, and errno is set appropriately. "

it doesn't matter whether 'done' is 0.

> +
> +static int iommufd_fault_fops_release(struct inode *inode, struct file *filep)
> +{
> + struct iommufd_fault *fault = filep->private_data;
> +
> + iommufd_ctx_put(fault->ictx);
> + refcount_dec(&fault->obj.users);
> + return 0;
> +}

hmm this doesn't sound correct. the context and refcount are
acquired in iommufd_fault_alloc() but here they are reverted when
the fd is closed...

> +
> + filep = anon_inode_getfile("[iommufd-pgfault]",
> &iommufd_fault_fops,
> + fault, O_RDWR);
> + if (IS_ERR(filep)) {
> + rc = PTR_ERR(filep);
> + goto out_abort;
> + }
> +
> + refcount_inc(&fault->obj.users);
> + iommufd_ctx_get(fault->ictx);
> + fault->filep = filep;

Those 3 lines can be moved after the fdno allocation below. It reads
slightly clearer to put the file-related work together before getting to
the last piece of initialization.

> +
> + fdno = get_unused_fd_flags(O_CLOEXEC);
> + if (fdno < 0) {
> + rc = fdno;
> + goto out_fput;
> + }
> +
> @@ -332,6 +332,7 @@ union ucmd_buffer {
> struct iommu_ioas_unmap unmap;
> struct iommu_option option;
> struct iommu_vfio_ioas vfio_ioas;
> + struct iommu_fault_alloc fault;

alphabetic

> @@ -381,6 +382,8 @@ static const struct iommufd_ioctl_op
> iommufd_ioctl_ops[] = {
> val64),
> IOCTL_OP(IOMMU_VFIO_IOAS, iommufd_vfio_ioas, struct
> iommu_vfio_ioas,
> __reserved),
> + IOCTL_OP(IOMMU_FAULT_QUEUE_ALLOC, iommufd_fault_alloc,
> struct iommu_fault_alloc,
> + out_fault_fd),

ditto



2024-06-07 09:30:12

by Tian, Kevin

[permalink] [raw]
Subject: RE: [PATCH v6 07/10] iommufd: Fault-capable hwpt attach/detach/replace

> From: Lu Baolu <[email protected]>
> Sent: Monday, May 27, 2024 12:05 PM
>
> Add iopf-capable hw page table attach/detach/replace helpers. The pointer
> to iommufd_device is stored in the domain attachment handle, so that it
> can be echo'ed back in the iopf_group.

This message needs an update; now the device pointer is not in the
attach handle.

And an explanation about the VF handling is worth adding to the commit msg.

> @@ -376,7 +377,10 @@ int iommufd_hw_pagetable_attach(struct
> iommufd_hw_pagetable *hwpt,
> * attachment.
> */
> if (list_empty(&idev->igroup->device_list)) {
> - rc = iommu_attach_group(hwpt->domain, idev->igroup->group);
> + if (hwpt->fault)
> + rc = iommufd_fault_domain_attach_dev(hwpt, idev);
> + else
> + rc = iommu_attach_group(hwpt->domain, idev-
> >igroup->group);

Does it read better to have an iommufd_attach_device() wrapper with the
above branches handled internally?
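
i.e. something along these lines (the name and exact placement are just a
sketch):

	static int iommufd_attach_device(struct iommufd_hw_pagetable *hwpt,
					 struct iommufd_device *idev)
	{
		if (hwpt->fault)
			return iommufd_fault_domain_attach_dev(hwpt, idev);

		return iommu_attach_group(hwpt->domain, idev->igroup->group);
	}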

>
> +static int iommufd_fault_iopf_enable(struct iommufd_device *idev)
> +{
> + struct device *dev = idev->dev;
> + int ret;
> +
> + /*
> + * Once we turn on PCI/PRI support for VF, the response failure code
> + * could not be forwarded to the hardware due to PRI being a shared

you could, but just doing so is incorrect.

s/could/should/

> + * resource between PF and VFs. There is no coordination for this
> + * shared capability. This waits for a vPRI reset to recover.
> + */

This may go a bit further to explain that supporting it requires emulation
in iommufd (i.e. pausing any further fault delivery until a vPRI reset).
It is future work, so disable it for VFs at this point.

> +void iommufd_fault_domain_detach_dev(struct iommufd_hw_pagetable
> *hwpt,
> + struct iommufd_device *idev)
> +{
> + struct iommufd_attach_handle *handle;
> +
> + handle = iommufd_device_get_attach_handle(idev);
> + iommu_detach_group_handle(hwpt->domain, idev->igroup->group);
> + iommufd_auto_response_faults(hwpt, handle);
> + iommufd_fault_iopf_disable(idev);
> + kfree(handle);

This assumes that the detach callback of the iommu driver drains in-flight
page requests, so that once it returns nothing can still reference the
handle or queue a new request to the deliver queue; otherwise there is a
use-after-free race, or stale requests in the fault queue which the auto
response doesn't cleanly handle.

iirc the previous discussion revealed that only the intel-iommu driver
guarantees that behavior. In any case this should be documented to avoid
this being used with a non-conforming iommu driver.

If I misunderstood, some comment on why there is no race in this window is
also appreciated.

> +}
> +
> +static int __fault_domain_replace_dev(struct iommufd_device *idev,
> + struct iommufd_hw_pagetable *hwpt,
> + struct iommufd_hw_pagetable *old)
> +{
> + struct iommufd_attach_handle *handle, *curr = NULL;
> + int ret;
> +
> + if (old->fault)
> + curr = iommufd_device_get_attach_handle(idev);
> +
> + if (hwpt->fault) {
> + handle = kzalloc(sizeof(*handle), GFP_KERNEL);
> + if (!handle)
> + return -ENOMEM;
> +
> + handle->handle.domain = hwpt->domain;
> + handle->idev = idev;
> + ret = iommu_replace_group_handle(idev->igroup->group,
> + hwpt->domain, &handle-
> >handle);
> + } else {
> + ret = iommu_replace_group_handle(idev->igroup->group,
> + hwpt->domain, NULL);
> + }
> +
> + if (!ret && curr) {
> + iommufd_auto_response_faults(old, curr);
> + kfree(curr);
> + }

In the last version you said auto response is required only when switching
from a fault-capable old hwpt to a fault-incapable new one. But the above
code doesn't reflect that description?

2024-06-07 09:31:00

by Tian, Kevin

[permalink] [raw]
Subject: RE: [PATCH v6 08/10] iommufd: Associate fault object with iommufd_hw_pgtable

> From: Lu Baolu <[email protected]>
> Sent: Monday, May 27, 2024 12:05 PM
>
> When allocating a user iommufd_hw_pagetable, the user space is allowed to
> associate a fault object with the hw_pagetable by specifying the fault
> object ID in the page table allocation data and setting the
> IOMMU_HWPT_FAULT_ID_VALID flag bit.
>
> On a successful return of hwpt allocation, the user can retrieve and
> respond to page faults by reading and writing the file interface of the
> fault object.
>
> Once a fault object has been associated with a hwpt, the hwpt is
> iopf-capable, indicated by hwpt->fault is non NULL. Attaching,
> detaching, or replacing an iopf-capable hwpt to an RID or PASID will
> differ from those that are not iopf-capable.
>
> Signed-off-by: Lu Baolu <[email protected]>

Reviewed-by: Kevin Tian <[email protected]>

2024-06-07 09:35:39

by Tian, Kevin

[permalink] [raw]
Subject: RE: [PATCH v6 02/10] iommu: Remove sva handle list

> From: Baolu Lu <[email protected]>
> Sent: Thursday, June 6, 2024 2:07 PM
>
> On 6/5/24 4:15 PM, Tian, Kevin wrote:
> >> From: Lu Baolu <[email protected]>
> >> Sent: Monday, May 27, 2024 12:05 PM
> >>
> >> - list_for_each_entry(handle, &mm->iommu_mm->sva_handles,
> >> handle_item) {
> >> - if (handle->dev == dev) {
> >> - refcount_inc(&handle->users);
> >> - mutex_unlock(&iommu_sva_lock);
> >> - return handle;
> >> - }
> >> + /* A bond already exists, just take a reference`. */
> >> + attach_handle = iommu_attach_handle_get(group, iommu_mm-
> >>> pasid, IOMMU_DOMAIN_SVA);
> >> + if (!IS_ERR(attach_handle)) {
> >> + handle = container_of(attach_handle, struct iommu_sva,
> >> handle);
> >> + refcount_inc(&handle->users);
> >> + mutex_unlock(&iommu_sva_lock);
> >> + return handle;
> >> }
> >
> > It's counter-intuitive to move forward when an error is returned.
> >
> > e.g. if it's -EBUSY indicating the pasid already used for another type then
> > following attempts shouldn't been tried.
> >
> > probably we should have iommu_attach_handle_get() return NULL
> > instead of -ENOENT when the entry is free? then:
> >
> > attach_handle = iommu_attach_handle_get();
> > if (IS_ERR(attach_handle)) {
> > ret = PTR_ERR(attach_handle);
> > goto out_unlock;
> > } else if (attach_handle) {
> > /* matched and increase handle->users */
> > }
> >
> > /* free entry falls through */
> > But then there is one potential issue with the design that 'handle'
> > can be optional in iommu_attach_device_pasid(). In that case
> > xa_load returns NULL then we cannot differentiate a real unused
> > PASID vs. one which has been attached w/o an handle.
>
> The PASID should be allocated exclusively. This means that once a PASID
> is assigned to A, it shouldn't be assigned to B at the same time. If a
> single PASID is used for multiple purposes, it's likely a bug in the
> system.

Yes, there is a bug, but catching it here would make diagnostics easier.

>
> So the logic of iommu_attach_handle_get() here is: has an SVA domain
> already been installed for this PASID? If so, just reuse it. Otherwise,
> try to install a new SVA domain.
>
> > Does it suggest that having the caller to always provide a handle
> > makes more sense?
>
> I'm neutral on this since only sva bind and iopf path delivery currently
> require an attach handle.
>

let's hear Jason's opinion.

2024-06-07 09:38:58

by Tian, Kevin

[permalink] [raw]
Subject: RE: [PATCH v6 05/10] iommufd: Add fault and response message definitions

> From: Baolu Lu <[email protected]>
> Sent: Thursday, June 6, 2024 2:28 PM
>
> On 6/5/24 4:28 PM, Tian, Kevin wrote:
> >> From: Lu Baolu <[email protected]>
> >> Sent: Monday, May 27, 2024 12:05 PM
> >>
> >> +
> >> +/**
> >> + * struct iommu_hwpt_page_response - IOMMU page fault response
> >> + * @size: sizeof(struct iommu_hwpt_page_response)
> >> + * @flags: Must be set to 0
> >> + * @dev_id: device ID of target device for the response
> >> + * @pasid: Process Address Space ID
> >> + * @grpid: Page Request Group Index
> >> + * @code: One of response code in enum
> iommufd_page_response_code.
> >> + * @cookie: The kernel-managed cookie reported in the fault message.
> >> + */
> >> +struct iommu_hwpt_page_response {
> >> + __u32 size;
> >> + __u32 flags;
> >> + __u32 dev_id;
> >> + __u32 pasid;
> >> + __u32 grpid;
> >> + __u32 code;
> >> + __u32 cookie;
> >> + __u32 reserved;
> >> +};
> >
> > with the response queue per fault object we don't need all fields here,
> > e.g. dev_id, pasid, etc. Cookie is sufficient.
>
> I prefer not to mess the definition of user API data and the kernel
> driver implementation. The kernel driver may change in the future, but
> the user API will remain stable for a long time.

Sure, it remains stable for a good reason. But here we define some
fields which are not even used or checked in the kernel. IMHO that
suggests a redundant definition. If there is value in keeping them, do we
need to at least verify them, the same as for the completion record?

>
> I am neutral about whether we could remove above fields, but I need all
> maintainers to agree on this, given that this has undergone five rounds
> of review. :-)
>

sure let's hear their opinions.

2024-06-08 12:26:49

by Lu Baolu

[permalink] [raw]
Subject: Re: [PATCH v6 06/10] iommufd: Add iommufd fault object

On 6/7/24 5:17 PM, Tian, Kevin wrote:
>> From: Lu Baolu <[email protected]>
>> Sent: Monday, May 27, 2024 12:05 PM
>>
>> +static ssize_t iommufd_fault_fops_read(struct file *filep, char __user *buf,
>> + size_t count, loff_t *ppos)
>> +{
>> + size_t fault_size = sizeof(struct iommu_hwpt_pgfault);
>> + struct iommufd_fault *fault = filep->private_data;
>> + struct iommu_hwpt_pgfault data;
>> + struct iommufd_device *idev;
>> + struct iopf_group *group;
>> + struct iopf_fault *iopf;
>> + size_t done = 0;
>> + int rc = 0;
>> +
>> + if (*ppos || count % fault_size)
>> + return -ESPIPE;
>
> the man page says:
>
> "If count is zero, read() returns zero and has no other results."

My understanding is that reading zero bytes is typically used to check
whether a file descriptor is valid and ready for reading without actually
taking any data from it.

In this code, a zero count passes the 'count % fault_size' check (0 %
fault_size is 0), the loop condition 'count > done' is immediately false,
and the function returns 0, which is compatible with the man page. Or did
I overlook anything?

>
>> +
>> + mutex_lock(&fault->mutex);
>> + while (!list_empty(&fault->deliver) && count > done) {
>> + group = list_first_entry(&fault->deliver,
>> + struct iopf_group, node);
>> +
>> + if (group->fault_count * fault_size > count - done)
>> + break;
>> +
>> + rc = xa_alloc(&fault->response, &group->cookie, group,
>> + xa_limit_32b, GFP_KERNEL);
>> + if (rc)
>> + break;
>> +
>> + idev = to_iommufd_handle(group->attach_handle)->idev;
>> + list_for_each_entry(iopf, &group->faults, list) {
>> + iommufd_compose_fault_message(&iopf->fault,
>> + &data, idev,
>> + group->cookie);
>> + rc = copy_to_user(buf + done, &data, fault_size);
>> + if (rc) {
>
> 'rc' should be converted to -EFAULT.

Yes. I will make it like this:

if (copy_to_user(buf + done, &data, fault_size)) {
xa_erase(&fault->response, group->cookie);
rc = -EFAULT;
break;
}

>
>> + xa_erase(&fault->response, group->cookie);
>> + break;
>> + }
>> + done += fault_size;
>> + }
>> +
>> + list_del(&group->node);
>> + }
>> + mutex_unlock(&fault->mutex);
>> +
>> + return done == 0 ? rc : done;
>
> again this doesn't match the manual:
>
> "On error, -1 is returned, and errno is set appropriately."
>
> it doesn't matter whether 'done' is 0.

I don't quite follow here. The code is doing the following:

- If done == 0, it means nothing has been copied to user space. This
could be due to two reasons:

1) the user read with a count set to 0, or
2) a failure case.

The code returns 0 for the first case and an error number for the
second.

- If done != 0, some data has been copied to user space. In this case,
the code returns the number of bytes copied regardless of the value of
rc. (See the user-space sketch below.)
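
From user space the resulting pattern is roughly the following
(illustrative only; handle_fault() is a placeholder for the application's
own handler):

	struct iommu_hwpt_pgfault pgf;
	ssize_t n;

	n = read(fault_fd, &pgf, sizeof(pgf));
	if (n == sizeof(pgf))
		handle_fault(&pgf);	/* one complete fault record */
	else if (n < 0)
		perror("read");		/* failure, errno set by the kernel */
	/* n == 0: nothing pending, or the read was issued with count 0 */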

>
>> +
>> +static int iommufd_fault_fops_release(struct inode *inode, struct file *filep)
>> +{
>> + struct iommufd_fault *fault = filep->private_data;
>> +
>> + iommufd_ctx_put(fault->ictx);
>> + refcount_dec(&fault->obj.users);
>> + return 0;
>> +}
>
> hmm this doesn't sound correct. the context and refcount are
> acquired in iommufd_fault_alloc() but here they are reverted when
> the fd is closed...

These two references were taken when the fault object was installed
into the fault FD:

filep = anon_inode_getfile("[iommufd-pgfault]", &iommufd_fault_fops,
fault, O_RDWR);
if (IS_ERR(filep)) {
rc = PTR_ERR(filep);
goto out_abort;
}

refcount_inc(&fault->obj.users);
iommufd_ctx_get(fault->ictx);
fault->filep = filep;

These refcounts must then be released when the FD is released.

>> +
>> + filep = anon_inode_getfile("[iommufd-pgfault]",
>> &iommufd_fault_fops,
>> + fault, O_RDWR);
>> + if (IS_ERR(filep)) {
>> + rc = PTR_ERR(filep);
>> + goto out_abort;
>> + }
>> +
>> + refcount_inc(&fault->obj.users);
>> + iommufd_ctx_get(fault->ictx);
>> + fault->filep = filep;
>
> those 3 lines can be moved after below fdno get. It's reads slightly
> clearer to put file related work together before getting to the last piece
> of intiailzation.

The filep is allocated and initialized together.

>> +
>> + fdno = get_unused_fd_flags(O_CLOEXEC);
>> + if (fdno < 0) {
>> + rc = fdno;
>> + goto out_fput;
>> + }
>> +
>> @@ -332,6 +332,7 @@ union ucmd_buffer {
>> struct iommu_ioas_unmap unmap;
>> struct iommu_option option;
>> struct iommu_vfio_ioas vfio_ioas;
>> + struct iommu_fault_alloc fault;
>
> alphabetic
>
>> @@ -381,6 +382,8 @@ static const struct iommufd_ioctl_op
>> iommufd_ioctl_ops[] = {
>> val64),
>> IOCTL_OP(IOMMU_VFIO_IOAS, iommufd_vfio_ioas, struct
>> iommu_vfio_ioas,
>> __reserved),
>> + IOCTL_OP(IOMMU_FAULT_QUEUE_ALLOC, iommufd_fault_alloc,
>> struct iommu_fault_alloc,
>> + out_fault_fd),
>
> ditto

Yes, sure. I wasn't aware of the order.

Best regards,
baolu

2024-06-09 12:16:56

by Lu Baolu

[permalink] [raw]
Subject: Re: [PATCH v6 07/10] iommufd: Fault-capable hwpt attach/detach/replace

On 6/7/24 5:30 PM, Tian, Kevin wrote:
>> From: Lu Baolu <[email protected]>
>> Sent: Monday, May 27, 2024 12:05 PM
>>
>> Add iopf-capable hw page table attach/detach/replace helpers. The pointer
>> to iommufd_device is stored in the domain attachment handle, so that it
>> can be echo'ed back in the iopf_group.
>
> this message needs an update. now the device pointer is not in the
> attach handle.

The iommufd_device pointer is stored in the attach handle that iommufd
provides in the attach or replace path.
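
The iopf path then gets it back through the handle, as in the fault read
fop of this series:

	idev = to_iommufd_handle(group->attach_handle)->idev;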

> and there worths a explanation about VF in the commit msg.
>
>> @@ -376,7 +377,10 @@ int iommufd_hw_pagetable_attach(struct
>> iommufd_hw_pagetable *hwpt,
>> * attachment.
>> */
>> if (list_empty(&idev->igroup->device_list)) {
>> - rc = iommu_attach_group(hwpt->domain, idev->igroup->group);
>> + if (hwpt->fault)
>> + rc = iommufd_fault_domain_attach_dev(hwpt, idev);
>> + else
>> + rc = iommu_attach_group(hwpt->domain, idev-
>>> igroup->group);
>
> Does it read better to have a iommufd_attach_device() wrapper with
> above branches handled internally?

Yes. Will do this in the next version.

>
>>
>> +static int iommufd_fault_iopf_enable(struct iommufd_device *idev)
>> +{
>> + struct device *dev = idev->dev;
>> + int ret;
>> +
>> + /*
>> + * Once we turn on PCI/PRI support for VF, the response failure code
>> + * could not be forwarded to the hardware due to PRI being a shared
>
> you could but just doing so is incorrect.
>
> s/could/should/

Done.

>
>> + * resource between PF and VFs. There is no coordination for this
>> + * shared capability. This waits for a vPRI reset to recover.
>> + */
>
> this may go a bit further to talk about supporting it requires an emulation
> in iommufd (i.e. pause any further fault delivery until vPRI reset). It is a
> future work so disable it for VF at this point.

Yes.

>
>> +void iommufd_fault_domain_detach_dev(struct iommufd_hw_pagetable
>> *hwpt,
>> + struct iommufd_device *idev)
>> +{
>> + struct iommufd_attach_handle *handle;
>> +
>> + handle = iommufd_device_get_attach_handle(idev);
>> + iommu_detach_group_handle(hwpt->domain, idev->igroup->group);
>> + iommufd_auto_response_faults(hwpt, handle);
>> + iommufd_fault_iopf_disable(idev);
>> + kfree(handle);
>
> this assumes that the detach callback of the iommu driver needs to drain
> in-fly page requests so no further reference to handle or queue new req
> to the deliver queue when it returns, otherwise there is a use-after-free
> race or stale requests in the fault queue which auto response doesn't
> cleanly handle.
>
> iirc previous discussion reveals that only intel-iommu driver guarantees
> that behavior. In any case this should be documented to avoid this being
> used in a non-conforming iommu driver.
>
> If I misunderstood, some comment why no race in this window is also
> appreciated.

Yes. The iommu driver needs to guarantee that there will be no iopf
requests for a RID or PASID after the domain has been detached. This
implies that (a rough driver-side sketch follows the list):

- The IOMMU hardware should not put further iopfs into its hardware queue
once the domain has been detached.
- Before the domain detachment is complete, the iommu driver should
flush all iopfs targeting the detached domain from the hardware page
request queue.
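
In pseudo code, the expected driver-side ordering on detach is roughly
the following (the two hw_* helpers are made up for illustration and are
not a real driver API):

	static void example_remove_dev_pasid(struct device *dev, ioasid_t pasid,
					     struct iommu_domain *domain)
	{
		/* 1. Stop the hardware from queuing new page requests. */
		hw_disable_page_requests(dev, pasid);

		/* 2. Drain requests already sitting in the hardware PRQ. */
		hw_flush_page_request_queue(dev, pasid);

		/*
		 * Only after both steps can iommufd safely auto-respond to
		 * the outstanding faults and free the attach handle.
		 */
	}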

>
>> +}
>> +
>> +static int __fault_domain_replace_dev(struct iommufd_device *idev,
>> + struct iommufd_hw_pagetable *hwpt,
>> + struct iommufd_hw_pagetable *old)
>> +{
>> + struct iommufd_attach_handle *handle, *curr = NULL;
>> + int ret;
>> +
>> + if (old->fault)
>> + curr = iommufd_device_get_attach_handle(idev);
>> +
>> + if (hwpt->fault) {
>> + handle = kzalloc(sizeof(*handle), GFP_KERNEL);
>> + if (!handle)
>> + return -ENOMEM;
>> +
>> + handle->handle.domain = hwpt->domain;
>> + handle->idev = idev;
>> + ret = iommu_replace_group_handle(idev->igroup->group,
>> + hwpt->domain, &handle-
>>> handle);
>> + } else {
>> + ret = iommu_replace_group_handle(idev->igroup->group,
>> + hwpt->domain, NULL);
>> + }
>> +
>> + if (!ret && curr) {
>> + iommufd_auto_response_faults(old, curr);
>> + kfree(curr);
>> + }
>
> In last version you said auto response is required only when switching
> from fault-capable old to fault-incapable new. But above code doesn't
> reflect that description?

What the current code does is auto-respond to all page faults targeting
the old fault-capable hwpt. I'm also okay if we decide to limit this to
flushing page faults only if the new hwpt is *not* fault-capable.
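
i.e. roughly (adjusting the replace path quoted above):

	if (!ret && curr) {
		if (!hwpt->fault)
			iommufd_auto_response_faults(old, curr);
		kfree(curr);
	}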

Best regards,
baolu

2024-06-12 13:06:12

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH v6 02/10] iommu: Remove sva handle list

On Fri, Jun 07, 2024 at 09:35:23AM +0000, Tian, Kevin wrote:
> > From: Baolu Lu <[email protected]>
> > Sent: Thursday, June 6, 2024 2:07 PM
> >
> > On 6/5/24 4:15 PM, Tian, Kevin wrote:
> > >> From: Lu Baolu <[email protected]>
> > >> Sent: Monday, May 27, 2024 12:05 PM
> > >>
> > >> - list_for_each_entry(handle, &mm->iommu_mm->sva_handles,
> > >> handle_item) {
> > >> - if (handle->dev == dev) {
> > >> - refcount_inc(&handle->users);
> > >> - mutex_unlock(&iommu_sva_lock);
> > >> - return handle;
> > >> - }
> > >> + /* A bond already exists, just take a reference. */
> > >> + attach_handle = iommu_attach_handle_get(group, iommu_mm-
> > >>> pasid, IOMMU_DOMAIN_SVA);
> > >> + if (!IS_ERR(attach_handle)) {
> > >> + handle = container_of(attach_handle, struct iommu_sva,
> > >> handle);
> > >> + refcount_inc(&handle->users);
> > >> + mutex_unlock(&iommu_sva_lock);
> > >> + return handle;
> > >> }
> > >
> > > It's counter-intuitive to move forward when an error is returned.
> > >
> > > e.g. if it's -EBUSY indicating the pasid already used for another type then
> > > following attempts shouldn't been tried.

Yes, it looks like iommu_sva_bind_device() should fail with EBUSY if
the PASID is already in use and is not exactly the same SVA as being
asked for here.

It will eventually do this naturally when iommu_attach_device_pasid()
is called with an in-use PASID, but may as well do it here for
clarity.

Also, is there a missing test for the same mm too?

I'd maybe change iommu_attach_handle() to return NULL if there is no
handle and then write it like:

if (IS_ERR(attach_handle) && PTR_ERR(attach_handle) != -ENOENT) {
ret = PTR_ERR(attach_handle);
goto out_unlock;
}

if (!IS_ERR(attach_handle) && attach_handle->domain->mm == mm) {
/* Can re-use the existing SVA attachment */
}

> > > Does it suggest that having the caller to always provide a handle
> > > makes more sense?

I was thinking no, just to avoid memory allocation. But how does the
caller not provide a handle? My original draft of this concept used an
XA_MARK to indicate whether the xarray pointed to a handle or a domain.

This seems to require the handle:

- curr = xa_cmpxchg(&group->pasid_array, pasid, NULL, domain, GFP_KERNEL);
- if (curr) {
- ret = xa_err(curr) ? : -EBUSY;
+ ret = xa_insert(&group->pasid_array, pasid, handle, GFP_KERNEL);

Confused.

> > I'm neutral on this since only sva bind and iopf path delivery currently
> > require an attach handle.
>
> let's hear Jason's opinion.

At least iommu_attach_handle_get() should not return NULL if there is
no handle, it should return EBUSY as though it couldn't match the
type.
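
Roughly, the suggested semantics would look like this (a hedged sketch,
not the v6 code; the field names follow the attach handle introduced in
this series):

	/* inside iommu_attach_handle_get() */
	handle = xa_load(&group->pasid_array, pasid);
	if (!handle)
		return ERR_PTR(-EBUSY);	/* no handle, treat like a type mismatch */
	if (type && handle->domain->type != type)
		return ERR_PTR(-EBUSY);
	return handle;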

Jason

2024-06-12 13:11:57

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH v6 01/10] iommu: Introduce domain attachment handle

On Thu, Jun 06, 2024 at 01:33:29PM +0800, Baolu Lu wrote:

> > But if certain path (other than iopf) in the iommu core needs to know
> > the exact domain pointer then this change breaks it.
>
> The iommu core should not fetch the domain pointer in paths other than
> attach/detach/replace. There is currently no reference counter for an
> iommu domain, hence fetching the domain for other purposes will
> potentially lead to a use-after-free issue.

If you are doing this then we should get rid of
iommu_get_domain_for_dev_pasid, and always return the handle
there. Making it clear that all those flows require handles to work.

But is this really OK? What happened to the patch fixing the error
unwind in attach_pasid? Doesn't it always need the domain to undo a
failed attach?

Jason

2024-06-12 13:20:07

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH v6 05/10] iommufd: Add fault and response message definitions

On Fri, Jun 07, 2024 at 09:38:38AM +0000, Tian, Kevin wrote:
> > From: Baolu Lu <[email protected]>
> > Sent: Thursday, June 6, 2024 2:28 PM
> >
> > On 6/5/24 4:28 PM, Tian, Kevin wrote:
> > >> From: Lu Baolu <[email protected]>
> > >> Sent: Monday, May 27, 2024 12:05 PM
> > >>
> > >> +
> > >> +/**
> > >> + * struct iommu_hwpt_page_response - IOMMU page fault response
> > >> + * @size: sizeof(struct iommu_hwpt_page_response)
> > >> + * @flags: Must be set to 0
> > >> + * @dev_id: device ID of target device for the response
> > >> + * @pasid: Process Address Space ID
> > >> + * @grpid: Page Request Group Index
> > >> + * @code: One of response code in enum
> > iommufd_page_response_code.
> > >> + * @cookie: The kernel-managed cookie reported in the fault message.
> > >> + */
> > >> +struct iommu_hwpt_page_response {
> > >> + __u32 size;
> > >> + __u32 flags;
> > >> + __u32 dev_id;
> > >> + __u32 pasid;
> > >> + __u32 grpid;
> > >> + __u32 code;
> > >> + __u32 cookie;
> > >> + __u32 reserved;
> > >> +};
> > >
> > > with the response queue per fault object we don't need all fields here,
> > > e.g. dev_id, pasid, etc. Cookie is sufficient.

Wait, why did we make it per object? The fault FD is supposed to be
sharable across HWPTs.

> > I prefer not to mess the definition of user API data and the kernel
> > driver implementation. The kernel driver may change in the future, but
> > the user API will remain stable for a long time.
>
> sure it remains stable for reasonable reason. Here we defined some
> fields but they are even not used and checked in the kernel. IMHO it
> suggests redundant definition. If there is value to keep them, do we
> need to at least verify them same as the completion record?

They are not here for the kernel, they are here for userspace.

A single HWPT and a single fault queue can be attached to multiple
devices, so we need to return the dev_id so that userspace can know
which device initiated the PRI. Same with PASID.

The only way we could remove them is if we are sure that no vIOMMU
requires the RID or PASID in the virtual fault queue's PRI fault
message. I don't think that is true?

Cookie is not a replacement, cookie is an opaque value for the kernel
to use to match a response to a request.
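
As a hedged illustration of why those fields matter to userspace, a VMM
handler for one fault record read from the fault FD might look like
this (struct virt_device, lookup_vdev() and viommu_inject_prq() are
made-up names, not part of this series):

static void forward_fault_to_guest(const struct iommu_hwpt_pgfault *msg)
{
	/*
	 * dev_id and pasid identify which device and PASID the PRI came
	 * from, so the VMM can rebuild the request for the guest's vIOMMU.
	 */
	struct virt_device *vdev = lookup_vdev(msg->dev_id);

	viommu_inject_prq(vdev, msg->pasid, msg->grpid, msg->addr,
			  msg->perm, msg->cookie);
}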

Jason

2024-06-12 13:24:30

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH v6 06/10] iommufd: Add iommufd fault object

On Fri, Jun 07, 2024 at 09:17:28AM +0000, Tian, Kevin wrote:
> > From: Lu Baolu <[email protected]>
> > Sent: Monday, May 27, 2024 12:05 PM
> >
> > +static ssize_t iommufd_fault_fops_read(struct file *filep, char __user *buf,
> > + size_t count, loff_t *ppos)
> > +{
> > + size_t fault_size = sizeof(struct iommu_hwpt_pgfault);
> > + struct iommufd_fault *fault = filep->private_data;
> > + struct iommu_hwpt_pgfault data;
> > + struct iommufd_device *idev;
> > + struct iopf_group *group;
> > + struct iopf_fault *iopf;
> > + size_t done = 0;
> > + int rc = 0;
> > +
> > + if (*ppos || count % fault_size)
> > + return -ESPIPE;
>
> the man page says:
>
> "If count is zero, read() returns zero and has no other results."

The above does that? 0 % X == 0

> > +
> > + mutex_lock(&fault->mutex);
> > + while (!list_empty(&fault->deliver) && count > done) {
> > + group = list_first_entry(&fault->deliver,
> > + struct iopf_group, node);
> > +
> > + if (group->fault_count * fault_size > count - done)
> > + break;
> > +
> > + rc = xa_alloc(&fault->response, &group->cookie, group,
> > + xa_limit_32b, GFP_KERNEL);
> > + if (rc)
> > + break;
> > +
> > + idev = to_iommufd_handle(group->attach_handle)->idev;
> > + list_for_each_entry(iopf, &group->faults, list) {
> > + iommufd_compose_fault_message(&iopf->fault,
> > + &data, idev,
> > + group->cookie);
> > + rc = copy_to_user(buf + done, &data, fault_size);
> > + if (rc) {
>
> 'rc' should be converted to -EFAULT.

Yes


> > + xa_erase(&fault->response, group->cookie);
> > + break;
> > + }
> > + done += fault_size;
> > + }
> > +
> > + list_del(&group->node);
> > + }
> > + mutex_unlock(&fault->mutex);
> > +
> > + return done == 0 ? rc : done;
>
> again this doesn't match the manual:
>
> "On error, -1 is returned, and errno is set appropriately. "
>
> it doesn't matter whether 'done' is 0.

It is set up so that once the list_del() below happens, it is
guaranteed that the system call will return a positive result, so that
the list_del'd items are always returned to userspace.

If we hit any fault here on the Nth item we should still return the
prior items and ignore the fault.

If we hit a fault on the first item then we should return the fault.
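
From the caller's point of view that gives the usual short-read
semantics; a hedged userspace sketch (fault_fd is assumed to come from
the fault object allocation, handle_fault() is a made-up helper):

	struct iommu_hwpt_pgfault msgs[16];
	ssize_t got = read(fault_fd, msgs, sizeof(msgs));

	if (got > 0) {
		/* A positive return is always a whole number of records,
		 * even if a later record hit an error inside the kernel. */
		for (size_t i = 0; i < got / sizeof(msgs[0]); i++)
			handle_fault(&msgs[i]);
	} else if (got < 0) {
		/* An error on the very first record is reported here. */
		perror("read(fault_fd)");
	}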

Jason

2024-06-12 13:31:22

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH v6 06/10] iommufd: Add iommufd fault object

On Sat, Jun 08, 2024 at 05:58:34PM +0800, Baolu Lu wrote:

> > > +static int iommufd_fault_fops_release(struct inode *inode, struct file *filep)
> > > +{
> > > + struct iommufd_fault *fault = filep->private_data;
> > > +
> > > + iommufd_ctx_put(fault->ictx);
> > > + refcount_dec(&fault->obj.users);
> > > + return 0;


This is in the wrong order, dec users before ctx_put.

> > hmm this doesn't sound correct. the context and refcount are
> > acquired in iommufd_fault_alloc() but here they are reverted when
> > the fd is closed...
>
> These two refcounts were requested when the fault object was installed
> to the fault FD.
>
> filep = anon_inode_getfile("[iommufd-pgfault]", &iommufd_fault_fops,
> fault, O_RDWR);
> if (IS_ERR(filep)) {
> rc = PTR_ERR(filep);
> goto out_abort;
> }
>
> refcount_inc(&fault->obj.users);
> iommufd_ctx_get(fault->ictx);
> fault->filep = filep;
>
> These refcounts must then be released when the FD is released.

Yes

The ctx refcount is to avoid destroying the ctx FD, which can't work,
while the fault FD has an object pinned. This is also why the above
order is backwards.

Jason

2024-06-12 13:37:49

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH v6 03/10] iommu: Add attach handle to struct iopf_group

On Mon, May 27, 2024 at 12:05:10PM +0800, Lu Baolu wrote:
> @@ -206,20 +182,49 @@ void iommu_report_device_fault(struct device *dev, struct iopf_fault *evt)
> if (group == &abort_group)
> goto err_abort;
>
> - group->domain = get_domain_for_iopf(dev, fault);
> - if (!group->domain)
> + if (fault->prm.flags & IOMMU_FAULT_PAGE_REQUEST_PASID_VALID) {
> + group->attach_handle = iommu_attach_handle_get(dev->iommu_group,
> + fault->prm.pasid,
> + 0);
> + if (IS_ERR(group->attach_handle)) {
> + if (!device_iommu_capable(dev, IOMMU_CAP_USER_IOASID_TABLE))
> + goto err_abort;

I'm not excited about calling a function pointer on every fault. Let's
just add a constant flag to iommu_ops?

Jason

2024-06-12 13:41:36

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH v6 04/10] iommu: Extend domain attach group with handle support

On Mon, May 27, 2024 at 12:05:11PM +0800, Lu Baolu wrote:
> Unlike the SVA case where each PASID of a device has an SVA domain
> attached to it, the I/O page faults are handled by the fault handler
> of the SVA domain. The I/O page faults for a user page table might
> be handled by the domain attached to RID or the domain attached to
> the PASID, depending on whether the PASID table is managed by user
> space or kernel. As a result, there is a need for the domain attach
> group interfaces to have attach handle support. The attach handle
> will be forwarded to the fault handler of the user domain.
>
> Add some variants of the domain attaching group interfaces so that they
> could support the attach handle and export them for use in IOMMUFD.
>
> Signed-off-by: Lu Baolu <[email protected]>
> ---
> drivers/iommu/iommu-priv.h | 8 +++
> drivers/iommu/iommu.c | 99 ++++++++++++++++++++++++++++++++++++++
> 2 files changed, 107 insertions(+)

I don't have an objection to it like this, but I wonder if it could be
smaller to teach iommu_attach_device_pasid() to use IOMMU_NO_PASID to
attach the handle to the RID?

It would have an if there to call __iommu_attach_group() instead of the
pasid attach?

Jason

2024-06-12 13:51:39

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH v6 05/10] iommufd: Add fault and response message definitions

On Wed, Jun 12, 2024 at 10:19:46AM -0300, Jason Gunthorpe wrote:
> > > I prefer not to mess the definition of user API data and the kernel
> > > driver implementation. The kernel driver may change in the future, but
> > > the user API will remain stable for a long time.
> >
> > sure it remains stable for reasonable reason. Here we defined some
> > fields but they are even not used and checked in the kernel. IMHO it
> > suggests redundant definition. If there is value to keep them, do we
> > need to at least verify them same as the completion record?
>
> They are not here for the kernel, they are here for userspace.
>
> A single HWPT and a single fault queue can be attached to multiple
> devices we need to return the dev_id so that userspace can know which
> device initiated the PRI. Same with PASID.
>
> The only way we could remove them is if we are sure that no vIOMMU
> requires RID or PASID in the virtual fault queue PRI fault message.. I
> don't think that is true?
>
> Cookie is not a replacement, cookie is an opaque value for the kernel
> to use to match a response to a request.

Oh, I got this wrong, the above is the response; yeah, we can ditch
everything but the cookie and code, right?

struct iommu_hwpt_page_response {
__u32 cookie;
__u32 code;
};

What is the workflow here? We expect the VMM will take the vIOMMU
response and match it to a request cookie for the kernel to complete?

Jason

2024-06-12 13:52:58

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH v6 05/10] iommufd: Add fault and response message definitions

On Mon, May 27, 2024 at 12:05:12PM +0800, Lu Baolu wrote:
> +/**
> + * struct iommu_hwpt_pgfault - iommu page fault data
> + * @size: sizeof(struct iommu_hwpt_pgfault)
> + * @flags: Combination of enum iommu_hwpt_pgfault_flags
> + * @dev_id: id of the originated device
> + * @pasid: Process Address Space ID
> + * @grpid: Page Request Group Index
> + * @perm: Combination of enum iommu_hwpt_pgfault_perm
> + * @addr: Fault address
> + * @length: a hint of how much data the requestor is expecting to fetch. For
> + * example, if the PRI initiator knows it is going to do a 10MB
> + * transfer, it could fill in 10MB and the OS could pre-fault in
> + * 10MB of IOVA. It's default to 0 if there's no such hint.
> + * @cookie: kernel-managed cookie identifying a group of fault messages. The
> + * cookie number encoded in the last page fault of the group should
> + * be echoed back in the response message.
> + */
> +struct iommu_hwpt_pgfault {
> + __u32 size;

Given we fail the system call if size is not exactly the right value,
we should probably drop it here.

The ioctl to get the FD can someday specify the format of the fault
messages if we need to upgrade. If we want to change it down the road
then the old FD will be exactly as it is now, and the user will
request a new format FD that only works in whatever the new way is.

Jason

2024-06-12 13:57:30

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH v6 00/10] IOMMUFD: Deliver IO page faults to user space

On Mon, May 27, 2024 at 12:05:07PM +0800, Lu Baolu wrote:
> This series implements the functionality of delivering IO page faults to
> user space through the IOMMUFD framework. One feasible use case is the
> nested translation. Nested translation is a hardware feature that
> supports two-stage translation tables for IOMMU. The second-stage
> translation table is managed by the host VMM, while the first-stage
> translation table is owned by user space. This allows user space to
> control the IOMMU mappings for its devices.

This looks pretty close, will you post a v7 with the minor changes?

Thanks,
Jason

2024-06-13 03:07:31

by Lu Baolu

[permalink] [raw]
Subject: Re: [PATCH v6 01/10] iommu: Introduce domain attachment handle

On 6/12/24 9:10 PM, Jason Gunthorpe wrote:
> On Thu, Jun 06, 2024 at 01:33:29PM +0800, Baolu Lu wrote:
>
>>> But if certain path (other than iopf) in the iommu core needs to know
>>> the exact domain pointer then this change breaks it.
>>
>> The iommu core should not fetch the domain pointer in paths other than
>> attach/detach/replace. There is currently no reference counter for an
>> iommu domain, hence fetching the domain for other purposes will
>> potentially lead to a use-after-free issue.
>
> If you are doing this then we should get rid of
> iommu_get_domain_for_dev_pasid, and always return the handle
> there. Making it clear that all those flows require handles to work.

I removed iommu_get_domain_for_dev_pasid() in this series. The iopf
handling is the only path that requires fetching the domain, and there
we require the attach handle to work.

>
> But is this really OK? What happened to the patch fixing the error
> unwind in attach_pasid? Doesn't it always need the domain to undo a
> failed attach?

attach_pasid switches the PASID from being unassigned to being assigned
a real domain, so the unwind operation simply involves calling
iommu_ops->remove_dev_pasid.

In the future, probably we will introduce a replace interface for pasid.
In that scenario, the caller should explicitly pass both the old and new
domains. If the replace operation fails, the interface will revert to
the old domain.

In my mind, the iommu core should only manage the default domain for
kernel DMA. All domains that are allocated by domain_alloc interfaces
should be managed by the callers and passed through the attach/detach
/replace interfaces.

>
> Jason
>

Best regards,
baolu

2024-06-13 04:08:58

by Lu Baolu

[permalink] [raw]
Subject: Re: [PATCH v6 02/10] iommu: Remove sva handle list

On 6/12/24 9:05 PM, Jason Gunthorpe wrote:
> On Fri, Jun 07, 2024 at 09:35:23AM +0000, Tian, Kevin wrote:
>>> From: Baolu Lu <[email protected]>
>>> Sent: Thursday, June 6, 2024 2:07 PM
>>>
>>> On 6/5/24 4:15 PM, Tian, Kevin wrote:
>>>>> From: Lu Baolu <[email protected]>
>>>>> Sent: Monday, May 27, 2024 12:05 PM
>>>>>
>>>>> - list_for_each_entry(handle, &mm->iommu_mm->sva_handles,
>>>>> handle_item) {
>>>>> - if (handle->dev == dev) {
>>>>> - refcount_inc(&handle->users);
>>>>> - mutex_unlock(&iommu_sva_lock);
>>>>> - return handle;
>>>>> - }
>>>>> + /* A bond already exists, just take a reference. */
>>>>> + attach_handle = iommu_attach_handle_get(group, iommu_mm-
>>>>>> pasid, IOMMU_DOMAIN_SVA);
>>>>> + if (!IS_ERR(attach_handle)) {
>>>>> + handle = container_of(attach_handle, struct iommu_sva,
>>>>> handle);
>>>>> + refcount_inc(&handle->users);
>>>>> + mutex_unlock(&iommu_sva_lock);
>>>>> + return handle;
>>>>> }
>>>>
>>>> It's counter-intuitive to move forward when an error is returned.
>>>>
>>>> e.g. if it's -EBUSY indicating the pasid already used for another type then
>>>> following attempts shouldn't been tried.
>
> Yes, it looks like iommu_sva_bind_device() should fail with EBUSY if
> the PASID is already in use and is not exactly the same SVA as being
> asked for here.
>
> It will eventually do this naturally when iommu_attach_device_pasid()
> is called with an in-use PASID, but may as well do it here for
> clarity.
>
> Also, is there a missing test for the same mm too?
>
> I'd maybe change iommu_attach_handle() to return NULL if there is no
> handle and then write it like:
>
> if (IS_ERR(attach_handle) && PTR_ERR(attach_handle) != -ENOENT) {
> ret = PTR_ERR(attach_handle);
> goto out_unlock;
> }
>
> if (!IS_ERR(attach_handle) && attach_handle->domain->mm == mm) {
> /* Can re-use the existing SVA attachment */
> }
>

Okay, I will change it like below:

--- a/drivers/iommu/iommu-sva.c
+++ b/drivers/iommu/iommu-sva.c
@@ -91,11 +91,20 @@ struct iommu_sva *iommu_sva_bind_device(struct device *dev, struct mm_struct *mm
 	attach_handle = iommu_attach_handle_get(group, iommu_mm->pasid,
 						IOMMU_DOMAIN_SVA);
 	if (!IS_ERR(attach_handle)) {
 		handle = container_of(attach_handle, struct iommu_sva,
 				      handle);
+		if (attach_handle->domain->mm != mm) {
+			ret = -EBUSY;
+			goto out_unlock;
+		}
 		refcount_inc(&handle->users);
 		mutex_unlock(&iommu_sva_lock);
 		return handle;
 	}
 
+	if (PTR_ERR(attach_handle) != -ENOENT) {
+		ret = PTR_ERR(attach_handle);
+		goto out_unlock;
+	}
+
 	handle = kzalloc(sizeof(*handle), GFP_KERNEL);
 	if (!handle) {
 		ret = -ENOMEM;

>>>> Does it suggest that having the caller to always provide a handle
>>>> makes more sense?
>
> I was thinking no just to avoid memory allocation.. But how does the
> caller not provide a handle? My original draft of this concept used an
> XA_MARK to indicate if the xarray pointed to a handle or a domain
>
> This seems to require the handle:
>
> - curr = xa_cmpxchg(&group->pasid_array, pasid, NULL, domain, GFP_KERNEL);
> - if (curr) {
> - ret = xa_err(curr) ? : -EBUSY;
> + ret = xa_insert(&group->pasid_array, pasid, handle, GFP_KERNEL);
>
> Confused.

XA_MARK was used to indicate whether the stored pointer was an iommu
domain or an attach handle. Since this series removes
iommu_get_domain_for_dev_pasid(), there is no longer any need to store
the domain pointer at all.

>
>>> I'm neutral on this since only sva bind and iopf path delivery currently
>>> require an attach handle.
>>
>> let's hear Jason's opinion.
>
> At least iommu_attach_handle_get() should not return NULL if there is
> no handle, it should return EBUSY as though it couldn't match the
> type.

Agreed.

>
> Jason

Best regards,
baolu

2024-06-13 04:50:28

by Lu Baolu

[permalink] [raw]
Subject: Re: [PATCH v6 04/10] iommu: Extend domain attach group with handle support

On 6/12/24 9:41 PM, Jason Gunthorpe wrote:
> On Mon, May 27, 2024 at 12:05:11PM +0800, Lu Baolu wrote:
>> Unlike the SVA case where each PASID of a device has an SVA domain
>> attached to it, the I/O page faults are handled by the fault handler
>> of the SVA domain. The I/O page faults for a user page table might
>> be handled by the domain attached to RID or the domain attached to
>> the PASID, depending on whether the PASID table is managed by user
>> space or kernel. As a result, there is a need for the domain attach
>> group interfaces to have attach handle support. The attach handle
>> will be forwarded to the fault handler of the user domain.
>>
>> Add some variants of the domain attaching group interfaces so that they
>> could support the attach handle and export them for use in IOMMUFD.
>>
>> Signed-off-by: Lu Baolu<[email protected]>
>> ---
>> drivers/iommu/iommu-priv.h | 8 +++
>> drivers/iommu/iommu.c | 99 ++++++++++++++++++++++++++++++++++++++
>> 2 files changed, 107 insertions(+)
> I don't have an objection to it like this, but I wonder if we could be
> smaller to teach iommu_attach_device_pasid to use IOMMU_NO_PASID to
> attach the handle to the rid?
>
> It would have an if there to call the __iommu_attach_group() instead
> of the pasid attach ?

This is a good idea. I guess that in the future, we will have
iommu_attach_device_pasid(IOMMU_NO_PASID) to replace
iommu_attach_group(). The group->domain and domain_ops::attach_dev will
also be removed.

I'd suggest making such refactoring in a separate series with wider
discussions. For now, let's make this special interface for iommufd.

Best regards,
baolu

2024-06-13 06:47:25

by Lu Baolu

[permalink] [raw]
Subject: Re: [PATCH v6 06/10] iommufd: Add iommufd fault object

On 6/12/24 9:25 PM, Jason Gunthorpe wrote:
> On Sat, Jun 08, 2024 at 05:58:34PM +0800, Baolu Lu wrote:
>
>>>> +static int iommufd_fault_fops_release(struct inode *inode, struct file *filep)
>>>> +{
>>>> + struct iommufd_fault *fault = filep->private_data;
>>>> +
>>>> + iommufd_ctx_put(fault->ictx);
>>>> + refcount_dec(&fault->obj.users);
>>>> + return 0;
>
> This is in the wrong order, dec users before ctx_put.

Done.

Best regards,
baolu

2024-06-13 06:49:25

by Lu Baolu

[permalink] [raw]
Subject: Re: [PATCH v6 00/10] IOMMUFD: Deliver IO page faults to user space

On 6/12/24 9:54 PM, Jason Gunthorpe wrote:
> On Mon, May 27, 2024 at 12:05:07PM +0800, Lu Baolu wrote:
>> This series implements the functionality of delivering IO page faults to
>> user space through the IOMMUFD framework. One feasible use case is the
>> nested translation. Nested translation is a hardware feature that
>> supports two-stage translation tables for IOMMU. The second-stage
>> translation table is managed by the host VMM, while the first-stage
>> translation table is owned by user space. This allows user space to
>> control the IOMMU mappings for its devices.
> This looks pretty close, will you post a v7 with the minor changes?

Yes. Sure.

Best regards,
baolu

2024-06-13 06:57:20

by Lu Baolu

[permalink] [raw]
Subject: Re: [PATCH v6 05/10] iommufd: Add fault and response message definitions

On 6/12/24 9:19 PM, Jason Gunthorpe wrote:
> On Fri, Jun 07, 2024 at 09:38:38AM +0000, Tian, Kevin wrote:
>>> From: Baolu Lu<[email protected]>
>>> Sent: Thursday, June 6, 2024 2:28 PM
>>>
>>> On 6/5/24 4:28 PM, Tian, Kevin wrote:
>>>>> From: Lu Baolu<[email protected]>
>>>>> Sent: Monday, May 27, 2024 12:05 PM
>>>>>
>>>>> +
>>>>> +/**
>>>>> + * struct iommu_hwpt_page_response - IOMMU page fault response
>>>>> + * @size: sizeof(struct iommu_hwpt_page_response)
>>>>> + * @flags: Must be set to 0
>>>>> + * @dev_id: device ID of target device for the response
>>>>> + * @pasid: Process Address Space ID
>>>>> + * @grpid: Page Request Group Index
>>>>> + * @code: One of response code in enum
>>> iommufd_page_response_code.
>>>>> + * @cookie: The kernel-managed cookie reported in the fault message.
>>>>> + */
>>>>> +struct iommu_hwpt_page_response {
>>>>> + __u32 size;
>>>>> + __u32 flags;
>>>>> + __u32 dev_id;
>>>>> + __u32 pasid;
>>>>> + __u32 grpid;
>>>>> + __u32 code;
>>>>> + __u32 cookie;
>>>>> + __u32 reserved;
>>>>> +};
>>>> with the response queue per fault object we don't need all fields here,
>>>> e.g. dev_id, pasid, etc. Cookie is sufficient.
> Wait, why did we make it per object? The fault FD is supposed to be
> sharable across HWPTs.

The fault FD is shareable across HWPTs. Kevin was suggesting that the
response queue (for all outstanding IOPFs awaiting responses) could be
put in the fault structure.

Best regards,
baolu

2024-06-13 07:38:13

by Lu Baolu

[permalink] [raw]
Subject: Re: [PATCH v6 05/10] iommufd: Add fault and response message definitions

On 6/12/24 9:50 PM, Jason Gunthorpe wrote:
> On Wed, Jun 12, 2024 at 10:19:46AM -0300, Jason Gunthorpe wrote:
>>>> I prefer not to mess the definition of user API data and the kernel
>>>> driver implementation. The kernel driver may change in the future, but
>>>> the user API will remain stable for a long time.
>>> sure it remains stable for reasonable reason. Here we defined some
>>> fields but they are even not used and checked in the kernel. IMHO it
>>> suggests redundant definition. If there is value to keep them, do we
>>> need to at least verify them same as the completion record?
>> They are not here for the kernel, they are here for userspace.
>>
>> A single HWPT and a single fault queue can be attached to multiple
>> devices, so we need to return the dev_id so that userspace can know
>> which device initiated the PRI. Same with PASID.
>>
>> The only way we could remove them is if we are sure that no vIOMMU
>> requires the RID or PASID in the virtual fault queue's PRI fault
>> message. I don't think that is true?
>>
>> Cookie is not a replacement, cookie is an opaque value for the kernel
>> to use to match a response to a request.
> Oh, I got this wrong, the above is the response; yeah, we can ditch
> everything but the cookie and code, right?
>
> struct iommu_hwpt_page_response {
> __u32 cookie;
> __u32 code;
> };
>
> What is the workflow here? We expect the VMM will take the vIOMMU
> response and match it to a request cookie for the kernel to complete?

The workflow in the host kernel involves looking up the iopf_group using
the cookie value in the response queue and then responding to the
iopf_group with the code. Therefore, from the host kernel's perspective,
it is acceptable if the user only passes the cookie and code in the
response message.
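
A hedged sketch of that round trip from the VMM side, using the
simplified two-field layout (msg is the previously read struct
iommu_hwpt_pgfault; the code value is assumed to come from enum
iommufd_page_response_code, e.g. IOMMUFD_PAGE_RESP_SUCCESS):

	struct iommu_hwpt_page_response resp = {
		.cookie = msg.cookie,	/* echoed from the fault message */
		.code = IOMMUFD_PAGE_RESP_SUCCESS,
	};

	/* The kernel looks up the outstanding iopf_group by cookie and
	 * completes it with the given code. */
	if (write(fault_fd, &resp, sizeof(resp)) != sizeof(resp))
		handle_response_error();	/* made-up error handling */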

Since you both agreed that the response message could be simplified, I
will implement the above structure in the new version.

Best regards,
baolu

2024-06-13 07:58:26

by Lu Baolu

[permalink] [raw]
Subject: Re: [PATCH v6 05/10] iommufd: Add fault and response message definitions

On 6/12/24 9:52 PM, Jason Gunthorpe wrote:
> On Mon, May 27, 2024 at 12:05:12PM +0800, Lu Baolu wrote:
>> +/**
>> + * struct iommu_hwpt_pgfault - iommu page fault data
>> + * @size: sizeof(struct iommu_hwpt_pgfault)
>> + * @flags: Combination of enum iommu_hwpt_pgfault_flags
>> + * @dev_id: id of the originated device
>> + * @pasid: Process Address Space ID
>> + * @grpid: Page Request Group Index
>> + * @perm: Combination of enum iommu_hwpt_pgfault_perm
>> + * @addr: Fault address
>> + * @length: a hint of how much data the requestor is expecting to fetch. For
>> + * example, if the PRI initiator knows it is going to do a 10MB
>> + * transfer, it could fill in 10MB and the OS could pre-fault in
>> + * 10MB of IOVA. It's default to 0 if there's no such hint.
>> + * @cookie: kernel-managed cookie identifying a group of fault messages. The
>> + * cookie number encoded in the last page fault of the group should
>> + * be echoed back in the response message.
>> + */
>> +struct iommu_hwpt_pgfault {
>> + __u32 size;
> Given we fail the system call if size is not exactly the right value
> we should probably drop it here.
>
> The ioctl to get the FD can someday specify the format of the fault
> messages if we need to upgrade. If we want to change it down the road
> then the old FD will be exactly as it is now, and the user will
> request a new format FD that only works in whatever the new way is.

Okay, sure!

Best regards,
baolu

2024-06-13 08:34:12

by Lu Baolu

[permalink] [raw]
Subject: Re: [PATCH v6 03/10] iommu: Add attach handle to struct iopf_group

On 6/12/24 9:37 PM, Jason Gunthorpe wrote:
> On Mon, May 27, 2024 at 12:05:10PM +0800, Lu Baolu wrote:
>> @@ -206,20 +182,49 @@ void iommu_report_device_fault(struct device *dev, struct iopf_fault *evt)
>> if (group == &abort_group)
>> goto err_abort;
>>
>> - group->domain = get_domain_for_iopf(dev, fault);
>> - if (!group->domain)
>> + if (fault->prm.flags & IOMMU_FAULT_PAGE_REQUEST_PASID_VALID) {
>> + group->attach_handle = iommu_attach_handle_get(dev->iommu_group,
>> + fault->prm.pasid,
>> + 0);
>> + if (IS_ERR(group->attach_handle)) {
>> + if (!device_iommu_capable(dev, IOMMU_CAP_USER_IOASID_TABLE))
>> + goto err_abort;
>
> I'm not excited about calling a function pointer on every fault. Let's
> just add a constant flag to iommu_ops?

Yes, that's reasonable given this is a critical path. How about the
additional change below?

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 16b3a2da91ef..69ea4d0374b9 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -253,12 +253,6 @@ enum iommu_cap {
 	 */
 	IOMMU_CAP_DEFERRED_FLUSH,
 	IOMMU_CAP_DIRTY_TRACKING,	/* IOMMU supports dirty tracking */
-	/*
-	 * IOMMU driver supports user-managed IOASID table. There is no
-	 * user domain for each PASID and the I/O page faults are forwarded
-	 * through the user domain attached to the device RID.
-	 */
-	IOMMU_CAP_USER_IOASID_TABLE,
 };
 
 /* These are the possible reserved region types */
@@ -557,6 +551,10 @@ static inline int __iommu_copy_struct_from_user_array(
  * @default_domain: If not NULL this will always be set as the default domain.
  *                  This should be an IDENTITY/BLOCKED/PLATFORM domain.
  *                  Do not use in new drivers.
+ * @user_pasid_table: IOMMU driver supports user-managed PASID table. There is
+ *                    no user domain for each PASID and the I/O page faults are
+ *                    forwarded through the user domain attached to the device
+ *                    RID.
  */
 struct iommu_ops {
 	bool (*capable)(struct device *dev, enum iommu_cap);
@@ -600,6 +598,7 @@ struct iommu_ops {
 	struct iommu_domain *blocked_domain;
 	struct iommu_domain *release_domain;
 	struct iommu_domain *default_domain;
+	bool user_pasid_table;
 };
 
 /**
diff --git a/drivers/iommu/io-pgfault.c b/drivers/iommu/io-pgfault.c
index a629d8a93614..cd679c13752e 100644
--- a/drivers/iommu/io-pgfault.c
+++ b/drivers/iommu/io-pgfault.c
@@ -189,7 +189,9 @@ void iommu_report_device_fault(struct device *dev, struct iopf_fault *evt)
 							       fault->prm.pasid,
 							       0);
 		if (IS_ERR(group->attach_handle)) {
-			if (!device_iommu_capable(dev, IOMMU_CAP_USER_IOASID_TABLE))
+			const struct iommu_ops *ops = dev_iommu_ops(dev);
+
+			if (!ops->user_pasid_table)
 				goto err_abort;

Best regards,
baolu

2024-06-13 11:52:26

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH v6 03/10] iommu: Add attach handle to struct iopf_group

On Thu, Jun 13, 2024 at 12:23:17PM +0800, Baolu Lu wrote:

> struct iommu_ops {
> bool (*capable)(struct device *dev, enum iommu_cap);
> @@ -600,6 +598,7 @@ struct iommu_ops {
> struct iommu_domain *blocked_domain;
> struct iommu_domain *release_domain;
> struct iommu_domain *default_domain;
> + bool user_pasid_table;
> };

Yeah, maybe spell it

u8 user_pasid_table : 1;

Jason

2024-06-14 01:19:30

by Lu Baolu

[permalink] [raw]
Subject: Re: [PATCH v6 03/10] iommu: Add attach handle to struct iopf_group

On 6/13/24 7:49 PM, Jason Gunthorpe wrote:
> On Thu, Jun 13, 2024 at 12:23:17PM +0800, Baolu Lu wrote:
>
>> struct iommu_ops {
>> bool (*capable)(struct device *dev, enum iommu_cap);
>> @@ -600,6 +598,7 @@ struct iommu_ops {
>> struct iommu_domain *blocked_domain;
>> struct iommu_domain *release_domain;
>> struct iommu_domain *default_domain;
>> + bool user_pasid_table;
>> };
> Yeah, maybe spell it
>
> u8 user_pasid_table : 1;

Done.

Best regards,
baolu