2024-04-03 01:16:33

by Baolu Lu

Subject: [PATCH v4 0/9] IOMMUFD: Deliver IO page faults to user space

This series implements the functionality of delivering IO page faults to
user space through the IOMMUFD framework. One feasible use case is nested
translation, a hardware feature that supports two-stage translation
tables for an IOMMU. The second-stage translation table is managed by the
host VMM, while the first-stage translation table is owned by user space.
This allows user space to control the IOMMU mappings for its devices.

When an IO page fault occurs on the first-stage translation table, the
IOMMU hardware can deliver the fault to user space through the IOMMUFD
framework. User space can then handle the fault and send its response
back to the device through IOMMUFD. This allows user space to implement
its own IO page fault handling policies.

A user space application capable of handling IO page faults should
allocate a fault object and bind it to each domain whose generated faults
it is willing to handle. On successful allocation of a fault object, the
application can retrieve and respond to page faults by reading from and
writing to the file descriptor (FD) returned.
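
Below is a minimal user-space sketch of this flow, based on the uAPI
definitions added later in this series (IOMMU_FAULT_ALLOC, struct
iommu_hwpt_pgfault, struct iommu_hwpt_page_response). Here iommufd is an
already-opened /dev/iommu file descriptor; error handling is elided and
details may differ in a real consumer:

	/* Allocate the fault object; the kernel returns an ID and an FD. */
	struct iommu_fault_alloc cmd = { .size = sizeof(cmd) };

	ioctl(iommufd, IOMMU_FAULT_ALLOC, &cmd);

	/*
	 * cmd.out_fault_id is later passed as the fault ID when allocating
	 * an iopf-capable hw_pagetable. cmd.out_fault_fd is the file to
	 * read fault messages from and write responses to.
	 */
	struct iommu_hwpt_pgfault fault = {};
	struct iommu_hwpt_page_response resp = { .size = sizeof(resp) };

	read(cmd.out_fault_fd, &fault, sizeof(fault));

	resp.dev_id = fault.dev_id;
	resp.pasid = fault.pasid;
	resp.grpid = fault.grpid;
	resp.code = IOMMUFD_PAGE_RESP_SUCCESS;
	resp.cookie = fault.cookie;	/* echo the kernel-managed cookie */
	write(cmd.out_fault_fd, &resp, sizeof(resp));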

The iommu selftest framework has been updated to test the IO page fault
delivery and response functionality.

The series and related patches are available on GitHub:
https://github.com/LuBaolu/intel-iommu/commits/iommufd-io-pgfault-delivery-v4

Change log:

v4:
- Add the iommu domain attachment handle to replace the iopf-specific
domain attachment interfaces introduced in v3.
- Replace iommu_sva with the iommu domain attachment handle.
- Refine some fields in the fault and response message encoding
according to feedback collected during the v3 review period.
- Refine and fix some problems in the fault FD implementation.
- Miscellaneous cleanup.

v3: https://lore.kernel.org/linux-iommu/[email protected]/
- Add iopf domain attach/detach/replace interfaces to manage the
reference counters of hwpt and device, ensuring that both can only be
destroyed after all outstanding IOPFs have been responded to.
- Relocate the fault handling file descriptor from hwpt to a fault
object to enable a single fault handling object to be utilized
across multiple domains.
- Miscellaneous cleanup and performance improvements.

v2: https://lore.kernel.org/linux-iommu/[email protected]/
- Move all iommu refactoring patches into a separate series and discuss
them in a different thread. The latest patch series [v6] is available at
https://lore.kernel.org/linux-iommu/[email protected]/
- We discussed the timeout of pending page fault messages and agreed
that we shouldn't apply any timeout policy to page fault handling in
user space.
https://lore.kernel.org/linux-iommu/20230616113232.GA84678@myrica/
- Jason suggested that we adopt a simple file descriptor interface for
reading and responding to I/O page requests, so that user space
applications can improve performance using io_uring.
https://lore.kernel.org/linux-iommu/[email protected]/

v1: https://lore.kernel.org/linux-iommu/[email protected]/

Lu Baolu (9):
iommu: Introduce domain attachment handle
iommu: Replace iommu_sva with iommu_attach_handle
iommu: Add attachment handle to struct iopf_group
iommufd: Fault-capable hw page table attach/detach/replace
iommufd: Add fault and response message definitions
iommufd: Add iommufd fault object
iommufd: Associate fault object with iommufd_hw_pgtable
iommufd/selftest: Add IOPF support for mock device
iommufd/selftest: Add coverage for IOPF test

include/linux/iommu.h | 33 +-
include/linux/uacce.h | 2 +-
drivers/dma/idxd/idxd.h | 2 +-
drivers/iommu/iommu-priv.h | 9 +
drivers/iommu/iommufd/iommufd_private.h | 43 ++
drivers/iommu/iommufd/iommufd_test.h | 8 +
include/uapi/linux/iommufd.h | 122 ++++++
tools/testing/selftests/iommu/iommufd_utils.h | 84 +++-
drivers/dma/idxd/cdev.c | 4 +-
drivers/iommu/io-pgfault.c | 37 +-
drivers/iommu/iommu-sva.c | 64 ++-
drivers/iommu/iommu.c | 158 +++++++-
drivers/iommu/iommufd/device.c | 16 +-
drivers/iommu/iommufd/fault.c | 372 ++++++++++++++++++
drivers/iommu/iommufd/hw_pagetable.c | 36 +-
drivers/iommu/iommufd/main.c | 6 +
drivers/iommu/iommufd/selftest.c | 63 +++
drivers/misc/uacce/uacce.c | 2 +-
tools/testing/selftests/iommu/iommufd.c | 18 +
.../selftests/iommu/iommufd_fail_nth.c | 2 +-
drivers/iommu/iommufd/Makefile | 1 +
21 files changed, 968 insertions(+), 114 deletions(-)
create mode 100644 drivers/iommu/iommufd/fault.c

--
2.34.1



2024-04-03 01:17:04

by Baolu Lu

Subject: [PATCH v4 2/9] iommu: Replace iommu_sva with iommu_attach_handle

The struct iommu_sva represents a bond between an SVA domain and a device.
It is functionally equivalent to the iommu_attach_handle. To avoid code
duplication, replace iommu_sva with the iommu_attach_handle and remove
the code that manages iommu_sva.

Signed-off-by: Lu Baolu <[email protected]>
---
include/linux/iommu.h | 28 +++++------------
include/linux/uacce.h | 2 +-
drivers/dma/idxd/idxd.h | 2 +-
drivers/dma/idxd/cdev.c | 4 +--
drivers/iommu/iommu-sva.c | 61 ++++++++++++++++----------------------
drivers/misc/uacce/uacce.c | 2 +-
6 files changed, 38 insertions(+), 61 deletions(-)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 2e925b5eba53..be9c9a10169d 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -39,7 +39,6 @@ struct iommu_domain;
struct iommu_domain_ops;
struct iommu_dirty_ops;
struct notifier_block;
-struct iommu_sva;
struct iommu_dma_cookie;
struct iommu_fault_param;

@@ -986,20 +985,9 @@ struct iommu_fwspec {
/* ATS is supported */
#define IOMMU_FWSPEC_PCI_RC_ATS (1 << 0)

-/**
- * struct iommu_sva - handle to a device-mm bond
- */
-struct iommu_sva {
- struct device *dev;
- struct iommu_domain *domain;
- struct list_head handle_item;
- refcount_t users;
-};
-
struct iommu_mm_data {
u32 pasid;
struct list_head sva_domains;
- struct list_head sva_handles;
};

int iommu_fwspec_init(struct device *dev, struct fwnode_handle *iommu_fwnode,
@@ -1527,24 +1515,24 @@ static inline u32 mm_get_enqcmd_pasid(struct mm_struct *mm)
}

void mm_pasid_drop(struct mm_struct *mm);
-struct iommu_sva *iommu_sva_bind_device(struct device *dev,
- struct mm_struct *mm);
-void iommu_sva_unbind_device(struct iommu_sva *handle);
-u32 iommu_sva_get_pasid(struct iommu_sva *handle);
+struct iommu_attach_handle *iommu_sva_bind_device(struct device *dev,
+ struct mm_struct *mm);
+void iommu_sva_unbind_device(struct iommu_attach_handle *handle);
+u32 iommu_sva_get_pasid(struct iommu_attach_handle *handle);
struct iommu_domain *iommu_sva_domain_alloc(struct device *dev,
struct mm_struct *mm);
#else
-static inline struct iommu_sva *
+static inline struct iommu_attach_handle *
iommu_sva_bind_device(struct device *dev, struct mm_struct *mm)
{
- return NULL;
+ return ERR_PTR(-ENODEV);
}

-static inline void iommu_sva_unbind_device(struct iommu_sva *handle)
+static inline void iommu_sva_unbind_device(struct iommu_attach_handle *handle)
{
}

-static inline u32 iommu_sva_get_pasid(struct iommu_sva *handle)
+static inline u32 iommu_sva_get_pasid(struct iommu_attach_handle *handle)
{
return IOMMU_PASID_INVALID;
}
diff --git a/include/linux/uacce.h b/include/linux/uacce.h
index e290c0269944..1548119c89ae 100644
--- a/include/linux/uacce.h
+++ b/include/linux/uacce.h
@@ -97,7 +97,7 @@ struct uacce_queue {
struct mutex mutex;
enum uacce_q_state state;
u32 pasid;
- struct iommu_sva *handle;
+ struct iommu_attach_handle *handle;
struct address_space *mapping;
};

diff --git a/drivers/dma/idxd/idxd.h b/drivers/dma/idxd/idxd.h
index a4099a1e2340..3ee89e9cb049 100644
--- a/drivers/dma/idxd/idxd.h
+++ b/drivers/dma/idxd/idxd.h
@@ -335,7 +335,7 @@ struct idxd_device {
struct idxd_wq **wqs;
struct idxd_engine **engines;

- struct iommu_sva *sva;
+ struct iommu_attach_handle *sva;
unsigned int pasid;

int num_groups;
diff --git a/drivers/dma/idxd/cdev.c b/drivers/dma/idxd/cdev.c
index 8078ab9acfbc..a029bda92615 100644
--- a/drivers/dma/idxd/cdev.c
+++ b/drivers/dma/idxd/cdev.c
@@ -45,7 +45,7 @@ struct idxd_user_context {
unsigned int pasid;
struct mm_struct *mm;
unsigned int flags;
- struct iommu_sva *sva;
+ struct iommu_attach_handle *sva;
struct idxd_dev idxd_dev;
u64 counters[COUNTER_MAX];
int id;
@@ -225,7 +225,7 @@ static int idxd_cdev_open(struct inode *inode, struct file *filp)
struct idxd_wq *wq;
struct device *dev, *fdev;
int rc = 0;
- struct iommu_sva *sva;
+ struct iommu_attach_handle *sva;
unsigned int pasid;
struct idxd_cdev *idxd_cdev;

diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
index 640acc804e8c..35ac2e4836e9 100644
--- a/drivers/iommu/iommu-sva.c
+++ b/drivers/iommu/iommu-sva.c
@@ -41,7 +41,6 @@ static struct iommu_mm_data *iommu_alloc_mm_data(struct mm_struct *mm, struct de
}
iommu_mm->pasid = pasid;
INIT_LIST_HEAD(&iommu_mm->sva_domains);
- INIT_LIST_HEAD(&iommu_mm->sva_handles);
/*
* Make sure the write to mm->iommu_mm is not reordered in front of
* initialization to iommu_mm fields. If it does, readers may see a
@@ -67,13 +66,17 @@ static struct iommu_mm_data *iommu_alloc_mm_data(struct mm_struct *mm, struct de
*
* On error, returns an ERR_PTR value.
*/
-struct iommu_sva *iommu_sva_bind_device(struct device *dev, struct mm_struct *mm)
+struct iommu_attach_handle *iommu_sva_bind_device(struct device *dev, struct mm_struct *mm)
{
+ struct iommu_group *group = dev->iommu_group;
+ struct iommu_attach_handle *handle;
struct iommu_mm_data *iommu_mm;
struct iommu_domain *domain;
- struct iommu_sva *handle;
int ret;

+ if (!group)
+ return ERR_PTR(-ENODEV);
+
mutex_lock(&iommu_sva_lock);

/* Allocate mm->pasid if necessary. */
@@ -83,18 +86,11 @@ struct iommu_sva *iommu_sva_bind_device(struct device *dev, struct mm_struct *mm
goto out_unlock;
}

- list_for_each_entry(handle, &mm->iommu_mm->sva_handles, handle_item) {
- if (handle->dev == dev) {
- refcount_inc(&handle->users);
- mutex_unlock(&iommu_sva_lock);
- return handle;
- }
- }
-
- handle = kzalloc(sizeof(*handle), GFP_KERNEL);
- if (!handle) {
- ret = -ENOMEM;
- goto out_unlock;
+ /* A bond already exists, just take a reference. */
+ handle = iommu_attach_handle_get(group, iommu_mm->pasid);
+ if (handle) {
+ mutex_unlock(&iommu_sva_lock);
+ return handle;
}

/* Search for an existing domain. */
@@ -110,7 +106,7 @@ struct iommu_sva *iommu_sva_bind_device(struct device *dev, struct mm_struct *mm
domain = iommu_sva_domain_alloc(dev, mm);
if (!domain) {
ret = -ENOMEM;
- goto out_free_handle;
+ goto out_unlock;
}

ret = iommu_attach_device_pasid(domain, dev, iommu_mm->pasid);
@@ -120,17 +116,14 @@ struct iommu_sva *iommu_sva_bind_device(struct device *dev, struct mm_struct *mm
list_add(&domain->next, &mm->iommu_mm->sva_domains);

out:
- refcount_set(&handle->users, 1);
- list_add(&handle->handle_item, &mm->iommu_mm->sva_handles);
+ handle = iommu_attach_handle_get(group, iommu_mm->pasid);
mutex_unlock(&iommu_sva_lock);
- handle->dev = dev;
- handle->domain = domain;
+ handle->priv = dev;
+
return handle;

out_free_domain:
iommu_domain_free(domain);
-out_free_handle:
- kfree(handle);
out_unlock:
mutex_unlock(&iommu_sva_lock);
return ERR_PTR(ret);
@@ -145,30 +138,26 @@ EXPORT_SYMBOL_GPL(iommu_sva_bind_device);
* not be issuing any more transaction for this PASID. All outstanding page
* requests for this PASID must have been flushed to the IOMMU.
*/
-void iommu_sva_unbind_device(struct iommu_sva *handle)
+void iommu_sva_unbind_device(struct iommu_attach_handle *handle)
{
struct iommu_domain *domain = handle->domain;
struct iommu_mm_data *iommu_mm = domain->mm->iommu_mm;
- struct device *dev = handle->dev;
+ struct device *dev = handle->priv;

mutex_lock(&iommu_sva_lock);
- if (!refcount_dec_and_test(&handle->users)) {
- mutex_unlock(&iommu_sva_lock);
- return;
- }
- list_del(&handle->handle_item);
-
- iommu_detach_device_pasid(domain, dev, iommu_mm->pasid);
- if (--domain->users == 0) {
- list_del(&domain->next);
- iommu_domain_free(domain);
+ iommu_attach_handle_put(handle);
+ if (refcount_read(&handle->users) == 1) {
+ iommu_detach_device_pasid(domain, dev, iommu_mm->pasid);
+ if (--domain->users == 0) {
+ list_del(&domain->next);
+ iommu_domain_free(domain);
+ }
}
mutex_unlock(&iommu_sva_lock);
- kfree(handle);
}
EXPORT_SYMBOL_GPL(iommu_sva_unbind_device);

-u32 iommu_sva_get_pasid(struct iommu_sva *handle)
+u32 iommu_sva_get_pasid(struct iommu_attach_handle *handle)
{
struct iommu_domain *domain = handle->domain;

diff --git a/drivers/misc/uacce/uacce.c b/drivers/misc/uacce/uacce.c
index bdc2e6fda782..b325097421c1 100644
--- a/drivers/misc/uacce/uacce.c
+++ b/drivers/misc/uacce/uacce.c
@@ -106,7 +106,7 @@ static long uacce_fops_compat_ioctl(struct file *filep,
static int uacce_bind_queue(struct uacce_device *uacce, struct uacce_queue *q)
{
u32 pasid;
- struct iommu_sva *handle;
+ struct iommu_attach_handle *handle;

if (!(uacce->flags & UACCE_DEV_SVA))
return 0;
--
2.34.1


2024-04-03 01:17:17

by Baolu Lu

Subject: [PATCH v4 3/9] iommu: Add attachment handle to struct iopf_group

Previously, the domain targeted by a page fault was stored in the
iopf_group, which represents a minimal set of page faults. With the
introduction of the attachment handle, replace the domain with the handle
so that the fault handler can obtain additional information as needed
when handling the faults.

iommu_report_device_fault() is currently used for SVA page faults, which
are handled entirely inside the kernel. The domain is retrieved with
iommu_get_domain_for_dev_pasid() if the pasid in the fault message is
valid. This doesn't work in the IOMMUFD case, where the pasid table of a
device may be wholly managed by user space, leaving no domain attached to
the PASID of the device. Improve this code to first try fetching the
attachment handle on the PASID and, failing that, to try the RID.

Signed-off-by: Lu Baolu <[email protected]>
---
include/linux/iommu.h | 2 +-
drivers/iommu/io-pgfault.c | 37 +++++++++++++++++--------------------
drivers/iommu/iommu-sva.c | 3 ++-
3 files changed, 20 insertions(+), 22 deletions(-)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index be9c9a10169d..652a0bdd5074 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -127,7 +127,7 @@ struct iopf_group {
/* list node for iommu_fault_param::faults */
struct list_head pending_node;
struct work_struct work;
- struct iommu_domain *domain;
+ struct iommu_attach_handle *attach_handle;
/* The device's fault data parameter. */
struct iommu_fault_param *fault_param;
};
diff --git a/drivers/iommu/io-pgfault.c b/drivers/iommu/io-pgfault.c
index 06d78fcc79fd..32004976a6b7 100644
--- a/drivers/iommu/io-pgfault.c
+++ b/drivers/iommu/io-pgfault.c
@@ -50,6 +50,9 @@ static void __iopf_free_group(struct iopf_group *group)

/* Pair with iommu_report_device_fault(). */
iopf_put_dev_fault_param(group->fault_param);
+
+ /* Pair with get_attach_handle_for_iopf(). */
+ iommu_attach_handle_put(group->attach_handle);
}

void iopf_free_group(struct iopf_group *group)
@@ -59,28 +62,22 @@ void iopf_free_group(struct iopf_group *group)
}
EXPORT_SYMBOL_GPL(iopf_free_group);

-static struct iommu_domain *get_domain_for_iopf(struct device *dev,
- struct iommu_fault *fault)
+static struct iommu_attach_handle *
+get_attach_handle_for_iopf(struct device *dev, struct iommu_fault *fault)
{
- struct iommu_domain *domain;
+ struct iommu_group *group = dev->iommu_group;
+ struct iommu_attach_handle *handle;
+
+ if (!group)
+ return ERR_PTR(-ENODEV);

if (fault->prm.flags & IOMMU_FAULT_PAGE_REQUEST_PASID_VALID) {
- domain = iommu_get_domain_for_dev_pasid(dev, fault->prm.pasid, 0);
- if (IS_ERR(domain))
- domain = NULL;
- } else {
- domain = iommu_get_domain_for_dev(dev);
+ handle = iommu_attach_handle_get(group, fault->prm.pasid);
+ if (handle)
+ return handle;
}

- if (!domain || !domain->iopf_handler) {
- dev_warn_ratelimited(dev,
- "iopf (pasid %d) without domain attached or handler installed\n",
- fault->prm.pasid);
-
- return NULL;
- }
-
- return domain;
+ return iommu_attach_handle_get(group, IOMMU_NO_PASID);
}

/* Non-last request of a group. Postpone until the last one. */
@@ -206,15 +203,15 @@ void iommu_report_device_fault(struct device *dev, struct iopf_fault *evt)
if (group == &abort_group)
goto err_abort;

- group->domain = get_domain_for_iopf(dev, fault);
- if (!group->domain)
+ group->attach_handle = get_attach_handle_for_iopf(dev, fault);
+ if (!group->attach_handle || !group->attach_handle->domain->iopf_handler)
goto err_abort;

/*
* On success iopf_handler must call iopf_group_response() and
* iopf_free_group()
*/
- if (group->domain->iopf_handler(group))
+ if (group->attach_handle->domain->iopf_handler(group))
goto err_abort;

return;
diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
index 35ac2e4836e9..c66cf26137bf 100644
--- a/drivers/iommu/iommu-sva.c
+++ b/drivers/iommu/iommu-sva.c
@@ -248,7 +248,8 @@ static void iommu_sva_handle_iopf(struct work_struct *work)
if (status != IOMMU_PAGE_RESP_SUCCESS)
break;

- status = iommu_sva_handle_mm(&iopf->fault, group->domain->mm);
+ status = iommu_sva_handle_mm(&iopf->fault,
+ group->attach_handle->domain->mm);
}

iopf_group_response(group, status);
--
2.34.1


2024-04-03 01:17:32

by Baolu Lu

Subject: [PATCH v4 4/9] iommufd: Fault-capable hw page table attach/detach/replace

Add iopf-capable hw page table attach/detach/replace helpers. The pointer
to iommufd_device is stored in the domain attachment handle, so that it
can be echoed back in the iopf_group.

The iopf-capable hw page tables can only be attached to devices that
support the IOMMU_DEV_FEAT_IOPF feature. On the first attachment of an
iopf-capable hw_pagetable to the device, the IOPF feature is enabled on
the device. Similarly, after the last iopf-capable hwpt is detached from
the device, the IOPF feature is disabled on the device.

The current implementation allows a replacement between iopf-capable and
non-iopf-capable hw page tables. This matches the nested translation use
case, where a parent domain is attached by default and can then be
replaced with a nested user domain with iopf support.

Signed-off-by: Lu Baolu <[email protected]>
---
drivers/iommu/iommufd/iommufd_private.h | 9 ++
drivers/iommu/iommufd/device.c | 15 +++-
drivers/iommu/iommufd/fault.c | 113 ++++++++++++++++++++++++
drivers/iommu/iommufd/Makefile | 1 +
4 files changed, 135 insertions(+), 3 deletions(-)
create mode 100644 drivers/iommu/iommufd/fault.c

diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 991f864d1f9b..047cfb47112a 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -292,6 +292,7 @@ int iommufd_check_iova_range(struct io_pagetable *iopt,
struct iommufd_hw_pagetable {
struct iommufd_object obj;
struct iommu_domain *domain;
+ bool fault_capable;
};

struct iommufd_hwpt_paging {
@@ -395,6 +396,7 @@ struct iommufd_device {
/* always the physical device */
struct device *dev;
bool enforce_cache_coherency;
+ bool iopf_enabled;
};

static inline struct iommufd_device *
@@ -426,6 +428,13 @@ void iopt_remove_access(struct io_pagetable *iopt,
u32 iopt_access_list_id);
void iommufd_access_destroy_object(struct iommufd_object *obj);

+int iommufd_fault_domain_attach_dev(struct iommufd_hw_pagetable *hwpt,
+ struct iommufd_device *idev);
+void iommufd_fault_domain_detach_dev(struct iommufd_hw_pagetable *hwpt,
+ struct iommufd_device *idev);
+int iommufd_fault_domain_replace_dev(struct iommufd_hw_pagetable *hwpt,
+ struct iommufd_device *idev);
+
#ifdef CONFIG_IOMMUFD_TEST
int iommufd_test(struct iommufd_ucmd *ucmd);
void iommufd_selftest_destroy(struct iommufd_object *obj);
diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index 873630c111c1..4fc183a83925 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -376,7 +376,10 @@ int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt,
* attachment.
*/
if (list_empty(&idev->igroup->device_list)) {
- rc = iommu_attach_group(hwpt->domain, idev->igroup->group);
+ if (hwpt->fault_capable)
+ rc = iommufd_fault_domain_attach_dev(hwpt, idev);
+ else
+ rc = iommu_attach_group(hwpt->domain, idev->igroup->group);
if (rc)
goto err_unresv;
idev->igroup->hwpt = hwpt;
@@ -402,7 +405,10 @@ iommufd_hw_pagetable_detach(struct iommufd_device *idev)
mutex_lock(&idev->igroup->lock);
list_del(&idev->group_item);
if (list_empty(&idev->igroup->device_list)) {
- iommu_detach_group(hwpt->domain, idev->igroup->group);
+ if (hwpt->fault_capable)
+ iommufd_fault_domain_detach_dev(hwpt, idev);
+ else
+ iommu_detach_group(hwpt->domain, idev->igroup->group);
idev->igroup->hwpt = NULL;
}
if (hwpt_is_paging(hwpt))
@@ -497,7 +503,10 @@ iommufd_device_do_replace(struct iommufd_device *idev,
goto err_unlock;
}

- rc = iommu_group_replace_domain(igroup->group, hwpt->domain);
+ if (old_hwpt->fault_capable || hwpt->fault_capable)
+ rc = iommufd_fault_domain_replace_dev(hwpt, idev);
+ else
+ rc = iommu_group_replace_domain(igroup->group, hwpt->domain);
if (rc)
goto err_unresv;

diff --git a/drivers/iommu/iommufd/fault.c b/drivers/iommu/iommufd/fault.c
new file mode 100644
index 000000000000..47d7c106d839
--- /dev/null
+++ b/drivers/iommu/iommufd/fault.c
@@ -0,0 +1,113 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright (C) 2024 Intel Corporation
+ */
+#define pr_fmt(fmt) "iommufd: " fmt
+
+#include <linux/file.h>
+#include <linux/fs.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/iommufd.h>
+#include <linux/poll.h>
+#include <linux/anon_inodes.h>
+#include <uapi/linux/iommufd.h>
+
+#include "../iommu-priv.h"
+#include "iommufd_private.h"
+
+static int iommufd_fault_iopf_enable(struct iommufd_device *idev)
+{
+ int ret;
+
+ if (idev->iopf_enabled)
+ return 0;
+
+ ret = iommu_dev_enable_feature(idev->dev, IOMMU_DEV_FEAT_IOPF);
+ if (ret)
+ return ret;
+
+ idev->iopf_enabled = true;
+
+ return 0;
+}
+
+static void iommufd_fault_iopf_disable(struct iommufd_device *idev)
+{
+ if (!idev->iopf_enabled)
+ return;
+
+ iommu_dev_disable_feature(idev->dev, IOMMU_DEV_FEAT_IOPF);
+ idev->iopf_enabled = false;
+}
+
+int iommufd_fault_domain_attach_dev(struct iommufd_hw_pagetable *hwpt,
+ struct iommufd_device *idev)
+{
+ struct iommu_attach_handle *handle;
+ int ret;
+
+ if (!hwpt->fault_capable)
+ return -EINVAL;
+
+ if (!idev->iopf_enabled) {
+ ret = iommufd_fault_iopf_enable(idev);
+ if (ret)
+ return ret;
+ }
+
+ ret = iommu_attach_group(hwpt->domain, idev->igroup->group);
+ if (ret) {
+ iommufd_fault_iopf_disable(idev);
+ return ret;
+ }
+
+ handle = iommu_attach_handle_get(idev->igroup->group, IOMMU_NO_PASID);
+ handle->priv = idev;
+ iommu_attach_handle_put(handle);
+
+ return 0;
+}
+
+void iommufd_fault_domain_detach_dev(struct iommufd_hw_pagetable *hwpt,
+ struct iommufd_device *idev)
+{
+ if (WARN_ON(!hwpt->fault_capable))
+ return;
+
+ iommu_detach_group(hwpt->domain, idev->igroup->group);
+ iommufd_fault_iopf_disable(idev);
+}
+
+int iommufd_fault_domain_replace_dev(struct iommufd_hw_pagetable *hwpt,
+ struct iommufd_device *idev)
+{
+ bool iopf_enabled_originally = idev->iopf_enabled;
+ struct iommu_attach_handle *handle;
+ int ret;
+
+ if (hwpt->fault_capable) {
+ ret = iommufd_fault_iopf_enable(idev);
+ if (ret)
+ return ret;
+ }
+
+ ret = iommu_group_replace_domain(idev->igroup->group, hwpt->domain);
+ if (ret)
+ goto out_cleanup;
+
+ if (!hwpt->fault_capable)
+ iommufd_fault_iopf_disable(idev);
+
+ handle = iommu_attach_handle_get(idev->igroup->group, IOMMU_NO_PASID);
+ handle->priv = idev;
+ iommu_attach_handle_put(handle);
+
+ return 0;
+out_cleanup:
+ if (iopf_enabled_originally)
+ iommufd_fault_iopf_enable(idev);
+ else
+ iommufd_fault_iopf_disable(idev);
+
+ return ret;
+}
diff --git a/drivers/iommu/iommufd/Makefile b/drivers/iommu/iommufd/Makefile
index 34b446146961..b94a74366eed 100644
--- a/drivers/iommu/iommufd/Makefile
+++ b/drivers/iommu/iommufd/Makefile
@@ -6,6 +6,7 @@ iommufd-y := \
ioas.o \
main.o \
pages.o \
+ fault.o \
vfio_compat.o

iommufd-$(CONFIG_IOMMUFD_TEST) += selftest.o
--
2.34.1


2024-04-03 01:17:48

by Baolu Lu

Subject: [PATCH v4 5/9] iommufd: Add fault and response message definitions

struct iommu_hwpt_pgfault represents a fault message that user space can
retrieve. Multiple iommu_hwpt_pgfaults might be put in an iopf group,
with the IOMMU_PGFAULT_FLAGS_LAST_PAGE flag set only for the last
iommu_hwpt_pgfault.

struct iommu_hwpt_page_response is a response message that user space
should send to the kernel after it finishes handling a group of fault
messages. The @dev_id, @pasid, and @grpid fields in the message identify
an outstanding iopf group for a device. The @cookie field, which matches
the cookie field of the last fault in the group, is used by the kernel to
look up the pending message.
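
For example, a sketch of draining one fault group and acknowledging it,
assuming the definitions in this patch, a buffer large enough for the
whole group, and no error handling:

	struct iommu_hwpt_pgfault msgs[32];
	ssize_t got = read(fault_fd, msgs, sizeof(msgs));
	size_t i, n = (size_t)got / sizeof(msgs[0]);

	for (i = 0; i < n; i++) {
		if (!(msgs[i].flags & IOMMU_PGFAULT_FLAGS_LAST_PAGE))
			continue;

		/* Respond once per group, echoing the last fault's cookie. */
		struct iommu_hwpt_page_response resp = {
			.size = sizeof(resp),
			.dev_id = msgs[i].dev_id,
			.pasid = msgs[i].pasid,
			.grpid = msgs[i].grpid,
			.code = IOMMUFD_PAGE_RESP_SUCCESS,
			.cookie = msgs[i].cookie,
		};

		write(fault_fd, &resp, sizeof(resp));
	}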

Signed-off-by: Lu Baolu <[email protected]>
---
include/uapi/linux/iommufd.h | 96 ++++++++++++++++++++++++++++++++++++
1 file changed, 96 insertions(+)

diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index 1dfeaa2e649e..83b45dce94a4 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -692,4 +692,100 @@ struct iommu_hwpt_invalidate {
__u32 __reserved;
};
#define IOMMU_HWPT_INVALIDATE _IO(IOMMUFD_TYPE, IOMMUFD_CMD_HWPT_INVALIDATE)
+
+/**
+ * enum iommu_hwpt_pgfault_flags - flags for struct iommu_hwpt_pgfault
+ * @IOMMU_PGFAULT_FLAGS_PASID_VALID: The pasid field of the fault data is
+ * valid.
+ * @IOMMU_PGFAULT_FLAGS_LAST_PAGE: It's the last fault of a fault group.
+ */
+enum iommu_hwpt_pgfault_flags {
+ IOMMU_PGFAULT_FLAGS_PASID_VALID = (1 << 0),
+ IOMMU_PGFAULT_FLAGS_LAST_PAGE = (1 << 1),
+};
+
+/**
+ * enum iommu_hwpt_pgfault_perm - perm bits for struct iommu_hwpt_pgfault
+ * @IOMMU_PGFAULT_PERM_READ: request for read permission
+ * @IOMMU_PGFAULT_PERM_WRITE: request for write permission
+ * @IOMMU_PGFAULT_PERM_EXEC: (PCIE 10.4.1) request with a PASID that has the
+ * Execute Requested bit set in PASID TLP Prefix.
+ * @IOMMU_PGFAULT_PERM_PRIV: (PCIE 10.4.1) request with a PASID that has the
+ * Privileged Mode Requested bit set in PASID TLP
+ * Prefix.
+ */
+enum iommu_hwpt_pgfault_perm {
+ IOMMU_PGFAULT_PERM_READ = (1 << 0),
+ IOMMU_PGFAULT_PERM_WRITE = (1 << 1),
+ IOMMU_PGFAULT_PERM_EXEC = (1 << 2),
+ IOMMU_PGFAULT_PERM_PRIV = (1 << 3),
+};
+
+/**
+ * struct iommu_hwpt_pgfault - iommu page fault data
+ * @size: sizeof(struct iommu_hwpt_pgfault)
+ * @flags: Combination of enum iommu_hwpt_pgfault_flags
+ * @dev_id: id of the originated device
+ * @pasid: Process Address Space ID
+ * @grpid: Page Request Group Index
+ * @perm: Combination of enum iommu_hwpt_pgfault_perm
+ * @addr: Page address
+ * @length: a hint of how much data the requestor is expecting to fetch. For
+ * example, if the PRI initiator knows it is going to do a 10MB
+ * transfer, it could fill in 10MB and the OS could pre-fault in
+ * 10MB of IOVA. It defaults to 0 if there is no such hint.
+ * @cookie: kernel-managed cookie identifying a group of fault messages. The
+ * cookie number encoded in the last page fault of the group should
+ * be echoed back in the response message.
+ */
+struct iommu_hwpt_pgfault {
+ __u32 size;
+ __u32 flags;
+ __u32 dev_id;
+ __u32 pasid;
+ __u32 grpid;
+ __u32 perm;
+ __u64 addr;
+ __u32 length;
+ __u32 cookie;
+};
+
+/**
+ * enum iommufd_page_response_code - Return status of fault handlers
+ * @IOMMUFD_PAGE_RESP_SUCCESS: Fault has been handled and the page tables
+ * populated, retry the access. This is the
+ * "Success" defined in PCI 10.4.2.1.
+ * @IOMMUFD_PAGE_RESP_INVALID: Could not handle this fault, don't retry the
+ * access. This is the "Invalid Request" in PCI
+ * 10.4.2.1.
+ * @IOMMUFD_PAGE_RESP_FAILURE: General error. Drop all subsequent faults
+ * from this device if possible. This is the
+ * "Response Failure" in PCI 10.4.2.1.
+ */
+enum iommufd_page_response_code {
+ IOMMUFD_PAGE_RESP_SUCCESS = 0,
+ IOMMUFD_PAGE_RESP_INVALID,
+ IOMMUFD_PAGE_RESP_FAILURE,
+};
+
+/**
+ * struct iommu_hwpt_page_response - IOMMU page fault response
+ * @size: sizeof(struct iommu_hwpt_page_response)
+ * @flags: Must be set to 0
+ * @dev_id: device ID of target device for the response
+ * @pasid: Process Address Space ID
+ * @grpid: Page Request Group Index
+ * @code: One of response code in enum iommufd_page_response_code.
+ * @cookie: The kernel-managed cookie reported in the fault message.
+ */
+struct iommu_hwpt_page_response {
+ __u32 size;
+ __u32 flags;
+ __u32 dev_id;
+ __u32 pasid;
+ __u32 grpid;
+ __u32 code;
+ __u32 cookie;
+ __u32 reserved;
+};
#endif
--
2.34.1


2024-04-03 01:18:04

by Baolu Lu

Subject: [PATCH v4 6/9] iommufd: Add iommufd fault object

An iommufd fault object provides an interface for delivering I/O page
faults to user space. These objects are created and destroyed by user
space, and they can be associated with or dissociated from hardware page
table objects during page table allocation or destruction.

User space interacts with the fault object through a file interface. This
interface offers a straightforward and efficient way for user space to
handle page faults. It allows user space to read fault messages
sequentially and respond to them by writing to the same file. The file
supports polling, and user space applications are encouraged to use
io_uring on it to improve read and write efficiency.

A fault object can be associated with any iopf-capable iommufd_hw_pgtable
during the pgtable's allocation. All I/O page faults triggered by devices
when accessing the I/O addresses of an iommufd_hw_pgtable are routed
through the fault object to user space. Similarly, user space's responses
to these page faults are routed back to the iommu device driver through
the same fault object.
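
A minimal poll-driven consumer might look like the sketch below, where
drain_fault_groups() is a hypothetical helper that reads the pending
fault messages and writes the responses as described above:

	struct pollfd pfd = { .fd = fault_fd, .events = POLLIN };

	for (;;) {
		if (poll(&pfd, 1, -1) <= 0)
			continue;
		if (pfd.revents & POLLIN)
			drain_fault_groups(fault_fd);
	}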

Signed-off-by: Lu Baolu <[email protected]>
---
include/linux/iommu.h | 3 +
drivers/iommu/iommufd/iommufd_private.h | 24 +++
include/uapi/linux/iommufd.h | 18 ++
drivers/iommu/iommufd/device.c | 1 +
drivers/iommu/iommufd/fault.c | 239 ++++++++++++++++++++++++
drivers/iommu/iommufd/main.c | 6 +
6 files changed, 291 insertions(+)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 652a0bdd5074..b479e0ad5dea 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -130,6 +130,9 @@ struct iopf_group {
struct iommu_attach_handle *attach_handle;
/* The device's fault data parameter. */
struct iommu_fault_param *fault_param;
+ /* Used by handler provider to hook the group on its own lists. */
+ struct list_head node;
+ u32 cookie;
};

/**
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 047cfb47112a..8dad1150eaf0 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -128,6 +128,7 @@ enum iommufd_object_type {
IOMMUFD_OBJ_HWPT_NESTED,
IOMMUFD_OBJ_IOAS,
IOMMUFD_OBJ_ACCESS,
+ IOMMUFD_OBJ_FAULT,
#ifdef CONFIG_IOMMUFD_TEST
IOMMUFD_OBJ_SELFTEST,
#endif
@@ -397,6 +398,8 @@ struct iommufd_device {
struct device *dev;
bool enforce_cache_coherency;
bool iopf_enabled;
+ /* outstanding faults awaiting response indexed by fault group id */
+ struct xarray faults;
};

static inline struct iommufd_device *
@@ -435,6 +438,27 @@ void iommufd_fault_domain_detach_dev(struct iommufd_hw_pagetable *hwpt,
int iommufd_fault_domain_replace_dev(struct iommufd_hw_pagetable *hwpt,
struct iommufd_device *idev);

+/*
+ * An iommufd_fault object represents an interface to deliver I/O page faults
+ * to user space. These objects are created/destroyed by user space and
+ * associated with hardware page table objects during page-table allocation.
+ */
+struct iommufd_fault {
+ struct iommufd_object obj;
+ struct iommufd_ctx *ictx;
+ struct file *filep;
+
+ /* The lists of outstanding faults protected by below mutex. */
+ struct mutex mutex;
+ struct list_head deliver;
+ struct list_head response;
+
+ struct wait_queue_head wait_queue;
+};
+
+int iommufd_fault_alloc(struct iommufd_ucmd *ucmd);
+void iommufd_fault_destroy(struct iommufd_object *obj);
+
#ifdef CONFIG_IOMMUFD_TEST
int iommufd_test(struct iommufd_ucmd *ucmd);
void iommufd_selftest_destroy(struct iommufd_object *obj);
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index 83b45dce94a4..1819b28e9e6b 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -50,6 +50,7 @@ enum {
IOMMUFD_CMD_HWPT_SET_DIRTY_TRACKING,
IOMMUFD_CMD_HWPT_GET_DIRTY_BITMAP,
IOMMUFD_CMD_HWPT_INVALIDATE,
+ IOMMUFD_CMD_FAULT_ALLOC,
};

/**
@@ -788,4 +789,21 @@ struct iommu_hwpt_page_response {
__u32 cookie;
__u32 reserved;
};
+
+/**
+ * struct iommu_fault_alloc - ioctl(IOMMU_FAULT_ALLOC)
+ * @size: sizeof(struct iommu_fault_alloc)
+ * @flags: Must be 0
+ * @out_fault_id: The ID of the new FAULT
+ * @out_fault_fd: The fd of the new FAULT
+ *
+ * Explicitly allocate a fault handling object.
+ */
+struct iommu_fault_alloc {
+ __u32 size;
+ __u32 flags;
+ __u32 out_fault_id;
+ __u32 out_fault_fd;
+};
+#define IOMMU_FAULT_ALLOC _IO(IOMMUFD_TYPE, IOMMUFD_CMD_FAULT_ALLOC)
#endif
diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index 4fc183a83925..3f7a11018ec2 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -215,6 +215,7 @@ struct iommufd_device *iommufd_device_bind(struct iommufd_ctx *ictx,
refcount_inc(&idev->obj.users);
/* igroup refcount moves into iommufd_device */
idev->igroup = igroup;
+ xa_init_flags(&idev->faults, XA_FLAGS_ALLOC1);

/*
* If the caller fails after this success it must call
diff --git a/drivers/iommu/iommufd/fault.c b/drivers/iommu/iommufd/fault.c
index 47d7c106d839..f16f15e7edd4 100644
--- a/drivers/iommu/iommufd/fault.c
+++ b/drivers/iommu/iommufd/fault.c
@@ -111,3 +111,242 @@ int iommufd_fault_domain_replace_dev(struct iommufd_hw_pagetable *hwpt,

return ret;
}
+
+static void device_remove_fault(struct iopf_group *group)
+{
+ struct iommufd_device *idev = group->attach_handle->priv;
+
+ xa_erase(&idev->faults, group->cookie);
+}
+
+void iommufd_fault_destroy(struct iommufd_object *obj)
+{
+ struct iommufd_fault *fault = container_of(obj, struct iommufd_fault, obj);
+ struct iopf_group *group, *next;
+
+ /*
+ * The iommufd object's reference count is zero at this point.
+ * We can be confident that no other threads are currently
+ * accessing this pointer. Therefore, acquiring the mutex here
+ * is unnecessary.
+ */
+ list_for_each_entry_safe(group, next, &fault->deliver, node) {
+ list_del(&group->node);
+ iopf_group_response(group, IOMMU_PAGE_RESP_INVALID);
+ iopf_free_group(group);
+ }
+ list_for_each_entry_safe(group, next, &fault->response, node) {
+ list_del(&group->node);
+ device_remove_fault(group);
+ iopf_group_response(group, IOMMU_PAGE_RESP_INVALID);
+ iopf_free_group(group);
+ }
+}
+
+static void iommufd_compose_fault_message(struct iommu_fault *fault,
+ struct iommu_hwpt_pgfault *hwpt_fault,
+ struct iommufd_device *idev,
+ u32 cookie)
+{
+ hwpt_fault->size = sizeof(*hwpt_fault);
+ hwpt_fault->flags = fault->prm.flags;
+ hwpt_fault->dev_id = idev->obj.id;
+ hwpt_fault->pasid = fault->prm.pasid;
+ hwpt_fault->grpid = fault->prm.grpid;
+ hwpt_fault->perm = fault->prm.perm;
+ hwpt_fault->addr = fault->prm.addr;
+ hwpt_fault->length = 0;
+ hwpt_fault->cookie = cookie;
+}
+
+static ssize_t iommufd_fault_fops_read(struct file *filep, char __user *buf,
+ size_t count, loff_t *ppos)
+{
+ size_t fault_size = sizeof(struct iommu_hwpt_pgfault);
+ struct iommufd_fault *fault = filep->private_data;
+ struct iommu_hwpt_pgfault data;
+ struct iommufd_device *idev;
+ struct iopf_group *group;
+ struct iopf_fault *iopf;
+ size_t done = 0;
+ int rc;
+
+ if (*ppos || count % fault_size)
+ return -ESPIPE;
+
+ mutex_lock(&fault->mutex);
+ while (!list_empty(&fault->deliver) && count > done) {
+ group = list_first_entry(&fault->deliver,
+ struct iopf_group, node);
+
+ if (list_count_nodes(&group->faults) * fault_size > count - done)
+ break;
+
+ idev = group->attach_handle->priv;
+ if (!idev)
+ break;
+
+ rc = xa_alloc(&idev->faults, &group->cookie, group,
+ xa_limit_32b, GFP_KERNEL);
+ if (rc)
+ break;
+
+ list_for_each_entry(iopf, &group->faults, list) {
+ iommufd_compose_fault_message(&iopf->fault,
+ &data, idev,
+ group->cookie);
+ rc = copy_to_user(buf + done, &data, fault_size);
+ if (rc) {
+ xa_erase(&idev->faults, group->cookie);
+ break;
+ }
+ done += fault_size;
+ }
+
+ list_move_tail(&group->node, &fault->response);
+ }
+ mutex_unlock(&fault->mutex);
+
+ return done;
+}
+
+static ssize_t iommufd_fault_fops_write(struct file *filep, const char __user *buf,
+ size_t count, loff_t *ppos)
+{
+ size_t response_size = sizeof(struct iommu_hwpt_page_response);
+ struct iommufd_fault *fault = filep->private_data;
+ struct iommu_hwpt_page_response response;
+ struct iommufd_device *idev = NULL;
+ struct iopf_group *group;
+ size_t done = 0;
+ int rc;
+
+ if (*ppos || count % response_size)
+ return -ESPIPE;
+
+ while (!list_empty(&fault->response) && count > done) {
+ rc = copy_from_user(&response, buf + done, response_size);
+ if (rc)
+ break;
+
+ if (!idev || idev->obj.id != response.dev_id)
+ idev = container_of(iommufd_get_object(fault->ictx,
+ response.dev_id,
+ IOMMUFD_OBJ_DEVICE),
+ struct iommufd_device, obj);
+ if (IS_ERR(idev))
+ break;
+
+ group = xa_erase(&idev->faults, response.cookie);
+ if (!group)
+ break;
+
+ iopf_group_response(group, response.code);
+
+ mutex_lock(&fault->mutex);
+ list_del(&group->node);
+ mutex_unlock(&fault->mutex);
+
+ iopf_free_group(group);
+ done += response_size;
+
+ iommufd_put_object(fault->ictx, &idev->obj);
+ }
+
+ return done;
+}
+
+static __poll_t iommufd_fault_fops_poll(struct file *filep,
+ struct poll_table_struct *wait)
+{
+ struct iommufd_fault *fault = filep->private_data;
+ __poll_t pollflags = EPOLLOUT;
+
+ poll_wait(filep, &fault->wait_queue, wait);
+ mutex_lock(&fault->mutex);
+ if (!list_empty(&fault->deliver))
+ pollflags |= EPOLLIN | EPOLLRDNORM;
+ mutex_unlock(&fault->mutex);
+
+ return pollflags;
+}
+
+static int iommufd_fault_fops_release(struct inode *inode, struct file *filep)
+{
+ struct iommufd_fault *fault = filep->private_data;
+
+ iommufd_ctx_put(fault->ictx);
+ refcount_dec(&fault->obj.users);
+ return 0;
+}
+
+static const struct file_operations iommufd_fault_fops = {
+ .owner = THIS_MODULE,
+ .open = nonseekable_open,
+ .read = iommufd_fault_fops_read,
+ .write = iommufd_fault_fops_write,
+ .poll = iommufd_fault_fops_poll,
+ .release = iommufd_fault_fops_release,
+ .llseek = no_llseek,
+};
+
+int iommufd_fault_alloc(struct iommufd_ucmd *ucmd)
+{
+ struct iommu_fault_alloc *cmd = ucmd->cmd;
+ struct iommufd_fault *fault;
+ struct file *filep;
+ int fdno;
+ int rc;
+
+ if (cmd->flags)
+ return -EOPNOTSUPP;
+
+ fault = iommufd_object_alloc(ucmd->ictx, fault, IOMMUFD_OBJ_FAULT);
+ if (IS_ERR(fault))
+ return PTR_ERR(fault);
+
+ fault->ictx = ucmd->ictx;
+ INIT_LIST_HEAD(&fault->deliver);
+ INIT_LIST_HEAD(&fault->response);
+ mutex_init(&fault->mutex);
+ init_waitqueue_head(&fault->wait_queue);
+
+ filep = anon_inode_getfile("[iommufd-pgfault]", &iommufd_fault_fops,
+ fault, O_RDWR);
+ if (IS_ERR(filep)) {
+ rc = PTR_ERR(filep);
+ goto out_abort;
+ }
+
+ refcount_inc(&fault->obj.users);
+ iommufd_ctx_get(fault->ictx);
+ fault->filep = filep;
+
+ fdno = get_unused_fd_flags(O_CLOEXEC);
+ if (fdno < 0) {
+ rc = fdno;
+ goto out_fput;
+ }
+
+ cmd->out_fault_id = fault->obj.id;
+ cmd->out_fault_fd = fdno;
+
+ rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
+ if (rc)
+ goto out_put_fdno;
+ iommufd_object_finalize(ucmd->ictx, &fault->obj);
+
+ fd_install(fdno, fault->filep);
+
+ return 0;
+out_put_fdno:
+ put_unused_fd(fdno);
+out_fput:
+ fput(filep);
+ refcount_dec(&fault->obj.users);
+ iommufd_ctx_put(fault->ictx);
+out_abort:
+ iommufd_object_abort_and_destroy(ucmd->ictx, &fault->obj);
+
+ return rc;
+}
diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
index 39b32932c61e..792db077d33e 100644
--- a/drivers/iommu/iommufd/main.c
+++ b/drivers/iommu/iommufd/main.c
@@ -332,6 +332,7 @@ union ucmd_buffer {
struct iommu_ioas_unmap unmap;
struct iommu_option option;
struct iommu_vfio_ioas vfio_ioas;
+ struct iommu_fault_alloc fault;
#ifdef CONFIG_IOMMUFD_TEST
struct iommu_test_cmd test;
#endif
@@ -381,6 +382,8 @@ static const struct iommufd_ioctl_op iommufd_ioctl_ops[] = {
val64),
IOCTL_OP(IOMMU_VFIO_IOAS, iommufd_vfio_ioas, struct iommu_vfio_ioas,
__reserved),
+ IOCTL_OP(IOMMU_FAULT_ALLOC, iommufd_fault_alloc, struct iommu_fault_alloc,
+ out_fault_fd),
#ifdef CONFIG_IOMMUFD_TEST
IOCTL_OP(IOMMU_TEST_CMD, iommufd_test, struct iommu_test_cmd, last),
#endif
@@ -513,6 +516,9 @@ static const struct iommufd_object_ops iommufd_object_ops[] = {
.destroy = iommufd_hwpt_nested_destroy,
.abort = iommufd_hwpt_nested_abort,
},
+ [IOMMUFD_OBJ_FAULT] = {
+ .destroy = iommufd_fault_destroy,
+ },
#ifdef CONFIG_IOMMUFD_TEST
[IOMMUFD_OBJ_SELFTEST] = {
.destroy = iommufd_selftest_destroy,
--
2.34.1


2024-04-03 01:18:48

by Baolu Lu

Subject: [PATCH v4 9/9] iommufd/selftest: Add coverage for IOPF test

Extend the selftest tool to add coverage for IOPF handling. This
includes the following tests:

- Allocating and destroying an iommufd fault object.
- Allocating and destroying an IOPF-capable HWPT.
- Attaching/detaching/replacing an IOPF-capable HWPT on a device.
- Triggering an IOPF on the mock device.
- Retrieving and responding to the IOPF through the file interface.

Signed-off-by: Lu Baolu <[email protected]>
---
tools/testing/selftests/iommu/iommufd_utils.h | 84 +++++++++++++++++--
tools/testing/selftests/iommu/iommufd.c | 18 ++++
.../selftests/iommu/iommufd_fail_nth.c | 2 +-
3 files changed, 98 insertions(+), 6 deletions(-)

diff --git a/tools/testing/selftests/iommu/iommufd_utils.h b/tools/testing/selftests/iommu/iommufd_utils.h
index 8d2b46b2114d..7f33389e070c 100644
--- a/tools/testing/selftests/iommu/iommufd_utils.h
+++ b/tools/testing/selftests/iommu/iommufd_utils.h
@@ -153,7 +153,7 @@ static int _test_cmd_mock_domain_replace(int fd, __u32 stdev_id, __u32 pt_id,
EXPECT_ERRNO(_errno, _test_cmd_mock_domain_replace(self->fd, stdev_id, \
pt_id, NULL))

-static int _test_cmd_hwpt_alloc(int fd, __u32 device_id, __u32 pt_id,
+static int _test_cmd_hwpt_alloc(int fd, __u32 device_id, __u32 pt_id, __u32 ft_id,
__u32 flags, __u32 *hwpt_id, __u32 data_type,
void *data, size_t data_len)
{
@@ -165,6 +165,7 @@ static int _test_cmd_hwpt_alloc(int fd, __u32 device_id, __u32 pt_id,
.data_type = data_type,
.data_len = data_len,
.data_uptr = (uint64_t)data,
+ .fault_id = ft_id,
};
int ret;

@@ -177,24 +178,30 @@ static int _test_cmd_hwpt_alloc(int fd, __u32 device_id, __u32 pt_id,
}

#define test_cmd_hwpt_alloc(device_id, pt_id, flags, hwpt_id) \
- ASSERT_EQ(0, _test_cmd_hwpt_alloc(self->fd, device_id, pt_id, flags, \
+ ASSERT_EQ(0, _test_cmd_hwpt_alloc(self->fd, device_id, pt_id, 0, flags, \
hwpt_id, IOMMU_HWPT_DATA_NONE, NULL, \
0))
#define test_err_hwpt_alloc(_errno, device_id, pt_id, flags, hwpt_id) \
EXPECT_ERRNO(_errno, _test_cmd_hwpt_alloc( \
- self->fd, device_id, pt_id, flags, \
+ self->fd, device_id, pt_id, 0, flags, \
hwpt_id, IOMMU_HWPT_DATA_NONE, NULL, 0))

#define test_cmd_hwpt_alloc_nested(device_id, pt_id, flags, hwpt_id, \
data_type, data, data_len) \
- ASSERT_EQ(0, _test_cmd_hwpt_alloc(self->fd, device_id, pt_id, flags, \
+ ASSERT_EQ(0, _test_cmd_hwpt_alloc(self->fd, device_id, pt_id, 0, flags, \
hwpt_id, data_type, data, data_len))
#define test_err_hwpt_alloc_nested(_errno, device_id, pt_id, flags, hwpt_id, \
data_type, data, data_len) \
EXPECT_ERRNO(_errno, \
- _test_cmd_hwpt_alloc(self->fd, device_id, pt_id, flags, \
+ _test_cmd_hwpt_alloc(self->fd, device_id, pt_id, 0, flags, \
hwpt_id, data_type, data, data_len))

+#define test_cmd_hwpt_alloc_iopf(device_id, pt_id, fault_id, flags, hwpt_id, \
+ data_type, data, data_len) \
+ ASSERT_EQ(0, _test_cmd_hwpt_alloc(self->fd, device_id, pt_id, fault_id, \
+ flags, hwpt_id, data_type, data, \
+ data_len))
+
#define test_cmd_hwpt_check_iotlb(hwpt_id, iotlb_id, expected) \
({ \
struct iommu_test_cmd test_cmd = { \
@@ -684,3 +691,70 @@ static int _test_cmd_get_hw_info(int fd, __u32 device_id, void *data,

#define test_cmd_get_hw_capabilities(device_id, caps, mask) \
ASSERT_EQ(0, _test_cmd_get_hw_info(self->fd, device_id, NULL, 0, &caps))
+
+static int _test_ioctl_fault_alloc(int fd, __u32 *fault_id, __u32 *fault_fd)
+{
+ struct iommu_fault_alloc cmd = {
+ .size = sizeof(cmd),
+ };
+ int ret;
+
+ ret = ioctl(fd, IOMMU_FAULT_ALLOC, &cmd);
+ if (ret)
+ return ret;
+ *fault_id = cmd.out_fault_id;
+ *fault_fd = cmd.out_fault_fd;
+ return 0;
+}
+
+#define test_ioctl_fault_alloc(fault_id, fault_fd) \
+ ({ \
+ ASSERT_EQ(0, _test_ioctl_fault_alloc(self->fd, fault_id, \
+ fault_fd)); \
+ ASSERT_NE(0, *(fault_id)); \
+ ASSERT_NE(0, *(fault_fd)); \
+ })
+
+static int _test_cmd_trigger_iopf(int fd, __u32 device_id, __u32 fault_fd)
+{
+ struct iommu_test_cmd trigger_iopf_cmd = {
+ .size = sizeof(trigger_iopf_cmd),
+ .op = IOMMU_TEST_OP_TRIGGER_IOPF,
+ .trigger_iopf = {
+ .dev_id = device_id,
+ .pasid = 0x1,
+ .grpid = 0x2,
+ .perm = IOMMU_PGFAULT_PERM_READ | IOMMU_PGFAULT_PERM_WRITE,
+ .addr = 0xdeadbeaf,
+ },
+ };
+ struct iommu_hwpt_page_response response = {
+ .size = sizeof(struct iommu_hwpt_page_response),
+ .dev_id = device_id,
+ .pasid = 0x1,
+ .grpid = 0x2,
+ .code = 0,
+ };
+ struct iommu_hwpt_pgfault fault = {};
+ ssize_t bytes;
+ int ret;
+
+ ret = ioctl(fd, _IOMMU_TEST_CMD(IOMMU_TEST_OP_TRIGGER_IOPF), &trigger_iopf_cmd);
+ if (ret)
+ return ret;
+
+ bytes = read(fault_fd, &fault, sizeof(fault));
+ if (bytes <= 0)
+ return -EIO;
+
+ response.cookie = fault.cookie;
+
+ bytes = write(fault_fd, &response, sizeof(response));
+ if (bytes <= 0)
+ return -EIO;
+
+ return 0;
+}
+
+#define test_cmd_trigger_iopf(device_id, fault_fd) \
+ ASSERT_EQ(0, _test_cmd_trigger_iopf(self->fd, device_id, fault_fd))
diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c
index edf1c99c9936..5b0169875a4d 100644
--- a/tools/testing/selftests/iommu/iommufd.c
+++ b/tools/testing/selftests/iommu/iommufd.c
@@ -279,6 +279,9 @@ TEST_F(iommufd_ioas, alloc_hwpt_nested)
uint32_t parent_hwpt_id = 0;
uint32_t parent_hwpt_id_not_work = 0;
uint32_t test_hwpt_id = 0;
+ uint32_t iopf_hwpt_id;
+ uint32_t fault_id;
+ uint32_t fault_fd;

if (self->device_id) {
/* Negative tests */
@@ -326,6 +329,7 @@ TEST_F(iommufd_ioas, alloc_hwpt_nested)
sizeof(data));

/* Allocate two nested hwpts sharing one common parent hwpt */
+ test_ioctl_fault_alloc(&fault_id, &fault_fd);
test_cmd_hwpt_alloc_nested(self->device_id, parent_hwpt_id, 0,
&nested_hwpt_id[0],
IOMMU_HWPT_DATA_SELFTEST, &data,
@@ -334,6 +338,10 @@ TEST_F(iommufd_ioas, alloc_hwpt_nested)
&nested_hwpt_id[1],
IOMMU_HWPT_DATA_SELFTEST, &data,
sizeof(data));
+ test_cmd_hwpt_alloc_iopf(self->device_id, parent_hwpt_id, fault_id,
+ IOMMU_HWPT_FAULT_ID_VALID, &iopf_hwpt_id,
+ IOMMU_HWPT_DATA_SELFTEST, &data,
+ sizeof(data));
test_cmd_hwpt_check_iotlb_all(nested_hwpt_id[0],
IOMMU_TEST_IOTLB_DEFAULT);
test_cmd_hwpt_check_iotlb_all(nested_hwpt_id[1],
@@ -504,14 +512,24 @@ TEST_F(iommufd_ioas, alloc_hwpt_nested)
_test_ioctl_destroy(self->fd, nested_hwpt_id[1]));
test_ioctl_destroy(nested_hwpt_id[0]);

+ /* Switch from nested_hwpt_id[1] to iopf_hwpt_id */
+ test_cmd_mock_domain_replace(self->stdev_id, iopf_hwpt_id);
+ EXPECT_ERRNO(EBUSY,
+ _test_ioctl_destroy(self->fd, iopf_hwpt_id));
+ /* Trigger an IOPF on the device */
+ test_cmd_trigger_iopf(self->device_id, fault_fd);
+
/* Detach from nested_hwpt_id[1] and destroy it */
test_cmd_mock_domain_replace(self->stdev_id, parent_hwpt_id);
test_ioctl_destroy(nested_hwpt_id[1]);
+ test_ioctl_destroy(iopf_hwpt_id);

/* Detach from the parent hw_pagetable and destroy it */
test_cmd_mock_domain_replace(self->stdev_id, self->ioas_id);
test_ioctl_destroy(parent_hwpt_id);
test_ioctl_destroy(parent_hwpt_id_not_work);
+ close(fault_fd);
+ test_ioctl_destroy(fault_id);
} else {
test_err_hwpt_alloc(ENOENT, self->device_id, self->ioas_id, 0,
&parent_hwpt_id);
diff --git a/tools/testing/selftests/iommu/iommufd_fail_nth.c b/tools/testing/selftests/iommu/iommufd_fail_nth.c
index f590417cd67a..c5d5e69452b0 100644
--- a/tools/testing/selftests/iommu/iommufd_fail_nth.c
+++ b/tools/testing/selftests/iommu/iommufd_fail_nth.c
@@ -615,7 +615,7 @@ TEST_FAIL_NTH(basic_fail_nth, device)
if (_test_cmd_get_hw_info(self->fd, idev_id, &info, sizeof(info), NULL))
return -1;

- if (_test_cmd_hwpt_alloc(self->fd, idev_id, ioas_id, 0, &hwpt_id,
+ if (_test_cmd_hwpt_alloc(self->fd, idev_id, ioas_id, 0, 0, &hwpt_id,
IOMMU_HWPT_DATA_NONE, 0, 0))
return -1;

--
2.34.1


2024-04-03 02:31:30

by Baolu Lu

Subject: [PATCH v4 1/9] iommu: Introduce domain attachment handle

Currently, when a domain is attached to a device or its PASID, the domain
pointer is stored within the iommu group, where it can be retrieved
during the window between attachment and detachment.

With new features being introduced, there's a need to store more
information than just a domain pointer. This information essentially
represents the association between a domain and a device. For example,
the SVA code already has a custom struct iommu_sva which represents a
bond between an SVA domain and a PASID of a device. Looking forward,
IOMMUFD needs a place to store the iommufd_device pointer in the core, so
that the device object ID can be quickly retrieved in the critical fault
handling path.

Introduce a domain attachment handle that explicitly represents the
attachment relationship between a domain and a device or its PASID. A
caller-specific data field lets the caller store additional information
beyond the domain pointer, depending on its specific use case.
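
As an illustration, a consumer of the handle might do something like the
following, where struct my_owner_data is a hypothetical caller-defined
type stored in the @priv field:

	struct iommu_attach_handle *handle;

	handle = iommu_attach_handle_get(group, IOMMU_NO_PASID);
	if (handle) {
		/* @priv is the caller-specific data field. */
		struct my_owner_data *data = handle->priv;

		/* ... use data and handle->domain ... */

		/* Release the refcount taken by iommu_attach_handle_get(). */
		iommu_attach_handle_put(handle);
	}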

Co-developed-by: Jason Gunthorpe <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
Signed-off-by: Lu Baolu <[email protected]>
---
drivers/iommu/iommu-priv.h | 9 +++
drivers/iommu/iommu.c | 158 +++++++++++++++++++++++++++++++++----
2 files changed, 153 insertions(+), 14 deletions(-)

diff --git a/drivers/iommu/iommu-priv.h b/drivers/iommu/iommu-priv.h
index 5f731d994803..08c0667cef54 100644
--- a/drivers/iommu/iommu-priv.h
+++ b/drivers/iommu/iommu-priv.h
@@ -28,4 +28,13 @@ void iommu_device_unregister_bus(struct iommu_device *iommu,
const struct bus_type *bus,
struct notifier_block *nb);

+struct iommu_attach_handle {
+ struct iommu_domain *domain;
+ refcount_t users;
+ void *priv;
+};
+
+struct iommu_attach_handle *iommu_attach_handle_get(struct iommu_group *group,
+ ioasid_t pasid);
+void iommu_attach_handle_put(struct iommu_attach_handle *handle);
#endif /* __LINUX_IOMMU_PRIV_H */
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index a95a483def2d..8bbff3bf7c26 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2039,6 +2039,94 @@ void iommu_domain_free(struct iommu_domain *domain)
}
EXPORT_SYMBOL_GPL(iommu_domain_free);

+/* Add an attach handle to the group's pasid array. */
+static struct iommu_attach_handle *
+iommu_attach_handle_set(struct iommu_domain *domain,
+ struct iommu_group *group, ioasid_t pasid)
+{
+ struct iommu_attach_handle *handle;
+ void *curr;
+
+ handle = kzalloc(sizeof(*handle), GFP_KERNEL);
+ if (!handle)
+ return ERR_PTR(-ENOMEM);
+
+ handle->domain = domain;
+ refcount_set(&handle->users, 1);
+
+ curr = xa_cmpxchg(&group->pasid_array, pasid, NULL, handle, GFP_KERNEL);
+ if (curr) {
+ kfree(handle);
+ return xa_err(curr) ? ERR_PTR(xa_err(curr)) : ERR_PTR(-EBUSY);
+ }
+
+ return handle;
+}
+
+static struct iommu_attach_handle *
+iommu_attach_handle_replace(struct iommu_domain *domain,
+ struct iommu_group *group, ioasid_t pasid)
+{
+ struct iommu_attach_handle *handle, *curr;
+
+ handle = kzalloc(sizeof(*handle), GFP_KERNEL);
+ if (!handle)
+ return ERR_PTR(-ENOMEM);
+
+ handle->domain = domain;
+ refcount_set(&handle->users, 1);
+
+ curr = xa_store(&group->pasid_array, pasid, handle, GFP_KERNEL);
+ if (xa_err(curr)) {
+ kfree(handle);
+ return ERR_PTR(xa_err(curr));
+ }
+
+ if (curr)
+ iommu_attach_handle_put(curr);
+
+ return handle;
+}
+
+/*
+ * Return caller the attach handle. The caller holds a refcount of the handle.
+ * This refcount should be released by calling iommu_attach_handle_put().
+ */
+struct iommu_attach_handle *
+iommu_attach_handle_get(struct iommu_group *group, ioasid_t pasid)
+{
+ struct iommu_attach_handle *handle;
+
+ xa_lock(&group->pasid_array);
+ handle = xa_load(&group->pasid_array, pasid);
+ if (handle)
+ refcount_inc(&handle->users);
+ xa_unlock(&group->pasid_array);
+
+ return handle;
+}
+EXPORT_SYMBOL_NS_GPL(iommu_attach_handle_get, IOMMUFD_INTERNAL);
+
+/* Put the refcount of the attach handle. */
+void iommu_attach_handle_put(struct iommu_attach_handle *handle)
+{
+ if (!handle)
+ return;
+
+ if (refcount_dec_and_test(&handle->users))
+ kfree(handle);
+}
+EXPORT_SYMBOL_NS_GPL(iommu_attach_handle_put, IOMMUFD_INTERNAL);
+
+/* Remove the attach handle stored in group's pasid array. */
+static void iommu_attach_handle_remove(struct iommu_group *group, ioasid_t pasid)
+{
+ struct iommu_attach_handle *handle;
+
+ handle = xa_erase(&group->pasid_array, pasid);
+ iommu_attach_handle_put(handle);
+}
+
/*
* Put the group's domain back to the appropriate core-owned domain - either the
* standard kernel-mode DMA configuration or an all-DMA-blocked domain.
@@ -2187,12 +2275,25 @@ static int __iommu_attach_group(struct iommu_domain *domain,
*/
int iommu_attach_group(struct iommu_domain *domain, struct iommu_group *group)
{
+ struct iommu_attach_handle *handle;
int ret;

mutex_lock(&group->mutex);
+ handle = iommu_attach_handle_set(domain, group, IOMMU_NO_PASID);
+ if (IS_ERR(handle)) {
+ ret = PTR_ERR(handle);
+ goto out_unlock;
+ }
ret = __iommu_attach_group(domain, group);
+ if (ret)
+ goto out_remove_handle;
mutex_unlock(&group->mutex);

+ return 0;
+out_remove_handle:
+ iommu_attach_handle_remove(group, IOMMU_NO_PASID);
+out_unlock:
+ mutex_unlock(&group->mutex);
return ret;
}
EXPORT_SYMBOL_GPL(iommu_attach_group);
@@ -2211,13 +2312,33 @@ EXPORT_SYMBOL_GPL(iommu_attach_group);
int iommu_group_replace_domain(struct iommu_group *group,
struct iommu_domain *new_domain)
{
+ struct iommu_domain *old_domain = group->domain;
+ struct iommu_attach_handle *handle;
int ret;

if (!new_domain)
return -EINVAL;

+ if (new_domain == old_domain)
+ return 0;
+
mutex_lock(&group->mutex);
ret = __iommu_group_set_domain(group, new_domain);
+ if (ret)
+ goto out_unlock;
+
+ handle = iommu_attach_handle_replace(new_domain, group, IOMMU_NO_PASID);
+ if (IS_ERR(handle)) {
+ ret = PTR_ERR(handle);
+ goto out_old_domain;
+ }
+ mutex_unlock(&group->mutex);
+
+ return 0;
+
+out_old_domain:
+ __iommu_group_set_domain(group, old_domain);
+out_unlock:
mutex_unlock(&group->mutex);
return ret;
}
@@ -2352,6 +2473,7 @@ void iommu_detach_group(struct iommu_domain *domain, struct iommu_group *group)
{
mutex_lock(&group->mutex);
__iommu_group_set_core_domain(group);
+ iommu_attach_handle_remove(group, IOMMU_NO_PASID);
mutex_unlock(&group->mutex);
}
EXPORT_SYMBOL_GPL(iommu_detach_group);
@@ -3354,8 +3476,8 @@ int iommu_attach_device_pasid(struct iommu_domain *domain,
{
/* Caller must be a probed driver on dev */
struct iommu_group *group = dev->iommu_group;
+ struct iommu_attach_handle *handle;
struct group_device *device;
- void *curr;
int ret;

if (!domain->ops->set_dev_pasid)
@@ -3376,17 +3498,22 @@ int iommu_attach_device_pasid(struct iommu_domain *domain,
}
}

- curr = xa_cmpxchg(&group->pasid_array, pasid, NULL, domain, GFP_KERNEL);
- if (curr) {
- ret = xa_err(curr) ? : -EBUSY;
+ handle = iommu_attach_handle_set(domain, group, pasid);
+ if (IS_ERR(handle)) {
+ ret = PTR_ERR(handle);
goto out_unlock;
}

ret = __iommu_set_group_pasid(domain, group, pasid);
- if (ret) {
- __iommu_remove_group_pasid(group, pasid);
- xa_erase(&group->pasid_array, pasid);
- }
+ if (ret)
+ goto out_remove_handle;
+ mutex_unlock(&group->mutex);
+
+ return 0;
+
+out_remove_handle:
+ __iommu_remove_group_pasid(group, pasid);
+ iommu_attach_handle_remove(group, pasid);
out_unlock:
mutex_unlock(&group->mutex);
return ret;
@@ -3410,7 +3537,7 @@ void iommu_detach_device_pasid(struct iommu_domain *domain, struct device *dev,

mutex_lock(&group->mutex);
__iommu_remove_group_pasid(group, pasid);
- WARN_ON(xa_erase(&group->pasid_array, pasid) != domain);
+ iommu_attach_handle_remove(group, pasid);
mutex_unlock(&group->mutex);
}
EXPORT_SYMBOL_GPL(iommu_detach_device_pasid);
@@ -3433,18 +3560,21 @@ struct iommu_domain *iommu_get_domain_for_dev_pasid(struct device *dev,
ioasid_t pasid,
unsigned int type)
{
- /* Caller must be a probed driver on dev */
struct iommu_group *group = dev->iommu_group;
- struct iommu_domain *domain;
+ struct iommu_attach_handle *handle;
+ struct iommu_domain *domain = NULL;

if (!group)
return NULL;

- xa_lock(&group->pasid_array);
- domain = xa_load(&group->pasid_array, pasid);
+ handle = iommu_attach_handle_get(group, pasid);
+ if (handle) {
+ domain = handle->domain;
+ iommu_attach_handle_put(handle);
+ }
+
if (type && domain && domain->type != type)
domain = ERR_PTR(-EBUSY);
- xa_unlock(&group->pasid_array);

return domain;
}
--
2.34.1


2024-04-03 03:47:24

by Baolu Lu

[permalink] [raw]
Subject: [PATCH v4 7/9] iommufd: Associate fault object with iommufd_hw_pgtable

When allocating a user iommufd_hw_pagetable, user space is allowed to
associate a fault object with the hw_pagetable by specifying the fault
object ID in the page table allocation data and setting the
IOMMU_HWPT_FAULT_ID_VALID flag bit.

On a successful return of hwpt allocation, the user can retrieve and
respond to page faults by reading and writing the file interface of the
fault object.

Once a fault object has been associated with a hwpt, the hwpt becomes
iopf-capable, indicated by fault_capable set to true. Attaching,
detaching, or replacing an iopf-capable hwpt on an RID or PASID differs
from the non-iopf-capable case.
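
For illustration, a minimal userspace sketch of this flow is below. The
IOMMU_FAULT_QUEUE_ALLOC command and the iommu_hwpt_pgfault /
iommu_hwpt_page_response layouts come from other patches in this series
and are assumptions here, as are the iommufd, dev_id, parent_hwpt_id and
stage1_data variables:

	/*
	 * Hypothetical sketch: allocate a fault object, associate it with
	 * a nested hwpt at allocation time, then read one fault from the
	 * fault FD and respond to it.
	 */
	struct iommu_fault_alloc fcmd = { .size = sizeof(fcmd) };
	struct iommu_hwpt_alloc hcmd = {
		.size = sizeof(hcmd),
		.flags = IOMMU_HWPT_FAULT_ID_VALID,
		.dev_id = dev_id,
		.pt_id = parent_hwpt_id,
		.data_type = IOMMU_HWPT_DATA_VTD_S1,
		.data_len = sizeof(stage1_data),
		.data_uptr = (uintptr_t)&stage1_data,
	};
	struct iommu_hwpt_pgfault pgfault;
	struct iommu_hwpt_page_response resp = { 0 };

	if (ioctl(iommufd, IOMMU_FAULT_QUEUE_ALLOC, &fcmd))
		err(1, "fault object allocation");
	hcmd.fault_id = fcmd.out_fault_id;
	if (ioctl(iommufd, IOMMU_HWPT_ALLOC, &hcmd))
		err(1, "hwpt allocation");

	/* Faults are retrieved and answered through the fault object's FD. */
	if (read(fcmd.out_fault_fd, &pgfault, sizeof(pgfault)) == sizeof(pgfault)) {
		resp.cookie = pgfault.cookie;
		resp.code = IOMMUFD_PAGE_RESP_SUCCESS;
		write(fcmd.out_fault_fd, &resp, sizeof(resp));
	}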

Signed-off-by: Lu Baolu <[email protected]>
---
drivers/iommu/iommufd/iommufd_private.h | 10 +++++++
include/uapi/linux/iommufd.h | 8 ++++++
drivers/iommu/iommufd/fault.c | 20 ++++++++++++++
drivers/iommu/iommufd/hw_pagetable.c | 36 +++++++++++++++++++------
4 files changed, 66 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 8dad1150eaf0..0c2f5fee4130 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -293,6 +293,7 @@ int iommufd_check_iova_range(struct io_pagetable *iopt,
struct iommufd_hw_pagetable {
struct iommufd_object obj;
struct iommu_domain *domain;
+ struct iommufd_fault *fault;
bool fault_capable;
};

@@ -456,8 +457,17 @@ struct iommufd_fault {
struct wait_queue_head wait_queue;
};

+static inline struct iommufd_fault *
+iommufd_get_fault(struct iommufd_ucmd *ucmd, u32 id)
+{
+ return container_of(iommufd_get_object(ucmd->ictx, id,
+ IOMMUFD_OBJ_FAULT),
+ struct iommufd_fault, obj);
+}
+
int iommufd_fault_alloc(struct iommufd_ucmd *ucmd);
void iommufd_fault_destroy(struct iommufd_object *obj);
+int iommufd_fault_iopf_handler(struct iopf_group *group);

#ifdef CONFIG_IOMMUFD_TEST
int iommufd_test(struct iommufd_ucmd *ucmd);
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index 1819b28e9e6b..3d566c1ffcc6 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -357,10 +357,13 @@ struct iommu_vfio_ioas {
* the parent HWPT in a nesting configuration.
* @IOMMU_HWPT_ALLOC_DIRTY_TRACKING: Dirty tracking support for device IOMMU is
* enforced on device attachment
+ * @IOMMU_HWPT_FAULT_ID_VALID: The fault_id field of hwpt allocation data is
+ * valid.
*/
enum iommufd_hwpt_alloc_flags {
IOMMU_HWPT_ALLOC_NEST_PARENT = 1 << 0,
IOMMU_HWPT_ALLOC_DIRTY_TRACKING = 1 << 1,
+ IOMMU_HWPT_FAULT_ID_VALID = 1 << 2,
};

/**
@@ -412,6 +415,9 @@ enum iommu_hwpt_data_type {
* @data_type: One of enum iommu_hwpt_data_type
* @data_len: Length of the type specific data
* @data_uptr: User pointer to the type specific data
+ * @fault_id: The ID of the IOMMUFD_FAULT object. Valid only if the
+ * IOMMU_HWPT_FAULT_ID_VALID flag is set in @flags.
+ * @__reserved2: Padding to 64-bit alignment. Must be 0.
*
* Explicitly allocate a hardware page table object. This is the same object
* type that is returned by iommufd_device_attach() and represents the
@@ -442,6 +448,8 @@ struct iommu_hwpt_alloc {
__u32 data_type;
__u32 data_len;
__aligned_u64 data_uptr;
+ __u32 fault_id;
+ __u32 __reserved2;
};
#define IOMMU_HWPT_ALLOC _IO(IOMMUFD_TYPE, IOMMUFD_CMD_HWPT_ALLOC)

diff --git a/drivers/iommu/iommufd/fault.c b/drivers/iommu/iommufd/fault.c
index f16f15e7edd4..ffe948b6a5b5 100644
--- a/drivers/iommu/iommufd/fault.c
+++ b/drivers/iommu/iommufd/fault.c
@@ -350,3 +350,23 @@ int iommufd_fault_alloc(struct iommufd_ucmd *ucmd)

return rc;
}
+
+int iommufd_fault_iopf_handler(struct iopf_group *group)
+{
+ struct iommufd_hw_pagetable *hwpt;
+ struct iommufd_fault *fault;
+
+ if (!group->attach_handle)
+ return -EINVAL;
+
+ hwpt = group->attach_handle->domain->fault_data;
+ fault = hwpt->fault;
+
+ mutex_lock(&fault->mutex);
+ list_add_tail(&group->node, &fault->deliver);
+ mutex_unlock(&fault->mutex);
+
+ wake_up_interruptible(&fault->wait_queue);
+
+ return 0;
+}
diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
index 33d142f8057d..95ea943f5be5 100644
--- a/drivers/iommu/iommufd/hw_pagetable.c
+++ b/drivers/iommu/iommufd/hw_pagetable.c
@@ -8,6 +8,15 @@
#include "../iommu-priv.h"
#include "iommufd_private.h"

+static void __iommufd_hwpt_destroy(struct iommufd_hw_pagetable *hwpt)
+{
+ if (hwpt->domain)
+ iommu_domain_free(hwpt->domain);
+
+ if (hwpt->fault)
+ iommufd_put_object(hwpt->fault->ictx, &hwpt->fault->obj);
+}
+
void iommufd_hwpt_paging_destroy(struct iommufd_object *obj)
{
struct iommufd_hwpt_paging *hwpt_paging =
@@ -22,9 +31,7 @@ void iommufd_hwpt_paging_destroy(struct iommufd_object *obj)
hwpt_paging->common.domain);
}

- if (hwpt_paging->common.domain)
- iommu_domain_free(hwpt_paging->common.domain);
-
+ __iommufd_hwpt_destroy(&hwpt_paging->common);
refcount_dec(&hwpt_paging->ioas->obj.users);
}

@@ -49,9 +56,7 @@ void iommufd_hwpt_nested_destroy(struct iommufd_object *obj)
struct iommufd_hwpt_nested *hwpt_nested =
container_of(obj, struct iommufd_hwpt_nested, common.obj);

- if (hwpt_nested->common.domain)
- iommu_domain_free(hwpt_nested->common.domain);
-
+ __iommufd_hwpt_destroy(&hwpt_nested->common);
refcount_dec(&hwpt_nested->parent->common.obj.users);
}

@@ -213,7 +218,8 @@ iommufd_hwpt_nested_alloc(struct iommufd_ctx *ictx,
struct iommufd_hw_pagetable *hwpt;
int rc;

- if (flags || !user_data->len || !ops->domain_alloc_user)
+ if ((flags & ~IOMMU_HWPT_FAULT_ID_VALID) ||
+ !user_data->len || !ops->domain_alloc_user)
return ERR_PTR(-EOPNOTSUPP);
if (parent->auto_domain || !parent->nest_parent)
return ERR_PTR(-EINVAL);
@@ -227,7 +233,7 @@ iommufd_hwpt_nested_alloc(struct iommufd_ctx *ictx,
refcount_inc(&parent->common.obj.users);
hwpt_nested->parent = parent;

- hwpt->domain = ops->domain_alloc_user(idev->dev, flags,
+ hwpt->domain = ops->domain_alloc_user(idev->dev, 0,
parent->common.domain, user_data);
if (IS_ERR(hwpt->domain)) {
rc = PTR_ERR(hwpt->domain);
@@ -308,6 +314,20 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd)
goto out_put_pt;
}

+ if (cmd->flags & IOMMU_HWPT_FAULT_ID_VALID) {
+ struct iommufd_fault *fault;
+
+ fault = iommufd_get_fault(ucmd, cmd->fault_id);
+ if (IS_ERR(fault)) {
+ rc = PTR_ERR(fault);
+ goto out_hwpt;
+ }
+ hwpt->fault = fault;
+ hwpt->domain->iopf_handler = iommufd_fault_iopf_handler;
+ hwpt->domain->fault_data = hwpt;
+ hwpt->fault_capable = true;
+ }
+
cmd->out_hwpt_id = hwpt->obj.id;
rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
if (rc)
--
2.34.1


2024-04-03 05:17:48

by Baolu Lu

[permalink] [raw]
Subject: [PATCH v4 8/9] iommufd/selftest: Add IOPF support for mock device

Extend the selftest mock device to support generating and responding to
an IOPF. Also add an ioctl interface that allows userspace applications
to trigger an IOPF on the mock device. This allows userspace applications
to test the IOMMUFD's handling of IOPFs without having to rely on any
real hardware.
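
As a rough sketch (the _IOMMU_TEST_CMD() ioctl-number helper and the
IOMMU_PGFAULT_PERM_* flag names are assumptions borrowed from the
selftest utilities and the fault uAPI patch of this series), a test
could inject a fault like this:

	/* Hypothetical selftest fragment: inject one page request on the
	 * mock device. iommufd and dev_id come from the test fixture. */
	struct iommu_test_cmd cmd = {
		.size = sizeof(cmd),
		.op = IOMMU_TEST_OP_TRIGGER_IOPF,
		.trigger_iopf = {
			.dev_id = dev_id,
			.pasid = 0x1,
			.grpid = 0x2,
			.perm = IOMMU_PGFAULT_PERM_READ | IOMMU_PGFAULT_PERM_WRITE,
			.addr = 0xdeadbeef000ULL,
		},
	};

	if (ioctl(iommufd, _IOMMU_TEST_CMD(IOMMU_TEST_OP_TRIGGER_IOPF), &cmd))
		err(1, "trigger iopf");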

Signed-off-by: Lu Baolu <[email protected]>
---
drivers/iommu/iommufd/iommufd_test.h | 8 ++++
drivers/iommu/iommufd/selftest.c | 63 ++++++++++++++++++++++++++++
2 files changed, 71 insertions(+)

diff --git a/drivers/iommu/iommufd/iommufd_test.h b/drivers/iommu/iommufd/iommufd_test.h
index e854d3f67205..acbbba1c6671 100644
--- a/drivers/iommu/iommufd/iommufd_test.h
+++ b/drivers/iommu/iommufd/iommufd_test.h
@@ -22,6 +22,7 @@ enum {
IOMMU_TEST_OP_MOCK_DOMAIN_FLAGS,
IOMMU_TEST_OP_DIRTY,
IOMMU_TEST_OP_MD_CHECK_IOTLB,
+ IOMMU_TEST_OP_TRIGGER_IOPF,
};

enum {
@@ -127,6 +128,13 @@ struct iommu_test_cmd {
__u32 id;
__u32 iotlb;
} check_iotlb;
+ struct {
+ __u32 dev_id;
+ __u32 pasid;
+ __u32 grpid;
+ __u32 perm;
+ __u64 addr;
+ } trigger_iopf;
};
__u32 last;
};
diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c
index 97ce62602e66..f925f5e5c00a 100644
--- a/drivers/iommu/iommufd/selftest.c
+++ b/drivers/iommu/iommufd/selftest.c
@@ -504,6 +504,8 @@ static bool mock_domain_capable(struct device *dev, enum iommu_cap cap)
return false;
}

+static struct iopf_queue *mock_iommu_iopf_queue;
+
static struct iommu_device mock_iommu_device = {
};

@@ -514,6 +516,29 @@ static struct iommu_device *mock_probe_device(struct device *dev)
return &mock_iommu_device;
}

+static void mock_domain_page_response(struct device *dev, struct iopf_fault *evt,
+ struct iommu_page_response *msg)
+{
+}
+
+static int mock_dev_enable_feat(struct device *dev, enum iommu_dev_features feat)
+{
+ if (feat != IOMMU_DEV_FEAT_IOPF || !mock_iommu_iopf_queue)
+ return -ENODEV;
+
+ return iopf_queue_add_device(mock_iommu_iopf_queue, dev);
+}
+
+static int mock_dev_disable_feat(struct device *dev, enum iommu_dev_features feat)
+{
+ if (feat != IOMMU_DEV_FEAT_IOPF || !mock_iommu_iopf_queue)
+ return -ENODEV;
+
+ iopf_queue_remove_device(mock_iommu_iopf_queue, dev);
+
+ return 0;
+}
+
static const struct iommu_ops mock_ops = {
/*
* IOMMU_DOMAIN_BLOCKED cannot be returned from def_domain_type()
@@ -529,6 +554,9 @@ static const struct iommu_ops mock_ops = {
.capable = mock_domain_capable,
.device_group = generic_device_group,
.probe_device = mock_probe_device,
+ .page_response = mock_domain_page_response,
+ .dev_enable_feat = mock_dev_enable_feat,
+ .dev_disable_feat = mock_dev_disable_feat,
.default_domain_ops =
&(struct iommu_domain_ops){
.free = mock_domain_free,
@@ -1397,6 +1425,31 @@ static int iommufd_test_dirty(struct iommufd_ucmd *ucmd, unsigned int mockpt_id,
return rc;
}

+static int iommufd_test_trigger_iopf(struct iommufd_ucmd *ucmd,
+ struct iommu_test_cmd *cmd)
+{
+ struct iopf_fault event = { };
+ struct iommufd_device *idev;
+
+ idev = iommufd_get_device(ucmd, cmd->trigger_iopf.dev_id);
+ if (IS_ERR(idev))
+ return PTR_ERR(idev);
+
+ event.fault.prm.flags = IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE;
+ if (cmd->trigger_iopf.pasid != IOMMU_NO_PASID)
+ event.fault.prm.flags |= IOMMU_FAULT_PAGE_REQUEST_PASID_VALID;
+ event.fault.type = IOMMU_FAULT_PAGE_REQ;
+ event.fault.prm.addr = cmd->trigger_iopf.addr;
+ event.fault.prm.pasid = cmd->trigger_iopf.pasid;
+ event.fault.prm.grpid = cmd->trigger_iopf.grpid;
+ event.fault.prm.perm = cmd->trigger_iopf.perm;
+
+ iommu_report_device_fault(idev->dev, &event);
+ iommufd_put_object(ucmd->ictx, &idev->obj);
+
+ return 0;
+}
+
void iommufd_selftest_destroy(struct iommufd_object *obj)
{
struct selftest_obj *sobj = container_of(obj, struct selftest_obj, obj);
@@ -1472,6 +1525,8 @@ int iommufd_test(struct iommufd_ucmd *ucmd)
cmd->dirty.page_size,
u64_to_user_ptr(cmd->dirty.uptr),
cmd->dirty.flags);
+ case IOMMU_TEST_OP_TRIGGER_IOPF:
+ return iommufd_test_trigger_iopf(ucmd, cmd);
default:
return -EOPNOTSUPP;
}
@@ -1513,6 +1568,9 @@ int __init iommufd_test_init(void)
&iommufd_mock_bus_type.nb);
if (rc)
goto err_sysfs;
+
+ mock_iommu_iopf_queue = iopf_queue_alloc("mock-iopfq");
+
return 0;

err_sysfs:
@@ -1528,6 +1586,11 @@ int __init iommufd_test_init(void)

void iommufd_test_exit(void)
{
+ if (mock_iommu_iopf_queue) {
+ iopf_queue_free(mock_iommu_iopf_queue);
+ mock_iommu_iopf_queue = NULL;
+ }
+
iommu_device_sysfs_remove(&mock_iommu_device);
iommu_device_unregister_bus(&mock_iommu_device,
&iommufd_mock_bus_type.bus,
--
2.34.1


2024-04-03 11:59:09

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH v4 1/9] iommu: Introduce domain attachment handle

On Wed, Apr 03, 2024 at 09:15:11AM +0800, Lu Baolu wrote:
> Currently, when attaching a domain to a device or its PASID, domain is
> stored within the iommu group. It could be retrieved for use during the
> window between attachment and detachment.
>
> With new features introduced, there's a need to store more information
> than just a domain pointer. This information essentially represents the
> association between a domain and a device. For example, the SVA code
> already has a custom struct iommu_sva which represents a bond between
> sva domain and a PASID of a device. Looking forward, the IOMMUFD needs
> a place to store the iommufd_device pointer in the core, so that the
> device object ID could be quickly retrieved in the critical fault handling
> path.
>
> Introduce domain attachment handle that explicitly represents the
> attachment relationship between a domain and a device or its PASID.
> A caller-specific data field can be used by the caller to store additional
> information beyond a domain pointer, depending on its specific use case.
>
> Co-developed-by: Jason Gunthorpe <[email protected]>
> Signed-off-by: Jason Gunthorpe <[email protected]>
> Signed-off-by: Lu Baolu <[email protected]>
> ---
> drivers/iommu/iommu-priv.h | 9 +++
> drivers/iommu/iommu.c | 158 +++++++++++++++++++++++++++++++++----
> 2 files changed, 153 insertions(+), 14 deletions(-)
>
> diff --git a/drivers/iommu/iommu-priv.h b/drivers/iommu/iommu-priv.h
> index 5f731d994803..08c0667cef54 100644
> --- a/drivers/iommu/iommu-priv.h
> +++ b/drivers/iommu/iommu-priv.h
> @@ -28,4 +28,13 @@ void iommu_device_unregister_bus(struct iommu_device *iommu,
> const struct bus_type *bus,
> struct notifier_block *nb);
>
> +struct iommu_attach_handle {
> + struct iommu_domain *domain;
> + refcount_t users;

I don't understand how the refcounting can be generally useful. There
is no way to free this:

> + void *priv;

When the refcount goes to zero.

Jason

2024-04-03 11:59:43

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH v4 2/9] iommu: Replace sva_iommu with iommu_attach_handle

On Wed, Apr 03, 2024 at 09:15:12AM +0800, Lu Baolu wrote:
> + /* A bond already exists, just take a reference. */
> + handle = iommu_attach_handle_get(group, iommu_mm->pasid);
> + if (handle) {
> + mutex_unlock(&iommu_sva_lock);
> + return handle;
> }

At least in this context this is not enough; we need to ensure that the
domain on the PASID is actually an SVA domain and that it was installed
by this mechanism, not an iommufd domain for instance.

ie you probably need a type field in the iommu_attach_handle to tell
what the priv is.

Otherwise this seems like a great idea!

> - iommu_detach_device_pasid(domain, dev, iommu_mm->pasid);
> - if (--domain->users == 0) {
> - list_del(&domain->next);
> - iommu_domain_free(domain);
> + iommu_attach_handle_put(handle);
> + if (refcount_read(&handle->users) == 1) {
> + iommu_detach_device_pasid(domain, dev, iommu_mm->pasid);
> + if (--domain->users == 0) {
> + list_del(&domain->next);
> + iommu_domain_free(domain);
> + }
> }

Though I'm not convinced the refcount should be elevated into the core
structure. The prior patch I showed you where the caller can provide
the memory for the handle and we don't have a priv would make it easy
to put the refcount in an SVA derived handle struct without more
allocation. Then we don't need this weirdness.

> mutex_unlock(&iommu_sva_lock);
> - kfree(handle);

Also do we need iommu_sva_lock here anymore? I wonder if the group
mutex would be sufficient.

Jason

2024-04-06 04:35:33

by Baolu Lu

[permalink] [raw]
Subject: Re: [PATCH v4 1/9] iommu: Introduce domain attachment handle

On 4/3/24 7:58 PM, Jason Gunthorpe wrote:
> On Wed, Apr 03, 2024 at 09:15:11AM +0800, Lu Baolu wrote:
>> Currently, when attaching a domain to a device or its PASID, domain is
>> stored within the iommu group. It could be retrieved for use during the
>> window between attachment and detachment.
>>
>> With new features introduced, there's a need to store more information
>> than just a domain pointer. This information essentially represents the
>> association between a domain and a device. For example, the SVA code
>> already has a custom struct iommu_sva which represents a bond between
>> sva domain and a PASID of a device. Looking forward, the IOMMUFD needs
>> a place to store the iommufd_device pointer in the core, so that the
>> device object ID could be quickly retrieved in the critical fault handling
>> path.
>>
>> Introduce domain attachment handle that explicitly represents the
>> attachment relationship between a domain and a device or its PASID.
>> A caller-specific data field can be used by the caller to store additional
>> information beyond a domain pointer, depending on its specific use case.
>>
>> Co-developed-by: Jason Gunthorpe<[email protected]>
>> Signed-off-by: Jason Gunthorpe<[email protected]>
>> Signed-off-by: Lu Baolu<[email protected]>
>> ---
>> drivers/iommu/iommu-priv.h | 9 +++
>> drivers/iommu/iommu.c | 158 +++++++++++++++++++++++++++++++++----
>> 2 files changed, 153 insertions(+), 14 deletions(-)
>>
>> diff --git a/drivers/iommu/iommu-priv.h b/drivers/iommu/iommu-priv.h
>> index 5f731d994803..08c0667cef54 100644
>> --- a/drivers/iommu/iommu-priv.h
>> +++ b/drivers/iommu/iommu-priv.h
>> @@ -28,4 +28,13 @@ void iommu_device_unregister_bus(struct iommu_device *iommu,
>> const struct bus_type *bus,
>> struct notifier_block *nb);
>>
>> +struct iommu_attach_handle {
>> + struct iommu_domain *domain;
>> + refcount_t users;
> I don't understand how the refcounting can be generally useful. There
> is no way to free this:
>
>> + void *priv;
> When the refcount goes to zero.

This field is set by the caller, so the caller ensures that the pointer
can only be freed after iommu domain detachment. For iopf, the caller
should automatically respond to all outstanding iopf's in the domain
detach path.

In the sva case, which uses the workqueue to handle iopf,
flush_workqueue() is called in the domain detach path to ensure that all
outstanding iopf's are completed before detach completion.

For iommufd, perhaps I need to add the following code in the detach
path (and the same in the replace path):

--- a/drivers/iommu/iommufd/fault.c
+++ b/drivers/iommu/iommufd/fault.c
@@ -68,14 +68,35 @@ int iommufd_fault_domain_attach_dev(struct iommufd_hw_pagetable *hwpt,
return 0;
}

+static void iommufd_auto_response_handle(struct iommufd_fault *fault,
+ struct iommu_attach_handle *handle)
+{
+ struct iommufd_device *idev = handle->priv;
+ struct iopf_group *group;
+ unsigned long index;
+
+ mutex_lock(&fault->mutex);
+ xa_for_each(&idev->faults, index, group) {
+ xa_erase(&idev->faults, index);
+ iopf_group_response(group, IOMMU_PAGE_RESP_INVALID);
+ }
+ mutex_unlock(&fault->mutex);
+}
+
void iommufd_fault_domain_detach_dev(struct iommufd_hw_pagetable *hwpt,
struct iommufd_device *idev)
{
+ struct iommufd_fault *fault = hwpt->fault;
+ struct iommu_attach_handle *handle;
+
if (WARN_ON(!hwpt->fault_capable))
return;

+ handle = iommu_attach_handle_get(idev->igroup->group, IOMMU_NO_PASID);
iommu_detach_group(hwpt->domain, idev->igroup->group);
iommufd_fault_iopf_disable(idev);
+ iommufd_auto_response_handle(fault, handle);
+ iommu_attach_handle_put(handle);
}

Best regards,
baolu

2024-04-06 06:11:42

by Baolu Lu

[permalink] [raw]
Subject: Re: [PATCH v4 2/9] iommu: Replace sva_iommu with iommu_attach_handle

On 4/3/24 7:59 PM, Jason Gunthorpe wrote:
> On Wed, Apr 03, 2024 at 09:15:12AM +0800, Lu Baolu wrote:
>> + /* A bond already exists, just take a reference. */
>> + handle = iommu_attach_handle_get(group, iommu_mm->pasid);
>> + if (handle) {
>> + mutex_unlock(&iommu_sva_lock);
>> + return handle;
>> }
> At least in this context this is not enough we need to ensure that the
> domain on the PASID is actually an SVA domain and it was installed by
> this mechanism, not an iommufd domain for instance.
>
> ie you probably need a type field in the iommu_attach_handle to tell
> what the priv is.
>
> Otherwise this seems like a great idea!

Yes, you are right. For the SVA case, I will add the following changes.
The IOMMUFD path will also need such an enhancement. I will update it in
the next version.

diff --git a/drivers/iommu/iommu-priv.h b/drivers/iommu/iommu-priv.h
index 08c0667cef54..9aee70f87a21 100644
--- a/drivers/iommu/iommu-priv.h
+++ b/drivers/iommu/iommu-priv.h
@@ -28,9 +28,22 @@ void iommu_device_unregister_bus(struct iommu_device *iommu,
const struct bus_type *bus,
struct notifier_block *nb);

+enum attach_handle_type {
+ ATTACH_HANDLE_TYPE_DEFAULT = 0,
+ ATTACH_HANDLE_TYPE_SVA,
+ ATTACH_HANDLE_TYPE_IOMMUFD,
+};
+
struct iommu_attach_handle {
struct iommu_domain *domain;
refcount_t users;
+
+ /*
+ * Set by the attach interface callers. The type field could be used
+ * by the caller to identify whether the priv field was installed by
+ * them.
+ */
+ enum attach_handle_type type;
void *priv;
};

diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
index c66cf26137bf..3eb664cc3f3a 100644
--- a/drivers/iommu/iommu-sva.c
+++ b/drivers/iommu/iommu-sva.c
@@ -90,7 +90,11 @@ struct iommu_attach_handle *iommu_sva_bind_device(struct device *dev, struct mm_
handle = iommu_attach_handle_get(group, iommu_mm->pasid);
if (handle) {
mutex_unlock(&iommu_sva_lock);
- return handle;
+ if (handle->type == ATTACH_HANDLE_TYPE_SVA)
+ return handle;
+
+ iommu_attach_handle_put(handle);
+ return ERR_PTR(-EBUSY);
}

/* Search for an existing domain. */
@@ -118,6 +122,7 @@ struct iommu_attach_handle *iommu_sva_bind_device(struct device *dev, struct mm_
out:
handle = iommu_attach_handle_get(group, iommu_mm->pasid);
mutex_unlock(&iommu_sva_lock);
+ handle->type = ATTACH_HANDLE_TYPE_SVA;
handle->priv = dev;

return handle;

Best regards,
baolu

2024-04-06 08:58:56

by Baolu Lu

[permalink] [raw]
Subject: Re: [PATCH v4 2/9] iommu: Replace sva_iommu with iommu_attach_handle

On 4/3/24 7:59 PM, Jason Gunthorpe wrote:
>> - iommu_detach_device_pasid(domain, dev, iommu_mm->pasid);
>> - if (--domain->users == 0) {
>> - list_del(&domain->next);
>> - iommu_domain_free(domain);
>> + iommu_attach_handle_put(handle);
>> + if (refcount_read(&handle->users) == 1) {
>> + iommu_detach_device_pasid(domain, dev, iommu_mm->pasid);
>> + if (--domain->users == 0) {
>> + list_del(&domain->next);
>> + iommu_domain_free(domain);
>> + }
>> }
> Though I'm not convinced the refcount should be elevated into the core
> structure. The prior patch I showed you where the caller can provide
> the memory for the handle and we don't have a priv would make it easy
> to put the refcount in a SVA dervied handle struct without more
> allocation. Then we don't need this weirdness.

It's fine to move the refcount out of the core and allow the caller to
specify and manage its own attach handle. The refcount would then be
managed by the SVA code.
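
As a sketch of that direction (the struct layout and helper below are
illustrative, not code from the posted series), SVA could embed the core
handle in its own derived struct and keep the refcount there:

	struct iommu_sva {
		struct iommu_attach_handle handle;	/* embedded core handle */
		struct device *dev;
		refcount_t users;	/* SVA-private bond refcount */
	};

	static inline struct iommu_sva *to_iommu_sva(struct iommu_attach_handle *h)
	{
		return container_of(h, struct iommu_sva, handle);
	}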

For the IOMMUFD case, we've discussed that all outstanding iopf's
should be automatically responded to in the detach process. This ensures
the attach handle won't be used once the detach process completes.
Therefore, if this is true, there appears to be no need for a refcount
for IOMMUFD.

>
>> mutex_unlock(&iommu_sva_lock);
>> - kfree(handle);
> Also do we need iommu_sva_lock here anymore? I wonder if the group
> mutex would be sufficient..

The iommu_sva_lock protects the whole process of an mm binding, from
PASID allocation to domain attachment, while the group mutex only
protects the data within its structure. I don't think we can replace
iommu_sva_lock with the group mutex in this patch. Or am I misunderstanding?

Best regards,
baolu

2024-04-08 14:06:05

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH v4 1/9] iommu: Introduce domain attachment handle

On Sat, Apr 06, 2024 at 12:34:14PM +0800, Baolu Lu wrote:
> On 4/3/24 7:58 PM, Jason Gunthorpe wrote:
> > On Wed, Apr 03, 2024 at 09:15:11AM +0800, Lu Baolu wrote:
> > > Currently, when attaching a domain to a device or its PASID, domain is
> > > stored within the iommu group. It could be retrieved for use during the
> > > window between attachment and detachment.
> > >
> > > With new features introduced, there's a need to store more information
> > > than just a domain pointer. This information essentially represents the
> > > association between a domain and a device. For example, the SVA code
> > > already has a custom struct iommu_sva which represents a bond between
> > > sva domain and a PASID of a device. Looking forward, the IOMMUFD needs
> > > a place to store the iommufd_device pointer in the core, so that the
> > > device object ID could be quickly retrieved in the critical fault handling
> > > path.
> > >
> > > Introduce domain attachment handle that explicitly represents the
> > > attachment relationship between a domain and a device or its PASID.
> > > A caller-specific data field can be used by the caller to store additional
> > > information beyond a domain pointer, depending on its specific use case.
> > >
> > > Co-developed-by: Jason Gunthorpe<[email protected]>
> > > Signed-off-by: Jason Gunthorpe<[email protected]>
> > > Signed-off-by: Lu Baolu<[email protected]>
> > > ---
> > > drivers/iommu/iommu-priv.h | 9 +++
> > > drivers/iommu/iommu.c | 158 +++++++++++++++++++++++++++++++++----
> > > 2 files changed, 153 insertions(+), 14 deletions(-)
> > >
> > > diff --git a/drivers/iommu/iommu-priv.h b/drivers/iommu/iommu-priv.h
> > > index 5f731d994803..08c0667cef54 100644
> > > --- a/drivers/iommu/iommu-priv.h
> > > +++ b/drivers/iommu/iommu-priv.h
> > > @@ -28,4 +28,13 @@ void iommu_device_unregister_bus(struct iommu_device *iommu,
> > > const struct bus_type *bus,
> > > struct notifier_block *nb);
> > > +struct iommu_attach_handle {
> > > + struct iommu_domain *domain;
> > > + refcount_t users;
> > I don't understand how the refcounting can be generally useful. There
> > is no way to free this:
> >
> > > + void *priv;
> > When the refcount goes to zero.
>
> This field is set by the caller, so the caller ensures that the pointer
> can only be freed after iommu domain detachment. For iopf, the caller
> should automatically respond to all outstanding iopf's in the domain
> detach path.
>
> In the sva case, which uses the workqueue to handle iopf,
> flush_workqueue() is called in the domain detach path to ensure that all
> outstanding iopf's are completed before detach completion.

Which is back to what is the point of the refcount at all?

> +static void iommufd_auto_response_handle(struct iommufd_fault *fault,
> + struct iommu_attach_handle *handle)
> +{
> + struct iommufd_device *idev = handle->priv;

The caller already has the iommufd_device, don't need the handle.

> + struct iopf_group *group;
> + unsigned long index;
> +
> + mutex_lock(&fault->mutex);
> + xa_for_each(&idev->faults, index, group) {
> + xa_erase(&idev->faults, index);
> + iopf_group_response(group, IOMMU_PAGE_RESP_INVALID);
> + }
> + mutex_unlock(&fault->mutex);

This makes sense, yes..

> void iommufd_fault_domain_detach_dev(struct iommufd_hw_pagetable *hwpt,
> struct iommufd_device *idev)
> {
> + struct iommufd_fault *fault = hwpt->fault;
> + struct iommu_attach_handle *handle;
> +
> if (WARN_ON(!hwpt->fault_capable))
> return;
>
> + handle = iommu_attach_handle_get(idev->igroup->group, IOMMU_NO_PASID);
> iommu_detach_group(hwpt->domain, idev->igroup->group);
> iommufd_fault_iopf_disable(idev);

But is this right? Couldn't there be PASIDs doing PRI?

Jason

2024-04-08 14:21:32

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH v4 2/9] iommu: Replace sva_iommu with iommu_attach_handle

On Sat, Apr 06, 2024 at 02:09:34PM +0800, Baolu Lu wrote:
> On 4/3/24 7:59 PM, Jason Gunthorpe wrote:
> > On Wed, Apr 03, 2024 at 09:15:12AM +0800, Lu Baolu wrote:
> > > + /* A bond already exists, just take a reference. */
> > > + handle = iommu_attach_handle_get(group, iommu_mm->pasid);
> > > + if (handle) {
> > > + mutex_unlock(&iommu_sva_lock);
> > > + return handle;
> > > }
> > At least in this context this is not enough we need to ensure that the
> > domain on the PASID is actually an SVA domain and it was installed by
> > this mechanism, not an iommufd domain for instance.
> >
> > ie you probably need a type field in the iommu_attach_handle to tell
> > what the priv is.
> >
> > Otherwise this seems like a great idea!
>
> Yes, you are right. For the SVA case, I will add the following changes.
> The IOMMUFD path will also need such enhancement. I will update it in
> the next version.

The only use for this is the PRI callbacks right? Maybe instead of
adding a handle type let's just check domain->iopf_handler ?

Ie SVA will pass &iommu_sva_iopf_handler as its "type"

Jason

2024-04-09 01:35:32

by Baolu Lu

[permalink] [raw]
Subject: Re: [PATCH v4 1/9] iommu: Introduce domain attachment handle

On 4/8/24 10:05 PM, Jason Gunthorpe wrote:
> On Sat, Apr 06, 2024 at 12:34:14PM +0800, Baolu Lu wrote:
>> On 4/3/24 7:58 PM, Jason Gunthorpe wrote:
>>> On Wed, Apr 03, 2024 at 09:15:11AM +0800, Lu Baolu wrote:
>>>> Currently, when attaching a domain to a device or its PASID, domain is
>>>> stored within the iommu group. It could be retrieved for use during the
>>>> window between attachment and detachment.
>>>>
>>>> With new features introduced, there's a need to store more information
>>>> than just a domain pointer. This information essentially represents the
>>>> association between a domain and a device. For example, the SVA code
>>>> already has a custom struct iommu_sva which represents a bond between
>>>> sva domain and a PASID of a device. Looking forward, the IOMMUFD needs
>>>> a place to store the iommufd_device pointer in the core, so that the
>>>> device object ID could be quickly retrieved in the critical fault handling
>>>> path.
>>>>
>>>> Introduce domain attachment handle that explicitly represents the
>>>> attachment relationship between a domain and a device or its PASID.
>>>> A caller-specific data field can be used by the caller to store additional
>>>> information beyond a domain pointer, depending on its specific use case.
>>>>
>>>> Co-developed-by: Jason Gunthorpe<[email protected]>
>>>> Signed-off-by: Jason Gunthorpe<[email protected]>
>>>> Signed-off-by: Lu Baolu<[email protected]>
>>>> ---
>>>> drivers/iommu/iommu-priv.h | 9 +++
>>>> drivers/iommu/iommu.c | 158 +++++++++++++++++++++++++++++++++----
>>>> 2 files changed, 153 insertions(+), 14 deletions(-)
>>>>
>>>> diff --git a/drivers/iommu/iommu-priv.h b/drivers/iommu/iommu-priv.h
>>>> index 5f731d994803..08c0667cef54 100644
>>>> --- a/drivers/iommu/iommu-priv.h
>>>> +++ b/drivers/iommu/iommu-priv.h
>>>> @@ -28,4 +28,13 @@ void iommu_device_unregister_bus(struct iommu_device *iommu,
>>>> const struct bus_type *bus,
>>>> struct notifier_block *nb);
>>>> +struct iommu_attach_handle {
>>>> + struct iommu_domain *domain;
>>>> + refcount_t users;
>>> I don't understand how the refcounting can be generally useful. There
>>> is no way to free this:
>>>
>>>> + void *priv;
>>> When the refcount goes to zero.
>> This field is set by the caller, so the caller ensures that the pointer
>> can only be freed after iommu domain detachment. For iopf, the caller
>> should automatically respond to all outstanding iopf's in the domain
>> detach path.
>>
>> In the sva case, which uses the workqueue to handle iopf,
>> flush_workqueue() is called in the domain detach path to ensure that all
>> outstanding iopf's are completed before detach completion.
> Which is back to what is the point of the refcount at all?

Yeah, the refcount is not generally useful. It's context-specific, so it
needs to move out of the core.

The SVA code needs the refcount because it allows multiple attachments
between an SVA domain and a PASID. This is not a common case.

>
>> +static void iommufd_auto_response_handle(struct iommufd_fault *fault,
>> + struct iommu_attach_handle *handle)
>> +{
>> + struct iommufd_device *idev = handle->priv;
> The caller already has the iommufd_device, don't need the handle.

Yes.

Best regards,
baolu

2024-04-09 01:54:45

by Baolu Lu

[permalink] [raw]
Subject: Re: [PATCH v4 1/9] iommu: Introduce domain attachment handle

On 4/8/24 10:05 PM, Jason Gunthorpe wrote:
>> void iommufd_fault_domain_detach_dev(struct iommufd_hw_pagetable *hwpt,
>> struct iommufd_device *idev)
>> {
>> + struct iommufd_fault *fault = hwpt->fault;
>> + struct iommu_attach_handle *handle;
>> +
>> if (WARN_ON(!hwpt->fault_capable))
>> return;
>>
>> + handle = iommu_attach_handle_get(idev->igroup->group, IOMMU_NO_PASID);
>> iommu_detach_group(hwpt->domain, idev->igroup->group);
>> iommufd_fault_iopf_disable(idev);
> But is this right? Couldn't there be PASIDs doing PRI?

As far as I can see, there are two types of user PASID.

1. When a device is assigned to userspace, the PASID table is managed by
the userspace.

Userspace doesn't need PASID attach/detach/replace uAPIs in this
scenario. All I/O page faults are directed to userspace through the
hw pagetable attached to the RID.

   If the hw pagetable is detached from the RID, or a non-iopf-capable
   hw pagetable is attached to the RID, PRI for user PASIDs is already
   broken.

2. When a device is assigned to userspace, the PASID table is managed by
the host kernel. Userspace then needs PASID attach/detach/replace
uAPIs to manage the hw page table for each PASID. Each PASID has its
own hw page table for handling I/O page faults.

Here, disabling PRI is only safe after all iopf-capable hw page
tables for both the RID and all PASIDs are detached.

The current code base doesn't yet support PASID attach/detach/replace
uAPIs. Therefore, the above code is safe and reasonable. However, we will
need to revisit this code when those APIs become available.
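
As an illustration of what that revisit might look like (the iopf_lock
and iopf_enabled fields are assumptions, not part of this series as
posted), iopf enablement could be refcounted per device so that PRI is
only disabled when the last iopf-capable attachment goes away:

	static void iommufd_fault_iopf_disable(struct iommufd_device *idev)
	{
		mutex_lock(&idev->iopf_lock);
		/* Disable IOPF only when the last iopf-capable user is gone. */
		if (!WARN_ON(idev->iopf_enabled == 0) && --idev->iopf_enabled == 0)
			iommu_dev_disable_feature(idev->dev, IOMMU_DEV_FEAT_IOPF);
		mutex_unlock(&idev->iopf_lock);
	}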

Please correct me if my understanding is incorrect.

Best regards,
baolu

2024-04-09 02:12:48

by Baolu Lu

[permalink] [raw]
Subject: Re: [PATCH v4 2/9] iommu: Replace sva_iommu with iommu_attach_handle

On 4/8/24 10:19 PM, Jason Gunthorpe wrote:
> On Sat, Apr 06, 2024 at 02:09:34PM +0800, Baolu Lu wrote:
>> On 4/3/24 7:59 PM, Jason Gunthorpe wrote:
>>> On Wed, Apr 03, 2024 at 09:15:12AM +0800, Lu Baolu wrote:
>>>> + /* A bond already exists, just take a reference. */
>>>> + handle = iommu_attach_handle_get(group, iommu_mm->pasid);
>>>> + if (handle) {
>>>> + mutex_unlock(&iommu_sva_lock);
>>>> + return handle;
>>>> }
>>> At least in this context this is not enough we need to ensure that the
>>> domain on the PASID is actually an SVA domain and it was installed by
>>> this mechanism, not an iommufd domain for instance.
>>>
>>> ie you probably need a type field in the iommu_attach_handle to tell
>>> what the priv is.
>>>
>>> Otherwise this seems like a great idea!
>> Yes, you are right. For the SVA case, I will add the following changes.
>> The IOMMUFD path will also need such enhancement. I will update it in
>> the next version.
> The only use for this is the PRI callbacks right? Maybe instead of
> adding a handle type let's just check domain->iopf_handler ?
>
> Ie SVA will pass &iommu_sva_iopf_handler as its "type"

Sorry that I don't fully understand the proposal here.

We need to get the attach handle in at least the following cases:

1. In the iommu_sva_bind_device() path so that the existing bind could
be reused.

2. In the iommu_report_device_fault() path so that the context-specific
data could be used in the fault handler.

The problem is that the context code (SVA, IOMMUFD, etc.) needs to make
sure that the attach handle is really what it has installed during
domain attachment. The context code needs some mechanism to include some
kind of "owner cookie" in the attach handle, so that it could check
against it later for valid use.

Best regards,
baolu

2024-04-09 23:38:10

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH v4 1/9] iommu: Introduce domain attachment handle

On Tue, Apr 09, 2024 at 09:53:26AM +0800, Baolu Lu wrote:
> On 4/8/24 10:05 PM, Jason Gunthorpe wrote:
> > > void iommufd_fault_domain_detach_dev(struct iommufd_hw_pagetable *hwpt,
> > > struct iommufd_device *idev)
> > > {
> > > + struct iommufd_fault *fault = hwpt->fault;
> > > + struct iommu_attach_handle *handle;
> > > +
> > > if (WARN_ON(!hwpt->fault_capable))
> > > return;
> > >
> > > + handle = iommu_attach_handle_get(idev->igroup->group, IOMMU_NO_PASID);
> > > iommu_detach_group(hwpt->domain, idev->igroup->group);
> > > iommufd_fault_iopf_disable(idev);
> > But is this right? Couldn't there be PASIDs doing PRI?
>
> As far as I can see, there are two types of user PASID.
>
> 1. When a device is assigned to userspace, the PASID table is managed by
> the userspace.
>
> Userspace doesn't need PASID attach/detach/replace uAPIs in this
> scenario. All I/O page faults are directed to userspace through the
> hw pagetable attached to the RID.
>
> If hw pagetable is detached from the RID, or a non-iopf-capable
> hw pagetable is attached the RID, the PRI for user PASID is already
> broken.

I would say in this case the special nesting HWPT should have an
indication of whether PRI should be supported when it is created, and
that should drive the PRI enablement.

> The current code base doesn't yet support PASID attach/detach/replace
> uAPIs. Therefore, above code is safe and reasonable. However, we will
> need to revisit this code when those APIs become available.

Okay, I see.

Can we do the PASID iommufd stuff soon? What is the plan here?

Jason

2024-04-09 23:49:07

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH v4 2/9] iommu: Replace sva_iommu with iommu_attach_handle

On Tue, Apr 09, 2024 at 10:11:28AM +0800, Baolu Lu wrote:
> On 4/8/24 10:19 PM, Jason Gunthorpe wrote:
> > On Sat, Apr 06, 2024 at 02:09:34PM +0800, Baolu Lu wrote:
> > > On 4/3/24 7:59 PM, Jason Gunthorpe wrote:
> > > > On Wed, Apr 03, 2024 at 09:15:12AM +0800, Lu Baolu wrote:
> > > > > + /* A bond already exists, just take a reference. */
> > > > > + handle = iommu_attach_handle_get(group, iommu_mm->pasid);
> > > > > + if (handle) {
> > > > > + mutex_unlock(&iommu_sva_lock);
> > > > > + return handle;
> > > > > }
> > > > At least in this context this is not enough we need to ensure that the
> > > > domain on the PASID is actually an SVA domain and it was installed by
> > > > this mechanism, not an iommufd domain for instance.
> > > >
> > > > ie you probably need a type field in the iommu_attach_handle to tell
> > > > what the priv is.
> > > >
> > > > Otherwise this seems like a great idea!
> > > Yes, you are right. For the SVA case, I will add the following changes.
> > > The IOMMUFD path will also need such enhancement. I will update it in
> > > the next version.
> > The only use for this is the PRI callbacks right? Maybe instead of
> > adding a handle type let's just check domain->iopf_handler ?
> >
> > Ie SVA will pass &iommu_sva_iopf_handler as its "type"
>
> Sorry that I don't fully understand the proposal here.

I was talking specifically about the type field you suggested adding
to the handle struct.

Instead of adding a type field, check the domain->iopf_handler to
determine the domain and thus the handle type.

> The problem is that the context code (SVA, IOMMUFD, etc.) needs to make
> sure that the attach handle is really what it has installed during
> domain attachment. The context code needs some mechanism to include some
> kind of "owner cookie" in the attach handle, so that it could check
> against it later for valid use.

Right, you have a derived struct for each user and you need a way to
check if casting from the general handle struct to the derived struct
is OK.

I'm suggesting using domain->iopf_handler as the type key.

Jason

2024-04-10 00:25:58

by Tian, Kevin

[permalink] [raw]
Subject: RE: [PATCH v4 1/9] iommu: Introduce domain attachment handle

> From: Jason Gunthorpe <[email protected]>
> Sent: Wednesday, April 10, 2024 7:38 AM
>
> On Tue, Apr 09, 2024 at 09:53:26AM +0800, Baolu Lu wrote:
>
> > The current code base doesn't yet support PASID attach/detach/replace
> > uAPIs. Therefore, above code is safe and reasonable. However, we will
> > need to revisit this code when those APIs become available.
>
> Okay, I see.
>
> Can we do the PASID iommufd stuff soon? What is the plan here?
>

Yi is working on that. He will send it out once the last open issue
about ida is handled.

2024-04-10 07:16:36

by Baolu Lu

[permalink] [raw]
Subject: Re: [PATCH v4 2/9] iommu: Replace sva_iommu with iommu_attach_handle

On 4/10/24 7:48 AM, Jason Gunthorpe wrote:
> On Tue, Apr 09, 2024 at 10:11:28AM +0800, Baolu Lu wrote:
>> On 4/8/24 10:19 PM, Jason Gunthorpe wrote:
>>> On Sat, Apr 06, 2024 at 02:09:34PM +0800, Baolu Lu wrote:
>>>> On 4/3/24 7:59 PM, Jason Gunthorpe wrote:
>>>>> On Wed, Apr 03, 2024 at 09:15:12AM +0800, Lu Baolu wrote:
>>>>>> + /* A bond already exists, just take a reference. */
>>>>>> + handle = iommu_attach_handle_get(group, iommu_mm->pasid);
>>>>>> + if (handle) {
>>>>>> + mutex_unlock(&iommu_sva_lock);
>>>>>> + return handle;
>>>>>> }
>>>>> At least in this context this is not enough we need to ensure that the
>>>>> domain on the PASID is actually an SVA domain and it was installed by
>>>>> this mechanism, not an iommufd domain for instance.
>>>>>
>>>>> ie you probably need a type field in the iommu_attach_handle to tell
>>>>> what the priv is.
>>>>>
>>>>> Otherwise this seems like a great idea!
>>>> Yes, you are right. For the SVA case, I will add the following changes.
>>>> The IOMMUFD path will also need such enhancement. I will update it in
>>>> the next version.
>>> The only use for this is the PRI callbacks right? Maybe instead of
>>> adding a handle type let's just check domain->iopf_handler ?
>>>
>>> Ie SVA will pass &iommu_sva_iopf_handler as its "type"
>> Sorry that I don't fully understand the proposal here.
> I was talking specifically about the type field you suggested adding
> to the handle struct.
>
> Instead of adding a type field check the domain->iopf_handler to
> determine the domain and thus handle type.
>
>> The problem is that the context code (SVA, IOMMUFD, etc.) needs to make
>> sure that the attach handle is really what it has installed during
>> domain attachment. The context code needs some mechanism to include some
>> kind of "owner cookie" in the attach handle, so that it could check
>> against it later for valid use.
> Right, you have a derived struct for each user and you need a way to
> check if casting from the general handle struct to the derived struct
> is OK.
>
> I'm suggesting using domain->iopf_handler as the type key.

Oh, I see. It works. Thanks!

Best regards,
baolu

2024-04-28 10:22:42

by Baolu Lu

[permalink] [raw]
Subject: Re: [PATCH v4 2/9] iommu: Replace sva_iommu with iommu_attach_handle

On 2024/4/10 7:48, Jason Gunthorpe wrote:
> On Tue, Apr 09, 2024 at 10:11:28AM +0800, Baolu Lu wrote:
>> On 4/8/24 10:19 PM, Jason Gunthorpe wrote:
>>> On Sat, Apr 06, 2024 at 02:09:34PM +0800, Baolu Lu wrote:
>>>> On 4/3/24 7:59 PM, Jason Gunthorpe wrote:
>>>>> On Wed, Apr 03, 2024 at 09:15:12AM +0800, Lu Baolu wrote:
>>>>>> + /* A bond already exists, just take a reference. */
>>>>>> + handle = iommu_attach_handle_get(group, iommu_mm->pasid);
>>>>>> + if (handle) {
>>>>>> + mutex_unlock(&iommu_sva_lock);
>>>>>> + return handle;
>>>>>> }
>>>>> At least in this context this is not enough we need to ensure that the
>>>>> domain on the PASID is actually an SVA domain and it was installed by
>>>>> this mechanism, not an iommufd domain for instance.
>>>>>
>>>>> ie you probably need a type field in the iommu_attach_handle to tell
>>>>> what the priv is.
>>>>>
>>>>> Otherwise this seems like a great idea!
>>>> Yes, you are right. For the SVA case, I will add the following changes.
>>>> The IOMMUFD path will also need such enhancement. I will update it in
>>>> the next version.
>>> The only use for this is the PRI callbacks right? Maybe instead of
>>> adding a handle type let's just check domain->iopf_handler ?
>>>
>>> Ie SVA will pass &iommu_sva_iopf_handler as its "type"
>> Sorry that I don't fully understand the proposal here.
> I was talking specifically about the type field you suggested adding
> to the handle struct.
>
> Instead of adding a type field check the domain->iopf_handler to
> determine the domain and thus handle type.
>
>> The problem is that the context code (SVA, IOMMUFD, etc.) needs to make
>> sure that the attach handle is really what it has installed during
>> domain attachment. The context code needs some mechanism to include some
>> kind of "owner cookie" in the attach handle, so that it could check
>> against it later for valid use.
> Right, you have a derived struct for each user and you need a way to
> check if casting from the general handle struct to the derived struct
> is OK.
>
> I'm suggesting using domain->iopf_handler as the type key.

After removing the refcount from the attach handle, I am trying to make
the code look like this,

	/* A bond already exists, just take a reference. */
handle = iommu_attach_handle_get(group, iommu_mm->pasid);
if (handle) {
if (handle->domain->iopf_handler !=
iommu_sva_iopf_handler) {
ret = -EBUSY;
goto out_unlock;
}

refcount_inc(&handle->users);
mutex_unlock(&iommu_sva_lock);
return handle;
}

But it appears that this code is not lock-safe. If the domain on the
PASID is not an SVA domain, the check of "handle->domain->iopf_handler !=
iommu_sva_iopf_handler" could result in a use-after-free issue, as
another thread might detach the domain between the fetch and the check
lines.

Probably we still need to keep the refcount in the attach handle?

Best regards,
baolu

2024-04-29 02:39:36

by Tian, Kevin

[permalink] [raw]
Subject: RE: [PATCH v4 2/9] iommu: Replace sva_iommu with iommu_attach_handle

> From: Baolu Lu <[email protected]>
> Sent: Sunday, April 28, 2024 6:22 PM
>
> On 2024/4/10 7:48, Jason Gunthorpe wrote:
> > On Tue, Apr 09, 2024 at 10:11:28AM +0800, Baolu Lu wrote:
> >> On 4/8/24 10:19 PM, Jason Gunthorpe wrote:
> >>> On Sat, Apr 06, 2024 at 02:09:34PM +0800, Baolu Lu wrote:
> >>>> On 4/3/24 7:59 PM, Jason Gunthorpe wrote:
> >>>>> On Wed, Apr 03, 2024 at 09:15:12AM +0800, Lu Baolu wrote:
> >>>>>> + /* A bond already exists, just take a reference. */
> >>>>>> + handle = iommu_attach_handle_get(group, iommu_mm->pasid);
> >>>>>> + if (handle) {
> >>>>>> + mutex_unlock(&iommu_sva_lock);
> >>>>>> + return handle;
> >>>>>> }
> >>>>> At least in this context this is not enough we need to ensure that the
> >>>>> domain on the PASID is actually an SVA domain and it was installed by
> >>>>> this mechanism, not an iommufd domain for instance.
> >>>>>
> >>>>> ie you probably need a type field in the iommu_attach_handle to tell
> >>>>> what the priv is.
> >>>>>
> >>>>> Otherwise this seems like a great idea!
> >>>> Yes, you are right. For the SVA case, I will add the following changes.
> >>>> The IOMMUFD path will also need such enhancement. I will update it in
> >>>> the next version.
> >>> The only use for this is the PRI callbacks right? Maybe instead of
> >>> adding a handle type let's just check domain->iopf_handler ?
> >>>
> >>> Ie SVA will pass &iommu_sva_iopf_handler as its "type"
> >> Sorry that I don't fully understand the proposal here.
> > I was talking specifically about the type field you suggested adding
> > to the handle struct.
> >
> > Instead of adding a type field check the domain->iopf_handler to
> > determine the domain and thus handle type.
> >
> >> The problem is that the context code (SVA, IOMMUFD, etc.) needs to make
> >> sure that the attach handle is really what it has installed during
> >> domain attachment. The context code needs some mechanism to include some
> >> kind of "owner cookie" in the attach handle, so that it could check
> >> against it later for valid use.
> > Right, you have a derived struct for each user and you need a way to
> > check if casting from the general handle struct to the derived struct
> > is OK.
> >
> > I'm suggesting using domain->iopf_handler as the type key.
>
> After removing the refcount from the attach handle, I am trying to make
> the code look like this,
>
> /* A bond already exists, just take a reference. */
> handle = iommu_attach_handle_get(group, iommu_mm->pasid);
> if (handle) {
> if (handle->domain->iopf_handler !=
> iommu_sva_iopf_handler) {
> ret = -EBUSY;
> goto out_unlock;
> }
>
> refcount_inc(&handle->users);
> mutex_unlock(&iommu_sva_lock);
> return handle;
> }
>
> But it appears that this code is not lock safe. If the domain on the
> PASID is not a SVA domain, the check of "handle->domain->iopf_handler !=
> iommu_sva_iopf_handler" could result in a use-after-free issue as the
> other thread might detach the domain in between the fetch and check
> lines.
>
> Probably we still need to keep the refcount in the attach handle?
>

What about Jason's another comment in his original replies?

"
Though I'm not convinced the refcount should be elevated into the core
structure. The prior patch I showed you where the caller can provide
the memory for the handle and we don't have a priv would make it easy
> to put the refcount in an SVA derived handle struct without more
allocation. Then we don't need this weirdness.
"

That sounds like we'll need an iommu_sva-like structure to hold
its own refcnt. Then we don't need this type check and refcnt
in the core.

2024-04-29 05:08:54

by Baolu Lu

[permalink] [raw]
Subject: Re: [PATCH v4 2/9] iommu: Replace sva_iommu with iommu_attach_handle

On 4/29/24 10:39 AM, Tian, Kevin wrote:
>> From: Baolu Lu <[email protected]>
>> Sent: Sunday, April 28, 2024 6:22 PM
>>
>> On 2024/4/10 7:48, Jason Gunthorpe wrote:
>>> On Tue, Apr 09, 2024 at 10:11:28AM +0800, Baolu Lu wrote:
>>>> On 4/8/24 10:19 PM, Jason Gunthorpe wrote:
>>>>> On Sat, Apr 06, 2024 at 02:09:34PM +0800, Baolu Lu wrote:
>>>>>> On 4/3/24 7:59 PM, Jason Gunthorpe wrote:
>>>>>>> On Wed, Apr 03, 2024 at 09:15:12AM +0800, Lu Baolu wrote:
>>>>>>>> + /* A bond already exists, just take a reference. */
>>>>>>>> + handle = iommu_attach_handle_get(group, iommu_mm->pasid);
>>>>>>>> + if (handle) {
>>>>>>>> + mutex_unlock(&iommu_sva_lock);
>>>>>>>> + return handle;
>>>>>>>> }
>>>>>>> At least in this context this is not enough we need to ensure that the
>>>>>>> domain on the PASID is actually an SVA domain and it was installed by
>>>>>>> this mechanism, not an iommufd domain for instance.
>>>>>>>
>>>>>>> ie you probably need a type field in the iommu_attach_handle to tell
>>>>>>> what the priv is.
>>>>>>>
>>>>>>> Otherwise this seems like a great idea!
>>>>>> Yes, you are right. For the SVA case, I will add the following changes.
>>>>>> The IOMMUFD path will also need such enhancement. I will update it in
>>>>>> the next version.
>>>>> The only use for this is the PRI callbacks right? Maybe instead of
>>>>> adding a handle type let's just check domain->iopf_handler ?
>>>>>
>>>>> Ie SVA will pass &iommu_sva_iopf_handler as its "type"
>>>> Sorry that I don't fully understand the proposal here.
>>> I was talking specifically about the type field you suggested adding
>>> to the handle struct.
>>>
>>> Instead of adding a type field check the domain->iopf_handler to
>>> determine the domain and thus handle type.
>>>
>>>> The problem is that the context code (SVA, IOMMUFD, etc.) needs to make
>>>> sure that the attach handle is really what it has installed during
>>>> domain attachment. The context code needs some mechanism to include some
>>>> kind of "owner cookie" in the attach handle, so that it could check
>>>> against it later for valid use.
>>> Right, you have a derived struct for each user and you need a way to
>>> check if casting from the general handle struct to the derived struct
>>> is OK.
>>>
>>> I'm suggesting using domain->iopf_handler as the type key.
>>
>> After removing the refcount from the attach handle, I am trying to make
>> the code look like this,
>>
>> /* A bond already exists, just take a reference. */
>> handle = iommu_attach_handle_get(group, iommu_mm->pasid);
>> if (handle) {
>> if (handle->domain->iopf_handler !=
>> iommu_sva_iopf_handler) {
>> ret = -EBUSY;
>> goto out_unlock;
>> }
>>
>> refcount_inc(&handle->users);
>> mutex_unlock(&iommu_sva_lock);
>> return handle;
>> }
>>
>> But it appears that this code is not lock safe. If the domain on the
>> PASID is not a SVA domain, the check of "handle->domain->iopf_handler !=
>> iommu_sva_iopf_handler" could result in a use-after-free issue as the
>> other thread might detach the domain in between the fetch and check
>> lines.
>>
>> Probably we still need to keep the refcount in the attach handle?
>>
>
> What about Jason's another comment in his original replies?
>
> "
> Though I'm not convinced the refcount should be elevated into the core
> structure. The prior patch I showed you where the caller can provide
> the memory for the handle and we don't have a priv would make it easy
> to put the refcount in a SVA dervied handle struct without more
> allocation. Then we don't need this weirdness.
> "
>
> That sounds like we'll need a iommu_sva like structure to hold
> its own refcnt. Then we don't need this type check and refcnt
> in the core.

The problem I'm facing isn't about who allocates the handle memory.
Instead, there's no mechanism to synchronize access between two threads.
One thread might remove the handle while another fetches and reads a
member of its structure.

A similar issue exists with iommu_get_domain_for_dev_pasid(). It fetches
and returns a domain, but there's no guarantee that the domain will
*not* be freed while the caller is still using it.

One reason I introduced the reference count for attach handles is to
potentially replace iommu_get_domain_for_dev_pasid(), allowing the
domain to be accessible without any potential UAF issue.

Best regards,
baolu

2024-04-29 20:25:11

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH v4 2/9] iommu: Replace sva_iommu with iommu_attach_handle

On Sun, Apr 28, 2024 at 06:22:28PM +0800, Baolu Lu wrote:

> /* A bond already exists, just take a reference. */
> handle = iommu_attach_handle_get(group, iommu_mm->pasid);
> if (handle) {
> if (handle->domain->iopf_handler != iommu_sva_iopf_handler) {
> ret = -EBUSY;
> goto out_unlock;
> }
>
> refcount_inc(&handle->users);
> mutex_unlock(&iommu_sva_lock);
> return handle;
> }
>
> But it appears that this code is not lock safe. If the domain on the
> PASID is not a SVA domain, the check of "handle->domain->iopf_handler !=
> iommu_sva_iopf_handler" could result in a use-after-free issue as the
> other thread might detach the domain in between the fetch and check
> lines.

For the above you just need to pass in the iommu_sva_iopf_handler as
an argument to attach_handle_get() and have it check it under the
xa_lock.
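
A minimal sketch of that suggestion (the extra "owner" parameter and the
exact signature are illustrative):

	struct iommu_attach_handle *
	iommu_attach_handle_get(struct iommu_group *group, ioasid_t pasid,
				int (*owner)(struct iopf_group *))
	{
		struct iommu_attach_handle *handle;

		/*
		 * The owner check runs under xa_lock, so the domain cannot
		 * be detached and freed between the load and the comparison.
		 */
		xa_lock(&group->pasid_array);
		handle = xa_load(&group->pasid_array, pasid);
		if (handle && owner && handle->domain->iopf_handler != owner)
			handle = ERR_PTR(-EBUSY);
		xa_unlock(&group->pasid_array);

		return handle;
	}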

The whole thing is already protected under the ugly sva_lock.

Ideally it would be protected by the group mutex..

Jason