2023-12-07 06:48:24

by Baolu Lu

[permalink] [raw]
Subject: [PATCH v8 00/12] iommu: Prepare to deliver page faults to user space

When a user-managed page table is attached to an IOMMU, it is necessary
to deliver IO page faults to user space so that they can be handled
appropriately. One use case for this is nested translation, which is
currently being discussed in the mailing list.

I have posted a RFC series [1] that describes the implementation of
delivering page faults to user space through IOMMUFD. This series has
received several comments on the IOMMU refactoring, which I am trying to
address in this series.

The major refactoring includes:

- [PATCH 01 ~ 04] Move include/uapi/linux/iommu.h to
include/linux/iommu.h. Remove the unrecoverable fault data definition.
- [PATCH 05 ~ 06] Remove iommu_[un]register_device_fault_handler().
- [PATCH 07 ~ 10] Separate SVA and IOPF. Make IOPF a generic page fault
handling framework.
- [PATCH 11 ~ 12] Improve iopf framework for iommufd use.

This is also available at github [2].

[1] https://lore.kernel.org/linux-iommu/[email protected]/
[2] https://github.com/LuBaolu/intel-iommu/commits/preparatory-io-pgfault-delivery-v8

Change log:

v8:
- Drop PATCH 12/12 as it is no longer necessary to drain page requests page requests
during PASID translation changes.
- Separate PATCH 11/12 into two distinct patches. The first patch refines locking
scheme for protecting per-device fault data, while the second patch replaces
mutex with RCU to enhance locking efficiency.
- PATCH 01 ~ 10 are in good shapes now.

v7: https://lore.kernel.org/linux-iommu/[email protected]/
- Rebase to v6.7-rc1.
- Export iopf_group_response() for global use.
- Release lock when calling iopf handler.
- The whole series has been verified to work for SVA case on Intel
platforms by Zhao Yan. Add her Tested-by to affected patches.

v6: https://lore.kernel.org/linux-iommu/[email protected]/
- [PATCH 09/12] Check IS_ERR() against the iommu domain. [Jingqi/Jason]
- [PATCH 12/12] Rename the comments and name of iopf_queue_flush_dev(),
no functionality changes. [Kevin]
- All patches rebased on the latest iommu/core branch.

v5: https://lore.kernel.org/linux-iommu/[email protected]/
- Consolidate per-device fault data management. (New patch 11)
- Improve iopf_queue_flush_dev(). (New patch 12)

v4: https://lore.kernel.org/linux-iommu/[email protected]/
- Merge iommu_fault_event and iopf_fault. They are duplicate.
- Move iommu_report_device_fault() and iommu_page_response() to
io-pgfault.c.
- Move iommu_sva_domain_alloc() to iommu-sva.c.
- Add group->domain and use it directly in sva fault handler.
- Misc code refactoring and refining.

v3: https://lore.kernel.org/linux-iommu/[email protected]/
- Convert the fault data structures from uAPI to kAPI.
- Merge iopf_device_param into iommu_fault_param.
- Add debugging on domain lifetime for iopf.
- Remove patch "iommu: Change the return value of dev_iommu_get()".
- Remove patch "iommu: Add helper to set iopf handler for domain".
- Misc code refactoring and refining.

v2: https://lore.kernel.org/linux-iommu/[email protected]/
- Remove unrecoverable fault data definition as suggested by Kevin.
- Drop the per-device fault cookie code considering that doesn't make
much sense for SVA.
- Make the IOMMU page fault handling framework generic. So that it can
available for use cases other than SVA.

v1: https://lore.kernel.org/linux-iommu/[email protected]/

Lu Baolu (12):
iommu: Move iommu fault data to linux/iommu.h
iommu/arm-smmu-v3: Remove unrecoverable faults reporting
iommu: Remove unrecoverable fault data
iommu: Cleanup iopf data structure definitions
iommu: Merge iopf_device_param into iommu_fault_param
iommu: Remove iommu_[un]register_device_fault_handler()
iommu: Merge iommu_fault_event and iopf_fault
iommu: Prepare for separating SVA and IOPF
iommu: Make iommu_queue_iopf() more generic
iommu: Separate SVA and IOPF
iommu: Refine locking for per-device fault data management
iommu: Use refcount for fault data access

include/linux/iommu.h | 267 +++++++++---
drivers/iommu/intel/iommu.h | 2 +-
drivers/iommu/iommu-sva.h | 71 ---
include/uapi/linux/iommu.h | 161 -------
.../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 14 +-
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 51 +--
drivers/iommu/intel/iommu.c | 25 +-
drivers/iommu/intel/svm.c | 6 +-
drivers/iommu/io-pgfault.c | 412 +++++++++++-------
drivers/iommu/iommu-sva.c | 67 ++-
drivers/iommu/iommu.c | 233 ----------
MAINTAINERS | 1 -
drivers/iommu/Kconfig | 4 +
drivers/iommu/Makefile | 3 +-
drivers/iommu/intel/Kconfig | 1 +
15 files changed, 551 insertions(+), 767 deletions(-)
delete mode 100644 drivers/iommu/iommu-sva.h
delete mode 100644 include/uapi/linux/iommu.h

--
2.34.1


2023-12-07 06:48:32

by Baolu Lu

[permalink] [raw]
Subject: [PATCH v8 02/12] iommu/arm-smmu-v3: Remove unrecoverable faults reporting

No device driver registers fault handler to handle the reported
unrecoveraable faults. Remove it to avoid dead code.

Signed-off-by: Lu Baolu <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
Reviewed-by: Jason Gunthorpe <[email protected]>
Tested-by: Longfang Liu <[email protected]>
---
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 46 ++++++---------------
1 file changed, 13 insertions(+), 33 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index fc4317c25b6d..0f33b212ca28 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -1463,7 +1463,6 @@ arm_smmu_find_master(struct arm_smmu_device *smmu, u32 sid)
static int arm_smmu_handle_evt(struct arm_smmu_device *smmu, u64 *evt)
{
int ret;
- u32 reason;
u32 perm = 0;
struct arm_smmu_master *master;
bool ssid_valid = evt[0] & EVTQ_0_SSV;
@@ -1473,16 +1472,9 @@ static int arm_smmu_handle_evt(struct arm_smmu_device *smmu, u64 *evt)

switch (FIELD_GET(EVTQ_0_ID, evt[0])) {
case EVT_ID_TRANSLATION_FAULT:
- reason = IOMMU_FAULT_REASON_PTE_FETCH;
- break;
case EVT_ID_ADDR_SIZE_FAULT:
- reason = IOMMU_FAULT_REASON_OOR_ADDRESS;
- break;
case EVT_ID_ACCESS_FAULT:
- reason = IOMMU_FAULT_REASON_ACCESS;
- break;
case EVT_ID_PERMISSION_FAULT:
- reason = IOMMU_FAULT_REASON_PERMISSION;
break;
default:
return -EOPNOTSUPP;
@@ -1492,6 +1484,9 @@ static int arm_smmu_handle_evt(struct arm_smmu_device *smmu, u64 *evt)
if (evt[1] & EVTQ_1_S2)
return -EFAULT;

+ if (!(evt[1] & EVTQ_1_STALL))
+ return -EOPNOTSUPP;
+
if (evt[1] & EVTQ_1_RnW)
perm |= IOMMU_FAULT_PERM_READ;
else
@@ -1503,32 +1498,17 @@ static int arm_smmu_handle_evt(struct arm_smmu_device *smmu, u64 *evt)
if (evt[1] & EVTQ_1_PnU)
perm |= IOMMU_FAULT_PERM_PRIV;

- if (evt[1] & EVTQ_1_STALL) {
- flt->type = IOMMU_FAULT_PAGE_REQ;
- flt->prm = (struct iommu_fault_page_request) {
- .flags = IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE,
- .grpid = FIELD_GET(EVTQ_1_STAG, evt[1]),
- .perm = perm,
- .addr = FIELD_GET(EVTQ_2_ADDR, evt[2]),
- };
+ flt->type = IOMMU_FAULT_PAGE_REQ;
+ flt->prm = (struct iommu_fault_page_request) {
+ .flags = IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE,
+ .grpid = FIELD_GET(EVTQ_1_STAG, evt[1]),
+ .perm = perm,
+ .addr = FIELD_GET(EVTQ_2_ADDR, evt[2]),
+ };

- if (ssid_valid) {
- flt->prm.flags |= IOMMU_FAULT_PAGE_REQUEST_PASID_VALID;
- flt->prm.pasid = FIELD_GET(EVTQ_0_SSID, evt[0]);
- }
- } else {
- flt->type = IOMMU_FAULT_DMA_UNRECOV;
- flt->event = (struct iommu_fault_unrecoverable) {
- .reason = reason,
- .flags = IOMMU_FAULT_UNRECOV_ADDR_VALID,
- .perm = perm,
- .addr = FIELD_GET(EVTQ_2_ADDR, evt[2]),
- };
-
- if (ssid_valid) {
- flt->event.flags |= IOMMU_FAULT_UNRECOV_PASID_VALID;
- flt->event.pasid = FIELD_GET(EVTQ_0_SSID, evt[0]);
- }
+ if (ssid_valid) {
+ flt->prm.flags |= IOMMU_FAULT_PAGE_REQUEST_PASID_VALID;
+ flt->prm.pasid = FIELD_GET(EVTQ_0_SSID, evt[0]);
}

mutex_lock(&smmu->streams_mutex);
--
2.34.1

2023-12-07 06:48:35

by Baolu Lu

[permalink] [raw]
Subject: [PATCH v8 04/12] iommu: Cleanup iopf data structure definitions

struct iommu_fault_page_request and struct iommu_page_response are not
part of uAPI anymore. Convert them to data structures for kAPI.

Signed-off-by: Lu Baolu <[email protected]>
Reviewed-by: Jason Gunthorpe <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
Reviewed-by: Yi Liu <[email protected]>
Tested-by: Yan Zhao <[email protected]>
Tested-by: Longfang Liu <[email protected]>
---
include/linux/iommu.h | 27 +++++++++++----------------
drivers/iommu/io-pgfault.c | 1 -
drivers/iommu/iommu.c | 4 ----
3 files changed, 11 insertions(+), 21 deletions(-)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index c1e4369b1957..ca539df70a67 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -71,12 +71,12 @@ struct iommu_fault_page_request {
#define IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE (1 << 1)
#define IOMMU_FAULT_PAGE_REQUEST_PRIV_DATA (1 << 2)
#define IOMMU_FAULT_PAGE_RESPONSE_NEEDS_PASID (1 << 3)
- __u32 flags;
- __u32 pasid;
- __u32 grpid;
- __u32 perm;
- __u64 addr;
- __u64 private_data[2];
+ u32 flags;
+ u32 pasid;
+ u32 grpid;
+ u32 perm;
+ u64 addr;
+ u64 private_data[2];
};

/**
@@ -85,7 +85,7 @@ struct iommu_fault_page_request {
* @prm: Page Request message, when @type is %IOMMU_FAULT_PAGE_REQ
*/
struct iommu_fault {
- __u32 type;
+ u32 type;
struct iommu_fault_page_request prm;
};

@@ -106,8 +106,6 @@ enum iommu_page_response_code {

/**
* struct iommu_page_response - Generic page response information
- * @argsz: User filled size of this data
- * @version: API version of this structure
* @flags: encodes whether the corresponding fields are valid
* (IOMMU_FAULT_PAGE_RESPONSE_* values)
* @pasid: Process Address Space ID
@@ -115,14 +113,11 @@ enum iommu_page_response_code {
* @code: response code from &enum iommu_page_response_code
*/
struct iommu_page_response {
- __u32 argsz;
-#define IOMMU_PAGE_RESP_VERSION_1 1
- __u32 version;
#define IOMMU_PAGE_RESP_PASID_VALID (1 << 0)
- __u32 flags;
- __u32 pasid;
- __u32 grpid;
- __u32 code;
+ u32 flags;
+ u32 pasid;
+ u32 grpid;
+ u32 code;
};


diff --git a/drivers/iommu/io-pgfault.c b/drivers/iommu/io-pgfault.c
index e5b8b9110c13..24b5545352ae 100644
--- a/drivers/iommu/io-pgfault.c
+++ b/drivers/iommu/io-pgfault.c
@@ -56,7 +56,6 @@ static int iopf_complete_group(struct device *dev, struct iopf_fault *iopf,
enum iommu_page_response_code status)
{
struct iommu_page_response resp = {
- .version = IOMMU_PAGE_RESP_VERSION_1,
.pasid = iopf->fault.prm.pasid,
.grpid = iopf->fault.prm.grpid,
.code = status,
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index d0a28667479a..2064946e2bb4 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1483,10 +1483,6 @@ int iommu_page_response(struct device *dev,
if (!param || !param->fault_param)
return -EINVAL;

- if (msg->version != IOMMU_PAGE_RESP_VERSION_1 ||
- msg->flags & ~IOMMU_PAGE_RESP_PASID_VALID)
- return -EINVAL;
-
/* Only send response if there is a fault report pending */
mutex_lock(&param->fault_param->lock);
if (list_empty(&param->fault_param->faults)) {
--
2.34.1

2023-12-07 06:48:52

by Baolu Lu

[permalink] [raw]
Subject: [PATCH v8 03/12] iommu: Remove unrecoverable fault data

The unrecoverable fault data is not used anywhere. Remove it to avoid
dead code.

Suggested-by: Kevin Tian <[email protected]>
Signed-off-by: Lu Baolu <[email protected]>
Reviewed-by: Jason Gunthorpe <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
Tested-by: Yan Zhao <[email protected]>
Tested-by: Longfang Liu <[email protected]>
---
include/linux/iommu.h | 72 ++-----------------------------------------
1 file changed, 2 insertions(+), 70 deletions(-)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index d683ebf152bd..c1e4369b1957 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -50,67 +50,7 @@ struct iommu_dma_cookie;

/* Generic fault types, can be expanded IRQ remapping fault */
enum iommu_fault_type {
- IOMMU_FAULT_DMA_UNRECOV = 1, /* unrecoverable fault */
- IOMMU_FAULT_PAGE_REQ, /* page request fault */
-};
-
-enum iommu_fault_reason {
- IOMMU_FAULT_REASON_UNKNOWN = 0,
-
- /* Could not access the PASID table (fetch caused external abort) */
- IOMMU_FAULT_REASON_PASID_FETCH,
-
- /* PASID entry is invalid or has configuration errors */
- IOMMU_FAULT_REASON_BAD_PASID_ENTRY,
-
- /*
- * PASID is out of range (e.g. exceeds the maximum PASID
- * supported by the IOMMU) or disabled.
- */
- IOMMU_FAULT_REASON_PASID_INVALID,
-
- /*
- * An external abort occurred fetching (or updating) a translation
- * table descriptor
- */
- IOMMU_FAULT_REASON_WALK_EABT,
-
- /*
- * Could not access the page table entry (Bad address),
- * actual translation fault
- */
- IOMMU_FAULT_REASON_PTE_FETCH,
-
- /* Protection flag check failed */
- IOMMU_FAULT_REASON_PERMISSION,
-
- /* access flag check failed */
- IOMMU_FAULT_REASON_ACCESS,
-
- /* Output address of a translation stage caused Address Size fault */
- IOMMU_FAULT_REASON_OOR_ADDRESS,
-};
-
-/**
- * struct iommu_fault_unrecoverable - Unrecoverable fault data
- * @reason: reason of the fault, from &enum iommu_fault_reason
- * @flags: parameters of this fault (IOMMU_FAULT_UNRECOV_* values)
- * @pasid: Process Address Space ID
- * @perm: requested permission access using by the incoming transaction
- * (IOMMU_FAULT_PERM_* values)
- * @addr: offending page address
- * @fetch_addr: address that caused a fetch abort, if any
- */
-struct iommu_fault_unrecoverable {
- __u32 reason;
-#define IOMMU_FAULT_UNRECOV_PASID_VALID (1 << 0)
-#define IOMMU_FAULT_UNRECOV_ADDR_VALID (1 << 1)
-#define IOMMU_FAULT_UNRECOV_FETCH_ADDR_VALID (1 << 2)
- __u32 flags;
- __u32 pasid;
- __u32 perm;
- __u64 addr;
- __u64 fetch_addr;
+ IOMMU_FAULT_PAGE_REQ = 1, /* page request fault */
};

/**
@@ -142,19 +82,11 @@ struct iommu_fault_page_request {
/**
* struct iommu_fault - Generic fault data
* @type: fault type from &enum iommu_fault_type
- * @padding: reserved for future use (should be zero)
- * @event: fault event, when @type is %IOMMU_FAULT_DMA_UNRECOV
* @prm: Page Request message, when @type is %IOMMU_FAULT_PAGE_REQ
- * @padding2: sets the fault size to allow for future extensions
*/
struct iommu_fault {
__u32 type;
- __u32 padding;
- union {
- struct iommu_fault_unrecoverable event;
- struct iommu_fault_page_request prm;
- __u8 padding2[56];
- };
+ struct iommu_fault_page_request prm;
};

/**
--
2.34.1

2023-12-07 06:48:58

by Baolu Lu

[permalink] [raw]
Subject: [PATCH v8 05/12] iommu: Merge iopf_device_param into iommu_fault_param

The struct dev_iommu contains two pointers, fault_param and iopf_param.
The fault_param pointer points to a data structure that is used to store
pending faults that are awaiting responses. The iopf_param pointer points
to a data structure that is used to store partial faults that are part of
a Page Request Group.

The fault_param and iopf_param pointers are essentially duplicate. This
causes memory waste. Merge the iopf_device_param pointer into the
iommu_fault_param pointer to consolidate the code and save memory. The
consolidated pointer would be allocated on demand when the device driver
enables the iopf on device, and would be freed after iopf is disabled.

Signed-off-by: Lu Baolu <[email protected]>
Reviewed-by: Jason Gunthorpe <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
Tested-by: Yan Zhao <[email protected]>
Tested-by: Longfang Liu <[email protected]>
---
include/linux/iommu.h | 18 ++++--
drivers/iommu/io-pgfault.c | 110 ++++++++++++++++++-------------------
drivers/iommu/iommu.c | 34 ++----------
3 files changed, 72 insertions(+), 90 deletions(-)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index ca539df70a67..8a9b71191597 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -42,6 +42,7 @@ struct notifier_block;
struct iommu_sva;
struct iommu_fault_event;
struct iommu_dma_cookie;
+struct iopf_queue;

#define IOMMU_FAULT_PERM_READ (1 << 0) /* read */
#define IOMMU_FAULT_PERM_WRITE (1 << 1) /* write */
@@ -590,21 +591,31 @@ struct iommu_fault_event {
* struct iommu_fault_param - per-device IOMMU fault data
* @handler: Callback function to handle IOMMU faults at device level
* @data: handler private data
- * @faults: holds the pending faults which needs response
* @lock: protect pending faults list
+ * @dev: the device that owns this param
+ * @queue: IOPF queue
+ * @queue_list: index into queue->devices
+ * @partial: faults that are part of a Page Request Group for which the last
+ * request hasn't been submitted yet.
+ * @faults: holds the pending faults which need response
*/
struct iommu_fault_param {
iommu_dev_fault_handler_t handler;
void *data;
+ struct mutex lock;
+
+ struct device *dev;
+ struct iopf_queue *queue;
+ struct list_head queue_list;
+
+ struct list_head partial;
struct list_head faults;
- struct mutex lock;
};

/**
* struct dev_iommu - Collection of per-device IOMMU data
*
* @fault_param: IOMMU detected device fault reporting data
- * @iopf_param: I/O Page Fault queue and data
* @fwspec: IOMMU fwspec data
* @iommu_dev: IOMMU device this device is linked to
* @priv: IOMMU Driver private data
@@ -620,7 +631,6 @@ struct iommu_fault_param {
struct dev_iommu {
struct mutex lock;
struct iommu_fault_param *fault_param;
- struct iopf_device_param *iopf_param;
struct iommu_fwspec *fwspec;
struct iommu_device *iommu_dev;
void *priv;
diff --git a/drivers/iommu/io-pgfault.c b/drivers/iommu/io-pgfault.c
index 24b5545352ae..f948303b2a91 100644
--- a/drivers/iommu/io-pgfault.c
+++ b/drivers/iommu/io-pgfault.c
@@ -25,21 +25,6 @@ struct iopf_queue {
struct mutex lock;
};

-/**
- * struct iopf_device_param - IO Page Fault data attached to a device
- * @dev: the device that owns this param
- * @queue: IOPF queue
- * @queue_list: index into queue->devices
- * @partial: faults that are part of a Page Request Group for which the last
- * request hasn't been submitted yet.
- */
-struct iopf_device_param {
- struct device *dev;
- struct iopf_queue *queue;
- struct list_head queue_list;
- struct list_head partial;
-};
-
struct iopf_fault {
struct iommu_fault fault;
struct list_head list;
@@ -144,7 +129,7 @@ int iommu_queue_iopf(struct iommu_fault *fault, void *cookie)
int ret;
struct iopf_group *group;
struct iopf_fault *iopf, *next;
- struct iopf_device_param *iopf_param;
+ struct iommu_fault_param *iopf_param;

struct device *dev = cookie;
struct dev_iommu *param = dev->iommu;
@@ -159,7 +144,7 @@ int iommu_queue_iopf(struct iommu_fault *fault, void *cookie)
* As long as we're holding param->lock, the queue can't be unlinked
* from the device and therefore cannot disappear.
*/
- iopf_param = param->iopf_param;
+ iopf_param = param->fault_param;
if (!iopf_param)
return -ENODEV;

@@ -229,14 +214,14 @@ EXPORT_SYMBOL_GPL(iommu_queue_iopf);
int iopf_queue_flush_dev(struct device *dev)
{
int ret = 0;
- struct iopf_device_param *iopf_param;
+ struct iommu_fault_param *iopf_param;
struct dev_iommu *param = dev->iommu;

if (!param)
return -ENODEV;

mutex_lock(&param->lock);
- iopf_param = param->iopf_param;
+ iopf_param = param->fault_param;
if (iopf_param)
flush_workqueue(iopf_param->queue->wq);
else
@@ -260,7 +245,7 @@ EXPORT_SYMBOL_GPL(iopf_queue_flush_dev);
int iopf_queue_discard_partial(struct iopf_queue *queue)
{
struct iopf_fault *iopf, *next;
- struct iopf_device_param *iopf_param;
+ struct iommu_fault_param *iopf_param;

if (!queue)
return -EINVAL;
@@ -287,34 +272,36 @@ EXPORT_SYMBOL_GPL(iopf_queue_discard_partial);
*/
int iopf_queue_add_device(struct iopf_queue *queue, struct device *dev)
{
- int ret = -EBUSY;
- struct iopf_device_param *iopf_param;
+ int ret = 0;
struct dev_iommu *param = dev->iommu;
-
- if (!param)
- return -ENODEV;
-
- iopf_param = kzalloc(sizeof(*iopf_param), GFP_KERNEL);
- if (!iopf_param)
- return -ENOMEM;
-
- INIT_LIST_HEAD(&iopf_param->partial);
- iopf_param->queue = queue;
- iopf_param->dev = dev;
+ struct iommu_fault_param *fault_param;

mutex_lock(&queue->lock);
mutex_lock(&param->lock);
- if (!param->iopf_param) {
- list_add(&iopf_param->queue_list, &queue->devices);
- param->iopf_param = iopf_param;
- ret = 0;
+ if (param->fault_param) {
+ ret = -EBUSY;
+ goto done_unlock;
}
+
+ fault_param = kzalloc(sizeof(*fault_param), GFP_KERNEL);
+ if (!fault_param) {
+ ret = -ENOMEM;
+ goto done_unlock;
+ }
+
+ mutex_init(&fault_param->lock);
+ INIT_LIST_HEAD(&fault_param->faults);
+ INIT_LIST_HEAD(&fault_param->partial);
+ fault_param->dev = dev;
+ list_add(&fault_param->queue_list, &queue->devices);
+ fault_param->queue = queue;
+
+ param->fault_param = fault_param;
+
+done_unlock:
mutex_unlock(&param->lock);
mutex_unlock(&queue->lock);

- if (ret)
- kfree(iopf_param);
-
return ret;
}
EXPORT_SYMBOL_GPL(iopf_queue_add_device);
@@ -330,34 +317,41 @@ EXPORT_SYMBOL_GPL(iopf_queue_add_device);
*/
int iopf_queue_remove_device(struct iopf_queue *queue, struct device *dev)
{
- int ret = -EINVAL;
+ int ret = 0;
struct iopf_fault *iopf, *next;
- struct iopf_device_param *iopf_param;
struct dev_iommu *param = dev->iommu;
-
- if (!param || !queue)
- return -EINVAL;
+ struct iommu_fault_param *fault_param = param->fault_param;

mutex_lock(&queue->lock);
mutex_lock(&param->lock);
- iopf_param = param->iopf_param;
- if (iopf_param && iopf_param->queue == queue) {
- list_del(&iopf_param->queue_list);
- param->iopf_param = NULL;
- ret = 0;
+ if (!fault_param) {
+ ret = -ENODEV;
+ goto unlock;
}
- mutex_unlock(&param->lock);
- mutex_unlock(&queue->lock);
- if (ret)
- return ret;
+
+ if (fault_param->queue != queue) {
+ ret = -EINVAL;
+ goto unlock;
+ }
+
+ if (!list_empty(&fault_param->faults)) {
+ ret = -EBUSY;
+ goto unlock;
+ }
+
+ list_del(&fault_param->queue_list);

/* Just in case some faults are still stuck */
- list_for_each_entry_safe(iopf, next, &iopf_param->partial, list)
+ list_for_each_entry_safe(iopf, next, &fault_param->partial, list)
kfree(iopf);

- kfree(iopf_param);
+ param->fault_param = NULL;
+ kfree(fault_param);
+unlock:
+ mutex_unlock(&param->lock);
+ mutex_unlock(&queue->lock);

- return 0;
+ return ret;
}
EXPORT_SYMBOL_GPL(iopf_queue_remove_device);

@@ -403,7 +397,7 @@ EXPORT_SYMBOL_GPL(iopf_queue_alloc);
*/
void iopf_queue_free(struct iopf_queue *queue)
{
- struct iopf_device_param *iopf_param, *next;
+ struct iommu_fault_param *iopf_param, *next;

if (!queue)
return;
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 2064946e2bb4..ca55fed4fbff 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1344,27 +1344,18 @@ int iommu_register_device_fault_handler(struct device *dev,
struct dev_iommu *param = dev->iommu;
int ret = 0;

- if (!param)
+ if (!param || !param->fault_param)
return -EINVAL;

mutex_lock(&param->lock);
/* Only allow one fault handler registered for each device */
- if (param->fault_param) {
+ if (param->fault_param->handler) {
ret = -EBUSY;
goto done_unlock;
}

- get_device(dev);
- param->fault_param = kzalloc(sizeof(*param->fault_param), GFP_KERNEL);
- if (!param->fault_param) {
- put_device(dev);
- ret = -ENOMEM;
- goto done_unlock;
- }
param->fault_param->handler = handler;
param->fault_param->data = data;
- mutex_init(&param->fault_param->lock);
- INIT_LIST_HEAD(&param->fault_param->faults);

done_unlock:
mutex_unlock(&param->lock);
@@ -1385,29 +1376,16 @@ EXPORT_SYMBOL_GPL(iommu_register_device_fault_handler);
int iommu_unregister_device_fault_handler(struct device *dev)
{
struct dev_iommu *param = dev->iommu;
- int ret = 0;

- if (!param)
+ if (!param || !param->fault_param)
return -EINVAL;

mutex_lock(&param->lock);
-
- if (!param->fault_param)
- goto unlock;
-
- /* we cannot unregister handler if there are pending faults */
- if (!list_empty(&param->fault_param->faults)) {
- ret = -EBUSY;
- goto unlock;
- }
-
- kfree(param->fault_param);
- param->fault_param = NULL;
- put_device(dev);
-unlock:
+ param->fault_param->handler = NULL;
+ param->fault_param->data = NULL;
mutex_unlock(&param->lock);

- return ret;
+ return 0;
}
EXPORT_SYMBOL_GPL(iommu_unregister_device_fault_handler);

--
2.34.1

2023-12-07 06:49:16

by Baolu Lu

[permalink] [raw]
Subject: [PATCH v8 06/12] iommu: Remove iommu_[un]register_device_fault_handler()

The individual iommu driver reports the iommu page faults by calling
iommu_report_device_fault(), where a pre-registered device fault handler
is called to route the fault to another fault handler installed on the
corresponding iommu domain.

The pre-registered device fault handler is static and won't be dynamic
as the fault handler is eventually per iommu domain. Replace calling
device fault handler with iommu_queue_iopf().

After this replacement, the registering and unregistering fault handler
interfaces are not needed anywhere. Remove the interfaces and the related
data structures to avoid dead code.

Convert cookie parameter of iommu_queue_iopf() into a device pointer that
is really passed.

Signed-off-by: Lu Baolu <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
Reviewed-by: Jason Gunthorpe <[email protected]>
Tested-by: Yan Zhao <[email protected]>
Tested-by: Longfang Liu <[email protected]>
---
include/linux/iommu.h | 23 ------
drivers/iommu/iommu-sva.h | 4 +-
.../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 13 +---
drivers/iommu/intel/iommu.c | 24 ++----
drivers/iommu/io-pgfault.c | 6 +-
drivers/iommu/iommu.c | 76 +------------------
6 files changed, 13 insertions(+), 133 deletions(-)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 8a9b71191597..6a74711dc47a 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -128,7 +128,6 @@ struct iommu_page_response {

typedef int (*iommu_fault_handler_t)(struct iommu_domain *,
struct device *, unsigned long, int, void *);
-typedef int (*iommu_dev_fault_handler_t)(struct iommu_fault *, void *);

struct iommu_domain_geometry {
dma_addr_t aperture_start; /* First address that can be mapped */
@@ -589,8 +588,6 @@ struct iommu_fault_event {

/**
* struct iommu_fault_param - per-device IOMMU fault data
- * @handler: Callback function to handle IOMMU faults at device level
- * @data: handler private data
* @lock: protect pending faults list
* @dev: the device that owns this param
* @queue: IOPF queue
@@ -600,8 +597,6 @@ struct iommu_fault_event {
* @faults: holds the pending faults which need response
*/
struct iommu_fault_param {
- iommu_dev_fault_handler_t handler;
- void *data;
struct mutex lock;

struct device *dev;
@@ -724,11 +719,6 @@ extern int iommu_group_for_each_dev(struct iommu_group *group, void *data,
extern struct iommu_group *iommu_group_get(struct device *dev);
extern struct iommu_group *iommu_group_ref_get(struct iommu_group *group);
extern void iommu_group_put(struct iommu_group *group);
-extern int iommu_register_device_fault_handler(struct device *dev,
- iommu_dev_fault_handler_t handler,
- void *data);
-
-extern int iommu_unregister_device_fault_handler(struct device *dev);

extern int iommu_report_device_fault(struct device *dev,
struct iommu_fault_event *evt);
@@ -1138,19 +1128,6 @@ static inline void iommu_group_put(struct iommu_group *group)
{
}

-static inline
-int iommu_register_device_fault_handler(struct device *dev,
- iommu_dev_fault_handler_t handler,
- void *data)
-{
- return -ENODEV;
-}
-
-static inline int iommu_unregister_device_fault_handler(struct device *dev)
-{
- return 0;
-}
-
static inline
int iommu_report_device_fault(struct device *dev, struct iommu_fault_event *evt)
{
diff --git a/drivers/iommu/iommu-sva.h b/drivers/iommu/iommu-sva.h
index 54946b5a7caf..de7819c796ce 100644
--- a/drivers/iommu/iommu-sva.h
+++ b/drivers/iommu/iommu-sva.h
@@ -13,7 +13,7 @@ struct iommu_fault;
struct iopf_queue;

#ifdef CONFIG_IOMMU_SVA
-int iommu_queue_iopf(struct iommu_fault *fault, void *cookie);
+int iommu_queue_iopf(struct iommu_fault *fault, struct device *dev);

int iopf_queue_add_device(struct iopf_queue *queue, struct device *dev);
int iopf_queue_remove_device(struct iopf_queue *queue,
@@ -26,7 +26,7 @@ enum iommu_page_response_code
iommu_sva_handle_iopf(struct iommu_fault *fault, void *data);

#else /* CONFIG_IOMMU_SVA */
-static inline int iommu_queue_iopf(struct iommu_fault *fault, void *cookie)
+static inline int iommu_queue_iopf(struct iommu_fault *fault, struct device *dev)
{
return -ENODEV;
}
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
index 353248ab18e7..84c9554144cb 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
@@ -480,7 +480,6 @@ bool arm_smmu_master_sva_enabled(struct arm_smmu_master *master)

static int arm_smmu_master_sva_enable_iopf(struct arm_smmu_master *master)
{
- int ret;
struct device *dev = master->dev;

/*
@@ -493,16 +492,7 @@ static int arm_smmu_master_sva_enable_iopf(struct arm_smmu_master *master)
if (!master->iopf_enabled)
return -EINVAL;

- ret = iopf_queue_add_device(master->smmu->evtq.iopf, dev);
- if (ret)
- return ret;
-
- ret = iommu_register_device_fault_handler(dev, iommu_queue_iopf, dev);
- if (ret) {
- iopf_queue_remove_device(master->smmu->evtq.iopf, dev);
- return ret;
- }
- return 0;
+ return iopf_queue_add_device(master->smmu->evtq.iopf, dev);
}

static void arm_smmu_master_sva_disable_iopf(struct arm_smmu_master *master)
@@ -512,7 +502,6 @@ static void arm_smmu_master_sva_disable_iopf(struct arm_smmu_master *master)
if (!master->iopf_enabled)
return;

- iommu_unregister_device_fault_handler(dev);
iopf_queue_remove_device(master->smmu->evtq.iopf, dev);
}

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 897159dba47d..17f7690f4919 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -4620,23 +4620,15 @@ static int intel_iommu_enable_iopf(struct device *dev)
if (ret)
return ret;

- ret = iommu_register_device_fault_handler(dev, iommu_queue_iopf, dev);
- if (ret)
- goto iopf_remove_device;
-
ret = pci_enable_pri(pdev, PRQ_DEPTH);
- if (ret)
- goto iopf_unregister_handler;
+ if (ret) {
+ iopf_queue_remove_device(iommu->iopf_queue, dev);
+ return ret;
+ }
+
info->pri_enabled = 1;

return 0;
-
-iopf_unregister_handler:
- iommu_unregister_device_fault_handler(dev);
-iopf_remove_device:
- iopf_queue_remove_device(iommu->iopf_queue, dev);
-
- return ret;
}

static int intel_iommu_disable_iopf(struct device *dev)
@@ -4659,11 +4651,9 @@ static int intel_iommu_disable_iopf(struct device *dev)
info->pri_enabled = 0;

/*
- * With PRI disabled and outstanding PRQs drained, unregistering
- * fault handler and removing device from iopf queue should never
- * fail.
+ * With PRI disabled and outstanding PRQs drained, removing device
+ * from iopf queue should never fail.
*/
- WARN_ON(iommu_unregister_device_fault_handler(dev));
WARN_ON(iopf_queue_remove_device(iommu->iopf_queue, dev));

return 0;
diff --git a/drivers/iommu/io-pgfault.c b/drivers/iommu/io-pgfault.c
index f948303b2a91..4fda01de5589 100644
--- a/drivers/iommu/io-pgfault.c
+++ b/drivers/iommu/io-pgfault.c
@@ -87,7 +87,7 @@ static void iopf_handler(struct work_struct *work)
/**
* iommu_queue_iopf - IO Page Fault handler
* @fault: fault event
- * @cookie: struct device, passed to iommu_register_device_fault_handler.
+ * @dev: struct device.
*
* Add a fault to the device workqueue, to be handled by mm.
*
@@ -124,14 +124,12 @@ static void iopf_handler(struct work_struct *work)
*
* Return: 0 on success and <0 on error.
*/
-int iommu_queue_iopf(struct iommu_fault *fault, void *cookie)
+int iommu_queue_iopf(struct iommu_fault *fault, struct device *dev)
{
int ret;
struct iopf_group *group;
struct iopf_fault *iopf, *next;
struct iommu_fault_param *iopf_param;
-
- struct device *dev = cookie;
struct dev_iommu *param = dev->iommu;

lockdep_assert_held(&param->lock);
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index ca55fed4fbff..9e2f399044bf 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1319,76 +1319,6 @@ void iommu_group_put(struct iommu_group *group)
}
EXPORT_SYMBOL_GPL(iommu_group_put);

-/**
- * iommu_register_device_fault_handler() - Register a device fault handler
- * @dev: the device
- * @handler: the fault handler
- * @data: private data passed as argument to the handler
- *
- * When an IOMMU fault event is received, this handler gets called with the
- * fault event and data as argument. The handler should return 0 on success. If
- * the fault is recoverable (IOMMU_FAULT_PAGE_REQ), the consumer should also
- * complete the fault by calling iommu_page_response() with one of the following
- * response code:
- * - IOMMU_PAGE_RESP_SUCCESS: retry the translation
- * - IOMMU_PAGE_RESP_INVALID: terminate the fault
- * - IOMMU_PAGE_RESP_FAILURE: terminate the fault and stop reporting
- * page faults if possible.
- *
- * Return 0 if the fault handler was installed successfully, or an error.
- */
-int iommu_register_device_fault_handler(struct device *dev,
- iommu_dev_fault_handler_t handler,
- void *data)
-{
- struct dev_iommu *param = dev->iommu;
- int ret = 0;
-
- if (!param || !param->fault_param)
- return -EINVAL;
-
- mutex_lock(&param->lock);
- /* Only allow one fault handler registered for each device */
- if (param->fault_param->handler) {
- ret = -EBUSY;
- goto done_unlock;
- }
-
- param->fault_param->handler = handler;
- param->fault_param->data = data;
-
-done_unlock:
- mutex_unlock(&param->lock);
-
- return ret;
-}
-EXPORT_SYMBOL_GPL(iommu_register_device_fault_handler);
-
-/**
- * iommu_unregister_device_fault_handler() - Unregister the device fault handler
- * @dev: the device
- *
- * Remove the device fault handler installed with
- * iommu_register_device_fault_handler().
- *
- * Return 0 on success, or an error.
- */
-int iommu_unregister_device_fault_handler(struct device *dev)
-{
- struct dev_iommu *param = dev->iommu;
-
- if (!param || !param->fault_param)
- return -EINVAL;
-
- mutex_lock(&param->lock);
- param->fault_param->handler = NULL;
- param->fault_param->data = NULL;
- mutex_unlock(&param->lock);
-
- return 0;
-}
-EXPORT_SYMBOL_GPL(iommu_unregister_device_fault_handler);
-
/**
* iommu_report_device_fault() - Report fault event to device driver
* @dev: the device
@@ -1413,10 +1343,6 @@ int iommu_report_device_fault(struct device *dev, struct iommu_fault_event *evt)
/* we only report device fault if there is a handler registered */
mutex_lock(&param->lock);
fparam = param->fault_param;
- if (!fparam || !fparam->handler) {
- ret = -EINVAL;
- goto done_unlock;
- }

if (evt->fault.type == IOMMU_FAULT_PAGE_REQ &&
(evt->fault.prm.flags & IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE)) {
@@ -1431,7 +1357,7 @@ int iommu_report_device_fault(struct device *dev, struct iommu_fault_event *evt)
mutex_unlock(&fparam->lock);
}

- ret = fparam->handler(&evt->fault, fparam->data);
+ ret = iommu_queue_iopf(&evt->fault, dev);
if (ret && evt_pending) {
mutex_lock(&fparam->lock);
list_del(&evt_pending->list);
--
2.34.1

2023-12-07 06:49:20

by Baolu Lu

[permalink] [raw]
Subject: [PATCH v8 07/12] iommu: Merge iommu_fault_event and iopf_fault

The iommu_fault_event and iopf_fault data structures store the same
information about an iopf fault. They are also used in the same way.
Merge these two data structures into a single one to make the code
more concise and easier to maintain.

Signed-off-by: Lu Baolu <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
Reviewed-by: Jason Gunthorpe <[email protected]>
Reviewed-by: Yi Liu <[email protected]>
Tested-by: Yan Zhao <[email protected]>
Tested-by: Longfang Liu <[email protected]>
---
include/linux/iommu.h | 27 ++++++---------------
drivers/iommu/intel/iommu.h | 2 +-
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 4 +--
drivers/iommu/intel/svm.c | 5 ++--
drivers/iommu/io-pgfault.c | 5 ----
drivers/iommu/iommu.c | 8 +++---
6 files changed, 17 insertions(+), 34 deletions(-)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 6a74711dc47a..d0d120efce11 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -40,7 +40,6 @@ struct iommu_domain_ops;
struct iommu_dirty_ops;
struct notifier_block;
struct iommu_sva;
-struct iommu_fault_event;
struct iommu_dma_cookie;
struct iopf_queue;

@@ -121,6 +120,11 @@ struct iommu_page_response {
u32 code;
};

+struct iopf_fault {
+ struct iommu_fault fault;
+ /* node for pending lists */
+ struct list_head list;
+};

/* iommu fault flags */
#define IOMMU_FAULT_READ 0x0
@@ -480,7 +484,7 @@ struct iommu_ops {
int (*dev_disable_feat)(struct device *dev, enum iommu_dev_features f);

int (*page_response)(struct device *dev,
- struct iommu_fault_event *evt,
+ struct iopf_fault *evt,
struct iommu_page_response *msg);

int (*def_domain_type)(struct device *dev);
@@ -572,20 +576,6 @@ struct iommu_device {
u32 max_pasids;
};

-/**
- * struct iommu_fault_event - Generic fault event
- *
- * Can represent recoverable faults such as a page requests or
- * unrecoverable faults such as DMA or IRQ remapping faults.
- *
- * @fault: fault descriptor
- * @list: pending fault event list, used for tracking responses
- */
-struct iommu_fault_event {
- struct iommu_fault fault;
- struct list_head list;
-};
-
/**
* struct iommu_fault_param - per-device IOMMU fault data
* @lock: protect pending faults list
@@ -720,8 +710,7 @@ extern struct iommu_group *iommu_group_get(struct device *dev);
extern struct iommu_group *iommu_group_ref_get(struct iommu_group *group);
extern void iommu_group_put(struct iommu_group *group);

-extern int iommu_report_device_fault(struct device *dev,
- struct iommu_fault_event *evt);
+extern int iommu_report_device_fault(struct device *dev, struct iopf_fault *evt);
extern int iommu_page_response(struct device *dev,
struct iommu_page_response *msg);

@@ -1129,7 +1118,7 @@ static inline void iommu_group_put(struct iommu_group *group)
}

static inline
-int iommu_report_device_fault(struct device *dev, struct iommu_fault_event *evt)
+int iommu_report_device_fault(struct device *dev, struct iopf_fault *evt)
{
return -ENODEV;
}
diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
index ce030c5b5772..c8cd5d4047e7 100644
--- a/drivers/iommu/intel/iommu.h
+++ b/drivers/iommu/intel/iommu.h
@@ -908,7 +908,7 @@ struct iommu_domain *intel_nested_domain_alloc(struct iommu_domain *parent,
void intel_svm_check(struct intel_iommu *iommu);
int intel_svm_enable_prq(struct intel_iommu *iommu);
int intel_svm_finish_prq(struct intel_iommu *iommu);
-int intel_svm_page_response(struct device *dev, struct iommu_fault_event *evt,
+int intel_svm_page_response(struct device *dev, struct iopf_fault *evt,
struct iommu_page_response *msg);
struct iommu_domain *intel_svm_domain_alloc(void);
void intel_svm_remove_dev_pasid(struct device *dev, ioasid_t pasid);
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 0f33b212ca28..4e5fae4c3430 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -922,7 +922,7 @@ static int arm_smmu_cmdq_batch_submit(struct arm_smmu_device *smmu,
}

static int arm_smmu_page_response(struct device *dev,
- struct iommu_fault_event *unused,
+ struct iopf_fault *unused,
struct iommu_page_response *resp)
{
struct arm_smmu_cmdq_ent cmd = {0};
@@ -1467,7 +1467,7 @@ static int arm_smmu_handle_evt(struct arm_smmu_device *smmu, u64 *evt)
struct arm_smmu_master *master;
bool ssid_valid = evt[0] & EVTQ_0_SSV;
u32 sid = FIELD_GET(EVTQ_0_SID, evt[0]);
- struct iommu_fault_event fault_evt = { };
+ struct iopf_fault fault_evt = { };
struct iommu_fault *flt = &fault_evt.fault;

switch (FIELD_GET(EVTQ_0_ID, evt[0])) {
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index ac12f76c1212..1a37fefaf4fd 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -569,13 +569,12 @@ static int prq_to_iommu_prot(struct page_req_dsc *req)
static int intel_svm_prq_report(struct intel_iommu *iommu, struct device *dev,
struct page_req_dsc *desc)
{
- struct iommu_fault_event event;
+ struct iopf_fault event = { };

if (!dev || !dev_is_pci(dev))
return -ENODEV;

/* Fill in event data for device specific processing */
- memset(&event, 0, sizeof(struct iommu_fault_event));
event.fault.type = IOMMU_FAULT_PAGE_REQ;
event.fault.prm.addr = (u64)desc->addr << VTD_PAGE_SHIFT;
event.fault.prm.pasid = desc->pasid;
@@ -747,7 +746,7 @@ static irqreturn_t prq_event_thread(int irq, void *d)
}

int intel_svm_page_response(struct device *dev,
- struct iommu_fault_event *evt,
+ struct iopf_fault *evt,
struct iommu_page_response *msg)
{
struct iommu_fault_page_request *prm;
diff --git a/drivers/iommu/io-pgfault.c b/drivers/iommu/io-pgfault.c
index 4fda01de5589..10d48eb72608 100644
--- a/drivers/iommu/io-pgfault.c
+++ b/drivers/iommu/io-pgfault.c
@@ -25,11 +25,6 @@ struct iopf_queue {
struct mutex lock;
};

-struct iopf_fault {
- struct iommu_fault fault;
- struct list_head list;
-};
-
struct iopf_group {
struct iopf_fault last_fault;
struct list_head faults;
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 9e2f399044bf..c7012a96adb3 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1330,10 +1330,10 @@ EXPORT_SYMBOL_GPL(iommu_group_put);
*
* Return 0 on success, or an error.
*/
-int iommu_report_device_fault(struct device *dev, struct iommu_fault_event *evt)
+int iommu_report_device_fault(struct device *dev, struct iopf_fault *evt)
{
struct dev_iommu *param = dev->iommu;
- struct iommu_fault_event *evt_pending = NULL;
+ struct iopf_fault *evt_pending = NULL;
struct iommu_fault_param *fparam;
int ret = 0;

@@ -1346,7 +1346,7 @@ int iommu_report_device_fault(struct device *dev, struct iommu_fault_event *evt)

if (evt->fault.type == IOMMU_FAULT_PAGE_REQ &&
(evt->fault.prm.flags & IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE)) {
- evt_pending = kmemdup(evt, sizeof(struct iommu_fault_event),
+ evt_pending = kmemdup(evt, sizeof(struct iopf_fault),
GFP_KERNEL);
if (!evt_pending) {
ret = -ENOMEM;
@@ -1375,7 +1375,7 @@ int iommu_page_response(struct device *dev,
{
bool needs_pasid;
int ret = -EINVAL;
- struct iommu_fault_event *evt;
+ struct iopf_fault *evt;
struct iommu_fault_page_request *prm;
struct dev_iommu *param = dev->iommu;
const struct iommu_ops *ops = dev_iommu_ops(dev);
--
2.34.1

2023-12-07 06:49:21

by Baolu Lu

[permalink] [raw]
Subject: [PATCH v8 08/12] iommu: Prepare for separating SVA and IOPF

Move iopf_group data structure to iommu.h to make it a minimal set of
faults that a domain's page fault handler should handle.

Add a new function, iopf_free_group(), to free a fault group after all
faults in the group are handled. This function will be made global so
that it can be called from other files, such as iommu-sva.c.

Move iopf_queue data structure to iommu.h to allow the workqueue to be
scheduled out of this file.

This will simplify the sequential patches.

Signed-off-by: Lu Baolu <[email protected]>
Reviewed-by: Jason Gunthorpe <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
Reviewed-by: Yi Liu <[email protected]>
Tested-by: Yan Zhao <[email protected]>
Tested-by: Longfang Liu <[email protected]>
---
include/linux/iommu.h | 20 +++++++++++++++++++-
drivers/iommu/io-pgfault.c | 37 +++++++++++++------------------------
2 files changed, 32 insertions(+), 25 deletions(-)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index d0d120efce11..c17a99648e44 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -41,7 +41,6 @@ struct iommu_dirty_ops;
struct notifier_block;
struct iommu_sva;
struct iommu_dma_cookie;
-struct iopf_queue;

#define IOMMU_FAULT_PERM_READ (1 << 0) /* read */
#define IOMMU_FAULT_PERM_WRITE (1 << 1) /* write */
@@ -126,6 +125,25 @@ struct iopf_fault {
struct list_head list;
};

+struct iopf_group {
+ struct iopf_fault last_fault;
+ struct list_head faults;
+ struct work_struct work;
+ struct device *dev;
+};
+
+/**
+ * struct iopf_queue - IO Page Fault queue
+ * @wq: the fault workqueue
+ * @devices: devices attached to this queue
+ * @lock: protects the device list
+ */
+struct iopf_queue {
+ struct workqueue_struct *wq;
+ struct list_head devices;
+ struct mutex lock;
+};
+
/* iommu fault flags */
#define IOMMU_FAULT_READ 0x0
#define IOMMU_FAULT_WRITE 0x1
diff --git a/drivers/iommu/io-pgfault.c b/drivers/iommu/io-pgfault.c
index 10d48eb72608..c7e6bbed5c05 100644
--- a/drivers/iommu/io-pgfault.c
+++ b/drivers/iommu/io-pgfault.c
@@ -13,24 +13,17 @@

#include "iommu-sva.h"

-/**
- * struct iopf_queue - IO Page Fault queue
- * @wq: the fault workqueue
- * @devices: devices attached to this queue
- * @lock: protects the device list
- */
-struct iopf_queue {
- struct workqueue_struct *wq;
- struct list_head devices;
- struct mutex lock;
-};
+static void iopf_free_group(struct iopf_group *group)
+{
+ struct iopf_fault *iopf, *next;

-struct iopf_group {
- struct iopf_fault last_fault;
- struct list_head faults;
- struct work_struct work;
- struct device *dev;
-};
+ list_for_each_entry_safe(iopf, next, &group->faults, list) {
+ if (!(iopf->fault.prm.flags & IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE))
+ kfree(iopf);
+ }
+
+ kfree(group);
+}

static int iopf_complete_group(struct device *dev, struct iopf_fault *iopf,
enum iommu_page_response_code status)
@@ -50,9 +43,9 @@ static int iopf_complete_group(struct device *dev, struct iopf_fault *iopf,

static void iopf_handler(struct work_struct *work)
{
+ struct iopf_fault *iopf;
struct iopf_group *group;
struct iommu_domain *domain;
- struct iopf_fault *iopf, *next;
enum iommu_page_response_code status = IOMMU_PAGE_RESP_SUCCESS;

group = container_of(work, struct iopf_group, work);
@@ -61,7 +54,7 @@ static void iopf_handler(struct work_struct *work)
if (!domain || !domain->iopf_handler)
status = IOMMU_PAGE_RESP_INVALID;

- list_for_each_entry_safe(iopf, next, &group->faults, list) {
+ list_for_each_entry(iopf, &group->faults, list) {
/*
* For the moment, errors are sticky: don't handle subsequent
* faults in the group if there is an error.
@@ -69,14 +62,10 @@ static void iopf_handler(struct work_struct *work)
if (status == IOMMU_PAGE_RESP_SUCCESS)
status = domain->iopf_handler(&iopf->fault,
domain->fault_data);
-
- if (!(iopf->fault.prm.flags &
- IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE))
- kfree(iopf);
}

iopf_complete_group(group->dev, &group->last_fault, status);
- kfree(group);
+ iopf_free_group(group);
}

/**
--
2.34.1

2023-12-07 06:49:31

by Baolu Lu

[permalink] [raw]
Subject: [PATCH v8 10/12] iommu: Separate SVA and IOPF

Add CONFIG_IOMMU_IOPF for page fault handling framework and select it
from its real consumer. Move iopf function declaration from iommu-sva.h
to iommu.h and remove iommu-sva.h as it's empty now.

Consolidate all SVA related code into iommu-sva.c:
- Move iommu_sva_domain_alloc() from iommu.c to iommu-sva.c.
- Move sva iopf handling code from io-pgfault.c to iommu-sva.c.

Consolidate iommu_report_device_fault() and iommu_page_response() into
io-pgfault.c.

Export iopf_free_group() and iopf_group_response() for iopf handlers
implemented in modules. Some functions are renamed with more meaningful
names. No other intentional functionality changes.

Signed-off-by: Lu Baolu <[email protected]>
Reviewed-by: Jason Gunthorpe <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
Tested-by: Yan Zhao <[email protected]>
Tested-by: Longfang Liu <[email protected]>
---
include/linux/iommu.h | 98 ++++++---
drivers/iommu/iommu-sva.h | 69 -------
.../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 1 -
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 1 -
drivers/iommu/intel/iommu.c | 1 -
drivers/iommu/intel/svm.c | 1 -
drivers/iommu/io-pgfault.c | 188 +++++++++++++-----
drivers/iommu/iommu-sva.c | 64 +++++-
drivers/iommu/iommu.c | 133 -------------
drivers/iommu/Kconfig | 4 +
drivers/iommu/Makefile | 3 +-
drivers/iommu/intel/Kconfig | 1 +
12 files changed, 275 insertions(+), 289 deletions(-)
delete mode 100644 drivers/iommu/iommu-sva.h

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index dcd22bb381a0..63df77cc0b61 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -728,10 +728,6 @@ extern struct iommu_group *iommu_group_get(struct device *dev);
extern struct iommu_group *iommu_group_ref_get(struct iommu_group *group);
extern void iommu_group_put(struct iommu_group *group);

-extern int iommu_report_device_fault(struct device *dev, struct iopf_fault *evt);
-extern int iommu_page_response(struct device *dev,
- struct iommu_page_response *msg);
-
extern int iommu_group_id(struct iommu_group *group);
extern struct iommu_domain *iommu_group_default_domain(struct iommu_group *);

@@ -945,8 +941,6 @@ bool iommu_group_dma_owner_claimed(struct iommu_group *group);
int iommu_device_claim_dma_owner(struct device *dev, void *owner);
void iommu_device_release_dma_owner(struct device *dev);

-struct iommu_domain *iommu_sva_domain_alloc(struct device *dev,
- struct mm_struct *mm);
int iommu_attach_device_pasid(struct iommu_domain *domain,
struct device *dev, ioasid_t pasid);
void iommu_detach_device_pasid(struct iommu_domain *domain,
@@ -1135,18 +1129,6 @@ static inline void iommu_group_put(struct iommu_group *group)
{
}

-static inline
-int iommu_report_device_fault(struct device *dev, struct iopf_fault *evt)
-{
- return -ENODEV;
-}
-
-static inline int iommu_page_response(struct device *dev,
- struct iommu_page_response *msg)
-{
- return -ENODEV;
-}
-
static inline int iommu_group_id(struct iommu_group *group)
{
return -ENODEV;
@@ -1295,12 +1277,6 @@ static inline int iommu_device_claim_dma_owner(struct device *dev, void *owner)
return -ENODEV;
}

-static inline struct iommu_domain *
-iommu_sva_domain_alloc(struct device *dev, struct mm_struct *mm)
-{
- return NULL;
-}
-
static inline int iommu_attach_device_pasid(struct iommu_domain *domain,
struct device *dev, ioasid_t pasid)
{
@@ -1422,6 +1398,8 @@ struct iommu_sva *iommu_sva_bind_device(struct device *dev,
struct mm_struct *mm);
void iommu_sva_unbind_device(struct iommu_sva *handle);
u32 iommu_sva_get_pasid(struct iommu_sva *handle);
+struct iommu_domain *iommu_sva_domain_alloc(struct device *dev,
+ struct mm_struct *mm);
#else
static inline struct iommu_sva *
iommu_sva_bind_device(struct device *dev, struct mm_struct *mm)
@@ -1440,6 +1418,78 @@ static inline u32 iommu_sva_get_pasid(struct iommu_sva *handle)
static inline void mm_pasid_init(struct mm_struct *mm) {}
static inline bool mm_valid_pasid(struct mm_struct *mm) { return false; }
static inline void mm_pasid_drop(struct mm_struct *mm) {}
+
+static inline struct iommu_domain *
+iommu_sva_domain_alloc(struct device *dev, struct mm_struct *mm)
+{
+ return NULL;
+}
#endif /* CONFIG_IOMMU_SVA */

+#ifdef CONFIG_IOMMU_IOPF
+int iopf_queue_add_device(struct iopf_queue *queue, struct device *dev);
+int iopf_queue_remove_device(struct iopf_queue *queue, struct device *dev);
+int iopf_queue_flush_dev(struct device *dev);
+struct iopf_queue *iopf_queue_alloc(const char *name);
+void iopf_queue_free(struct iopf_queue *queue);
+int iopf_queue_discard_partial(struct iopf_queue *queue);
+void iopf_free_group(struct iopf_group *group);
+int iommu_report_device_fault(struct device *dev, struct iopf_fault *evt);
+int iommu_page_response(struct device *dev, struct iommu_page_response *msg);
+int iopf_group_response(struct iopf_group *group,
+ enum iommu_page_response_code status);
+#else
+static inline int
+iopf_queue_add_device(struct iopf_queue *queue, struct device *dev)
+{
+ return -ENODEV;
+}
+
+static inline int
+iopf_queue_remove_device(struct iopf_queue *queue, struct device *dev)
+{
+ return -ENODEV;
+}
+
+static inline int iopf_queue_flush_dev(struct device *dev)
+{
+ return -ENODEV;
+}
+
+static inline struct iopf_queue *iopf_queue_alloc(const char *name)
+{
+ return NULL;
+}
+
+static inline void iopf_queue_free(struct iopf_queue *queue)
+{
+}
+
+static inline int iopf_queue_discard_partial(struct iopf_queue *queue)
+{
+ return -ENODEV;
+}
+
+static inline void iopf_free_group(struct iopf_group *group)
+{
+}
+
+static inline int
+iommu_report_device_fault(struct device *dev, struct iopf_fault *evt)
+{
+ return -ENODEV;
+}
+
+static inline int
+iommu_page_response(struct device *dev, struct iommu_page_response *msg)
+{
+ return -ENODEV;
+}
+
+static inline int iopf_group_response(struct iopf_group *group,
+ enum iommu_page_response_code status)
+{
+ return -ENODEV;
+}
+#endif /* CONFIG_IOMMU_IOPF */
#endif /* __LINUX_IOMMU_H */
diff --git a/drivers/iommu/iommu-sva.h b/drivers/iommu/iommu-sva.h
deleted file mode 100644
index 27c8da115b41..000000000000
--- a/drivers/iommu/iommu-sva.h
+++ /dev/null
@@ -1,69 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-/*
- * SVA library for IOMMU drivers
- */
-#ifndef _IOMMU_SVA_H
-#define _IOMMU_SVA_H
-
-#include <linux/mm_types.h>
-
-/* I/O Page fault */
-struct device;
-struct iommu_fault;
-struct iopf_queue;
-
-#ifdef CONFIG_IOMMU_SVA
-int iommu_queue_iopf(struct iommu_fault *fault, struct device *dev);
-
-int iopf_queue_add_device(struct iopf_queue *queue, struct device *dev);
-int iopf_queue_remove_device(struct iopf_queue *queue,
- struct device *dev);
-int iopf_queue_flush_dev(struct device *dev);
-struct iopf_queue *iopf_queue_alloc(const char *name);
-void iopf_queue_free(struct iopf_queue *queue);
-int iopf_queue_discard_partial(struct iopf_queue *queue);
-int iommu_sva_handle_iopf(struct iopf_group *group);
-
-#else /* CONFIG_IOMMU_SVA */
-static inline int iommu_queue_iopf(struct iommu_fault *fault, struct device *dev)
-{
- return -ENODEV;
-}
-
-static inline int iopf_queue_add_device(struct iopf_queue *queue,
- struct device *dev)
-{
- return -ENODEV;
-}
-
-static inline int iopf_queue_remove_device(struct iopf_queue *queue,
- struct device *dev)
-{
- return -ENODEV;
-}
-
-static inline int iopf_queue_flush_dev(struct device *dev)
-{
- return -ENODEV;
-}
-
-static inline struct iopf_queue *iopf_queue_alloc(const char *name)
-{
- return NULL;
-}
-
-static inline void iopf_queue_free(struct iopf_queue *queue)
-{
-}
-
-static inline int iopf_queue_discard_partial(struct iopf_queue *queue)
-{
- return -ENODEV;
-}
-
-static inline int iommu_sva_handle_iopf(struct iopf_group *group)
-{
- return IOMMU_PAGE_RESP_INVALID;
-}
-#endif /* CONFIG_IOMMU_SVA */
-#endif /* _IOMMU_SVA_H */
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
index 84c9554144cb..c8bdbb9ec8de 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
@@ -10,7 +10,6 @@
#include <linux/slab.h>

#include "arm-smmu-v3.h"
-#include "../../iommu-sva.h"
#include "../../io-pgtable-arm.h"

struct arm_smmu_mmu_notifier {
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 4e5fae4c3430..f86e257a2ea6 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -29,7 +29,6 @@

#include "arm-smmu-v3.h"
#include "../../dma-iommu.h"
-#include "../../iommu-sva.h"

static bool disable_bypass = true;
module_param(disable_bypass, bool, 0444);
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 17f7690f4919..e8c59cf3b811 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -27,7 +27,6 @@
#include "iommu.h"
#include "../dma-iommu.h"
#include "../irq_remapping.h"
-#include "../iommu-sva.h"
#include "pasid.h"
#include "cap_audit.h"
#include "perfmon.h"
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 1a37fefaf4fd..9447e50b1b2d 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -22,7 +22,6 @@
#include "iommu.h"
#include "pasid.h"
#include "perf.h"
-#include "../iommu-sva.h"
#include "trace.h"

static irqreturn_t prq_event_thread(int irq, void *d);
diff --git a/drivers/iommu/io-pgfault.c b/drivers/iommu/io-pgfault.c
index ccd0e5fefe1d..f501197a2892 100644
--- a/drivers/iommu/io-pgfault.c
+++ b/drivers/iommu/io-pgfault.c
@@ -11,12 +11,9 @@
#include <linux/slab.h>
#include <linux/workqueue.h>

-#include "iommu-sva.h"
+#include "iommu-priv.h"

-enum iommu_page_response_code
-iommu_sva_handle_mm(struct iommu_fault *fault, struct mm_struct *mm);
-
-static void iopf_free_group(struct iopf_group *group)
+void iopf_free_group(struct iopf_group *group)
{
struct iopf_fault *iopf, *next;

@@ -27,47 +24,10 @@ static void iopf_free_group(struct iopf_group *group)

kfree(group);
}
-
-static int iopf_complete_group(struct device *dev, struct iopf_fault *iopf,
- enum iommu_page_response_code status)
-{
- struct iommu_page_response resp = {
- .pasid = iopf->fault.prm.pasid,
- .grpid = iopf->fault.prm.grpid,
- .code = status,
- };
-
- if ((iopf->fault.prm.flags & IOMMU_FAULT_PAGE_REQUEST_PASID_VALID) &&
- (iopf->fault.prm.flags & IOMMU_FAULT_PAGE_RESPONSE_NEEDS_PASID))
- resp.flags = IOMMU_PAGE_RESP_PASID_VALID;
-
- return iommu_page_response(dev, &resp);
-}
-
-static void iopf_handler(struct work_struct *work)
-{
- struct iopf_fault *iopf;
- struct iopf_group *group;
- enum iommu_page_response_code status = IOMMU_PAGE_RESP_SUCCESS;
-
- group = container_of(work, struct iopf_group, work);
- list_for_each_entry(iopf, &group->faults, list) {
- /*
- * For the moment, errors are sticky: don't handle subsequent
- * faults in the group if there is an error.
- */
- if (status != IOMMU_PAGE_RESP_SUCCESS)
- break;
-
- status = iommu_sva_handle_mm(&iopf->fault, group->domain->mm);
- }
-
- iopf_complete_group(group->dev, &group->last_fault, status);
- iopf_free_group(group);
-}
+EXPORT_SYMBOL_GPL(iopf_free_group);

/**
- * iommu_queue_iopf - IO Page Fault handler
+ * iommu_handle_iopf - IO Page Fault handler
* @fault: fault event
* @dev: struct device.
*
@@ -106,7 +66,7 @@ static void iopf_handler(struct work_struct *work)
*
* Return: 0 on success and <0 on error.
*/
-int iommu_queue_iopf(struct iommu_fault *fault, struct device *dev)
+static int iommu_handle_iopf(struct iommu_fault *fault, struct device *dev)
{
int ret;
struct iopf_group *group;
@@ -199,18 +159,117 @@ int iommu_queue_iopf(struct iommu_fault *fault, struct device *dev)
}
return ret;
}
-EXPORT_SYMBOL_GPL(iommu_queue_iopf);

-int iommu_sva_handle_iopf(struct iopf_group *group)
+/**
+ * iommu_report_device_fault() - Report fault event to device driver
+ * @dev: the device
+ * @evt: fault event data
+ *
+ * Called by IOMMU drivers when a fault is detected, typically in a threaded IRQ
+ * handler. When this function fails and the fault is recoverable, it is the
+ * caller's responsibility to complete the fault.
+ *
+ * Return 0 on success, or an error.
+ */
+int iommu_report_device_fault(struct device *dev, struct iopf_fault *evt)
{
- struct iommu_fault_param *fault_param = group->dev->iommu->fault_param;
+ struct dev_iommu *param = dev->iommu;
+ struct iopf_fault *evt_pending = NULL;
+ struct iommu_fault_param *fparam;
+ int ret = 0;

- INIT_WORK(&group->work, iopf_handler);
- if (!queue_work(fault_param->queue->wq, &group->work))
- return -EBUSY;
+ if (!param || !evt)
+ return -EINVAL;

- return 0;
+ /* we only report device fault if there is a handler registered */
+ mutex_lock(&param->lock);
+ fparam = param->fault_param;
+
+ if (evt->fault.type == IOMMU_FAULT_PAGE_REQ &&
+ (evt->fault.prm.flags & IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE)) {
+ evt_pending = kmemdup(evt, sizeof(struct iopf_fault),
+ GFP_KERNEL);
+ if (!evt_pending) {
+ ret = -ENOMEM;
+ goto done_unlock;
+ }
+ mutex_lock(&fparam->lock);
+ list_add_tail(&evt_pending->list, &fparam->faults);
+ mutex_unlock(&fparam->lock);
+ }
+
+ ret = iommu_handle_iopf(&evt->fault, dev);
+ if (ret && evt_pending) {
+ mutex_lock(&fparam->lock);
+ list_del(&evt_pending->list);
+ mutex_unlock(&fparam->lock);
+ kfree(evt_pending);
+ }
+done_unlock:
+ mutex_unlock(&param->lock);
+ return ret;
+}
+EXPORT_SYMBOL_GPL(iommu_report_device_fault);
+
+int iommu_page_response(struct device *dev,
+ struct iommu_page_response *msg)
+{
+ bool needs_pasid;
+ int ret = -EINVAL;
+ struct iopf_fault *evt;
+ struct iommu_fault_page_request *prm;
+ struct dev_iommu *param = dev->iommu;
+ const struct iommu_ops *ops = dev_iommu_ops(dev);
+ bool has_pasid = msg->flags & IOMMU_PAGE_RESP_PASID_VALID;
+
+ if (!ops->page_response)
+ return -ENODEV;
+
+ if (!param || !param->fault_param)
+ return -EINVAL;
+
+ /* Only send response if there is a fault report pending */
+ mutex_lock(&param->fault_param->lock);
+ if (list_empty(&param->fault_param->faults)) {
+ dev_warn_ratelimited(dev, "no pending PRQ, drop response\n");
+ goto done_unlock;
+ }
+ /*
+ * Check if we have a matching page request pending to respond,
+ * otherwise return -EINVAL
+ */
+ list_for_each_entry(evt, &param->fault_param->faults, list) {
+ prm = &evt->fault.prm;
+ if (prm->grpid != msg->grpid)
+ continue;
+
+ /*
+ * If the PASID is required, the corresponding request is
+ * matched using the group ID, the PASID valid bit and the PASID
+ * value. Otherwise only the group ID matches request and
+ * response.
+ */
+ needs_pasid = prm->flags & IOMMU_FAULT_PAGE_RESPONSE_NEEDS_PASID;
+ if (needs_pasid && (!has_pasid || msg->pasid != prm->pasid))
+ continue;
+
+ if (!needs_pasid && has_pasid) {
+ /* No big deal, just clear it. */
+ msg->flags &= ~IOMMU_PAGE_RESP_PASID_VALID;
+ msg->pasid = 0;
+ }
+
+ ret = ops->page_response(dev, evt, msg);
+ list_del(&evt->list);
+ kfree(evt);
+ break;
+ }
+
+done_unlock:
+ mutex_unlock(&param->fault_param->lock);
+ return ret;
}
+EXPORT_SYMBOL_GPL(iommu_page_response);

/**
* iopf_queue_flush_dev - Ensure that all queued faults have been processed
@@ -245,6 +304,31 @@ int iopf_queue_flush_dev(struct device *dev)
}
EXPORT_SYMBOL_GPL(iopf_queue_flush_dev);

+/**
+ * iopf_group_response - Respond a group of page faults
+ * @group: the group of faults with the same group id
+ * @status: the response code
+ *
+ * Return 0 on success and <0 on error.
+ */
+int iopf_group_response(struct iopf_group *group,
+ enum iommu_page_response_code status)
+{
+ struct iopf_fault *iopf = &group->last_fault;
+ struct iommu_page_response resp = {
+ .pasid = iopf->fault.prm.pasid,
+ .grpid = iopf->fault.prm.grpid,
+ .code = status,
+ };
+
+ if ((iopf->fault.prm.flags & IOMMU_FAULT_PAGE_REQUEST_PASID_VALID) &&
+ (iopf->fault.prm.flags & IOMMU_FAULT_PAGE_RESPONSE_NEEDS_PASID))
+ resp.flags = IOMMU_PAGE_RESP_PASID_VALID;
+
+ return iommu_page_response(group->dev, &resp);
+}
+EXPORT_SYMBOL_GPL(iopf_group_response);
+
/**
* iopf_queue_discard_partial - Remove all pending partial fault
* @queue: the queue whose partial faults need to be discarded
diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
index ba0d5b7e106a..325d1810e133 100644
--- a/drivers/iommu/iommu-sva.c
+++ b/drivers/iommu/iommu-sva.c
@@ -7,7 +7,7 @@
#include <linux/sched/mm.h>
#include <linux/iommu.h>

-#include "iommu-sva.h"
+#include "iommu-priv.h"

static DEFINE_MUTEX(iommu_sva_lock);

@@ -145,10 +145,18 @@ u32 iommu_sva_get_pasid(struct iommu_sva *handle)
}
EXPORT_SYMBOL_GPL(iommu_sva_get_pasid);

+void mm_pasid_drop(struct mm_struct *mm)
+{
+ if (likely(!mm_valid_pasid(mm)))
+ return;
+
+ iommu_free_global_pasid(mm->pasid);
+}
+
/*
* I/O page fault handler for SVA
*/
-enum iommu_page_response_code
+static enum iommu_page_response_code
iommu_sva_handle_mm(struct iommu_fault *fault, struct mm_struct *mm)
{
vm_fault_t ret;
@@ -202,10 +210,54 @@ iommu_sva_handle_mm(struct iommu_fault *fault, struct mm_struct *mm)
return status;
}

-void mm_pasid_drop(struct mm_struct *mm)
+static void iommu_sva_handle_iopf(struct work_struct *work)
{
- if (likely(!mm_valid_pasid(mm)))
- return;
+ struct iopf_fault *iopf;
+ struct iopf_group *group;
+ enum iommu_page_response_code status = IOMMU_PAGE_RESP_SUCCESS;

- iommu_free_global_pasid(mm->pasid);
+ group = container_of(work, struct iopf_group, work);
+ list_for_each_entry(iopf, &group->faults, list) {
+ /*
+ * For the moment, errors are sticky: don't handle subsequent
+ * faults in the group if there is an error.
+ */
+ if (status != IOMMU_PAGE_RESP_SUCCESS)
+ break;
+
+ status = iommu_sva_handle_mm(&iopf->fault, group->domain->mm);
+ }
+
+ iopf_group_response(group, status);
+ iopf_free_group(group);
+}
+
+static int iommu_sva_iopf_handler(struct iopf_group *group)
+{
+ struct iommu_fault_param *fault_param = group->dev->iommu->fault_param;
+
+ INIT_WORK(&group->work, iommu_sva_handle_iopf);
+ if (!queue_work(fault_param->queue->wq, &group->work))
+ return -EBUSY;
+
+ return 0;
+}
+
+struct iommu_domain *iommu_sva_domain_alloc(struct device *dev,
+ struct mm_struct *mm)
+{
+ const struct iommu_ops *ops = dev_iommu_ops(dev);
+ struct iommu_domain *domain;
+
+ domain = ops->domain_alloc(IOMMU_DOMAIN_SVA);
+ if (!domain)
+ return NULL;
+
+ domain->type = IOMMU_DOMAIN_SVA;
+ mmgrab(mm);
+ domain->mm = mm;
+ domain->owner = ops;
+ domain->iopf_handler = iommu_sva_iopf_handler;
+
+ return domain;
}
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index c7012a96adb3..087b2e900672 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -36,8 +36,6 @@
#include "dma-iommu.h"
#include "iommu-priv.h"

-#include "iommu-sva.h"
-
static struct kset *iommu_group_kset;
static DEFINE_IDA(iommu_group_ida);
static DEFINE_IDA(iommu_global_pasid_ida);
@@ -1319,117 +1317,6 @@ void iommu_group_put(struct iommu_group *group)
}
EXPORT_SYMBOL_GPL(iommu_group_put);

-/**
- * iommu_report_device_fault() - Report fault event to device driver
- * @dev: the device
- * @evt: fault event data
- *
- * Called by IOMMU drivers when a fault is detected, typically in a threaded IRQ
- * handler. When this function fails and the fault is recoverable, it is the
- * caller's responsibility to complete the fault.
- *
- * Return 0 on success, or an error.
- */
-int iommu_report_device_fault(struct device *dev, struct iopf_fault *evt)
-{
- struct dev_iommu *param = dev->iommu;
- struct iopf_fault *evt_pending = NULL;
- struct iommu_fault_param *fparam;
- int ret = 0;
-
- if (!param || !evt)
- return -EINVAL;
-
- /* we only report device fault if there is a handler registered */
- mutex_lock(&param->lock);
- fparam = param->fault_param;
-
- if (evt->fault.type == IOMMU_FAULT_PAGE_REQ &&
- (evt->fault.prm.flags & IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE)) {
- evt_pending = kmemdup(evt, sizeof(struct iopf_fault),
- GFP_KERNEL);
- if (!evt_pending) {
- ret = -ENOMEM;
- goto done_unlock;
- }
- mutex_lock(&fparam->lock);
- list_add_tail(&evt_pending->list, &fparam->faults);
- mutex_unlock(&fparam->lock);
- }
-
- ret = iommu_queue_iopf(&evt->fault, dev);
- if (ret && evt_pending) {
- mutex_lock(&fparam->lock);
- list_del(&evt_pending->list);
- mutex_unlock(&fparam->lock);
- kfree(evt_pending);
- }
-done_unlock:
- mutex_unlock(&param->lock);
- return ret;
-}
-EXPORT_SYMBOL_GPL(iommu_report_device_fault);
-
-int iommu_page_response(struct device *dev,
- struct iommu_page_response *msg)
-{
- bool needs_pasid;
- int ret = -EINVAL;
- struct iopf_fault *evt;
- struct iommu_fault_page_request *prm;
- struct dev_iommu *param = dev->iommu;
- const struct iommu_ops *ops = dev_iommu_ops(dev);
- bool has_pasid = msg->flags & IOMMU_PAGE_RESP_PASID_VALID;
-
- if (!ops->page_response)
- return -ENODEV;
-
- if (!param || !param->fault_param)
- return -EINVAL;
-
- /* Only send response if there is a fault report pending */
- mutex_lock(&param->fault_param->lock);
- if (list_empty(&param->fault_param->faults)) {
- dev_warn_ratelimited(dev, "no pending PRQ, drop response\n");
- goto done_unlock;
- }
- /*
- * Check if we have a matching page request pending to respond,
- * otherwise return -EINVAL
- */
- list_for_each_entry(evt, &param->fault_param->faults, list) {
- prm = &evt->fault.prm;
- if (prm->grpid != msg->grpid)
- continue;
-
- /*
- * If the PASID is required, the corresponding request is
- * matched using the group ID, the PASID valid bit and the PASID
- * value. Otherwise only the group ID matches request and
- * response.
- */
- needs_pasid = prm->flags & IOMMU_FAULT_PAGE_RESPONSE_NEEDS_PASID;
- if (needs_pasid && (!has_pasid || msg->pasid != prm->pasid))
- continue;
-
- if (!needs_pasid && has_pasid) {
- /* No big deal, just clear it. */
- msg->flags &= ~IOMMU_PAGE_RESP_PASID_VALID;
- msg->pasid = 0;
- }
-
- ret = ops->page_response(dev, evt, msg);
- list_del(&evt->list);
- kfree(evt);
- break;
- }
-
-done_unlock:
- mutex_unlock(&param->fault_param->lock);
- return ret;
-}
-EXPORT_SYMBOL_GPL(iommu_page_response);
-
/**
* iommu_group_id - Return ID for a group
* @group: the group to ID
@@ -3504,26 +3391,6 @@ struct iommu_domain *iommu_get_domain_for_dev_pasid(struct device *dev,
}
EXPORT_SYMBOL_GPL(iommu_get_domain_for_dev_pasid);

-struct iommu_domain *iommu_sva_domain_alloc(struct device *dev,
- struct mm_struct *mm)
-{
- const struct iommu_ops *ops = dev_iommu_ops(dev);
- struct iommu_domain *domain;
-
- domain = ops->domain_alloc(IOMMU_DOMAIN_SVA);
- if (!domain)
- return NULL;
-
- domain->type = IOMMU_DOMAIN_SVA;
- mmgrab(mm);
- domain->mm = mm;
- domain->owner = ops;
- domain->iopf_handler = iommu_sva_handle_iopf;
- domain->fault_data = mm;
-
- return domain;
-}
-
ioasid_t iommu_alloc_global_pasid(struct device *dev)
{
int ret;
diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 7673bb82945b..4d6291664ca6 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -162,6 +162,9 @@ config IOMMU_DMA
config IOMMU_SVA
bool

+config IOMMU_IOPF
+ bool
+
config FSL_PAMU
bool "Freescale IOMMU support"
depends on PCI
@@ -397,6 +400,7 @@ config ARM_SMMU_V3_SVA
bool "Shared Virtual Addressing support for the ARM SMMUv3"
depends on ARM_SMMU_V3
select IOMMU_SVA
+ select IOMMU_IOPF
select MMU_NOTIFIER
help
Support for sharing process address spaces with devices using the
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 95ad9dbfbda0..542760d963ec 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -26,6 +26,7 @@ obj-$(CONFIG_FSL_PAMU) += fsl_pamu.o fsl_pamu_domain.o
obj-$(CONFIG_S390_IOMMU) += s390-iommu.o
obj-$(CONFIG_HYPERV_IOMMU) += hyperv-iommu.o
obj-$(CONFIG_VIRTIO_IOMMU) += virtio-iommu.o
-obj-$(CONFIG_IOMMU_SVA) += iommu-sva.o io-pgfault.o
+obj-$(CONFIG_IOMMU_SVA) += iommu-sva.o
+obj-$(CONFIG_IOMMU_IOPF) += io-pgfault.o
obj-$(CONFIG_SPRD_IOMMU) += sprd-iommu.o
obj-$(CONFIG_APPLE_DART) += apple-dart.o
diff --git a/drivers/iommu/intel/Kconfig b/drivers/iommu/intel/Kconfig
index 012cd2541a68..a4a125666293 100644
--- a/drivers/iommu/intel/Kconfig
+++ b/drivers/iommu/intel/Kconfig
@@ -51,6 +51,7 @@ config INTEL_IOMMU_SVM
depends on X86_64
select MMU_NOTIFIER
select IOMMU_SVA
+ select IOMMU_IOPF
help
Shared Virtual Memory (SVM) provides a facility for devices
to access DMA resources through process address space by
--
2.34.1

2023-12-07 06:49:44

by Baolu Lu

[permalink] [raw]
Subject: [PATCH v8 09/12] iommu: Make iommu_queue_iopf() more generic

Make iommu_queue_iopf() more generic by making the iopf_group a minimal
set of iopf's that an iopf handler of domain should handle and respond
to. Add domain parameter to struct iopf_group so that the handler can
retrieve and use it directly.

Change iommu_queue_iopf() to forward groups of iopf's to the domain's
iopf handler. This is also a necessary step to decouple the sva iopf
handling code from this interface.

Signed-off-by: Lu Baolu <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
Reviewed-by: Jason Gunthorpe <[email protected]>
Reviewed-by: Yi Liu <[email protected]>
Tested-by: Yan Zhao <[email protected]>
Tested-by: Longfang Liu <[email protected]>
---
include/linux/iommu.h | 4 +--
drivers/iommu/iommu-sva.h | 6 ++---
drivers/iommu/io-pgfault.c | 55 +++++++++++++++++++++++++++++---------
drivers/iommu/iommu-sva.c | 3 +--
4 files changed, 48 insertions(+), 20 deletions(-)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index c17a99648e44..dcd22bb381a0 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -130,6 +130,7 @@ struct iopf_group {
struct list_head faults;
struct work_struct work;
struct device *dev;
+ struct iommu_domain *domain;
};

/**
@@ -209,8 +210,7 @@ struct iommu_domain {
unsigned long pgsize_bitmap; /* Bitmap of page sizes in use */
struct iommu_domain_geometry geometry;
struct iommu_dma_cookie *iova_cookie;
- enum iommu_page_response_code (*iopf_handler)(struct iommu_fault *fault,
- void *data);
+ int (*iopf_handler)(struct iopf_group *group);
void *fault_data;
union {
struct {
diff --git a/drivers/iommu/iommu-sva.h b/drivers/iommu/iommu-sva.h
index de7819c796ce..27c8da115b41 100644
--- a/drivers/iommu/iommu-sva.h
+++ b/drivers/iommu/iommu-sva.h
@@ -22,8 +22,7 @@ int iopf_queue_flush_dev(struct device *dev);
struct iopf_queue *iopf_queue_alloc(const char *name);
void iopf_queue_free(struct iopf_queue *queue);
int iopf_queue_discard_partial(struct iopf_queue *queue);
-enum iommu_page_response_code
-iommu_sva_handle_iopf(struct iommu_fault *fault, void *data);
+int iommu_sva_handle_iopf(struct iopf_group *group);

#else /* CONFIG_IOMMU_SVA */
static inline int iommu_queue_iopf(struct iommu_fault *fault, struct device *dev)
@@ -62,8 +61,7 @@ static inline int iopf_queue_discard_partial(struct iopf_queue *queue)
return -ENODEV;
}

-static inline enum iommu_page_response_code
-iommu_sva_handle_iopf(struct iommu_fault *fault, void *data)
+static inline int iommu_sva_handle_iopf(struct iopf_group *group)
{
return IOMMU_PAGE_RESP_INVALID;
}
diff --git a/drivers/iommu/io-pgfault.c b/drivers/iommu/io-pgfault.c
index c7e6bbed5c05..ccd0e5fefe1d 100644
--- a/drivers/iommu/io-pgfault.c
+++ b/drivers/iommu/io-pgfault.c
@@ -13,6 +13,9 @@

#include "iommu-sva.h"

+enum iommu_page_response_code
+iommu_sva_handle_mm(struct iommu_fault *fault, struct mm_struct *mm);
+
static void iopf_free_group(struct iopf_group *group)
{
struct iopf_fault *iopf, *next;
@@ -45,23 +48,18 @@ static void iopf_handler(struct work_struct *work)
{
struct iopf_fault *iopf;
struct iopf_group *group;
- struct iommu_domain *domain;
enum iommu_page_response_code status = IOMMU_PAGE_RESP_SUCCESS;

group = container_of(work, struct iopf_group, work);
- domain = iommu_get_domain_for_dev_pasid(group->dev,
- group->last_fault.fault.prm.pasid, 0);
- if (!domain || !domain->iopf_handler)
- status = IOMMU_PAGE_RESP_INVALID;
-
list_for_each_entry(iopf, &group->faults, list) {
/*
* For the moment, errors are sticky: don't handle subsequent
* faults in the group if there is an error.
*/
- if (status == IOMMU_PAGE_RESP_SUCCESS)
- status = domain->iopf_handler(&iopf->fault,
- domain->fault_data);
+ if (status != IOMMU_PAGE_RESP_SUCCESS)
+ break;
+
+ status = iommu_sva_handle_mm(&iopf->fault, group->domain->mm);
}

iopf_complete_group(group->dev, &group->last_fault, status);
@@ -113,6 +111,7 @@ int iommu_queue_iopf(struct iommu_fault *fault, struct device *dev)
int ret;
struct iopf_group *group;
struct iopf_fault *iopf, *next;
+ struct iommu_domain *domain = NULL;
struct iommu_fault_param *iopf_param;
struct dev_iommu *param = dev->iommu;

@@ -143,6 +142,23 @@ int iommu_queue_iopf(struct iommu_fault *fault, struct device *dev)
return 0;
}

+ if (fault->prm.flags & IOMMU_FAULT_PAGE_REQUEST_PASID_VALID) {
+ domain = iommu_get_domain_for_dev_pasid(dev, fault->prm.pasid, 0);
+ if (IS_ERR(domain))
+ domain = NULL;
+ }
+
+ if (!domain)
+ domain = iommu_get_domain_for_dev(dev);
+
+ if (!domain || !domain->iopf_handler) {
+ dev_warn_ratelimited(dev,
+ "iopf (pasid %d) without domain attached or handler installed\n",
+ fault->prm.pasid);
+ ret = -ENODEV;
+ goto cleanup_partial;
+ }
+
group = kzalloc(sizeof(*group), GFP_KERNEL);
if (!group) {
/*
@@ -157,8 +173,8 @@ int iommu_queue_iopf(struct iommu_fault *fault, struct device *dev)
group->dev = dev;
group->last_fault.fault = *fault;
INIT_LIST_HEAD(&group->faults);
+ group->domain = domain;
list_add(&group->last_fault.list, &group->faults);
- INIT_WORK(&group->work, iopf_handler);

/* See if we have partial faults for this group */
list_for_each_entry_safe(iopf, next, &iopf_param->partial, list) {
@@ -167,9 +183,13 @@ int iommu_queue_iopf(struct iommu_fault *fault, struct device *dev)
list_move(&iopf->list, &group->faults);
}

- queue_work(iopf_param->queue->wq, &group->work);
- return 0;
+ mutex_unlock(&iopf_param->lock);
+ ret = domain->iopf_handler(group);
+ mutex_lock(&iopf_param->lock);
+ if (ret)
+ iopf_free_group(group);

+ return ret;
cleanup_partial:
list_for_each_entry_safe(iopf, next, &iopf_param->partial, list) {
if (iopf->fault.prm.grpid == fault->prm.grpid) {
@@ -181,6 +201,17 @@ int iommu_queue_iopf(struct iommu_fault *fault, struct device *dev)
}
EXPORT_SYMBOL_GPL(iommu_queue_iopf);

+int iommu_sva_handle_iopf(struct iopf_group *group)
+{
+ struct iommu_fault_param *fault_param = group->dev->iommu->fault_param;
+
+ INIT_WORK(&group->work, iopf_handler);
+ if (!queue_work(fault_param->queue->wq, &group->work))
+ return -EBUSY;
+
+ return 0;
+}
+
/**
* iopf_queue_flush_dev - Ensure that all queued faults have been processed
* @dev: the endpoint whose faults need to be flushed.
diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
index b78671a8a914..ba0d5b7e106a 100644
--- a/drivers/iommu/iommu-sva.c
+++ b/drivers/iommu/iommu-sva.c
@@ -149,11 +149,10 @@ EXPORT_SYMBOL_GPL(iommu_sva_get_pasid);
* I/O page fault handler for SVA
*/
enum iommu_page_response_code
-iommu_sva_handle_iopf(struct iommu_fault *fault, void *data)
+iommu_sva_handle_mm(struct iommu_fault *fault, struct mm_struct *mm)
{
vm_fault_t ret;
struct vm_area_struct *vma;
- struct mm_struct *mm = data;
unsigned int access_flags = 0;
unsigned int fault_flags = FAULT_FLAG_REMOTE;
struct iommu_fault_page_request *prm = &fault->prm;
--
2.34.1

2023-12-07 06:49:46

by Baolu Lu

[permalink] [raw]
Subject: [PATCH v8 11/12] iommu: Refine locking for per-device fault data management

The per-device fault data is a data structure that is used to store
information about faults that occur on a device. This data is allocated
when IOPF is enabled on the device and freed when IOPF is disabled. The
data is used in the paths of iopf reporting, handling, responding, and
draining.

The fault data is protected by two locks:

- dev->iommu->lock: This lock is used to protect the allocation and
freeing of the fault data.
- dev->iommu->fault_parameter->lock: This lock is used to protect the
fault data itself.

Apply the locking mechanism to the fault reporting and responding paths.

The fault_parameter->lock is also added in iopf_queue_discard_partial().
It does not fix any real issue, as iopf_queue_discard_partial() is only
used in the VT-d driver's prq_event_thread(), which is a single-threaded
path that reports the IOPFs.

Signed-off-by: Lu Baolu <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
Tested-by: Yan Zhao <[email protected]>
Tested-by: Longfang Liu <[email protected]>
---
drivers/iommu/io-pgfault.c | 61 +++++++++++++++++++-------------------
1 file changed, 30 insertions(+), 31 deletions(-)

diff --git a/drivers/iommu/io-pgfault.c b/drivers/iommu/io-pgfault.c
index f501197a2892..9439eaf54928 100644
--- a/drivers/iommu/io-pgfault.c
+++ b/drivers/iommu/io-pgfault.c
@@ -29,7 +29,7 @@ EXPORT_SYMBOL_GPL(iopf_free_group);
/**
* iommu_handle_iopf - IO Page Fault handler
* @fault: fault event
- * @dev: struct device.
+ * @iopf_param: the fault parameter of the device.
*
* Add a fault to the device workqueue, to be handled by mm.
*
@@ -66,29 +66,21 @@ EXPORT_SYMBOL_GPL(iopf_free_group);
*
* Return: 0 on success and <0 on error.
*/
-static int iommu_handle_iopf(struct iommu_fault *fault, struct device *dev)
+static int iommu_handle_iopf(struct iommu_fault *fault,
+ struct iommu_fault_param *iopf_param)
{
int ret;
struct iopf_group *group;
struct iopf_fault *iopf, *next;
struct iommu_domain *domain = NULL;
- struct iommu_fault_param *iopf_param;
- struct dev_iommu *param = dev->iommu;
+ struct device *dev = iopf_param->dev;

- lockdep_assert_held(&param->lock);
+ lockdep_assert_held(&iopf_param->lock);

if (fault->type != IOMMU_FAULT_PAGE_REQ)
/* Not a recoverable page fault */
return -EOPNOTSUPP;

- /*
- * As long as we're holding param->lock, the queue can't be unlinked
- * from the device and therefore cannot disappear.
- */
- iopf_param = param->fault_param;
- if (!iopf_param)
- return -ENODEV;
-
if (!(fault->prm.flags & IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE)) {
iopf = kzalloc(sizeof(*iopf), GFP_KERNEL);
if (!iopf)
@@ -173,18 +165,19 @@ static int iommu_handle_iopf(struct iommu_fault *fault, struct device *dev)
*/
int iommu_report_device_fault(struct device *dev, struct iopf_fault *evt)
{
- struct dev_iommu *param = dev->iommu;
+ struct iommu_fault_param *fault_param;
struct iopf_fault *evt_pending = NULL;
- struct iommu_fault_param *fparam;
+ struct dev_iommu *param = dev->iommu;
int ret = 0;

- if (!param || !evt)
- return -EINVAL;
-
- /* we only report device fault if there is a handler registered */
mutex_lock(&param->lock);
- fparam = param->fault_param;
+ fault_param = param->fault_param;
+ if (!fault_param) {
+ mutex_unlock(&param->lock);
+ return -EINVAL;
+ }

+ mutex_lock(&fault_param->lock);
if (evt->fault.type == IOMMU_FAULT_PAGE_REQ &&
(evt->fault.prm.flags & IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE)) {
evt_pending = kmemdup(evt, sizeof(struct iopf_fault),
@@ -193,20 +186,18 @@ int iommu_report_device_fault(struct device *dev, struct iopf_fault *evt)
ret = -ENOMEM;
goto done_unlock;
}
- mutex_lock(&fparam->lock);
- list_add_tail(&evt_pending->list, &fparam->faults);
- mutex_unlock(&fparam->lock);
+ list_add_tail(&evt_pending->list, &fault_param->faults);
}

- ret = iommu_handle_iopf(&evt->fault, dev);
+ ret = iommu_handle_iopf(&evt->fault, fault_param);
if (ret && evt_pending) {
- mutex_lock(&fparam->lock);
list_del(&evt_pending->list);
- mutex_unlock(&fparam->lock);
kfree(evt_pending);
}
done_unlock:
+ mutex_unlock(&fault_param->lock);
mutex_unlock(&param->lock);
+
return ret;
}
EXPORT_SYMBOL_GPL(iommu_report_device_fault);
@@ -219,18 +210,23 @@ int iommu_page_response(struct device *dev,
struct iopf_fault *evt;
struct iommu_fault_page_request *prm;
struct dev_iommu *param = dev->iommu;
+ struct iommu_fault_param *fault_param;
const struct iommu_ops *ops = dev_iommu_ops(dev);
bool has_pasid = msg->flags & IOMMU_PAGE_RESP_PASID_VALID;

if (!ops->page_response)
return -ENODEV;

- if (!param || !param->fault_param)
+ mutex_lock(&param->lock);
+ fault_param = param->fault_param;
+ if (!fault_param) {
+ mutex_unlock(&param->lock);
return -EINVAL;
+ }

/* Only send response if there is a fault report pending */
- mutex_lock(&param->fault_param->lock);
- if (list_empty(&param->fault_param->faults)) {
+ mutex_lock(&fault_param->lock);
+ if (list_empty(&fault_param->faults)) {
dev_warn_ratelimited(dev, "no pending PRQ, drop response\n");
goto done_unlock;
}
@@ -238,7 +234,7 @@ int iommu_page_response(struct device *dev,
* Check if we have a matching page request pending to respond,
* otherwise return -EINVAL
*/
- list_for_each_entry(evt, &param->fault_param->faults, list) {
+ list_for_each_entry(evt, &fault_param->faults, list) {
prm = &evt->fault.prm;
if (prm->grpid != msg->grpid)
continue;
@@ -266,7 +262,8 @@ int iommu_page_response(struct device *dev,
}

done_unlock:
- mutex_unlock(&param->fault_param->lock);
+ mutex_unlock(&fault_param->lock);
+ mutex_unlock(&param->lock);
return ret;
}
EXPORT_SYMBOL_GPL(iommu_page_response);
@@ -349,11 +346,13 @@ int iopf_queue_discard_partial(struct iopf_queue *queue)

mutex_lock(&queue->lock);
list_for_each_entry(iopf_param, &queue->devices, queue_list) {
+ mutex_lock(&iopf_param->lock);
list_for_each_entry_safe(iopf, next, &iopf_param->partial,
list) {
list_del(&iopf->list);
kfree(iopf);
}
+ mutex_unlock(&iopf_param->lock);
}
mutex_unlock(&queue->lock);
return 0;
--
2.34.1

2023-12-07 06:49:58

by Baolu Lu

[permalink] [raw]
Subject: [PATCH v8 12/12] iommu: Use refcount for fault data access

The per-device fault data structure stores information about faults
occurring on a device. Its lifetime spans from IOPF enablement to
disablement. Multiple paths, including IOPF reporting, handling, and
responding, may access it concurrently.

Previously, a mutex protected the fault data from use after free. But
this is not performance friendly due to the critical nature of IOPF
handling paths.

Refine this with a refcount-based approach. The fault data pointer is
obtained within an RCU read region with a refcount. The fault data
pointer is returned for usage only when the pointer is valid and a
refcount is successfully obtained. The fault data is freed with
kfree_rcu(), ensuring data is only freed after all RCU critical regions
complete.

Suggested-by: Jason Gunthorpe <[email protected]>
Signed-off-by: Lu Baolu <[email protected]>
Tested-by: Yan Zhao <[email protected]>
---
include/linux/iommu.h | 4 ++
drivers/iommu/io-pgfault.c | 81 +++++++++++++++++++++++++-------------
2 files changed, 57 insertions(+), 28 deletions(-)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 63df77cc0b61..8020bb44a64a 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -597,6 +597,8 @@ struct iommu_device {
/**
* struct iommu_fault_param - per-device IOMMU fault data
* @lock: protect pending faults list
+ * @users: user counter to manage the lifetime of the data
+ * @ruc: rcu head for kfree_rcu()
* @dev: the device that owns this param
* @queue: IOPF queue
* @queue_list: index into queue->devices
@@ -606,6 +608,8 @@ struct iommu_device {
*/
struct iommu_fault_param {
struct mutex lock;
+ refcount_t users;
+ struct rcu_head rcu;

struct device *dev;
struct iopf_queue *queue;
diff --git a/drivers/iommu/io-pgfault.c b/drivers/iommu/io-pgfault.c
index 9439eaf54928..2ace32c6d13b 100644
--- a/drivers/iommu/io-pgfault.c
+++ b/drivers/iommu/io-pgfault.c
@@ -26,6 +26,44 @@ void iopf_free_group(struct iopf_group *group)
}
EXPORT_SYMBOL_GPL(iopf_free_group);

+/*
+ * Return the fault parameter of a device if it exists. Otherwise, return NULL.
+ * On a successful return, the caller takes a reference of this parameter and
+ * should put it after use by calling iopf_put_dev_fault_param().
+ */
+static struct iommu_fault_param *iopf_get_dev_fault_param(struct device *dev)
+{
+ struct dev_iommu *param = dev->iommu;
+ struct iommu_fault_param *fault_param;
+
+ if (!param)
+ return NULL;
+
+ rcu_read_lock();
+ fault_param = param->fault_param;
+ if (fault_param && !refcount_inc_not_zero(&fault_param->users))
+ fault_param = NULL;
+ rcu_read_unlock();
+
+ return fault_param;
+}
+
+/* Caller must hold a reference of the fault parameter. */
+static void iopf_put_dev_fault_param(struct iommu_fault_param *fault_param)
+{
+ struct dev_iommu *param = fault_param->dev->iommu;
+
+ rcu_read_lock();
+ if (!refcount_dec_and_test(&fault_param->users)) {
+ rcu_read_unlock();
+ return;
+ }
+ rcu_read_unlock();
+
+ param->fault_param = NULL;
+ kfree_rcu(fault_param, rcu);
+}
+
/**
* iommu_handle_iopf - IO Page Fault handler
* @fault: fault event
@@ -167,15 +205,11 @@ int iommu_report_device_fault(struct device *dev, struct iopf_fault *evt)
{
struct iommu_fault_param *fault_param;
struct iopf_fault *evt_pending = NULL;
- struct dev_iommu *param = dev->iommu;
int ret = 0;

- mutex_lock(&param->lock);
- fault_param = param->fault_param;
- if (!fault_param) {
- mutex_unlock(&param->lock);
+ fault_param = iopf_get_dev_fault_param(dev);
+ if (!fault_param)
return -EINVAL;
- }

mutex_lock(&fault_param->lock);
if (evt->fault.type == IOMMU_FAULT_PAGE_REQ &&
@@ -196,7 +230,7 @@ int iommu_report_device_fault(struct device *dev, struct iopf_fault *evt)
}
done_unlock:
mutex_unlock(&fault_param->lock);
- mutex_unlock(&param->lock);
+ iopf_put_dev_fault_param(fault_param);

return ret;
}
@@ -209,7 +243,6 @@ int iommu_page_response(struct device *dev,
int ret = -EINVAL;
struct iopf_fault *evt;
struct iommu_fault_page_request *prm;
- struct dev_iommu *param = dev->iommu;
struct iommu_fault_param *fault_param;
const struct iommu_ops *ops = dev_iommu_ops(dev);
bool has_pasid = msg->flags & IOMMU_PAGE_RESP_PASID_VALID;
@@ -217,12 +250,9 @@ int iommu_page_response(struct device *dev,
if (!ops->page_response)
return -ENODEV;

- mutex_lock(&param->lock);
- fault_param = param->fault_param;
- if (!fault_param) {
- mutex_unlock(&param->lock);
+ fault_param = iopf_get_dev_fault_param(dev);
+ if (!fault_param)
return -EINVAL;
- }

/* Only send response if there is a fault report pending */
mutex_lock(&fault_param->lock);
@@ -263,7 +293,8 @@ int iommu_page_response(struct device *dev,

done_unlock:
mutex_unlock(&fault_param->lock);
- mutex_unlock(&param->lock);
+ iopf_put_dev_fault_param(fault_param);
+
return ret;
}
EXPORT_SYMBOL_GPL(iommu_page_response);
@@ -282,22 +313,15 @@ EXPORT_SYMBOL_GPL(iommu_page_response);
*/
int iopf_queue_flush_dev(struct device *dev)
{
- int ret = 0;
- struct iommu_fault_param *iopf_param;
- struct dev_iommu *param = dev->iommu;
+ struct iommu_fault_param *iopf_param = iopf_get_dev_fault_param(dev);

- if (!param)
+ if (!iopf_param)
return -ENODEV;

- mutex_lock(&param->lock);
- iopf_param = param->fault_param;
- if (iopf_param)
- flush_workqueue(iopf_param->queue->wq);
- else
- ret = -ENODEV;
- mutex_unlock(&param->lock);
+ flush_workqueue(iopf_param->queue->wq);
+ iopf_put_dev_fault_param(iopf_param);

- return ret;
+ return 0;
}
EXPORT_SYMBOL_GPL(iopf_queue_flush_dev);

@@ -389,6 +413,8 @@ int iopf_queue_add_device(struct iopf_queue *queue, struct device *dev)
INIT_LIST_HEAD(&fault_param->faults);
INIT_LIST_HEAD(&fault_param->partial);
fault_param->dev = dev;
+ refcount_set(&fault_param->users, 1);
+ init_rcu_head(&fault_param->rcu);
list_add(&fault_param->queue_list, &queue->devices);
fault_param->queue = queue;

@@ -441,8 +467,7 @@ int iopf_queue_remove_device(struct iopf_queue *queue, struct device *dev)
list_for_each_entry_safe(iopf, next, &fault_param->partial, list)
kfree(iopf);

- param->fault_param = NULL;
- kfree(fault_param);
+ iopf_put_dev_fault_param(fault_param);
unlock:
mutex_unlock(&param->lock);
mutex_unlock(&queue->lock);
--
2.34.1

2023-12-11 14:50:55

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH v8 11/12] iommu: Refine locking for per-device fault data management

On Thu, Dec 07, 2023 at 02:43:07PM +0800, Lu Baolu wrote:
> The per-device fault data is a data structure that is used to store
> information about faults that occur on a device. This data is allocated
> when IOPF is enabled on the device and freed when IOPF is disabled. The
> data is used in the paths of iopf reporting, handling, responding, and
> draining.
>
> The fault data is protected by two locks:
>
> - dev->iommu->lock: This lock is used to protect the allocation and
> freeing of the fault data.
> - dev->iommu->fault_parameter->lock: This lock is used to protect the
> fault data itself.
>
> Apply the locking mechanism to the fault reporting and responding paths.
>
> The fault_parameter->lock is also added in iopf_queue_discard_partial().
> It does not fix any real issue, as iopf_queue_discard_partial() is only
> used in the VT-d driver's prq_event_thread(), which is a single-threaded
> path that reports the IOPFs.
>
> Signed-off-by: Lu Baolu <[email protected]>
> Reviewed-by: Kevin Tian <[email protected]>
> Tested-by: Yan Zhao <[email protected]>
> Tested-by: Longfang Liu <[email protected]>
> ---
> drivers/iommu/io-pgfault.c | 61 +++++++++++++++++++-------------------
> 1 file changed, 30 insertions(+), 31 deletions(-)

Reviewed-by: Jason Gunthorpe <[email protected]>

Jason

2023-12-11 15:12:53

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH v8 12/12] iommu: Use refcount for fault data access

On Thu, Dec 07, 2023 at 02:43:08PM +0800, Lu Baolu wrote:
> +/*
> + * Return the fault parameter of a device if it exists. Otherwise, return NULL.
> + * On a successful return, the caller takes a reference of this parameter and
> + * should put it after use by calling iopf_put_dev_fault_param().
> + */
> +static struct iommu_fault_param *iopf_get_dev_fault_param(struct device *dev)
> +{
> + struct dev_iommu *param = dev->iommu;
> + struct iommu_fault_param *fault_param;
> +
> + if (!param)
> + return NULL;

Is it actually possible to call this function on a device that does
not have an iommu driver probed? I'd be surprised by that, maybe this
should be WARN_ONE

> +
> + rcu_read_lock();
> + fault_param = param->fault_param;

The RCU stuff is not right, like this:

diff --git a/drivers/iommu/io-pgfault.c b/drivers/iommu/io-pgfault.c
index 2ace32c6d13bf3..0258f79c8ddf98 100644
--- a/drivers/iommu/io-pgfault.c
+++ b/drivers/iommu/io-pgfault.c
@@ -40,7 +40,7 @@ static struct iommu_fault_param *iopf_get_dev_fault_param(struct device *dev)
return NULL;

rcu_read_lock();
- fault_param = param->fault_param;
+ fault_param = rcu_dereference(param->fault_param);
if (fault_param && !refcount_inc_not_zero(&fault_param->users))
fault_param = NULL;
rcu_read_unlock();
@@ -51,17 +51,8 @@ static struct iommu_fault_param *iopf_get_dev_fault_param(struct device *dev)
/* Caller must hold a reference of the fault parameter. */
static void iopf_put_dev_fault_param(struct iommu_fault_param *fault_param)
{
- struct dev_iommu *param = fault_param->dev->iommu;
-
- rcu_read_lock();
- if (!refcount_dec_and_test(&fault_param->users)) {
- rcu_read_unlock();
- return;
- }
- rcu_read_unlock();
-
- param->fault_param = NULL;
- kfree_rcu(fault_param, rcu);
+ if (refcount_dec_and_test(&fault_param->users))
+ kfree_rcu(fault_param, rcu);
}

/**
@@ -174,7 +165,7 @@ static int iommu_handle_iopf(struct iommu_fault *fault,
}

mutex_unlock(&iopf_param->lock);
- ret = domain->iopf_handler(group);
+ ret = domain->iopf_handler(iopf_param, group);
mutex_lock(&iopf_param->lock);
if (ret)
iopf_free_group(group);
@@ -398,7 +389,8 @@ int iopf_queue_add_device(struct iopf_queue *queue, struct device *dev)

mutex_lock(&queue->lock);
mutex_lock(&param->lock);
- if (param->fault_param) {
+ if (rcu_dereference_check(param->fault_param,
+ lockdep_is_held(&param->lock))) {
ret = -EBUSY;
goto done_unlock;
}
@@ -418,7 +410,7 @@ int iopf_queue_add_device(struct iopf_queue *queue, struct device *dev)
list_add(&fault_param->queue_list, &queue->devices);
fault_param->queue = queue;

- param->fault_param = fault_param;
+ rcu_assign_pointer(param->fault_param, fault_param);

done_unlock:
mutex_unlock(&param->lock);
@@ -442,10 +434,12 @@ int iopf_queue_remove_device(struct iopf_queue *queue, struct device *dev)
int ret = 0;
struct iopf_fault *iopf, *next;
struct dev_iommu *param = dev->iommu;
- struct iommu_fault_param *fault_param = param->fault_param;
+ struct iommu_fault_param *fault_param;

mutex_lock(&queue->lock);
mutex_lock(&param->lock);
+ fault_param = rcu_dereference_check(param->fault_param,
+ lockdep_is_held(&param->lock));
if (!fault_param) {
ret = -ENODEV;
goto unlock;
@@ -467,7 +461,10 @@ int iopf_queue_remove_device(struct iopf_queue *queue, struct device *dev)
list_for_each_entry_safe(iopf, next, &fault_param->partial, list)
kfree(iopf);

- iopf_put_dev_fault_param(fault_param);
+ /* dec the ref owned by iopf_queue_add_device() */
+ rcu_assign_pointer(param->fault_param, NULL);
+ if (refcount_dec_and_test(&fault_param->users))
+ kfree_rcu(fault_param, rcu);
unlock:
mutex_unlock(&param->lock);
mutex_unlock(&queue->lock);
diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
index 325d1810e133a1..63c1a233a7e91f 100644
--- a/drivers/iommu/iommu-sva.c
+++ b/drivers/iommu/iommu-sva.c
@@ -232,10 +232,9 @@ static void iommu_sva_handle_iopf(struct work_struct *work)
iopf_free_group(group);
}

-static int iommu_sva_iopf_handler(struct iopf_group *group)
+static int iommu_sva_iopf_handler(struct iommu_fault_param *fault_param,
+ struct iopf_group *group)
{
- struct iommu_fault_param *fault_param = group->dev->iommu->fault_param;
-
INIT_WORK(&group->work, iommu_sva_handle_iopf);
if (!queue_work(fault_param->queue->wq, &group->work))
return -EBUSY;
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 8020bb44a64ab1..e16fa9811d5023 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -41,6 +41,7 @@ struct iommu_dirty_ops;
struct notifier_block;
struct iommu_sva;
struct iommu_dma_cookie;
+struct iommu_fault_param;

#define IOMMU_FAULT_PERM_READ (1 << 0) /* read */
#define IOMMU_FAULT_PERM_WRITE (1 << 1) /* write */
@@ -210,7 +211,8 @@ struct iommu_domain {
unsigned long pgsize_bitmap; /* Bitmap of page sizes in use */
struct iommu_domain_geometry geometry;
struct iommu_dma_cookie *iova_cookie;
- int (*iopf_handler)(struct iopf_group *group);
+ int (*iopf_handler)(struct iommu_fault_param *fault_param,
+ struct iopf_group *group);
void *fault_data;
union {
struct {
@@ -637,7 +639,7 @@ struct iommu_fault_param {
*/
struct dev_iommu {
struct mutex lock;
- struct iommu_fault_param *fault_param;
+ struct iommu_fault_param __rcu *fault_param;
struct iommu_fwspec *fwspec;
struct iommu_device *iommu_dev;
void *priv;

2023-12-11 15:25:10

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH v8 12/12] iommu: Use refcount for fault data access

On Thu, Dec 07, 2023 at 02:43:08PM +0800, Lu Baolu wrote:
> @@ -217,12 +250,9 @@ int iommu_page_response(struct device *dev,
> if (!ops->page_response)
> return -ENODEV;
>
> - mutex_lock(&param->lock);
> - fault_param = param->fault_param;
> - if (!fault_param) {
> - mutex_unlock(&param->lock);
> + fault_param = iopf_get_dev_fault_param(dev);
> + if (!fault_param)
> return -EINVAL;
> - }

The refcounting should work by passing around the fault_param object,
not re-obtaining it from the dev from a work.

The work should be locked to the iommu_fault_param that was active
when the work was launched.

When we get to iommu_page_response it does this:

/* Only send response if there is a fault report pending */
mutex_lock(&fault_param->lock);
if (list_empty(&fault_param->faults)) {
dev_warn_ratelimited(dev, "no pending PRQ, drop response\n");
goto done_unlock;
}

Which determines that the iommu_fault_param is stale and pending
free..

Also iopf_queue_remove_device() is messed up - it returns an error
code but nothing ever does anything with it :( Remove functions like
this should never fail.

Removal should be like I explained earlier:
- Disable new PRI reception
- Ack all outstanding PRQ to the device
- Disable PRI on the device
- Tear down the iopf infrastructure

So under this model if the iopf_queue_remove_device() has been called
it should be sort of a 'disassociate' action where fault_param is
still floating out there but iommu_page_response() does nothing.

IOW pass the refcount from the iommu_report_device_fault() down into
the fault handler, into the work and then into iommu_page_response()
which will ultimately put it back.

> @@ -282,22 +313,15 @@ EXPORT_SYMBOL_GPL(iommu_page_response);
> */
> int iopf_queue_flush_dev(struct device *dev)
> {
> - int ret = 0;
> - struct iommu_fault_param *iopf_param;
> - struct dev_iommu *param = dev->iommu;
> + struct iommu_fault_param *iopf_param = iopf_get_dev_fault_param(dev);
>
> - if (!param)
> + if (!iopf_param)
> return -ENODEV;

And this also seems unnecessary, it is a bug to call this after
iopf_queue_remove_device() right? Just
rcu_derefernce(param->fault_param, true) and WARN_ON NULL.

Jason

2023-12-12 03:49:31

by Baolu Lu

[permalink] [raw]
Subject: Re: [PATCH v8 12/12] iommu: Use refcount for fault data access

On 12/11/23 11:12 PM, Jason Gunthorpe wrote:
> On Thu, Dec 07, 2023 at 02:43:08PM +0800, Lu Baolu wrote:
>> +/*
>> + * Return the fault parameter of a device if it exists. Otherwise, return NULL.
>> + * On a successful return, the caller takes a reference of this parameter and
>> + * should put it after use by calling iopf_put_dev_fault_param().
>> + */
>> +static struct iommu_fault_param *iopf_get_dev_fault_param(struct device *dev)
>> +{
>> + struct dev_iommu *param = dev->iommu;
>> + struct iommu_fault_param *fault_param;
>> +
>> + if (!param)
>> + return NULL;
>
> Is it actually possible to call this function on a device that does
> not have an iommu driver probed? I'd be surprised by that, maybe this
> should be WARN_ONE

Above check seems to be unnecessary. This helper should only be used
during the iommu probe and release. We can just remove it as any drivers
that abuse this will generate a null-pointer reference warning.

>
>> +
>> + rcu_read_lock();
>> + fault_param = param->fault_param;
>
> The RCU stuff is not right, like this:
>
> diff --git a/drivers/iommu/io-pgfault.c b/drivers/iommu/io-pgfault.c
> index 2ace32c6d13bf3..0258f79c8ddf98 100644
> --- a/drivers/iommu/io-pgfault.c
> +++ b/drivers/iommu/io-pgfault.c
> @@ -40,7 +40,7 @@ static struct iommu_fault_param *iopf_get_dev_fault_param(struct device *dev)
> return NULL;
>
> rcu_read_lock();
> - fault_param = param->fault_param;
> + fault_param = rcu_dereference(param->fault_param);
> if (fault_param && !refcount_inc_not_zero(&fault_param->users))
> fault_param = NULL;
> rcu_read_unlock();
> @@ -51,17 +51,8 @@ static struct iommu_fault_param *iopf_get_dev_fault_param(struct device *dev)
> /* Caller must hold a reference of the fault parameter. */
> static void iopf_put_dev_fault_param(struct iommu_fault_param *fault_param)
> {
> - struct dev_iommu *param = fault_param->dev->iommu;
> -
> - rcu_read_lock();
> - if (!refcount_dec_and_test(&fault_param->users)) {
> - rcu_read_unlock();
> - return;
> - }
> - rcu_read_unlock();
> -
> - param->fault_param = NULL;
> - kfree_rcu(fault_param, rcu);
> + if (refcount_dec_and_test(&fault_param->users))
> + kfree_rcu(fault_param, rcu);
> }
>
> /**
> @@ -174,7 +165,7 @@ static int iommu_handle_iopf(struct iommu_fault *fault,
> }
>
> mutex_unlock(&iopf_param->lock);
> - ret = domain->iopf_handler(group);
> + ret = domain->iopf_handler(iopf_param, group);
> mutex_lock(&iopf_param->lock);
> if (ret)
> iopf_free_group(group);
> @@ -398,7 +389,8 @@ int iopf_queue_add_device(struct iopf_queue *queue, struct device *dev)
>
> mutex_lock(&queue->lock);
> mutex_lock(&param->lock);
> - if (param->fault_param) {
> + if (rcu_dereference_check(param->fault_param,
> + lockdep_is_held(&param->lock))) {
> ret = -EBUSY;
> goto done_unlock;
> }
> @@ -418,7 +410,7 @@ int iopf_queue_add_device(struct iopf_queue *queue, struct device *dev)
> list_add(&fault_param->queue_list, &queue->devices);
> fault_param->queue = queue;
>
> - param->fault_param = fault_param;
> + rcu_assign_pointer(param->fault_param, fault_param);
>
> done_unlock:
> mutex_unlock(&param->lock);
> @@ -442,10 +434,12 @@ int iopf_queue_remove_device(struct iopf_queue *queue, struct device *dev)
> int ret = 0;
> struct iopf_fault *iopf, *next;
> struct dev_iommu *param = dev->iommu;
> - struct iommu_fault_param *fault_param = param->fault_param;
> + struct iommu_fault_param *fault_param;
>
> mutex_lock(&queue->lock);
> mutex_lock(&param->lock);
> + fault_param = rcu_dereference_check(param->fault_param,
> + lockdep_is_held(&param->lock));
> if (!fault_param) {
> ret = -ENODEV;
> goto unlock;
> @@ -467,7 +461,10 @@ int iopf_queue_remove_device(struct iopf_queue *queue, struct device *dev)
> list_for_each_entry_safe(iopf, next, &fault_param->partial, list)
> kfree(iopf);
>
> - iopf_put_dev_fault_param(fault_param);
> + /* dec the ref owned by iopf_queue_add_device() */
> + rcu_assign_pointer(param->fault_param, NULL);
> + if (refcount_dec_and_test(&fault_param->users))
> + kfree_rcu(fault_param, rcu);
> unlock:
> mutex_unlock(&param->lock);
> mutex_unlock(&queue->lock);
> diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
> index 325d1810e133a1..63c1a233a7e91f 100644
> --- a/drivers/iommu/iommu-sva.c
> +++ b/drivers/iommu/iommu-sva.c
> @@ -232,10 +232,9 @@ static void iommu_sva_handle_iopf(struct work_struct *work)
> iopf_free_group(group);
> }
>
> -static int iommu_sva_iopf_handler(struct iopf_group *group)
> +static int iommu_sva_iopf_handler(struct iommu_fault_param *fault_param,
> + struct iopf_group *group)
> {
> - struct iommu_fault_param *fault_param = group->dev->iommu->fault_param;
> -
> INIT_WORK(&group->work, iommu_sva_handle_iopf);
> if (!queue_work(fault_param->queue->wq, &group->work))
> return -EBUSY;
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 8020bb44a64ab1..e16fa9811d5023 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -41,6 +41,7 @@ struct iommu_dirty_ops;
> struct notifier_block;
> struct iommu_sva;
> struct iommu_dma_cookie;
> +struct iommu_fault_param;
>
> #define IOMMU_FAULT_PERM_READ (1 << 0) /* read */
> #define IOMMU_FAULT_PERM_WRITE (1 << 1) /* write */
> @@ -210,7 +211,8 @@ struct iommu_domain {
> unsigned long pgsize_bitmap; /* Bitmap of page sizes in use */
> struct iommu_domain_geometry geometry;
> struct iommu_dma_cookie *iova_cookie;
> - int (*iopf_handler)(struct iopf_group *group);
> + int (*iopf_handler)(struct iommu_fault_param *fault_param,
> + struct iopf_group *group);

How about folding fault_param into iopf_group?

iopf_group is the central data around a iopf handling. The iopf_group
holds the reference count of the device's fault parameter structure
throughout its entire lifecycle.

> void *fault_data;
> union {
> struct {
> @@ -637,7 +639,7 @@ struct iommu_fault_param {
> */
> struct dev_iommu {
> struct mutex lock;
> - struct iommu_fault_param *fault_param;
> + struct iommu_fault_param __rcu *fault_param;
> struct iommu_fwspec *fwspec;
> struct iommu_device *iommu_dev;
> void *priv;

The iommu_page_response() needs to change accordingly which is pointed
out in the next email.

Others look good to me. Thank you so much!

Best regards,
baolu

2023-12-12 05:12:56

by Baolu Lu

[permalink] [raw]
Subject: Re: [PATCH v8 12/12] iommu: Use refcount for fault data access

On 12/11/23 11:24 PM, Jason Gunthorpe wrote:
> On Thu, Dec 07, 2023 at 02:43:08PM +0800, Lu Baolu wrote:
>> @@ -217,12 +250,9 @@ int iommu_page_response(struct device *dev,
>> if (!ops->page_response)
>> return -ENODEV;
>>
>> - mutex_lock(&param->lock);
>> - fault_param = param->fault_param;
>> - if (!fault_param) {
>> - mutex_unlock(&param->lock);
>> + fault_param = iopf_get_dev_fault_param(dev);
>> + if (!fault_param)
>> return -EINVAL;
>> - }
> The refcounting should work by passing around the fault_param object,
> not re-obtaining it from the dev from a work.
>
> The work should be locked to the iommu_fault_param that was active
> when the work was launched.
>
> When we get to iommu_page_response it does this:
>
> /* Only send response if there is a fault report pending */
> mutex_lock(&fault_param->lock);
> if (list_empty(&fault_param->faults)) {
> dev_warn_ratelimited(dev, "no pending PRQ, drop response\n");
> goto done_unlock;
> }
>
> Which determines that the iommu_fault_param is stale and pending
> free..

Yes, agreed. The iopf_fault_param should be passed in together with the
iopf_group. The reference count should be released in the
iopf_free_group(). These two helps could look like below:

int iommu_page_response(struct iopf_group *group,
struct iommu_page_response *msg)
{
bool needs_pasid;
int ret = -EINVAL;
struct iopf_fault *evt;
struct iommu_fault_page_request *prm;
struct device *dev = group->fault_param->dev;
const struct iommu_ops *ops = dev_iommu_ops(dev);
bool has_pasid = msg->flags & IOMMU_PAGE_RESP_PASID_VALID;
struct iommu_fault_param *fault_param = group->fault_param;

if (!ops->page_response)
return -ENODEV;

/* Only send response if there is a fault report pending */
mutex_lock(&fault_param->lock);
if (list_empty(&fault_param->faults)) {
dev_warn_ratelimited(dev, "no pending PRQ, drop response\n");
goto done_unlock;
}
/*
* Check if we have a matching page request pending to respond,
* otherwise return -EINVAL
*/
list_for_each_entry(evt, &fault_param->faults, list) {
prm = &evt->fault.prm;
if (prm->grpid != msg->grpid)
continue;

/*
* If the PASID is required, the corresponding request is
* matched using the group ID, the PASID valid bit and the PASID
* value. Otherwise only the group ID matches request and
* response.
*/
needs_pasid = prm->flags & IOMMU_FAULT_PAGE_RESPONSE_NEEDS_PASID;
if (needs_pasid && (!has_pasid || msg->pasid != prm->pasid))
continue;

if (!needs_pasid && has_pasid) {
/* No big deal, just clear it. */
msg->flags &= ~IOMMU_PAGE_RESP_PASID_VALID;
msg->pasid = 0;
}

ret = ops->page_response(dev, evt, msg);
list_del(&evt->list);
kfree(evt);
break;
}

done_unlock:
mutex_unlock(&fault_param->lock);

return ret;
}

...

void iopf_free_group(struct iopf_group *group)
{
struct iopf_fault *iopf, *next;

list_for_each_entry_safe(iopf, next, &group->faults, list) {
if (!(iopf->fault.prm.flags & IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE))
kfree(iopf);
}

/* Pair with iommu_report_device_fault(). */
iopf_put_dev_fault_param(group->fault_param);
kfree(group);
}

Best regards,
baolu

2023-12-12 05:23:12

by Baolu Lu

[permalink] [raw]
Subject: Re: [PATCH v8 12/12] iommu: Use refcount for fault data access

On 12/11/23 11:24 PM, Jason Gunthorpe wrote:
> Also iopf_queue_remove_device() is messed up - it returns an error
> code but nothing ever does anything with it ???? Remove functions like
> this should never fail.

Yes, agreed.

>
> Removal should be like I explained earlier:
> - Disable new PRI reception

This could be done by

rcu_assign_pointer(param->fault_param, NULL);

?

> - Ack all outstanding PRQ to the device

All outstanding page requests are responded with
IOMMU_PAGE_RESP_INVALID, indicating that device should not attempt any
retry.

> - Disable PRI on the device
> - Tear down the iopf infrastructure
>
> So under this model if the iopf_queue_remove_device() has been called
> it should be sort of a 'disassociate' action where fault_param is
> still floating out there but iommu_page_response() does nothing.

Yes. All pending requests have been auto-responded.

> IOW pass the refcount from the iommu_report_device_fault() down into
> the fault handler, into the work and then into iommu_page_response()
> which will ultimately put it back.

Yes.

Best regards,
baolu

2023-12-12 05:28:21

by Baolu Lu

[permalink] [raw]
Subject: Re: [PATCH v8 12/12] iommu: Use refcount for fault data access

On 12/11/23 11:24 PM, Jason Gunthorpe wrote:
>> @@ -282,22 +313,15 @@ EXPORT_SYMBOL_GPL(iommu_page_response);
>> */
>> int iopf_queue_flush_dev(struct device *dev)
>> {
>> - int ret = 0;
>> - struct iommu_fault_param *iopf_param;
>> - struct dev_iommu *param = dev->iommu;
>> + struct iommu_fault_param *iopf_param = iopf_get_dev_fault_param(dev);
>>
>> - if (!param)
>> + if (!iopf_param)
>> return -ENODEV;
> And this also seems unnecessary, it is a bug to call this after
> iopf_queue_remove_device() right? Just

Yes. They both are called from the iommu driver. The iommu driver should
guarantee this.

> rcu_derefernce(param->fault_param, true) and WARN_ON NULL.

Okay, sure.

Best regards,
baolu

2023-12-12 15:15:00

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH v8 12/12] iommu: Use refcount for fault data access

On Tue, Dec 12, 2023 at 01:17:47PM +0800, Baolu Lu wrote:
> On 12/11/23 11:24 PM, Jason Gunthorpe wrote:
> > Also iopf_queue_remove_device() is messed up - it returns an error
> > code but nothing ever does anything with it ???? Remove functions like
> > this should never fail.
>
> Yes, agreed.
>
> >
> > Removal should be like I explained earlier:
> > - Disable new PRI reception
>
> This could be done by
>
> rcu_assign_pointer(param->fault_param, NULL);
>
> ?

Not without a synchronize_rcu

disable new PRI reception should be done by the driver - it should
turn off PRI generation in the IOMMU HW and flush any HW PRI queues.

> > - Ack all outstanding PRQ to the device
>
> All outstanding page requests are responded with
> IOMMU_PAGE_RESP_INVALID, indicating that device should not attempt any
> retry.

Yes

Jason

2023-12-12 15:18:25

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH v8 12/12] iommu: Use refcount for fault data access

On Tue, Dec 12, 2023 at 01:07:17PM +0800, Baolu Lu wrote:

> Yes, agreed. The iopf_fault_param should be passed in together with the
> iopf_group. The reference count should be released in the
> iopf_free_group(). These two helps could look like below:
>
> int iommu_page_response(struct iopf_group *group,
> struct iommu_page_response *msg)
> {
> bool needs_pasid;
> int ret = -EINVAL;
> struct iopf_fault *evt;
> struct iommu_fault_page_request *prm;
> struct device *dev = group->fault_param->dev;
> const struct iommu_ops *ops = dev_iommu_ops(dev);
> bool has_pasid = msg->flags & IOMMU_PAGE_RESP_PASID_VALID;
> struct iommu_fault_param *fault_param = group->fault_param;
>
> if (!ops->page_response)
> return -ENODEV;

We should never get here if this is the case, prevent the device from
being added in the first place

> /* Only send response if there is a fault report pending */
> mutex_lock(&fault_param->lock);
> if (list_empty(&fault_param->faults)) {
> dev_warn_ratelimited(dev, "no pending PRQ, drop response\n");
> goto done_unlock;
> }
> /*
> * Check if we have a matching page request pending to respond,
> * otherwise return -EINVAL
> */
> list_for_each_entry(evt, &fault_param->faults, list) {
> prm = &evt->fault.prm;
> if (prm->grpid != msg->grpid)
> continue;
>
> /*
> * If the PASID is required, the corresponding request is
> * matched using the group ID, the PASID valid bit and the PASID
> * value. Otherwise only the group ID matches request and
> * response.
> */
> needs_pasid = prm->flags & IOMMU_FAULT_PAGE_RESPONSE_NEEDS_PASID;
> if (needs_pasid && (!has_pasid || msg->pasid != prm->pasid))
> continue;
>
> if (!needs_pasid && has_pasid) {
> /* No big deal, just clear it. */
> msg->flags &= ~IOMMU_PAGE_RESP_PASID_VALID;
> msg->pasid = 0;
> }
>
> ret = ops->page_response(dev, evt, msg);
> list_del(&evt->list);
> kfree(evt);
> break;
> }
>
> done_unlock:
> mutex_unlock(&fault_param->lock);

I would have expected the group to free'd here? But regardless this
looks like a good direction

Jason

2023-12-12 15:19:01

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH v8 12/12] iommu: Use refcount for fault data access

On Tue, Dec 12, 2023 at 11:44:14AM +0800, Baolu Lu wrote:
> > @@ -210,7 +211,8 @@ struct iommu_domain {
> > unsigned long pgsize_bitmap; /* Bitmap of page sizes in use */
> > struct iommu_domain_geometry geometry;
> > struct iommu_dma_cookie *iova_cookie;
> > - int (*iopf_handler)(struct iopf_group *group);
> > + int (*iopf_handler)(struct iommu_fault_param *fault_param,
> > + struct iopf_group *group);
>
> How about folding fault_param into iopf_group?
>
> iopf_group is the central data around a iopf handling. The iopf_group
> holds the reference count of the device's fault parameter structure
> throughout its entire lifecycle.

Yeah, I think that is the right thing to do

Jason

2023-12-13 02:18:55

by Baolu Lu

[permalink] [raw]
Subject: Re: [PATCH v8 12/12] iommu: Use refcount for fault data access

On 12/12/23 11:14 PM, Jason Gunthorpe wrote:
> On Tue, Dec 12, 2023 at 01:17:47PM +0800, Baolu Lu wrote:
>> On 12/11/23 11:24 PM, Jason Gunthorpe wrote:
>>> Also iopf_queue_remove_device() is messed up - it returns an error
>>> code but nothing ever does anything with it ???? Remove functions like
>>> this should never fail.
>> Yes, agreed.
>>
>>> Removal should be like I explained earlier:
>>> - Disable new PRI reception
>> This could be done by
>>
>> rcu_assign_pointer(param->fault_param, NULL);
>>
>> ?
> Not without a synchronize_rcu
>
> disable new PRI reception should be done by the driver - it should
> turn off PRI generation in the IOMMU HW and flush any HW PRI queues.

Yeah! Get you now.

Best regards,
baolu

2023-12-13 02:24:45

by Baolu Lu

[permalink] [raw]
Subject: Re: [PATCH v8 12/12] iommu: Use refcount for fault data access

On 12/12/23 11:18 PM, Jason Gunthorpe wrote:
> On Tue, Dec 12, 2023 at 01:07:17PM +0800, Baolu Lu wrote:
>
>> Yes, agreed. The iopf_fault_param should be passed in together with the
>> iopf_group. The reference count should be released in the
>> iopf_free_group(). These two helps could look like below:
>>
>> int iommu_page_response(struct iopf_group *group,
>> struct iommu_page_response *msg)
>> {
>> bool needs_pasid;
>> int ret = -EINVAL;
>> struct iopf_fault *evt;
>> struct iommu_fault_page_request *prm;
>> struct device *dev = group->fault_param->dev;
>> const struct iommu_ops *ops = dev_iommu_ops(dev);
>> bool has_pasid = msg->flags & IOMMU_PAGE_RESP_PASID_VALID;
>> struct iommu_fault_param *fault_param = group->fault_param;
>>
>> if (!ops->page_response)
>> return -ENODEV;
>
> We should never get here if this is the case, prevent the device from
> being added in the first place

Yeah, could move it to iopf_queue_add_device(). WARN and return failure
there if the driver is not ready for page request handling.

>
>> /* Only send response if there is a fault report pending */
>> mutex_lock(&fault_param->lock);
>> if (list_empty(&fault_param->faults)) {
>> dev_warn_ratelimited(dev, "no pending PRQ, drop response\n");
>> goto done_unlock;
>> }
>> /*
>> * Check if we have a matching page request pending to respond,
>> * otherwise return -EINVAL
>> */
>> list_for_each_entry(evt, &fault_param->faults, list) {
>> prm = &evt->fault.prm;
>> if (prm->grpid != msg->grpid)
>> continue;
>>
>> /*
>> * If the PASID is required, the corresponding request is
>> * matched using the group ID, the PASID valid bit and the PASID
>> * value. Otherwise only the group ID matches request and
>> * response.
>> */
>> needs_pasid = prm->flags & IOMMU_FAULT_PAGE_RESPONSE_NEEDS_PASID;
>> if (needs_pasid && (!has_pasid || msg->pasid != prm->pasid))
>> continue;
>>
>> if (!needs_pasid && has_pasid) {
>> /* No big deal, just clear it. */
>> msg->flags &= ~IOMMU_PAGE_RESP_PASID_VALID;
>> msg->pasid = 0;
>> }
>>
>> ret = ops->page_response(dev, evt, msg);
>> list_del(&evt->list);
>> kfree(evt);
>> break;
>> }
>>
>> done_unlock:
>> mutex_unlock(&fault_param->lock);
>
> I would have expected the group to free'd here? But regardless this
> looks like a good direction

Both work for me. We can decide it according to the needs of code later.

>
> Jason

Best regards,
baolu