2019-06-03 15:00:06

by Jean-Philippe Brucker

Subject: [PATCH v2 0/4] iommu: Add device fault reporting API

Allow device drivers and VFIO to get notified on IOMMU translation
fault, and handle recoverable faults (PCI PRI). Several series require
this API (Intel VT-d and Arm SMMUv3 nested support, as well as the
generic host SVA implementation).

Changes since v1 [1]:
* Allocate iommu_param earlier, in iommu_probe_device().
* Pass struct iommu_fault to fault handlers, instead of the
iommu_fault_event wrapper.
* Removed unused iommu_fault_event::iommu_private.
* Removed unnecessary iommu_page_response::addr.
* Added iommu_page_response::version, which would allow to introduce a
new incompatible iommu_page_response structure (as opposed to just
adding a flag + field).

[1] [PATCH 0/4] iommu: Add device fault reporting API
https://lore.kernel.org/lkml/[email protected]/

Jacob Pan (3):
driver core: Add per device iommu param
iommu: Introduce device fault data
iommu: Introduce device fault report API

Jean-Philippe Brucker (1):
iommu: Add recoverable fault reporting

drivers/iommu/iommu.c | 236 ++++++++++++++++++++++++++++++++++++-
include/linux/device.h | 3 +
include/linux/iommu.h | 87 ++++++++++++++
include/uapi/linux/iommu.h | 153 ++++++++++++++++++++++++
4 files changed, 476 insertions(+), 3 deletions(-)
create mode 100644 include/uapi/linux/iommu.h

--
2.21.0


2019-06-03 15:00:21

by Jean-Philippe Brucker

Subject: [PATCH v2 4/4] iommu: Add recoverable fault reporting

Some IOMMU hardware features, for example PCI PRI and Arm SMMU Stall,
enable recoverable I/O page faults. Allow IOMMU drivers to report PRI Page
Requests and Stall events through the new fault reporting API. The
consumer of the fault can be either an I/O page fault handler in the host,
or a guest OS.

Once handled, the fault must be completed by sending a page response back
to the IOMMU. Add an iommu_page_response() function to complete a page
fault.

There are two ways to extend the userspace API:
* Add a field to iommu_page_response and a flag to
iommu_page_response::flags describing the validity of this field.
* Introduce a new iommu_page_response_X structure with a different version
number. The kernel must then support both versions.

Signed-off-by: Jacob Pan <[email protected]>
Signed-off-by: Jean-Philippe Brucker <[email protected]>
---
drivers/iommu/iommu.c | 94 +++++++++++++++++++++++++++++++++++++-
include/linux/iommu.h | 19 ++++++++
include/uapi/linux/iommu.h | 35 ++++++++++++++
3 files changed, 146 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 8037a3f07f07..956a80364efd 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -891,7 +891,14 @@ EXPORT_SYMBOL_GPL(iommu_group_unregister_notifier);
* @data: private data passed as argument to the handler
*
* When an IOMMU fault event is received, this handler gets called with the
- * fault event and data as argument. The handler should return 0 on success.
+ * fault event and data as argument. The handler should return 0 on success. If
+ * the fault is recoverable (IOMMU_FAULT_PAGE_REQ), the consumer should also
+ * complete the fault by calling iommu_page_response() with one of the following
+ * response codes:
+ * - IOMMU_PAGE_RESP_SUCCESS: retry the translation
+ * - IOMMU_PAGE_RESP_INVALID: terminate the fault
+ * - IOMMU_PAGE_RESP_FAILURE: terminate the fault and stop reporting
+ * page faults if possible.
*
* Return 0 if the fault handler was installed successfully, or an error.
*/
@@ -921,6 +928,8 @@ int iommu_register_device_fault_handler(struct device *dev,
}
param->fault_param->handler = handler;
param->fault_param->data = data;
+ mutex_init(&param->fault_param->lock);
+ INIT_LIST_HEAD(&param->fault_param->faults);

done_unlock:
mutex_unlock(&param->lock);
@@ -951,6 +960,12 @@ int iommu_unregister_device_fault_handler(struct device *dev)
if (!param->fault_param)
goto unlock;

+ /* we cannot unregister handler if there are pending faults */
+ if (!list_empty(&param->fault_param->faults)) {
+ ret = -EBUSY;
+ goto unlock;
+ }
+
kfree(param->fault_param);
param->fault_param = NULL;
put_device(dev);
@@ -967,13 +982,15 @@ EXPORT_SYMBOL_GPL(iommu_unregister_device_fault_handler);
* @evt: fault event data
*
* Called by IOMMU drivers when a fault is detected, typically in a threaded IRQ
- * handler.
+ * handler. When this function fails and the fault is recoverable, it is the
+ * caller's responsibility to complete the fault.
*
* Return 0 on success, or an error.
*/
int iommu_report_device_fault(struct device *dev, struct iommu_fault_event *evt)
{
struct iommu_param *param = dev->iommu_param;
+ struct iommu_fault_event *evt_pending = NULL;
struct iommu_fault_param *fparam;
int ret = 0;

@@ -987,13 +1004,86 @@ int iommu_report_device_fault(struct device *dev, struct iommu_fault_event *evt)
ret = -EINVAL;
goto done_unlock;
}
+
+ if (evt->fault.type == IOMMU_FAULT_PAGE_REQ &&
+ (evt->fault.prm.flags & IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE)) {
+ evt_pending = kmemdup(evt, sizeof(struct iommu_fault_event),
+ GFP_KERNEL);
+ if (!evt_pending) {
+ ret = -ENOMEM;
+ goto done_unlock;
+ }
+ mutex_lock(&fparam->lock);
+ list_add_tail(&evt_pending->list, &fparam->faults);
+ mutex_unlock(&fparam->lock);
+ }
+
ret = fparam->handler(&evt->fault, fparam->data);
+ if (ret && evt_pending) {
+ mutex_lock(&fparam->lock);
+ list_del(&evt_pending->list);
+ mutex_unlock(&fparam->lock);
+ kfree(evt_pending);
+ }
done_unlock:
mutex_unlock(&param->lock);
return ret;
}
EXPORT_SYMBOL_GPL(iommu_report_device_fault);

+int iommu_page_response(struct device *dev,
+ struct iommu_page_response *msg)
+{
+ bool pasid_valid;
+ int ret = -EINVAL;
+ struct iommu_fault_event *evt;
+ struct iommu_fault_page_request *prm;
+ struct iommu_param *param = dev->iommu_param;
+ struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
+
+ if (!domain || !domain->ops->page_response)
+ return -ENODEV;
+
+ if (!param || !param->fault_param)
+ return -EINVAL;
+
+ if (msg->version != IOMMU_PAGE_RESP_VERSION_1 ||
+ msg->flags & ~IOMMU_PAGE_RESP_PASID_VALID)
+ return -EINVAL;
+
+ /* Only send response if there is a fault report pending */
+ mutex_lock(&param->fault_param->lock);
+ if (list_empty(&param->fault_param->faults)) {
+ dev_warn_ratelimited(dev, "no pending PRQ, drop response\n");
+ goto done_unlock;
+ }
+ /*
+ * Check if we have a matching page request pending to respond,
+ * otherwise return -EINVAL
+ */
+ list_for_each_entry(evt, &param->fault_param->faults, list) {
+ prm = &evt->fault.prm;
+ pasid_valid = prm->flags & IOMMU_FAULT_PAGE_REQUEST_PASID_VALID;
+
+ if ((pasid_valid && prm->pasid != msg->pasid) ||
+ prm->grpid != msg->grpid)
+ continue;
+
+ /* Sanitize the reply */
+ msg->flags = pasid_valid ? IOMMU_PAGE_RESP_PASID_VALID : 0;
+
+ ret = domain->ops->page_response(dev, evt, msg);
+ list_del(&evt->list);
+ kfree(evt);
+ break;
+ }
+
+done_unlock:
+ mutex_unlock(&param->fault_param->lock);
+ return ret;
+}
+EXPORT_SYMBOL_GPL(iommu_page_response);
+
/**
* iommu_group_id - Return ID for a group
* @group: the group to ID
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 3e783f5bf472..76c8cda61dfd 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -227,6 +227,7 @@ struct iommu_sva_ops {
* @sva_bind: Bind process address space to device
* @sva_unbind: Unbind process address space from device
* @sva_get_pasid: Get PASID associated to a SVA handle
+ * @page_response: handle page request response
* @pgsize_bitmap: bitmap of all possible supported page sizes
*/
struct iommu_ops {
@@ -287,6 +288,10 @@ struct iommu_ops {
void (*sva_unbind)(struct iommu_sva *handle);
int (*sva_get_pasid)(struct iommu_sva *handle);

+ int (*page_response)(struct device *dev,
+ struct iommu_fault_event *evt,
+ struct iommu_page_response *msg);
+
unsigned long pgsize_bitmap;
};

@@ -311,19 +316,25 @@ struct iommu_device {
* unrecoverable faults such as DMA or IRQ remapping faults.
*
* @fault: fault descriptor
+ * @list: pending fault event list, used for tracking responses
*/
struct iommu_fault_event {
struct iommu_fault fault;
+ struct list_head list;
};

/**
* struct iommu_fault_param - per-device IOMMU fault data
* @handler: Callback function to handle IOMMU faults at device level
* @data: handler private data
+ * @faults: holds the pending faults which need a response
+ * @lock: protect pending faults list
*/
struct iommu_fault_param {
iommu_dev_fault_handler_t handler;
void *data;
+ struct list_head faults;
+ struct mutex lock;
};

/**
@@ -437,6 +448,8 @@ extern int iommu_unregister_device_fault_handler(struct device *dev);

extern int iommu_report_device_fault(struct device *dev,
struct iommu_fault_event *evt);
+extern int iommu_page_response(struct device *dev,
+ struct iommu_page_response *msg);

extern int iommu_group_id(struct iommu_group *group);
extern struct iommu_group *iommu_group_get_for_dev(struct device *dev);
@@ -765,6 +778,12 @@ int iommu_report_device_fault(struct device *dev, struct iommu_fault_event *evt)
return -ENODEV;
}

+static inline int iommu_page_response(struct device *dev,
+ struct iommu_page_response *msg)
+{
+ return -ENODEV;
+}
+
static inline int iommu_group_id(struct iommu_group *group)
{
return -ENODEV;
diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
index 796402174d6c..f45d8e9e59c3 100644
--- a/include/uapi/linux/iommu.h
+++ b/include/uapi/linux/iommu.h
@@ -115,4 +115,39 @@ struct iommu_fault {
struct iommu_fault_page_request prm;
};
};
+
+/**
+ * enum iommu_page_response_code - Return status of fault handlers
+ * @IOMMU_PAGE_RESP_SUCCESS: Fault has been handled and the page tables
+ * populated, retry the access. This is "Success" in PCI PRI.
+ * @IOMMU_PAGE_RESP_FAILURE: General error. Drop all subsequent faults from
+ * this device if possible. This is "Response Failure" in PCI PRI.
+ * @IOMMU_PAGE_RESP_INVALID: Could not handle this fault, don't retry the
+ * access. This is "Invalid Request" in PCI PRI.
+ */
+enum iommu_page_response_code {
+ IOMMU_PAGE_RESP_SUCCESS = 0,
+ IOMMU_PAGE_RESP_INVALID,
+ IOMMU_PAGE_RESP_FAILURE,
+};
+
+/**
+ * struct iommu_page_response - Generic page response information
+ * @version: API version of this structure
+ * @flags: encodes whether the corresponding fields are valid
+ * (IOMMU_PAGE_RESP_* values)
+ * @pasid: Process Address Space ID
+ * @grpid: Page Request Group Index
+ * @code: response code from &enum iommu_page_response_code
+ */
+struct iommu_page_response {
+#define IOMMU_PAGE_RESP_VERSION_1 1
+ __u32 version;
+#define IOMMU_PAGE_RESP_PASID_VALID (1 << 0)
+ __u32 flags;
+ __u32 pasid;
+ __u32 grpid;
+ __u32 code;
+};
+
#endif /* _UAPI_IOMMU_H */
--
2.21.0
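
The pending-fault bookkeeping added in patch 4/4 can be modelled in plain C as below. This is a sketch under stated assumptions: a singly linked list stands in for struct list_head, the fault_param->lock mutex is omitted, only the fields used for matching are mirrored, and track_report()/track_response() are hypothetical names. As in the patch, only Page Requests with the LAST_PAGE flag are queued, and a response must match the PRG index (and the PASID, when the request carried one) of a pending fault.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Flag values copied from the uapi header in this series. */
#define IOMMU_FAULT_PAGE_REQUEST_PASID_VALID (1u << 0)
#define IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE   (1u << 1)
#define IOMMU_PAGE_RESP_PASID_VALID          (1u << 0)

/* Only the fields used for matching are mirrored here. */
struct page_request {
	uint32_t flags;
	uint32_t pasid;
	uint32_t grpid;
};

struct page_response {
	uint32_t flags;
	uint32_t pasid;
	uint32_t grpid;
	uint32_t code;
};

/* Hypothetical list node; the patch embeds a struct list_head instead. */
struct pending_fault {
	struct page_request prm;
	struct pending_fault *next;
};

/* Hypothetical stand-in for iommu_fault_param::faults (lock omitted). */
struct fault_tracker {
	struct pending_fault *head;
};

/* Report path: queue the request if it is the last one of its group. */
static int track_report(struct fault_tracker *t, const struct page_request *prm)
{
	struct pending_fault *p, **pp;

	assert(t && prm);
	if (!(prm->flags & IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE))
		return 0; /* nothing to respond to yet */

	p = malloc(sizeof(*p));
	if (!p)
		return -ENOMEM;
	p->prm = *prm;
	p->next = NULL;

	/* Append at the tail, like list_add_tail() in the patch. */
	for (pp = &t->head; *pp; pp = &(*pp)->next)
		;
	*pp = p;
	return 0;
}

/* Response path: find the matching group, sanitize flags, then dequeue it. */
static int track_response(struct fault_tracker *t, struct page_response *msg)
{
	struct pending_fault **pp;

	assert(t && msg);
	for (pp = &t->head; *pp; pp = &(*pp)->next) {
		struct pending_fault *p = *pp;
		bool pasid_valid = p->prm.flags & IOMMU_FAULT_PAGE_REQUEST_PASID_VALID;

		if ((pasid_valid && p->prm.pasid != msg->pasid) ||
		    p->prm.grpid != msg->grpid)
			continue;

		/* Only report a PASID if the request carried one. */
		msg->flags = pasid_valid ? IOMMU_PAGE_RESP_PASID_VALID : 0;
		/* The kernel would call domain->ops->page_response() here. */
		*pp = p->next;
		free(p);
		return 0;
	}
	return -EINVAL; /* no matching request pending */
}
```

The version and flag validation that iommu_page_response() does before walking the list is left out of this sketch.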

2019-06-03 15:00:27

by Jean-Philippe Brucker

Subject: [PATCH v2 3/4] iommu: Introduce device fault report API

From: Jacob Pan <[email protected]>

Traditionally, device-specific faults are detected and handled within
their own device drivers. When the IOMMU is enabled, faults such as those
on DMA transactions are detected by the IOMMU. There is no generic
mechanism to report such faults back to the in-kernel device driver,
or to the guest OS in the case of assigned devices.

This patch introduces a registration API for device-specific fault
handlers. This differs from the existing iommu_set_fault_handler/
report_iommu_fault infrastructure in several ways:
- it allows reporting more sophisticated fault events (both
unrecoverable faults and page request faults) due to the nature
of the iommu_fault struct
- it is device specific and not domain specific.

The current iommu_report_device_fault() implementation only handles
the "shoot and forget" unrecoverable fault case. Handling of page
request faults or stalled faults will come later.

Signed-off-by: Jacob Pan <[email protected]>
Signed-off-by: Ashok Raj <[email protected]>
Signed-off-by: Jean-Philippe Brucker <[email protected]>
Signed-off-by: Eric Auger <[email protected]>
---
drivers/iommu/iommu.c | 146 +++++++++++++++++++++++++++++++++++++++++-
include/linux/iommu.h | 29 +++++++++
2 files changed, 172 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 67ee6623f9b2..8037a3f07f07 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -107,15 +107,43 @@ void iommu_device_unregister(struct iommu_device *iommu)
spin_unlock(&iommu_device_lock);
}

+static struct iommu_param *iommu_get_dev_param(struct device *dev)
+{
+ struct iommu_param *param = dev->iommu_param;
+
+ if (param)
+ return param;
+
+ param = kzalloc(sizeof(*param), GFP_KERNEL);
+ if (!param)
+ return NULL;
+
+ mutex_init(&param->lock);
+ dev->iommu_param = param;
+ return param;
+}
+
+static void iommu_free_dev_param(struct device *dev)
+{
+ kfree(dev->iommu_param);
+ dev->iommu_param = NULL;
+}
+
int iommu_probe_device(struct device *dev)
{
const struct iommu_ops *ops = dev->bus->iommu_ops;
- int ret = -EINVAL;
+ int ret;

WARN_ON(dev->iommu_group);
+ if (!ops)
+ return -EINVAL;

- if (ops)
- ret = ops->add_device(dev);
+ if (!iommu_get_dev_param(dev))
+ return -ENOMEM;
+
+ ret = ops->add_device(dev);
+ if (ret)
+ iommu_free_dev_param(dev);

return ret;
}
@@ -126,6 +154,8 @@ void iommu_release_device(struct device *dev)

if (dev->iommu_group)
ops->remove_device(dev);
+
+ iommu_free_dev_param(dev);
}

static struct iommu_domain *__iommu_domain_alloc(struct bus_type *bus,
@@ -854,6 +884,116 @@ int iommu_group_unregister_notifier(struct iommu_group *group,
}
EXPORT_SYMBOL_GPL(iommu_group_unregister_notifier);

+/**
+ * iommu_register_device_fault_handler() - Register a device fault handler
+ * @dev: the device
+ * @handler: the fault handler
+ * @data: private data passed as argument to the handler
+ *
+ * When an IOMMU fault event is received, this handler gets called with the
+ * fault event and data as argument. The handler should return 0 on success.
+ *
+ * Return 0 if the fault handler was installed successfully, or an error.
+ */
+int iommu_register_device_fault_handler(struct device *dev,
+ iommu_dev_fault_handler_t handler,
+ void *data)
+{
+ struct iommu_param *param = dev->iommu_param;
+ int ret = 0;
+
+ if (!param)
+ return -EINVAL;
+
+ mutex_lock(&param->lock);
+ /* Only allow one fault handler registered for each device */
+ if (param->fault_param) {
+ ret = -EBUSY;
+ goto done_unlock;
+ }
+
+ get_device(dev);
+ param->fault_param = kzalloc(sizeof(*param->fault_param), GFP_KERNEL);
+ if (!param->fault_param) {
+ put_device(dev);
+ ret = -ENOMEM;
+ goto done_unlock;
+ }
+ param->fault_param->handler = handler;
+ param->fault_param->data = data;
+
+done_unlock:
+ mutex_unlock(&param->lock);
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(iommu_register_device_fault_handler);
+
+/**
+ * iommu_unregister_device_fault_handler() - Unregister the device fault handler
+ * @dev: the device
+ *
+ * Remove the device fault handler installed with
+ * iommu_register_device_fault_handler().
+ *
+ * Return 0 on success, or an error.
+ */
+int iommu_unregister_device_fault_handler(struct device *dev)
+{
+ struct iommu_param *param = dev->iommu_param;
+ int ret = 0;
+
+ if (!param)
+ return -EINVAL;
+
+ mutex_lock(&param->lock);
+
+ if (!param->fault_param)
+ goto unlock;
+
+ kfree(param->fault_param);
+ param->fault_param = NULL;
+ put_device(dev);
+unlock:
+ mutex_unlock(&param->lock);
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(iommu_unregister_device_fault_handler);
+
+/**
+ * iommu_report_device_fault() - Report fault event to device driver
+ * @dev: the device
+ * @evt: fault event data
+ *
+ * Called by IOMMU drivers when a fault is detected, typically in a threaded IRQ
+ * handler.
+ *
+ * Return 0 on success, or an error.
+ */
+int iommu_report_device_fault(struct device *dev, struct iommu_fault_event *evt)
+{
+ struct iommu_param *param = dev->iommu_param;
+ struct iommu_fault_param *fparam;
+ int ret = 0;
+
+ if (!param || !evt)
+ return -EINVAL;
+
+ /* we only report device fault if there is a handler registered */
+ mutex_lock(&param->lock);
+ fparam = param->fault_param;
+ if (!fparam || !fparam->handler) {
+ ret = -EINVAL;
+ goto done_unlock;
+ }
+ ret = fparam->handler(&evt->fault, fparam->data);
+done_unlock:
+ mutex_unlock(&param->lock);
+ return ret;
+}
+EXPORT_SYMBOL_GPL(iommu_report_device_fault);
+
/**
* iommu_group_id - Return ID for a group
* @group: the group to ID
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 2b05056d5fa7..3e783f5bf472 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -336,6 +336,7 @@ struct iommu_fault_param {
* struct iommu_fwspec *iommu_fwspec;
*/
struct iommu_param {
+ struct mutex lock;
struct iommu_fault_param *fault_param;
};

@@ -428,6 +429,15 @@ extern int iommu_group_register_notifier(struct iommu_group *group,
struct notifier_block *nb);
extern int iommu_group_unregister_notifier(struct iommu_group *group,
struct notifier_block *nb);
+extern int iommu_register_device_fault_handler(struct device *dev,
+ iommu_dev_fault_handler_t handler,
+ void *data);
+
+extern int iommu_unregister_device_fault_handler(struct device *dev);
+
+extern int iommu_report_device_fault(struct device *dev,
+ struct iommu_fault_event *evt);
+
extern int iommu_group_id(struct iommu_group *group);
extern struct iommu_group *iommu_group_get_for_dev(struct device *dev);
extern struct iommu_domain *iommu_group_default_domain(struct iommu_group *);
@@ -736,6 +746,25 @@ static inline int iommu_group_unregister_notifier(struct iommu_group *group,
return 0;
}

+static inline
+int iommu_register_device_fault_handler(struct device *dev,
+ iommu_dev_fault_handler_t handler,
+ void *data)
+{
+ return -ENODEV;
+}
+
+static inline int iommu_unregister_device_fault_handler(struct device *dev)
+{
+ return 0;
+}
+
+static inline
+int iommu_report_device_fault(struct device *dev, struct iommu_fault_event *evt)
+{
+ return -ENODEV;
+}
+
static inline int iommu_group_id(struct iommu_group *group)
{
return -ENODEV;
--
2.21.0
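
A single-threaded userspace model of the registration rules in patch 3/4: at most one handler per device, -EBUSY on a second registration, and -EINVAL when a fault is reported with no handler installed. The names are hypothetical, param->lock and the get_device()/put_device() reference counting are left out, and struct fault is only a placeholder for struct iommu_fault.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Placeholder for struct iommu_fault. */
struct fault {
	unsigned int type;
};

typedef int (*dev_fault_handler_t)(struct fault *, void *);

/* Hypothetical mirrors of iommu_fault_param and iommu_param (no mutex). */
struct fault_param {
	dev_fault_handler_t handler;
	void *data;
};

struct dev_param {
	struct fault_param *fault_param;
};

static int register_handler(struct dev_param *param,
			    dev_fault_handler_t handler, void *data)
{
	if (!param)
		return -EINVAL;
	/* Only one fault handler per device. */
	if (param->fault_param)
		return -EBUSY;

	param->fault_param = calloc(1, sizeof(*param->fault_param));
	if (!param->fault_param)
		return -ENOMEM;
	param->fault_param->handler = handler;
	param->fault_param->data = data;
	return 0;
}

static int unregister_handler(struct dev_param *param)
{
	if (!param)
		return -EINVAL;
	/* Unregistering when nothing is installed is not an error. */
	free(param->fault_param);
	param->fault_param = NULL;
	return 0;
}

static int report_fault(struct dev_param *param, struct fault *f)
{
	struct fault_param *fp;

	if (!param || !f)
		return -EINVAL;
	/* Faults are only delivered if a handler is registered. */
	fp = param->fault_param;
	if (!fp || !fp->handler)
		return -EINVAL;
	return fp->handler(f, fp->data);
}

/* Example handler: counts the faults it receives in *data. */
static int count_faults(struct fault *f, void *data)
{
	(void)f;
	++*(int *)data;
	return 0;
}
```

A driver would register once at probe time and unregister on removal; the handler's return value is passed straight back to the reporting IOMMU driver.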

2019-06-03 15:00:32

by Jean-Philippe Brucker

Subject: [PATCH v2 1/4] driver core: Add per device iommu param

From: Jacob Pan <[email protected]>

DMA faults can be detected by the IOMMU at device level. Adding a pointer
to struct device allows the IOMMU subsystem to report relevant faults
back to the device driver for further handling.
For directly assigned devices (or user space drivers), the guest OS is
responsible for handling and responding to per-device IOMMU faults.
Therefore we need a fault reporting mechanism to propagate faults beyond
the IOMMU subsystem.

There are two other IOMMU data pointers under struct device today. Here
we introduce iommu_param as a parent pointer, such that all device IOMMU
data can be consolidated under it. The idea was suggested by Greg KH
and Joerg. The name iommu_param is chosen since iommu_data is already
used.

Suggested-by: Greg Kroah-Hartman <[email protected]>
Reviewed-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Jacob Pan <[email protected]>
Link: https://lkml.org/lkml/2017/10/6/81
---
include/linux/device.h | 3 +++
1 file changed, 3 insertions(+)

diff --git a/include/linux/device.h b/include/linux/device.h
index e85264fb6616..f0a975abd6e9 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -42,6 +42,7 @@ struct iommu_ops;
struct iommu_group;
struct iommu_fwspec;
struct dev_pin_info;
+struct iommu_param;

struct bus_attribute {
struct attribute attr;
@@ -959,6 +960,7 @@ struct dev_links_info {
* device (i.e. the bus driver that discovered the device).
* @iommu_group: IOMMU group the device belongs to.
* @iommu_fwspec: IOMMU-specific properties supplied by firmware.
+ * @iommu_param: Per device generic IOMMU runtime data
*
* @offline_disabled: If set, the device is permanently online.
* @offline: Set after successful invocation of bus type's .offline().
@@ -1052,6 +1054,7 @@ struct device {
void (*release)(struct device *dev);
struct iommu_group *iommu_group;
struct iommu_fwspec *iommu_fwspec;
+ struct iommu_param *iommu_param;

bool offline_disabled:1;
bool offline:1;
--
2.21.0

2019-06-03 17:44:00

by Jean-Philippe Brucker

Subject: [PATCH v2 2/4] iommu: Introduce device fault data

From: Jacob Pan <[email protected]>

Device faults detected by IOMMU can be reported outside the IOMMU
subsystem for further processing. This patch introduces
a generic device fault data structure.

The fault can be either an unrecoverable fault or a page request,
also referred to as a recoverable fault.

We only care about non-internal faults that are likely to be reported
to an external subsystem.

Signed-off-by: Jacob Pan <[email protected]>
Signed-off-by: Jean-Philippe Brucker <[email protected]>
Signed-off-by: Liu, Yi L <[email protected]>
Signed-off-by: Ashok Raj <[email protected]>
Signed-off-by: Eric Auger <[email protected]>
---
include/linux/iommu.h | 39 ++++++++++++
include/uapi/linux/iommu.h | 118 +++++++++++++++++++++++++++++++++++++
2 files changed, 157 insertions(+)
create mode 100644 include/uapi/linux/iommu.h

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index a815cf6f6f47..2b05056d5fa7 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -25,6 +25,7 @@
#include <linux/errno.h>
#include <linux/err.h>
#include <linux/of.h>
+#include <uapi/linux/iommu.h>

#define IOMMU_READ (1 << 0)
#define IOMMU_WRITE (1 << 1)
@@ -49,6 +50,7 @@ struct device;
struct iommu_domain;
struct notifier_block;
struct iommu_sva;
+struct iommu_fault_event;

/* iommu fault flags */
#define IOMMU_FAULT_READ 0x0
@@ -58,6 +60,7 @@ typedef int (*iommu_fault_handler_t)(struct iommu_domain *,
struct device *, unsigned long, int, void *);
typedef int (*iommu_mm_exit_handler_t)(struct device *dev, struct iommu_sva *,
void *);
+typedef int (*iommu_dev_fault_handler_t)(struct iommu_fault *, void *);

struct iommu_domain_geometry {
dma_addr_t aperture_start; /* First address that can be mapped */
@@ -301,6 +304,41 @@ struct iommu_device {
struct device *dev;
};

+/**
+ * struct iommu_fault_event - Generic fault event
+ *
+ * Can represent recoverable faults such as page requests or
+ * unrecoverable faults such as DMA or IRQ remapping faults.
+ *
+ * @fault: fault descriptor
+ */
+struct iommu_fault_event {
+ struct iommu_fault fault;
+};
+
+/**
+ * struct iommu_fault_param - per-device IOMMU fault data
+ * @handler: Callback function to handle IOMMU faults at device level
+ * @data: handler private data
+ */
+struct iommu_fault_param {
+ iommu_dev_fault_handler_t handler;
+ void *data;
+};
+
+/**
+ * struct iommu_param - collection of per-device IOMMU data
+ *
+ * @fault_param: IOMMU detected device fault reporting data
+ *
+ * TODO: migrate other per device data pointers under iommu_param, e.g.
+ * struct iommu_group *iommu_group;
+ * struct iommu_fwspec *iommu_fwspec;
+ */
+struct iommu_param {
+ struct iommu_fault_param *fault_param;
+};
+
int iommu_device_register(struct iommu_device *iommu);
void iommu_device_unregister(struct iommu_device *iommu);
int iommu_device_sysfs_add(struct iommu_device *iommu,
@@ -504,6 +542,7 @@ struct iommu_ops {};
struct iommu_group {};
struct iommu_fwspec {};
struct iommu_device {};
+struct iommu_fault_param {};

static inline bool iommu_present(struct bus_type *bus)
{
diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
new file mode 100644
index 000000000000..796402174d6c
--- /dev/null
+++ b/include/uapi/linux/iommu.h
@@ -0,0 +1,118 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/*
+ * IOMMU user API definitions
+ */
+
+#ifndef _UAPI_IOMMU_H
+#define _UAPI_IOMMU_H
+
+#include <linux/types.h>
+
+#define IOMMU_FAULT_PERM_READ (1 << 0) /* read */
+#define IOMMU_FAULT_PERM_WRITE (1 << 1) /* write */
+#define IOMMU_FAULT_PERM_EXEC (1 << 2) /* exec */
+#define IOMMU_FAULT_PERM_PRIV (1 << 3) /* privileged */
+
+/* Generic fault types, can be expanded to include IRQ remapping faults */
+enum iommu_fault_type {
+ IOMMU_FAULT_DMA_UNRECOV = 1, /* unrecoverable fault */
+ IOMMU_FAULT_PAGE_REQ, /* page request fault */
+};
+
+enum iommu_fault_reason {
+ IOMMU_FAULT_REASON_UNKNOWN = 0,
+
+ /* Could not access the PASID table (fetch caused external abort) */
+ IOMMU_FAULT_REASON_PASID_FETCH,
+
+ /* PASID entry is invalid or has configuration errors */
+ IOMMU_FAULT_REASON_BAD_PASID_ENTRY,
+
+ /*
+ * PASID is out of range (e.g. exceeds the maximum PASID
+ * supported by the IOMMU) or disabled.
+ */
+ IOMMU_FAULT_REASON_PASID_INVALID,
+
+ /*
+ * An external abort occurred fetching (or updating) a translation
+ * table descriptor
+ */
+ IOMMU_FAULT_REASON_WALK_EABT,
+
+ /*
+ * Could not access the page table entry (Bad address),
+ * actual translation fault
+ */
+ IOMMU_FAULT_REASON_PTE_FETCH,
+
+ /* Protection flag check failed */
+ IOMMU_FAULT_REASON_PERMISSION,
+
+ /* access flag check failed */
+ IOMMU_FAULT_REASON_ACCESS,
+
+ /* Output address of a translation stage caused Address Size fault */
+ IOMMU_FAULT_REASON_OOR_ADDRESS,
+};
+
+/**
+ * struct iommu_fault_unrecoverable - Unrecoverable fault data
+ * @reason: reason of the fault, from &enum iommu_fault_reason
+ * @flags: parameters of this fault (IOMMU_FAULT_UNRECOV_* values)
+ * @pasid: Process Address Space ID
+ * @perm: permissions requested by the incoming transaction
+ * (IOMMU_FAULT_PERM_* values)
+ * @addr: offending page address
+ * @fetch_addr: address that caused a fetch abort, if any
+ */
+struct iommu_fault_unrecoverable {
+ __u32 reason;
+#define IOMMU_FAULT_UNRECOV_PASID_VALID (1 << 0)
+#define IOMMU_FAULT_UNRECOV_ADDR_VALID (1 << 1)
+#define IOMMU_FAULT_UNRECOV_FETCH_ADDR_VALID (1 << 2)
+ __u32 flags;
+ __u32 pasid;
+ __u32 perm;
+ __u64 addr;
+ __u64 fetch_addr;
+};
+
+/**
+ * struct iommu_fault_page_request - Page Request data
+ * @flags: encodes whether the corresponding fields are valid and whether this
+ * is the last page in group (IOMMU_FAULT_PAGE_REQUEST_* values)
+ * @pasid: Process Address Space ID
+ * @grpid: Page Request Group Index
+ * @perm: requested page permissions (IOMMU_FAULT_PERM_* values)
+ * @addr: page address
+ * @private_data: device-specific private information
+ */
+struct iommu_fault_page_request {
+#define IOMMU_FAULT_PAGE_REQUEST_PASID_VALID (1 << 0)
+#define IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE (1 << 1)
+#define IOMMU_FAULT_PAGE_REQUEST_PRIV_DATA (1 << 2)
+ __u32 flags;
+ __u32 pasid;
+ __u32 grpid;
+ __u32 perm;
+ __u64 addr;
+ __u64 private_data[2];
+};
+
+/**
+ * struct iommu_fault - Generic fault data
+ * @type: fault type from &enum iommu_fault_type
+ * @padding: reserved for future use (should be zero)
+ * @event: fault event, when @type is %IOMMU_FAULT_DMA_UNRECOV
+ * @prm: Page Request message, when @type is %IOMMU_FAULT_PAGE_REQ
+ */
+struct iommu_fault {
+ __u32 type;
+ __u32 padding;
+ union {
+ struct iommu_fault_unrecoverable event;
+ struct iommu_fault_page_request prm;
+ };
+};
+#endif /* _UAPI_IOMMU_H */
--
2.21.0
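
A sketch of how a consumer might decode struct iommu_fault, assuming a userspace copy of the definitions from patch 2/4 with uint32_t/uint64_t in place of __u32/__u64. fault_get_pasid() and fault_needs_response() are hypothetical helpers: the first honours the per-type PASID valid flag, and the second reflects patch 4/4, where only a Page Request with IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE set is tracked for a page response.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace mirror of the uapi definitions (__u32/__u64 -> uint32_t/uint64_t). */
enum iommu_fault_type {
	IOMMU_FAULT_DMA_UNRECOV = 1,
	IOMMU_FAULT_PAGE_REQ,
};

#define IOMMU_FAULT_UNRECOV_PASID_VALID      (1u << 0)
#define IOMMU_FAULT_PAGE_REQUEST_PASID_VALID (1u << 0)
#define IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE   (1u << 1)

struct iommu_fault_unrecoverable {
	uint32_t reason, flags, pasid, perm;
	uint64_t addr, fetch_addr;
};

struct iommu_fault_page_request {
	uint32_t flags, pasid, grpid, perm;
	uint64_t addr;
	uint64_t private_data[2];
};

struct iommu_fault {
	uint32_t type;
	uint32_t padding;
	union {
		struct iommu_fault_unrecoverable event;
		struct iommu_fault_page_request prm;
	};
};

/* Hypothetical helper: extract the PASID, honouring the per-type valid flag. */
static bool fault_get_pasid(const struct iommu_fault *f, uint32_t *pasid)
{
	switch (f->type) {
	case IOMMU_FAULT_DMA_UNRECOV:
		if (!(f->event.flags & IOMMU_FAULT_UNRECOV_PASID_VALID))
			return false;
		*pasid = f->event.pasid;
		return true;
	case IOMMU_FAULT_PAGE_REQ:
		if (!(f->prm.flags & IOMMU_FAULT_PAGE_REQUEST_PASID_VALID))
			return false;
		*pasid = f->prm.pasid;
		return true;
	default:
		return false;
	}
}

/* Hypothetical helper: only the last request of a group awaits a response. */
static bool fault_needs_response(const struct iommu_fault *f)
{
	return f->type == IOMMU_FAULT_PAGE_REQ &&
	       (f->prm.flags & IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE);
}
```

Because of the explicit padding word, the union starts at offset 8 and the structure is 48 bytes whether __u64 is 4- or 8-byte aligned.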

2019-06-03 22:07:05

by Jacob Pan

Subject: Re: [PATCH v2 2/4] iommu: Introduce device fault data

On Mon, 3 Jun 2019 15:57:47 +0100
Jean-Philippe Brucker <[email protected]> wrote:

> +/**
> + * struct iommu_fault_page_request - Page Request data
> + * @flags: encodes whether the corresponding fields are valid and whether this
> + * is the last page in group (IOMMU_FAULT_PAGE_REQUEST_* values)
> + * @pasid: Process Address Space ID
> + * @grpid: Page Request Group Index
> + * @perm: requested page permissions (IOMMU_FAULT_PERM_* values)
> + * @addr: page address
> + * @private_data: device-specific private information
> + */
> +struct iommu_fault_page_request {
> +#define IOMMU_FAULT_PAGE_REQUEST_PASID_VALID (1 << 0)
> +#define IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE (1 << 1)
> +#define IOMMU_FAULT_PAGE_REQUEST_PRIV_DATA (1 << 2)
> + __u32 flags;
> + __u32 pasid;
> + __u32 grpid;
> + __u32 perm;
> + __u64 addr;
> + __u64 private_data[2];
> +};
> +

Just a thought, for non-identity G-H PASID management. We could pass on
guest PASID in PRQ to save a lookup in QEMU. In this case, QEMU
allocate a GPASID for vIOMMU then a host PASID for pIOMMU. QEMU has a
G->H lookup. When PRQ comes in to the pIOMMU with HPASID, IOMMU driver
can retrieve GPASID from the bind data then report to the guest via
VFIO. In this case QEMU does not need to do a H->G PASID lookup.

Should we add a gpasid field here? or we can add a flag and field
later, up to you.

Thanks,

Jacob

2019-06-03 22:18:03

by Jacob Pan

Subject: Re: [PATCH v2 0/4] iommu: Add device fault reporting API

On Mon, 3 Jun 2019 15:57:45 +0100
Jean-Philippe Brucker <[email protected]> wrote:

> Allow device drivers and VFIO to get notified on IOMMU translation
> fault, and handle recoverable faults (PCI PRI). Several series require
> this API (Intel VT-d and Arm SMMUv3 nested support, as well as the
> generic host SVA implementation).
>
> Changes since v1 [1]:
> * Allocate iommu_param earlier, in iommu_probe_device().
> * Pass struct iommu_fault to fault handlers, instead of the
> iommu_fault_event wrapper.
> * Removed unused iommu_fault_event::iommu_private.
> * Removed unnecessary iommu_page_response::addr.
> * Added iommu_page_response::version, which would allow to introduce a
> new incompatible iommu_page_response structure (as opposed to just
> adding a flag + field).
>
> [1] [PATCH 0/4] iommu: Add device fault reporting API
> https://lore.kernel.org/lkml/[email protected]/
>
> Jacob Pan (3):
> driver core: Add per device iommu param
> iommu: Introduce device fault data
> iommu: Introduce device fault report API
>
> Jean-Philippe Brucker (1):
> iommu: Add recoverable fault reporting
>
This interface meets the needs of VT-d; I have just one more comment on
2/4. Do you want to add a Co-developed-by tag for yourself to the three
patches from me?

Thanks,

Jacob

> drivers/iommu/iommu.c | 236 ++++++++++++++++++++++++++++++++++++-
> include/linux/device.h | 3 +
> include/linux/iommu.h | 87 ++++++++++++++
> include/uapi/linux/iommu.h | 153 ++++++++++++++++++++++++
> 4 files changed, 476 insertions(+), 3 deletions(-)
> create mode 100644 include/uapi/linux/iommu.h
>

2019-06-05 08:55:26

by Tian, Kevin

Subject: RE: [PATCH v2 2/4] iommu: Introduce device fault data

> From: Jacob Pan
> Sent: Tuesday, June 4, 2019 6:09 AM
>
> On Mon, 3 Jun 2019 15:57:47 +0100
> Jean-Philippe Brucker <[email protected]> wrote:
>
> > +/**
> > + * struct iommu_fault_page_request - Page Request data
> > + * @flags: encodes whether the corresponding fields are valid and whether this
> > + * is the last page in group (IOMMU_FAULT_PAGE_REQUEST_* values)
> > + * @pasid: Process Address Space ID
> > + * @grpid: Page Request Group Index
> > + * @perm: requested page permissions (IOMMU_FAULT_PERM_* values)
> > + * @addr: page address
> > + * @private_data: device-specific private information
> > + */
> > +struct iommu_fault_page_request {
> > +#define IOMMU_FAULT_PAGE_REQUEST_PASID_VALID (1 << 0)
> > +#define IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE (1 << 1)
> > +#define IOMMU_FAULT_PAGE_REQUEST_PRIV_DATA (1 << 2)
> > + __u32 flags;
> > + __u32 pasid;
> > + __u32 grpid;
> > + __u32 perm;
> > + __u64 addr;
> > + __u64 private_data[2];
> > +};
> > +
>
> Just a thought, for non-identity G-H PASID management. We could pass on
> guest PASID in PRQ to save a lookup in QEMU. In this case, QEMU
> allocate a GPASID for vIOMMU then a host PASID for pIOMMU. QEMU has a
> G->H lookup. When PRQ comes in to the pIOMMU with HPASID, IOMMU driver
> can retrieve GPASID from the bind data then report to the guest via
> VFIO. In this case QEMU does not need to do a H->G PASID lookup.
>
> Should we add a gpasid field here? or we can add a flag and field
> later, up to you.
>

Can private_data serve this purpose? It's better not to introduce
gpasid awareness within the host IOMMU driver. It is just user-level
data associated with a PASID when binding happens. The kernel doesn't
care about its actual meaning; it simply records it and returns it to
user space later upon a device fault. QEMU interprets it as a gpasid
in its own context; other users may use it for other purposes.
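A minimal sketch of the opaque-data idea above, assuming a hypothetical fixed-size bind table; bind_record() and bind_lookup() are illustrative names, not kernel APIs. The kernel stores a 64-bit value supplied at bind time and echoes it back on a fault for the matching host PASID, without interpreting it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bind table: one slot per bound host PASID (illustrative size). */
#define MAX_BINDS 16

struct pasid_bind {
	bool     used;
	uint32_t host_pasid;
	uint64_t user_data;	/* opaque to the kernel; QEMU could store a guest PASID */
};

static struct pasid_bind binds[MAX_BINDS];

/* At bind time: remember the value userspace handed in, without interpreting it. */
static int bind_record(uint32_t host_pasid, uint64_t user_data)
{
	for (size_t i = 0; i < MAX_BINDS; i++) {
		if (!binds[i].used) {
			binds[i] = (struct pasid_bind){ true, host_pasid, user_data };
			return 0;
		}
	}
	return -1;	/* no free slot */
}

/* At fault time: find the value to echo back for the faulting host PASID. */
static bool bind_lookup(uint32_t host_pasid, uint64_t *user_data)
{
	for (size_t i = 0; i < MAX_BINDS; i++) {
		if (binds[i].used && binds[i].host_pasid == host_pasid) {
			*user_data = binds[i].user_data;
			return true;
		}
	}
	return false;
}
```

Whether the stored value is opaque or an explicitly named gpasid, the table looks the same; only the UAPI naming and the kernel's knowledge of its meaning differ.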

Thanks
Kevin

2019-06-05 11:27:18

by Jean-Philippe Brucker

Subject: Re: [PATCH v2 2/4] iommu: Introduce device fault data

On 05/06/2019 09:51, Tian, Kevin wrote:
>> From: Jacob Pan
>> Sent: Tuesday, June 4, 2019 6:09 AM
>>
>> On Mon, 3 Jun 2019 15:57:47 +0100
>> Jean-Philippe Brucker <[email protected]> wrote:
>>
>>> +/**
>>> + * struct iommu_fault_page_request - Page Request data
>>> + * @flags: encodes whether the corresponding fields are valid and whether this
>>> + * is the last page in group (IOMMU_FAULT_PAGE_REQUEST_* values)
>>> + * @pasid: Process Address Space ID
>>> + * @grpid: Page Request Group Index
>>> + * @perm: requested page permissions (IOMMU_FAULT_PERM_* values)
>>> + * @addr: page address
>>> + * @private_data: device-specific private information
>>> + */
>>> +struct iommu_fault_page_request {
>>> +#define IOMMU_FAULT_PAGE_REQUEST_PASID_VALID (1 << 0)
>>> +#define IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE (1 << 1)
>>> +#define IOMMU_FAULT_PAGE_REQUEST_PRIV_DATA (1 << 2)
>>> + __u32 flags;
>>> + __u32 pasid;
>>> + __u32 grpid;
>>> + __u32 perm;
>>> + __u64 addr;
>>> + __u64 private_data[2];
>>> +};
>>> +
>>
>> Just a thought, for non-identity G-H PASID management. We could pass on
>> the guest PASID in the PRQ to save a lookup in QEMU. In this case, QEMU
>> allocates a GPASID for the vIOMMU, then a host PASID for the pIOMMU. QEMU
>> has a G->H lookup. When a PRQ comes in to the pIOMMU with an HPASID, the
>> IOMMU driver can retrieve the GPASID from the bind data, then report it to
>> the guest via VFIO. In this case QEMU does not need to do an H->G PASID
>> lookup.
>>
>> Should we add a gpasid field here? Or we can add a flag and field
>> later, up to you.
>>
>
> Can private_data serve this purpose?

Isn't private_data already used for VT-d's Private Data field?

> It's better not to introduce
> gpasid awareness within the host IOMMU driver. It is just user-level
> data associated with a PASID when binding happens. The kernel doesn't
> care about its actual meaning; it simply records it and returns it to
> user space later upon a device fault. QEMU interprets it as a gpasid
> in its own context; other users may use it for other purposes.

Regarding a gpasid field I don't mind either way, but extending the
iommu_fault structure later won't be completely straightforward so we
could add some padding now.

Userspace negotiates the iommu_fault struct format with VFIO before
allocating a circular buffer of N fault structures
(https://lore.kernel.org/lkml/[email protected]/).
So adding new fields requires introducing a new ABI version and a struct
iommu_fault_v2. That may be OK for disruptive changes, but just adding a
new field indicated by a flag shouldn't have to be that complicated.

How about setting the iommu_fault structure to 128 bytes?

struct iommu_fault {
__u32 type;
__u32 padding;
union {
struct iommu_fault_unrecoverable event;
struct iommu_fault_page_request prm;
__u8 padding2[120];
};
};

Given that @prm is currently 40 bytes and @event 32 bytes, the padding
allows either of them to grow by 10 new 64-bit fields (or 20 new 32-bit
fields, which are still representable with new flags) before we have to
upgrade the ABI version.
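To check the arithmetic, here is a userspace sketch of the proposed 128-byte layout. The iommu_fault_page_request fields are copied from the patch; the iommu_fault_unrecoverable fields are an assumption chosen to match the stated 32 bytes, and fixed-width stdint types stand in for __u32/__u64. spare_bytes() is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout of the unrecoverable record (32 bytes, as stated above). */
struct iommu_fault_unrecoverable {
	uint32_t reason;
	uint32_t flags;
	uint32_t pasid;
	uint32_t perm;
	uint64_t addr;
	uint64_t fetch_addr;
};

/* Page request record as in the patch (40 bytes). */
struct iommu_fault_page_request {
	uint32_t flags;
	uint32_t pasid;
	uint32_t grpid;
	uint32_t perm;
	uint64_t addr;
	uint64_t private_data[2];
};

/* Proposed 128-byte record: 8-byte header plus a 120-byte union. */
struct iommu_fault {
	uint32_t type;
	uint32_t padding;
	union {
		struct iommu_fault_unrecoverable event;
		struct iommu_fault_page_request prm;
		uint8_t padding2[120];
	};
};

_Static_assert(sizeof(struct iommu_fault_unrecoverable) == 32, "event is 32 bytes");
_Static_assert(sizeof(struct iommu_fault_page_request) == 40, "prm is 40 bytes");
_Static_assert(sizeof(struct iommu_fault) == 128, "record is 128 bytes");

/* Bytes left in the union once a member of the given size is placed in it. */
static size_t spare_bytes(size_t member_size)
{
	return sizeof(((struct iommu_fault *)0)->padding2) - member_size;
}
```

With these sizes, @prm has 80 spare bytes (10 x 64-bit) and @event has 88 (11 x 64-bit), so "10 new 64-bit fields" is the lower bound for either member.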

A 4kB and a 64kB queue can hold respectively:

* 85 and 1365 records when iommu_fault is 48 bytes (current format).
* 64 and 1024 records when iommu_fault is 64 bytes (but leaves room for
only 2 new 64-bit fields).
* 32 and 512 records when iommu_fault is 128 bytes.

In comparison,
* the SMMU event queue can hold 128 and 2048 events respectively at those
sizes (and is allowed to grow up to 524k entries)
* the SMMU PRI queue can hold 256 and 4096 PRs.

But the SMMU queues have to be physically contiguous, whereas our fault
queues are in userspace memory which is less expensive. So 128-byte
records might be reasonable. What do you think?
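The record counts above are the queue size divided by the record size, rounded down. A small sketch reproducing them; the 32-byte and 16-byte entries correspond to the SMMUv3 event and PRI queue entry sizes, and records_per_queue() and print_table() are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Whole records that fit in a queue; any remainder is unused. */
static size_t records_per_queue(size_t queue_bytes, size_t record_bytes)
{
	return queue_bytes / record_bytes;
}

/* Print capacities of a 4kB and a 64kB queue for each record size. */
static void print_table(void)
{
	static const struct { const char *name; size_t size; } rec[] = {
		{ "iommu_fault, 48 B (current)", 48 },
		{ "iommu_fault, 64 B", 64 },
		{ "iommu_fault, 128 B", 128 },
		{ "SMMU event, 32 B", 32 },
		{ "SMMU PRI, 16 B", 16 },
	};

	for (size_t i = 0; i < sizeof(rec) / sizeof(rec[0]); i++)
		printf("%-30s 4kB: %5zu  64kB: %5zu\n", rec[i].name,
		       records_per_queue(4096, rec[i].size),
		       records_per_queue(65536, rec[i].size));
}
```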


The iommu_page_response structure (patch 4/4) is a bit easier to extend because
it's userspace->kernel and userspace can just declare the size it's
using. I did add a version field in case we run out of flags or want to
change the whole thing, but I think I was being overly cautious and it
might just be a waste of space.
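One common way to make a userspace->kernel structure size-declared is a leading argsz field, as VFIO ioctl structures do. A sketch, assuming a hypothetical page_response layout (not the one in patch 4/4) and a hypothetical copy_response() helper that zero-fills fields an older userspace does not know about.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical response layout: argsz first, so the receiver knows how
 * much of the structure userspace actually filled in. */
struct page_response {
	uint32_t argsz;
	uint32_t flags;
	uint32_t pasid;
	uint32_t grpid;
	uint32_t code;
	uint32_t new_field;	/* hypothetical later addition */
};

/* Accept any argsz covering at least the original fields: a shorter
 * (older) struct gets its missing tail zeroed, and a longer (newer)
 * struct is truncated to what this side understands. */
static int copy_response(struct page_response *dst, const void *src, size_t src_len)
{
	uint32_t argsz;

	if (src_len < sizeof(argsz))
		return -1;
	memcpy(&argsz, src, sizeof(argsz));
	if (argsz > src_len || argsz < offsetof(struct page_response, new_field))
		return -1;
	memset(dst, 0, sizeof(*dst));
	memcpy(dst, src, argsz < sizeof(*dst) ? argsz : sizeof(*dst));
	return 0;
}
```

Because the size travels with the data, adding new_field needs no version bump: old userspace simply sends a smaller argsz.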

Thanks,
Jean

2019-06-05 11:28:04

by Jean-Philippe Brucker

Subject: Re: [PATCH v2 0/4] iommu: Add device fault reporting API

On 03/06/2019 22:59, Jacob Pan wrote:
> On Mon, 3 Jun 2019 15:57:45 +0100
> Jean-Philippe Brucker <[email protected]> wrote:
>
>> Allow device drivers and VFIO to get notified on IOMMU translation
>> fault, and handle recoverable faults (PCI PRI). Several series require
>> this API (Intel VT-d and Arm SMMUv3 nested support, as well as the
>> generic host SVA implementation).
>>
>> Changes since v1 [1]:
>> * Allocate iommu_param earlier, in iommu_probe_device().
>> * Pass struct iommu_fault to fault handlers, instead of the
>> iommu_fault_event wrapper.
>> * Removed unused iommu_fault_event::iommu_private.
>> * Removed unnecessary iommu_page_response::addr.
>> * Added iommu_page_response::version, which would allow introducing a
>> new incompatible iommu_page_response structure (as opposed to just
>> adding a flag + field).
>>
>> [1] [PATCH 0/4] iommu: Add device fault reporting API
>> https://lore.kernel.org/lkml/[email protected]/
>>
>> Jacob Pan (3):
>> driver core: Add per device iommu param
>> iommu: Introduce device fault data
>> iommu: Introduce device fault report API
>>
>> Jean-Philippe Brucker (1):
>> iommu: Add recoverable fault reporting
>>
> This interface meets the needs of VT-d; just one more comment on 2/4. Do
> you want to add yourself as Co-developed-by on the three patches from me?

I'm fine without it. I don't think it adds much to the Signed-off-by,
which is required.

Thanks,
Jean

2019-06-05 17:36:21

by Jacob Pan

Subject: Re: [PATCH v2 2/4] iommu: Introduce device fault data

On Wed, 5 Jun 2019 08:51:45 +0000
"Tian, Kevin" <[email protected]> wrote:

> > From: Jacob Pan
> > Sent: Tuesday, June 4, 2019 6:09 AM
> >
> > On Mon, 3 Jun 2019 15:57:47 +0100
> > Jean-Philippe Brucker <[email protected]> wrote:
> >
> > > +/**
> > > + * struct iommu_fault_page_request - Page Request data
> > > + * @flags: encodes whether the corresponding fields are valid and whether this
> > > + * is the last page in group (IOMMU_FAULT_PAGE_REQUEST_* values)
> > > + * @pasid: Process Address Space ID
> > > + * @grpid: Page Request Group Index
> > > + * @perm: requested page permissions (IOMMU_FAULT_PERM_* values)
> > > + * @addr: page address
> > > + * @private_data: device-specific private information
> > > + */
> > > +struct iommu_fault_page_request {
> > > +#define IOMMU_FAULT_PAGE_REQUEST_PASID_VALID (1 << 0)
> > > +#define IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE (1 << 1)
> > > +#define IOMMU_FAULT_PAGE_REQUEST_PRIV_DATA (1 << 2)
> > > + __u32 flags;
> > > + __u32 pasid;
> > > + __u32 grpid;
> > > + __u32 perm;
> > > + __u64 addr;
> > > + __u64 private_data[2];
> > > +};
> > > +
> >
> > Just a thought, for non-identity G-H PASID management. We could
> > pass on the guest PASID in the PRQ to save a lookup in QEMU. In this
> > case, QEMU allocates a GPASID for the vIOMMU, then a host PASID for
> > the pIOMMU. QEMU has a G->H lookup. When a PRQ comes in to the pIOMMU
> > with an HPASID, the IOMMU driver can retrieve the GPASID from the bind
> > data, then report it to the guest via VFIO. In this case QEMU does not
> > need to do an H->G PASID lookup.
> >
> > Should we add a gpasid field here? Or we can add a flag and field
> > later, up to you.
> >
>
> Can private_data serve this purpose? It's better not to introduce
> gpasid awareness within the host IOMMU driver. It is just user-level
> data associated with a PASID when binding happens. The kernel doesn't
> care about its actual meaning; it simply records it and returns it to
> user space later upon a device fault. QEMU interprets it as a gpasid
> in its own context; other users may use it for other purposes.
>
private_data was intended for device PRQs that carry private data, which
is part of the VT-d PRQ descriptor. For vSVA, we can withhold private_data
in the host, then send it back when a page response from the guest
matches a pending PRQ. But for in-kernel PRQ reporting, the private data
might still be passed on to any driver that wants to process the PRQ, so
we can't repurpose it.
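The withholding scheme can be sketched as a table of pending requests keyed by PASID and page request group index. Everything here (pending_prq, prq_stash(), prq_complete(), the table size) is hypothetical and not the VT-d driver's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PENDING 32	/* illustrative */

/* A page request forwarded to the guest, with its private data kept back. */
struct pending_prq {
	bool     used;
	uint32_t pasid;
	uint32_t grpid;
	uint64_t private_data[2];	/* withheld from the guest */
};

static struct pending_prq pending[MAX_PENDING];

/* When forwarding a PRQ to the guest: stash its private data. */
static int prq_stash(uint32_t pasid, uint32_t grpid, const uint64_t priv[2])
{
	for (size_t i = 0; i < MAX_PENDING; i++) {
		if (!pending[i].used) {
			pending[i] = (struct pending_prq){
				.used = true, .pasid = pasid, .grpid = grpid,
				.private_data = { priv[0], priv[1] },
			};
			return 0;
		}
	}
	return -1;	/* too many outstanding requests */
}

/* On the guest's page response: match a pending PRQ, restore its data
 * for the hardware response, and free the slot. */
static bool prq_complete(uint32_t pasid, uint32_t grpid, uint64_t priv[2])
{
	for (size_t i = 0; i < MAX_PENDING; i++) {
		struct pending_prq *p = &pending[i];

		if (p->used && p->pasid == pasid && p->grpid == grpid) {
			priv[0] = p->private_data[0];
			priv[1] = p->private_data[1];
			p->used = false;
			return true;
		}
	}
	return false;	/* no matching request */
}
```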

But the in-kernel VDCM driver needs a lookup from guest PASID to host
PASID. I thought you wanted the IOMMU driver to provide such a service,
since the H-G PASID mapping is established at bind_gpasid time. In that
sense, we _do_ have gpasid awareness.

> Thanks
> Kevin

[Jacob Pan]

2019-06-05 21:57:51

by Jacob Pan

Subject: Re: [PATCH v2 2/4] iommu: Introduce device fault data

On Wed, 5 Jun 2019 12:24:09 +0100
Jean-Philippe Brucker <[email protected]> wrote:

> On 05/06/2019 09:51, Tian, Kevin wrote:
> >> From: Jacob Pan
> >> Sent: Tuesday, June 4, 2019 6:09 AM
> >>
> >> On Mon, 3 Jun 2019 15:57:47 +0100
> >> Jean-Philippe Brucker <[email protected]> wrote:
> >>
> >>> +/**
> >>> + * struct iommu_fault_page_request - Page Request data
> >>> + * @flags: encodes whether the corresponding fields are valid and whether this
> >>> + * is the last page in group (IOMMU_FAULT_PAGE_REQUEST_* values)
> >>> + * @pasid: Process Address Space ID
> >>> + * @grpid: Page Request Group Index
> >>> + * @perm: requested page permissions (IOMMU_FAULT_PERM_* values)
> >>> + * @addr: page address
> >>> + * @private_data: device-specific private information
> >>> + */
> >>> +struct iommu_fault_page_request {
> >>> +#define IOMMU_FAULT_PAGE_REQUEST_PASID_VALID (1 << 0)
> >>> +#define IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE (1 << 1)
> >>> +#define IOMMU_FAULT_PAGE_REQUEST_PRIV_DATA (1 << 2)
> >>> + __u32 flags;
> >>> + __u32 pasid;
> >>> + __u32 grpid;
> >>> + __u32 perm;
> >>> + __u64 addr;
> >>> + __u64 private_data[2];
> >>> +};
> >>> +
> >>
> >> Just a thought, for non-identity G-H PASID management. We could
> >> pass on the guest PASID in the PRQ to save a lookup in QEMU. In this
> >> case, QEMU allocates a GPASID for the vIOMMU, then a host PASID for
> >> the pIOMMU. QEMU has a G->H lookup. When a PRQ comes in to the pIOMMU
> >> with an HPASID, the IOMMU driver can retrieve the GPASID from the bind
> >> data, then report it to the guest via VFIO. In this case QEMU does not
> >> need to do an H->G PASID lookup.
> >>
> >> Should we add a gpasid field here? Or we can add a flag and field
> >> later, up to you.
> >>
> >
> > Can private_data serve this purpose?
>
> Isn't private_data already used for VT-d's Private Data field?
>
Yes, as part of the PRQ. Please see my explanation in the previous
email.
> > It's better not to introduce
> > gpasid awareness within the host IOMMU driver. It is just user-level
> > data associated with a PASID when binding happens. The kernel doesn't
> > care about its actual meaning; it simply records it and returns it to
> > user space later upon a device fault. QEMU interprets it as a gpasid
> > in its own context; other users may use it for other purposes.
>
> Regarding a gpasid field I don't mind either way, but extending the
> iommu_fault structure later won't be completely straightforward so we
> could add some padding now.
>
> Userspace negotiates the iommu_fault struct format with VFIO before
> allocating a circular buffer of N fault structures
> ().
> So adding new fields requires introducing a new ABI version and a
> struct iommu_fault_v2. That may be OK for disruptive changes, but
> just adding a new field indicated by a flag shouldn't have to be that
> complicated.
>
> How about setting the iommu_fault structure to 128 bytes?
>
> struct iommu_fault {
> __u32 type;
> __u32 padding;
> union {
> struct iommu_fault_unrecoverable event;
> struct iommu_fault_page_request prm;
> __u8 padding2[120];
> };
> };
>
> Given that @prm is currently 40 bytes and @event 32 bytes, the padding
> allows either of them to grow by 10 new 64-bit fields (or 20 new 32-bit
> fields, which are still representable with new flags) before we have to
> upgrade the ABI version.
>
> A 4kB and a 64kB queue can hold respectively:
>
> * 85 and 1365 records when iommu_fault is 48 bytes (current format).
> * 64 and 1024 records when iommu_fault is 64 bytes (but leaves room for
> only 2 new 64-bit fields).
> * 32 and 512 records when iommu_fault is 128 bytes.
>
> In comparison,
> * the SMMU event queue can hold 128 and 2048 events respectively at
> those sizes (and is allowed to grow up to 524k entries)
> * the SMMU PRI queue can hold 256 and 4096 PRs.
>
> But the SMMU queues have to be physically contiguous, whereas our
> fault queues are in userspace memory which is less expensive. So
> 128-byte records might be reasonable. What do you think?
>
I think that although 128 bytes is large enough for any future extension,
64B might be good enough, and it is a cache line. A PCI page request
message is only 16B :)

VT-d currently uses one 4K page for the PRQ, which holds 128 PRQ
descriptors. This can grow to 16K entries per the spec. That is per
IOMMU, whereas the user fault queue here is per device. So don't we have
to be frugal about it, since it will support mdev at a per-PASID level
at some point?

I have to look into how Eric's patchset handles a full queue in the
producer. If we go with a 128B iommu_fault and a 4KB queue (32 entries,
as in your table), a VT-d PRQ of 128 entries can potentially fill it up.
We have to handle this VFIO queue-full condition differently from an
IOMMU queue-full condition, in that we only need to discard PRQs for one
device (whereas a full IOMMU queue has to be cleared out entirely).

Anyway, I think 64B should be enough, but 128B is fine too. We have to
deal with a full queue anyway, but it is expensive, so we should try to
avoid it.
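A sketch of a per-device fault ring in which a full queue only rejects that device's requests. The ring layout, RING_ENTRIES, ring_push()/ring_pop() and the drop counter are hypothetical; Eric's patchset may handle this differently. With 32 entries (4KB of 128-byte records), a burst of 128 page requests (one full VT-d PRQ page) leaves 96 of them rejected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* 4kB of 128-byte records, as in the table above. */
#define RING_ENTRIES 32

struct fault_rec { uint32_t pasid, grpid; };

/* Hypothetical per-device ring: the host produces, userspace consumes. */
struct fault_ring {
	struct fault_rec rec[RING_ENTRIES];
	size_t head;	/* oldest unconsumed record */
	size_t count;	/* records currently queued */
	size_t dropped;	/* requests rejected because this ring was full */
};

/* Returns false when full; the caller would then complete only this
 * device's request with a failure response, leaving other devices alone. */
static bool ring_push(struct fault_ring *r, struct fault_rec f)
{
	if (r->count == RING_ENTRIES) {
		r->dropped++;
		return false;
	}
	r->rec[(r->head + r->count) % RING_ENTRIES] = f;
	r->count++;
	return true;
}

/* Consumer side: take the oldest record, if any. */
static bool ring_pop(struct fault_ring *r, struct fault_rec *f)
{
	if (r->count == 0)
		return false;
	*f = r->rec[r->head];
	r->head = (r->head + 1) % RING_ENTRIES;
	r->count--;
	return true;
}
```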

>
> The iommu_page_response structure (patch 4/4) is a bit easier to extend because
> it's userspace->kernel and userspace can just declare the size it's
> using. I did add a version field in case we run out of flags or want
> to change the whole thing, but I think I was being overly cautious
> and it might just be a waste of space.
>
> Thanks,
> Jean

[Jacob Pan]

2019-06-06 06:57:57

by Tian, Kevin

Subject: RE: [PATCH v2 2/4] iommu: Introduce device fault data

> From: Jacob Pan [mailto:[email protected]]
> Sent: Thursday, June 6, 2019 1:38 AM
>
> On Wed, 5 Jun 2019 08:51:45 +0000
> "Tian, Kevin" <[email protected]> wrote:
>
> > > From: Jacob Pan
> > > Sent: Tuesday, June 4, 2019 6:09 AM
> > >
> > > On Mon, 3 Jun 2019 15:57:47 +0100
> > > Jean-Philippe Brucker <[email protected]> wrote:
> > >
> > > > +/**
> > > > + * struct iommu_fault_page_request - Page Request data
> > > > + * @flags: encodes whether the corresponding fields are valid and whether this
> > > > + * is the last page in group (IOMMU_FAULT_PAGE_REQUEST_* values)
> > > > + * @pasid: Process Address Space ID
> > > > + * @grpid: Page Request Group Index
> > > > + * @perm: requested page permissions (IOMMU_FAULT_PERM_* values)
> > > > + * @addr: page address
> > > > + * @private_data: device-specific private information
> > > > + */
> > > > +struct iommu_fault_page_request {
> > > > +#define IOMMU_FAULT_PAGE_REQUEST_PASID_VALID (1 << 0)
> > > > +#define IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE (1 << 1)
> > > > +#define IOMMU_FAULT_PAGE_REQUEST_PRIV_DATA (1 << 2)
> > > > + __u32 flags;
> > > > + __u32 pasid;
> > > > + __u32 grpid;
> > > > + __u32 perm;
> > > > + __u64 addr;
> > > > + __u64 private_data[2];
> > > > +};
> > > > +
> > >
> > > Just a thought, for non-identity G-H PASID management. We could
> > > pass on the guest PASID in the PRQ to save a lookup in QEMU. In this
> > > case, QEMU allocates a GPASID for the vIOMMU, then a host PASID for
> > > the pIOMMU. QEMU has a G->H lookup. When a PRQ comes in to the pIOMMU
> > > with an HPASID, the IOMMU driver can retrieve the GPASID from the bind
> > > data, then report it to the guest via VFIO. In this case QEMU does not
> > > need to do an H->G PASID lookup.
> > >
> > > Should we add a gpasid field here? Or we can add a flag and field
> > > later, up to you.
> > >
> >
> > Can private_data serve this purpose? It's better not to introduce
> > gpasid awareness within the host IOMMU driver. It is just user-level
> > data associated with a PASID when binding happens. The kernel doesn't
> > care about its actual meaning; it simply records it and returns it to
> > user space later upon a device fault. QEMU interprets it as a gpasid
> > in its own context; other users may use it for other purposes.
> >
> private_data was intended for device PRQs that carry private data, which
> is part of the VT-d PRQ descriptor. For vSVA, we can withhold private_data
> in the host, then send it back when a page response from the guest
> matches a pending PRQ. But for in-kernel PRQ reporting, the private data
> might still be passed on to any driver that wants to process the PRQ, so
> we can't repurpose it.

Sure. I was just using it as one example of how to extend it.

>
> But the in-kernel VDCM driver needs a lookup from guest PASID to host
> PASID. I thought you wanted the IOMMU driver to provide such a service,
> since the H-G PASID mapping is established at bind_gpasid time. In that
> sense, we _do_ have gpasid awareness.
>

Yes, it makes sense. My original point was that the IOMMU driver itself
doesn't need to know the actual meaning of this field (so it could be
reused for purposes other than gpasid), but you are right that the
in-kernel mdev driver needs to do G-H translation anyway, so defining it
explicitly looks reasonable.

Thanks
Kevin

2019-06-12 08:50:41

by Joerg Roedel

Subject: Re: [PATCH v2 0/4] iommu: Add device fault reporting API

On Mon, Jun 03, 2019 at 03:57:45PM +0100, Jean-Philippe Brucker wrote:
> Jacob Pan (3):
> driver core: Add per device iommu param
> iommu: Introduce device fault data
> iommu: Introduce device fault report API
>
> Jean-Philippe Brucker (1):
> iommu: Add recoverable fault reporting
>
> drivers/iommu/iommu.c | 236 ++++++++++++++++++++++++++++++++++++-
> include/linux/device.h | 3 +
> include/linux/iommu.h | 87 ++++++++++++++
> include/uapi/linux/iommu.h | 153 ++++++++++++++++++++++++
> 4 files changed, 476 insertions(+), 3 deletions(-)
> create mode 100644 include/uapi/linux/iommu.h

Applied, thanks.

2019-06-12 17:10:25

by Jean-Philippe Brucker

Subject: Re: [PATCH v2 0/4] iommu: Add device fault reporting API

On 12/06/2019 09:19, Joerg Roedel wrote:
> On Mon, Jun 03, 2019 at 03:57:45PM +0100, Jean-Philippe Brucker wrote:
>> Jacob Pan (3):
>> driver core: Add per device iommu param
>> iommu: Introduce device fault data
>> iommu: Introduce device fault report API
>>
>> Jean-Philippe Brucker (1):
>> iommu: Add recoverable fault reporting
>>
>> drivers/iommu/iommu.c | 236 ++++++++++++++++++++++++++++++++++++-
>> include/linux/device.h | 3 +
>> include/linux/iommu.h | 87 ++++++++++++++
>> include/uapi/linux/iommu.h | 153 ++++++++++++++++++++++++
>> 4 files changed, 476 insertions(+), 3 deletions(-)
>> create mode 100644 include/uapi/linux/iommu.h
>
> Applied, thanks.

Thanks! As discussed, I think we need to add padding to the iommu_fault
structure before this reaches mainline, to make the UAPI easier to
extend in the future. It's already possible to extend, but that requires
introducing a new ABI version number and supporting two structures. Adding
some padding would only require introducing new flags. If there is no
objection I'll send a one-line patch bumping the structure size to 64
bytes (currently 48).
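A sketch of why padding makes extension cheap, assuming a hypothetical new 64-bit field and a hypothetical flag bit (PRM_NEW_FIELD_VALID) added to the page request. The record stays 64 bytes, so the ring layout and ABI version do not change, and a consumer reads the new field only when the producer sets the flag. Struct names here are illustrative; fixed-width stdint types stand in for __u32/__u64.

```c
#include <assert.h>
#include <stdint.h>

/* Page request as in the patch (40 bytes). */
struct prm_v1 {
	uint32_t flags;
	uint32_t pasid;
	uint32_t grpid;
	uint32_t perm;
	uint64_t addr;
	uint64_t private_data[2];
};

/* Same record after a hypothetical extension (48 bytes). */
#define PRM_NEW_FIELD_VALID (1u << 3)	/* hypothetical flag bit */
struct prm_v1_ext {
	uint32_t flags;
	uint32_t pasid;
	uint32_t grpid;
	uint32_t perm;
	uint64_t addr;
	uint64_t private_data[2];
	uint64_t new_field;		/* hypothetical, e.g. a guest PASID */
};

/* 64-byte record: 8-byte header plus a 56-byte union, before and after. */
struct fault_64 {
	uint32_t type;
	uint32_t padding;
	union {
		struct prm_v1 prm;
		uint8_t padding2[56];
	};
};

struct fault_64_ext {
	uint32_t type;
	uint32_t padding;
	union {
		struct prm_v1_ext prm;
		uint8_t padding2[56];
	};
};

_Static_assert(sizeof(struct fault_64) == 64, "64-byte record");
_Static_assert(sizeof(struct fault_64_ext) == 64, "still 64 bytes after extension");

/* Read the new field only if the producer flagged it as valid. */
static int read_new_field(const struct fault_64_ext *f, uint64_t *out)
{
	if (!(f->prm.flags & PRM_NEW_FIELD_VALID))
		return -1;
	*out = f->prm.new_field;
	return 0;
}
```

An older producer leaves the spare bytes zeroed and the flag clear, so a newer consumer falls back gracefully.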

Thanks,
Jean

2019-06-12 17:57:45

by Joerg Roedel

Subject: Re: [PATCH v2 0/4] iommu: Add device fault reporting API

On Wed, Jun 12, 2019 at 12:54:51PM +0100, Jean-Philippe Brucker wrote:
> Thanks! As discussed, I think we need to add padding to the iommu_fault
> structure before this reaches mainline, to make the UAPI easier to
> extend in the future. It's already possible to extend, but that requires
> introducing a new ABI version number and supporting two structures. Adding
> some padding would only require introducing new flags. If there is no
> objection I'll send a one-line patch bumping the structure size to 64
> bytes (currently 48).

Sounds good, please submit the patch.

Regards,

Joerg

2019-06-12 18:57:04

by Jacob Pan

Subject: Re: [PATCH v2 0/4] iommu: Add device fault reporting API

On Wed, 12 Jun 2019 15:11:43 +0200
Joerg Roedel <[email protected]> wrote:

> On Wed, Jun 12, 2019 at 12:54:51PM +0100, Jean-Philippe Brucker wrote:
> > Thanks! As discussed, I think we need to add padding to the
> > iommu_fault structure before this reaches mainline, to make the
> > UAPI easier to extend in the future. It's already possible to
> > extend, but that requires introducing a new ABI version number and
> > supporting two structures. Adding some padding would only require
> > introducing new flags. If there is no objection I'll send a
> > one-line patch bumping the structure size to 64 bytes (currently
> > 48).
>
> Sounds good, please submit the patch.
>
Could you also add padding to the page response structure, per our discussion here?
https://lkml.org/lkml/2019/6/12/1131

> Regards,
>
> Joerg

[Jacob Pan]