2023-11-27 06:34:56

by Yi Liu

[permalink] [raw]
Subject: [PATCH 0/8] iommufd support pasid attach/replace

PASID (Process Address Space ID) is a PCIe extension to tag the DMA
transactions out of a physical device, and most modern IOMMU hardware
have supported PASID granular address translation. So a PASID-capable
device can be attached to multiple hwpts (a.k.a. domains), each attachment
is tagged with a pasid.

This series first adds a missing iommu API to replace domain for a pasid,
then adds iommufd APIs for device drivers to attach/replace/detach pasid
to/from hwpt per userspace's request, and adds selftest to validate the
iommufd APIs.

pasid attach/replace is mandatory on Intel VT-d given the PASID table
locates in the physical address space hence must be managed by the kernel,
both for supporting vSVA and coming SIOV. But it's optional on ARM/AMD
which allow configuring the PASID/CD table either in host physical address
space or nested on top of an GPA address space. This series only add VT-d
support as the minimal requirement.

Complete code can be found in below link:

https://github.com/yiliu1765/iommufd/tree/iommufd_pasid

Change log:

v1:
- Implemnet iommu_replace_device_pasid() to fall back to the original domain
if this replacement failed (Kevin)
- Add check in do_attach() to check corressponding attach_fn per the pasid value.

rfc: https://lore.kernel.org/linux-iommu/[email protected]/

Regards,
Yi Liu

Kevin Tian (1):
iommufd: Support attach/replace hwpt per pasid

Lu Baolu (2):
iommu: Introduce a replace API for device pasid
iommu/vt-d: Add set_dev_pasid callback for nested domain

Yi Liu (5):
iommufd: replace attach_fn with a structure
iommufd/selftest: Add set_dev_pasid and remove_dev_pasid in mock iommu
iommufd/selftest: Add a helper to get test device
iommufd/selftest: Add test ops to test pasid attach/detach
iommufd/selftest: Add coverage for iommufd pasid attach/detach

drivers/iommu/intel/nested.c | 47 +++++
drivers/iommu/iommu-priv.h | 2 +
drivers/iommu/iommu.c | 82 ++++++--
drivers/iommu/iommufd/Makefile | 1 +
drivers/iommu/iommufd/device.c | 50 +++--
drivers/iommu/iommufd/iommufd_private.h | 23 +++
drivers/iommu/iommufd/iommufd_test.h | 24 +++
drivers/iommu/iommufd/pasid.c | 138 ++++++++++++++
drivers/iommu/iommufd/selftest.c | 176 ++++++++++++++++--
include/linux/iommufd.h | 6 +
tools/testing/selftests/iommu/iommufd.c | 172 +++++++++++++++++
.../selftests/iommu/iommufd_fail_nth.c | 28 ++-
tools/testing/selftests/iommu/iommufd_utils.h | 78 ++++++++
13 files changed, 785 insertions(+), 42 deletions(-)
create mode 100644 drivers/iommu/iommufd/pasid.c

--
2.34.1


2023-11-27 06:35:01

by Yi Liu

[permalink] [raw]
Subject: [PATCH 3/8] iommufd: Support attach/replace hwpt per pasid

From: Kevin Tian <[email protected]>

This introduces three APIs for device drivers to manage pasid attach/
replace/detach.

int iommufd_device_pasid_attach(struct iommufd_device *idev,
u32 pasid, u32 *pt_id);
int iommufd_device_pasid_replace(struct iommufd_device *idev,
u32 pasid, u32 *pt_id);
void iommufd_device_pasid_detach(struct iommufd_device *idev,
u32 pasid);

pasid operations have different implications when comparing to device
operations:

- No connection to iommufd_group since pasid is a device capability
and can be enabled only in singleton group;

- no reserved region per pasid otherwise SVA architecture is already
broken (CPU address space doesn't count device reserved regions);

- accordingly no sw_msi trick;

- immediated_attach is not supported, expecting that arm-smmu driver
will already remove that requirement before supporting this pasid
operation. This avoids unnecessary change in iommufd_hw_pagetable_alloc()
to carry the pasid from device.c.

With above differences, this puts all pasid related logics into a new
pasid.c file.

Cache coherency enforcement is still applied to pasid operations since
it is about memory accesses post page table walking (no matter the walk
is per RID or per PASID).

Since the attach is per PASID, this introduces a pasid_hwpts xarray to
track the per-pasid attach data.

Signed-off-by: Kevin Tian <[email protected]>
Signed-off-by: Yi Liu <[email protected]>
---
drivers/iommu/iommufd/Makefile | 1 +
drivers/iommu/iommufd/device.c | 17 ++-
drivers/iommu/iommufd/iommufd_private.h | 15 +++
drivers/iommu/iommufd/pasid.c | 138 ++++++++++++++++++++++++
include/linux/iommufd.h | 6 ++
5 files changed, 176 insertions(+), 1 deletion(-)
create mode 100644 drivers/iommu/iommufd/pasid.c

diff --git a/drivers/iommu/iommufd/Makefile b/drivers/iommu/iommufd/Makefile
index 34b446146961..4b4d516b025c 100644
--- a/drivers/iommu/iommufd/Makefile
+++ b/drivers/iommu/iommufd/Makefile
@@ -6,6 +6,7 @@ iommufd-y := \
ioas.o \
main.o \
pages.o \
+ pasid.o \
vfio_compat.o

iommufd-$(CONFIG_IOMMUFD_TEST) += selftest.o
diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index 0992d9d46af9..a7574d4d5ffa 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -136,6 +136,7 @@ void iommufd_device_destroy(struct iommufd_object *obj)
struct iommufd_device *idev =
container_of(obj, struct iommufd_device, obj);

+ WARN_ON(!xa_empty(&idev->pasid_hwpts));
iommu_device_release_dma_owner(idev->dev);
iommufd_put_group(idev->igroup);
if (!iommufd_selftest_is_mock_dev(idev->dev))
@@ -216,6 +217,8 @@ struct iommufd_device *iommufd_device_bind(struct iommufd_ctx *ictx,
/* igroup refcount moves into iommufd_device */
idev->igroup = igroup;

+ xa_init(&idev->pasid_hwpts);
+
/*
* If the caller fails after this success it must call
* iommufd_unbind_device() which is safe since we hold this refcount.
@@ -534,7 +537,17 @@ iommufd_device_do_replace(struct iommufd_device *idev,
static struct iommufd_hw_pagetable *do_attach(struct iommufd_device *idev,
struct iommufd_hw_pagetable *hwpt, struct attach_data *data)
{
- return data->attach_fn(idev, hwpt);
+ if (data->pasid == IOMMU_PASID_INVALID) {
+ BUG_ON((data->attach_fn != iommufd_device_do_attach) &&
+ (data->attach_fn != iommufd_device_do_replace));
+ return data->attach_fn(idev, hwpt);
+ } else {
+ BUG_ON((data->pasid_attach_fn !=
+ iommufd_device_pasid_do_attach) &&
+ (data->pasid_attach_fn !=
+ iommufd_device_pasid_do_replace));
+ return data->pasid_attach_fn(idev, data->pasid, hwpt);
+ }
}

/*
@@ -684,6 +697,7 @@ int iommufd_device_attach(struct iommufd_device *idev, u32 *pt_id)
int rc;
struct attach_data data = {
.attach_fn = &iommufd_device_do_attach,
+ .pasid = IOMMU_PASID_INVALID,
};

rc = iommufd_device_change_pt(idev, pt_id, &data);
@@ -718,6 +732,7 @@ int iommufd_device_replace(struct iommufd_device *idev, u32 *pt_id)
{
struct attach_data data = {
.attach_fn = &iommufd_device_do_replace,
+ .pasid = IOMMU_PASID_INVALID,
};

return iommufd_device_change_pt(idev, pt_id, &data);
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 24fee2c37ce8..d37b7d0bfffe 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -349,6 +349,7 @@ struct iommufd_device {
struct list_head group_item;
/* always the physical device */
struct device *dev;
+ struct xarray pasid_hwpts;
bool enforce_cache_coherency;
};

@@ -368,9 +369,23 @@ struct attach_data {
struct iommufd_hw_pagetable *(*attach_fn)(
struct iommufd_device *idev,
struct iommufd_hw_pagetable *hwpt);
+ struct iommufd_hw_pagetable *(*pasid_attach_fn)(
+ struct iommufd_device *idev, u32 pasid,
+ struct iommufd_hw_pagetable *hwpt);
};
+ u32 pasid;
};

+int iommufd_device_change_pt(struct iommufd_device *idev, u32 *pt_id,
+ struct attach_data *data);
+
+struct iommufd_hw_pagetable *
+iommufd_device_pasid_do_attach(struct iommufd_device *idev, u32 pasid,
+ struct iommufd_hw_pagetable *hwpt);
+struct iommufd_hw_pagetable *
+iommufd_device_pasid_do_replace(struct iommufd_device *idev, u32 pasid,
+ struct iommufd_hw_pagetable *hwpt);
+
struct iommufd_access {
struct iommufd_object obj;
struct iommufd_ctx *ictx;
diff --git a/drivers/iommu/iommufd/pasid.c b/drivers/iommu/iommufd/pasid.c
new file mode 100644
index 000000000000..75499a1d92a1
--- /dev/null
+++ b/drivers/iommu/iommufd/pasid.c
@@ -0,0 +1,138 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright (c) 2023, Intel Corporation
+ */
+#include <linux/iommufd.h>
+#include <linux/iommu.h>
+#include "../iommu-priv.h"
+
+#include "iommufd_private.h"
+
+static int __iommufd_device_pasid_do_attach(struct iommufd_device *idev,
+ u32 pasid,
+ struct iommufd_hw_pagetable *hwpt,
+ bool replace)
+{
+ int rc;
+
+ if (!replace)
+ rc = iommu_attach_device_pasid(hwpt->domain, idev->dev, pasid);
+ else
+ rc = iommu_replace_device_pasid(hwpt->domain, idev->dev, pasid);
+ if (rc)
+ return rc;
+
+ refcount_inc(&hwpt->obj.users);
+ xa_store(&idev->pasid_hwpts, pasid, hwpt, GFP_KERNEL);
+ return 0;
+}
+
+struct iommufd_hw_pagetable *
+iommufd_device_pasid_do_attach(struct iommufd_device *idev, u32 pasid,
+ struct iommufd_hw_pagetable *hwpt)
+{
+ struct iommufd_hw_pagetable *old_hwpt;
+ int rc;
+
+ old_hwpt = xa_load(&idev->pasid_hwpts, pasid);
+ if (old_hwpt) {
+ /* Attach does not allow overwrite */
+ if (old_hwpt == hwpt)
+ return NULL;
+ else
+ return ERR_PTR(-EINVAL);
+ }
+
+ rc = __iommufd_device_pasid_do_attach(idev, pasid, hwpt, false);
+ return rc ? ERR_PTR(rc) : NULL;
+}
+
+struct iommufd_hw_pagetable *
+iommufd_device_pasid_do_replace(struct iommufd_device *idev, u32 pasid,
+ struct iommufd_hw_pagetable *hwpt)
+{
+ struct iommufd_hw_pagetable *old_hwpt;
+ int rc;
+
+ old_hwpt = xa_load(&idev->pasid_hwpts, pasid);
+ if (!old_hwpt)
+ return ERR_PTR(-EINVAL);
+
+ if (hwpt == old_hwpt)
+ return NULL;
+
+ rc = __iommufd_device_pasid_do_attach(idev, pasid, hwpt, true);
+ /* Caller must destroy old_hwpt */
+ return rc ? ERR_PTR(rc) : old_hwpt;
+}
+
+/**
+ * iommufd_device_pasid_attach - Connect a {device, pasid} to an iommu_domain
+ * @idev: device to attach
+ * @pasid: pasid to attach
+ * @pt_id: Input a IOMMUFD_OBJ_IOAS, or IOMMUFD_OBJ_HW_PAGETABLE
+ * Output the IOMMUFD_OBJ_HW_PAGETABLE ID
+ *
+ * This connects a pasid of the device to an iommu_domain. Once this
+ * completes the device could do DMA with the pasid.
+ *
+ * This function is undone by calling iommufd_device_detach_pasid().
+ */
+int iommufd_device_pasid_attach(struct iommufd_device *idev,
+ u32 pasid, u32 *pt_id)
+{
+ struct attach_data data = {
+ .pasid_attach_fn = &iommufd_device_pasid_do_attach,
+ .pasid = pasid,
+ };
+
+ return iommufd_device_change_pt(idev, pt_id, &data);
+}
+EXPORT_SYMBOL_NS_GPL(iommufd_device_pasid_attach, IOMMUFD);
+
+/**
+ * iommufd_device_pasid_replace- Change the {device, pasid}'s iommu_domain
+ * @idev: device to change
+ * @pasid: pasid to change
+ * @pt_id: Input a IOMMUFD_OBJ_IOAS, or IOMMUFD_OBJ_HW_PAGETABLE
+ * Output the IOMMUFD_OBJ_HW_PAGETABLE ID
+ *
+ * This is the same as
+ * iommufd_device_pasid_detach();
+ * iommufd_device_pasid_attach();
+ *
+ * If it fails then no change is made to the attachment. The iommu driver may
+ * implement this so there is no disruption in translation. This can only be
+ * called if iommufd_device_pasid_attach() has already succeeded.
+ */
+int iommufd_device_pasid_replace(struct iommufd_device *idev,
+ u32 pasid, u32 *pt_id)
+{
+ struct attach_data data = {
+ .pasid_attach_fn = &iommufd_device_pasid_do_replace,
+ .pasid = pasid,
+ };
+
+ return iommufd_device_change_pt(idev, pt_id, &data);
+}
+EXPORT_SYMBOL_NS_GPL(iommufd_device_pasid_replace, IOMMUFD);
+
+/**
+ * iommufd_device_pasid_detach - Disconnect a {device, pasid} to an iommu_domain
+ * @idev: device to detach
+ * @pasid: pasid to detach
+ *
+ * Undo iommufd_device_pasid_attach(). This disconnects the idev/pasid from
+ * the previously attached pt_id.
+ */
+void iommufd_device_pasid_detach(struct iommufd_device *idev, u32 pasid)
+{
+ struct iommufd_hw_pagetable *hwpt;
+
+ hwpt = xa_load(&idev->pasid_hwpts, pasid);
+ if (!hwpt)
+ return;
+ xa_erase(&idev->pasid_hwpts, pasid);
+ iommu_detach_device_pasid(hwpt->domain, idev->dev, pasid);
+ iommufd_hw_pagetable_put(idev->ictx, hwpt);
+}
+EXPORT_SYMBOL_NS_GPL(iommufd_device_pasid_detach, IOMMUFD);
diff --git a/include/linux/iommufd.h b/include/linux/iommufd.h
index ffc3a949f837..0b007c376306 100644
--- a/include/linux/iommufd.h
+++ b/include/linux/iommufd.h
@@ -26,6 +26,12 @@ int iommufd_device_attach(struct iommufd_device *idev, u32 *pt_id);
int iommufd_device_replace(struct iommufd_device *idev, u32 *pt_id);
void iommufd_device_detach(struct iommufd_device *idev);

+int iommufd_device_pasid_attach(struct iommufd_device *idev,
+ u32 pasid, u32 *pt_id);
+int iommufd_device_pasid_replace(struct iommufd_device *idev,
+ u32 pasid, u32 *pt_id);
+void iommufd_device_pasid_detach(struct iommufd_device *idev, u32 pasid);
+
struct iommufd_ctx *iommufd_device_to_ictx(struct iommufd_device *idev);
u32 iommufd_device_to_id(struct iommufd_device *idev);

--
2.34.1

2023-11-27 06:35:08

by Yi Liu

[permalink] [raw]
Subject: [PATCH 4/8] iommufd/selftest: Add set_dev_pasid and remove_dev_pasid in mock iommu

The two callbacks are needed to make pasid_attach/detach path complete for
mock device. A nop is enough for set_dev_pasid, a domain type check in the
remove_dev_pasid is also helpful.

Signed-off-by: Yi Liu <[email protected]>
---
drivers/iommu/iommufd/selftest.c | 28 ++++++++++++++++++++++++++++
1 file changed, 28 insertions(+)

diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c
index f2d4599f701b..221a206bf38f 100644
--- a/drivers/iommu/iommufd/selftest.c
+++ b/drivers/iommu/iommufd/selftest.c
@@ -440,6 +440,31 @@ static struct iommu_device *mock_probe_device(struct device *dev)
return &mock_iommu_device;
}

+static void mock_iommu_remove_dev_pasid(struct device *dev, ioasid_t pasid)
+{
+ struct iommu_domain *domain;
+
+ /* Domain type specific cleanup: */
+ domain = iommu_get_domain_for_dev_pasid(dev, pasid, 0);
+ if (domain) {
+ switch (domain->type) {
+ case IOMMU_DOMAIN_NESTED:
+ case IOMMU_DOMAIN_UNMANAGED:
+ break;
+ default:
+ /* should never reach here */
+ WARN_ON(1);
+ break;
+ }
+ }
+}
+
+static int mock_domain_set_dev_pasid_nop(struct iommu_domain *domain,
+ struct device *dev, ioasid_t pasid)
+{
+ return 0;
+}
+
static const struct iommu_ops mock_ops = {
/*
* IOMMU_DOMAIN_BLOCKED cannot be returned from def_domain_type()
@@ -455,6 +480,7 @@ static const struct iommu_ops mock_ops = {
.capable = mock_domain_capable,
.device_group = generic_device_group,
.probe_device = mock_probe_device,
+ .remove_dev_pasid = mock_iommu_remove_dev_pasid,
.default_domain_ops =
&(struct iommu_domain_ops){
.free = mock_domain_free,
@@ -462,6 +488,7 @@ static const struct iommu_ops mock_ops = {
.map_pages = mock_domain_map_pages,
.unmap_pages = mock_domain_unmap_pages,
.iova_to_phys = mock_domain_iova_to_phys,
+ .set_dev_pasid = mock_domain_set_dev_pasid_nop,
},
};

@@ -519,6 +546,7 @@ static struct iommu_domain_ops domain_nested_ops = {
.free = mock_domain_free_nested,
.attach_dev = mock_domain_nop_attach,
.cache_invalidate_user = mock_domain_cache_invalidate_user,
+ .set_dev_pasid = mock_domain_set_dev_pasid_nop,
};

static inline struct iommufd_hw_pagetable *
--
2.34.1

2023-11-27 06:35:08

by Yi Liu

[permalink] [raw]
Subject: [PATCH 1/8] iommu: Introduce a replace API for device pasid

From: Lu Baolu <[email protected]>

Provide a high-level API to allow replacements of one domain with
another for specific pasid of a device. This is similar to
iommu_group_replace_domain() and it is also expected to be used
only by IOMMUFD.

Signed-off-by: Lu Baolu <[email protected]>
Signed-off-by: Yi Liu <[email protected]>
---
drivers/iommu/iommu-priv.h | 2 +
drivers/iommu/iommu.c | 82 +++++++++++++++++++++++++++++++-------
2 files changed, 70 insertions(+), 14 deletions(-)

diff --git a/drivers/iommu/iommu-priv.h b/drivers/iommu/iommu-priv.h
index 2024a2313348..5c32637f6325 100644
--- a/drivers/iommu/iommu-priv.h
+++ b/drivers/iommu/iommu-priv.h
@@ -19,6 +19,8 @@ static inline const struct iommu_ops *dev_iommu_ops(struct device *dev)

int iommu_group_replace_domain(struct iommu_group *group,
struct iommu_domain *new_domain);
+int iommu_replace_device_pasid(struct iommu_domain *domain,
+ struct device *dev, ioasid_t pasid);

int iommu_device_register_bus(struct iommu_device *iommu,
const struct iommu_ops *ops, struct bus_type *bus,
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 9d573e971aff..ec213ebd5ecc 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -3430,6 +3430,27 @@ static void __iommu_remove_group_pasid(struct iommu_group *group,
}
}

+static int __iommu_group_attach_pasid(struct iommu_domain *domain,
+ struct iommu_group *group, ioasid_t pasid)
+{
+ void *curr;
+ int ret;
+
+ lockdep_assert_held(&group->mutex);
+
+ curr = xa_cmpxchg(&group->pasid_array, pasid, NULL, domain, GFP_KERNEL);
+ if (curr)
+ return xa_err(curr) ? : -EBUSY;
+
+ ret = __iommu_set_group_pasid(domain, group, pasid);
+ if (ret) {
+ __iommu_remove_group_pasid(group, pasid);
+ xa_erase(&group->pasid_array, pasid);
+ }
+
+ return ret;
+}
+
/*
* iommu_attach_device_pasid() - Attach a domain to pasid of device
* @domain: the iommu domain.
@@ -3453,19 +3474,9 @@ int iommu_attach_device_pasid(struct iommu_domain *domain,
return -ENODEV;

mutex_lock(&group->mutex);
- curr = xa_cmpxchg(&group->pasid_array, pasid, NULL, domain, GFP_KERNEL);
- if (curr) {
- ret = xa_err(curr) ? : -EBUSY;
- goto out_unlock;
- }
-
- ret = __iommu_set_group_pasid(domain, group, pasid);
- if (ret) {
- __iommu_remove_group_pasid(group, pasid);
- xa_erase(&group->pasid_array, pasid);
- }
-out_unlock:
+ ret = __iommu_group_attach_pasid(domain, group, pasid);
mutex_unlock(&group->mutex);
+
return ret;
}
EXPORT_SYMBOL_GPL(iommu_attach_device_pasid);
@@ -3479,8 +3490,8 @@ EXPORT_SYMBOL_GPL(iommu_attach_device_pasid);
* The @domain must have been attached to @pasid of the @dev with
* iommu_attach_device_pasid().
*/
-void iommu_detach_device_pasid(struct iommu_domain *domain, struct device *dev,
- ioasid_t pasid)
+void iommu_detach_device_pasid(struct iommu_domain *domain,
+ struct device *dev, ioasid_t pasid)
{
/* Caller must be a probed driver on dev */
struct iommu_group *group = dev->iommu_group;
@@ -3492,6 +3503,49 @@ void iommu_detach_device_pasid(struct iommu_domain *domain, struct device *dev,
}
EXPORT_SYMBOL_GPL(iommu_detach_device_pasid);

+/**
+ * iommu_replace_device_pasid - replace the domain that a pasid is attached to
+ * @domain: new IOMMU domain to replace with
+ * @dev: the physical device
+ * @pasid: pasid that will be attached to the new domain
+ *
+ * This API allows the pasid to switch domains. Return 0 on success, or an
+ * error. The pasid will roll back to use the old domain if failure. The
+ * caller could call iommu_detach_device_pasid() before free the old domain
+ * in order to avoid use-after-free case.
+ */
+int iommu_replace_device_pasid(struct iommu_domain *domain,
+ struct device *dev, ioasid_t pasid)
+{
+ struct iommu_group *group = dev->iommu_group;
+ struct iommu_domain *old_domain;
+ int ret;
+
+ if (!domain)
+ return -EINVAL;
+
+ if (!group)
+ return -ENODEV;
+
+ mutex_lock(&group->mutex);
+ __iommu_remove_group_pasid(group, pasid);
+ old_domain = xa_erase(&group->pasid_array, pasid);
+ ret = __iommu_group_attach_pasid(domain, group, pasid);
+ if (ret)
+ goto err_rollback;
+ mutex_unlock(&group->mutex);
+
+ return 0;
+
+err_rollback:
+ if (old_domain)
+ __iommu_group_attach_pasid(old_domain, group, pasid);
+ mutex_unlock(&group->mutex);
+
+ return ret;
+}
+EXPORT_SYMBOL_NS_GPL(iommu_replace_device_pasid, IOMMUFD_INTERNAL);
+
/*
* iommu_get_domain_for_dev_pasid() - Retrieve domain for @pasid of @dev
* @dev: the queried device
--
2.34.1

2023-11-27 06:35:10

by Yi Liu

[permalink] [raw]
Subject: [PATCH 5/8] iommufd/selftest: Add a helper to get test device

There is need to get the selftest device (sobj->type == TYPE_IDEV) in
multiple places, so have a helper to for it.

Signed-off-by: Yi Liu <[email protected]>
---
drivers/iommu/iommufd/selftest.c | 32 +++++++++++++++++++++-----------
1 file changed, 21 insertions(+), 11 deletions(-)

diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c
index 221a206bf38f..9b9fd3f50264 100644
--- a/drivers/iommu/iommufd/selftest.c
+++ b/drivers/iommu/iommufd/selftest.c
@@ -720,29 +720,39 @@ static int iommufd_test_mock_domain(struct iommufd_ucmd *ucmd,
return rc;
}

-/* Replace the mock domain with a manually allocated hw_pagetable */
-static int iommufd_test_mock_domain_replace(struct iommufd_ucmd *ucmd,
- unsigned int device_id, u32 pt_id,
- struct iommu_test_cmd *cmd)
+static struct selftest_obj *
+iommufd_test_get_self_test_device(struct iommufd_ctx *ictx, u32 id)
{
struct iommufd_object *dev_obj;
struct selftest_obj *sobj;
- int rc;

/*
* Prefer to use the OBJ_SELFTEST because the destroy_rwsem will ensure
* it doesn't race with detach, which is not allowed.
*/
- dev_obj =
- iommufd_get_object(ucmd->ictx, device_id, IOMMUFD_OBJ_SELFTEST);
+ dev_obj = iommufd_get_object(ictx, id, IOMMUFD_OBJ_SELFTEST);
if (IS_ERR(dev_obj))
- return PTR_ERR(dev_obj);
+ return (struct selftest_obj *)dev_obj;

sobj = container_of(dev_obj, struct selftest_obj, obj);
if (sobj->type != TYPE_IDEV) {
- rc = -EINVAL;
- goto out_dev_obj;
+ iommufd_put_object(dev_obj);
+ return ERR_PTR(-EINVAL);
}
+ return sobj;
+}
+
+/* Replace the mock domain with a manually allocated hw_pagetable */
+static int iommufd_test_mock_domain_replace(struct iommufd_ucmd *ucmd,
+ unsigned int device_id, u32 pt_id,
+ struct iommu_test_cmd *cmd)
+{
+ struct selftest_obj *sobj;
+ int rc;
+
+ sobj = iommufd_test_get_self_test_device(ucmd->ictx, device_id);
+ if (IS_ERR(sobj))
+ return PTR_ERR(sobj);

rc = iommufd_device_replace(sobj->idev.idev, &pt_id);
if (rc)
@@ -752,7 +762,7 @@ static int iommufd_test_mock_domain_replace(struct iommufd_ucmd *ucmd,
rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));

out_dev_obj:
- iommufd_put_object(dev_obj);
+ iommufd_put_object(&sobj->obj);
return rc;
}

--
2.34.1

2023-11-27 06:35:29

by Yi Liu

[permalink] [raw]
Subject: [PATCH 7/8] iommufd/selftest: Add coverage for iommufd pasid attach/detach

This tests iommufd pasid attach/replace/detach.

Signed-off-by: Yi Liu <[email protected]>
---
tools/testing/selftests/iommu/iommufd.c | 172 ++++++++++++++++++
.../selftests/iommu/iommufd_fail_nth.c | 28 ++-
tools/testing/selftests/iommu/iommufd_utils.h | 78 ++++++++
3 files changed, 274 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c
index 2781d5bc6309..4dfd2d33c0a0 100644
--- a/tools/testing/selftests/iommu/iommufd.c
+++ b/tools/testing/selftests/iommu/iommufd.c
@@ -2221,4 +2221,176 @@ TEST_F(vfio_compat_mock_domain, huge_map)
}
}

+FIXTURE(iommufd_device_pasid)
+{
+ int fd;
+ uint32_t ioas_id;
+ uint32_t hwpt_id;
+ uint32_t stdev_id;
+ uint32_t device_id;
+};
+
+FIXTURE_SETUP(iommufd_device_pasid)
+{
+ self->fd = open("/dev/iommu", O_RDWR);
+ ASSERT_NE(-1, self->fd);
+ test_ioctl_ioas_alloc(&self->ioas_id);
+
+ test_cmd_mock_domain(self->ioas_id, &self->stdev_id,
+ &self->hwpt_id, &self->device_id);
+}
+
+FIXTURE_TEARDOWN(iommufd_device_pasid)
+{
+ teardown_iommufd(self->fd, _metadata);
+}
+
+TEST_F(iommufd_device_pasid, pasid_attach)
+{
+ if (self->device_id) {
+ struct iommu_hwpt_selftest data = {
+ .iotlb = IOMMU_TEST_IOTLB_DEFAULT,
+ };
+ uint32_t nested_hwpt_id[2] = {};
+ uint32_t parent_hwpt_id = 0;
+ uint32_t pasid = 100;
+ bool result;
+
+ /* Allocate two nested hwpts sharing one common parent hwpt */
+ test_cmd_hwpt_alloc(self->device_id, self->ioas_id,
+ IOMMU_HWPT_ALLOC_NEST_PARENT,
+ &parent_hwpt_id);
+
+ test_cmd_hwpt_alloc_nested(self->device_id, parent_hwpt_id, 0,
+ &nested_hwpt_id[0],
+ IOMMU_HWPT_DATA_SELFTEST,
+ &data, sizeof(data));
+ test_cmd_hwpt_alloc_nested(self->device_id, parent_hwpt_id, 0,
+ &nested_hwpt_id[1],
+ IOMMU_HWPT_DATA_SELFTEST,
+ &data, sizeof(data));
+
+ /*
+ * Attach ioas to pasid 100, should succeed, domain should
+ * be valid.
+ */
+ test_cmd_pasid_attach(pasid, self->ioas_id);
+ ASSERT_EQ(0,
+ test_cmd_pasid_check_domain(self->fd, self->stdev_id,
+ pasid, self->hwpt_id,
+ &result));
+ EXPECT_EQ(1, result);
+
+ /*
+ * Try attach pasid 100 with self->ioas_id, should succeed
+ * as it is the same with existing hwpt.
+ */
+ test_cmd_pasid_attach(pasid, self->ioas_id);
+
+ /*
+ * Try attach pasid 100 with another hwpt, should FAIL
+ * as attach does not allow overwrite, use REPLACE instead.
+ */
+ test_err_cmd_pasid_attach(EINVAL, pasid, nested_hwpt_id[0]);
+
+ /*
+ * Detach hwpt from pasid 100, and check if the pasid 100
+ * has null domain. Should be done before the next attach.
+ */
+ test_cmd_pasid_detach(pasid);
+ ASSERT_EQ(0,
+ test_cmd_pasid_check_domain(self->fd, self->stdev_id,
+ pasid, 0, &result));
+ EXPECT_EQ(1, result);
+
+ /*
+ * Attach nested hwpt to pasid 100, should succeed, domain
+ * should be valid.
+ */
+ test_cmd_pasid_attach(pasid, nested_hwpt_id[0]);
+ ASSERT_EQ(0,
+ test_cmd_pasid_check_domain(self->fd, self->stdev_id,
+ pasid, nested_hwpt_id[0],
+ &result));
+ EXPECT_EQ(1, result);
+
+ /*
+ * Detach hwpt from pasid 100, and check if the pasid 100
+ * has null domain
+ */
+ test_cmd_pasid_detach(pasid);
+ ASSERT_EQ(0,
+ test_cmd_pasid_check_domain(self->fd, self->stdev_id,
+ pasid, 0, &result));
+ EXPECT_EQ(1, result);
+
+ /* Replace tests */
+ pasid = 200;
+
+ /*
+ * Replace pasid 200 without attaching it first, should
+ * fail with -EINVAL.
+ */
+ test_err_cmd_pasid_replace(EINVAL, pasid, parent_hwpt_id);
+
+ /*
+ * Attach a s2 hwpt to pasid 200, should succeed, domain should
+ * be valid.
+ */
+ test_cmd_pasid_attach(pasid, parent_hwpt_id);
+ ASSERT_EQ(0,
+ test_cmd_pasid_check_domain(self->fd, self->stdev_id,
+ pasid, parent_hwpt_id,
+ &result));
+ EXPECT_EQ(1, result);
+
+ /*
+ * Replace pasid 200 with self->ioas_id, should succeed,
+ * and have valid domain.
+ */
+ test_cmd_pasid_replace(pasid, self->ioas_id);
+ ASSERT_EQ(0,
+ test_cmd_pasid_check_domain(self->fd, self->stdev_id,
+ pasid, self->hwpt_id,
+ &result));
+ EXPECT_EQ(1, result);
+
+ /*
+ * Replace a nested hwpt for pasid 200, should succeed,
+ * and have valid domain.
+ */
+ test_cmd_pasid_replace(pasid, nested_hwpt_id[0]);
+ ASSERT_EQ(0,
+ test_cmd_pasid_check_domain(self->fd, self->stdev_id,
+ pasid, nested_hwpt_id[0],
+ &result));
+ EXPECT_EQ(1, result);
+
+ /*
+ * Replace with another nested hwpt for pasid 200, should
+ * succeed, and have valid domain.
+ */
+ test_cmd_pasid_replace(pasid, nested_hwpt_id[1]);
+ ASSERT_EQ(0,
+ test_cmd_pasid_check_domain(self->fd, self->stdev_id,
+ pasid, nested_hwpt_id[1],
+ &result));
+ EXPECT_EQ(1, result);
+
+ /*
+ * Detach hwpt from pasid 200, and check if the pasid 200
+ * has null domain.
+ */
+ test_cmd_pasid_detach(pasid);
+ ASSERT_EQ(0,
+ test_cmd_pasid_check_domain(self->fd, self->stdev_id,
+ pasid, 0, &result));
+ EXPECT_EQ(1, result);
+
+ test_ioctl_destroy(nested_hwpt_id[0]);
+ test_ioctl_destroy(nested_hwpt_id[1]);
+ test_ioctl_destroy(parent_hwpt_id);
+ }
+}
+
TEST_HARNESS_MAIN
diff --git a/tools/testing/selftests/iommu/iommufd_fail_nth.c b/tools/testing/selftests/iommu/iommufd_fail_nth.c
index f590417cd67a..6d1b03e73b9d 100644
--- a/tools/testing/selftests/iommu/iommufd_fail_nth.c
+++ b/tools/testing/selftests/iommu/iommufd_fail_nth.c
@@ -206,12 +206,16 @@ FIXTURE(basic_fail_nth)
{
int fd;
uint32_t access_id;
+ uint32_t stdev_id;
+ uint32_t pasid;
};

FIXTURE_SETUP(basic_fail_nth)
{
self->fd = -1;
self->access_id = 0;
+ self->stdev_id = 0;
+ self->pasid = 0; //test should use a non-zero value
}

FIXTURE_TEARDOWN(basic_fail_nth)
@@ -223,6 +227,8 @@ FIXTURE_TEARDOWN(basic_fail_nth)
rc = _test_cmd_destroy_access(self->access_id);
assert(rc == 0);
}
+ if (self->pasid && self->stdev_id)
+ _test_cmd_pasid_detach(self->fd, self->stdev_id, self->pasid);
teardown_iommufd(self->fd, _metadata);
}

@@ -579,7 +585,6 @@ TEST_FAIL_NTH(basic_fail_nth, device)
struct iommu_test_hw_info info;
uint32_t ioas_id;
uint32_t ioas_id2;
- uint32_t stdev_id;
uint32_t idev_id;
uint32_t hwpt_id;
__u64 iova;
@@ -608,7 +613,7 @@ TEST_FAIL_NTH(basic_fail_nth, device)

fail_nth_enable();

- if (_test_cmd_mock_domain(self->fd, ioas_id, &stdev_id, NULL,
+ if (_test_cmd_mock_domain(self->fd, ioas_id, &self->stdev_id, NULL,
&idev_id))
return -1;

@@ -619,11 +624,26 @@ TEST_FAIL_NTH(basic_fail_nth, device)
IOMMU_HWPT_DATA_NONE, 0, 0))
return -1;

- if (_test_cmd_mock_domain_replace(self->fd, stdev_id, ioas_id2, NULL))
+ if (_test_cmd_mock_domain_replace(self->fd, self->stdev_id, ioas_id2, NULL))
+ return -1;
+
+ if (_test_cmd_mock_domain_replace(self->fd, self->stdev_id, hwpt_id, NULL))
return -1;

- if (_test_cmd_mock_domain_replace(self->fd, stdev_id, hwpt_id, NULL))
+ self->pasid = 200;
+
+ /* Tests for pasid attach/replace/detach */
+ if (_test_cmd_pasid_attach(self->fd, self->stdev_id, self->pasid, ioas_id))
return -1;
+
+ if (_test_cmd_pasid_replace(self->fd, self->stdev_id, self->pasid, ioas_id2))
+ return -1;
+
+ if (_test_cmd_pasid_detach(self->fd, self->stdev_id, self->pasid))
+ return -1;
+
+ self->pasid = 0;
+
return 0;
}

diff --git a/tools/testing/selftests/iommu/iommufd_utils.h b/tools/testing/selftests/iommu/iommufd_utils.h
index d4a169a2278e..890d622cde86 100644
--- a/tools/testing/selftests/iommu/iommufd_utils.h
+++ b/tools/testing/selftests/iommu/iommufd_utils.h
@@ -682,3 +682,81 @@ static int _test_cmd_get_hw_info(int fd, __u32 device_id, void *data,

#define test_cmd_get_hw_capabilities(device_id, caps, mask) \
ASSERT_EQ(0, _test_cmd_get_hw_info(self->fd, device_id, NULL, 0, &caps))
+
+static int _test_cmd_pasid_attach(int fd, __u32 stdev_id, __u32 pasid, __u32 pt_id)
+{
+ struct iommu_test_cmd test_attach = {
+ .size = sizeof(test_attach),
+ .op = IOMMU_TEST_OP_PASID_ATTACH,
+ .id = stdev_id,
+ .pasid_attach = {
+ .pasid = pasid,
+ .pt_id = pt_id,
+ },
+ };
+
+ return ioctl(fd, _IOMMU_TEST_CMD(IOMMU_TEST_OP_PASID_ATTACH), &test_attach);
+}
+
+#define test_cmd_pasid_attach(pasid, hwpt_id) \
+ ASSERT_EQ(0, _test_cmd_pasid_attach(self->fd, self->stdev_id, pasid, hwpt_id))
+
+#define test_err_cmd_pasid_attach(_errno, pasid, hwpt_id) \
+ EXPECT_ERRNO(_errno, \
+ _test_cmd_pasid_attach(self->fd, self->stdev_id, pasid, hwpt_id))
+
+static int _test_cmd_pasid_replace(int fd, __u32 stdev_id, __u32 pasid, __u32 pt_id)
+{
+ struct iommu_test_cmd test_replace = {
+ .size = sizeof(test_replace),
+ .op = IOMMU_TEST_OP_PASID_REPLACE,
+ .id = stdev_id,
+ .pasid_replace = {
+ .pasid = pasid,
+ .pt_id = pt_id,
+ },
+ };
+
+ return ioctl(fd, _IOMMU_TEST_CMD(IOMMU_TEST_OP_PASID_REPLACE), &test_replace);
+}
+
+#define test_cmd_pasid_replace(pasid, hwpt_id) \
+ ASSERT_EQ(0, _test_cmd_pasid_replace(self->fd, self->stdev_id, pasid, hwpt_id))
+
+#define test_err_cmd_pasid_replace(_errno, pasid, hwpt_id) \
+ EXPECT_ERRNO(_errno, \
+ _test_cmd_pasid_replace(self->fd, self->stdev_id, pasid, hwpt_id))
+
+static int _test_cmd_pasid_detach(int fd, __u32 stdev_id, __u32 pasid)
+{
+ struct iommu_test_cmd test_detach = {
+ .size = sizeof(test_detach),
+ .op = IOMMU_TEST_OP_PASID_DETACH,
+ .id = stdev_id,
+ .pasid_detach = {
+ .pasid = pasid,
+ },
+ };
+
+ return ioctl(fd, _IOMMU_TEST_CMD(IOMMU_TEST_OP_PASID_DETACH), &test_detach);
+}
+
+#define test_cmd_pasid_detach(pasid) \
+ ASSERT_EQ(0, _test_cmd_pasid_detach(self->fd, self->stdev_id, pasid))
+
+static int test_cmd_pasid_check_domain(int fd, __u32 stdev_id, __u32 pasid,
+ __u32 hwpt_id, bool *result)
+{
+ struct iommu_test_cmd test_pasid_check = {
+ .size = sizeof(test_pasid_check),
+ .op = IOMMU_TEST_OP_PASID_CHECK_DOMAIN,
+ .id = stdev_id,
+ .pasid_check = {
+ .pasid = pasid,
+ .hwpt_id = hwpt_id,
+ .out_result_ptr = (__u64)result,
+ },
+ };
+
+ return ioctl(fd, _IOMMU_TEST_CMD(IOMMU_TEST_OP_PASID_CHECK_DOMAIN), &test_pasid_check);
+}
--
2.34.1

2023-11-27 06:35:41

by Yi Liu

[permalink] [raw]
Subject: [PATCH 6/8] iommufd/selftest: Add test ops to test pasid attach/detach

This adds 4 test ops for pasid attach/replace/detach testing. There are
ops to attach/detach pasid, and also op to check the attached domain of
a pasid.

Signed-off-by: Yi Liu <[email protected]>
---
drivers/iommu/iommufd/iommufd_test.h | 24 ++++++
drivers/iommu/iommufd/selftest.c | 116 +++++++++++++++++++++++++++
2 files changed, 140 insertions(+)

diff --git a/drivers/iommu/iommufd/iommufd_test.h b/drivers/iommu/iommufd/iommufd_test.h
index 35d6207a96dd..67c2a8508b92 100644
--- a/drivers/iommu/iommufd/iommufd_test.h
+++ b/drivers/iommu/iommufd/iommufd_test.h
@@ -22,6 +22,10 @@ enum {
IOMMU_TEST_OP_MOCK_DOMAIN_FLAGS,
IOMMU_TEST_OP_DIRTY,
IOMMU_TEST_OP_MD_CHECK_IOTLB,
+ IOMMU_TEST_OP_PASID_ATTACH,
+ IOMMU_TEST_OP_PASID_REPLACE,
+ IOMMU_TEST_OP_PASID_DETACH,
+ IOMMU_TEST_OP_PASID_CHECK_DOMAIN,
};

enum {
@@ -126,6 +130,26 @@ struct iommu_test_cmd {
__u32 id;
__u32 iotlb;
} check_iotlb;
+ struct {
+ __u32 pasid;
+ __u32 pt_id;
+ /* @id is stdev_id for IOMMU_TEST_OP_PASID_ATTACH */
+ } pasid_attach;
+ struct {
+ __u32 pasid;
+ __u32 pt_id;
+ /* @id is stdev_id for IOMMU_TEST_OP_PASID_ATTACH */
+ } pasid_replace;
+ struct {
+ __u32 pasid;
+ /* @id is stdev_id for IOMMU_TEST_OP_PASID_DETACH */
+ } pasid_detach;
+ struct {
+ __u32 pasid;
+ __u32 hwpt_id;
+ __u64 out_result_ptr;
+ /* @id is stdev_id for IOMMU_TEST_OP_HWPT_GET_DOMAIN */
+ } pasid_check;
};
__u32 last;
};
diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c
index 9b9fd3f50264..a3139ad534a1 100644
--- a/drivers/iommu/iommufd/selftest.c
+++ b/drivers/iommu/iommufd/selftest.c
@@ -1340,6 +1340,114 @@ static int iommufd_test_dirty(struct iommufd_ucmd *ucmd, unsigned int mockpt_id,
return rc;
}

+static int iommufd_test_pasid_attach(struct iommufd_ucmd *ucmd,
+ struct iommu_test_cmd *cmd)
+{
+ struct selftest_obj *sobj;
+ int rc;
+
+ sobj = iommufd_test_get_self_test_device(ucmd->ictx, cmd->id);
+ if (IS_ERR(sobj))
+ return PTR_ERR(sobj);
+
+ rc = iommufd_device_pasid_attach(sobj->idev.idev,
+ cmd->pasid_attach.pasid,
+ &cmd->pasid_attach.pt_id);
+ iommufd_put_object(&sobj->obj);
+ return rc;
+}
+
+static int iommufd_test_pasid_replace(struct iommufd_ucmd *ucmd,
+ struct iommu_test_cmd *cmd)
+{
+ struct selftest_obj *sobj;
+ int rc;
+
+ sobj = iommufd_test_get_self_test_device(ucmd->ictx, cmd->id);
+ if (IS_ERR(sobj))
+ return PTR_ERR(sobj);
+
+ rc = iommufd_device_pasid_replace(sobj->idev.idev,
+ cmd->pasid_attach.pasid,
+ &cmd->pasid_attach.pt_id);
+ iommufd_put_object(&sobj->obj);
+ return rc;
+}
+
+static int iommufd_test_pasid_detach(struct iommufd_ucmd *ucmd,
+ struct iommu_test_cmd *cmd)
+{
+ struct selftest_obj *sobj;
+
+ sobj = iommufd_test_get_self_test_device(ucmd->ictx, cmd->id);
+ if (IS_ERR(sobj))
+ return PTR_ERR(sobj);
+
+ iommufd_device_pasid_detach(sobj->idev.idev,
+ cmd->pasid_detach.pasid);
+ iommufd_put_object(&sobj->obj);
+ return 0;
+}
+
+static inline struct iommufd_hw_pagetable *
+iommufd_get_hwpt(struct iommufd_ucmd *ucmd, u32 id)
+{
+ struct iommufd_object *pt_obj;
+
+ pt_obj = iommufd_get_object(ucmd->ictx, id, IOMMUFD_OBJ_ANY);
+ if (IS_ERR(pt_obj))
+ return ERR_CAST(pt_obj);
+
+ if (pt_obj->type != IOMMUFD_OBJ_HWPT_NESTED &&
+ pt_obj->type != IOMMUFD_OBJ_HWPT_PAGING) {
+ iommufd_put_object(pt_obj);
+ return ERR_PTR(-EINVAL);
+ }
+
+ return container_of(pt_obj, struct iommufd_hw_pagetable, obj);
+}
+
+static int iommufd_test_pasid_check_domain(struct iommufd_ucmd *ucmd,
+ struct iommu_test_cmd *cmd)
+{
+ struct iommu_domain *attached_domain, *expect_domain = NULL;
+ struct iommufd_hw_pagetable *hwpt = NULL;
+ struct selftest_obj *sobj;
+ struct mock_dev *mdev;
+ bool result;
+ int rc = 0;
+
+ sobj = iommufd_test_get_self_test_device(ucmd->ictx, cmd->id);
+ if (IS_ERR(sobj))
+ return PTR_ERR(sobj);
+
+ mdev = sobj->idev.mock_dev;
+
+ attached_domain = iommu_get_domain_for_dev_pasid(&mdev->dev,
+ cmd->pasid_check.pasid, 0);
+ if (IS_ERR(attached_domain))
+ attached_domain = NULL;
+
+ if (cmd->pasid_check.hwpt_id) {
+ hwpt = iommufd_get_hwpt(ucmd, cmd->pasid_check.hwpt_id);
+ if (IS_ERR(hwpt)) {
+ rc = PTR_ERR(hwpt);
+ goto out_put_dev;
+ }
+ expect_domain = hwpt->domain;
+ }
+
+ result = (attached_domain == expect_domain) ? 1 : 0;
+ if (copy_to_user(u64_to_user_ptr(cmd->pasid_check.out_result_ptr),
+ &result, sizeof(result)))
+ rc = -EFAULT;
+ if (hwpt)
+ iommufd_put_object(&hwpt->obj);
+out_put_dev:
+ iommufd_put_object(&sobj->obj);
+ return rc;
+}
+
void iommufd_selftest_destroy(struct iommufd_object *obj)
{
struct selftest_obj *sobj = container_of(obj, struct selftest_obj, obj);
@@ -1415,6 +1523,14 @@ int iommufd_test(struct iommufd_ucmd *ucmd)
cmd->dirty.page_size,
u64_to_user_ptr(cmd->dirty.uptr),
cmd->dirty.flags);
+ case IOMMU_TEST_OP_PASID_ATTACH:
+ return iommufd_test_pasid_attach(ucmd, cmd);
+ case IOMMU_TEST_OP_PASID_REPLACE:
+ return iommufd_test_pasid_replace(ucmd, cmd);
+ case IOMMU_TEST_OP_PASID_DETACH:
+ return iommufd_test_pasid_detach(ucmd, cmd);
+ case IOMMU_TEST_OP_PASID_CHECK_DOMAIN:
+ return iommufd_test_pasid_check_domain(ucmd, cmd);
default:
return -EOPNOTSUPP;
}
--
2.34.1

2023-11-27 06:36:04

by Yi Liu

[permalink] [raw]
Subject: [PATCH 8/8] iommu/vt-d: Add set_dev_pasid callback for nested domain

From: Lu Baolu <[email protected]>

This allows the upper layers to set a nested type domain to a PASID of a
device if the PASID feature is supported by the IOMMU hardware.

The set_dev_pasid callback for non-nest domain has already be there, so
this only needs to add it for nested domains.

Signed-off-by: Lu Baolu <[email protected]>
Signed-off-by: Yi Liu <[email protected]>
---
drivers/iommu/intel/nested.c | 47 ++++++++++++++++++++++++++++++++++++
1 file changed, 47 insertions(+)

diff --git a/drivers/iommu/intel/nested.c b/drivers/iommu/intel/nested.c
index 44ad48db7ea0..f6f687750104 100644
--- a/drivers/iommu/intel/nested.c
+++ b/drivers/iommu/intel/nested.c
@@ -68,6 +68,52 @@ static int intel_nested_attach_dev(struct iommu_domain *domain,
return 0;
}

+static int intel_nested_set_dev_pasid(struct iommu_domain *domain,
+ struct device *dev, ioasid_t pasid)
+{
+ struct device_domain_info *info = dev_iommu_priv_get(dev);
+ struct dmar_domain *dmar_domain = to_dmar_domain(domain);
+ struct intel_iommu *iommu = info->iommu;
+ struct dev_pasid_info *dev_pasid;
+ unsigned long flags;
+ int ret = 0;
+
+ if (!pasid_supported(iommu))
+ return -EOPNOTSUPP;
+
+ if (iommu->agaw < dmar_domain->s2_domain->agaw)
+ return -EINVAL;
+
+ ret = prepare_domain_attach_device(&dmar_domain->s2_domain->domain, dev);
+ if (ret)
+ return ret;
+
+ dev_pasid = kzalloc(sizeof(*dev_pasid), GFP_KERNEL);
+ if (!dev_pasid)
+ return -ENOMEM;
+
+ ret = domain_attach_iommu(dmar_domain, iommu);
+ if (ret)
+ goto err_free;
+
+ ret = intel_pasid_setup_nested(iommu, dev, pasid, dmar_domain);
+ if (ret)
+ goto err_detach_iommu;
+
+ dev_pasid->dev = dev;
+ dev_pasid->pasid = pasid;
+ spin_lock_irqsave(&dmar_domain->lock, flags);
+ list_add(&dev_pasid->link_domain, &dmar_domain->dev_pasids);
+ spin_unlock_irqrestore(&dmar_domain->lock, flags);
+
+ return 0;
+err_detach_iommu:
+ domain_detach_iommu(dmar_domain, iommu);
+err_free:
+ kfree(dev_pasid);
+ return ret;
+}
+
static void intel_nested_domain_free(struct iommu_domain *domain)
{
kfree(to_dmar_domain(domain));
@@ -128,6 +174,7 @@ static int intel_nested_cache_invalidate_user(struct iommu_domain *domain,

static const struct iommu_domain_ops intel_nested_domain_ops = {
.attach_dev = intel_nested_attach_dev,
+ .set_dev_pasid = intel_nested_set_dev_pasid,
.free = intel_nested_domain_free,
.cache_invalidate_user = intel_nested_cache_invalidate_user,
};
--
2.34.1

2023-12-14 02:56:05

by Yang, Weijiang

[permalink] [raw]
Subject: Re: [PATCH 8/8] iommu/vt-d: Add set_dev_pasid callback for nested domain

On 11/27/2023 2:34 PM, Yi Liu wrote:
> From: Lu Baolu <[email protected]>
>
> This allows the upper layers to set a nested type domain to a PASID of a
> device if the PASID feature is supported by the IOMMU hardware.
>
> The set_dev_pasid callback for non-nest domain has already be there, so
> this only needs to add it for nested domains.
>
> Signed-off-by: Lu Baolu <[email protected]>
> Signed-off-by: Yi Liu <[email protected]>
> ---
> drivers/iommu/intel/nested.c | 47 ++++++++++++++++++++++++++++++++++++
> 1 file changed, 47 insertions(+)
>
> diff --git a/drivers/iommu/intel/nested.c b/drivers/iommu/intel/nested.c
> index 44ad48db7ea0..f6f687750104 100644
> --- a/drivers/iommu/intel/nested.c
> +++ b/drivers/iommu/intel/nested.c
> @@ -68,6 +68,52 @@ static int intel_nested_attach_dev(struct iommu_domain *domain,
> return 0;
> }
>
> +static int intel_nested_set_dev_pasid(struct iommu_domain *domain,
> + struct device *dev, ioasid_t pasid)
> +{
> + struct device_domain_info *info = dev_iommu_priv_get(dev);
> + struct dmar_domain *dmar_domain = to_dmar_domain(domain);
> + struct intel_iommu *iommu = info->iommu;
> + struct dev_pasid_info *dev_pasid;
> + unsigned long flags;
> + int ret = 0;
> +
> + if (!pasid_supported(iommu))
> + return -EOPNOTSUPP;
> +
> + if (iommu->agaw < dmar_domain->s2_domain->agaw)
> + return -EINVAL;
> +
> + ret = prepare_domain_attach_device(&dmar_domain->s2_domain->domain, dev);
> + if (ret)
> + return ret;
> +
> + dev_pasid = kzalloc(sizeof(*dev_pasid), GFP_KERNEL);
> + if (!dev_pasid)
> + return -ENOMEM;
> +
> + ret = domain_attach_iommu(dmar_domain, iommu);
> + if (ret)
> + goto err_free;
> +
> + ret = intel_pasid_setup_nested(iommu, dev, pasid, dmar_domain);
> + if (ret)
> + goto err_detach_iommu;
> +
> + dev_pasid->dev = dev;
> + dev_pasid->pasid = pasid;
> + spin_lock_irqsave(&dmar_domain->lock, flags);
> + list_add(&dev_pasid->link_domain, &dmar_domain->dev_pasids);
> + spin_unlock_irqrestore(&dmar_domain->lock, flags);

---> list_add(&dev_pasid->link_domain, &dmar_domain->dev_pasids);

dev_pasid is linked at later time, this leads to domain->has_iotlb_device is not correctly set, which finally results into a missing of device iotlb flush in iommu_flush_dev_iotlb()when it's called.
Check this call path:
domain_attach_iommu()->domain_update_iommu_cap()->domain_update_iotlb()->domain->has_iotlb_device = has_iotlb_device; The ugly fixup is to call domain_update_iommu_cap() or domain_update_iotlb() here again before return.
The similar issue is in intel_iommu_set_dev_pasid() and intel_nested_attach_dev().
> +
> + return 0;
> +err_detach_iommu:
> + domain_detach_iommu(dmar_domain, iommu);
> +err_free:
> + kfree(dev_pasid);
> + return ret;
> +}
> +
> static void intel_nested_domain_free(struct iommu_domain *domain)
> {
> kfree(to_dmar_domain(domain));
> @@ -128,6 +174,7 @@ static int intel_nested_cache_invalidate_user(struct iommu_domain *domain,
>
> static const struct iommu_domain_ops intel_nested_domain_ops = {
> .attach_dev = intel_nested_attach_dev,
> + .set_dev_pasid = intel_nested_set_dev_pasid,
> .free = intel_nested_domain_free,
> .cache_invalidate_user = intel_nested_cache_invalidate_user,
> };

2023-12-14 13:34:06

by Baolu Lu

[permalink] [raw]
Subject: Re: [PATCH 8/8] iommu/vt-d: Add set_dev_pasid callback for nested domain

On 2023/12/14 10:55, Yang, Weijiang wrote:
> On 11/27/2023 2:34 PM, Yi Liu wrote:
>> From: Lu Baolu <[email protected]>
>>
>> This allows the upper layers to set a nested type domain to a PASID of a
>> device if the PASID feature is supported by the IOMMU hardware.
>>
>> The set_dev_pasid callback for non-nest domain has already be there, so
>> this only needs to add it for nested domains.
>>
>> Signed-off-by: Lu Baolu <[email protected]>
>> Signed-off-by: Yi Liu <[email protected]>
>> ---
>>   drivers/iommu/intel/nested.c | 47 ++++++++++++++++++++++++++++++++++++
>>   1 file changed, 47 insertions(+)
>>
>> diff --git a/drivers/iommu/intel/nested.c b/drivers/iommu/intel/nested.c
>> index 44ad48db7ea0..f6f687750104 100644
>> --- a/drivers/iommu/intel/nested.c
>> +++ b/drivers/iommu/intel/nested.c
>> @@ -68,6 +68,52 @@ static int intel_nested_attach_dev(struct
>> iommu_domain *domain,
>>       return 0;
>>   }
>> +static int intel_nested_set_dev_pasid(struct iommu_domain *domain,
>> +                      struct device *dev, ioasid_t pasid)
>> +{
>> +    struct device_domain_info *info = dev_iommu_priv_get(dev);
>> +    struct dmar_domain *dmar_domain = to_dmar_domain(domain);
>> +    struct intel_iommu *iommu = info->iommu;
>> +    struct dev_pasid_info *dev_pasid;
>> +    unsigned long flags;
>> +    int ret = 0;
>> +
>> +    if (!pasid_supported(iommu))
>> +        return -EOPNOTSUPP;
>> +
>> +    if (iommu->agaw < dmar_domain->s2_domain->agaw)
>> +        return -EINVAL;
>> +
>> +    ret =
>> prepare_domain_attach_device(&dmar_domain->s2_domain->domain, dev);
>> +    if (ret)
>> +        return ret;
>> +
>> +    dev_pasid = kzalloc(sizeof(*dev_pasid), GFP_KERNEL);
>> +    if (!dev_pasid)
>> +        return -ENOMEM;
>> +
>> +    ret = domain_attach_iommu(dmar_domain, iommu);
>> +    if (ret)
>> +        goto err_free;
>> +
>> +    ret = intel_pasid_setup_nested(iommu, dev, pasid, dmar_domain);
>> +    if (ret)
>> +        goto err_detach_iommu;
>> +
>> +    dev_pasid->dev = dev;
>> +    dev_pasid->pasid = pasid;
>> +    spin_lock_irqsave(&dmar_domain->lock, flags);
>> +    list_add(&dev_pasid->link_domain, &dmar_domain->dev_pasids);
>> +    spin_unlock_irqrestore(&dmar_domain->lock, flags);
>
> ---> list_add(&dev_pasid->link_domain, &dmar_domain->dev_pasids);
>
> dev_pasid is linked at later time, this leads to
> domain->has_iotlb_device is not correctly set, which finally results
> into a missing of device iotlb flush in iommu_flush_dev_iotlb()when it's
> called.
> Check this call path:
> domain_attach_iommu()->domain_update_iommu_cap()->domain_update_iotlb()->domain->has_iotlb_device = has_iotlb_device; The ugly fixup is to call domain_update_iommu_cap() or domain_update_iotlb() here again before return.
> The similar issue is in intel_iommu_set_dev_pasid() and
> intel_nested_attach_dev().

Yes, domain->has_iotlb_device must be updated whenever a domain is
attached to (or removed from) a RID or PASID. I would be grateful if you
could post some patches to fix the set_device_pasid and
nested_attach_dev paths.

I assume Yi can fix this series in the next version.

Best regards,
baolu

2023-12-15 00:38:25

by Yang, Weijiang

[permalink] [raw]
Subject: Re: [PATCH 8/8] iommu/vt-d: Add set_dev_pasid callback for nested domain

On 12/14/2023 9:33 PM, Baolu Lu wrote:
> On 2023/12/14 10:55, Yang, Weijiang wrote:
>> On 11/27/2023 2:34 PM, Yi Liu wrote:
>>> From: Lu Baolu <[email protected]>
>>>
>>> This allows the upper layers to set a nested type domain to a PASID of a
>>> device if the PASID feature is supported by the IOMMU hardware.
>>>
>>> The set_dev_pasid callback for non-nest domain has already be there, so
>>> this only needs to add it for nested domains.
>>>
>>> Signed-off-by: Lu Baolu <[email protected]>
>>> Signed-off-by: Yi Liu <[email protected]>
>>> ---
>>>   drivers/iommu/intel/nested.c | 47 ++++++++++++++++++++++++++++++++++++
>>>   1 file changed, 47 insertions(+)
>>>
>>> diff --git a/drivers/iommu/intel/nested.c b/drivers/iommu/intel/nested.c
>>> index 44ad48db7ea0..f6f687750104 100644
>>> --- a/drivers/iommu/intel/nested.c
>>> +++ b/drivers/iommu/intel/nested.c
>>> @@ -68,6 +68,52 @@ static int intel_nested_attach_dev(struct iommu_domain *domain,
>>>       return 0;
>>>   }
>>> +static int intel_nested_set_dev_pasid(struct iommu_domain *domain,
>>> +                      struct device *dev, ioasid_t pasid)
>>> +{
>>> +    struct device_domain_info *info = dev_iommu_priv_get(dev);
>>> +    struct dmar_domain *dmar_domain = to_dmar_domain(domain);
>>> +    struct intel_iommu *iommu = info->iommu;
>>> +    struct dev_pasid_info *dev_pasid;
>>> +    unsigned long flags;
>>> +    int ret = 0;
>>> +
>>> +    if (!pasid_supported(iommu))
>>> +        return -EOPNOTSUPP;
>>> +
>>> +    if (iommu->agaw < dmar_domain->s2_domain->agaw)
>>> +        return -EINVAL;
>>> +
>>> +    ret = prepare_domain_attach_device(&dmar_domain->s2_domain->domain, dev);
>>> +    if (ret)
>>> +        return ret;
>>> +
>>> +    dev_pasid = kzalloc(sizeof(*dev_pasid), GFP_KERNEL);
>>> +    if (!dev_pasid)
>>> +        return -ENOMEM;
>>> +
>>> +    ret = domain_attach_iommu(dmar_domain, iommu);
>>> +    if (ret)
>>> +        goto err_free;
>>> +
>>> +    ret = intel_pasid_setup_nested(iommu, dev, pasid, dmar_domain);
>>> +    if (ret)
>>> +        goto err_detach_iommu;
>>> +
>>> +    dev_pasid->dev = dev;
>>> +    dev_pasid->pasid = pasid;
>>> +    spin_lock_irqsave(&dmar_domain->lock, flags);
>>> +    list_add(&dev_pasid->link_domain, &dmar_domain->dev_pasids);
>>> +    spin_unlock_irqrestore(&dmar_domain->lock, flags);
>>
>> ---> list_add(&dev_pasid->link_domain, &dmar_domain->dev_pasids);
>>
>> dev_pasid is linked at later time, this leads to domain->has_iotlb_device is not correctly set, which finally results into a missing of device iotlb flush in iommu_flush_dev_iotlb()when it's called.
>> Check this call path:
>> domain_attach_iommu()->domain_update_iommu_cap()->domain_update_iotlb()->domain->has_iotlb_device = has_iotlb_device; The ugly fixup is to call domain_update_iommu_cap() or domain_update_iotlb() here again before return.
>> The similar issue is in intel_iommu_set_dev_pasid() and intel_nested_attach_dev().
>
> Yes, domain->has_iotlb_device must be updated whenever a domain is
> attached to (or removed from) a RID or PASID. I would be grateful if you
> could post some patches to fix the set_device_pasid and
> nested_attach_dev paths.

Sure, I'll post fixup patches for the paths, thanks for confirmation!

>
> I assume Yi can fix this series in the next version.
>
> Best regards,
> baolu


2024-01-15 17:20:07

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH 1/8] iommu: Introduce a replace API for device pasid

On Sun, Nov 26, 2023 at 10:34:21PM -0800, Yi Liu wrote:
> +int iommu_replace_device_pasid(struct iommu_domain *domain,
> + struct device *dev, ioasid_t pasid)
> +{
> + struct iommu_group *group = dev->iommu_group;
> + struct iommu_domain *old_domain;
> + int ret;
> +
> + if (!domain)
> + return -EINVAL;
> +
> + if (!group)
> + return -ENODEV;
> +
> + mutex_lock(&group->mutex);
> + __iommu_remove_group_pasid(group, pasid);

It is not replace if you do remove first.

Replace must just call set_dev_pasid and nothing much else..

Jason

2024-01-15 17:22:33

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH 8/8] iommu/vt-d: Add set_dev_pasid callback for nested domain

On Sun, Nov 26, 2023 at 10:34:28PM -0800, Yi Liu wrote:

> +static int intel_nested_set_dev_pasid(struct iommu_domain *domain,
> + struct device *dev, ioasid_t pasid)
> +{
> + struct device_domain_info *info = dev_iommu_priv_get(dev);
> + struct dmar_domain *dmar_domain = to_dmar_domain(domain);
> + struct intel_iommu *iommu = info->iommu;
> + struct dev_pasid_info *dev_pasid;
> + unsigned long flags;
> + int ret = 0;
> +
> + if (!pasid_supported(iommu))
> + return -EOPNOTSUPP;
> +
> + if (iommu->agaw < dmar_domain->s2_domain->agaw)
> + return -EINVAL;
> +
> + ret = prepare_domain_attach_device(&dmar_domain->s2_domain->domain, dev);
> + if (ret)
> + return ret;
> +
> + dev_pasid = kzalloc(sizeof(*dev_pasid), GFP_KERNEL);
> + if (!dev_pasid)
> + return -ENOMEM;
> +
> + ret = domain_attach_iommu(dmar_domain, iommu);
> + if (ret)
> + goto err_free;
> +
> + ret = intel_pasid_setup_nested(iommu, dev, pasid, dmar_domain);
> + if (ret)
> + goto err_detach_iommu;
> +
> + dev_pasid->dev = dev;
> + dev_pasid->pasid = pasid;
> + spin_lock_irqsave(&dmar_domain->lock, flags);
> + list_add(&dev_pasid->link_domain, &dmar_domain->dev_pasids);
> + spin_unlock_irqrestore(&dmar_domain->lock, flags);
> +
> + return 0;
> +err_detach_iommu:
> + domain_detach_iommu(dmar_domain, iommu);
> +err_free:
> + kfree(dev_pasid);
> + return ret;
> +}

This seems alot longer than I'd think it should be, why isn't it
exactly the same code as the other set_dev_pasid's?

Jason

2024-01-15 17:24:47

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH 3/8] iommufd: Support attach/replace hwpt per pasid

On Sun, Nov 26, 2023 at 10:34:23PM -0800, Yi Liu wrote:
> @@ -534,7 +537,17 @@ iommufd_device_do_replace(struct iommufd_device *idev,
> static struct iommufd_hw_pagetable *do_attach(struct iommufd_device *idev,
> struct iommufd_hw_pagetable *hwpt, struct attach_data *data)
> {
> - return data->attach_fn(idev, hwpt);
> + if (data->pasid == IOMMU_PASID_INVALID) {
> + BUG_ON((data->attach_fn != iommufd_device_do_attach) &&
> + (data->attach_fn != iommufd_device_do_replace));
> + return data->attach_fn(idev, hwpt);
> + } else {
> + BUG_ON((data->pasid_attach_fn !=
> + iommufd_device_pasid_do_attach) &&
> + (data->pasid_attach_fn !=
> + iommufd_device_pasid_do_replace));
> + return data->pasid_attach_fn(idev, data->pasid, hwpt);
> + }

Seems like the BUG_ON's are pointless

> +/**
> + * iommufd_device_pasid_detach - Disconnect a {device, pasid} to an iommu_domain
> + * @idev: device to detach
> + * @pasid: pasid to detach
> + *
> + * Undo iommufd_device_pasid_attach(). This disconnects the idev/pasid from
> + * the previously attached pt_id.
> + */
> +void iommufd_device_pasid_detach(struct iommufd_device *idev, u32 pasid)
> +{
> + struct iommufd_hw_pagetable *hwpt;
> +
> + hwpt = xa_load(&idev->pasid_hwpts, pasid);
> + if (!hwpt)
> + return;
> + xa_erase(&idev->pasid_hwpts, pasid);
> + iommu_detach_device_pasid(hwpt->domain, idev->dev, pasid);
> + iommufd_hw_pagetable_put(idev->ictx, hwpt);
> +}

None of this xarray stuff looks locked properly

Jason

2024-01-16 01:53:48

by Tian, Kevin

[permalink] [raw]
Subject: RE: [PATCH 3/8] iommufd: Support attach/replace hwpt per pasid

> From: Jason Gunthorpe <[email protected]>
> Sent: Tuesday, January 16, 2024 1:25 AM
>
> On Sun, Nov 26, 2023 at 10:34:23PM -0800, Yi Liu wrote:
> > +/**
> > + * iommufd_device_pasid_detach - Disconnect a {device, pasid} to an
> iommu_domain
> > + * @idev: device to detach
> > + * @pasid: pasid to detach
> > + *
> > + * Undo iommufd_device_pasid_attach(). This disconnects the idev/pasid
> from
> > + * the previously attached pt_id.
> > + */
> > +void iommufd_device_pasid_detach(struct iommufd_device *idev, u32
> pasid)
> > +{
> > + struct iommufd_hw_pagetable *hwpt;
> > +
> > + hwpt = xa_load(&idev->pasid_hwpts, pasid);
> > + if (!hwpt)
> > + return;
> > + xa_erase(&idev->pasid_hwpts, pasid);
> > + iommu_detach_device_pasid(hwpt->domain, idev->dev, pasid);
> > + iommufd_hw_pagetable_put(idev->ictx, hwpt);
> > +}
>
> None of this xarray stuff looks locked properly
>

I had an impression from past discussions that the caller should not
race attach/detach/replace on same device or pasid, otherwise it is
already a problem in a higher level.

and the original intention of the group lock was to ensure all devices
in the group have a same view. Not exactly to guard concurrent
attach/detach.

If this understanding is incorrect we can add a lock for sure. ????

2024-01-16 12:58:28

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH 3/8] iommufd: Support attach/replace hwpt per pasid

On Tue, Jan 16, 2024 at 01:18:12AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <[email protected]>
> > Sent: Tuesday, January 16, 2024 1:25 AM
> >
> > On Sun, Nov 26, 2023 at 10:34:23PM -0800, Yi Liu wrote:
> > > +/**
> > > + * iommufd_device_pasid_detach - Disconnect a {device, pasid} to an
> > iommu_domain
> > > + * @idev: device to detach
> > > + * @pasid: pasid to detach
> > > + *
> > > + * Undo iommufd_device_pasid_attach(). This disconnects the idev/pasid
> > from
> > > + * the previously attached pt_id.
> > > + */
> > > +void iommufd_device_pasid_detach(struct iommufd_device *idev, u32
> > pasid)
> > > +{
> > > + struct iommufd_hw_pagetable *hwpt;
> > > +
> > > + hwpt = xa_load(&idev->pasid_hwpts, pasid);
> > > + if (!hwpt)
> > > + return;
> > > + xa_erase(&idev->pasid_hwpts, pasid);
> > > + iommu_detach_device_pasid(hwpt->domain, idev->dev, pasid);
> > > + iommufd_hw_pagetable_put(idev->ictx, hwpt);
> > > +}
> >
> > None of this xarray stuff looks locked properly
> >
>
> I had an impression from past discussions that the caller should not
> race attach/detach/replace on same device or pasid, otherwise it is
> already a problem in a higher level.

I thought that was just at the iommu layer? We want VFIO to do the
same? Then why do we need the dual xarrays?

Still, it looks really wrong to have code like this, we don't need to
- it can be locked properly, it isn't a performance path..

> and the original intention of the group lock was to ensure all devices
> in the group have a same view. Not exactly to guard concurrent
> attach/detach.

We don't have a group lock here, this is in iommufd.

Use the xarray lock..

eg

hwpt = xa_erase(&idev->pasid_hwpts, pasid);
if (WARN_ON(!hwpt))
return

xa_erase is atomic.

Jason

2024-01-17 04:18:07

by Tian, Kevin

[permalink] [raw]
Subject: RE: [PATCH 3/8] iommufd: Support attach/replace hwpt per pasid

> From: Jason Gunthorpe <[email protected]>
> Sent: Tuesday, January 16, 2024 8:58 PM
>
> On Tue, Jan 16, 2024 at 01:18:12AM +0000, Tian, Kevin wrote:
> > > From: Jason Gunthorpe <[email protected]>
> > > Sent: Tuesday, January 16, 2024 1:25 AM
> > >
> > > On Sun, Nov 26, 2023 at 10:34:23PM -0800, Yi Liu wrote:
> > > > +/**
> > > > + * iommufd_device_pasid_detach - Disconnect a {device, pasid} to an
> > > iommu_domain
> > > > + * @idev: device to detach
> > > > + * @pasid: pasid to detach
> > > > + *
> > > > + * Undo iommufd_device_pasid_attach(). This disconnects the
> idev/pasid
> > > from
> > > > + * the previously attached pt_id.
> > > > + */
> > > > +void iommufd_device_pasid_detach(struct iommufd_device *idev, u32
> > > pasid)
> > > > +{
> > > > + struct iommufd_hw_pagetable *hwpt;
> > > > +
> > > > + hwpt = xa_load(&idev->pasid_hwpts, pasid);
> > > > + if (!hwpt)
> > > > + return;
> > > > + xa_erase(&idev->pasid_hwpts, pasid);
> > > > + iommu_detach_device_pasid(hwpt->domain, idev->dev, pasid);
> > > > + iommufd_hw_pagetable_put(idev->ictx, hwpt);
> > > > +}
> > >
> > > None of this xarray stuff looks locked properly
> > >
> >
> > I had an impression from past discussions that the caller should not
> > race attach/detach/replace on same device or pasid, otherwise it is
> > already a problem in a higher level.
>
> I thought that was just at the iommu layer? We want VFIO to do the
> same? Then why do we need the dual xarrays?
>
> Still, it looks really wrong to have code like this, we don't need to
> - it can be locked properly, it isn't a performance path..

OK, let's add a lock for this.

>
> > and the original intention of the group lock was to ensure all devices
> > in the group have a same view. Not exactly to guard concurrent
> > attach/detach.
>
> We don't have a group lock here, this is in iommufd.

I meant the lock in iommufd_group.

>
> Use the xarray lock..
>
> eg
>
> hwpt = xa_erase(&idev->pasid_hwpts, pasid);
> if (WARN_ON(!hwpt))
> return
>
> xa_erase is atomic.
>

yes, that's better.

2024-01-17 08:21:46

by Yi Liu

[permalink] [raw]
Subject: Re: [PATCH 3/8] iommufd: Support attach/replace hwpt per pasid

On 2024/1/17 12:17, Tian, Kevin wrote:
>> From: Jason Gunthorpe <[email protected]>
>> Sent: Tuesday, January 16, 2024 8:58 PM
>>
>> On Tue, Jan 16, 2024 at 01:18:12AM +0000, Tian, Kevin wrote:
>>>> From: Jason Gunthorpe <[email protected]>
>>>> Sent: Tuesday, January 16, 2024 1:25 AM
>>>>
>>>> On Sun, Nov 26, 2023 at 10:34:23PM -0800, Yi Liu wrote:
>>>>> +/**
>>>>> + * iommufd_device_pasid_detach - Disconnect a {device, pasid} to an
>>>> iommu_domain
>>>>> + * @idev: device to detach
>>>>> + * @pasid: pasid to detach
>>>>> + *
>>>>> + * Undo iommufd_device_pasid_attach(). This disconnects the
>> idev/pasid
>>>> from
>>>>> + * the previously attached pt_id.
>>>>> + */
>>>>> +void iommufd_device_pasid_detach(struct iommufd_device *idev, u32
>>>> pasid)
>>>>> +{
>>>>> + struct iommufd_hw_pagetable *hwpt;
>>>>> +
>>>>> + hwpt = xa_load(&idev->pasid_hwpts, pasid);
>>>>> + if (!hwpt)
>>>>> + return;
>>>>> + xa_erase(&idev->pasid_hwpts, pasid);
>>>>> + iommu_detach_device_pasid(hwpt->domain, idev->dev, pasid);
>>>>> + iommufd_hw_pagetable_put(idev->ictx, hwpt);
>>>>> +}
>>>>
>>>> None of this xarray stuff looks locked properly
>>>>
>>>
>>> I had an impression from past discussions that the caller should not
>>> race attach/detach/replace on same device or pasid, otherwise it is
>>> already a problem in a higher level.
>>
>> I thought that was just at the iommu layer? We want VFIO to do the
>> same? Then why do we need the dual xarrays?
>>
>> Still, it looks really wrong to have code like this, we don't need to
>> - it can be locked properly, it isn't a performance path..
>
> OK, let's add a lock for this.
>
>>
>>> and the original intention of the group lock was to ensure all devices
>>> in the group have a same view. Not exactly to guard concurrent
>>> attach/detach.
>>
>> We don't have a group lock here, this is in iommufd.
>
> I meant the lock in iommufd_group.
>
>>
>> Use the xarray lock..
>>
>> eg
>>
>> hwpt = xa_erase(&idev->pasid_hwpts, pasid);
>> if (WARN_ON(!hwpt))
>> return
>>
>> xa_erase is atomic.
>>
>
> yes, that's better.

Above indeed makes more sense if there can be concurrent attach/replace/detach
on a single pasid. Just have one doubt should we add lock to protect the
whole attach/replace/detach paths. In the attach/replace path[1] [2], the
xarray entry is verified firstly, and then updated after returning from
iommu attach/replace API. It is uneasy to protect the xarray operations only
with xa_lock as a detach path can acquire xa_lock right after attach/replace
path checks the xarray. To avoid it, may need a mutex to protect the whole
attach/replace/detach path to avoid race. Or maybe the attach/replace path
should mark the corresponding entry as a special state that can block the
other path like detach until the attach/replace path update the final hwpt to
the xarray. Is there such state in xarray?

[1] iommufd_device_pasid_attach() -> iommufd_device_pasid_do_attach() ->
__iommufd_device_pasid_do_attach()
[2] iommufd_device_pasid_replace -> iommufd_device_pasid_do_replace ->
__iommufd_device_pasid_do_attach

Regards,
Yi Liu

2024-01-17 12:00:11

by Baolu Lu

[permalink] [raw]
Subject: Re: [PATCH 8/8] iommu/vt-d: Add set_dev_pasid callback for nested domain

On 2024/1/16 1:22, Jason Gunthorpe wrote:
> On Sun, Nov 26, 2023 at 10:34:28PM -0800, Yi Liu wrote:
>
>> +static int intel_nested_set_dev_pasid(struct iommu_domain *domain,
>> + struct device *dev, ioasid_t pasid)
>> +{
>> + struct device_domain_info *info = dev_iommu_priv_get(dev);
>> + struct dmar_domain *dmar_domain = to_dmar_domain(domain);
>> + struct intel_iommu *iommu = info->iommu;
>> + struct dev_pasid_info *dev_pasid;
>> + unsigned long flags;
>> + int ret = 0;
>> +
>> + if (!pasid_supported(iommu))
>> + return -EOPNOTSUPP;
>> +
>> + if (iommu->agaw < dmar_domain->s2_domain->agaw)
>> + return -EINVAL;
>> +
>> + ret = prepare_domain_attach_device(&dmar_domain->s2_domain->domain, dev);
>> + if (ret)
>> + return ret;
>> +
>> + dev_pasid = kzalloc(sizeof(*dev_pasid), GFP_KERNEL);
>> + if (!dev_pasid)
>> + return -ENOMEM;
>> +
>> + ret = domain_attach_iommu(dmar_domain, iommu);
>> + if (ret)
>> + goto err_free;
>> +
>> + ret = intel_pasid_setup_nested(iommu, dev, pasid, dmar_domain);
>> + if (ret)
>> + goto err_detach_iommu;
>> +
>> + dev_pasid->dev = dev;
>> + dev_pasid->pasid = pasid;
>> + spin_lock_irqsave(&dmar_domain->lock, flags);
>> + list_add(&dev_pasid->link_domain, &dmar_domain->dev_pasids);
>> + spin_unlock_irqrestore(&dmar_domain->lock, flags);
>> +
>> + return 0;
>> +err_detach_iommu:
>> + domain_detach_iommu(dmar_domain, iommu);
>> +err_free:
>> + kfree(dev_pasid);
>> + return ret;
>> +}
> This seems alot longer than I'd think it should be, why isn't it
> exactly the same code as the other set_dev_pasid's?

Yes. It should be. The only difference is how to configure the pasid
entry.

Best regards,
baolu

2024-01-17 12:57:58

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH 3/8] iommufd: Support attach/replace hwpt per pasid

On Wed, Jan 17, 2024 at 04:24:24PM +0800, Yi Liu wrote:
> Above indeed makes more sense if there can be concurrent attach/replace/detach
> on a single pasid. Just have one doubt should we add lock to protect the
> whole attach/replace/detach paths. In the attach/replace path[1] [2], the
> xarray entry is verified firstly, and then updated after returning from
> iommu attach/replace API. It is uneasy to protect the xarray operations only
> with xa_lock as a detach path can acquire xa_lock right after attach/replace
> path checks the xarray. To avoid it, may need a mutex to protect the whole
> attach/replace/detach path to avoid race. Or maybe the attach/replace path
> should mark the corresponding entry as a special state that can block the
> other path like detach until the attach/replace path update the final hwpt to
> the xarray. Is there such state in xarray?

If the caller is not allowed to make concurrent attaches/detaches to
the same pasid then you can document that in a comment, but it is
still better to use xarray in a self-consistent way.

Jason

2024-01-18 09:25:45

by Yi Liu

[permalink] [raw]
Subject: Re: [PATCH 3/8] iommufd: Support attach/replace hwpt per pasid

On 2024/1/17 20:56, Jason Gunthorpe wrote:
> On Wed, Jan 17, 2024 at 04:24:24PM +0800, Yi Liu wrote:
>> Above indeed makes more sense if there can be concurrent attach/replace/detach
>> on a single pasid. Just have one doubt should we add lock to protect the
>> whole attach/replace/detach paths. In the attach/replace path[1] [2], the
>> xarray entry is verified firstly, and then updated after returning from
>> iommu attach/replace API. It is uneasy to protect the xarray operations only
>> with xa_lock as a detach path can acquire xa_lock right after attach/replace
>> path checks the xarray. To avoid it, may need a mutex to protect the whole
>> attach/replace/detach path to avoid race. Or maybe the attach/replace path
>> should mark the corresponding entry as a special state that can block the
>> other path like detach until the attach/replace path update the final hwpt to
>> the xarray. Is there such state in xarray?
>
> If the caller is not allowed to make concurrent attaches/detaches to
> the same pasid then you can document that in a comment,

yes. I can document it. Otherwise, we may need a mutex for pasid to allow
concurrent attaches/detaches.

> but it is
> still better to use xarray in a self-consistent way.

sure. I'll try. At least in the detach path, xarray should be what you've
suggested in prior email. Currently in the attach path, the logic is as
below. Perhaps I can skip the check on old_hwpt since
iommu_attach_device_pasid() should fail if there is an existing domain
attached on the pasid. Then the xarray should be more consistent. what
about your opinion?

old_hwpt = xa_load(&idev->pasid_hwpts, pasid);
if (old_hwpt) {
/* Attach does not allow overwrite */
if (old_hwpt == hwpt)
return NULL;
else
return ERR_PTR(-EINVAL);
}

rc = iommu_attach_device_pasid(hwpt->domain, idev->dev, pasid);
if (rc)
return ERR_PTR(rc);

refcount_inc(&hwpt->obj.users);
xa_store(&idev->pasid_hwpts, pasid, hwpt, GFP_KERNEL);

--
Regards,
Yi Liu

2024-01-18 13:39:19

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH 3/8] iommufd: Support attach/replace hwpt per pasid

On Thu, Jan 18, 2024 at 05:28:01PM +0800, Yi Liu wrote:
> On 2024/1/17 20:56, Jason Gunthorpe wrote:
> > On Wed, Jan 17, 2024 at 04:24:24PM +0800, Yi Liu wrote:
> > > Above indeed makes more sense if there can be concurrent attach/replace/detach
> > > on a single pasid. Just have one doubt should we add lock to protect the
> > > whole attach/replace/detach paths. In the attach/replace path[1] [2], the
> > > xarray entry is verified firstly, and then updated after returning from
> > > iommu attach/replace API. It is uneasy to protect the xarray operations only
> > > with xa_lock as a detach path can acquire xa_lock right after attach/replace
> > > path checks the xarray. To avoid it, may need a mutex to protect the whole
> > > attach/replace/detach path to avoid race. Or maybe the attach/replace path
> > > should mark the corresponding entry as a special state that can block the
> > > other path like detach until the attach/replace path update the final hwpt to
> > > the xarray. Is there such state in xarray?
> >
> > If the caller is not allowed to make concurrent attaches/detaches to
> > the same pasid then you can document that in a comment,
>
> yes. I can document it. Otherwise, we may need a mutex for pasid to allow
> concurrent attaches/detaches.
>
> > but it is
> > still better to use xarray in a self-consistent way.
>
> sure. I'll try. At least in the detach path, xarray should be what you've
> suggested in prior email. Currently in the attach path, the logic is as
> below. Perhaps I can skip the check on old_hwpt since
> iommu_attach_device_pasid() should fail if there is an existing domain
> attached on the pasid. Then the xarray should be more consistent. what
> about your opinion?
>
> old_hwpt = xa_load(&idev->pasid_hwpts, pasid);
> if (old_hwpt) {
> /* Attach does not allow overwrite */
> if (old_hwpt == hwpt)
> return NULL;
> else
> return ERR_PTR(-EINVAL);
> }
>
> rc = iommu_attach_device_pasid(hwpt->domain, idev->dev, pasid);
> if (rc)
> return ERR_PTR(rc);
>
> refcount_inc(&hwpt->obj.users);
> xa_store(&idev->pasid_hwpts, pasid, hwpt, GFP_KERNEL);

Use xa_cmpxchg()

Jason

2024-01-19 10:12:50

by Yi Liu

[permalink] [raw]
Subject: Re: [PATCH 3/8] iommufd: Support attach/replace hwpt per pasid

On 2024/1/18 21:38, Jason Gunthorpe wrote:
> On Thu, Jan 18, 2024 at 05:28:01PM +0800, Yi Liu wrote:
>> On 2024/1/17 20:56, Jason Gunthorpe wrote:
>>> On Wed, Jan 17, 2024 at 04:24:24PM +0800, Yi Liu wrote:
>>>> Above indeed makes more sense if there can be concurrent attach/replace/detach
>>>> on a single pasid. Just have one doubt should we add lock to protect the
>>>> whole attach/replace/detach paths. In the attach/replace path[1] [2], the
>>>> xarray entry is verified firstly, and then updated after returning from
>>>> iommu attach/replace API. It is uneasy to protect the xarray operations only
>>>> with xa_lock as a detach path can acquire xa_lock right after attach/replace
>>>> path checks the xarray. To avoid it, may need a mutex to protect the whole
>>>> attach/replace/detach path to avoid race. Or maybe the attach/replace path
>>>> should mark the corresponding entry as a special state that can block the
>>>> other path like detach until the attach/replace path update the final hwpt to
>>>> the xarray. Is there such state in xarray?
>>>
>>> If the caller is not allowed to make concurrent attaches/detaches to
>>> the same pasid then you can document that in a comment,
>>
>> yes. I can document it. Otherwise, we may need a mutex for pasid to allow
>> concurrent attaches/detaches.
>>
>>> but it is
>>> still better to use xarray in a self-consistent way.
>>
>> sure. I'll try. At least in the detach path, xarray should be what you've
>> suggested in prior email. Currently in the attach path, the logic is as
>> below. Perhaps I can skip the check on old_hwpt since
>> iommu_attach_device_pasid() should fail if there is an existing domain
>> attached on the pasid. Then the xarray should be more consistent. what
>> about your opinion?
>>
>> old_hwpt = xa_load(&idev->pasid_hwpts, pasid);
>> if (old_hwpt) {
>> /* Attach does not allow overwrite */
>> if (old_hwpt == hwpt)
>> return NULL;
>> else
>> return ERR_PTR(-EINVAL);
>> }
>>
>> rc = iommu_attach_device_pasid(hwpt->domain, idev->dev, pasid);
>> if (rc)
>> return ERR_PTR(rc);
>>
>> refcount_inc(&hwpt->obj.users);
>> xa_store(&idev->pasid_hwpts, pasid, hwpt, GFP_KERNEL);
>
> Use xa_cmpxchg()

sure.

--
Regards,
Yi Liu

2024-03-10 13:03:00

by Yi Liu

[permalink] [raw]
Subject: Re: [PATCH 1/8] iommu: Introduce a replace API for device pasid

On 2024/1/16 01:19, Jason Gunthorpe wrote:
> On Sun, Nov 26, 2023 at 10:34:21PM -0800, Yi Liu wrote:
>> +int iommu_replace_device_pasid(struct iommu_domain *domain,
>> + struct device *dev, ioasid_t pasid)
>> +{
>> + struct iommu_group *group = dev->iommu_group;
>> + struct iommu_domain *old_domain;
>> + int ret;
>> +
>> + if (!domain)
>> + return -EINVAL;
>> +
>> + if (!group)
>> + return -ENODEV;
>> +
>> + mutex_lock(&group->mutex);
>> + __iommu_remove_group_pasid(group, pasid);
>
> It is not replace if you do remove first.
>
> Replace must just call set_dev_pasid and nothing much else..

Seems uneasy to do it so far. The VT-d driver needs to get the old domain
first in order to do replacement. However, VT-d driver does not track the
attached domains of pasids. It gets domain of a pasid
by iommu_get_domain_for_dev_pasid(). Like intel_iommu_remove_dev_pasid)
in link [1]. While the iommu layer exchanges the domain in the
group->pasid_array before calling into VT-d driver. So when calling into
VT-d driver, the domain got by iommu_get_domain_for_dev_pasid() is already
the new domain. To solve it, we may need to let VT-d driver have its
own tracking on the domains. How about your thoughts? @Jason, @Kevin, @Baoplu?

[1]
https://github.com/torvalds/linux/blob/master/drivers/iommu/intel/iommu.c#L4621C19-L4621C20

--
Regards,
Yi Liu

2024-03-11 09:26:36

by Tian, Kevin

[permalink] [raw]
Subject: RE: [PATCH 1/8] iommu: Introduce a replace API for device pasid

> From: Liu, Yi L <[email protected]>
> Sent: Sunday, March 10, 2024 9:06 PM
>
> On 2024/1/16 01:19, Jason Gunthorpe wrote:
> > On Sun, Nov 26, 2023 at 10:34:21PM -0800, Yi Liu wrote:
> >> +int iommu_replace_device_pasid(struct iommu_domain *domain,
> >> + struct device *dev, ioasid_t pasid)
> >> +{
> >> + struct iommu_group *group = dev->iommu_group;
> >> + struct iommu_domain *old_domain;
> >> + int ret;
> >> +
> >> + if (!domain)
> >> + return -EINVAL;
> >> +
> >> + if (!group)
> >> + return -ENODEV;
> >> +
> >> + mutex_lock(&group->mutex);
> >> + __iommu_remove_group_pasid(group, pasid);
> >
> > It is not replace if you do remove first.
> >
> > Replace must just call set_dev_pasid and nothing much else..
>
> Seems uneasy to do it so far. The VT-d driver needs to get the old domain
> first in order to do replacement. However, VT-d driver does not track the
> attached domains of pasids. It gets domain of a pasid
> by iommu_get_domain_for_dev_pasid(). Like
> intel_iommu_remove_dev_pasid)
> in link [1]. While the iommu layer exchanges the domain in the
> group->pasid_array before calling into VT-d driver. So when calling into
> VT-d driver, the domain got by iommu_get_domain_for_dev_pasid() is
> already
> the new domain. To solve it, we may need to let VT-d driver have its
> own tracking on the domains. How about your thoughts? @Jason, @Kevin,
> @Baoplu?
>
> [1]
> https://github.com/torvalds/linux/blob/master/drivers/iommu/intel/iommu
> .c#L4621C19-L4621C20
>

Jason's point was that the core helper should directly call set_dev_pasid
and underlying iommu driver decides whether atomic replacement
can be implemented inside the callback. If you first remove in the core
then one can never implement a replace semantics.

2024-03-12 03:04:51

by Yi Liu

[permalink] [raw]
Subject: Re: [PATCH 1/8] iommu: Introduce a replace API for device pasid

On 2024/3/11 17:26, Tian, Kevin wrote:
>> From: Liu, Yi L <[email protected]>
>> Sent: Sunday, March 10, 2024 9:06 PM
>>
>> On 2024/1/16 01:19, Jason Gunthorpe wrote:
>>> On Sun, Nov 26, 2023 at 10:34:21PM -0800, Yi Liu wrote:
>>>> +int iommu_replace_device_pasid(struct iommu_domain *domain,
>>>> + struct device *dev, ioasid_t pasid)
>>>> +{
>>>> + struct iommu_group *group = dev->iommu_group;
>>>> + struct iommu_domain *old_domain;
>>>> + int ret;
>>>> +
>>>> + if (!domain)
>>>> + return -EINVAL;
>>>> +
>>>> + if (!group)
>>>> + return -ENODEV;
>>>> +
>>>> + mutex_lock(&group->mutex);
>>>> + __iommu_remove_group_pasid(group, pasid);
>>>
>>> It is not replace if you do remove first.
>>>
>>> Replace must just call set_dev_pasid and nothing much else..
>>
>> Seems uneasy to do it so far. The VT-d driver needs to get the old domain
>> first in order to do replacement. However, VT-d driver does not track the
>> attached domains of pasids. It gets domain of a pasid
>> by iommu_get_domain_for_dev_pasid(). Like
>> intel_iommu_remove_dev_pasid)
>> in link [1]. While the iommu layer exchanges the domain in the
>> group->pasid_array before calling into VT-d driver. So when calling into
>> VT-d driver, the domain got by iommu_get_domain_for_dev_pasid() is
>> already
>> the new domain. To solve it, we may need to let VT-d driver have its
>> own tracking on the domains. How about your thoughts? @Jason, @Kevin,
>> @Baoplu?
>>
>> [1]
>> https://github.com/torvalds/linux/blob/master/drivers/iommu/intel/iommu
>> .c#L4621C19-L4621C20
>>
>
> Jason's point was that the core helper should directly call set_dev_pasid
> and underlying iommu driver decides whether atomic replacement
> can be implemented inside the callback. If you first remove in the core
> then one can never implement a replace semantics.

yeah, I got Jason's point. I'm raising an open to make the set_dev_pasid
callback to handle domain replacement. The existing intel iommu driver
does not track attached domains for pasid. But it's needed to know the
attached domain of a pasid and find the dev_pasid under the domain, hence
be able to clean up the old attachment and do the new attachment. Existing
code cannot work as I mentioned above. The group->pasid_xarray is updated
before calling set_dev_pasid callback. This means the underlying iommu
driver cannot get the old domain in the set_dev_pasid callback by the
iommu_get_domain_for_dev_pasid() helper.

As above, I'm considering the possibility to track attached domains for
pasid by an xarray in the struct device_domain_info. Hence, intel iommu
driver could store/retrieve attached domain for pasids. If it's
replacement, the set_dev_pasid callback could find the attached domain and
the dev_pasid instance in the domain. Then be able to do detach and clean
up the tracking structures (e.g. dev_pasid).

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index e58263dcfadd..03a847a406e1 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -4398,6 +4398,7 @@ static struct iommu_device
*intel_iommu_probe_device(struct device *dev)

info->dev = dev;
info->iommu = iommu;
+ xa_init(&info->pasid_array);
if (dev_is_pci(dev)) {
if (ecap_dev_iotlb_support(iommu->ecap) &&
pci_ats_supported(pdev) &&
@@ -4464,6 +4465,7 @@ static void intel_iommu_release_device(struct device
*dev)
mutex_unlock(&iommu->iopf_lock);
dmar_remove_one_dev_info(dev);
intel_pasid_free_table(dev);
+ WARN_ON(!xa_empty(&info->pasid_array));
intel_iommu_debugfs_remove_dev(info);
kfree(info);
set_dma_ops(dev, NULL);
@@ -4707,7 +4709,13 @@ static int intel_iommu_iotlb_sync_map(struct
iommu_domain *domain,
return 0;
}

-static void intel_iommu_remove_dev_pasid(struct device *dev, ioasid_t pasid)
+/*
+ * Clear the pasid table entries so that all DMA requests with specific
+ * PASID from the device are blocked. If the page table has been set, clean
+ * up the data structures.
+ */
+static void device_pasid_block_translation(struct device *dev, ioasid_t pasid,
+ bool remove)
{
struct device_domain_info *info = dev_iommu_priv_get(dev);
struct dev_pasid_info *curr, *dev_pasid = NULL;
@@ -4716,9 +4724,11 @@ static void intel_iommu_remove_dev_pasid(struct
device *dev, ioasid_t pasid)
struct iommu_domain *domain;
unsigned long flags;

- domain = iommu_get_domain_for_dev_pasid(dev, pasid, 0);
- if (WARN_ON_ONCE(!domain))
+ domain = xa_erase(&info->pasid_array, pasid);
+ if (!domain) {
+ WARN_ON_ONCE(remove);
return;
+ }

/*
* The SVA implementation needs to handle its own stuffs like the mm
@@ -4753,6 +4763,11 @@ static void intel_iommu_remove_dev_pasid(struct
device *dev, ioasid_t pasid)
intel_drain_pasid_prq(dev, pasid);
}

+static void intel_iommu_remove_dev_pasid(struct device *dev, ioasid_t pasid)
+{
+ device_pasid_block_translation(dev, pasid, true);
+}
+
static int intel_iommu_set_dev_pasid(struct iommu_domain *domain,
struct device *dev, ioasid_t pasid)
{
@@ -4772,6 +4787,9 @@ static int intel_iommu_set_dev_pasid(struct
iommu_domain *domain,
if (context_copied(iommu, info->bus, info->devfn))
return -EBUSY;

+ /* Try to block translation if it already has */
+ device_pasid_block_translation(dev, pasid, false);
+
ret = prepare_domain_attach_device(domain, dev);
if (ret)
return ret;
@@ -4801,6 +4819,8 @@ static int intel_iommu_set_dev_pasid(struct
iommu_domain *domain,
list_add(&dev_pasid->link_domain, &dmar_domain->dev_pasids);
spin_unlock_irqrestore(&dmar_domain->lock, flags);

+ xa_store(&info->pasid_array, pasid, dmar_domain, GFP_KERNEL);
+
if (domain->type & __IOMMU_DOMAIN_PAGING)
intel_iommu_debugfs_create_dev_pasid(dev_pasid);

diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
index 94e2ecc3c73f..a466555f61fe 100644
--- a/drivers/iommu/intel/iommu.h
+++ b/drivers/iommu/intel/iommu.h
@@ -767,6 +767,7 @@ struct device_domain_info {
#ifdef CONFIG_INTEL_IOMMU_DEBUGFS
struct dentry *debugfs_dentry; /* pointer to device directory dentry */
#endif
+ struct xarray pasid_array; /* Attached domains per PASID */
};

struct dev_pasid_info {

--
Regards,
Yi Liu

2024-03-13 08:01:52

by Baolu Lu

[permalink] [raw]
Subject: Re: [PATCH 1/8] iommu: Introduce a replace API for device pasid

On 2024/3/12 11:07, Yi Liu wrote:
> On 2024/3/11 17:26, Tian, Kevin wrote:
>>> From: Liu, Yi L <[email protected]>
>>> Sent: Sunday, March 10, 2024 9:06 PM
>>>
>>> On 2024/1/16 01:19, Jason Gunthorpe wrote:
>>>> On Sun, Nov 26, 2023 at 10:34:21PM -0800, Yi Liu wrote:
>>>>> +int iommu_replace_device_pasid(struct iommu_domain *domain,
>>>>> +                   struct device *dev, ioasid_t pasid)
>>>>> +{
>>>>> +    struct iommu_group *group = dev->iommu_group;
>>>>> +    struct iommu_domain *old_domain;
>>>>> +    int ret;
>>>>> +
>>>>> +    if (!domain)
>>>>> +        return -EINVAL;
>>>>> +
>>>>> +    if (!group)
>>>>> +        return -ENODEV;
>>>>> +
>>>>> +    mutex_lock(&group->mutex);
>>>>> +    __iommu_remove_group_pasid(group, pasid);
>>>>
>>>> It is not replace if you do remove first.
>>>>
>>>> Replace must just call set_dev_pasid and nothing much else..
>>>
>>> Seems uneasy to do it so far. The VT-d driver needs to get the old
>>> domain
>>> first in order to do replacement. However, VT-d driver does not track
>>> the
>>> attached domains of pasids. It gets domain of a pasid
>>> by iommu_get_domain_for_dev_pasid(). Like
>>> intel_iommu_remove_dev_pasid)
>>> in link [1]. While the iommu layer exchanges the domain in the
>>> group->pasid_array before calling into VT-d driver. So when calling into
>>> VT-d driver, the domain got by iommu_get_domain_for_dev_pasid() is
>>> already
>>> the new domain. To solve it, we may need to let VT-d driver have its
>>> own tracking on the domains. How about your thoughts? @Jason, @Kevin,
>>> @Baoplu?
>>>
>>> [1]
>>> https://github.com/torvalds/linux/blob/master/drivers/iommu/intel/iommu
>>> .c#L4621C19-L4621C20
>>>
>>
>> Jason's point was that the core helper should directly call set_dev_pasid
>> and underlying iommu driver decides whether atomic replacement
>> can be implemented inside the callback. If you first remove in the core
>> then one can never implement a replace semantics.
>
> yeah, I got Jason's point. I'm raising an open to make the set_dev_pasid
> callback to handle domain replacement. The existing intel iommu driver
> does not track attached domains for pasid. But it's needed to know the
> attached domain of a pasid and find the dev_pasid under the domain, hence
> be able to clean up the old attachment and do the new attachment. Existing
> code cannot work as I mentioned above. The group->pasid_xarray is updated
> before calling set_dev_pasid callback. This means the underlying iommu
> driver cannot get the old domain in the set_dev_pasid callback by the
> iommu_get_domain_for_dev_pasid() helper.
>
> As above, I'm considering the possibility to track attached domains for
> pasid by an xarray in the struct device_domain_info. Hence, intel iommu
> driver could store/retrieve attached domain for pasids. If it's
> replacement, the set_dev_pasid callback could find the attached domain and
> the dev_pasid instance in the domain. Then be able to do detach and clean
> up the tracking structures (e.g. dev_pasid).

Maintaining the same data structure in both core and individual iommu
drivers seems to be duplicate effort. It appears that what you want here
is just to get the currently used domain in the set_dev_pasid path. Is
it possible to adjust the core code so that the pasid array is updated
after the driver's set_dev_pasid callback returns success?

Best regards,
baolu

2024-03-13 08:08:54

by Yi Liu

[permalink] [raw]
Subject: Re: [PATCH 1/8] iommu: Introduce a replace API for device pasid

On 2024/3/13 11:13, Baolu Lu wrote:
> On 2024/3/12 11:07, Yi Liu wrote:
>> On 2024/3/11 17:26, Tian, Kevin wrote:
>>>> From: Liu, Yi L <[email protected]>
>>>> Sent: Sunday, March 10, 2024 9:06 PM
>>>>
>>>> On 2024/1/16 01:19, Jason Gunthorpe wrote:
>>>>> On Sun, Nov 26, 2023 at 10:34:21PM -0800, Yi Liu wrote:
>>>>>> +int iommu_replace_device_pasid(struct iommu_domain *domain,
>>>>>> +                   struct device *dev, ioasid_t pasid)
>>>>>> +{
>>>>>> +    struct iommu_group *group = dev->iommu_group;
>>>>>> +    struct iommu_domain *old_domain;
>>>>>> +    int ret;
>>>>>> +
>>>>>> +    if (!domain)
>>>>>> +        return -EINVAL;
>>>>>> +
>>>>>> +    if (!group)
>>>>>> +        return -ENODEV;
>>>>>> +
>>>>>> +    mutex_lock(&group->mutex);
>>>>>> +    __iommu_remove_group_pasid(group, pasid);
>>>>>
>>>>> It is not replace if you do remove first.
>>>>>
>>>>> Replace must just call set_dev_pasid and nothing much else..
>>>>
>>>> Seems uneasy to do it so far. The VT-d driver needs to get the old domain
>>>> first in order to do replacement. However, VT-d driver does not track the
>>>> attached domains of pasids. It gets domain of a pasid
>>>> by iommu_get_domain_for_dev_pasid(). Like
>>>> intel_iommu_remove_dev_pasid)
>>>> in link [1]. While the iommu layer exchanges the domain in the
>>>> group->pasid_array before calling into VT-d driver. So when calling into
>>>> VT-d driver, the domain got by iommu_get_domain_for_dev_pasid() is
>>>> already
>>>> the new domain. To solve it, we may need to let VT-d driver have its
>>>> own tracking on the domains. How about your thoughts? @Jason, @Kevin,
>>>> @Baoplu?
>>>>
>>>> [1]
>>>> https://github.com/torvalds/linux/blob/master/drivers/iommu/intel/iommu
>>>> .c#L4621C19-L4621C20
>>>>
>>>
>>> Jason's point was that the core helper should directly call set_dev_pasid
>>> and underlying iommu driver decides whether atomic replacement
>>> can be implemented inside the callback. If you first remove in the core
>>> then one can never implement a replace semantics.
>>
>> yeah, I got Jason's point. I'm raising an open to make the set_dev_pasid
>> callback to handle domain replacement. The existing intel iommu driver
>> does not track attached domains for pasid. But it's needed to know the
>> attached domain of a pasid and find the dev_pasid under the domain, hence
>> be able to clean up the old attachment and do the new attachment. Existing
>> code cannot work as I mentioned above. The group->pasid_xarray is updated
>> before calling set_dev_pasid callback. This means the underlying iommu
>> driver cannot get the old domain in the set_dev_pasid callback by the
>> iommu_get_domain_for_dev_pasid() helper.
>>
>> As above, I'm considering the possibility to track attached domains for
>> pasid by an xarray in the struct device_domain_info. Hence, intel iommu
>> driver could store/retrieve attached domain for pasids. If it's
>> replacement, the set_dev_pasid callback could find the attached domain and
>> the dev_pasid instance in the domain. Then be able to do detach and clean
>> up the tracking structures (e.g. dev_pasid).
>
> Maintaining the same data structure in both core and individual iommu
> drivers seems to be duplicate effort. It appears that what you want here
> is just to get the currently used domain in the set_dev_pasid path. Is
> it possible to adjust the core code so that the pasid array is updated
> after the driver's set_dev_pasid callback returns success?

yes. how about your opinion? @Jason. I noticed the set_dev_pasid callback
and pasid_array update is under the group->lock, so update it should be
fine to adjust the order to update pasid_array after set_dev_pasid returns.

--
Regards,
Yi Liu

2024-03-18 16:53:08

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH 1/8] iommu: Introduce a replace API for device pasid

On Wed, Mar 13, 2024 at 04:11:41PM +0800, Yi Liu wrote:

> yes. how about your opinion? @Jason. I noticed the set_dev_pasid callback
> and pasid_array update is under the group->lock, so update it should be
> fine to adjust the order to update pasid_array after set_dev_pasid returns.

Yes, it makes some sense

But, also I would like it very much if we just have the core pass in
the actual old domain as a an addition function argument.

I think we have some small mistakes in multi-device group error
unwinding for remove because the global xarray can't isn't actually
going to be correct in all scenarios.

Jason


2024-03-19 07:26:36

by Yi Liu

[permalink] [raw]
Subject: Re: [PATCH 1/8] iommu: Introduce a replace API for device pasid

On 2024/3/19 00:52, Jason Gunthorpe wrote:
> On Wed, Mar 13, 2024 at 04:11:41PM +0800, Yi Liu wrote:
>
>> yes. how about your opinion? @Jason. I noticed the set_dev_pasid callback
>> and pasid_array update is under the group->lock, so update it should be
>> fine to adjust the order to update pasid_array after set_dev_pasid returns.
>
> Yes, it makes some sense
>
> But, also I would like it very much if we just have the core pass in
> the actual old domain as a an addition function argument.

ok, this works too. For normal attach, just pass in a NULL old domain.

> I think we have some small mistakes in multi-device group error
> unwinding for remove because the global xarray can't isn't actually
> going to be correct in all scenarios.

do you mean the __iommu_remove_group_pasid() call in the below function?
Currently, it is called when __iommu_set_group_pasid() failed. However,
__iommu_set_group_pasid() may need to do remove itself when error happens,
so the helper can be more self-contained. Or you mean something else?

int iommu_attach_device_pasid(struct iommu_domain *domain,
struct device *dev, ioasid_t pasid)
{
/* Caller must be a probed driver on dev */
struct iommu_group *group = dev->iommu_group;
void *curr;
int ret;

if (!domain->ops->set_dev_pasid)
return -EOPNOTSUPP;

if (!group)
return -ENODEV;

if (!dev_has_iommu(dev) || dev_iommu_ops(dev) != domain->owner)
return -EINVAL;

mutex_lock(&group->mutex);
curr = xa_cmpxchg(&group->pasid_array, pasid, NULL, domain, GFP_KERNEL);
if (curr) {
ret = xa_err(curr) ? : -EBUSY;
goto out_unlock;
}

ret = __iommu_set_group_pasid(domain, group, pasid);
if (ret) {
__iommu_remove_group_pasid(group, pasid);
xa_erase(&group->pasid_array, pasid);
}
out_unlock:
mutex_unlock(&group->mutex);
return ret;
}
EXPORT_SYMBOL_GPL(iommu_attach_device_pasid);

https://github.com/torvalds/linux/blob/b3603fcb79b1036acae10602bffc4855a4b9af80/drivers/iommu/iommu.c#L3376

--
Regards,
Yi Liu

2024-03-20 12:38:28

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH 1/8] iommu: Introduce a replace API for device pasid

On Tue, Mar 19, 2024 at 03:29:39PM +0800, Yi Liu wrote:
> On 2024/3/19 00:52, Jason Gunthorpe wrote:
> > On Wed, Mar 13, 2024 at 04:11:41PM +0800, Yi Liu wrote:
> >
> > > yes. how about your opinion? @Jason. I noticed the set_dev_pasid callback
> > > and pasid_array update is under the group->lock, so update it should be
> > > fine to adjust the order to update pasid_array after set_dev_pasid returns.
> >
> > Yes, it makes some sense
> >
> > But, also I would like it very much if we just have the core pass in
> > the actual old domain as a an addition function argument.
>
> ok, this works too. For normal attach, just pass in a NULL old domain.
>
> > I think we have some small mistakes in multi-device group error
> > unwinding for remove because the global xarray can't isn't actually
> > going to be correct in all scenarios.
>
> do you mean the __iommu_remove_group_pasid() call in the below function?
> Currently, it is called when __iommu_set_group_pasid() failed. However,
> __iommu_set_group_pasid() may need to do remove itself when error happens,
> so the helper can be more self-contained. Or you mean something else?

Yes..

> int iommu_attach_device_pasid(struct iommu_domain *domain,
> struct device *dev, ioasid_t pasid)
> {
> /* Caller must be a probed driver on dev */
> struct iommu_group *group = dev->iommu_group;
> void *curr;
> int ret;
>
> if (!domain->ops->set_dev_pasid)
> return -EOPNOTSUPP;
>
> if (!group)
> return -ENODEV;
>
> if (!dev_has_iommu(dev) || dev_iommu_ops(dev) != domain->owner)
> return -EINVAL;
>
> mutex_lock(&group->mutex);
> curr = xa_cmpxchg(&group->pasid_array, pasid, NULL, domain, GFP_KERNEL);
> if (curr) {
> ret = xa_err(curr) ? : -EBUSY;
> goto out_unlock;
> }
>
> ret = __iommu_set_group_pasid(domain, group, pasid);

So here we have the xa set to the new domain

> if (ret) {
> __iommu_remove_group_pasid(group, pasid);

And here we still have it set to the new domain even though some of
the devices within the group failed to attach. The logic needs to be
more like the main domain attach path where iterate and then undo only
what we did

And the whole thing is easier to reason about if an input argument
specifies the current attached domain instead of having the driver
read it from the xarray.

Jason

2024-03-21 06:13:38

by Yi Liu

[permalink] [raw]
Subject: Re: [PATCH 1/8] iommu: Introduce a replace API for device pasid

On 2024/3/20 20:38, Jason Gunthorpe wrote:
> On Tue, Mar 19, 2024 at 03:29:39PM +0800, Yi Liu wrote:
>> On 2024/3/19 00:52, Jason Gunthorpe wrote:
>>> On Wed, Mar 13, 2024 at 04:11:41PM +0800, Yi Liu wrote:
>>>
>>>> yes. how about your opinion? @Jason. I noticed the set_dev_pasid callback
>>>> and pasid_array update is under the group->lock, so update it should be
>>>> fine to adjust the order to update pasid_array after set_dev_pasid returns.
>>>
>>> Yes, it makes some sense
>>>
>>> But, also I would like it very much if we just have the core pass in
>>> the actual old domain as a an addition function argument.
>>
>> ok, this works too. For normal attach, just pass in a NULL old domain.
>>
>>> I think we have some small mistakes in multi-device group error
>>> unwinding for remove because the global xarray can't isn't actually
>>> going to be correct in all scenarios.
>>
>> do you mean the __iommu_remove_group_pasid() call in the below function?
>> Currently, it is called when __iommu_set_group_pasid() failed. However,
>> __iommu_set_group_pasid() may need to do remove itself when error happens,
>> so the helper can be more self-contained. Or you mean something else?
>
> Yes..
>
>> int iommu_attach_device_pasid(struct iommu_domain *domain,
>> struct device *dev, ioasid_t pasid)
>> {
>> /* Caller must be a probed driver on dev */
>> struct iommu_group *group = dev->iommu_group;
>> void *curr;
>> int ret;
>>
>> if (!domain->ops->set_dev_pasid)
>> return -EOPNOTSUPP;
>>
>> if (!group)
>> return -ENODEV;
>>
>> if (!dev_has_iommu(dev) || dev_iommu_ops(dev) != domain->owner)
>> return -EINVAL;
>>
>> mutex_lock(&group->mutex);
>> curr = xa_cmpxchg(&group->pasid_array, pasid, NULL, domain, GFP_KERNEL);
>> if (curr) {
>> ret = xa_err(curr) ? : -EBUSY;
>> goto out_unlock;
>> }
>>
>> ret = __iommu_set_group_pasid(domain, group, pasid);
>
> So here we have the xa set to the new domain

aha, yes. The underlying driver won't be able to get the correct domain
in the remove_dev_pasid callback.

>> if (ret) {
>> __iommu_remove_group_pasid(group, pasid);
>
> And here we still have it set to the new domain even though some of
> the devices within the group failed to attach. The logic needs to be
> more like the main domain attach path where iterate and then undo only
> what we did

yes, the correct way is to undo what have been done before the fail
device. However, I somehow remember that pasid capability is only
available when the group is singleton. So iterate all devices of the
devices just means one device in fact. If this is true, then the
current code is fine although a bit confusing.

> And the whole thing is easier to reason about if an input argument
> specifies the current attached domain instead of having the driver
> read it from the xarray.

yep, will correct it as a fix patch.

--
Regards,
Yi Liu

2024-03-21 11:23:46

by Yi Liu

[permalink] [raw]
Subject: Re: [PATCH 1/8] iommu: Introduce a replace API for device pasid

On 2024/3/21 14:16, Yi Liu wrote:
> On 2024/3/20 20:38, Jason Gunthorpe wrote:
>> On Tue, Mar 19, 2024 at 03:29:39PM +0800, Yi Liu wrote:
>>> On 2024/3/19 00:52, Jason Gunthorpe wrote:
>>>> On Wed, Mar 13, 2024 at 04:11:41PM +0800, Yi Liu wrote:
>>>>
>>>>> yes. how about your opinion? @Jason. I noticed the set_dev_pasid callback
>>>>> and pasid_array update is under the group->lock, so update it should be
>>>>> fine to adjust the order to update pasid_array after set_dev_pasid
>>>>> returns.
>>>>
>>>> Yes, it makes some sense
>>>>
>>>> But, also I would like it very much if we just have the core pass in
>>>> the actual old domain as a an addition function argument.
>>>
>>> ok, this works too. For normal attach, just pass in a NULL old domain.
>>>
>>>> I think we have some small mistakes in multi-device group error
>>>> unwinding for remove because the global xarray can't isn't actually
>>>> going to be correct in all scenarios.
>>>
>>> do you mean the __iommu_remove_group_pasid() call in the below function?
>>> Currently, it is called when __iommu_set_group_pasid() failed. However,
>>> __iommu_set_group_pasid() may need to do remove itself when error happens,
>>> so the helper can be more self-contained. Or you mean something else?
>>
>> Yes..
>>
>>> int iommu_attach_device_pasid(struct iommu_domain *domain,
>>>                   struct device *dev, ioasid_t pasid)
>>> {
>>>     /* Caller must be a probed driver on dev */
>>>     struct iommu_group *group = dev->iommu_group;
>>>     void *curr;
>>>     int ret;
>>>
>>>     if (!domain->ops->set_dev_pasid)
>>>         return -EOPNOTSUPP;
>>>
>>>     if (!group)
>>>         return -ENODEV;
>>>
>>>     if (!dev_has_iommu(dev) || dev_iommu_ops(dev) != domain->owner)
>>>         return -EINVAL;
>>>
>>>     mutex_lock(&group->mutex);
>>>     curr = xa_cmpxchg(&group->pasid_array, pasid, NULL, domain,
>>> GFP_KERNEL);
>>>     if (curr) {
>>>         ret = xa_err(curr) ? : -EBUSY;
>>>         goto out_unlock;
>>>     }
>>>
>>>     ret = __iommu_set_group_pasid(domain, group, pasid);
>>
>> So here we have the xa set to the new domain
>
> aha, yes. The underlying driver won't be able to get the correct domain
> in the remove_dev_pasid callback.
>
>>>     if (ret) {
>>>         __iommu_remove_group_pasid(group, pasid);
>>
>> And here we still have it set to the new domain even though some of
>> the devices within the group failed to attach. The logic needs to be
>> more like the main domain attach path where iterate and then undo only
>> what we did
>
> yes, the correct way is to undo what have been done before the fail
> device. However, I somehow remember that pasid capability is only
> available when the group is singleton. So iterate all devices of the
> devices just means one device in fact. If this is true, then the
> current code is fine although a bit confusing.
>
>> And the whole thing is easier to reason about if an input argument
>> specifies the current attached domain instead of having the driver
>> read it from the xarray.
>
> yep, will correct it as a fix patch.

Hi Jason,

It appears there are two solutions here.

First, only undo the devices that have set_dev_pasid successfully in
the __iommu_set_group_pasid(), so the problematic
__iommu_remove_group_pasid() call at line 3378 [1] would go away.
This also makes the helper more self-contained. Draft patch in [2]

Second, pass in the domain to remove_dev_pasid(). Draft patch in [3]

Either of the above two should be able to solve the mistake you mentioned.
BTW. They are orthogonal, so it's also possible to apply both of them.
Which one is your preference then?

[1]
https://github.com/torvalds/linux/blob/b3603fcb79b1036acae10602bffc4855a4b9af80/drivers/iommu/iommu.c#L3378
[2]
https://github.com/yiliu1765/iommufd/commit/bb83270767ab460978a3452750f5e032b43e59bf
[3]
https://github.com/yiliu1765/iommufd/commit/bd270b7b6619b8ba35dfaf9870df8728e370163f

Regards,
Yi Liu


2024-03-21 12:21:24

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH 1/8] iommu: Introduce a replace API for device pasid

On Thu, Mar 21, 2024 at 07:26:41PM +0800, Yi Liu wrote:
> > yes, the correct way is to undo what have been done before the fail
> > device. However, I somehow remember that pasid capability is only
> > available when the group is singleton. So iterate all devices of the
> > devices just means one device in fact. If this is true, then the
> > current code is fine although a bit confusing.

Platform devicse don't have that limitation.. It is PCI only.

> > > And the whole thing is easier to reason about if an input argument
> > > specifies the current attached domain instead of having the driver
> > > read it from the xarray.
> >
> > yep, will correct it as a fix patch.
>
> Hi Jason,
>
> It appears there are two solutions here.
>
> First, only undo the devices that have set_dev_pasid successfully in
> the __iommu_set_group_pasid(), so the problematic
> __iommu_remove_group_pasid() call at line 3378 [1] would go away.
> This also makes the helper more self-contained. Draft patch in [2]
>
> Second, pass in the domain to remove_dev_pasid(). Draft patch in [3]
>
> Either of the above two should be able to solve the mistake you mentioned.
> BTW. They are orthogonal, so it's also possible to apply both of them.
> Which one is your preference then?

I would do both because I also think it is not nice that the drivers
always have to have the boiler plate to read the xarray in their
remove..

Jason

2024-03-21 13:56:20

by Yi Liu

[permalink] [raw]
Subject: Re: [PATCH 1/8] iommu: Introduce a replace API for device pasid

On 2024/3/21 20:20, Jason Gunthorpe wrote:
> On Thu, Mar 21, 2024 at 07:26:41PM +0800, Yi Liu wrote:
>>> yes, the correct way is to undo what have been done before the fail
>>> device. However, I somehow remember that pasid capability is only
>>> available when the group is singleton. So iterate all devices of the
>>> devices just means one device in fact. If this is true, then the
>>> current code is fine although a bit confusing.
>
> Platform devicse don't have that limitation.. It is PCI only.

ok.

>>>> And the whole thing is easier to reason about if an input argument
>>>> specifies the current attached domain instead of having the driver
>>>> read it from the xarray.
>>>
>>> yep, will correct it as a fix patch.
>>
>> Hi Jason,
>>
>> It appears there are two solutions here.
>>
>> First, only undo the devices that have set_dev_pasid successfully in
>> the __iommu_set_group_pasid(), so the problematic
>> __iommu_remove_group_pasid() call at line 3378 [1] would go away.
>> This also makes the helper more self-contained. Draft patch in [2]
>>
>> Second, pass in the domain to remove_dev_pasid(). Draft patch in [3]
>>
>> Either of the above two should be able to solve the mistake you mentioned.
>> BTW. They are orthogonal, so it's also possible to apply both of them.
>> Which one is your preference then?
>
> I would do both because I also think it is not nice that the drivers
> always have to have the boiler plate to read the xarray in their
> remove..

sure. I'll send the two patches as Fix series.

--
Regards,
Yi Liu