2023-09-22 06:42:03

by Yi Liu

[permalink] [raw]
Subject: [PATCH v4 00/17] iommufd: Add nesting infrastructure

Nested translation is a hardware feature that is supported by many modern
IOMMU hardwares. It has two stages (stage-1, stage-2) address translation
to get access to the physical address. stage-1 translation table is owned
by userspace (e.g. by a guest OS), while stage-2 is owned by kernel. Changes
to stage-1 translation table should be followed by an IOTLB invalidation.

Take Intel VT-d as an example, the stage-1 translation table is I/O page
table. As the below diagram shows, guest I/O page table pointer in GPA
(guest physical address) is passed to host and be used to perform the stage-1
address translation. Along with it, modifications to present mappings in the
guest I/O page table should be followed with an IOTLB invalidation.

.-------------. .---------------------------.
| vIOMMU | | Guest I/O page table |
| | '---------------------------'
.----------------/
| PASID Entry |--- PASID cache flush --+
'-------------' |
| | V
| | I/O page table pointer in GPA
'-------------'
Guest
------| Shadow |---------------------------|--------
v v v
Host
.-------------. .------------------------.
| pIOMMU | | FS for GIOVA->GPA |
| | '------------------------'
.----------------/ |
| PASID Entry | V (Nested xlate)
'----------------\.----------------------------------.
| | | SS for GPA->HPA, unmanaged domain|
| | '----------------------------------'
'-------------'
Where:
- FS = First stage page tables
- SS = Second stage page tables
<Intel VT-d Nested translation>

In IOMMUFD, all the translation tables are tracked by hw_pagetable (hwpt)
and each has an iommu_domain allocated from iommu driver. So in this series
hw_pagetable and iommu_domain means the same thing if no special note.
IOMMUFD has already supported allocating hw_pagetable that is linked with
an IOAS. However, nesting requires IOMMUFD to allow allocating hw_pagetable
with driver specific parameters and interface to sync stage-1 IOTLB as user
owns the stage-1 translation table.

This series is based on the iommu hw info reporting series [1]. It first
extends domain_alloc_user to allocate domains with user data and adds new
op for invalidate stage-1 IOTLB for user-managed domains, then extends the
IOMMUFD internal infrastructure to accept user_data and parent hwpt, relay
the user_data/parent to iommu core to allocate user-managed iommu_domain.
After it, extends the ioctl IOMMU_HWPT_ALLOC to accept user data and stage-2
hwpt ID. Along with it, ioctl IOMMU_HWPT_INVALIDATE is added to invalidate
stage-1 IOTLB. This is needed for user-managed hwpts. Selftest is added as
well to cover the new ioctls.

Complete code can be found in [2], QEMU could can be found in [3].

At last, this is a team work together with Nicolin Chen, Lu Baolu. Thanks
them for the help. ^_^. Look forward to your feedbacks.

[1] https://lore.kernel.org/linux-iommu/[email protected]/#r - merged
[2] https://github.com/yiliu1765/iommufd/tree/iommufd_nesting
[3] https://github.com/yiliu1765/qemu/tree/zhenzhong/wip/iommufd_nesting_rfcv1

Change log:

v4:
- Separate HWPT alloc/destroy/abort functions between user-managed HWPTs
and kernel-managed HWPTs
- Rework invalidate uAPI to be a multi-request array-based design
- Add a struct iommu_user_data_array and a helper for driver to sanitize
and copy the entry data from user space invalidation array
- Add a patch fixing TEST_LENGTH() in selftest program
- Drop IOMMU_RESV_IOVA_RANGES patches
- Update kdoc and inline comments
- Drop the code to add IOMMU_RESV_SW_MSI to kernel-managed HWPT in nested translation,
this does not change the rule that resv regions should only be added to the
kernel-managed HWPT. The IOMMU_RESV_SW_MSI stuff will be added in later series
as it is needed only by SMMU so far.

v3: https://lore.kernel.org/linux-iommu/[email protected]/
- Add new uAPI things in alphabetical order
- Pass in "enum iommu_hwpt_type hwpt_type" to op->domain_alloc_user for
sanity, replacing the previous op->domain_alloc_user_data_len solution
- Return ERR_PTR from domain_alloc_user instead of NULL
- Only add IOMMU_RESV_SW_MSI to kernel-managed HWPT in nested translation (Kevin)
- Add IOMMU_RESV_IOVA_RANGES to report resv iova ranges to userspace hence
userspace is able to exclude the ranges in the stage-1 HWPT (e.g. guest I/O
page table). (Kevin)
- Add selftest coverage for the new IOMMU_RESV_IOVA_RANGES ioctl
- Minor changes per Kevin's inputs

v2: https://lore.kernel.org/linux-iommu/[email protected]/
- Add union iommu_domain_user_data to include all user data structures to avoid
passing void * in kernel APIs.
- Add iommu op to return user data length for user domain allocation
- Rename struct iommu_hwpt_alloc::data_type to be hwpt_type
- Store the invalidation data length in iommu_domain_ops::cache_invalidate_user_data_len
- Convert cache_invalidate_user op to be int instead of void
- Remove @data_type in struct iommu_hwpt_invalidate
- Remove out_hwpt_type_bitmap in struct iommu_hw_info hence drop patch 08 of v1

v1: https://lore.kernel.org/linux-iommu/[email protected]/

Thanks,
Yi Liu

Lu Baolu (1):
iommu: Add nested domain support

Nicolin Chen (12):
iommufd: Unite all kernel-managed members into a struct
iommufd: Separate kernel-managed HWPT alloc/destroy/abort functions
iommufd: Add shared alloc_fn function pointer and mutex pointer
iommufd: Add user-managed hw_pagetable support
iommufd: Always setup MSI and anforce cc on kernel-managed domains
iommufd/device: Add helpers to enforce/remove device reserved regions
iommufd/selftest: Rework TEST_LENGTH to test min_size explicitly
iommufd/selftest: Add nested domain allocation for mock domain
iommufd/selftest: Add coverage for IOMMU_HWPT_ALLOC with nested HWPTs
iommufd/selftest: Add mock_domain_cache_invalidate_user support
iommufd/selftest: Add IOMMU_TEST_OP_MD_CHECK_IOTLB test op
iommufd/selftest: Add coverage for IOMMU_HWPT_INVALIDATE ioctl

Yi Liu (4):
iommu: Add hwpt_type with user_data for domain_alloc_user op
iommufd: Pass in hwpt_type/user_data to iommufd_hw_pagetable_alloc()
iommufd: Support IOMMU_HWPT_ALLOC allocation with user data
iommufd: Add IOMMU_HWPT_INVALIDATE

drivers/iommu/intel/iommu.c | 5 +-
drivers/iommu/iommufd/device.c | 51 +++-
drivers/iommu/iommufd/hw_pagetable.c | 257 ++++++++++++++++--
drivers/iommu/iommufd/iommufd_private.h | 59 +++-
drivers/iommu/iommufd/iommufd_test.h | 40 +++
drivers/iommu/iommufd/main.c | 3 +
drivers/iommu/iommufd/selftest.c | 184 ++++++++++++-
include/linux/iommu.h | 110 +++++++-
include/uapi/linux/iommufd.h | 60 +++-
tools/testing/selftests/iommu/iommufd.c | 209 +++++++++++++-
.../selftests/iommu/iommufd_fail_nth.c | 3 +-
tools/testing/selftests/iommu/iommufd_utils.h | 91 ++++++-
12 files changed, 998 insertions(+), 74 deletions(-)

--
2.34.1


2023-09-22 07:12:50

by Yi Liu

[permalink] [raw]
Subject: [PATCH v4 07/17] iommufd: Add user-managed hw_pagetable support

From: Nicolin Chen <[email protected]>

Add a parent hw_pagetable pointer for user-managed hw_pagetables. Similar
to the ioas->mutex, add another mutex in the kernel-managed hw_pagetable
to serialize associating user-managed hw_pagetable allocations. Then, add
user_managed flag too in the struct to ease identifying a HWPT.

Also, add a new allocator iommufd_user_managed_hwpt_alloc() and two pairs
of cleanup functions iommufd_user_managed_hwpt_destroy/abort().

Signed-off-by: Nicolin Chen <[email protected]>
Signed-off-by: Yi Liu <[email protected]>
---
drivers/iommu/iommufd/hw_pagetable.c | 112 +++++++++++++++++++++++-
drivers/iommu/iommufd/iommufd_private.h | 6 ++
2 files changed, 117 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
index b2af68776877..dc3e11a23acf 100644
--- a/drivers/iommu/iommufd/hw_pagetable.c
+++ b/drivers/iommu/iommufd/hw_pagetable.c
@@ -8,6 +8,17 @@
#include "../iommu-priv.h"
#include "iommufd_private.h"

+static void iommufd_user_managed_hwpt_destroy(struct iommufd_object *obj)
+{
+ struct iommufd_hw_pagetable *hwpt =
+ container_of(obj, struct iommufd_hw_pagetable, obj);
+
+ if (hwpt->domain)
+ iommu_domain_free(hwpt->domain);
+
+ refcount_dec(&hwpt->parent->obj.users);
+}
+
static void iommufd_kernel_managed_hwpt_destroy(struct iommufd_object *obj)
{
struct iommufd_hw_pagetable *hwpt =
@@ -32,6 +43,17 @@ void iommufd_hw_pagetable_destroy(struct iommufd_object *obj)
container_of(obj, struct iommufd_hw_pagetable, obj)->destroy(obj);
}

+static void iommufd_user_managed_hwpt_abort(struct iommufd_object *obj)
+{
+ struct iommufd_hw_pagetable *hwpt =
+ container_of(obj, struct iommufd_hw_pagetable, obj);
+
+ /* The parent->mutex must be held until finalize is called. */
+ lockdep_assert_held(&hwpt->parent->mutex);
+
+ iommufd_hw_pagetable_destroy(obj);
+}
+
static void iommufd_kernel_managed_hwpt_abort(struct iommufd_object *obj)
{
struct iommufd_hw_pagetable *hwpt =
@@ -52,6 +74,82 @@ void iommufd_hw_pagetable_abort(struct iommufd_object *obj)
container_of(obj, struct iommufd_hw_pagetable, obj)->abort(obj);
}

+/**
+ * iommufd_user_managed_hwpt_alloc() - Get a user-managed hw_pagetable
+ * @ictx: iommufd context
+ * @pt_obj: Parent object to an HWPT to associate the domain with
+ * @idev: Device to get an iommu_domain for
+ * @flags: Flags from userspace
+ * @hwpt_type: Requested type of hw_pagetable
+ * @user_data: user_data pointer
+ * @dummy: never used
+ *
+ * Allocate a new iommu_domain (must be IOMMU_DOMAIN_NESTED) and return it as
+ * a user-managed hw_pagetable.
+ */
+static struct iommufd_hw_pagetable *
+iommufd_user_managed_hwpt_alloc(struct iommufd_ctx *ictx,
+ struct iommufd_object *pt_obj,
+ struct iommufd_device *idev,
+ u32 flags,
+ enum iommu_hwpt_type hwpt_type,
+ struct iommu_user_data *user_data,
+ bool dummy)
+{
+ struct iommufd_hw_pagetable *parent =
+ container_of(pt_obj, struct iommufd_hw_pagetable, obj);
+ const struct iommu_ops *ops = dev_iommu_ops(idev->dev);
+ struct iommufd_hw_pagetable *hwpt;
+ int rc;
+
+ if (!user_data)
+ return ERR_PTR(-EINVAL);
+ if (parent->auto_domain)
+ return ERR_PTR(-EINVAL);
+ if (!parent->nest_parent)
+ return ERR_PTR(-EINVAL);
+ if (hwpt_type == IOMMU_HWPT_TYPE_DEFAULT)
+ return ERR_PTR(-EINVAL);
+
+ if (!ops->domain_alloc_user)
+ return ERR_PTR(-EOPNOTSUPP);
+
+ lockdep_assert_held(&parent->mutex);
+
+ hwpt = iommufd_object_alloc(ictx, hwpt, IOMMUFD_OBJ_HW_PAGETABLE);
+ if (IS_ERR(hwpt))
+ return hwpt;
+
+ refcount_inc(&parent->obj.users);
+ hwpt->parent = parent;
+ hwpt->user_managed = true;
+ hwpt->abort = iommufd_user_managed_hwpt_abort;
+ hwpt->destroy = iommufd_user_managed_hwpt_destroy;
+
+ hwpt->domain = ops->domain_alloc_user(idev->dev, flags, hwpt_type,
+ parent->domain, user_data);
+ if (IS_ERR(hwpt->domain)) {
+ rc = PTR_ERR(hwpt->domain);
+ hwpt->domain = NULL;
+ goto out_abort;
+ }
+
+ if (WARN_ON_ONCE(hwpt->domain->type != IOMMU_DOMAIN_NESTED)) {
+ rc = -EINVAL;
+ goto out_abort;
+ }
+ /* Driver is buggy by missing cache_invalidate_user in domain_ops */
+ if (WARN_ON_ONCE(!hwpt->domain->ops->cache_invalidate_user)) {
+ rc = -EINVAL;
+ goto out_abort;
+ }
+ return hwpt;
+
+out_abort:
+ iommufd_object_abort_and_destroy(ictx, &hwpt->obj);
+ return ERR_PTR(rc);
+}
+
int iommufd_hw_pagetable_enforce_cc(struct iommufd_hw_pagetable *hwpt)
{
if (hwpt->enforce_cache_coherency)
@@ -112,10 +210,12 @@ iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx,
if (IS_ERR(hwpt))
return hwpt;

+ mutex_init(&hwpt->mutex);
INIT_LIST_HEAD(&hwpt->hwpt_item);
/* Pairs with iommufd_hw_pagetable_destroy() */
refcount_inc(&ioas->obj.users);
hwpt->ioas = ioas;
+ hwpt->nest_parent = flags & IOMMU_HWPT_ALLOC_NEST_PARENT;
hwpt->abort = iommufd_kernel_managed_hwpt_abort;
hwpt->destroy = iommufd_kernel_managed_hwpt_destroy;

@@ -194,8 +294,8 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd)
u32 flags, enum iommu_hwpt_type type,
struct iommu_user_data *user_data,
bool flag);
+ struct iommufd_hw_pagetable *hwpt, *parent;
struct iommu_hwpt_alloc *cmd = ucmd->cmd;
- struct iommufd_hw_pagetable *hwpt;
struct iommufd_object *pt_obj;
struct iommufd_device *idev;
struct iommufd_ioas *ioas;
@@ -221,6 +321,16 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd)
mutex = &ioas->mutex;
alloc_fn = iommufd_hw_pagetable_alloc;
break;
+ case IOMMUFD_OBJ_HW_PAGETABLE:
+ parent = container_of(pt_obj, struct iommufd_hw_pagetable, obj);
+ /* No user-managed HWPT on top of an user-managed one */
+ if (parent->user_managed) {
+ rc = -EINVAL;
+ goto out_put_pt;
+ }
+ mutex = &parent->mutex;
+ alloc_fn = iommufd_user_managed_hwpt_alloc;
+ break;
default:
rc = -EINVAL;
goto out_put_pt;
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index e4d06ae6b0c5..34940596c2c2 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -237,12 +237,18 @@ struct iommufd_hw_pagetable {
void (*abort)(struct iommufd_object *obj);
void (*destroy)(struct iommufd_object *obj);

+ bool user_managed : 1;
union {
+ struct { /* user-managed */
+ struct iommufd_hw_pagetable *parent;
+ };
struct { /* kernel-managed */
struct iommufd_ioas *ioas;
+ struct mutex mutex;
bool auto_domain : 1;
bool enforce_cache_coherency : 1;
bool msi_cookie : 1;
+ bool nest_parent : 1;
/* Head at iommufd_ioas::hwpt_list */
struct list_head hwpt_item;
};
--
2.34.1

2023-09-22 08:48:15

by Yi Liu

[permalink] [raw]
Subject: [PATCH v4 08/17] iommufd: Always setup MSI and anforce cc on kernel-managed domains

From: Nicolin Chen <[email protected]>

Now enforce_cache_coherency and msi_cookie are kernel-managed hwpt things.
So, they should be only setup on kernel-managed domains. If the attaching
domain is a user-managed domain, redirect the hwpt to hwpt->parent to do
it correctly.

Signed-off-by: Nicolin Chen <[email protected]>
Co-developed-by: Yi Liu <[email protected]>
Signed-off-by: Yi Liu <[email protected]>
---
drivers/iommu/iommufd/device.c | 4 ++++
drivers/iommu/iommufd/hw_pagetable.c | 4 ++++
2 files changed, 8 insertions(+)

diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index eb120f70a3e3..104dd061a2a3 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -305,12 +305,16 @@ static int iommufd_group_setup_msi(struct iommufd_group *igroup,
* domain after request_irq(). If it is not done interrupts will not
* work on this domain.
*
+ * Note: always set up a msi_cookie on a kernel-manage hw_pagetable.
+ *
* FIXME: This is conceptually broken for iommufd since we want to allow
* userspace to change the domains, eg switch from an identity IOAS to a
* DMA IOAS. There is currently no way to create a MSI window that
* matches what the IRQ layer actually expects in a newly created
* domain.
*/
+ if (hwpt->user_managed)
+ hwpt = hwpt->parent;
if (sw_msi_start != PHYS_ADDR_MAX && !hwpt->msi_cookie) {
rc = iommu_get_msi_cookie(hwpt->domain, sw_msi_start);
if (rc)
diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
index dc3e11a23acf..90fd65859e28 100644
--- a/drivers/iommu/iommufd/hw_pagetable.c
+++ b/drivers/iommu/iommufd/hw_pagetable.c
@@ -152,6 +152,10 @@ iommufd_user_managed_hwpt_alloc(struct iommufd_ctx *ictx,

int iommufd_hw_pagetable_enforce_cc(struct iommufd_hw_pagetable *hwpt)
{
+ /* Always enforce cache coherency on a kernel-managed hw_pagetable */
+ if (hwpt->user_managed)
+ hwpt = hwpt->parent;
+
if (hwpt->enforce_cache_coherency)
return 0;

--
2.34.1

2023-09-22 12:40:46

by Yi Liu

[permalink] [raw]
Subject: [PATCH v4 09/17] iommufd/device: Add helpers to enforce/remove device reserved regions

From: Nicolin Chen <[email protected]>

The iopt_table_enforce_dev_resv_regions() and iopt_remove_reserved_iova()
require callers to pass in an ioas->iopt pointer. It simply works with a
kernel-managed hw_pagetable by passing in its hwpt->ioas->iopt pointer.
However, now there could be a user-managed hw_pagetable that doesn't have
an ioas pointer. And typically most of device reserved regions should be
enforced to a kernel-managed domain only, although the IOMMU_RESV_SW_MSI
used by SMMU will introduce some complication.

Add a pair of iommufd_device_enforce_rr/iommufd_device_remove_rr helpers
that calls iopt_table_enforce_dev_resv_regions/iopt_remove_reserved_iova
functions after some additional checks. This would also ease any further
extension to support the IOMMU_RESV_SW_MSI complication mentioned above.

For the replace() routine, add another helper to compare ioas pointers,
with the support of user-managed hw_pagetables.

Signed-off-by: Nicolin Chen <[email protected]>
Signed-off-by: Yi Liu <[email protected]>
---
drivers/iommu/iommufd/device.c | 42 ++++++++++++++++++-------
drivers/iommu/iommufd/iommufd_private.h | 18 +++++++++++
2 files changed, 48 insertions(+), 12 deletions(-)

diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index 104dd061a2a3..10e6ec590ede 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -329,6 +329,28 @@ static int iommufd_group_setup_msi(struct iommufd_group *igroup,
return 0;
}

+static void iommufd_device_remove_rr(struct iommufd_device *idev,
+ struct iommufd_hw_pagetable *hwpt)
+{
+ if (WARN_ON(!hwpt))
+ return;
+ if (hwpt->user_managed)
+ return;
+ iopt_remove_reserved_iova(&hwpt->ioas->iopt, idev->dev);
+}
+
+static int iommufd_device_enforce_rr(struct iommufd_device *idev,
+ struct iommufd_hw_pagetable *hwpt,
+ phys_addr_t *sw_msi_start)
+{
+ if (WARN_ON(!hwpt))
+ return -EINVAL;
+ if (hwpt->user_managed)
+ return 0;
+ return iopt_table_enforce_dev_resv_regions(&hwpt->ioas->iopt, idev->dev,
+ sw_msi_start);
+}
+
int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt,
struct iommufd_device *idev)
{
@@ -348,8 +370,7 @@ int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt,
goto err_unlock;
}

- rc = iopt_table_enforce_dev_resv_regions(&hwpt->ioas->iopt, idev->dev,
- &idev->igroup->sw_msi_start);
+ rc = iommufd_device_enforce_rr(idev, hwpt, &idev->igroup->sw_msi_start);
if (rc)
goto err_unlock;

@@ -375,7 +396,7 @@ int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt,
mutex_unlock(&idev->igroup->lock);
return 0;
err_unresv:
- iopt_remove_reserved_iova(&hwpt->ioas->iopt, idev->dev);
+ iommufd_device_remove_rr(idev, hwpt);
err_unlock:
mutex_unlock(&idev->igroup->lock);
return rc;
@@ -392,7 +413,7 @@ iommufd_hw_pagetable_detach(struct iommufd_device *idev)
iommu_detach_group(hwpt->domain, idev->igroup->group);
idev->igroup->hwpt = NULL;
}
- iopt_remove_reserved_iova(&hwpt->ioas->iopt, idev->dev);
+ iommufd_device_remove_rr(idev, hwpt);
mutex_unlock(&idev->igroup->lock);

/* Caller must destroy hwpt */
@@ -444,10 +465,9 @@ iommufd_device_do_replace(struct iommufd_device *idev,
}

old_hwpt = igroup->hwpt;
- if (hwpt->ioas != old_hwpt->ioas) {
+ if (iommufd_hw_pagetable_compare_ioas(old_hwpt, hwpt)) {
list_for_each_entry(cur, &igroup->device_list, group_item) {
- rc = iopt_table_enforce_dev_resv_regions(
- &hwpt->ioas->iopt, cur->dev, NULL);
+ rc = iommufd_device_enforce_rr(cur, hwpt, NULL);
if (rc)
goto err_unresv;
}
@@ -461,12 +481,10 @@ iommufd_device_do_replace(struct iommufd_device *idev,
if (rc)
goto err_unresv;

- if (hwpt->ioas != old_hwpt->ioas) {
+ if (iommufd_hw_pagetable_compare_ioas(old_hwpt, hwpt)) {
list_for_each_entry(cur, &igroup->device_list, group_item)
- iopt_remove_reserved_iova(&old_hwpt->ioas->iopt,
- cur->dev);
+ iommufd_device_remove_rr(cur, hwpt);
}
-
igroup->hwpt = hwpt;

/*
@@ -483,7 +501,7 @@ iommufd_device_do_replace(struct iommufd_device *idev,
return old_hwpt;
err_unresv:
list_for_each_entry(cur, &igroup->device_list, group_item)
- iopt_remove_reserved_iova(&hwpt->ioas->iopt, cur->dev);
+ iommufd_device_remove_rr(cur, hwpt);
err_unlock:
mutex_unlock(&idev->igroup->lock);
return ERR_PTR(rc);
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 34940596c2c2..b14f23d3f42e 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -281,6 +281,24 @@ static inline void iommufd_hw_pagetable_put(struct iommufd_ctx *ictx,
refcount_dec(&hwpt->obj.users);
}

+static inline bool
+iommufd_hw_pagetable_compare_ioas(struct iommufd_hw_pagetable *old_hwpt,
+ struct iommufd_hw_pagetable *new_hwpt)
+{
+ struct iommufd_ioas *old_ioas, *new_ioas;
+
+ WARN_ON(!old_hwpt || !new_hwpt);
+ if (old_hwpt->user_managed)
+ old_ioas = old_hwpt->parent->ioas;
+ else
+ old_ioas = old_hwpt->ioas;
+ if (new_hwpt->user_managed)
+ new_ioas = new_hwpt->parent->ioas;
+ else
+ new_ioas = new_hwpt->ioas;
+ return old_ioas != new_ioas;
+}
+
struct iommufd_group {
struct kref ref;
struct mutex lock;
--
2.34.1

2023-09-22 14:07:47

by Yi Liu

[permalink] [raw]
Subject: [PATCH v4 01/17] iommu: Add hwpt_type with user_data for domain_alloc_user op

domain_alloc_user op already accepts user flags for domain allocation,
add hwpt_type and user_data support as well. This op chooses to make the
special parameters opaque to the core. This suits the current usage model
where accessing any of the IOMMU device special parameters does require
a userspace driver that matches the kernel driver. If a need for common
parameters, implemented similarly by several drivers, arises then there
is room in the design to grow a generic parameter set as well.

Define enum iommu_hwpt_type to differentiate the user parameters used by
different iommu vendor drivers in the future. For backward compatibility
and the allocations without any specified hwpt_type or user parameters,
add an IOMMU_HWPT_TYPE_DEFAULT in the list.

Also, add a struct iommu_user_data as a package of data_ptr and data_len
from an iommufd core uAPI structure, then provide a helper function for
a driver to run data length sanity and copy a defined hwpt_type specific
driver user data.

Signed-off-by: Lu Baolu <[email protected]>
Co-developed-by: Nicolin Chen <[email protected]>
Signed-off-by: Nicolin Chen <[email protected]>
Signed-off-by: Yi Liu <[email protected]>
---
drivers/iommu/intel/iommu.c | 5 ++-
drivers/iommu/iommufd/hw_pagetable.c | 4 ++-
drivers/iommu/iommufd/selftest.c | 5 ++-
include/linux/iommu.h | 51 ++++++++++++++++++++++++++--
include/uapi/linux/iommufd.h | 8 +++++
5 files changed, 67 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 491bcde1ff96..4ffc939a71f0 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -4075,7 +4075,10 @@ static struct iommu_domain *intel_iommu_domain_alloc(unsigned type)
}

static struct iommu_domain *
-intel_iommu_domain_alloc_user(struct device *dev, u32 flags)
+intel_iommu_domain_alloc_user(struct device *dev, u32 flags,
+ enum iommu_hwpt_type hwpt_type,
+ struct iommu_domain *parent,
+ const struct iommu_user_data *user_data)
{
struct iommu_domain *domain;
struct intel_iommu *iommu;
diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
index 26a8a818ffa3..1d7378a6cbb3 100644
--- a/drivers/iommu/iommufd/hw_pagetable.c
+++ b/drivers/iommu/iommufd/hw_pagetable.c
@@ -96,7 +96,9 @@ iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
hwpt->ioas = ioas;

if (ops->domain_alloc_user) {
- hwpt->domain = ops->domain_alloc_user(idev->dev, flags);
+ hwpt->domain = ops->domain_alloc_user(idev->dev, flags,
+ IOMMU_HWPT_TYPE_DEFAULT,
+ NULL, NULL);
if (IS_ERR(hwpt->domain)) {
rc = PTR_ERR(hwpt->domain);
hwpt->domain = NULL;
diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c
index b54cbfb1862b..2205a552e570 100644
--- a/drivers/iommu/iommufd/selftest.c
+++ b/drivers/iommu/iommufd/selftest.c
@@ -171,7 +171,10 @@ static struct iommu_domain *mock_domain_alloc(unsigned int iommu_domain_type)
}

static struct iommu_domain *
-mock_domain_alloc_user(struct device *dev, u32 flags)
+mock_domain_alloc_user(struct device *dev, u32 flags,
+ enum iommu_hwpt_type hwpt_type,
+ struct iommu_domain *parent,
+ const struct iommu_user_data *user_data)
{
struct iommu_domain *domain;

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 660dc1931dc9..12e12e5563e6 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -14,6 +14,7 @@
#include <linux/err.h>
#include <linux/of.h>
#include <uapi/linux/iommu.h>
+#include <uapi/linux/iommufd.h>

#define IOMMU_READ (1 << 0)
#define IOMMU_WRITE (1 << 1)
@@ -227,6 +228,41 @@ struct iommu_iotlb_gather {
bool queued;
};

+/**
+ * struct iommu_user_data - iommu driver specific user space data info
+ * @uptr: Pointer to the user buffer for copy_from_user()
+ * @len: The length of the user buffer in bytes
+ *
+ * A user space data is an uAPI that is defined in include/uapi/linux/iommufd.h
+ * Both @uptr and @len should be just copied from an iommufd core uAPI structure
+ */
+struct iommu_user_data {
+ void __user *uptr;
+ size_t len;
+};
+
+/**
+ * iommu_copy_user_data - Copy iommu driver specific user space data
+ * @dst_data: Pointer to an iommu driver specific user data that is defined in
+ * include/uapi/linux/iommufd.h
+ * @src_data: Pointer to a struct iommu_user_data for user space data info
+ * @data_len: Length of current user data structure, i.e. sizeof(struct _dst)
+ * @min_len: Initial length of user data structure for backward compatibility.
+ * This should be offsetofend using the last member in the user data
+ * struct that was initially added to include/uapi/linux/iommufd.h
+ */
+static inline int iommu_copy_user_data(void *dst_data,
+ const struct iommu_user_data *src_data,
+ size_t data_len, size_t min_len)
+{
+ if (WARN_ON(!dst_data || !src_data))
+ return -EINVAL;
+ if (src_data->len < min_len || data_len < src_data->len)
+ return -EINVAL;
+ return copy_struct_from_user(dst_data, data_len,
+ src_data->uptr, src_data->len);
+}
+
/**
* struct iommu_ops - iommu ops and capabilities
* @capable: check capability
@@ -237,11 +273,17 @@ struct iommu_iotlb_gather {
* @domain_alloc: allocate iommu domain
* @domain_alloc_user: Allocate an iommu domain corresponding to the input
* parameters like flags defined as enum iommufd_ioas_map_flags
+ * and the @hwpt_type that is defined as enum iommu_hwpt_type
* in include/uapi/linux/iommufd.h. Different from the
* domain_alloc op, it requires iommu driver to fully
* initialize a new domain including the generic iommu_domain
- * struct. Upon success, a domain is returned. Upon failure,
- * ERR_PTR must be returned.
+ * struct.
+ * Upon success, if the @user_data is valid and the @parent
+ * points to a kernel-managed domain, the new domain must be
+ * IOMMU_DOMAIN_NESTED type; otherwise, the @parent must be
+ * NULL while the @user_data can be optionally provided, the
+ * new domain must be IOMMU_DOMAIN_UNMANAGED type.
+ * Upon failure, ERR_PTR must be returned.
* @probe_device: Add device to iommu driver handling
* @release_device: Remove device from iommu driver handling
* @probe_finalize: Do final setup work after the device is added to an IOMMU
@@ -274,7 +316,10 @@ struct iommu_ops {

/* Domain allocation and freeing by the iommu driver */
struct iommu_domain *(*domain_alloc)(unsigned iommu_domain_type);
- struct iommu_domain *(*domain_alloc_user)(struct device *dev, u32 flags);
+ struct iommu_domain *(*domain_alloc_user)(struct device *dev, u32 flags,
+ enum iommu_hwpt_type hwpt_type,
+ struct iommu_domain *parent,
+ const struct iommu_user_data *user_data);

struct iommu_device *(*probe_device)(struct device *dev);
void (*release_device)(struct device *dev);
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index 4a7c5c8fdbb4..3c8660fe9bb1 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -357,6 +357,14 @@ enum iommufd_hwpt_alloc_flags {
IOMMU_HWPT_ALLOC_NEST_PARENT = 1 << 0,
};

+/**
+ * enum iommu_hwpt_type - IOMMU HWPT Type
+ * @IOMMU_HWPT_TYPE_DEFAULT: default
+ */
+enum iommu_hwpt_type {
+ IOMMU_HWPT_TYPE_DEFAULT,
+};
+
/**
* struct iommu_hwpt_alloc - ioctl(IOMMU_HWPT_ALLOC)
* @size: sizeof(struct iommu_hwpt_alloc)
--
2.34.1

2023-09-22 14:27:30

by Yi Liu

[permalink] [raw]
Subject: [PATCH v4 11/17] iommufd: Add IOMMU_HWPT_INVALIDATE

In nested translation, the stage-1 page table is user-managed but cached
by the IOMMU hardware, so an update on present page table entries in the
stage-1 page table should be followed with a cache invalidation.

Add an IOMMU_HWPT_INVALIDATE ioctl to support such a cache invalidation.
It takes hwpt_id to specify the iommu_domain, and a multi-entry array to
support multiple invalidation requests in one ioctl.

Co-developed-by: Nicolin Chen <[email protected]>
Signed-off-by: Nicolin Chen <[email protected]>
Signed-off-by: Yi Liu <[email protected]>
---
drivers/iommu/iommufd/hw_pagetable.c | 33 +++++++++++++++++++++++++
drivers/iommu/iommufd/iommufd_private.h | 9 +++++++
drivers/iommu/iommufd/main.c | 3 +++
include/uapi/linux/iommufd.h | 29 ++++++++++++++++++++++
4 files changed, 74 insertions(+)

diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
index ab25de149ae6..72c46de1396b 100644
--- a/drivers/iommu/iommufd/hw_pagetable.c
+++ b/drivers/iommu/iommufd/hw_pagetable.c
@@ -382,3 +382,36 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd)
iommufd_put_object(&idev->obj);
return rc;
}
+
+int iommufd_hwpt_invalidate(struct iommufd_ucmd *ucmd)
+{
+ struct iommu_hwpt_invalidate *cmd = ucmd->cmd;
+ struct iommu_user_data_array data_array = {
+ .uptr = u64_to_user_ptr(cmd->reqs_uptr),
+ .entry_len = cmd->req_len,
+ .entry_num = cmd->req_num,
+ };
+ struct iommufd_hw_pagetable *hwpt;
+ int rc = 0;
+
+ if (!cmd->req_len || !cmd->req_num)
+ return -EOPNOTSUPP;
+
+ hwpt = iommufd_get_hwpt(ucmd, cmd->hwpt_id);
+ if (IS_ERR(hwpt))
+ return PTR_ERR(hwpt);
+
+ if (!hwpt->user_managed) {
+ rc = -EINVAL;
+ goto out_put_hwpt;
+ }
+
+ rc = hwpt->domain->ops->cache_invalidate_user(hwpt->domain, &data_array,
+ &cmd->out_driver_error_code);
+ cmd->req_num = data_array.entry_num;
+ if (iommufd_ucmd_respond(ucmd, sizeof(*cmd)))
+ return -EFAULT;
+out_put_hwpt:
+ iommufd_put_object(&hwpt->obj);
+ return rc;
+}
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index b14f23d3f42e..bdbc8dac2fd8 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -270,6 +270,7 @@ iommufd_hw_pagetable_detach(struct iommufd_device *idev);
void iommufd_hw_pagetable_destroy(struct iommufd_object *obj);
void iommufd_hw_pagetable_abort(struct iommufd_object *obj);
int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd);
+int iommufd_hwpt_invalidate(struct iommufd_ucmd *ucmd);

static inline void iommufd_hw_pagetable_put(struct iommufd_ctx *ictx,
struct iommufd_hw_pagetable *hwpt)
@@ -281,6 +282,14 @@ static inline void iommufd_hw_pagetable_put(struct iommufd_ctx *ictx,
refcount_dec(&hwpt->obj.users);
}

+static inline struct iommufd_hw_pagetable *
+iommufd_get_hwpt(struct iommufd_ucmd *ucmd, u32 id)
+{
+ return container_of(iommufd_get_object(ucmd->ictx, id,
+ IOMMUFD_OBJ_HW_PAGETABLE),
+ struct iommufd_hw_pagetable, obj);
+}
+
static inline bool
iommufd_hw_pagetable_compare_ioas(struct iommufd_hw_pagetable *old_hwpt,
struct iommufd_hw_pagetable *new_hwpt)
diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
index e71523cbd0de..d9d82a413105 100644
--- a/drivers/iommu/iommufd/main.c
+++ b/drivers/iommu/iommufd/main.c
@@ -307,6 +307,7 @@ union ucmd_buffer {
struct iommu_destroy destroy;
struct iommu_hw_info info;
struct iommu_hwpt_alloc hwpt;
+ struct iommu_hwpt_invalidate cache;
struct iommu_ioas_alloc alloc;
struct iommu_ioas_allow_iovas allow_iovas;
struct iommu_ioas_copy ioas_copy;
@@ -342,6 +343,8 @@ static const struct iommufd_ioctl_op iommufd_ioctl_ops[] = {
__reserved),
IOCTL_OP(IOMMU_HWPT_ALLOC, iommufd_hwpt_alloc, struct iommu_hwpt_alloc,
__reserved),
+ IOCTL_OP(IOMMU_HWPT_INVALIDATE, iommufd_hwpt_invalidate,
+ struct iommu_hwpt_invalidate, out_driver_error_code),
IOCTL_OP(IOMMU_IOAS_ALLOC, iommufd_ioas_alloc_ioctl,
struct iommu_ioas_alloc, out_ioas_id),
IOCTL_OP(IOMMU_IOAS_ALLOW_IOVAS, iommufd_ioas_allow_iovas,
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index c46b1f772f20..2083a0309a9b 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -47,6 +47,7 @@ enum {
IOMMUFD_CMD_VFIO_IOAS,
IOMMUFD_CMD_HWPT_ALLOC,
IOMMUFD_CMD_GET_HW_INFO,
+ IOMMUFD_CMD_HWPT_INVALIDATE,
};

/**
@@ -478,4 +479,32 @@ struct iommu_hw_info {
__u32 __reserved;
};
#define IOMMU_GET_HW_INFO _IO(IOMMUFD_TYPE, IOMMUFD_CMD_GET_HW_INFO)
+
+/**
+ * struct iommu_hwpt_invalidate - ioctl(IOMMU_HWPT_INVALIDATE)
+ * @size: sizeof(struct iommu_hwpt_invalidate)
+ * @hwpt_id: HWPT ID of target hardware page table for cache invalidation
+ * @reqs_uptr: User pointer to an array having @req_num of cache invalidation
+ * requests. The request entries in the array are of fixed width
+ * @req_len, and contain a user data structure for invalidation
+ * request specific to the given hardware page table.
+ * @req_len: Length (in bytes) of a request entry in the request array
+ * @req_num: Input the number of cache invalidation requests in the array.
+ * Output the number of requests successfully handled by kernel.
+ * @out_driver_error_code: Report a driver speicifc error code upon failure
+ *
+ * Invalidate the iommu cache for user-managed page table. Modifications on a
+ * user-managed page table should be followed by this operation to sync cache.
+ * Each ioctl can support one or more cache invalidation requests in the array
+ * that has a total size of @req_len * @req_num.
+ */
+struct iommu_hwpt_invalidate {
+ __u32 size;
+ __u32 hwpt_id;
+ __aligned_u64 reqs_uptr;
+ __u32 req_len;
+ __u32 req_num;
+ __u32 out_driver_error_code;
+};
+#define IOMMU_HWPT_INVALIDATE _IO(IOMMUFD_TYPE, IOMMUFD_CMD_HWPT_INVALIDATE)
#endif
--
2.34.1

2023-09-22 14:46:01

by Yi Liu

[permalink] [raw]
Subject: [PATCH v4 13/17] iommufd/selftest: Add nested domain allocation for mock domain

From: Nicolin Chen <[email protected]>

Add nested domain support in the ->domain_alloc_user op with some proper
sanity checks. Then, add a domain_nested_ops for all nested domains.

Also, add an iotlb as a testing property of a nested domain. A following
patch will verify its value for the success of a nested domain allocation
and a cache invalidation request.

Signed-off-by: Nicolin Chen <[email protected]>
Signed-off-by: Yi Liu <[email protected]>
---
drivers/iommu/iommufd/iommufd_test.h | 18 +++++
drivers/iommu/iommufd/selftest.c | 114 ++++++++++++++++++++++++---
2 files changed, 123 insertions(+), 9 deletions(-)

diff --git a/drivers/iommu/iommufd/iommufd_test.h b/drivers/iommu/iommufd/iommufd_test.h
index 3f3644375bf1..7f997234a1a8 100644
--- a/drivers/iommu/iommufd/iommufd_test.h
+++ b/drivers/iommu/iommufd/iommufd_test.h
@@ -40,6 +40,11 @@ enum {
MOCK_FLAGS_ACCESS_CREATE_NEEDS_PIN_PAGES = 1 << 0,
};

+enum {
+ MOCK_NESTED_DOMAIN_IOTLB_ID_MAX = 3,
+ MOCK_NESTED_DOMAIN_IOTLB_NUM = 4,
+};
+
struct iommu_test_cmd {
__u32 size;
__u32 op;
@@ -109,4 +114,17 @@ struct iommu_test_hw_info {
__u32 test_reg;
};

+/* Should not be equal to any defined value in enum iommu_hwpt_type */
+#define IOMMU_HWPT_TYPE_SELFTEST 0xdead
+
+/**
+ * struct iommu_hwpt_selftest
+ *
+ * @iotlb: default mock iotlb value, IOMMU_TEST_IOTLB_DEFAULT
+ */
+struct iommu_hwpt_selftest {
+#define IOMMU_TEST_IOTLB_DEFAULT 0xbadbeef
+ __u32 iotlb;
+};
+
#endif
diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c
index 2205a552e570..bd967317927f 100644
--- a/drivers/iommu/iommufd/selftest.c
+++ b/drivers/iommu/iommufd/selftest.c
@@ -88,6 +88,17 @@ void iommufd_test_syz_conv_iova_id(struct iommufd_ucmd *ucmd,
struct mock_iommu_domain {
struct iommu_domain domain;
struct xarray pfns;
+ bool nested : 1;
+ /* mock domain test data */
+ union {
+ struct { /* nested */
+ struct mock_iommu_domain *parent;
+ u32 iotlb[MOCK_NESTED_DOMAIN_IOTLB_NUM];
+ };
+ struct { /* parent */
+ enum iommu_hwpt_type hwpt_type;
+ };
+ };
};

enum selftest_obj_type {
@@ -147,8 +158,12 @@ static void *mock_domain_hw_info(struct device *dev, u32 *length, u32 *type)
}

static const struct iommu_ops mock_ops;
+static struct iommu_domain_ops domain_nested_ops;

-static struct iommu_domain *mock_domain_alloc(unsigned int iommu_domain_type)
+static struct iommu_domain *
+__mock_domain_alloc_kernel(unsigned int iommu_domain_type,
+ struct mock_iommu_domain *dummy,
+ const struct iommu_hwpt_selftest *user_cfg)
{
struct mock_iommu_domain *mock;

@@ -156,11 +171,11 @@ static struct iommu_domain *mock_domain_alloc(unsigned int iommu_domain_type)
return &mock_blocking_domain;

if (iommu_domain_type != IOMMU_DOMAIN_UNMANAGED)
- return NULL;
+ return ERR_PTR(-EINVAL);

mock = kzalloc(sizeof(*mock), GFP_KERNEL);
if (!mock)
- return NULL;
+ return ERR_PTR(-ENOMEM);
mock->domain.geometry.aperture_start = MOCK_APERTURE_START;
mock->domain.geometry.aperture_end = MOCK_APERTURE_LAST;
mock->domain.pgsize_bitmap = MOCK_IO_PAGE_SIZE;
@@ -170,18 +185,91 @@ static struct iommu_domain *mock_domain_alloc(unsigned int iommu_domain_type)
return &mock->domain;
}

+static struct iommu_domain *
+__mock_domain_alloc_nested(unsigned int iommu_domain_type,
+ struct mock_iommu_domain *mock_parent,
+ const struct iommu_hwpt_selftest *user_cfg)
+{
+ struct mock_iommu_domain *mock;
+ int i;
+
+ if (iommu_domain_type != IOMMU_DOMAIN_NESTED)
+ return ERR_PTR(-EINVAL);
+
+ if (!user_cfg)
+ return ERR_PTR(-EINVAL);
+
+ mock = kzalloc(sizeof(*mock), GFP_KERNEL);
+ if (!mock)
+ return ERR_PTR(-ENOMEM);
+ mock->nested = true;
+ mock->parent = mock_parent;
+ mock->domain.type = iommu_domain_type;
+ mock->domain.ops = &domain_nested_ops;
+ for (i = 0; i < MOCK_NESTED_DOMAIN_IOTLB_NUM; i++)
+ mock->iotlb[i] = user_cfg->iotlb;
+ return &mock->domain;
+}
+
+static struct iommu_domain *mock_domain_alloc(unsigned int iommu_domain_type)
+{
+ struct iommu_domain *domain;
+
+ if (iommu_domain_type != IOMMU_DOMAIN_BLOCKED &&
+ iommu_domain_type != IOMMU_DOMAIN_UNMANAGED)
+ return NULL;
+ domain = __mock_domain_alloc_kernel(iommu_domain_type, NULL, NULL);
+ if (IS_ERR(domain))
+ domain = NULL;
+ return domain;
+}
+
static struct iommu_domain *
mock_domain_alloc_user(struct device *dev, u32 flags,
enum iommu_hwpt_type hwpt_type,
struct iommu_domain *parent,
const struct iommu_user_data *user_data)
{
- struct iommu_domain *domain;
+ struct iommu_domain *(*alloc_fn)(unsigned int iommu_domain_type,
+ struct mock_iommu_domain *mock_parent,
+ const struct iommu_hwpt_selftest *user_cfg);
+ unsigned int iommu_domain_type = IOMMU_DOMAIN_UNMANAGED;
+ struct iommu_hwpt_selftest data, *user_cfg = NULL;
+ struct mock_iommu_domain *mock_parent = NULL;
+ size_t min_len, data_len;
+
+ switch (hwpt_type) {
+ case IOMMU_HWPT_TYPE_DEFAULT:
+ if (user_data || parent)
+ return ERR_PTR(-EINVAL);
+ min_len = data_len = 0;
+ alloc_fn = __mock_domain_alloc_kernel;
+ break;
+ default:
+ /* IOMMU_HWPT_TYPE_SELFTEST cannot be a case for a big value */
+ if (hwpt_type != IOMMU_HWPT_TYPE_SELFTEST)
+ return ERR_PTR(-EINVAL);
+ if (!user_data || !parent ||
+ parent->ops != mock_ops.default_domain_ops)
+ return ERR_PTR(-EINVAL);
+ iommu_domain_type = IOMMU_DOMAIN_NESTED;
+ mock_parent = container_of(parent,
+ struct mock_iommu_domain, domain);
+ min_len = offsetofend(struct iommu_hwpt_selftest, iotlb);
+ data_len = sizeof(struct iommu_hwpt_selftest);
+ alloc_fn = __mock_domain_alloc_nested;
+ break;
+ }

- domain = mock_domain_alloc(IOMMU_DOMAIN_UNMANAGED);
- if (!domain)
- domain = ERR_PTR(-ENOMEM);
- return domain;
+ if (user_data) {
+ int rc = iommu_copy_user_data(&data, user_data,
+ data_len, min_len);
+ if (rc)
+ return ERR_PTR(rc);
+ user_cfg = &data;
+ }
+
+ return alloc_fn(iommu_domain_type, mock_parent, user_cfg);
}

static void mock_domain_free(struct iommu_domain *domain)
@@ -340,6 +428,11 @@ static const struct iommu_ops mock_ops = {
},
};

+static struct iommu_domain_ops domain_nested_ops = {
+ .free = mock_domain_free,
+ .attach_dev = mock_domain_nop_attach,
+};
+
static inline struct iommufd_hw_pagetable *
get_md_pagetable(struct iommufd_ucmd *ucmd, u32 mockpt_id,
struct mock_iommu_domain **mock)
@@ -352,7 +445,10 @@ get_md_pagetable(struct iommufd_ucmd *ucmd, u32 mockpt_id,
if (IS_ERR(obj))
return ERR_CAST(obj);
hwpt = container_of(obj, struct iommufd_hw_pagetable, obj);
- if (hwpt->domain->ops != mock_ops.default_domain_ops) {
+ if ((hwpt->domain->type == IOMMU_DOMAIN_UNMANAGED &&
+ hwpt->domain->ops != mock_ops.default_domain_ops) ||
+ (hwpt->domain->type == IOMMU_DOMAIN_NESTED &&
+ hwpt->domain->ops != &domain_nested_ops)) {
iommufd_put_object(&hwpt->obj);
return ERR_PTR(-EINVAL);
}
--
2.34.1

2023-09-25 17:05:47

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH v4 01/17] iommu: Add hwpt_type with user_data for domain_alloc_user op

On Mon, Sep 25, 2023 at 04:01:55PM +0800, Baolu Lu wrote:
> On 2023/9/25 14:22, Yi Liu wrote:
> > On 2023/9/21 20:10, Baolu Lu wrote:
> > > On 2023/9/21 15:51, Yi Liu wrote:
> > > > +/**
> > > > + * iommu_copy_user_data - Copy iommu driver specific user space data
> > > > + * @dst_data: Pointer to an iommu driver specific user data
> > > > that is defined in
> > > > + *            include/uapi/linux/iommufd.h
> > > > + * @src_data: Pointer to a struct iommu_user_data for user
> > > > space data info
> > > > + * @data_len: Length of current user data structure, i.e.
> > > > sizeof(struct _dst)
> > > > + * @min_len: Initial length of user data structure for backward
> > > > compatibility.
> > > > + *           This should be offsetofend using the last member
> > > > in the user data
> > > > + *           struct that was initially added to
> > > > include/uapi/linux/iommufd.h
> > > > + */
> > > > +static inline int iommu_copy_user_data(void *dst_data,
> > > > +                       const struct iommu_user_data *src_data,
> > > > +                       size_t data_len, size_t min_len)
> > > > +{
> > > > +    if (WARN_ON(!dst_data || !src_data))
> > > > +        return -EINVAL;
> > > > +    if (src_data->len < min_len || data_len < src_data->len)
> > > > +        return -EINVAL;
> > > > +    return copy_struct_from_user(dst_data, data_len,
> > > > +                     src_data->uptr, src_data->len);
> > > > +}
> > >
> > > I am not sure that I understand the purpose of "min_len" correctly. It
> > > seems like it would always be equal to data_len?
> >
> > no, it will not be equal to data_len once there is extension in the
> > uAPI structure.
> >
> > > Or, it means the minimal data length that the iommu driver requires?
> >
> > it is the minimal data length the uAPI requires. min_len is finalized
> > per the upstream of the first version of the uAPI.
>
> So, it looks like a constant. Perhaps we should document it in the
> uapi/iommuf.h and avoid using it as a parameter of a helper
> function?

It is per-driver, per-struct, so this is the right way to do it

Jason

2023-09-25 18:50:17

by Lu Baolu

[permalink] [raw]
Subject: Re: [PATCH v4 01/17] iommu: Add hwpt_type with user_data for domain_alloc_user op

On 2023/9/25 14:22, Yi Liu wrote:
> On 2023/9/21 20:10, Baolu Lu wrote:
>> On 2023/9/21 15:51, Yi Liu wrote:
>>> +/**
>>> + * iommu_copy_user_data - Copy iommu driver specific user space data
>>> + * @dst_data: Pointer to an iommu driver specific user data that is
>>> defined in
>>> + *            include/uapi/linux/iommufd.h
>>> + * @src_data: Pointer to a struct iommu_user_data for user space
>>> data info
>>> + * @data_len: Length of current user data structure, i.e.
>>> sizeof(struct _dst)
>>> + * @min_len: Initial length of user data structure for backward
>>> compatibility.
>>> + *           This should be offsetofend using the last member in the
>>> user data
>>> + *           struct that was initially added to
>>> include/uapi/linux/iommufd.h
>>> + */
>>> +static inline int iommu_copy_user_data(void *dst_data,
>>> +                       const struct iommu_user_data *src_data,
>>> +                       size_t data_len, size_t min_len)
>>> +{
>>> +    if (WARN_ON(!dst_data || !src_data))
>>> +        return -EINVAL;
>>> +    if (src_data->len < min_len || data_len < src_data->len)
>>> +        return -EINVAL;
>>> +    return copy_struct_from_user(dst_data, data_len,
>>> +                     src_data->uptr, src_data->len);
>>> +}
>>
>> I am not sure that I understand the purpose of "min_len" correctly. It
>> seems like it would always be equal to data_len?
>
> no, it will not be equal to data_len once there is extension in the
> uAPI structure.
>
>> Or, it means the minimal data length that the iommu driver requires?
>
> it is the minimal data length the uAPI requires. min_len is finalized
> per the upstream of the first version of the uAPI.

So, it looks like a constant. Perhaps we should document it in the
uapi/iommuf.h and avoid using it as a parameter of a helper function?

Best regards,
baolu

2023-09-25 21:08:57

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH v4 01/17] iommu: Add hwpt_type with user_data for domain_alloc_user op

On Mon, Sep 25, 2023 at 02:17:37PM +0800, Yi Liu wrote:

> yes. It is just what the existing domain_alloc_user op does. Here we add
> a hwpt_type as the type can be given by user, so we need to define a
> specific type for it.
>
> Perhaps we can also name it as IOMMU_HWPT_TYPE_UNMANAGED to be
> aligned with

unmanged is also a weird nonsense name these days. We are slowly
replacing it with paging.

> the domain type naming. IOMMU_HWPT_TYPE_GENERIC is also a good choice.
> Please feel free let me know your preference.

This seems OK to me so far

Jason

2023-09-26 11:14:08

by Tian, Kevin

[permalink] [raw]
Subject: RE: [PATCH v4 07/17] iommufd: Add user-managed hw_pagetable support

> From: Liu, Yi L <[email protected]>
> Sent: Thursday, September 21, 2023 3:51 PM
>
> +static void iommufd_user_managed_hwpt_abort(struct iommufd_object
> *obj)
> +{
> + struct iommufd_hw_pagetable *hwpt =
> + container_of(obj, struct iommufd_hw_pagetable, obj);
> +
> + /* The parent->mutex must be held until finalize is called. */
> + lockdep_assert_held(&hwpt->parent->mutex);
> +
> + iommufd_hw_pagetable_destroy(obj);
> +}

Can you elaborate what exactly is protected by parent->mutex?

My gut-feeling that the child just grabs a refcnt on the parent
object. It doesn't change any other data of the parent.

>
> +/**
> + * iommufd_user_managed_hwpt_alloc() - Get a user-managed
> hw_pagetable
> + * @ictx: iommufd context
> + * @pt_obj: Parent object to an HWPT to associate the domain with
> + * @idev: Device to get an iommu_domain for
> + * @flags: Flags from userspace
> + * @hwpt_type: Requested type of hw_pagetable
> + * @user_data: user_data pointer
> + * @dummy: never used
> + *
> + * Allocate a new iommu_domain (must be IOMMU_DOMAIN_NESTED) and
> return it as
> + * a user-managed hw_pagetable.

Add some text to highlight the requirement being a parent, e.g. not
an auto domain, must be capable of being a parent, etc.

> + case IOMMUFD_OBJ_HW_PAGETABLE:
> + parent = container_of(pt_obj, struct iommufd_hw_pagetable,
> obj);
> + /* No user-managed HWPT on top of an user-managed one
> */
> + if (parent->user_managed) {
> + rc = -EINVAL;
> + goto out_put_pt;
> + }

move to alloc_fn().

2023-09-26 12:11:33

by Tian, Kevin

[permalink] [raw]
Subject: RE: [PATCH v4 08/17] iommufd: Always setup MSI and anforce cc on kernel-managed domains

> From: Liu, Yi L <[email protected]>
> Sent: Thursday, September 21, 2023 3:51 PM
>
> From: Nicolin Chen <[email protected]>
>
> Now enforce_cache_coherency and msi_cookie are kernel-managed hwpt
> things.
> So, they should be only setup on kernel-managed domains. If the attaching
> domain is a user-managed domain, redirect the hwpt to hwpt->parent to do
> it correctly.
>

No redirection. The parent should already have the configuration done
when it's created. It shouldn't be triggered in the nesting path.

2023-09-27 03:59:14

by Tian, Kevin

[permalink] [raw]
Subject: RE: [PATCH v4 01/17] iommu: Add hwpt_type with user_data for domain_alloc_user op

> From: Yi Liu <[email protected]>
> Sent: Thursday, September 21, 2023 3:51 PM
> +
> +/**
> + * iommu_copy_user_data - Copy iommu driver specific user space data
> + * @dst_data: Pointer to an iommu driver specific user data that is defined
> in
> + * include/uapi/linux/iommufd.h
> + * @src_data: Pointer to a struct iommu_user_data for user space data info
> + * @data_len: Length of current user data structure, i.e. sizeof(struct _dst)
> + * @min_len: Initial length of user data structure for backward compatibility.
> + * This should be offsetofend using the last member in the user data
> + * struct that was initially added to include/uapi/linux/iommufd.h
> + */
> +static inline int iommu_copy_user_data(void *dst_data,
> + const struct iommu_user_data *src_data,
> + size_t data_len, size_t min_len)

iommu_copy_struct_from_user()?

btw given the confusion raised on how this would be used is it clearer
to move it to the patch together with the 1st user?

2023-10-07 07:49:06

by Yan Zhao

[permalink] [raw]
Subject: Re: [PATCH v4 09/17] iommufd/device: Add helpers to enforce/remove device reserved regions

> @@ -444,10 +465,9 @@ iommufd_device_do_replace(struct iommufd_device *idev,
> }
>
> old_hwpt = igroup->hwpt;
> - if (hwpt->ioas != old_hwpt->ioas) {
> + if (iommufd_hw_pagetable_compare_ioas(old_hwpt, hwpt)) {
> list_for_each_entry(cur, &igroup->device_list, group_item) {
> - rc = iopt_table_enforce_dev_resv_regions(
> - &hwpt->ioas->iopt, cur->dev, NULL);
> + rc = iommufd_device_enforce_rr(cur, hwpt, NULL);
> if (rc)
> goto err_unresv;
> }
> @@ -461,12 +481,10 @@ iommufd_device_do_replace(struct iommufd_device *idev,
> if (rc)
> goto err_unresv;
>
> - if (hwpt->ioas != old_hwpt->ioas) {
> + if (iommufd_hw_pagetable_compare_ioas(old_hwpt, hwpt)) {
> list_for_each_entry(cur, &igroup->device_list, group_item)
> - iopt_remove_reserved_iova(&old_hwpt->ioas->iopt,
> - cur->dev);
> + iommufd_device_remove_rr(cur, hwpt);
Should be "iommufd_device_remove_rr(cur, old_hwpt);"

> }
> -
> igroup->hwpt = hwpt;
>
> /*

2023-10-07 09:28:12

by Nicolin Chen

[permalink] [raw]
Subject: Re: [PATCH v4 09/17] iommufd/device: Add helpers to enforce/remove device reserved regions

On Sat, Oct 07, 2023 at 03:20:41PM +0800, Yan Zhao wrote:
> > @@ -444,10 +465,9 @@ iommufd_device_do_replace(struct iommufd_device *idev,
> > }
> >
> > old_hwpt = igroup->hwpt;
> > - if (hwpt->ioas != old_hwpt->ioas) {
> > + if (iommufd_hw_pagetable_compare_ioas(old_hwpt, hwpt)) {
> > list_for_each_entry(cur, &igroup->device_list, group_item) {
> > - rc = iopt_table_enforce_dev_resv_regions(
> > - &hwpt->ioas->iopt, cur->dev, NULL);
> > + rc = iommufd_device_enforce_rr(cur, hwpt, NULL);
> > if (rc)
> > goto err_unresv;
> > }
> > @@ -461,12 +481,10 @@ iommufd_device_do_replace(struct iommufd_device *idev,
> > if (rc)
> > goto err_unresv;
> >
> > - if (hwpt->ioas != old_hwpt->ioas) {
> > + if (iommufd_hw_pagetable_compare_ioas(old_hwpt, hwpt)) {
> > list_for_each_entry(cur, &igroup->device_list, group_item)
> > - iopt_remove_reserved_iova(&old_hwpt->ioas->iopt,
> > - cur->dev);
> > + iommufd_device_remove_rr(cur, hwpt);
> Should be "iommufd_device_remove_rr(cur, old_hwpt);"

Ah, right. Should fix this.

Thanks!
Nicolin

2023-10-10 16:53:33

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH v4 00/17] iommufd: Add nesting infrastructure

On Thu, Sep 21, 2023 at 12:51:21AM -0700, Yi Liu wrote:
> v4:
> - Separate HWPT alloc/destroy/abort functions between user-managed HWPTs
> and kernel-managed HWPTs
> - Rework invalidate uAPI to be a multi-request array-based design
> - Add a struct iommu_user_data_array and a helper for driver to sanitize
> and copy the entry data from user space invalidation array
> - Add a patch fixing TEST_LENGTH() in selftest program
> - Drop IOMMU_RESV_IOVA_RANGES patches
> - Update kdoc and inline comments
> - Drop the code to add IOMMU_RESV_SW_MSI to kernel-managed HWPT in nested translation,
> this does not change the rule that resv regions should only be added to the
> kernel-managed HWPT. The IOMMU_RESV_SW_MSI stuff will be added in later series
> as it is needed only by SMMU so far.

This doesn't apply to iommufd next so you will have to resend it

Jason

2023-10-10 16:58:59

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH v4 01/17] iommu: Add hwpt_type with user_data for domain_alloc_user op

On Thu, Sep 21, 2023 at 12:51:22AM -0700, Yi Liu wrote:
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 660dc1931dc9..12e12e5563e6 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -14,6 +14,7 @@
> #include <linux/err.h>
> #include <linux/of.h>
> #include <uapi/linux/iommu.h>
> +#include <uapi/linux/iommufd.h>

Oh we should definately avoid doing that!

Maybe this is a good moment to start a new header file exclusively for
iommu drivers and core subsystem to include?

include/linux/iommu-driver.h

?

Put iommu_copy_user_data() and struct iommu_user_data in there

Avoid this include in this file.

> #define IOMMU_READ (1 << 0)
> #define IOMMU_WRITE (1 << 1)
> @@ -227,6 +228,41 @@ struct iommu_iotlb_gather {
> bool queued;
> };
>
> +/**
> + * struct iommu_user_data - iommu driver specific user space data info
> + * @uptr: Pointer to the user buffer for copy_from_user()
> + * @len: The length of the user buffer in bytes
> + *
> + * A user space data is an uAPI that is defined in include/uapi/linux/iommufd.h
> + * Both @uptr and @len should be just copied from an iommufd core uAPI structure
> + */
> +struct iommu_user_data {
> + void __user *uptr;
> + size_t len;
> +};

Put the "hwpt_type" in here and just call it type

Jason

2023-10-12 08:45:41

by Yi Liu

[permalink] [raw]
Subject: Re: [PATCH v4 00/17] iommufd: Add nesting infrastructure

On 2023/10/11 00:53, Jason Gunthorpe wrote:
> On Thu, Sep 21, 2023 at 12:51:21AM -0700, Yi Liu wrote:
>> v4:
>> - Separate HWPT alloc/destroy/abort functions between user-managed HWPTs
>> and kernel-managed HWPTs
>> - Rework invalidate uAPI to be a multi-request array-based design
>> - Add a struct iommu_user_data_array and a helper for driver to sanitize
>> and copy the entry data from user space invalidation array
>> - Add a patch fixing TEST_LENGTH() in selftest program
>> - Drop IOMMU_RESV_IOVA_RANGES patches
>> - Update kdoc and inline comments
>> - Drop the code to add IOMMU_RESV_SW_MSI to kernel-managed HWPT in nested translation,
>> this does not change the rule that resv regions should only be added to the
>> kernel-managed HWPT. The IOMMU_RESV_SW_MSI stuff will be added in later series
>> as it is needed only by SMMU so far.
>
> This doesn't apply to iommufd next so you will have to resend it

sure. I'm working with Nic on a new version.

--
Regards,
Yi Liu

2023-10-12 09:09:16

by Yi Liu

[permalink] [raw]
Subject: Re: [PATCH v4 01/17] iommu: Add hwpt_type with user_data for domain_alloc_user op

On 2023/10/11 00:58, Jason Gunthorpe wrote:
> On Thu, Sep 21, 2023 at 12:51:22AM -0700, Yi Liu wrote:
>> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
>> index 660dc1931dc9..12e12e5563e6 100644
>> --- a/include/linux/iommu.h
>> +++ b/include/linux/iommu.h
>> @@ -14,6 +14,7 @@
>> #include <linux/err.h>
>> #include <linux/of.h>
>> #include <uapi/linux/iommu.h>
>> +#include <uapi/linux/iommufd.h>
>
> Oh we should definately avoid doing that!
>
> Maybe this is a good moment to start a new header file exclusively for
> iommu drivers and core subsystem to include?
>
> include/linux/iommu-driver.h
>
> ?
>
> Put iommu_copy_user_data() and struct iommu_user_data in there
>
> Avoid this include in this file.

sure. btw. seems all the user of this API and structure are in the
drivers/iommu directory. can we just putting them in
drivers/iommu/iommu-priv.h?

>
>> #define IOMMU_READ (1 << 0)
>> #define IOMMU_WRITE (1 << 1)
>> @@ -227,6 +228,41 @@ struct iommu_iotlb_gather {
>> bool queued;
>> };
>>
>> +/**
>> + * struct iommu_user_data - iommu driver specific user space data info
>> + * @uptr: Pointer to the user buffer for copy_from_user()
>> + * @len: The length of the user buffer in bytes
>> + *
>> + * A user space data is an uAPI that is defined in include/uapi/linux/iommufd.h
>> + * Both @uptr and @len should be just copied from an iommufd core uAPI structure
>> + */
>> +struct iommu_user_data {
>> + void __user *uptr;
>> + size_t len;
>> +};
>
> Put the "hwpt_type" in here and just call it type

I assume this is a change related to the above comment to avoid
including uapi/linux/iommufd.h. right?

Just one concern. There are other paths (like cache_invalidate of
this series and Nic's set_dev_data) uses this struct as well. I'm
a bit worrying if it is good to put type here as type is meaningful
for the domain_alloc_user path.

--
Regards,
Yi Liu

2023-10-12 09:11:12

by Yi Liu

[permalink] [raw]
Subject: Re: [PATCH v4 01/17] iommu: Add hwpt_type with user_data for domain_alloc_user op

On 2023/9/26 14:56, Tian, Kevin wrote:
>> From: Yi Liu <[email protected]>
>> Sent: Thursday, September 21, 2023 3:51 PM
>> +
>> +/**
>> + * iommu_copy_user_data - Copy iommu driver specific user space data
>> + * @dst_data: Pointer to an iommu driver specific user data that is defined
>> in
>> + * include/uapi/linux/iommufd.h
>> + * @src_data: Pointer to a struct iommu_user_data for user space data info
>> + * @data_len: Length of current user data structure, i.e. sizeof(struct _dst)
>> + * @min_len: Initial length of user data structure for backward compatibility.
>> + * This should be offsetofend using the last member in the user data
>> + * struct that was initially added to include/uapi/linux/iommufd.h
>> + */
>> +static inline int iommu_copy_user_data(void *dst_data,
>> + const struct iommu_user_data *src_data,
>> + size_t data_len, size_t min_len)
>
> iommu_copy_struct_from_user()?
>
> btw given the confusion raised on how this would be used is it clearer
> to move it to the patch together with the 1st user?

sure. How about your opinion? @Nic.

--
Regards,
Yi Liu

2023-10-12 13:39:43

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH v4 01/17] iommu: Add hwpt_type with user_data for domain_alloc_user op

On Thu, Oct 12, 2023 at 05:11:09PM +0800, Yi Liu wrote:
> On 2023/10/11 00:58, Jason Gunthorpe wrote:
> > On Thu, Sep 21, 2023 at 12:51:22AM -0700, Yi Liu wrote:
> > > diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> > > index 660dc1931dc9..12e12e5563e6 100644
> > > --- a/include/linux/iommu.h
> > > +++ b/include/linux/iommu.h
> > > @@ -14,6 +14,7 @@
> > > #include <linux/err.h>
> > > #include <linux/of.h>
> > > #include <uapi/linux/iommu.h>
> > > +#include <uapi/linux/iommufd.h>
> >
> > Oh we should definately avoid doing that!
> > Maybe this is a good moment to start a new header file exclusively for
> > iommu drivers and core subsystem to include?
> >
> > include/linux/iommu-driver.h
> >
> > ?
> >
> > Put iommu_copy_user_data() and struct iommu_user_data in there
> >
> > Avoid this include in this file.
>
> sure. btw. seems all the user of this API and structure are in the
> drivers/iommu directory. can we just putting them in
> drivers/iommu/iommu-priv.h?

iommu-priv.h should be private to the core iommu code, and we sort of
extended it to iommufd as well.

iommu-driver.h would be "private" to the core and all the drivers
only.

As include ../.. is often frown on at large scale it is probably
better to be in include/linux

> Just one concern. There are other paths (like cache_invalidate of
> this series and Nic's set_dev_data) uses this struct as well. I'm
> a bit worrying if it is good to put type here as type is meaningful
> for the domain_alloc_user path.

There is always a type though? I haven't got that far in the series
yet to see..

Jason

2023-10-13 00:35:57

by Nicolin Chen

[permalink] [raw]
Subject: Re: [PATCH v4 01/17] iommu: Add hwpt_type with user_data for domain_alloc_user op

On Tue, Oct 10, 2023 at 01:58:44PM -0300, Jason Gunthorpe wrote:
> On Thu, Sep 21, 2023 at 12:51:22AM -0700, Yi Liu wrote:
> > diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> > index 660dc1931dc9..12e12e5563e6 100644
> > --- a/include/linux/iommu.h
> > +++ b/include/linux/iommu.h
> > @@ -14,6 +14,7 @@
> > #include <linux/err.h>
> > #include <linux/of.h>
> > #include <uapi/linux/iommu.h>
> > +#include <uapi/linux/iommufd.h>
>
> Oh we should definately avoid doing that!
>
> Maybe this is a good moment to start a new header file exclusively for
> iommu drivers and core subsystem to include?
>
> include/linux/iommu-driver.h
>
> ?
>
> Put iommu_copy_user_data() and struct iommu_user_data in there
>
> Avoid this include in this file.

By looking closer, it seems that we included the uapi header for:
+ struct iommu_domain *(*domain_alloc_user)(struct device *dev, u32 flags,
+ enum iommu_hwpt_data_type data_type,
+ struct iommu_domain *parent,
+ const struct iommu_user_data *user_data);

So we could drop the include, and instead add this next to structs:
+enum iommu_hwpt_data_type;

Then it's not that necessary to have a new header? We could mark a
section of "driver exclusively functions" in iommu.h, I think.

> > #define IOMMU_READ (1 << 0)
> > #define IOMMU_WRITE (1 << 1)
> > @@ -227,6 +228,41 @@ struct iommu_iotlb_gather {
> > bool queued;
> > };
> >
> > +/**
> > + * struct iommu_user_data - iommu driver specific user space data info
> > + * @uptr: Pointer to the user buffer for copy_from_user()
> > + * @len: The length of the user buffer in bytes
> > + *
> > + * A user space data is an uAPI that is defined in include/uapi/linux/iommufd.h
> > + * Both @uptr and @len should be just copied from an iommufd core uAPI structure
> > + */
> > +struct iommu_user_data {
> > + void __user *uptr;
> > + size_t len;
> > +};
>
> Put the "hwpt_type" in here and just call it type

Ah, then domain_alloc_user can omit the enum iommu_hwpt_data_type.

Thanks!
Nic

2023-10-13 04:32:13

by Yi Liu

[permalink] [raw]
Subject: Re: [PATCH v4 01/17] iommu: Add hwpt_type with user_data for domain_alloc_user op

On 2023/10/12 21:39, Jason Gunthorpe wrote:
> On Thu, Oct 12, 2023 at 05:11:09PM +0800, Yi Liu wrote:
>> On 2023/10/11 00:58, Jason Gunthorpe wrote:
>>> On Thu, Sep 21, 2023 at 12:51:22AM -0700, Yi Liu wrote:
>>>> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
>>>> index 660dc1931dc9..12e12e5563e6 100644
>>>> --- a/include/linux/iommu.h
>>>> +++ b/include/linux/iommu.h
>>>> @@ -14,6 +14,7 @@
>>>> #include <linux/err.h>
>>>> #include <linux/of.h>
>>>> #include <uapi/linux/iommu.h>
>>>> +#include <uapi/linux/iommufd.h>
>>>
>>> Oh we should definately avoid doing that!
>>> Maybe this is a good moment to start a new header file exclusively for
>>> iommu drivers and core subsystem to include?
>>>
>>> include/linux/iommu-driver.h
>>>
>>> ?
>>>
>>> Put iommu_copy_user_data() and struct iommu_user_data in there
>>>
>>> Avoid this include in this file.
>>
>> sure. btw. seems all the user of this API and structure are in the
>> drivers/iommu directory. can we just putting them in
>> drivers/iommu/iommu-priv.h?
>
> iommu-priv.h should be private to the core iommu code, and we sort of
> extended it to iommufd as well.
>
> iommu-driver.h would be "private" to the core and all the drivers
> only.
>
> As include ../.. is often frown on at large scale it is probably
> better to be in include/linux

got it.

>
>> Just one concern. There are other paths (like cache_invalidate of
>> this series and Nic's set_dev_data) uses this struct as well. I'm
>> a bit worrying if it is good to put type here as type is meaningful
>> for the domain_alloc_user path.
>
> There is always a type though? I haven't got that far in the series
> yet to see..

not really. Below the users of the struct iommu_user_data in my current
iommufd_nesting branch. Only the domain_alloc_user op has type as there
can be multiple vendor specific alloc data types. Basically, I'm ok to
make the change you suggested, just not sure if it is good to add type
as it is only needed by one path.

/* Domain allocation and freeing by the iommu driver */
struct iommu_domain *(*domain_alloc)(unsigned iommu_domain_type);
struct iommu_domain *(*domain_alloc_user)(struct device *dev, u32 flags,
enum iommu_hwpt_type hwpt_type,
struct iommu_domain *parent,
const struct iommu_user_data *user_data);


int (*set_dev_user_data)(struct device *dev,
const struct iommu_user_data *user_data);
void (*unset_dev_user_data)(struct device *dev);


int (*cache_invalidate_user)(struct iommu_domain *domain,
struct iommu_user_data_array *array,
u32 *error_code);

above code snippet is from below file:

https://github.com/yiliu1765/iommufd/blob/iommufd_nesting/include/linux/iommu.h

--
Regards,
Yi Liu

2023-10-13 11:42:18

by Yi Liu

[permalink] [raw]
Subject: Re: [PATCH v4 01/17] iommu: Add hwpt_type with user_data for domain_alloc_user op

On 2023/10/12 17:12, Yi Liu wrote:
> On 2023/9/26 14:56, Tian, Kevin wrote:
>>> From: Yi Liu <[email protected]>
>>> Sent: Thursday, September 21, 2023 3:51 PM
>>> +
>>> +/**
>>> + * iommu_copy_user_data - Copy iommu driver specific user space data
>>> + * @dst_data: Pointer to an iommu driver specific user data that is
>>> defined
>>> in
>>> + *            include/uapi/linux/iommufd.h
>>> + * @src_data: Pointer to a struct iommu_user_data for user space data info
>>> + * @data_len: Length of current user data structure, i.e. sizeof(struct
>>> _dst)
>>> + * @min_len: Initial length of user data structure for backward
>>> compatibility.
>>> + *           This should be offsetofend using the last member in the
>>> user data
>>> + *           struct that was initially added to
>>> include/uapi/linux/iommufd.h
>>> + */
>>> +static inline int iommu_copy_user_data(void *dst_data,
>>> +                       const struct iommu_user_data *src_data,
>>> +                       size_t data_len, size_t min_len)
>>
>> iommu_copy_struct_from_user()?
>>
>> btw given the confusion raised on how this would be used is it clearer
>> to move it to the patch together with the 1st user?
>
> sure. How about your opinion? @Nic.
>

after a second thinking, the first user of this helper is the patch to
extend mock iommu driver. Is it suitable to introduce a common API together
with selftest code?

https://lore.kernel.org/linux-iommu/[email protected]/

--
Regards,
Yi Liu

2023-10-13 14:04:25

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH v4 01/17] iommu: Add hwpt_type with user_data for domain_alloc_user op

On Thu, Oct 12, 2023 at 05:34:58PM -0700, Nicolin Chen wrote:
> On Tue, Oct 10, 2023 at 01:58:44PM -0300, Jason Gunthorpe wrote:
> > On Thu, Sep 21, 2023 at 12:51:22AM -0700, Yi Liu wrote:
> > > diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> > > index 660dc1931dc9..12e12e5563e6 100644
> > > --- a/include/linux/iommu.h
> > > +++ b/include/linux/iommu.h
> > > @@ -14,6 +14,7 @@
> > > #include <linux/err.h>
> > > #include <linux/of.h>
> > > #include <uapi/linux/iommu.h>
> > > +#include <uapi/linux/iommufd.h>
> >
> > Oh we should definately avoid doing that!
> >
> > Maybe this is a good moment to start a new header file exclusively for
> > iommu drivers and core subsystem to include?
> >
> > include/linux/iommu-driver.h
> >
> > ?
> >
> > Put iommu_copy_user_data() and struct iommu_user_data in there
> >
> > Avoid this include in this file.
>
> By looking closer, it seems that we included the uapi header for:
> + struct iommu_domain *(*domain_alloc_user)(struct device *dev, u32 flags,
> + enum iommu_hwpt_data_type data_type,
> + struct iommu_domain *parent,
> + const struct iommu_user_data *user_data);
>
> So we could drop the include, and instead add this next to structs:
> +enum iommu_hwpt_data_type;
>
> Then it's not that necessary to have a new header? We could mark a
> section of "driver exclusively functions" in iommu.h, I think.

Yeah, OK, though I still have a desire to split this header..

(though can you really forward declare enums and then pass it by value?)

Jason

2023-10-13 14:05:28

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH v4 01/17] iommu: Add hwpt_type with user_data for domain_alloc_user op

On Fri, Oct 13, 2023 at 12:33:13PM +0800, Yi Liu wrote:

> not really. Below the users of the struct iommu_user_data in my current
> iommufd_nesting branch. Only the domain_alloc_user op has type as there
> can be multiple vendor specific alloc data types. Basically, I'm ok to
> make the change you suggested, just not sure if it is good to add type
> as it is only needed by one path.

I don't think we should ever have an opaque data blob without a type
tag..

Jason

2023-10-13 17:57:39

by Nicolin Chen

[permalink] [raw]
Subject: Re: [PATCH v4 01/17] iommu: Add hwpt_type with user_data for domain_alloc_user op

On Fri, Oct 13, 2023 at 11:04:56AM -0300, Jason Gunthorpe wrote:
> On Fri, Oct 13, 2023 at 12:33:13PM +0800, Yi Liu wrote:
>
> > not really. Below the users of the struct iommu_user_data in my current
> > iommufd_nesting branch. Only the domain_alloc_user op has type as there
> > can be multiple vendor specific alloc data types. Basically, I'm ok to
> > make the change you suggested, just not sure if it is good to add type
> > as it is only needed by one path.
>
> I don't think we should ever have an opaque data blob without a type
> tag..

I can add those "missing" data types, and then a driver will be
responsible for sanitizing the type along with the data_len.

I notice that the enum iommu_hwpt_data_type in the posted patch
is confined to the alloc_user uAPI. Perhaps we should share it
with invalidate too:

/**
* enum iommu_hwpt_data_type - IOMMU HWPT Data Type
* @IOMMU_HWPT_DATA_NONE: no data
* @IOMMU_HWPT_DATA_VTD_S1: Intel VT-d stage-1 page table
* @IOMMU_HWPT_DATA_ARM_SMMUV3: ARM SMMUv3 Context Descriptor Table
*/
enum iommu_hwpt_data_type {
IOMMU_HWPT_DATA_NONE,
IOMMU_HWPT_DATA_VTD_S1,
IOMMU_HWPT_DATA_ARM_SMMUV3,
};

Though inevitably we'd have to define a separate data group for
things like set_dev_data that is related to idev v.s. hwpt:

// IOMMU_DEV_DATA_TYPE sounds like an IOMMU device, other than a
// passthrough device, so renaming to "_IDEV_" here. And perhaps
// "set_dev_data" could be "set_idev_data" too? Any better name?

/**
* enum iommu_idev_data_type - Data Type for a Device behind an IOMMU
* @IOMMU_IDEV_DATA_NONE: no data
* @IOMMU_IDEV_DATA_ARM_SMMUV3: ARM SMMUv3 specific device data
*/
enum iommu_idev_data_type {
IOMMU_IDEV_DATA_NONE,
IOMMU_IDEV_DATA_ARM_SMMUV3,
};

/**
* struct iommu_idev_data_arm_smmuv3 - ARM SMMUv3 specific device data
* @sid: The Stream ID that is assigned in the user space
*
* The SMMUv3 specific user space data for a device that is behind an SMMU HW.
* The guest-level user data should be linked to the host-level kernel data,
* which will be used by user space cache invalidation commands.
*/
struct iommu_idev_data_arm_smmuv3 {
__u32 sid;
};

/**
* struct iommu_set_idev_data - ioctl(IOMMU_SET_IDEV_DATA)
* @size: sizeof(struct iommu_set_idev_data)
* @dev_id: The device to set an iommu specific device data
* @data_uptr: User pointer of the device user data
* @data_len: Length of the device user data
*
* The device data must be unset using ioctl(IOMMU_UNSET_IDEV_DATA), before
* another ioctl(IOMMU_SET_IDEV_DATA) call or before the device itself gets
* unbind'd from the iommufd context.
*/
struct iommu_set_idev_data {
__u32 size;
__u32 dev_id;
__aligned_u64 data_uptr;
__u32 data_len;
};

Thanks
Nic

2023-10-13 22:22:55

by Nicolin Chen

[permalink] [raw]
Subject: Re: [PATCH v4 01/17] iommu: Add hwpt_type with user_data for domain_alloc_user op

On Fri, Oct 13, 2023 at 07:42:50PM +0800, Yi Liu wrote:
> On 2023/10/12 17:12, Yi Liu wrote:
> > On 2023/9/26 14:56, Tian, Kevin wrote:
> > > > From: Yi Liu <[email protected]>
> > > > Sent: Thursday, September 21, 2023 3:51 PM
> > > > +
> > > > +/**
> > > > + * iommu_copy_user_data - Copy iommu driver specific user space data
> > > > + * @dst_data: Pointer to an iommu driver specific user data that is
> > > > defined
> > > > in
> > > > + * include/uapi/linux/iommufd.h
> > > > + * @src_data: Pointer to a struct iommu_user_data for user space data info
> > > > + * @data_len: Length of current user data structure, i.e. sizeof(struct
> > > > _dst)
> > > > + * @min_len: Initial length of user data structure for backward
> > > > compatibility.
> > > > + * This should be offsetofend using the last member in the
> > > > user data
> > > > + * struct that was initially added to
> > > > include/uapi/linux/iommufd.h
> > > > + */
> > > > +static inline int iommu_copy_user_data(void *dst_data,
> > > > + const struct iommu_user_data *src_data,
> > > > + size_t data_len, size_t min_len)
> > >
> > > iommu_copy_struct_from_user()?
> > >
> > > btw given the confusion raised on how this would be used is it clearer
> > > to move it to the patch together with the 1st user?
> >
> > sure. How about your opinion? @Nic.
> >
>
> after a second thinking, the first user of this helper is the patch to
> extend mock iommu driver. Is it suitable to introduce a common API together
> with selftest code?
>
> https://lore.kernel.org/linux-iommu/[email protected]/

I feel no...

But I could separate iommu_copy_struct_from_user and its array
drivitive into an additional patch placed before the selftest
changes, so at least it would be closer to the first callers.

Thanks
Nic

2023-10-14 00:09:22

by Nicolin Chen

[permalink] [raw]
Subject: Re: [PATCH v4 07/17] iommufd: Add user-managed hw_pagetable support

On Tue, Sep 26, 2023 at 01:14:47AM -0700, Tian, Kevin wrote:
> > From: Liu, Yi L <[email protected]>
> > Sent: Thursday, September 21, 2023 3:51 PM
> >
> > +static void iommufd_user_managed_hwpt_abort(struct iommufd_object
> > *obj)
> > +{
> > + struct iommufd_hw_pagetable *hwpt =
> > + container_of(obj, struct iommufd_hw_pagetable, obj);
> > +
> > + /* The parent->mutex must be held until finalize is called. */
> > + lockdep_assert_held(&hwpt->parent->mutex);
> > +
> > + iommufd_hw_pagetable_destroy(obj);
> > +}
>
> Can you elaborate what exactly is protected by parent->mutex?
>
> My gut-feeling that the child just grabs a refcnt on the parent
> object. It doesn't change any other data of the parent.

Ah, you are right. It's added here just for symmetry so we wouldn't
end up with something like:
if (!hwpt->user_managed)
mutex_lock(&hwpt->mutex);
alloc_fn();
if (!hwpt->user_managed)
mutex_unlock(&hwpt->mutex);

Perhaps I should move the pair of mutex calls to the kernel-managed
hwpt allocator.

> > +/**
> > + * iommufd_user_managed_hwpt_alloc() - Get a user-managed
> > hw_pagetable
> > + * @ictx: iommufd context
> > + * @pt_obj: Parent object to an HWPT to associate the domain with
> > + * @idev: Device to get an iommu_domain for
> > + * @flags: Flags from userspace
> > + * @hwpt_type: Requested type of hw_pagetable
> > + * @user_data: user_data pointer
> > + * @dummy: never used
> > + *
> > + * Allocate a new iommu_domain (must be IOMMU_DOMAIN_NESTED) and
> > return it as
> > + * a user-managed hw_pagetable.
>
> Add some text to highlight the requirement being a parent, e.g. not
> an auto domain, must be capable of being a parent, etc.

OK.

> > + case IOMMUFD_OBJ_HW_PAGETABLE:
> > + parent = container_of(pt_obj, struct iommufd_hw_pagetable,
> > obj);
> > + /* No user-managed HWPT on top of an user-managed one
> > */
> > + if (parent->user_managed) {
> > + rc = -EINVAL;
> > + goto out_put_pt;
> > + }
>
> move to alloc_fn().

Though being a bit covert, this is actually to avoid a data buffer
allocation in the common pathway before calling alloc_fn(), which
is added in the following patch. And the reason why it's in the
common function is because we previously support a kernel-managed
hwpt allocation with data too.

But now, I think we can just move this sanity and data allocation
together into the user-managed hwpt allocator.

Thanks
Nicolin

2023-10-14 00:46:41

by Nicolin Chen

[permalink] [raw]
Subject: Re: [PATCH v4 08/17] iommufd: Always setup MSI and anforce cc on kernel-managed domains

On Tue, Sep 26, 2023 at 01:16:35AM -0700, Tian, Kevin wrote:
> > From: Liu, Yi L <[email protected]>
> > Sent: Thursday, September 21, 2023 3:51 PM
> >
> > From: Nicolin Chen <[email protected]>
> >
> > Now enforce_cache_coherency and msi_cookie are kernel-managed hwpt
> > things.
> > So, they should be only setup on kernel-managed domains. If the attaching
> > domain is a user-managed domain, redirect the hwpt to hwpt->parent to do
> > it correctly.
> >
>
> No redirection. The parent should already have the configuration done
> when it's created. It shouldn't be triggered in the nesting path.

iommufd_hw_pagetable_enforce_cc() is not only called in alloc(),
but also in hwpt_attach/replace() if cc is not enforced by the
alloc() because the idev that initiates the hwpt_alloc() might
not have idev->enforce_cache_coherency. Only when another idev
that has idev->enforce_cache_coherency attaches to the shared
hwpt, the cc configuration would be done.

In a nesting case encountering the same situation above, the S2
hwpt is allocated without the iommufd_hw_pagetable_enforce_cc().
But the 2nd idev that has idev->enforce_cache_coherency might
attach directly to the S1 domain/hwpt without touching the S2
domain (for the same VM, S2 domain can be shared). In this case,
without a redirection, the iommufd_hw_pagetable_enforce_cc()
would be missed.

Any thought?

Nic

2023-10-16 03:26:39

by Yi Liu

[permalink] [raw]
Subject: Re: [PATCH v4 01/17] iommu: Add hwpt_type with user_data for domain_alloc_user op

On 2023/10/14 01:56, Nicolin Chen wrote:
> On Fri, Oct 13, 2023 at 11:04:56AM -0300, Jason Gunthorpe wrote:
>> On Fri, Oct 13, 2023 at 12:33:13PM +0800, Yi Liu wrote:
>>
>>> not really. Below the users of the struct iommu_user_data in my current
>>> iommufd_nesting branch. Only the domain_alloc_user op has type as there
>>> can be multiple vendor specific alloc data types. Basically, I'm ok to
>>> make the change you suggested, just not sure if it is good to add type
>>> as it is only needed by one path.
>>
>> I don't think we should ever have an opaque data blob without a type
>> tag..
>
> I can add those "missing" data types, and then a driver will be
> responsible for sanitizing the type along with the data_len.
>
> I notice that the enum iommu_hwpt_data_type in the posted patch
> is confined to the alloc_user uAPI. Perhaps we should share it
> with invalidate too:

invalidation path does not need a type field today as the data
type is vendor specific, vendor driver should know the data type
when calls in.

>
> /**
> * enum iommu_hwpt_data_type - IOMMU HWPT Data Type
> * @IOMMU_HWPT_DATA_NONE: no data
> * @IOMMU_HWPT_DATA_VTD_S1: Intel VT-d stage-1 page table
> * @IOMMU_HWPT_DATA_ARM_SMMUV3: ARM SMMUv3 Context Descriptor Table
> */
> enum iommu_hwpt_data_type {
> IOMMU_HWPT_DATA_NONE,
> IOMMU_HWPT_DATA_VTD_S1,
> IOMMU_HWPT_DATA_ARM_SMMUV3,
> };
>
> Though inevitably we'd have to define a separate data group for
> things like set_dev_data that is related to idev v.s. hwpt:

yes, the type field is in separate data group per path.

>
> // IOMMU_DEV_DATA_TYPE sounds like an IOMMU device, other than a
> // passthrough device, so renaming to "_IDEV_" here. And perhaps
> // "set_dev_data" could be "set_idev_data" too? Any better name?
>
> /**
> * enum iommu_idev_data_type - Data Type for a Device behind an IOMMU
> * @IOMMU_IDEV_DATA_NONE: no data
> * @IOMMU_IDEV_DATA_ARM_SMMUV3: ARM SMMUv3 specific device data
> */
> enum iommu_idev_data_type {
> IOMMU_IDEV_DATA_NONE,
> IOMMU_IDEV_DATA_ARM_SMMUV3,
> };
>
> /**
> * struct iommu_idev_data_arm_smmuv3 - ARM SMMUv3 specific device data
> * @sid: The Stream ID that is assigned in the user space
> *
> * The SMMUv3 specific user space data for a device that is behind an SMMU HW.
> * The guest-level user data should be linked to the host-level kernel data,
> * which will be used by user space cache invalidation commands.
> */
> struct iommu_idev_data_arm_smmuv3 {
> __u32 sid;
> };
>
> /**
> * struct iommu_set_idev_data - ioctl(IOMMU_SET_IDEV_DATA)
> * @size: sizeof(struct iommu_set_idev_data)
> * @dev_id: The device to set an iommu specific device data
> * @data_uptr: User pointer of the device user data
> * @data_len: Length of the device user data
> *
> * The device data must be unset using ioctl(IOMMU_UNSET_IDEV_DATA), before
> * another ioctl(IOMMU_SET_IDEV_DATA) call or before the device itself gets
> * unbind'd from the iommufd context.
> */
> struct iommu_set_idev_data {
> __u32 size;
> __u32 dev_id;
> __aligned_u64 data_uptr;
> __u32 data_len;
> };
>
> Thanks
> Nic

--
Regards,
Yi Liu

2023-10-16 08:49:06

by Tian, Kevin

[permalink] [raw]
Subject: RE: [PATCH v4 08/17] iommufd: Always setup MSI and anforce cc on kernel-managed domains

> From: Nicolin Chen <[email protected]>
> Sent: Saturday, October 14, 2023 8:45 AM
>
> On Tue, Sep 26, 2023 at 01:16:35AM -0700, Tian, Kevin wrote:
> > > From: Liu, Yi L <[email protected]>
> > > Sent: Thursday, September 21, 2023 3:51 PM
> > >
> > > From: Nicolin Chen <[email protected]>
> > >
> > > Now enforce_cache_coherency and msi_cookie are kernel-managed hwpt
> > > things.
> > > So, they should be only setup on kernel-managed domains. If the
> attaching
> > > domain is a user-managed domain, redirect the hwpt to hwpt->parent to
> do
> > > it correctly.
> > >
> >
> > No redirection. The parent should already have the configuration done
> > when it's created. It shouldn't be triggered in the nesting path.
>
> iommufd_hw_pagetable_enforce_cc() is not only called in alloc(),
> but also in hwpt_attach/replace() if cc is not enforced by the
> alloc() because the idev that initiates the hwpt_alloc() might
> not have idev->enforce_cache_coherency. Only when another idev
> that has idev->enforce_cache_coherency attaches to the shared
> hwpt, the cc configuration would be done.
>

is this a bug already? If the 1st device doesn't have enforce_cc in its
iommu, setting the snp bit in the hwpt would lead to reserved
bit violation.

another problem is that intel_iommu_enforce_cache_coherency()
doesn't update existing entries. It only sets a domain flag to affect
future mappings. so it means the 2nd idev is also broken.

The simplest option is to follow vfio type1 i.e. don't mix devices
with different enforce_cc in one domain.

2023-10-16 11:54:36

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH v4 01/17] iommu: Add hwpt_type with user_data for domain_alloc_user op

On Mon, Oct 16, 2023 at 11:28:15AM +0800, Yi Liu wrote:
> On 2023/10/14 01:56, Nicolin Chen wrote:
> > On Fri, Oct 13, 2023 at 11:04:56AM -0300, Jason Gunthorpe wrote:
> > > On Fri, Oct 13, 2023 at 12:33:13PM +0800, Yi Liu wrote:
> > >
> > > > not really. Below the users of the struct iommu_user_data in my current
> > > > iommufd_nesting branch. Only the domain_alloc_user op has type as there
> > > > can be multiple vendor specific alloc data types. Basically, I'm ok to
> > > > make the change you suggested, just not sure if it is good to add type
> > > > as it is only needed by one path.
> > >
> > > I don't think we should ever have an opaque data blob without a type
> > > tag..
> >
> > I can add those "missing" data types, and then a driver will be
> > responsible for sanitizing the type along with the data_len.
> >
> > I notice that the enum iommu_hwpt_data_type in the posted patch
> > is confined to the alloc_user uAPI. Perhaps we should share it
> > with invalidate too:
>
> invalidation path does not need a type field today as the data
> type is vendor specific, vendor driver should know the data type
> when calls in.

I'm not keen on that, what if a driver needs another type in the
future? You'd want to make the invalidation data format part of the
domain allocation?

Jason

2023-10-16 11:58:02

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH v4 08/17] iommufd: Always setup MSI and anforce cc on kernel-managed domains

On Mon, Oct 16, 2023 at 08:48:03AM +0000, Tian, Kevin wrote:
> > From: Nicolin Chen <[email protected]>
> > Sent: Saturday, October 14, 2023 8:45 AM
> >
> > On Tue, Sep 26, 2023 at 01:16:35AM -0700, Tian, Kevin wrote:
> > > > From: Liu, Yi L <[email protected]>
> > > > Sent: Thursday, September 21, 2023 3:51 PM
> > > >
> > > > From: Nicolin Chen <[email protected]>
> > > >
> > > > Now enforce_cache_coherency and msi_cookie are kernel-managed hwpt
> > > > things.
> > > > So, they should be only setup on kernel-managed domains. If the
> > attaching
> > > > domain is a user-managed domain, redirect the hwpt to hwpt->parent to
> > do
> > > > it correctly.
> > > >
> > >
> > > No redirection. The parent should already have the configuration done
> > > when it's created. It shouldn't be triggered in the nesting path.
> >
> > iommufd_hw_pagetable_enforce_cc() is not only called in alloc(),
> > but also in hwpt_attach/replace() if cc is not enforced by the
> > alloc() because the idev that initiates the hwpt_alloc() might
> > not have idev->enforce_cache_coherency. Only when another idev
> > that has idev->enforce_cache_coherency attaches to the shared
> > hwpt, the cc configuration would be done.
>
> is this a bug already? If the 1st device doesn't have enforce_cc in its
> iommu, setting the snp bit in the hwpt would lead to reserved
> bit violation.

I suspect there are technically some gaps in the intel driver, yes..

> another problem is that intel_iommu_enforce_cache_coherency()
> doesn't update existing entries. It only sets a domain flag to affect
> future mappings. so it means the 2nd idev is also broken.

This is such a gap, intel driver should not permit that.

> The simplest option is to follow vfio type1 i.e. don't mix devices
> with different enforce_cc in one domain.

This is why I wanted to get rid of this bad mechanism going forward.

Manually created hwpt should have a manual specification of cc and
then we don't have so many problems.

It means userspace needs to compute if they want to use CC or not, but
userspace already needs to figure this out since without autodomains
it must create two hwpts manually anyhow.

Jason

2023-10-16 18:18:39

by Nicolin Chen

[permalink] [raw]
Subject: Re: [PATCH v4 01/17] iommu: Add hwpt_type with user_data for domain_alloc_user op

On Mon, Oct 16, 2023 at 08:54:07AM -0300, Jason Gunthorpe wrote:
> On Mon, Oct 16, 2023 at 11:28:15AM +0800, Yi Liu wrote:
> > On 2023/10/14 01:56, Nicolin Chen wrote:
> > > On Fri, Oct 13, 2023 at 11:04:56AM -0300, Jason Gunthorpe wrote:
> > > > On Fri, Oct 13, 2023 at 12:33:13PM +0800, Yi Liu wrote:
> > > >
> > > > > not really. Below the users of the struct iommu_user_data in my current
> > > > > iommufd_nesting branch. Only the domain_alloc_user op has type as there
> > > > > can be multiple vendor specific alloc data types. Basically, I'm ok to
> > > > > make the change you suggested, just not sure if it is good to add type
> > > > > as it is only needed by one path.
> > > >
> > > > I don't think we should ever have an opaque data blob without a type
> > > > tag..
> > >
> > > I can add those "missing" data types, and then a driver will be
> > > responsible for sanitizing the type along with the data_len.
> > >
> > > I notice that the enum iommu_hwpt_data_type in the posted patch
> > > is confined to the alloc_user uAPI. Perhaps we should share it
> > > with invalidate too:
> >
> > invalidation path does not need a type field today as the data
> > type is vendor specific, vendor driver should know the data type
> > when calls in.
>
> I'm not keen on that, what if a driver needs another type in the
> future? You'd want to make the invalidation data format part of the
> domain allocation?

The invalidation data has hwpt_id so it's tied to a hwpt and its
hwpt->domain. Would it be reasonable to have a different type of
invalidation data for the same type of hwpt?

With this being asked, I added it for our next version. At this
moment, it only does a sanity job:

// API
__iommu_copy_struct_from_user(void *dst_data,
const struct iommu_user_data *src_data,
unsigned int data_type, size_t data_len,
size_t min_len)
{
if (src_data->type != data_type)
return -EINVAL;

// Caller
rc = iommu_copy_struct_from_user(&user_cfg, user_data,
IOMMU_HWPT_DATA_SELFTEST, iotlb);
if (rc)
return ERR_PTR(rc);

Thanks
Nic

2023-10-17 08:51:01

by Yi Liu

[permalink] [raw]
Subject: Re: [PATCH v4 01/17] iommu: Add hwpt_type with user_data for domain_alloc_user op

On 2023/10/17 02:17, Nicolin Chen wrote:
> On Mon, Oct 16, 2023 at 08:54:07AM -0300, Jason Gunthorpe wrote:
>> On Mon, Oct 16, 2023 at 11:28:15AM +0800, Yi Liu wrote:
>>> On 2023/10/14 01:56, Nicolin Chen wrote:
>>>> On Fri, Oct 13, 2023 at 11:04:56AM -0300, Jason Gunthorpe wrote:
>>>>> On Fri, Oct 13, 2023 at 12:33:13PM +0800, Yi Liu wrote:
>>>>>
>>>>>> not really. Below the users of the struct iommu_user_data in my current
>>>>>> iommufd_nesting branch. Only the domain_alloc_user op has type as there
>>>>>> can be multiple vendor specific alloc data types. Basically, I'm ok to
>>>>>> make the change you suggested, just not sure if it is good to add type
>>>>>> as it is only needed by one path.
>>>>>
>>>>> I don't think we should ever have an opaque data blob without a type
>>>>> tag..
>>>>
>>>> I can add those "missing" data types, and then a driver will be
>>>> responsible for sanitizing the type along with the data_len.
>>>>
>>>> I notice that the enum iommu_hwpt_data_type in the posted patch
>>>> is confined to the alloc_user uAPI. Perhaps we should share it
>>>> with invalidate too:
>>>
>>> invalidation path does not need a type field today as the data
>>> type is vendor specific, vendor driver should know the data type
>>> when calls in.
>>
>> I'm not keen on that, what if a driver needs another type in the
>> future? You'd want to make the invalidation data format part of the
>> domain allocation?
>
> The invalidation data has hwpt_id so it's tied to a hwpt and its
> hwpt->domain. Would it be reasonable to have a different type of
> invalidation data for the same type of hwpt?

this seems like what Jason asks. A type of hwpt can have two kinds
of invalidation data types. Is it really possible?

> With this being asked, I added it for our next version. At this
> moment, it only does a sanity job:
>
> // API
> __iommu_copy_struct_from_user(void *dst_data,
> const struct iommu_user_data *src_data,
> unsigned int data_type, size_t data_len,
> size_t min_len)
> {
> if (src_data->type != data_type)
> return -EINVAL;
>
> // Caller
> rc = iommu_copy_struct_from_user(&user_cfg, user_data,
> IOMMU_HWPT_DATA_SELFTEST, iotlb);
> if (rc)
> return ERR_PTR(rc);
>
> Thanks
> Nic

--
Regards,
Yi Liu

2023-10-17 08:53:20

by Tian, Kevin

[permalink] [raw]
Subject: RE: [PATCH v4 08/17] iommufd: Always setup MSI and anforce cc on kernel-managed domains

> From: Jason Gunthorpe <[email protected]>
> Sent: Monday, October 16, 2023 7:58 PM
>
> On Mon, Oct 16, 2023 at 08:48:03AM +0000, Tian, Kevin wrote:
> > > From: Nicolin Chen <[email protected]>
> > > Sent: Saturday, October 14, 2023 8:45 AM
> > >
> > > On Tue, Sep 26, 2023 at 01:16:35AM -0700, Tian, Kevin wrote:
> > > > > From: Liu, Yi L <[email protected]>
> > > > > Sent: Thursday, September 21, 2023 3:51 PM
> > > > >
> > > > > From: Nicolin Chen <[email protected]>
> > > > >
> > > > > Now enforce_cache_coherency and msi_cookie are kernel-managed
> hwpt
> > > > > things.
> > > > > So, they should be only setup on kernel-managed domains. If the
> > > attaching
> > > > > domain is a user-managed domain, redirect the hwpt to hwpt->parent
> to
> > > do
> > > > > it correctly.
> > > > >
> > > >
> > > > No redirection. The parent should already have the configuration done
> > > > when it's created. It shouldn't be triggered in the nesting path.
> > >
> > > iommufd_hw_pagetable_enforce_cc() is not only called in alloc(),
> > > but also in hwpt_attach/replace() if cc is not enforced by the
> > > alloc() because the idev that initiates the hwpt_alloc() might
> > > not have idev->enforce_cache_coherency. Only when another idev
> > > that has idev->enforce_cache_coherency attaches to the shared
> > > hwpt, the cc configuration would be done.
> >
> > is this a bug already? If the 1st device doesn't have enforce_cc in its
> > iommu, setting the snp bit in the hwpt would lead to reserved
> > bit violation.
>
> I suspect there are technically some gaps in the intel driver, yes..

double checked. intel driver is doing right thing now:

intel_iommu_enforce_cache_coherency()
domain_support_force_snooping()

static bool domain_support_force_snooping(struct dmar_domain *domain)
{
struct device_domain_info *info;
bool support = true;

assert_spin_locked(&domain->lock);
list_for_each_entry(info, &domain->devices, link) {
if (!ecap_sc_support(info->iommu->ecap)) {
support = false;
break;
}
}

return support;
}

>
> > another problem is that intel_iommu_enforce_cache_coherency()
> > doesn't update existing entries. It only sets a domain flag to affect
> > future mappings. so it means the 2nd idev is also broken.
>
> This is such a gap, intel driver should not permit that.

yes. @Baolu, can you add a fix?

>
> > The simplest option is to follow vfio type1 i.e. don't mix devices
> > with different enforce_cc in one domain.
>
> This is why I wanted to get rid of this bad mechanism going forward.
>
> Manually created hwpt should have a manual specification of cc and
> then we don't have so many problems.
>
> It means userspace needs to compute if they want to use CC or not, but
> userspace already needs to figure this out since without autodomains
> it must create two hwpts manually anyhow.
>

Now there is no interface reporting enforce_cc to userspace.

Actually the user doesn't need to know enforce_cc. As long as there is
an attach incompatibility error the user has to create a 2nd hwpt for
the to-be-attached device then enforce_cc will be handled automatically
in hwpt_alloc.

I prefer to removing enforce_cc in attach fn completely then no parent
trick in this patch. Just keep it in hwpt_alloc and leave to iommu driver to
figure out the attaching compatibility:

- attaching a idev with matching enforce_cc as hwpt is always allowed.
- attaching a idev w/o enforce_cc to a hwpt with enforce_cc is rejected.
- attaching a idev w/ enforce_cc to a hwpt w/o enforce_cc succeeds with
hwpt continuing to be w/o enforce_cc. No promotion.

above has been guaranteed by intel iommu driver.

2023-10-17 09:06:26

by Tian, Kevin

[permalink] [raw]
Subject: RE: [PATCH v4 08/17] iommufd: Always setup MSI and anforce cc on kernel-managed domains

> From: Tian, Kevin
> Sent: Tuesday, October 17, 2023 4:53 PM
> >
> > This is why I wanted to get rid of this bad mechanism going forward.
> >
> > Manually created hwpt should have a manual specification of cc and
> > then we don't have so many problems.
> >

Actually I don't see much difference between manual specification of
cc vs. indirect specification of cc via specified idev in hwpt_alloc. What
really matters is how we want to deal with the compatibility in the
following attach path.

2023-10-17 09:30:03

by Tian, Kevin

[permalink] [raw]
Subject: RE: [PATCH v4 01/17] iommu: Add hwpt_type with user_data for domain_alloc_user op

> From: Liu, Yi L <[email protected]>
> Sent: Tuesday, October 17, 2023 4:52 PM
>
> On 2023/10/17 02:17, Nicolin Chen wrote:
> > On Mon, Oct 16, 2023 at 08:54:07AM -0300, Jason Gunthorpe wrote:
> >> On Mon, Oct 16, 2023 at 11:28:15AM +0800, Yi Liu wrote:
> >>> On 2023/10/14 01:56, Nicolin Chen wrote:
> >>>> On Fri, Oct 13, 2023 at 11:04:56AM -0300, Jason Gunthorpe wrote:
> >>>>> On Fri, Oct 13, 2023 at 12:33:13PM +0800, Yi Liu wrote:
> >>>>>
> >>>>>> not really. Below the users of the struct iommu_user_data in my
> current
> >>>>>> iommufd_nesting branch. Only the domain_alloc_user op has type as
> there
> >>>>>> can be multiple vendor specific alloc data types. Basically, I'm ok to
> >>>>>> make the change you suggested, just not sure if it is good to add type
> >>>>>> as it is only needed by one path.
> >>>>>
> >>>>> I don't think we should ever have an opaque data blob without a type
> >>>>> tag..
> >>>>
> >>>> I can add those "missing" data types, and then a driver will be
> >>>> responsible for sanitizing the type along with the data_len.
> >>>>
> >>>> I notice that the enum iommu_hwpt_data_type in the posted patch
> >>>> is confined to the alloc_user uAPI. Perhaps we should share it
> >>>> with invalidate too:
> >>>
> >>> invalidation path does not need a type field today as the data
> >>> type is vendor specific, vendor driver should know the data type
> >>> when calls in.
> >>
> >> I'm not keen on that, what if a driver needs another type in the
> >> future? You'd want to make the invalidation data format part of the
> >> domain allocation?
> >
> > The invalidation data has hwpt_id so it's tied to a hwpt and its
> > hwpt->domain. Would it be reasonable to have a different type of
> > invalidation data for the same type of hwpt?
>
> this seems like what Jason asks. A type of hwpt can have two kinds
> of invalidation data types. Is it really possible?
>

e.g. vhost-iommu may want its own vendor-agnostic format from
previous discussion...

2023-10-17 15:53:19

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH v4 08/17] iommufd: Always setup MSI and anforce cc on kernel-managed domains

On Tue, Oct 17, 2023 at 08:52:49AM +0000, Tian, Kevin wrote:

> > This is why I wanted to get rid of this bad mechanism going forward.
> >
> > Manually created hwpt should have a manual specification of cc and
> > then we don't have so many problems.
> >
> > It means userspace needs to compute if they want to use CC or not, but
> > userspace already needs to figure this out since without autodomains
> > it must create two hwpts manually anyhow.
> >
>
> Now there is no interface reporting enforce_cc to userspace.

Yes..

> Actually the user doesn't need to know enforce_cc. As long as there is
> an attach incompatibility error the user has to create a 2nd hwpt for
> the to-be-attached device then enforce_cc will be handled automatically
> in hwpt_alloc.

Uh, OK we can do that.. It is a bit hacky because we don't know the
order do

> I prefer to removing enforce_cc in attach fn completely then no parent
> trick in this patch. Just keep it in hwpt_alloc and leave to iommu driver to
> figure out the attaching compatibility:

You are basically saying to set the cc mode during creation because we
know the idev at that moment and can tell if it should be on/off?

It seems reasonable, but I can't remember why it is in the attach path
at the moment.

Jason

2023-10-17 19:59:17

by Nicolin Chen

[permalink] [raw]
Subject: Re: [PATCH v4 08/17] iommufd: Always setup MSI and anforce cc on kernel-managed domains

On Tue, Oct 17, 2023 at 12:53:01PM -0300, Jason Gunthorpe wrote:
> On Tue, Oct 17, 2023 at 08:52:49AM +0000, Tian, Kevin wrote:
> > I prefer to removing enforce_cc in attach fn completely then no parent
> > trick in this patch. Just keep it in hwpt_alloc and leave to iommu driver to
> > figure out the attaching compatibility:
>
> You are basically saying to set the cc mode during creation because we
> know the idev at that moment and can tell if it should be on/off?
>
> It seems reasonable, but I can't remember why it is in the attach path
> at the moment.

This was the commit adding it in the alloc path:
https://lore.kernel.org/linux-iommu/[email protected]/

The older code was doing a hwpt "upgrade" from !cc to cc:
- /*
- * Try to upgrade the domain we have, it is an iommu driver bug to
- * report IOMMU_CAP_ENFORCE_CACHE_COHERENCY but fail
- * enforce_cache_coherency when there are no devices attached to the
- * domain.
- */
- if (idev->enforce_cache_coherency && !hwpt->enforce_cache_coherency) {
- if (hwpt->domain->ops->enforce_cache_coherency)
- hwpt->enforce_cache_coherency =
- hwpt->domain->ops->enforce_cache_coherency(
- hwpt->domain);

If we remove the enforce_cc call in the attach path and let the
driver decide whether to enforce or reject in attach_dev calls,
there seems to be no point in tracking an enforce_cache_coherency
flag in the IOMMUFD pathway but only for the VFIO_DMA_CC_IOMMU
extension check in the vfio-compat pathway?

Thanks
Nic

2023-10-18 02:36:39

by Lu Baolu

[permalink] [raw]
Subject: Re: [PATCH v4 08/17] iommufd: Always setup MSI and anforce cc on kernel-managed domains

On 10/17/23 4:52 PM, Tian, Kevin wrote:
>>> another problem is that intel_iommu_enforce_cache_coherency()
>>> doesn't update existing entries. It only sets a domain flag to affect
>>> future mappings. so it means the 2nd idev is also broken.
>> This is such a gap, intel driver should not permit that.
> yes. @Baolu, can you add a fix?

Sure thing!

Best regards,
baolu

2023-10-18 06:10:31

by Yi Liu

[permalink] [raw]
Subject: Re: [PATCH v4 01/17] iommu: Add hwpt_type with user_data for domain_alloc_user op

On 2023/10/17 17:28, Tian, Kevin wrote:
>> From: Liu, Yi L <[email protected]>
>> Sent: Tuesday, October 17, 2023 4:52 PM
>>
>> On 2023/10/17 02:17, Nicolin Chen wrote:
>>> On Mon, Oct 16, 2023 at 08:54:07AM -0300, Jason Gunthorpe wrote:
>>>> On Mon, Oct 16, 2023 at 11:28:15AM +0800, Yi Liu wrote:
>>>>> On 2023/10/14 01:56, Nicolin Chen wrote:
>>>>>> On Fri, Oct 13, 2023 at 11:04:56AM -0300, Jason Gunthorpe wrote:
>>>>>>> On Fri, Oct 13, 2023 at 12:33:13PM +0800, Yi Liu wrote:
>>>>>>>
>>>>>>>> not really. Below the users of the struct iommu_user_data in my
>> current
>>>>>>>> iommufd_nesting branch. Only the domain_alloc_user op has type as
>> there
>>>>>>>> can be multiple vendor specific alloc data types. Basically, I'm ok to
>>>>>>>> make the change you suggested, just not sure if it is good to add type
>>>>>>>> as it is only needed by one path.
>>>>>>>
>>>>>>> I don't think we should ever have an opaque data blob without a type
>>>>>>> tag..
>>>>>>
>>>>>> I can add those "missing" data types, and then a driver will be
>>>>>> responsible for sanitizing the type along with the data_len.
>>>>>>
>>>>>> I notice that the enum iommu_hwpt_data_type in the posted patch
>>>>>> is confined to the alloc_user uAPI. Perhaps we should share it
>>>>>> with invalidate too:
>>>>>
>>>>> invalidation path does not need a type field today as the data
>>>>> type is vendor specific, vendor driver should know the data type
>>>>> when calls in.
>>>>
>>>> I'm not keen on that, what if a driver needs another type in the
>>>> future? You'd want to make the invalidation data format part of the
>>>> domain allocation?
>>>
>>> The invalidation data has hwpt_id so it's tied to a hwpt and its
>>> hwpt->domain. Would it be reasonable to have a different type of
>>> invalidation data for the same type of hwpt?
>>
>> this seems like what Jason asks. A type of hwpt can have two kinds
>> of invalidation data types. Is it really possible?
>>
>
> e.g. vhost-iommu may want its own vendor-agnostic format from
> previous discussion...

ok. So in future, if there is vendor-agnostic format cache invalidation
data structure, then each existing hwpt types would have two cache
invalidation types. is it?

So it is required to make the invalidation data format part of the
domain allocation. Perhaps we can add it later?

Regards,
Yi Liu

--
Regards,
Yi Liu

2023-10-18 16:37:32

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH v4 01/17] iommu: Add hwpt_type with user_data for domain_alloc_user op

On Mon, Oct 16, 2023 at 11:17:56AM -0700, Nicolin Chen wrote:
> On Mon, Oct 16, 2023 at 08:54:07AM -0300, Jason Gunthorpe wrote:
> > On Mon, Oct 16, 2023 at 11:28:15AM +0800, Yi Liu wrote:
> > > On 2023/10/14 01:56, Nicolin Chen wrote:
> > > > On Fri, Oct 13, 2023 at 11:04:56AM -0300, Jason Gunthorpe wrote:
> > > > > On Fri, Oct 13, 2023 at 12:33:13PM +0800, Yi Liu wrote:
> > > > >
> > > > > > not really. Below the users of the struct iommu_user_data in my current
> > > > > > iommufd_nesting branch. Only the domain_alloc_user op has type as there
> > > > > > can be multiple vendor specific alloc data types. Basically, I'm ok to
> > > > > > make the change you suggested, just not sure if it is good to add type
> > > > > > as it is only needed by one path.
> > > > >
> > > > > I don't think we should ever have an opaque data blob without a type
> > > > > tag..
> > > >
> > > > I can add those "missing" data types, and then a driver will be
> > > > responsible for sanitizing the type along with the data_len.
> > > >
> > > > I notice that the enum iommu_hwpt_data_type in the posted patch
> > > > is confined to the alloc_user uAPI. Perhaps we should share it
> > > > with invalidate too:
> > >
> > > invalidation path does not need a type field today as the data
> > > type is vendor specific, vendor driver should know the data type
> > > when calls in.
> >
> > I'm not keen on that, what if a driver needs another type in the
> > future? You'd want to make the invalidation data format part of the
> > domain allocation?
>
> The invalidation data has hwpt_id so it's tied to a hwpt and its
> hwpt->domain. Would it be reasonable to have a different type of
> invalidation data for the same type of hwpt?

Yeah, maybe? Go down the road 10 years and we might have SMMUv3
invalidation format v1 and v2 or something?

Like we don't know what the HW side will do, they might extend the
command queue to have bigger commands and negotiate with the driver if
the bigger/smaller format is used. We've done that in our HW a couple
of times now.

Jason

2023-10-18 16:51:34

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH v4 08/17] iommufd: Always setup MSI and anforce cc on kernel-managed domains

On Tue, Oct 17, 2023 at 12:58:39PM -0700, Nicolin Chen wrote:
> On Tue, Oct 17, 2023 at 12:53:01PM -0300, Jason Gunthorpe wrote:
> > On Tue, Oct 17, 2023 at 08:52:49AM +0000, Tian, Kevin wrote:
> > > I prefer to removing enforce_cc in attach fn completely then no parent
> > > trick in this patch. Just keep it in hwpt_alloc and leave to iommu driver to
> > > figure out the attaching compatibility:
> >
> > You are basically saying to set the cc mode during creation because we
> > know the idev at that moment and can tell if it should be on/off?
> >
> > It seems reasonable, but I can't remember why it is in the attach path
> > at the moment.
>
> This was the commit adding it in the alloc path:
> https://lore.kernel.org/linux-iommu/[email protected]/
>
> The older code was doing a hwpt "upgrade" from !cc to cc:
> - /*
> - * Try to upgrade the domain we have, it is an iommu driver bug to
> - * report IOMMU_CAP_ENFORCE_CACHE_COHERENCY but fail
> - * enforce_cache_coherency when there are no devices attached to the
> - * domain.
> - */
> - if (idev->enforce_cache_coherency && !hwpt->enforce_cache_coherency) {
> - if (hwpt->domain->ops->enforce_cache_coherency)
> - hwpt->enforce_cache_coherency =
> - hwpt->domain->ops->enforce_cache_coherency(
> - hwpt->domain);
>
> If we remove the enforce_cc call in the attach path and let the
> driver decide whether to enforce or reject in attach_dev calls,
> there seems to be no point in tracking an enforce_cache_coherency
> flag in the IOMMUFD pathway but only for the VFIO_DMA_CC_IOMMU
> extension check in the vfio-compat pathway?

I think the purpose of this code is to try to minimize the number of
domains.

Otherwise we have a problem where the order devices are attached to
the domain decides how many domains you get. ie the first device
attached does not want CC (but is compatible with it) so we create a
non-CC domain

Then later we attach a device that does want CC and now we are forced
to create a second iommu domain when upgrading the first domain would
have been fine.

Hotplug is another scenario (though Intel driver does not support it,
and it looks broken)

Really, I hate this CC mechanism. It is only for Intel, can we just
punt it to userspace and have an intel 'want cc flag' for the entire
nesting path and forget about it?

Jason

2023-10-18 16:51:47

by Nicolin Chen

[permalink] [raw]
Subject: Re: [PATCH v4 01/17] iommu: Add hwpt_type with user_data for domain_alloc_user op

On Wed, Oct 18, 2023 at 01:37:20PM -0300, Jason Gunthorpe wrote:
> On Mon, Oct 16, 2023 at 11:17:56AM -0700, Nicolin Chen wrote:
> > On Mon, Oct 16, 2023 at 08:54:07AM -0300, Jason Gunthorpe wrote:
> > > On Mon, Oct 16, 2023 at 11:28:15AM +0800, Yi Liu wrote:
> > > > On 2023/10/14 01:56, Nicolin Chen wrote:
> > > > > On Fri, Oct 13, 2023 at 11:04:56AM -0300, Jason Gunthorpe wrote:
> > > > > > On Fri, Oct 13, 2023 at 12:33:13PM +0800, Yi Liu wrote:
> > > > > >
> > > > > > > not really. Below the users of the struct iommu_user_data in my current
> > > > > > > iommufd_nesting branch. Only the domain_alloc_user op has type as there
> > > > > > > can be multiple vendor specific alloc data types. Basically, I'm ok to
> > > > > > > make the change you suggested, just not sure if it is good to add type
> > > > > > > as it is only needed by one path.
> > > > > >
> > > > > > I don't think we should ever have an opaque data blob without a type
> > > > > > tag..
> > > > >
> > > > > I can add those "missing" data types, and then a driver will be
> > > > > responsible for sanitizing the type along with the data_len.
> > > > >
> > > > > I notice that the enum iommu_hwpt_data_type in the posted patch
> > > > > is confined to the alloc_user uAPI. Perhaps we should share it
> > > > > with invalidate too:
> > > >
> > > > invalidation path does not need a type field today as the data
> > > > type is vendor specific, vendor driver should know the data type
> > > > when calls in.
> > >
> > > I'm not keen on that, what if a driver needs another type in the
> > > future? You'd want to make the invalidation data format part of the
> > > domain allocation?
> >
> > The invalidation data has hwpt_id so it's tied to a hwpt and its
> > hwpt->domain. Would it be reasonable to have a different type of
> > invalidation data for the same type of hwpt?
>
> Yeah, maybe? Go down the road 10 years and we might have SMMUv3
> invalidation format v1 and v2 or something?
>
> Like we don't know what the HW side will do, they might extend the
> command queue to have bigger commands and negotiate with the driver if
> the bigger/smaller format is used. We've done that in our HW a couple
> of times now.

I see. We'll have the type. Thanks!

2023-10-19 01:56:37

by Tian, Kevin

[permalink] [raw]
Subject: RE: [PATCH v4 08/17] iommufd: Always setup MSI and anforce cc on kernel-managed domains

> From: Jason Gunthorpe <[email protected]>
> Sent: Thursday, October 19, 2023 12:51 AM
>
> On Tue, Oct 17, 2023 at 12:58:39PM -0700, Nicolin Chen wrote:
> > On Tue, Oct 17, 2023 at 12:53:01PM -0300, Jason Gunthorpe wrote:
> > > On Tue, Oct 17, 2023 at 08:52:49AM +0000, Tian, Kevin wrote:
> > > > I prefer to removing enforce_cc in attach fn completely then no parent
> > > > trick in this patch. Just keep it in hwpt_alloc and leave to iommu driver
> to
> > > > figure out the attaching compatibility:
> > >
> > > You are basically saying to set the cc mode during creation because we
> > > know the idev at that moment and can tell if it should be on/off?
> > >
> > > It seems reasonable, but I can't remember why it is in the attach path
> > > at the moment.
> >
> > This was the commit adding it in the alloc path:
> > https://lore.kernel.org/linux-iommu/8e897628-61fa-b3fb-b609-
> [email protected]/
> >
> > The older code was doing a hwpt "upgrade" from !cc to cc:
> > - /*
> > - * Try to upgrade the domain we have, it is an iommu driver bug to
> > - * report IOMMU_CAP_ENFORCE_CACHE_COHERENCY but fail
> > - * enforce_cache_coherency when there are no devices attached to
> the
> > - * domain.
> > - */
> > - if (idev->enforce_cache_coherency && !hwpt-
> >enforce_cache_coherency) {
> > - if (hwpt->domain->ops->enforce_cache_coherency)
> > - hwpt->enforce_cache_coherency =
> > - hwpt->domain->ops->enforce_cache_coherency(
> > - hwpt->domain);
> >
> > If we remove the enforce_cc call in the attach path and let the
> > driver decide whether to enforce or reject in attach_dev calls,
> > there seems to be no point in tracking an enforce_cache_coherency
> > flag in the IOMMUFD pathway but only for the VFIO_DMA_CC_IOMMU
> > extension check in the vfio-compat pathway?
>
> I think the purpose of this code is to try to minimize the number of
> domains.
>
> Otherwise we have a problem where the order devices are attached to
> the domain decides how many domains you get. ie the first device
> attached does not want CC (but is compatible with it) so we create a
> non-CC domain

in autodetect model this won't happen. If the first device is capable
of enforce_cc then the domain will be created with enforce_cc.

there is no "does not want CC" input in autodetect.

>
> Then later we attach a device that does want CC and now we are forced
> to create a second iommu domain when upgrading the first domain would
> have been fine.

then in this case the 2nd device will reuse the domain.

>
> Hotplug is another scenario (though Intel driver does not support it,
> and it looks broken)

Can you elaborate how hotplug is broken? If device is hotplugged and
failed to attach to an existing domain, then a new one will be created
for it.

there is indeed a broken case in KVM which cannot handle dynamic
change of coherency due to hotplug. But that one is orthogonal to
what we discussed here about how to decide cc in iommufd.

>
> Really, I hate this CC mechanism. It is only for Intel, can we just

Intel and AMD.

> punt it to userspace and have an intel 'want cc flag' for the entire
> nesting path and forget about it?
>

I dislike it too. But still not get your point why adding such a flag
can really simplify things. As explained before the only difference
between autodetect and having a user flag just affects the decision
of cc when creating a hwpt. whether we should upgrade in the
attach path is an orthogonal open which imho is unnecessary and
Nicoline's patches to remove that check then also remove this
patch makes lot of sense to me.

Then the necessity of introducing such a flag (plus adding a new
interface to report cc to user space) is not obvious.

2023-10-19 23:54:52

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH v4 08/17] iommufd: Always setup MSI and anforce cc on kernel-managed domains

On Thu, Oct 19, 2023 at 01:56:01AM +0000, Tian, Kevin wrote:

> > Otherwise we have a problem where the order devices are attached to
> > the domain decides how many domains you get. ie the first device
> > attached does not want CC (but is compatible with it) so we create a
> > non-CC domain
>
> in autodetect model this won't happen. If the first device is capable
> of enforce_cc then the domain will be created with enforce_cc.
>
> there is no "does not want CC" input in autodetect.
> >
> > Then later we attach a device that does want CC and now we are forced
> > to create a second iommu domain when upgrading the first domain would
> > have been fine.
>
> then in this case the 2nd device will reuse the domain.

Then you have the reverse problem where the domain will not be CC when
it should be.

> > Hotplug is another scenario (though Intel driver does not support it,
> > and it looks broken)
>
> Can you elaborate how hotplug is broken? If device is hotplugged and
> failed to attach to an existing domain, then a new one will be created
> for it.

A non-cc domain will ask to be upgraded and the driver will let it
happen even though it doesn't/can't fix the existing IOPTEs

> there is indeed a broken case in KVM which cannot handle dynamic
> change of coherency due to hotplug. But that one is orthogonal to
> what we discussed here about how to decide cc in iommufd.

That too

> > Really, I hate this CC mechanism. It is only for Intel, can we just
>
> Intel and AMD.

Nope, AMD just hardwires their IOMMU to always do CC enforcing. All
this mess is only for Intel and their weird IOMMU that can only do the
enforcement for a GPU.

> > punt it to userspace and have an intel 'want cc flag' for the entire
> > nesting path and forget about it?
>
> I dislike it too. But still not get your point why adding such a flag
> can really simplify things. As explained before the only difference
> between autodetect and having a user flag just affects the decision
> of cc when creating a hwpt. whether we should upgrade in the
> attach path is an orthogonal open which imho is unnecessary and
> Nicoline's patches to remove that check then also remove this
> patch makes lot of sense to me.

I don't think we can remove it, it is supposed to provide consistency
of result regardless of ordering.

Jason

2023-10-20 02:44:29

by Tian, Kevin

[permalink] [raw]
Subject: RE: [PATCH v4 08/17] iommufd: Always setup MSI and anforce cc on kernel-managed domains

> From: Jason Gunthorpe <[email protected]>
> Sent: Friday, October 20, 2023 7:54 AM
>
> On Thu, Oct 19, 2023 at 01:56:01AM +0000, Tian, Kevin wrote:
>
> > > Otherwise we have a problem where the order devices are attached to
> > > the domain decides how many domains you get. ie the first device
> > > attached does not want CC (but is compatible with it) so we create a
> > > non-CC domain
> >
> > in autodetect model this won't happen. If the first device is capable
> > of enforce_cc then the domain will be created with enforce_cc.
> >
> > there is no "does not want CC" input in autodetect.
> > >
> > > Then later we attach a device that does want CC and now we are forced
> > > to create a second iommu domain when upgrading the first domain
> would
> > > have been fine.
> >
> > then in this case the 2nd device will reuse the domain.
>
> Then you have the reverse problem where the domain will not be CC when
> it should be.

If the domain has been non-CC it's perfectly fine for the 2nd device with CC
to reuse it. As long as there is one domain with non-CC then KVM has to
have special treatment on wbinvd. In this case there is actually a benefit
to use just one non-CC domain for all devices (either non-CC or CC).

What we want to prevent is attaching a non-CC device to a CC domain
or upgrade a non-CC domain to CC since in both case the non-CC device
will be broken due to incompatible page table format.

>
> > > Hotplug is another scenario (though Intel driver does not support it,
> > > and it looks broken)
> >
> > Can you elaborate how hotplug is broken? If device is hotplugged and
> > failed to attach to an existing domain, then a new one will be created
> > for it.
>
> A non-cc domain will ask to be upgraded and the driver will let it
> happen even though it doesn't/can't fix the existing IOPTEs

iommufd should not ask for upgrade at all. The CC attribute of domain
should be fixed since creation time.

Baolu will fix the intel-iommu driver accordingly.

>
> > there is indeed a broken case in KVM which cannot handle dynamic
> > change of coherency due to hotplug. But that one is orthogonal to
> > what we discussed here about how to decide cc in iommufd.
>
> That too
>
> > > Really, I hate this CC mechanism. It is only for Intel, can we just
> >
> > Intel and AMD.
>
> Nope, AMD just hardwires their IOMMU to always do CC enforcing. All
> this mess is only for Intel and their weird IOMMU that can only do the
> enforcement for a GPU.
>
> > > punt it to userspace and have an intel 'want cc flag' for the entire
> > > nesting path and forget about it?
> >
> > I dislike it too. But still not get your point why adding such a flag
> > can really simplify things. As explained before the only difference
> > between autodetect and having a user flag just affects the decision
> > of cc when creating a hwpt. whether we should upgrade in the
> > attach path is an orthogonal open which imho is unnecessary and
> > Nicoline's patches to remove that check then also remove this
> > patch makes lot of sense to me.
>
> I don't think we can remove it, it is supposed to provide consistency
> of result regardless of ordering.
>

Who cares about such consistency? sure the result is different due to order:

1) creating hwpt for dev1 (non-CC) then later attaching hwpt to
dev2 (CC) will succeed;

2) creating hwpt for dev2 (CC) then later attaching hwpt to
dev1 (non-CC) will fail then the user should create a new hwpt
for dev1;

But the user shouldn't assume such explicit consistency since it's not
defined in our uAPI. All we defined is that the attaching may
fail due to incompatibility for whatever reason then the user can
always try creating a new hwpt for the to-be-attached device. From
this regard I don't see providing consistency of result is necessary. ????

2023-10-20 13:56:14

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH v4 08/17] iommufd: Always setup MSI and anforce cc on kernel-managed domains

On Fri, Oct 20, 2023 at 02:43:58AM +0000, Tian, Kevin wrote:

> What we want to prevent is attaching a non-CC device to a CC domain
> or upgrade a non-CC domain to CC since in both case the non-CC
> device will be broken due to incompatible page table format.

[..]

> Who cares about such consistency? sure the result is different due to order:
>
> 1) creating hwpt for dev1 (non-CC) then later attaching hwpt to
> dev2 (CC) will succeed;
>
> 2) creating hwpt for dev2 (CC) then later attaching hwpt to
> dev1 (non-CC) will fail then the user should create a new hwpt
> for dev1;

AH... So really what the Intel driver wants is not upgrade to CC but
*downgrade* from CC.

non-CC is the type that is universally applicable, so if we come
across a non-CC capable device the proper/optimal thing is to degrade
the HWPT and re-use it, not allocate a new HWPT.

So the whole thing is upside down.

As changing the IOPTEs in flight seems hard, and I don't want to see
the Intel driver get slowed down to accomodate this, I think you are
right to say this should be a creation time property only.

I still think userspace should be able to select it so it can minimize
the number of HWPTs required.

> But the user shouldn't assume such explicit consistency since it's not
> defined in our uAPI. All we defined is that the attaching may
> fail due to incompatibility for whatever reason then the user can
> always try creating a new hwpt for the to-be-attached device. From
> this regard I don't see providing consistency of result is
> necessary. ????

Anyhow, OK, lets add a comment summarizing your points and remove the
cc upgrade at attach time (sorry Nicolin/Yi!)

It is easy to add a HWPT flag for this later if someone wants to
optimize it.

Jason

2023-10-20 19:00:00

by Nicolin Chen

[permalink] [raw]
Subject: Re: [PATCH v4 08/17] iommufd: Always setup MSI and anforce cc on kernel-managed domains

On Fri, Oct 20, 2023 at 10:55:01AM -0300, Jason Gunthorpe wrote:
> On Fri, Oct 20, 2023 at 02:43:58AM +0000, Tian, Kevin wrote:
>
> > What we want to prevent is attaching a non-CC device to a CC domain
> > or upgrade a non-CC domain to CC since in both case the non-CC
> > device will be broken due to incompatible page table format.
>
> [..]
>
> > Who cares about such consistency? sure the result is different due to order:
> >
> > 1) creating hwpt for dev1 (non-CC) then later attaching hwpt to
> > dev2 (CC) will succeed;
> >
> > 2) creating hwpt for dev2 (CC) then later attaching hwpt to
> > dev1 (non-CC) will fail then the user should create a new hwpt
> > for dev1;
>
> AH... So really what the Intel driver wants is not upgrade to CC but
> *downgrade* from CC.
>
> non-CC is the type that is universally applicable, so if we come
> across a non-CC capable device the proper/optimal thing is to degrade
> the HWPT and re-use it, not allocate a new HWPT.
>
> So the whole thing is upside down.
>
> As changing the IOPTEs in flight seems hard, and I don't want to see
> the Intel driver get slowed down to accomodate this, I think you are
> right to say this should be a creation time property only.
>
> I still think userspace should be able to select it so it can minimize
> the number of HWPTs required.
>
> > But the user shouldn't assume such explicit consistency since it's not
> > defined in our uAPI. All we defined is that the attaching may
> > fail due to incompatibility for whatever reason then the user can
> > always try creating a new hwpt for the to-be-attached device. From
> > this regard I don't see providing consistency of result is
> > necessary. ????
>
> Anyhow, OK, lets add a comment summarizing your points and remove the
> cc upgrade at attach time (sorry Nicolin/Yi!)

Ack. I will send a small removal series. I assume it should CC
stable tree also? And where should we add this comment? Kdoc of
the alloc uAPI?

Thanks!
Nicolin

> It is easy to add a HWPT flag for this later if someone wants to
> optimize it.
>
> Jason

2023-10-21 16:38:45

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH v4 08/17] iommufd: Always setup MSI and anforce cc on kernel-managed domains

On Fri, Oct 20, 2023 at 11:59:13AM -0700, Nicolin Chen wrote:
> On Fri, Oct 20, 2023 at 10:55:01AM -0300, Jason Gunthorpe wrote:
> > On Fri, Oct 20, 2023 at 02:43:58AM +0000, Tian, Kevin wrote:
> >
> > > What we want to prevent is attaching a non-CC device to a CC domain
> > > or upgrade a non-CC domain to CC since in both case the non-CC
> > > device will be broken due to incompatible page table format.
> >
> > [..]
> >
> > > Who cares about such consistency? sure the result is different due to order:
> > >
> > > 1) creating hwpt for dev1 (non-CC) then later attaching hwpt to
> > > dev2 (CC) will succeed;
> > >
> > > 2) creating hwpt for dev2 (CC) then later attaching hwpt to
> > > dev1 (non-CC) will fail then the user should create a new hwpt
> > > for dev1;
> >
> > AH... So really what the Intel driver wants is not upgrade to CC but
> > *downgrade* from CC.
> >
> > non-CC is the type that is universally applicable, so if we come
> > across a non-CC capable device the proper/optimal thing is to degrade
> > the HWPT and re-use it, not allocate a new HWPT.
> >
> > So the whole thing is upside down.
> >
> > As changing the IOPTEs in flight seems hard, and I don't want to see
> > the Intel driver get slowed down to accomodate this, I think you are
> > right to say this should be a creation time property only.
> >
> > I still think userspace should be able to select it so it can minimize
> > the number of HWPTs required.
> >
> > > But the user shouldn't assume such explicit consistency since it's not
> > > defined in our uAPI. All we defined is that the attaching may
> > > fail due to incompatibility for whatever reason then the user can
> > > always try creating a new hwpt for the to-be-attached device. From
> > > this regard I don't see providing consistency of result is
> > > necessary. ????
> >
> > Anyhow, OK, lets add a comment summarizing your points and remove the
> > cc upgrade at attach time (sorry Nicolin/Yi!)
>
> Ack. I will send a small removal series. I assume it should CC
> stable tree also?

No, it seems more like tidying that fixing a functional issue, do I
misunderstand?

> And where should we add this comment? Kdoc of
> the alloc uAPI?

Maybe right in front of the only enforce_cc op callback?

Jason

2023-10-23 00:18:43

by Nicolin Chen

[permalink] [raw]
Subject: Re: [PATCH v4 08/17] iommufd: Always setup MSI and anforce cc on kernel-managed domains

On Sat, Oct 21, 2023 at 01:38:04PM -0300, Jason Gunthorpe wrote:
> On Fri, Oct 20, 2023 at 11:59:13AM -0700, Nicolin Chen wrote:
> > On Fri, Oct 20, 2023 at 10:55:01AM -0300, Jason Gunthorpe wrote:
> > > On Fri, Oct 20, 2023 at 02:43:58AM +0000, Tian, Kevin wrote:
> > >
> > > > What we want to prevent is attaching a non-CC device to a CC domain
> > > > or upgrade a non-CC domain to CC since in both case the non-CC
> > > > device will be broken due to incompatible page table format.
> > >
> > > [..]
> > >
> > > > Who cares about such consistency? sure the result is different due to order:
> > > >
> > > > 1) creating hwpt for dev1 (non-CC) then later attaching hwpt to
> > > > dev2 (CC) will succeed;
> > > >
> > > > 2) creating hwpt for dev2 (CC) then later attaching hwpt to
> > > > dev1 (non-CC) will fail then the user should create a new hwpt
> > > > for dev1;
> > >
> > > AH... So really what the Intel driver wants is not upgrade to CC but
> > > *downgrade* from CC.
> > >
> > > non-CC is the type that is universally applicable, so if we come
> > > across a non-CC capable device the proper/optimal thing is to degrade
> > > the HWPT and re-use it, not allocate a new HWPT.
> > >
> > > So the whole thing is upside down.
> > >
> > > As changing the IOPTEs in flight seems hard, and I don't want to see
> > > the Intel driver get slowed down to accomodate this, I think you are
> > > right to say this should be a creation time property only.
> > >
> > > I still think userspace should be able to select it so it can minimize
> > > the number of HWPTs required.
> > >
> > > > But the user shouldn't assume such explicit consistency since it's not
> > > > defined in our uAPI. All we defined is that the attaching may
> > > > fail due to incompatibility for whatever reason then the user can
> > > > always try creating a new hwpt for the to-be-attached device. From
> > > > this regard I don't see providing consistency of result is
> > > > necessary. ????
> > >
> > > Anyhow, OK, lets add a comment summarizing your points and remove the
> > > cc upgrade at attach time (sorry Nicolin/Yi!)
> >
> > Ack. I will send a small removal series. I assume it should CC
> > stable tree also?
>
> No, it seems more like tidying that fixing a functional issue, do I
> misunderstand?

Hmm. Maybe the misunderstanding is mine -- Kevin was asking if
it was already a bug and you answered yes:
https://lore.kernel.org/linux-iommu/[email protected]/

If this shouldn't be a bug fix, I could just merge them into a
single tidying patch and add the comments you suggested below.

> > And where should we add this comment? Kdoc of
> > the alloc uAPI?
>
> Maybe right in front of the only enforce_cc op callback?

OK. How about this? Might be a bit verbose though:
/*
* enforce_cache_coherenc must be determined during the HWPT allocation.
* Note that a HWPT (non-CC) created for a device (non-CC) can be later
* reused by another device (either non-CC or CC). However, A HWPT (CC)
* created for a device (CC) cannot be reused by another device (non-CC)
* but only devices (CC). Instead user space in this case would need to
* allocate a separate HWPT (non-CC).
*/

Thanks
Nic

2023-10-23 02:54:12

by Tian, Kevin

[permalink] [raw]
Subject: RE: [PATCH v4 08/17] iommufd: Always setup MSI and anforce cc on kernel-managed domains

> From: Nicolin Chen <[email protected]>
> Sent: Monday, October 23, 2023 8:18 AM
>
> On Sat, Oct 21, 2023 at 01:38:04PM -0300, Jason Gunthorpe wrote:
> > On Fri, Oct 20, 2023 at 11:59:13AM -0700, Nicolin Chen wrote:
> > > On Fri, Oct 20, 2023 at 10:55:01AM -0300, Jason Gunthorpe wrote:
> > > > On Fri, Oct 20, 2023 at 02:43:58AM +0000, Tian, Kevin wrote:
> > > >
> > > > > But the user shouldn't assume such explicit consistency since it's not
> > > > > defined in our uAPI. All we defined is that the attaching may
> > > > > fail due to incompatibility for whatever reason then the user can
> > > > > always try creating a new hwpt for the to-be-attached device. From
> > > > > this regard I don't see providing consistency of result is
> > > > > necessary. ????
> > > >
> > > > Anyhow, OK, lets add a comment summarizing your points and remove
> the
> > > > cc upgrade at attach time (sorry Nicolin/Yi!)
> > >
> > > Ack. I will send a small removal series. I assume it should CC
> > > stable tree also?
> >
> > No, it seems more like tidying that fixing a functional issue, do I
> > misunderstand?
>
> Hmm. Maybe the misunderstanding is mine -- Kevin was asking if
> it was already a bug and you answered yes:
> https://lore.kernel.org/linux-iommu/[email protected]/
>

currently intel-iommu driver already rejects 1) enforcing CC on
a domain which is already attached to non-CC device and
2) attaching a non-CC device to a domain which has enforce_cc.

so there is no explorable bug to fix in stable tree.

Just logically intel-iommu driver doesn't support enforcing Cc
on a domain which is capable of enforce-cc and already has
valid mappings. Then it should add proper check to prevent
it from being attempted by future usages.

2023-10-23 14:00:02

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH v4 08/17] iommufd: Always setup MSI and anforce cc on kernel-managed domains

On Sun, Oct 22, 2023 at 05:18:03PM -0700, Nicolin Chen wrote:

> > > And where should we add this comment? Kdoc of
> > > the alloc uAPI?
> >
> > Maybe right in front of the only enforce_cc op callback?
>
> OK. How about this? Might be a bit verbose though:
> /*
> * enforce_cache_coherenc must be determined during the HWPT allocation.
> * Note that a HWPT (non-CC) created for a device (non-CC) can be later
> * reused by another device (either non-CC or CC). However, A HWPT (CC)
> * created for a device (CC) cannot be reused by another device (non-CC)
> * but only devices (CC). Instead user space in this case would need to
> * allocate a separate HWPT (non-CC).
> */

Ok, fix the spello in enforce_cache_coherenc

Jason

2023-10-23 18:43:32

by Nicolin Chen

[permalink] [raw]
Subject: Re: [PATCH v4 08/17] iommufd: Always setup MSI and anforce cc on kernel-managed domains

On Mon, Oct 23, 2023 at 02:53:20AM +0000, Tian, Kevin wrote:
> > From: Nicolin Chen <[email protected]>
> > Sent: Monday, October 23, 2023 8:18 AM
> >
> > On Sat, Oct 21, 2023 at 01:38:04PM -0300, Jason Gunthorpe wrote:
> > > On Fri, Oct 20, 2023 at 11:59:13AM -0700, Nicolin Chen wrote:
> > > > On Fri, Oct 20, 2023 at 10:55:01AM -0300, Jason Gunthorpe wrote:
> > > > > On Fri, Oct 20, 2023 at 02:43:58AM +0000, Tian, Kevin wrote:
> > > > >
> > > > > > But the user shouldn't assume such explicit consistency since it's not
> > > > > > defined in our uAPI. All we defined is that the attaching may
> > > > > > fail due to incompatibility for whatever reason then the user can
> > > > > > always try creating a new hwpt for the to-be-attached device. From
> > > > > > this regard I don't see providing consistency of result is
> > > > > > necessary. ????
> > > > >
> > > > > Anyhow, OK, lets add a comment summarizing your points and remove
> > the
> > > > > cc upgrade at attach time (sorry Nicolin/Yi!)
> > > >
> > > > Ack. I will send a small removal series. I assume it should CC
> > > > stable tree also?
> > >
> > > No, it seems more like tidying that fixing a functional issue, do I
> > > misunderstand?
> >
> > Hmm. Maybe the misunderstanding is mine -- Kevin was asking if
> > it was already a bug and you answered yes:
> > https://lore.kernel.org/linux-iommu/[email protected]/
> >
>
> currently intel-iommu driver already rejects 1) enforcing CC on
> a domain which is already attached to non-CC device and
> 2) attaching a non-CC device to a domain which has enforce_cc.
>
> so there is no explorable bug to fix in stable tree.

I see. Thanks!

2023-10-23 18:49:52

by Nicolin Chen

[permalink] [raw]
Subject: Re: [PATCH v4 08/17] iommufd: Always setup MSI and anforce cc on kernel-managed domains

On Mon, Oct 23, 2023 at 10:59:35AM -0300, Jason Gunthorpe wrote:
> On Sun, Oct 22, 2023 at 05:18:03PM -0700, Nicolin Chen wrote:
>
> > > > And where should we add this comment? Kdoc of
> > > > the alloc uAPI?
> > >
> > > Maybe right in front of the only enforce_cc op callback?
> >
> > OK. How about this? Might be a bit verbose though:
> > /*
> > * enforce_cache_coherenc must be determined during the HWPT allocation.
> > * Note that a HWPT (non-CC) created for a device (non-CC) can be later
> > * reused by another device (either non-CC or CC). However, A HWPT (CC)
> > * created for a device (CC) cannot be reused by another device (non-CC)
> > * but only devices (CC). Instead user space in this case would need to
> > * allocate a separate HWPT (non-CC).
> > */
>
> Ok, fix the spello in enforce_cache_coherenc

Oops.

I also found the existing piece sorta says a similar thing:
* Set the coherency mode before we do iopt_table_add_domain() as some
* iommus have a per-PTE bit that controls it and need to decide before
* doing any maps.

So, did this and sending v3:
- * enforce_cache_coherenc must be determined during the HWPT allocation.
+ * The cache coherency mode must be configured here and unchanged later.