v3: https://lore.kernel.org/linux-iommu/[email protected]/
Hi all,
Hopefully this one really is ready to go now - rebased on iommu/core,
and lightly tested by booting a machine with SMMUv3 and running the
IOMMUFD selftest for good measure (with the usual handful of mmap()
failures I always seem to get, but nothing relevant exploded).
Thanks,
Robin.
Robin Murphy (7):
iommu: Factor out some helpers
iommu: Decouple iommu_present() from bus ops
iommu: Validate that devices match domains
iommu: Switch __iommu_domain_alloc() to device ops
iommu/arm-smmu: Don't register fwnode for legacy binding
iommu: Retire bus ops
iommu: Clean up open-coded ownership checks
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 3 -
drivers/iommu/arm/arm-smmu/arm-smmu.c | 12 +-
drivers/iommu/arm/arm-smmu/qcom_iommu.c | 16 +-
drivers/iommu/iommu.c | 162 +++++++++++++-------
drivers/iommu/mtk_iommu.c | 7 +-
drivers/iommu/mtk_iommu_v1.c | 3 -
drivers/iommu/sprd-iommu.c | 8 +-
drivers/iommu/virtio-iommu.c | 3 -
include/acpi/acpi_bus.h | 2 +
include/linux/device.h | 1 -
include/linux/device/bus.h | 5 -
include/linux/dma-map-ops.h | 1 +
include/linux/iommu.h | 1 +
13 files changed, 115 insertions(+), 109 deletions(-)
--
2.39.2.101.g768bb238c484.dirty
With the rest of the API internals converted, it's time to finally
tackle probe_device and how we bootstrap the per-device ops association
to begin with. This ends up being disappointingly straightforward, since
fwspec users are already doing it in order to find their of_xlate
callback, and it works out that we can easily do the equivalent for
other drivers too. Then shuffle the remaining awareness of iommu_ops
into the couple of core headers that still need it, and breathe a sigh
of relief.
Ding dong the bus ops are gone!
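For reference, the fallback path here leans on iommu_ops_from_fwnode()
matching instances exactly by their registered fwnode, so a NULL
argument finds an instance registered without one - x86 etc. today, and
legacy SMMUv2 once the later patch stops registering its fwnode. A rough
sketch of that lookup (simplified, not the verbatim implementation):

const struct iommu_ops *iommu_ops_from_fwnode(struct fwnode_handle *fwnode)
{
	const struct iommu_ops *ops = NULL;
	struct iommu_device *iommu;

	spin_lock(&iommu_device_lock);
	list_for_each_entry(iommu, &iommu_device_list, list)
		if (iommu->fwnode == fwnode) {
			ops = iommu->ops;
			break;
		}
	spin_unlock(&iommu_device_lock);
	return ops;
}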
CC: Rafael J. Wysocki <[email protected]>
Acked-by: Christoph Hellwig <[email protected]>
Acked-by: Greg Kroah-Hartman <[email protected]>
Reviewed-by: Lu Baolu <[email protected]>
Reviewed-by: Jason Gunthorpe <[email protected]>
Signed-off-by: Robin Murphy <[email protected]>
---
v4: Don't forget new reference in iommu_device_register_bus()
---
drivers/iommu/iommu.c | 31 ++++++++++++++++++-------------
include/acpi/acpi_bus.h | 2 ++
include/linux/device.h | 1 -
include/linux/device/bus.h | 5 -----
include/linux/dma-map-ops.h | 1 +
5 files changed, 21 insertions(+), 19 deletions(-)
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index c5b5408d1dd7..3d29434d57c6 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -148,7 +148,7 @@ struct iommu_group_attribute iommu_group_attr_##_name = \
static LIST_HEAD(iommu_device_list);
static DEFINE_SPINLOCK(iommu_device_lock);
-static struct bus_type * const iommu_buses[] = {
+static const struct bus_type * const iommu_buses[] = {
&platform_bus_type,
#ifdef CONFIG_PCI
&pci_bus_type,
@@ -257,13 +257,6 @@ int iommu_device_register(struct iommu_device *iommu,
/* We need to be able to take module references appropriately */
if (WARN_ON(is_module_address((unsigned long)ops) && !ops->owner))
return -EINVAL;
- /*
- * Temporarily enforce global restriction to a single driver. This was
- * already the de-facto behaviour, since any possible combination of
- * existing drivers would compete for at least the PCI or platform bus.
- */
- if (iommu_buses[0]->iommu_ops && iommu_buses[0]->iommu_ops != ops)
- return -EBUSY;
iommu->ops = ops;
if (hwdev)
@@ -273,10 +266,8 @@ int iommu_device_register(struct iommu_device *iommu,
list_add_tail(&iommu->list, &iommu_device_list);
spin_unlock(&iommu_device_lock);
- for (int i = 0; i < ARRAY_SIZE(iommu_buses) && !err; i++) {
- iommu_buses[i]->iommu_ops = ops;
+ for (int i = 0; i < ARRAY_SIZE(iommu_buses) && !err; i++)
err = bus_iommu_probe(iommu_buses[i]);
- }
if (err)
iommu_device_unregister(iommu);
return err;
@@ -329,7 +320,6 @@ int iommu_device_register_bus(struct iommu_device *iommu,
list_add_tail(&iommu->list, &iommu_device_list);
spin_unlock(&iommu_device_lock);
- bus->iommu_ops = ops;
err = bus_iommu_probe(bus);
if (err) {
iommu_device_unregister_bus(iommu, bus, nb);
@@ -496,12 +486,27 @@ static void iommu_deinit_device(struct device *dev)
static int __iommu_probe_device(struct device *dev, struct list_head *group_list)
{
- const struct iommu_ops *ops = dev->bus->iommu_ops;
+ const struct iommu_ops *ops;
+ struct iommu_fwspec *fwspec;
struct iommu_group *group;
static DEFINE_MUTEX(iommu_probe_device_lock);
struct group_device *gdev;
int ret;
+ /*
+ * For FDT-based systems and ACPI IORT/VIOT, drivers register IOMMU
+ * instances with non-NULL fwnodes, and client devices should have been
+ * identified with a fwspec by this point. Otherwise, we can currently
+ * assume that only one of Intel, AMD, s390, PAMU or legacy SMMUv2 can
+ * be present, and that any of their registered instances has suitable
+ * ops for probing, and thus cheekily co-opt the same mechanism.
+ */
+ fwspec = dev_iommu_fwspec_get(dev);
+ if (fwspec && fwspec->ops)
+ ops = fwspec->ops;
+ else
+ ops = iommu_ops_from_fwnode(NULL);
+
if (!ops)
return -ENODEV;
/*
diff --git a/include/acpi/acpi_bus.h b/include/acpi/acpi_bus.h
index 254685085c82..13d959b3ba29 100644
--- a/include/acpi/acpi_bus.h
+++ b/include/acpi/acpi_bus.h
@@ -623,6 +623,8 @@ struct acpi_pci_root {
/* helper */
+struct iommu_ops;
+
bool acpi_dma_supported(const struct acpi_device *adev);
enum dev_dma_attr acpi_get_dma_attr(struct acpi_device *adev);
int acpi_iommu_fwspec_init(struct device *dev, u32 id,
diff --git a/include/linux/device.h b/include/linux/device.h
index 56d93a1ffb7b..b78e66f3b34a 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -42,7 +42,6 @@ struct class;
struct subsys_private;
struct device_node;
struct fwnode_handle;
-struct iommu_ops;
struct iommu_group;
struct dev_pin_info;
struct dev_iommu;
diff --git a/include/linux/device/bus.h b/include/linux/device/bus.h
index ae10c4322754..e25aab08f873 100644
--- a/include/linux/device/bus.h
+++ b/include/linux/device/bus.h
@@ -62,9 +62,6 @@ struct fwnode_handle;
* this bus.
* @pm: Power management operations of this bus, callback the specific
* device driver's pm-ops.
- * @iommu_ops: IOMMU specific operations for this bus, used to attach IOMMU
- * driver implementations to a bus and allow the driver to do
- * bus-specific setup
* @need_parent_lock: When probing or removing a device on this bus, the
* device core should lock the device's parent.
*
@@ -104,8 +101,6 @@ struct bus_type {
const struct dev_pm_ops *pm;
- const struct iommu_ops *iommu_ops;
-
bool need_parent_lock;
};
diff --git a/include/linux/dma-map-ops.h b/include/linux/dma-map-ops.h
index f2fc203fb8a1..a52e508d1869 100644
--- a/include/linux/dma-map-ops.h
+++ b/include/linux/dma-map-ops.h
@@ -11,6 +11,7 @@
#include <linux/slab.h>
struct cma;
+struct iommu_ops;
/*
* Values for struct dma_map_ops.flags:
--
2.39.2.101.g768bb238c484.dirty
Before we can allow drivers to coexist, we need to make sure that one
driver's domain ops can't misinterpret another driver's dev_iommu_priv
data. To that end, add a token to the domain so we can remember how it
was allocated - for now this may as well be the device ops, since they
still correlate 1:1 with drivers. We can trust ourselves for internal
default domain attachment, so add checks to cover all the public attach
interfaces.
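To illustrate what the check buys us, here's a hypothetical external
consumer (the function name is made up; the core interfaces are real):

/*
 * With domain->owner recorded at allocation time, handing driver A's
 * domain to a device probed by driver B now fails cleanly with -EINVAL
 * instead of letting B's attach_dev misinterpret A's domain and
 * dev_iommu_priv data.
 */
static int example_attach(struct iommu_domain *domain, struct device *dev)
{
	struct iommu_group *group = iommu_group_get(dev);
	int ret;

	if (!group)
		return -ENODEV;

	ret = iommu_attach_group(domain, group);
	iommu_group_put(group);
	return ret;
}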
Reviewed-by: Lu Baolu <[email protected]>
Reviewed-by: Jason Gunthorpe <[email protected]>
Signed-off-by: Robin Murphy <[email protected]>
---
v4: Cover iommu_attach_device_pasid() as well, and improve robustness
against theoretical attempts to attach a noiommu group.
---
drivers/iommu/iommu.c | 10 ++++++++++
include/linux/iommu.h | 1 +
2 files changed, 11 insertions(+)
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index ef7feb0acc34..7c79a58ef010 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2102,6 +2102,7 @@ static struct iommu_domain *__iommu_domain_alloc(const struct iommu_ops *ops,
return NULL;
domain->type = type;
+ domain->owner = ops;
/*
* If not already set, assume all sizes by default; the driver
* may override this later
@@ -2267,10 +2268,16 @@ struct iommu_domain *iommu_get_dma_domain(struct device *dev)
static int __iommu_attach_group(struct iommu_domain *domain,
struct iommu_group *group)
{
+ struct device *dev;
+
if (group->domain && group->domain != group->default_domain &&
group->domain != group->blocking_domain)
return -EBUSY;
+ dev = iommu_group_first_dev(group);
+ if (!dev_has_iommu(dev) || dev_iommu_ops(dev) != domain->owner)
+ return -EINVAL;
+
return __iommu_group_set_domain(group, domain);
}
@@ -3468,6 +3475,9 @@ int iommu_attach_device_pasid(struct iommu_domain *domain,
if (!group)
return -ENODEV;
+ if (!dev_has_iommu(dev) || dev_iommu_ops(dev) != domain->owner)
+ return -EINVAL;
+
mutex_lock(&group->mutex);
curr = xa_cmpxchg(&group->pasid_array, pasid, NULL, domain, GFP_KERNEL);
if (curr) {
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 2d2802fb2c74..5c9560813d05 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -99,6 +99,7 @@ struct iommu_domain_geometry {
struct iommu_domain {
unsigned type;
const struct iommu_domain_ops *ops;
+ const struct iommu_ops *owner; /* Whose domain_alloc we came from */
unsigned long pgsize_bitmap; /* Bitmap of page sizes in use */
struct iommu_domain_geometry geometry;
struct iommu_dma_cookie *iova_cookie;
--
2.39.2.101.g768bb238c484.dirty
Some drivers already implement their own defence against the possibility
of being given someone else's device. Since this is now taken care of by
the core code (and via a slightly different path from the original
fwspec-based idea), let's clean them up.
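The pattern being retired has much the same shape everywhere;
schematically (the foo_* names are made up):

static struct iommu_device *foo_probe_device(struct device *dev)
{
	struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);

	/* Redundant now that __iommu_probe_device() resolves ops itself */
	if (!fwspec || fwspec->ops != &foo_ops)
		return ERR_PTR(-ENODEV);

	return &foo_get_instance_for(fwspec)->iommu;	/* driver-specific */
}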
Reviewed-by: Jason Gunthorpe <[email protected]>
Signed-off-by: Robin Murphy <[email protected]>
---
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 3 ---
drivers/iommu/arm/arm-smmu/arm-smmu.c | 9 +--------
drivers/iommu/arm/arm-smmu/qcom_iommu.c | 16 +++-------------
drivers/iommu/mtk_iommu.c | 7 +------
drivers/iommu/mtk_iommu_v1.c | 3 ---
drivers/iommu/sprd-iommu.c | 8 +-------
drivers/iommu/virtio-iommu.c | 3 ---
7 files changed, 6 insertions(+), 43 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index e82bf1c449a3..01ea307c7791 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2653,9 +2653,6 @@ static struct iommu_device *arm_smmu_probe_device(struct device *dev)
struct arm_smmu_master *master;
struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
- if (!fwspec || fwspec->ops != &arm_smmu_ops)
- return ERR_PTR(-ENODEV);
-
if (WARN_ON_ONCE(dev_iommu_priv_get(dev)))
return ERR_PTR(-EBUSY);
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index 4b83a3adacd6..4d09c0047892 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -1116,11 +1116,6 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
struct arm_smmu_device *smmu;
int ret;
- if (!fwspec || fwspec->ops != &arm_smmu_ops) {
- dev_err(dev, "cannot attach to SMMU, is it on the same bus?\n");
- return -ENXIO;
- }
-
/*
* FIXME: The arch/arm DMA API code tries to attach devices to its own
* domains between of_xlate() and probe_device() - we have no way to cope
@@ -1357,10 +1352,8 @@ static struct iommu_device *arm_smmu_probe_device(struct device *dev)
fwspec = dev_iommu_fwspec_get(dev);
if (ret)
goto out_free;
- } else if (fwspec && fwspec->ops == &arm_smmu_ops) {
- smmu = arm_smmu_get_by_fwnode(fwspec->iommu_fwnode);
} else {
- return ERR_PTR(-ENODEV);
+ smmu = arm_smmu_get_by_fwnode(fwspec->iommu_fwnode);
}
ret = -EINVAL;
diff --git a/drivers/iommu/arm/arm-smmu/qcom_iommu.c b/drivers/iommu/arm/arm-smmu/qcom_iommu.c
index 97b2122032b2..33f3c870086c 100644
--- a/drivers/iommu/arm/arm-smmu/qcom_iommu.c
+++ b/drivers/iommu/arm/arm-smmu/qcom_iommu.c
@@ -79,16 +79,6 @@ static struct qcom_iommu_domain *to_qcom_iommu_domain(struct iommu_domain *dom)
static const struct iommu_ops qcom_iommu_ops;
-static struct qcom_iommu_dev * to_iommu(struct device *dev)
-{
- struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
-
- if (!fwspec || fwspec->ops != &qcom_iommu_ops)
- return NULL;
-
- return dev_iommu_priv_get(dev);
-}
-
static struct qcom_iommu_ctx * to_ctx(struct qcom_iommu_domain *d, unsigned asid)
{
struct qcom_iommu_dev *qcom_iommu = d->iommu;
@@ -372,7 +362,7 @@ static void qcom_iommu_domain_free(struct iommu_domain *domain)
static int qcom_iommu_attach_dev(struct iommu_domain *domain, struct device *dev)
{
- struct qcom_iommu_dev *qcom_iommu = to_iommu(dev);
+ struct qcom_iommu_dev *qcom_iommu = dev_iommu_priv_get(dev);
struct qcom_iommu_domain *qcom_domain = to_qcom_iommu_domain(domain);
int ret;
@@ -404,7 +394,7 @@ static int qcom_iommu_identity_attach(struct iommu_domain *identity_domain,
struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
struct qcom_iommu_domain *qcom_domain;
struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
- struct qcom_iommu_dev *qcom_iommu = to_iommu(dev);
+ struct qcom_iommu_dev *qcom_iommu = dev_iommu_priv_get(dev);
unsigned int i;
if (domain == identity_domain || !domain)
@@ -535,7 +525,7 @@ static bool qcom_iommu_capable(struct device *dev, enum iommu_cap cap)
static struct iommu_device *qcom_iommu_probe_device(struct device *dev)
{
- struct qcom_iommu_dev *qcom_iommu = to_iommu(dev);
+ struct qcom_iommu_dev *qcom_iommu = dev_iommu_priv_get(dev);
struct device_link *link;
if (!qcom_iommu)
diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index 19ef50221c93..0a3698c33e19 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -863,16 +863,11 @@ static phys_addr_t mtk_iommu_iova_to_phys(struct iommu_domain *domain,
static struct iommu_device *mtk_iommu_probe_device(struct device *dev)
{
struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
- struct mtk_iommu_data *data;
+ struct mtk_iommu_data *data = dev_iommu_priv_get(dev);
struct device_link *link;
struct device *larbdev;
unsigned int larbid, larbidx, i;
- if (!fwspec || fwspec->ops != &mtk_iommu_ops)
- return ERR_PTR(-ENODEV); /* Not a iommu client device */
-
- data = dev_iommu_priv_get(dev);
-
if (!MTK_IOMMU_IS_TYPE(data->plat_data, MTK_IOMMU_TYPE_MM))
return &data->iommu;
diff --git a/drivers/iommu/mtk_iommu_v1.c b/drivers/iommu/mtk_iommu_v1.c
index 67e044c1a7d9..25b41222abae 100644
--- a/drivers/iommu/mtk_iommu_v1.c
+++ b/drivers/iommu/mtk_iommu_v1.c
@@ -481,9 +481,6 @@ static struct iommu_device *mtk_iommu_v1_probe_device(struct device *dev)
idx++;
}
- if (!fwspec || fwspec->ops != &mtk_iommu_v1_ops)
- return ERR_PTR(-ENODEV); /* Not a iommu client device */
-
data = dev_iommu_priv_get(dev);
/* Link the consumer device with the smi-larb device(supplier) */
diff --git a/drivers/iommu/sprd-iommu.c b/drivers/iommu/sprd-iommu.c
index 9c33ea6903f6..b15a8fe7ae8a 100644
--- a/drivers/iommu/sprd-iommu.c
+++ b/drivers/iommu/sprd-iommu.c
@@ -384,13 +384,7 @@ static phys_addr_t sprd_iommu_iova_to_phys(struct iommu_domain *domain,
static struct iommu_device *sprd_iommu_probe_device(struct device *dev)
{
- struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
- struct sprd_iommu_device *sdev;
-
- if (!fwspec || fwspec->ops != &sprd_iommu_ops)
- return ERR_PTR(-ENODEV);
-
- sdev = dev_iommu_priv_get(dev);
+ struct sprd_iommu_device *sdev = dev_iommu_priv_get(dev);
return &sdev->iommu;
}
diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index 17dcd826f5c2..bb2e795a80d0 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -969,9 +969,6 @@ static struct iommu_device *viommu_probe_device(struct device *dev)
struct viommu_dev *viommu = NULL;
struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
- if (!fwspec || fwspec->ops != &viommu_ops)
- return ERR_PTR(-ENODEV);
-
viommu = viommu_get_by_fwnode(fwspec->iommu_fwnode);
if (!viommu)
return ERR_PTR(-ENODEV);
--
2.39.2.101.g768bb238c484.dirty
In all the places we allocate default domains, we have (or can easily
get hold of) a device from which to resolve the right IOMMU ops; only
the public iommu_domain_alloc() interface actually depends on bus ops.
Reworking the public API is a big enough mission in its own right, but
in the meantime we can still decouple it from bus ops internally to move
forward.
Reviewed-by: Lu Baolu <[email protected]>
Signed-off-by: Robin Murphy <[email protected]>
---
v3: Make sure blocking domains are covered as well
v4: Reinstate correct bus_for_each_dev() handling from v2
---
drivers/iommu/iommu.c | 48 ++++++++++++++++++++++++++++---------------
1 file changed, 31 insertions(+), 17 deletions(-)
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 7c79a58ef010..c5b5408d1dd7 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -95,8 +95,8 @@ static const char * const iommu_group_resv_type_string[] = {
static int iommu_bus_notifier(struct notifier_block *nb,
unsigned long action, void *data);
static void iommu_release_device(struct device *dev);
-static struct iommu_domain *
-__iommu_group_domain_alloc(struct iommu_group *group, unsigned int type);
+static struct iommu_domain *__iommu_domain_alloc(struct device *dev,
+ unsigned type);
static int __iommu_attach_device(struct iommu_domain *domain,
struct device *dev);
static int __iommu_attach_group(struct iommu_domain *domain,
@@ -1763,7 +1763,7 @@ __iommu_group_alloc_default_domain(struct iommu_group *group, int req_type)
{
if (group->default_domain && group->default_domain->type == req_type)
return group->default_domain;
- return __iommu_group_domain_alloc(group, req_type);
+ return __iommu_domain_alloc(iommu_group_first_dev(group), req_type);
}
/*
@@ -2082,10 +2082,10 @@ void iommu_set_fault_handler(struct iommu_domain *domain,
}
EXPORT_SYMBOL_GPL(iommu_set_fault_handler);
-static struct iommu_domain *__iommu_domain_alloc(const struct iommu_ops *ops,
- struct device *dev,
- unsigned int type)
+static struct iommu_domain *__iommu_domain_alloc(struct device *dev,
+ unsigned type)
{
+ const struct iommu_ops *ops = dev_iommu_ops(dev);
struct iommu_domain *domain;
unsigned int alloc_type = type & IOMMU_DOMAIN_ALLOC_FLAGS;
@@ -2120,20 +2120,30 @@ static struct iommu_domain *__iommu_domain_alloc(const struct iommu_ops *ops,
return domain;
}
-static struct iommu_domain *
-__iommu_group_domain_alloc(struct iommu_group *group, unsigned int type)
+static int __iommu_domain_alloc_dev(struct device *dev, void *data)
{
- struct device *dev = iommu_group_first_dev(group);
+ struct device **alloc_dev = data;
- return __iommu_domain_alloc(dev_iommu_ops(dev), dev, type);
+ if (!dev_has_iommu(dev))
+ return 0;
+
+ WARN_ONCE(*alloc_dev && dev_iommu_ops(dev) != dev_iommu_ops(*alloc_dev),
+ "Multiple IOMMU drivers present, which the public IOMMU API can't fully support yet. You may still need to disable one or more to get the expected result here, sorry!\n");
+
+ *alloc_dev = dev;
+ return 0;
}
struct iommu_domain *iommu_domain_alloc(const struct bus_type *bus)
{
- if (bus == NULL || bus->iommu_ops == NULL)
+ struct device *dev = NULL;
+
+ /* We always check the whole bus, so the return value isn't useful */
+ bus_for_each_dev(bus, NULL, &dev, __iommu_domain_alloc_dev);
+ if (!dev)
return NULL;
- return __iommu_domain_alloc(bus->iommu_ops, NULL,
- IOMMU_DOMAIN_UNMANAGED);
+
+ return __iommu_domain_alloc(dev, IOMMU_DOMAIN_UNMANAGED);
}
EXPORT_SYMBOL_GPL(iommu_domain_alloc);
@@ -3256,18 +3266,22 @@ void iommu_device_unuse_default_domain(struct device *dev)
static int __iommu_group_alloc_blocking_domain(struct iommu_group *group)
{
+ struct device *dev = iommu_group_first_dev(group);
+
if (group->blocking_domain)
return 0;
- group->blocking_domain =
- __iommu_group_domain_alloc(group, IOMMU_DOMAIN_BLOCKED);
+ /* noiommu groups should never be here */
+ if (WARN_ON(!dev_has_iommu(dev)))
+ return -ENODEV;
+
+ group->blocking_domain = __iommu_domain_alloc(dev, IOMMU_DOMAIN_BLOCKED);
if (!group->blocking_domain) {
/*
* For drivers that do not yet understand IOMMU_DOMAIN_BLOCKED
* create an empty domain instead.
*/
- group->blocking_domain = __iommu_group_domain_alloc(
- group, IOMMU_DOMAIN_UNMANAGED);
+ group->blocking_domain = __iommu_domain_alloc(dev, IOMMU_DOMAIN_UNMANAGED);
if (!group->blocking_domain)
return -EINVAL;
}
--
2.39.2.101.g768bb238c484.dirty
When using the legacy binding we bypass the of_xlate mechanism, so avoid
registering the instance fwnodes which act as keys for that. This will
help __iommu_probe_device() to retrieve the registered ops the same way
as for x86 etc. when no fwspec has previously been set up by of_xlate.
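For reference, this hinges on how iommu_device_register() derives the
instance fwnode (roughly, per the earlier patch in this series): with a
NULL hwdev the instance keeps a NULL fwnode, so it can never match an
of_xlate lookup, but it is exactly what the iommu_ops_from_fwnode(NULL)
fallback in __iommu_probe_device() will find.

	iommu->ops = ops;
	if (hwdev)
		iommu->fwnode = dev_fwnode(hwdev);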
Reviewed-by: Jason Gunthorpe <[email protected]>
Signed-off-by: Robin Murphy <[email protected]>
---
drivers/iommu/arm/arm-smmu/arm-smmu.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index d6d1a2a55cc0..4b83a3adacd6 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -2161,7 +2161,8 @@ static int arm_smmu_device_probe(struct platform_device *pdev)
return err;
}
- err = iommu_device_register(&smmu->iommu, &arm_smmu_ops, dev);
+ err = iommu_device_register(&smmu->iommu, &arm_smmu_ops,
+ using_legacy_binding ? NULL : dev);
if (err) {
dev_err(dev, "Failed to register iommu\n");
iommu_device_sysfs_remove(&smmu->iommu);
--
2.39.2.101.g768bb238c484.dirty
On Mon, Oct 02, 2023 at 02:49:12PM +0100, Robin Murphy wrote:
> @@ -2120,20 +2120,30 @@ static struct iommu_domain *__iommu_domain_alloc(const struct iommu_ops *ops,
> return domain;
> }
>
> -static struct iommu_domain *
> -__iommu_group_domain_alloc(struct iommu_group *group, unsigned int type)
> +static int __iommu_domain_alloc_dev(struct device *dev, void *data)
> {
Why? The point of this design is that drivers are not allowed to
allocate different things for devices in the same group. So we always
force the driver to see only the first device in the group even if we
have a more specific device available in the call chain.
This patch has undone this design and passed in more specific devs :(
The new code here:
> struct iommu_domain *iommu_domain_alloc(const struct bus_type *bus)
> {
> - if (bus == NULL || bus->iommu_ops == NULL)
> + struct device *dev = NULL;
> +
> + /* We always check the whole bus, so the return value isn't useful */
> + bus_for_each_dev(bus, NULL, &dev, __iommu_domain_alloc_dev);
> + if (!dev)
> return NULL;
> - return __iommu_domain_alloc(bus->iommu_ops, NULL,
> - IOMMU_DOMAIN_UNMANAGED);
> +
> + return __iommu_domain_alloc(dev, IOMMU_DOMAIN_UNMANAGED);
> }
> EXPORT_SYMBOL_GPL(iommu_domain_alloc);
Should just obtain any group for the bus and pass that to
__iommu_group_domain_alloc().
Also, how does the locking work here? Definitely can't pass dev
outside the bus_for_each_dev() like this.
If this needs to sweep over arbitrary devices that are not the caller's
probed device it needs to hold at least the device_lock to prevent
racing with release.
So I'd structure this to find the matching device, lock the
device_lock, get the group refcount, unlock the device_lock then
get the group_mutex, check for non-empty and then call
__iommu_group_domain_alloc()
(there is a missing lockdep annotation in
__iommu_group_domain_alloc(), the group mutex is needed)
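A rough sketch of that shape (illustrative and untested):

static int __iommu_domain_alloc_dev(struct device *dev, void *data)
{
	struct iommu_group **alloc_group = data;
	struct iommu_group *group;

	/* Pin the group so it can't be released under us */
	device_lock(dev);
	group = iommu_group_get(dev);
	device_unlock(dev);
	if (!group)
		return 0;

	mutex_lock(&group->mutex);
	/* Theoretically we could have raced against release */
	if (list_empty(&group->devices)) {
		mutex_unlock(&group->mutex);
		iommu_group_put(group);
		return 0;
	}

	/* Caller calls __iommu_group_domain_alloc(), then unlocks and puts */
	*alloc_group = group;
	return 1;
}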
Jason
The pattern for picking the first device out of the group list is
repeated a few times now, so it's clearly worth factoring out, which
also helps hide the iommu_group_dev detail from places that don't need
to know. Similarly, the safety check for dev_iommu_ops() at certain
public interfaces starts looking a bit repetitive, and might not be
completely obvious at first glance, so let's factor that out for clarity
as well, in preparation for more uses of both.
Reviewed-by: Lu Baolu <[email protected]>
Signed-off-by: Robin Murphy <[email protected]>
---
v3: Rename dev_iommu_ops_valid() to reflect what it's actually checking,
rather than an implied consequence.
v4: Rebase, also catch the sneaky one in iommu_get_group_resv_regions()
which wasn't the full pattern (but really should be since it guards
the dev_iommu_ops() invocation in iommu_get_resv_regions()).
---
drivers/iommu/iommu.c | 56 ++++++++++++++++++++-----------------------
1 file changed, 26 insertions(+), 30 deletions(-)
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 1ecac2b5c54f..f7793d1b5c3e 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -368,6 +368,15 @@ static void dev_iommu_free(struct device *dev)
kfree(param);
}
+/*
+ * Internal equivalent of device_iommu_mapped() for when we care that a device
+ * actually has API ops, and don't want false positives from VFIO-only groups.
+ */
+static bool dev_has_iommu(struct device *dev)
+{
+ return dev->iommu && dev->iommu->iommu_dev;
+}
+
static u32 dev_iommu_get_max_pasids(struct device *dev)
{
u32 max_pasids = 0, bits = 0;
@@ -620,7 +629,7 @@ static void __iommu_group_remove_device(struct device *dev)
list_del(&device->list);
__iommu_group_free_device(group, device);
- if (dev->iommu && dev->iommu->iommu_dev)
+ if (dev_has_iommu(dev))
iommu_deinit_device(dev);
else
dev->iommu_group = NULL;
@@ -819,7 +828,7 @@ int iommu_get_group_resv_regions(struct iommu_group *group,
* Non-API groups still expose reserved_regions in sysfs,
* so filter out calls that get here that way.
*/
- if (!device->dev->iommu)
+ if (!dev_has_iommu(device->dev))
break;
INIT_LIST_HEAD(&dev_resv_regions);
@@ -1224,6 +1233,12 @@ void iommu_group_remove_device(struct device *dev)
}
EXPORT_SYMBOL_GPL(iommu_group_remove_device);
+static struct device *iommu_group_first_dev(struct iommu_group *group)
+{
+ lockdep_assert_held(&group->mutex);
+ return list_first_entry(&group->devices, struct group_device, list)->dev;
+}
+
/**
* iommu_group_for_each_dev - iterate over each device in the group
* @group: the group
@@ -1751,23 +1766,6 @@ __iommu_group_alloc_default_domain(struct iommu_group *group, int req_type)
return __iommu_group_domain_alloc(group, req_type);
}
-/*
- * Returns the iommu_ops for the devices in an iommu group.
- *
- * It is assumed that all devices in an iommu group are managed by a single
- * IOMMU unit. Therefore, this returns the dev_iommu_ops of the first device
- * in the group.
- */
-static const struct iommu_ops *group_iommu_ops(struct iommu_group *group)
-{
- struct group_device *device =
- list_first_entry(&group->devices, struct group_device, list);
-
- lockdep_assert_held(&group->mutex);
-
- return dev_iommu_ops(device->dev);
-}
-
/*
* req_type of 0 means "auto" which means to select a domain based on
* iommu_def_domain_type or what the driver actually supports.
@@ -1775,7 +1773,7 @@ static const struct iommu_ops *group_iommu_ops(struct iommu_group *group)
static struct iommu_domain *
iommu_group_alloc_default_domain(struct iommu_group *group, int req_type)
{
- const struct iommu_ops *ops = group_iommu_ops(group);
+ const struct iommu_ops *ops = dev_iommu_ops(iommu_group_first_dev(group));
struct iommu_domain *dom;
lockdep_assert_held(&group->mutex);
@@ -1853,7 +1851,7 @@ static int iommu_bus_notifier(struct notifier_block *nb,
static int iommu_get_def_domain_type(struct iommu_group *group,
struct device *dev, int cur_type)
{
- const struct iommu_ops *ops = group_iommu_ops(group);
+ const struct iommu_ops *ops = dev_iommu_ops(dev);
int type;
if (!ops->def_domain_type)
@@ -2008,7 +2006,7 @@ bool device_iommu_capable(struct device *dev, enum iommu_cap cap)
{
const struct iommu_ops *ops;
- if (!dev->iommu || !dev->iommu->iommu_dev)
+ if (!dev_has_iommu(dev))
return false;
ops = dev_iommu_ops(dev);
@@ -2105,11 +2103,9 @@ static struct iommu_domain *__iommu_domain_alloc(const struct iommu_ops *ops,
static struct iommu_domain *
__iommu_group_domain_alloc(struct iommu_group *group, unsigned int type)
{
- struct device *dev =
- list_first_entry(&group->devices, struct group_device, list)
- ->dev;
+ struct device *dev = iommu_group_first_dev(group);
- return __iommu_domain_alloc(group_iommu_ops(group), dev, type);
+ return __iommu_domain_alloc(dev_iommu_ops(dev), dev, type);
}
struct iommu_domain *iommu_domain_alloc(const struct bus_type *bus)
@@ -2960,8 +2956,8 @@ EXPORT_SYMBOL_GPL(iommu_fwspec_add_ids);
*/
int iommu_dev_enable_feature(struct device *dev, enum iommu_dev_features feat)
{
- if (dev->iommu && dev->iommu->iommu_dev) {
- const struct iommu_ops *ops = dev->iommu->iommu_dev->ops;
+ if (dev_has_iommu(dev)) {
+ const struct iommu_ops *ops = dev_iommu_ops(dev);
if (ops->dev_enable_feat)
return ops->dev_enable_feat(dev, feat);
@@ -2976,8 +2972,8 @@ EXPORT_SYMBOL_GPL(iommu_dev_enable_feature);
*/
int iommu_dev_disable_feature(struct device *dev, enum iommu_dev_features feat)
{
- if (dev->iommu && dev->iommu->iommu_dev) {
- const struct iommu_ops *ops = dev->iommu->iommu_dev->ops;
+ if (dev_has_iommu(dev)) {
+ const struct iommu_ops *ops = dev_iommu_ops(dev);
if (ops->dev_disable_feat)
return ops->dev_disable_feat(dev, feat);
--
2.39.2.101.g768bb238c484.dirty
On 02/10/2023 3:16 pm, Jason Gunthorpe wrote:
> On Mon, Oct 02, 2023 at 02:49:12PM +0100, Robin Murphy wrote:
>> @@ -2120,20 +2120,30 @@ static struct iommu_domain *__iommu_domain_alloc(const struct iommu_ops *ops,
>> return domain;
>> }
>>
>> -static struct iommu_domain *
>> -__iommu_group_domain_alloc(struct iommu_group *group, unsigned int type)
>> +static int __iommu_domain_alloc_dev(struct device *dev, void *data)
>> {
>
> Why?
Because out of 3 callsites, two were in a context which now needed to
make the iommu_group_first_dev() call itself anyway, so recalculating
it twice more with the mutex still held would clearly be silly; rather
than bother refactoring the __iommu_group_domain_alloc() helper, I
squashed it into its one remaining user. I don't know a good way to tell
git-format-patch to show a particular range of lines as a complete
removal of one function plus the addition of an entirely unrelated new
one which just happens to be in the same place.
> The point of this design is that drivers are not allowed to
> allocate different things for devices in the same group. So we always
> force the driver to see only the first device in the group even if we
> have a more specific device available in the call chain.
OK, but I can't read your mind. All I have visibility of is some
code which factored out a helper function for a sequence common to
several users, as is typical; neither the commit message of 8359cf39acba
nor the cover letter from that series provides any obvious explanation of
this "design".
> This patch has undone this design and passed in more specific devs :(
Um... Good? I mean in 3/4 cases it's literally the exact same code just
factored out again, while the one I've added picks some arbitrary device
in a different way. But the first device in the group list is still just
that - some arbitrary device - it's by no means guaranteed to be the
*same* device each time the group is re-locked, so pretending it's some
kind of useful invariant is misleading and wrong. Frankly now that you
*have* explained the design intent here, it's only made me more set on
removing it for being unclear, overcomplicated, and yet fundamentally
useless.
Yes, we absolutely expect that if dev_iommu_ops(dev)->device_group for
two devices returns the same group, the driver should bloody well give
the same result for either dev_iommu_ops(dev)->domain_alloc, but there's
no practical way to enforce that at this level of code. If a driver ever
did have inconsistent behaviour across devices within a group, trying to
always use the "one true device" would only help *hide* that and make it
harder to debug, not at all prevent it.
> The new code here:
>
>> struct iommu_domain *iommu_domain_alloc(const struct bus_type *bus)
>> {
>> - if (bus == NULL || bus->iommu_ops == NULL)
>> + struct device *dev = NULL;
>> +
>> + /* We always check the whole bus, so the return value isn't useful */
>> + bus_for_each_dev(bus, NULL, &dev, __iommu_domain_alloc_dev);
>> + if (!dev)
>> return NULL;
>> - return __iommu_domain_alloc(bus->iommu_ops, NULL,
>> - IOMMU_DOMAIN_UNMANAGED);
>> +
>> + return __iommu_domain_alloc(dev, IOMMU_DOMAIN_UNMANAGED);
>> }
>> EXPORT_SYMBOL_GPL(iommu_domain_alloc);
>
> Should just obtain any group for the bus and pass that to
> __iommu_group_domain_alloc().
>
> Also, how does the locking work here? Definately can't pass dev
> outside the bus_for_each_dev() like this.
It works like in most of the rest of the API right now... this is still
the process of cleaning up the old ugly stuff in order to be able to
evolve the API foundations to the point where we *can* have a reasonable
locking scheme. It's a multi-step process, and I'd really like to get
the internal bus-ops-to-instance-ops transition completed in order to
then clean up the "of_iommu_configure() at driver probe time"
catastrophe, get iommu_domain_alloc() callers to pass a valid device
themselves, teach the ARM DMA ops about default domains (for which it's
a great help that you've now got the driver-side groundwork done,
thanks!), and then maybe we can finally have nice things.
> If this needs to sweep over arbitrary devices that are not the caller's
> probed device it needs to hold at least the device_lock to prevent
> racing with release.
>
> So I'd structure this to find the matching device, lock the
> device_lock, get the group refcount, unlock the device_lock then
> get the group_mutex, check for non-empty and then call
> __iommu_group_domain_alloc()
...and as your reward you'd get much the same deadlocks as with the last
attempt to bring device_lock into it, thanks to
arm_iommu_create_mapping(), and drivers calling iommu_domain_alloc()
from their own probe routines :(
FYI the diff below is what I've hacked out so far, but I'll test it with
a fresh head tomorrow (just pasting it in here I've noticed at least one
bug...)
> (there is a missing lockdep annotation in
> __iommu_group_domain_alloc(), the group mutex is needed)
(fear not, iommu_group_first_dev() brings that to those places!)
Thanks,
Robin.
----->8-----
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 3d29434d57c6..78366e988e31 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2128,27 +2128,41 @@ static struct iommu_domain *__iommu_domain_alloc(struct device *dev,
static int __iommu_domain_alloc_dev(struct device *dev, void *data)
{
struct device **alloc_dev = data;
+ struct iommu_group *group = iommu_group_get(dev);
- if (!dev_has_iommu(dev))
+ if (!group)
return 0;
- WARN_ONCE(*alloc_dev && dev_iommu_ops(dev) != dev_iommu_ops(*alloc_dev),
- "Multiple IOMMU drivers present, which the public IOMMU API can't fully support yet. You may still need to disable one or more to get the expected result here, sorry!\n");
+ mutex_lock(&group->mutex);
+ /* Theoretically we could have raced against release */
+ if (list_empty(&group->devices)) {
+ mutex_unlock(&group->mutex);
+ iommu_group_put(group);
+ return 0;
+ }
- *alloc_dev = dev;
+ if (!*alloc_dev)
+ *alloc_dev = dev;
+
+ WARN_ONCE(dev_iommu_ops(dev) != dev_iommu_ops(*alloc_dev),
+ "Multiple IOMMU drivers present, which the public IOMMU API can't fully support yet. You may still need to disable one or more to get the expected result here, sorry!\n");
return 0;
}
struct iommu_domain *iommu_domain_alloc(const struct bus_type *bus)
{
struct device *dev = NULL;
+ struct iommu_domain *dom;
/* We always check the whole bus, so the return value isn't useful */
bus_for_each_dev(bus, NULL, &dev, __iommu_domain_alloc_dev);
if (!dev)
return NULL;
- return __iommu_domain_alloc(dev, IOMMU_DOMAIN_UNMANAGED);
+ dom = __iommu_domain_alloc(dev, IOMMU_DOMAIN_UNMANAGED);
+ mutex_unlock(&dev->iommu_group->mutex);
+ iommu_group_put(dev->iommu_group);
+ return dom;
}
EXPORT_SYMBOL_GPL(iommu_domain_alloc);
On Mon, Oct 02, 2023 at 08:02:23PM +0100, Robin Murphy wrote:
> On 02/10/2023 3:16 pm, Jason Gunthorpe wrote:
> > On Mon, Oct 02, 2023 at 02:49:12PM +0100, Robin Murphy wrote:
> > > @@ -2120,20 +2120,30 @@ static struct iommu_domain *__iommu_domain_alloc(const struct iommu_ops *ops,
> > > return domain;
> > > }
> > > -static struct iommu_domain *
> > > -__iommu_group_domain_alloc(struct iommu_group *group, unsigned int type)
> > > +static int __iommu_domain_alloc_dev(struct device *dev, void *data)
> > > {
> >
> > Why?
>
> Because out of 3 callsites, two were in a context which now needed to
> make the iommu_group_first_dev() call itself anyway,
I don't see it. Why not just do this?
static int __iommu_domain_alloc_dev(struct device *dev, void *data)
{
/* FIXME: This is not correctly locked */
struct iommu_group *group = iommu_group_get(dev);
struct iommu_group **alloc_group = data;
if (!group)
return 0;
mutex_lock(&group->mutex);
/* Theoretically we could have raced against release */
if (list_empty(&group->devices)) {
mutex_unlock(&group->mutex);
iommu_group_put(group);
return 0;
}
*alloc_group = group;
return 1;
}
struct iommu_domain *iommu_domain_alloc(const struct bus_type *bus)
{
struct iommu_group *alloc_group = NULL;
struct iommu_domain *dom;
/* Only one driver is supported with this interface */
if (WARN_ON(iommu_num_drivers > 1))
return NULL;
bus_for_each_dev(bus, NULL, &alloc_group, __iommu_domain_alloc_dev);
if (!alloc_group)
return NULL;
dom = __iommu_group_domain_alloc(alloc_group, IOMMU_DOMAIN_UNMANAGED);
mutex_unlock(&alloc_group->mutex);
iommu_group_put(alloc_group);
return dom;
}
EXPORT_SYMBOL_GPL(iommu_domain_alloc);
(and ++/-- iommu_num_drivers in iommu_device_register)
One patch, it's pretty easy???
> Um... Good? I mean in 3/4 cases it's literally the exact same code just
> factored out again, while the one I've added picks some arbitrary device
> in a different way.
Sure, but the whole point was to make it obvious that there was no
direct linkage between the various dev parameters we have in places and
what dev will be passed to the driver. Everything passes through
__iommu_group_domain_alloc() and at the end of the day that should be
the only allocation API.
Jason
On 02/10/2023 8:36 pm, Jason Gunthorpe wrote:
> On Mon, Oct 02, 2023 at 08:02:23PM +0100, Robin Murphy wrote:
>> On 02/10/2023 3:16 pm, Jason Gunthorpe wrote:
>>> On Mon, Oct 02, 2023 at 02:49:12PM +0100, Robin Murphy wrote:
>>>> @@ -2120,20 +2120,30 @@ static struct iommu_domain *__iommu_domain_alloc(const struct iommu_ops *ops,
>>>> return domain;
>>>> }
>>>> -static struct iommu_domain *
>>>> -__iommu_group_domain_alloc(struct iommu_group *group, unsigned int type)
>>>> +static int __iommu_domain_alloc_dev(struct device *dev, void *data)
>>>> {
>>>
>>> Why?
>>
>> Because out of 3 callsites, two were in a context which now needed to
>> make the iommu_group_first_dev() call itself anyway,
>
> I don't see it. Why not just do this?
>
> static int __iommu_domain_alloc_dev(struct device *dev, void *data)
> {
> /* FIXME: This is not correctly locked */
> struct iommu_group *group = iommu_group_get(dev);
> struct iommu_group **alloc_group = data;
>
> if (!group)
> return 0;
>
> mutex_lock(&group->mutex);
> /* Theoretically we could have raced against release */
> if (list_empty(&group->devices)) {
> mutex_unlock(&group->mutex);
> iommu_group_put(group);
> return 0;
> }
>
> *alloc_group = group;
> return 1;
> }
>
> struct iommu_domain *iommu_domain_alloc(const struct bus_type *bus)
> {
> struct iommu_group *alloc_group = NULL;
> struct iommu_domain *dom;
>
> /* Only one driver is supported with this interface */
> if (WARN_ON(iommu_num_drivers > 1))
> return NULL;
>
> bus_for_each_dev(bus, NULL, &alloc_group, __iommu_domain_alloc_dev);
> if (!alloc_group)
> return NULL;
> dom = __iommu_group_domain_alloc(alloc_group, IOMMU_DOMAIN_UNMANAGED);
> mutex_unlock(&alloc_group->mutex);
> iommu_group_put(alloc_group);
> return dom;
> }
> EXPORT_SYMBOL_GPL(iommu_domain_alloc);
>
> (and ++/-- iommu_num_drivers in iommu_device_register)
>
> One patch, it's pretty easy???
Sure, that would then leave two callsites for
__iommu_group_domain_alloc(), which might in turn justify leaving it
factored out, but the endgame is to get a device to resolve
dev_iommu_ops(), so if we're starting with a suitable device in hand
then using its group to immediately re-derive a device again when we
*could* trivially pass the device straight through is silly and
overcomplicated.
However it turns out that we can't easily touch the group here at all
without then needing dependent changes in VFIO, so rather than invite
any more scope creep at this point I'm going to respin this as a purely
sideways step that sticks to resolving ops from a bus, and save any
further device/instance niceties for step 2 when I have to touch all the
external callers anyway.
>> Um... Good? I mean in 3/4 cases it's literally the exact same code just
>> factored out again, while the one I've added picks some arbitrary device
>> in a different way.
>
> Sure, but the whole point was to make it obvious that there was no
> direct linkage between the various dev parameters we have in places and
> what dev will be passed to the driver.
But that's the argument that makes no sense - it happens to be the case
in the current code that all default domain allocation sites are already
buried in several layers worth of group-based machinery, but that in
itself holds no significance to the allocation process. I maintain that
it is simply misleading to pretend that (taking a little artistic
license with types) a round trip through
dev->iommu_group->devices->next->dev->iommu holds any significance over
just dev->iommu. There's hardly anything "obvious" about it either -
it's buried in core code that driver authors have little reason to even
look at.
If I'm implementing a driver, I'm going to see the signature of the op I
need to implement, which tells me to allocate a domain and gives me a
device to allocate it for, and for reference I'm going to look at how
other drivers implement that op. I would not start by picking through
the core code for the callers of the caller of that op, to see some
weird code that looks redundant when in practice I observe it resolving
a device back to itself, and think "ah yes, now I see the undocumented
significance of an idea that only existed in Jason's head". If you don't
want an IOMMU driver to be able to think the particular device is
important, don't pass a bloody device! If the only intended purpose is
for the driver to resolve dev->iommu instance data, then have the core
do that and just pass the dev_iommu or iommu_dev directly; ambiguity solved.
If one is expected to look at subtleties 2 or 3 levels down the
callchain of an API in order to understand how to implement that API
correctly, that is *a badly designed API*, and piling on more
"correctness theatre" code to attempt to highlight the bad design does
not address the real problem there.
> Everything passes through
> __iommu_group_domain_alloc() and at the end of the day that should be
> the only allocation API.
But *why* should it? What's inherently more special about a group vs. a
device or an IOMMU instance (especially with those being the things
drivers actually deal with in relation to domains)?
If you've noticed, between here and patch #3 I already end up
effectively enforcing that devices in the same group must have identical
device ops - that's arguably an even stronger requirement than we need,
but it fits the current drivers so we may as well not relax it until
proven necessary. So AFAICS the problem you're desperate to address is a
theoretical one of a driver author going out of their way to implement a
single .domain_alloc_paging implementation badly, in a way which would
only affect the behaviour of that driver, and should be easy to spot in
patch review anyway. Any complexity in core code in an attempt to make
any difference to that seems about as worthwhile as trying to hold back
the tide.
Thanks,
Robin.
On Wed, Oct 04, 2023 at 06:23:05PM +0100, Robin Murphy wrote:
> Sure, that would then leave two callsites for __iommu_group_domain_alloc(),
> which might in turn justify leaving it factored out, but the endgame is to
> get a device to resolve dev_iommu_ops(), so if we're starting with a
> suitable device in hand then using its group to immediately re-derive a
> device again when we *could* trivially pass the device straight through is
> silly and overcomplicated.
Well, it is the same sort of interface as attaching a domain to a
group. The group stuff is like that.
> However it turns out that we can't easily touch the group here at all
> without then needing dependent changes in VFIO,
Why? VFIO only puts its weird groups on the mdev bus and so they will
never hit this.
> more scope creep at this point I'm going to respin this as a purely sideways
> step that sticks to resolving ops from a bus, and save any further
> device/instance niceties for step 2 when I have to touch all the external
> callers anyway.
One thing that I've recently realized is that this patch will break
everything when combined with the patches I'm sending to start doing
finalize during domain_alloc_paging() ops. It will randomly select a
device on the bus which is probably the wrong smmu for the device we
intend to attach to later.
So, to keep everything together the special bus domain alloc path has
to continue to pass in a NULL dev to domain_alloc_paging.
That is certainly a pretty good reason why the above can't use the
existing group call and then you can make the case it doesn't have
enough users to exist anymore.
> But that's the argument that makes no sense - it happens to be the case in
> the current code that all default domain allocation sites are already buried
> in several layers worth of group-based machinery, but that in itself holds
> no significance to the allocation process.
Well yes, but also it is what it is. The group stuff gets everywhere in
undesired ways. The original version of this did write it the way you
suggest, and Kevin came up with the idea of a group-based domain alloc. I
tried it and it was indeed cleaner: one place to do the Quite Strange
thing of randomly choosing a device in a group; combined with the group
ops getter, it allowed all the group-centric code to stay group-centric
until the very fringe.
Keeping it clear that the dev was in fact just made up in a lot of call
paths is a nice bonus.
> only existed in Jason's head". If you don't want an IOMMU driver to be able
> to think the particular device is important, don't pass a bloody device! If
> the only intended purpose is for the driver to resolve dev->iommu instance
> data, then have the core do that and just pass the dev_iommu or iommu_dev
> directly; ambiguity solved.
Oh, I wanted to do that! I even looked at doing it. It doesn't work
for amd and smmuv3 :( They need to know whether the group has any
PASID-capable devices in order to choose a PASID-compatible RID domain
for DMA API use. Can't get that from the iommu_dev alone.
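For instance, a driver-side sketch of the kind of decision that needs
the client dev (all exdrv_* names are made up for illustration):

struct exdrv_domain {
	struct iommu_domain domain;
	bool pasid_fmt;
};

static struct iommu_domain *exdrv_domain_alloc_paging(struct device *dev)
{
	struct exdrv_domain *ed = kzalloc(sizeof(*ed), GFP_KERNEL);

	if (!ed)
		return NULL;

	/* Use a PASID-capable RID pagetable format only when needed */
	ed->pasid_fmt = dev && dev_is_pci(dev) &&
			pci_max_pasids(to_pci_dev(dev)) > 0;

	return &ed->domain;
}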
> If one is expected to look at subtleties 2 or 3 levels down the callchain of
> an API in order to understand how to implement that API correctly, that is
> *a badly designed API*, and piling on more "correctness theatre" code to
> attempt to highlight the bad design does not address the real problem there.
Oh, I agree, this group stuff is troubled like that. It conflates all
kinds of unrelated things into one bundle.
The group stuff exists at the core layer where everything wants to
operate on groups. We attach devices to groups, we allocate domains
for groups. Then the device level works on devices, not groups, and we
have this confusion.
> If you've noticed, between here and patch #3 I already end up effectively
> enforcing that devices in the same group must have identical device ops -
Doesn't it just fall out of the probe rework? All devices are now
attached to a domain during probe. If the single domain in the group
fails attach because of your patch 4 then the device will not probe.
Unraveling that would require multiple domains per group, which
seems like a huge undertaking :)
> So AFAICS the problem you're desperate to address is a theoretical
> one of a driver author going out of their way to implement a single
> .domain_alloc_paging implementation badly, in a way which would only
> affect the behaviour of that driver, and should be easy to spot in
> patch review anyway. Any complexity in core code in an attempt to
> make any difference to that seems about as worthwhile as trying to
> hold back the tide.
Well, not quite... There are quite obvious things that can trip up a
driver. The requirement for a special RID IOPTE format to get PASID
support is tricky. The drivers are all functionally OK because we link
PASID support to singleton groups, but even that took a long time to
arrive at.
So, please try a v5; let's see what it looks like with the
domain_alloc_paging and semi-locking fix?
Jason