2023-01-19 19:57:02

by Robin Murphy

[permalink] [raw]
Subject: [PATCH 0/8] iommu: The early demise of bus ops

Hi all,

[ Christoph, Greg, Rafael; feel free to ignore all the IOMMU details,
just a heads-up for some pretty trivial header motion in patch #6 ]

This is sort of an RFC, in the sense that the patches are functionally
ready but I don't expect that we necessarily want to merge all them
right away; at this point it's more for the sake of visibility and
checking if anyone strongly objects to the direction I'm taking. As such
I've based these patches on 6.2-rc3 and made no effort to integrate them
with the IOMMUFD-related work going on in parallel and/or already
queued, even though there is some functional intersection and almost
certain conflicts. If we reach a consensus that we would like any of
this for 6.3 I'll rebase as appropriate.

Patches #1-6 here work up to what I originally expected to be the
triumphant finale of the whole mission, but as it turns out is actually
feasible and convenient to get out of the way *before* getting into the
really fiddly bits of refactoring the DT/ACPI of_xlate stuff, ARM DMA
ops, and the iommu_domain_alloc() interface. Patch #8 is included here
as the precursor to another cleanup series for drivers that currently
have an awkward domain_finalise step in their .attach_dev op, but I'm
unlikely to start writing those patches for a while yet. Patch #7 is
also here nominally, but in fact I think it could already go anywhere
since the last rework of iommu_device_register().

Thanks,
Robin.


Robin Murphy (8):
iommu: Decouple iommu_present() from bus ops
iommu: Validate that devices match domains
iommu: Factor out a "first device in group" helper
iommu: Switch __iommu_domain_alloc() to device ops
iommu/arm-smmu: Don't register fwnode for legacy binding
iommu: Retire bus ops
iommu: Clean up open-coded ownership checks
iommu: Pass device through ops->domain_alloc

drivers/iommu/amd/iommu.c | 3 +-
drivers/iommu/apple-dart.c | 3 +-
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 6 +-
drivers/iommu/arm/arm-smmu/arm-smmu.c | 15 +--
drivers/iommu/arm/arm-smmu/qcom_iommu.c | 19 +--
drivers/iommu/exynos-iommu.c | 3 +-
drivers/iommu/fsl_pamu_domain.c | 3 +-
drivers/iommu/intel/iommu.c | 3 +-
drivers/iommu/iommu.c | 130 +++++++++++++-------
drivers/iommu/ipmmu-vmsa.c | 3 +-
drivers/iommu/msm_iommu.c | 3 +-
drivers/iommu/mtk_iommu.c | 10 +-
drivers/iommu/mtk_iommu_v1.c | 6 +-
drivers/iommu/omap-iommu.c | 3 +-
drivers/iommu/rockchip-iommu.c | 3 +-
drivers/iommu/s390-iommu.c | 3 +-
drivers/iommu/sprd-iommu.c | 11 +-
drivers/iommu/sun50i-iommu.c | 3 +-
drivers/iommu/tegra-gart.c | 3 +-
drivers/iommu/tegra-smmu.c | 3 +-
drivers/iommu/virtio-iommu.c | 6 +-
include/acpi/acpi_bus.h | 2 +
include/linux/device.h | 1 -
include/linux/device/bus.h | 5 -
include/linux/dma-map-ops.h | 1 +
include/linux/iommu.h | 4 +-
26 files changed, 139 insertions(+), 116 deletions(-)

--
2.36.1.dirty


2023-01-19 19:58:08

by Robin Murphy

[permalink] [raw]
Subject: [PATCH 6/8] iommu: Retire bus ops

With the rest of the API internals converted, it's time to finally
tackle probe_device and how we bootstrap the per-device ops association
to begin with. This ends up being disappointingly straightforward, since
fwspec users are already doing it in order to find their of_xlate
callback, and it works out that we can easily do the equivalent for
other drivers too. Then shuffle the remaining awareness of iommu_ops
into the couple of core headers that still need it, and breathe a sigh
of relief...

Ding dong the bus ops are gone!

CC: Rafael J. Wysocki <[email protected]>
CC: Greg Kroah-Hartman <[email protected]>
CC: Christoph Hellwig <[email protected]>
Signed-off-by: Robin Murphy <[email protected]>
---
drivers/iommu/iommu.c | 27 ++++++++++++++++-----------
include/acpi/acpi_bus.h | 2 ++
include/linux/device.h | 1 -
include/linux/device/bus.h | 5 -----
include/linux/dma-map-ops.h | 1 +
5 files changed, 19 insertions(+), 17 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 1a31d94adff5..8997b8f2e79f 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -218,13 +218,6 @@ int iommu_device_register(struct iommu_device *iommu,
/* We need to be able to take module references appropriately */
if (WARN_ON(is_module_address((unsigned long)ops) && !ops->owner))
return -EINVAL;
- /*
- * Temporarily enforce global restriction to a single driver. This was
- * already the de-facto behaviour, since any possible combination of
- * existing drivers would compete for at least the PCI or platform bus.
- */
- if (iommu_buses[0]->iommu_ops && iommu_buses[0]->iommu_ops != ops)
- return -EBUSY;

iommu->ops = ops;
if (hwdev)
@@ -234,10 +227,8 @@ int iommu_device_register(struct iommu_device *iommu,
list_add_tail(&iommu->list, &iommu_device_list);
spin_unlock(&iommu_device_lock);

- for (int i = 0; i < ARRAY_SIZE(iommu_buses) && !err; i++) {
- iommu_buses[i]->iommu_ops = ops;
+ for (int i = 0; i < ARRAY_SIZE(iommu_buses) && !err; i++)
err = bus_iommu_probe(iommu_buses[i]);
- }
if (err)
iommu_device_unregister(iommu);
return err;
@@ -303,12 +294,26 @@ static u32 dev_iommu_get_max_pasids(struct device *dev)

static int __iommu_probe_device(struct device *dev, struct list_head *group_list)
{
- const struct iommu_ops *ops = dev->bus->iommu_ops;
+ const struct iommu_ops *ops;
struct iommu_device *iommu_dev;
+ struct iommu_fwspec *fwspec;
struct iommu_group *group;
static DEFINE_MUTEX(iommu_probe_device_lock);
int ret;

+ /*
+ * For FDT-based systems and ACPI IORT/VIOT, drivers register IOMMU
+ * instances with non-NULL fwnodes, and client devices should have been
+ * identified with a fwspec by this point. For Intel/AMD/s390/PAMU we
+ * can assume a single active driver with global ops, and so grab those
+ * from any registered instance, cheekily co-opting the same mechanism.
+ */
+ fwspec = dev_iommu_fwspec_get(dev);
+ if (fwspec && fwspec->ops)
+ ops = fwspec->ops;
+ else
+ ops = iommu_ops_from_fwnode(NULL);
+
if (!ops)
return -ENODEV;
/*
diff --git a/include/acpi/acpi_bus.h b/include/acpi/acpi_bus.h
index cd3b75e08ec3..067dde9291c9 100644
--- a/include/acpi/acpi_bus.h
+++ b/include/acpi/acpi_bus.h
@@ -614,6 +614,8 @@ struct acpi_pci_root {

/* helper */

+struct iommu_ops;
+
bool acpi_dma_supported(const struct acpi_device *adev);
enum dev_dma_attr acpi_get_dma_attr(struct acpi_device *adev);
int acpi_iommu_fwspec_init(struct device *dev, u32 id,
diff --git a/include/linux/device.h b/include/linux/device.h
index 44e3acae7b36..f7a7ecafedd3 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -41,7 +41,6 @@ struct class;
struct subsys_private;
struct device_node;
struct fwnode_handle;
-struct iommu_ops;
struct iommu_group;
struct dev_pin_info;
struct dev_iommu;
diff --git a/include/linux/device/bus.h b/include/linux/device/bus.h
index d8b29ccd07e5..4ece3470112f 100644
--- a/include/linux/device/bus.h
+++ b/include/linux/device/bus.h
@@ -63,9 +63,6 @@ struct fwnode_handle;
* this bus.
* @pm: Power management operations of this bus, callback the specific
* device driver's pm-ops.
- * @iommu_ops: IOMMU specific operations for this bus, used to attach IOMMU
- * driver implementations to a bus and allow the driver to do
- * bus-specific setup
* @p: The private data of the driver core, only the driver core can
* touch this.
* @lock_key: Lock class key for use by the lock validator
@@ -109,8 +106,6 @@ struct bus_type {

const struct dev_pm_ops *pm;

- const struct iommu_ops *iommu_ops;
-
struct subsys_private *p;
struct lock_class_key lock_key;

diff --git a/include/linux/dma-map-ops.h b/include/linux/dma-map-ops.h
index d678afeb8a13..e8ebf0bf611b 100644
--- a/include/linux/dma-map-ops.h
+++ b/include/linux/dma-map-ops.h
@@ -10,6 +10,7 @@
#include <linux/pgtable.h>

struct cma;
+struct iommu_ops;

/*
* Values for struct dma_map_ops.flags:
--
2.36.1.dirty

2023-01-19 19:58:32

by Robin Murphy

[permalink] [raw]
Subject: [PATCH 5/8] iommu/arm-smmu: Don't register fwnode for legacy binding

When using the legacy binding we bypass the of_xlate mechanism, so avoid
registering the instance fwnodes which act as keys for that. This will
help __iommu_probe_device() to retrieve the registered ops the same way
as for x86 etc. when no fwspec has previously been set up by of_xlate.

Signed-off-by: Robin Murphy <[email protected]>
---
drivers/iommu/arm/arm-smmu/arm-smmu.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index 719fbca1fe52..607f06af01b6 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -2156,7 +2156,8 @@ static int arm_smmu_device_probe(struct platform_device *pdev)
return err;
}

- err = iommu_device_register(&smmu->iommu, &arm_smmu_ops, dev);
+ err = iommu_device_register(&smmu->iommu, &arm_smmu_ops,
+ using_legacy_binding ? NULL : dev);
if (err) {
dev_err(dev, "Failed to register iommu\n");
iommu_device_sysfs_remove(&smmu->iommu);
--
2.36.1.dirty

2023-01-19 20:30:09

by Robin Murphy

[permalink] [raw]
Subject: [PATCH 3/8] iommu: Factor out a "first device in group" helper

This pattern for picking the first device out of the group list is
repeated a few times now, so it's clearly worth factoring out.

Signed-off-by: Robin Murphy <[email protected]>
---
drivers/iommu/iommu.c | 22 ++++++++++------------
1 file changed, 10 insertions(+), 12 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index bc53ffbba4de..5b37766a09e2 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1084,6 +1084,11 @@ void iommu_group_remove_device(struct device *dev)
}
EXPORT_SYMBOL_GPL(iommu_group_remove_device);

+static struct device *iommu_group_first_dev(struct iommu_group *group)
+{
+ return list_first_entry(&group->devices, struct group_device, list)->dev;
+}
+
static int iommu_group_device_count(struct iommu_group *group)
{
struct group_device *entry;
@@ -2835,7 +2840,6 @@ static int iommu_change_dev_def_domain(struct iommu_group *group,
struct device *prev_dev, int type)
{
struct iommu_domain *prev_dom;
- struct group_device *grp_dev;
int ret, dev_def_dom;
struct device *dev;

@@ -2867,8 +2871,7 @@ static int iommu_change_dev_def_domain(struct iommu_group *group,
}

/* Since group has only one device */
- grp_dev = list_first_entry(&group->devices, struct group_device, list);
- dev = grp_dev->dev;
+ dev = iommu_group_first_dev(group);

if (prev_dev != dev) {
dev_err_ratelimited(prev_dev, "Cannot change default domain: Device has been changed\n");
@@ -2965,7 +2968,6 @@ static int iommu_change_dev_def_domain(struct iommu_group *group,
static ssize_t iommu_group_store_type(struct iommu_group *group,
const char *buf, size_t count)
{
- struct group_device *grp_dev;
struct device *dev;
int ret, req_type;

@@ -3000,8 +3002,7 @@ static ssize_t iommu_group_store_type(struct iommu_group *group,
}

/* Since group has only one device */
- grp_dev = list_first_entry(&group->devices, struct group_device, list);
- dev = grp_dev->dev;
+ dev = iommu_group_first_dev(group);
get_device(dev);

/*
@@ -3126,21 +3127,18 @@ void iommu_device_unuse_default_domain(struct device *dev)

static int __iommu_group_alloc_blocking_domain(struct iommu_group *group)
{
- struct group_device *dev =
- list_first_entry(&group->devices, struct group_device, list);
+ struct device *dev = iommu_group_first_dev(group);

if (group->blocking_domain)
return 0;

- group->blocking_domain =
- __iommu_domain_alloc(dev->dev->bus, IOMMU_DOMAIN_BLOCKED);
+ group->blocking_domain = __iommu_domain_alloc(dev->bus, IOMMU_DOMAIN_BLOCKED);
if (!group->blocking_domain) {
/*
* For drivers that do not yet understand IOMMU_DOMAIN_BLOCKED
* create an empty domain instead.
*/
- group->blocking_domain = __iommu_domain_alloc(
- dev->dev->bus, IOMMU_DOMAIN_UNMANAGED);
+ group->blocking_domain = __iommu_domain_alloc(dev->bus, IOMMU_DOMAIN_UNMANAGED);
if (!group->blocking_domain)
return -EINVAL;
}
--
2.36.1.dirty

2023-01-19 20:39:01

by Robin Murphy

[permalink] [raw]
Subject: [PATCH 1/8] iommu: Decouple iommu_present() from bus ops

Much as I'd like to remove iommu_present(), the final remaining users
are proving stubbornly difficult to clean up, so kick that can down
the road and just rework it to preserve the current behaviour without
depending on bus ops.

Signed-off-by: Robin Murphy <[email protected]>
---
drivers/iommu/iommu.c | 17 ++++++++++++++++-
1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index b189ed345057..a77d58e1b976 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1871,9 +1871,24 @@ int bus_iommu_probe(struct bus_type *bus)
return ret;
}

+static int __iommu_present(struct device *dev, void *unused)
+{
+ return device_iommu_mapped(dev);
+}
+
+/**
+ * iommu_present() - make platform-specific assumptions about an IOMMU
+ * @bus: bus to check
+ *
+ * Do not use this function. You want device_iommu_mapped() instead.
+ *
+ * Return: true if some IOMMU is present for some device on the given bus. In
+ * general it may not be the only IOMMU, and it may not be for the device you
+ * are ultimately interested in.
+ */
bool iommu_present(struct bus_type *bus)
{
- return bus->iommu_ops != NULL;
+ return bus_for_each_dev(bus, NULL, NULL, __iommu_present) > 0;
}
EXPORT_SYMBOL_GPL(iommu_present);

--
2.36.1.dirty

2023-01-19 20:39:14

by Robin Murphy

[permalink] [raw]
Subject: [PATCH 7/8] iommu: Clean up open-coded ownership checks

Some drivers already implement their own defence against the possibility
of being given someone else's device. Since this is now taken care of by
the core code (and via a slightly different path from the original
fwspec-based idea), let's clean them up.

Signed-off-by: Robin Murphy <[email protected]>
---
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 3 ---
drivers/iommu/arm/arm-smmu/arm-smmu.c | 9 +--------
drivers/iommu/arm/arm-smmu/qcom_iommu.c | 16 +++-------------
drivers/iommu/mtk_iommu.c | 7 +------
drivers/iommu/mtk_iommu_v1.c | 3 ---
drivers/iommu/sprd-iommu.c | 8 +-------
drivers/iommu/virtio-iommu.c | 3 ---
7 files changed, 6 insertions(+), 43 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index ab160198edd6..cb05d9771192 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2642,9 +2642,6 @@ static struct iommu_device *arm_smmu_probe_device(struct device *dev)
struct arm_smmu_master *master;
struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);

- if (!fwspec || fwspec->ops != &arm_smmu_ops)
- return ERR_PTR(-ENODEV);
-
if (WARN_ON_ONCE(dev_iommu_priv_get(dev)))
return ERR_PTR(-EBUSY);

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index 607f06af01b6..235550db0d59 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -1118,11 +1118,6 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
struct arm_smmu_device *smmu;
int ret;

- if (!fwspec || fwspec->ops != &arm_smmu_ops) {
- dev_err(dev, "cannot attach to SMMU, is it on the same bus?\n");
- return -ENXIO;
- }
-
/*
* FIXME: The arch/arm DMA API code tries to attach devices to its own
* domains between of_xlate() and probe_device() - we have no way to cope
@@ -1352,10 +1347,8 @@ static struct iommu_device *arm_smmu_probe_device(struct device *dev)
fwspec = dev_iommu_fwspec_get(dev);
if (ret)
goto out_free;
- } else if (fwspec && fwspec->ops == &arm_smmu_ops) {
- smmu = arm_smmu_get_by_fwnode(fwspec->iommu_fwnode);
} else {
- return ERR_PTR(-ENODEV);
+ smmu = arm_smmu_get_by_fwnode(fwspec->iommu_fwnode);
}

ret = -EINVAL;
diff --git a/drivers/iommu/arm/arm-smmu/qcom_iommu.c b/drivers/iommu/arm/arm-smmu/qcom_iommu.c
index 270c3d9128ba..b2d3d309be1e 100644
--- a/drivers/iommu/arm/arm-smmu/qcom_iommu.c
+++ b/drivers/iommu/arm/arm-smmu/qcom_iommu.c
@@ -79,16 +79,6 @@ static struct qcom_iommu_domain *to_qcom_iommu_domain(struct iommu_domain *dom)

static const struct iommu_ops qcom_iommu_ops;

-static struct qcom_iommu_dev * to_iommu(struct device *dev)
-{
- struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
-
- if (!fwspec || fwspec->ops != &qcom_iommu_ops)
- return NULL;
-
- return dev_iommu_priv_get(dev);
-}
-
static struct qcom_iommu_ctx * to_ctx(struct qcom_iommu_domain *d, unsigned asid)
{
struct qcom_iommu_dev *qcom_iommu = d->iommu;
@@ -361,7 +351,7 @@ static void qcom_iommu_domain_free(struct iommu_domain *domain)

static int qcom_iommu_attach_dev(struct iommu_domain *domain, struct device *dev)
{
- struct qcom_iommu_dev *qcom_iommu = to_iommu(dev);
+ struct qcom_iommu_dev *qcom_iommu = dev_iommu_priv_get(dev);
struct qcom_iommu_domain *qcom_domain = to_qcom_iommu_domain(domain);
int ret;

@@ -391,7 +381,7 @@ static void qcom_iommu_detach_dev(struct iommu_domain *domain, struct device *de
{
struct qcom_iommu_domain *qcom_domain = to_qcom_iommu_domain(domain);
struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
- struct qcom_iommu_dev *qcom_iommu = to_iommu(dev);
+ struct qcom_iommu_dev *qcom_iommu = dev_iommu_priv_get(dev);
unsigned i;

if (WARN_ON(!qcom_domain->iommu))
@@ -508,7 +498,7 @@ static bool qcom_iommu_capable(struct device *dev, enum iommu_cap cap)

static struct iommu_device *qcom_iommu_probe_device(struct device *dev)
{
- struct qcom_iommu_dev *qcom_iommu = to_iommu(dev);
+ struct qcom_iommu_dev *qcom_iommu = dev_iommu_priv_get(dev);
struct device_link *link;

if (!qcom_iommu)
diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index 2badd6acfb23..005136a4cc36 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -784,16 +784,11 @@ static phys_addr_t mtk_iommu_iova_to_phys(struct iommu_domain *domain,
static struct iommu_device *mtk_iommu_probe_device(struct device *dev)
{
struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
- struct mtk_iommu_data *data;
+ struct mtk_iommu_data *data = dev_iommu_priv_get(dev);
struct device_link *link;
struct device *larbdev;
unsigned int larbid, larbidx, i;

- if (!fwspec || fwspec->ops != &mtk_iommu_ops)
- return ERR_PTR(-ENODEV); /* Not a iommu client device */
-
- data = dev_iommu_priv_get(dev);
-
if (!MTK_IOMMU_IS_TYPE(data->plat_data, MTK_IOMMU_TYPE_MM))
return &data->iommu;

diff --git a/drivers/iommu/mtk_iommu_v1.c b/drivers/iommu/mtk_iommu_v1.c
index 69682ee068d2..dff8ea0af884 100644
--- a/drivers/iommu/mtk_iommu_v1.c
+++ b/drivers/iommu/mtk_iommu_v1.c
@@ -478,9 +478,6 @@ static struct iommu_device *mtk_iommu_v1_probe_device(struct device *dev)
idx++;
}

- if (!fwspec || fwspec->ops != &mtk_iommu_v1_ops)
- return ERR_PTR(-ENODEV); /* Not a iommu client device */
-
data = dev_iommu_priv_get(dev);

/* Link the consumer device with the smi-larb device(supplier) */
diff --git a/drivers/iommu/sprd-iommu.c b/drivers/iommu/sprd-iommu.c
index 219bfa11f7f4..4cebccb6fc8b 100644
--- a/drivers/iommu/sprd-iommu.c
+++ b/drivers/iommu/sprd-iommu.c
@@ -373,13 +373,7 @@ static phys_addr_t sprd_iommu_iova_to_phys(struct iommu_domain *domain,

static struct iommu_device *sprd_iommu_probe_device(struct device *dev)
{
- struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
- struct sprd_iommu_device *sdev;
-
- if (!fwspec || fwspec->ops != &sprd_iommu_ops)
- return ERR_PTR(-ENODEV);
-
- sdev = dev_iommu_priv_get(dev);
+ struct sprd_iommu_device *sdev = dev_iommu_priv_get(dev);

return &sdev->iommu;
}
diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index 5b8fe9bfa9a5..59f1abd6ee53 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -945,9 +945,6 @@ static struct iommu_device *viommu_probe_device(struct device *dev)
struct viommu_dev *viommu = NULL;
struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);

- if (!fwspec || fwspec->ops != &viommu_ops)
- return ERR_PTR(-ENODEV);
-
viommu = viommu_get_by_fwnode(fwspec->iommu_fwnode);
if (!viommu)
return ERR_PTR(-ENODEV);
--
2.36.1.dirty

2023-01-19 20:44:52

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH 0/8] iommu: The early demise of bus ops

On Thu, Jan 19, 2023 at 07:18:18PM +0000, Robin Murphy wrote:
> Hi all,
>
> [ Christoph, Greg, Rafael; feel free to ignore all the IOMMU details,
> just a heads-up for some pretty trivial header motion in patch #6 ]
>
> This is sort of an RFC, in the sense that the patches are functionally
> ready but I don't expect that we necessarily want to merge all them
> right away; at this point it's more for the sake of visibility and
> checking if anyone strongly objects to the direction I'm taking.

Looks good to me

Is there anything beyond some typing preventing the removal of the bus
from the out-of-iommu callers?

> I've based these patches on 6.2-rc3 and made no effort to integrate them
> with the IOMMUFD-related work going on in parallel and/or already
> queued, even though there is some functional intersection and almost
> certain conflicts. If we reach a consensus that we would like any of
> this for 6.3 I'll rebase as appropriate.

I don't see a reason to wait..

Jason

2023-01-20 05:24:46

by Baolu Lu

[permalink] [raw]
Subject: Re: [PATCH 6/8] iommu: Retire bus ops

On 2023/1/20 3:18, Robin Murphy wrote:
> + /*
> + * For FDT-based systems and ACPI IORT/VIOT, drivers register IOMMU
> + * instances with non-NULL fwnodes, and client devices should have been
> + * identified with a fwspec by this point. For Intel/AMD/s390/PAMU we
> + * can assume a single active driver with global ops, and so grab those
> + * from any registered instance, cheekily co-opting the same mechanism.
> + */
> + fwspec = dev_iommu_fwspec_get(dev);
> + if (fwspec && fwspec->ops)
> + ops = fwspec->ops;
> + else
> + ops = iommu_ops_from_fwnode(NULL);

I'm imagining if Intel/AMD/s390 drivers need to give up global ops.
Is there any way to allow them to make such conversion? I am just
thinking about whether this is a hard limitation for these drivers.

Best regards,
baolu

2023-01-20 10:29:20

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH 6/8] iommu: Retire bus ops

On Thu, Jan 19, 2023 at 07:18:24PM +0000, Robin Murphy wrote:
> With the rest of the API internals converted, it's time to finally
> tackle probe_device and how we bootstrap the per-device ops association
> to begin with. This ends up being disappointingly straightforward, since
> fwspec users are already doing it in order to find their of_xlate
> callback, and it works out that we can easily do the equivalent for
> other drivers too. Then shuffle the remaining awareness of iommu_ops
> into the couple of core headers that still need it, and breathe a sigh
> of relief...
>
> Ding dong the bus ops are gone!
>
> CC: Rafael J. Wysocki <[email protected]>
> CC: Greg Kroah-Hartman <[email protected]>
> CC: Christoph Hellwig <[email protected]>
> Signed-off-by: Robin Murphy <[email protected]>

Nice work!

Acked-by: Greg Kroah-Hartman <[email protected]>

2023-01-20 13:06:32

by Robin Murphy

[permalink] [raw]
Subject: Re: [PATCH 6/8] iommu: Retire bus ops

On 2023-01-20 00:27, Baolu Lu wrote:
> On 2023/1/20 3:18, Robin Murphy wrote:
>> +    /*
>> +     * For FDT-based systems and ACPI IORT/VIOT, drivers register IOMMU
>> +     * instances with non-NULL fwnodes, and client devices should
>> have been
>> +     * identified with a fwspec by this point. For
>> Intel/AMD/s390/PAMU we
>> +     * can assume a single active driver with global ops, and so grab
>> those
>> +     * from any registered instance, cheekily co-opting the same
>> mechanism.
>> +     */
>> +    fwspec = dev_iommu_fwspec_get(dev);
>> +    if (fwspec && fwspec->ops)
>> +        ops = fwspec->ops;
>> +    else
>> +        ops = iommu_ops_from_fwnode(NULL);
>
> I'm imagining if Intel/AMD/s390 drivers need to give up global ops.
> Is there any way to allow them to make such conversion? I am just
> thinking about whether this is a hard limitation for these drivers.

Yes, they could perhaps bodge into the existing fwnode mechanism, or we
could make bigger changes to adapt and generalise the whole
instance-registration-token-lookup concept, or if the driver can resolve
the correct instance for a device internally, then it could suffice to
just have all its device ops share a single common .probe_device
implementation that does the right thing.

The comment is merely noting the fact that we can get away without
having to worry about those changes just yet, since all the drivers
*are* currently still built around the hard constraint of a single set
of device ops per bus.

Thanks,
Robin.

2023-01-20 13:19:28

by Joerg Roedel

[permalink] [raw]
Subject: Re: [PATCH 0/8] iommu: The early demise of bus ops

Hi Robin,

On Thu, Jan 19, 2023 at 07:18:18PM +0000, Robin Murphy wrote:
> This is sort of an RFC, in the sense that the patches are functionally
> ready but I don't expect that we necessarily want to merge all them
> right away; at this point it's more for the sake of visibility and
> checking if anyone strongly objects to the direction I'm taking. As such
> I've based these patches on 6.2-rc3 and made no effort to integrate them
> with the IOMMUFD-related work going on in parallel and/or already
> queued, even though there is some functional intersection and almost
> certain conflicts. If we reach a consensus that we would like any of
> this for 6.3 I'll rebase as appropriate.

Thanks for doing this, I like the direction this is taking! I see no
strict reason to wait with this, but also no strict reason to hurry it
in.

I would say we have another week for this to get ready to be merged into
6.3, which means having a version based on iommu/core with the reviews
addressed and enough relevant Reviewed-by's.

Regards,

Joerg

2023-01-26 12:37:26

by Baolu Lu

[permalink] [raw]
Subject: Re: [PATCH 6/8] iommu: Retire bus ops

On 2023/1/20 20:31, Robin Murphy wrote:
> On 2023-01-20 00:27, Baolu Lu wrote:
>> On 2023/1/20 3:18, Robin Murphy wrote:
>>> +    /*
>>> +     * For FDT-based systems and ACPI IORT/VIOT, drivers register IOMMU
>>> +     * instances with non-NULL fwnodes, and client devices should
>>> have been
>>> +     * identified with a fwspec by this point. For
>>> Intel/AMD/s390/PAMU we
>>> +     * can assume a single active driver with global ops, and so
>>> grab those
>>> +     * from any registered instance, cheekily co-opting the same
>>> mechanism.
>>> +     */
>>> +    fwspec = dev_iommu_fwspec_get(dev);
>>> +    if (fwspec && fwspec->ops)
>>> +        ops = fwspec->ops;
>>> +    else
>>> +        ops = iommu_ops_from_fwnode(NULL);
>>
>> I'm imagining if Intel/AMD/s390 drivers need to give up global ops.
>> Is there any way to allow them to make such conversion? I am just
>> thinking about whether this is a hard limitation for these drivers.
>
> Yes, they could perhaps bodge into the existing fwnode mechanism, or we
> could make bigger changes to adapt and generalise the whole
> instance-registration-token-lookup concept, or if the driver can resolve
> the correct instance for a device internally, then it could suffice to
> just have all its device ops share a single common .probe_device
> implementation that does the right thing.

Yes. Sharing a common .probe_device entry and let the IOMMU driver
connect the device and its iommu ops is a feasible solution. Thanks!

>
> The comment is merely noting the fact that we can get away without
> having to worry about those changes just yet, since all the drivers
> *are* currently still built around the hard constraint of a single set
> of device ops per bus.

Yes. Fair enough.

--
Best regards,
baolu


2023-01-26 13:13:17

by Baolu Lu

[permalink] [raw]
Subject: Re: [PATCH 1/8] iommu: Decouple iommu_present() from bus ops

On 2023/1/20 3:18, Robin Murphy wrote:
> Much as I'd like to remove iommu_present(), the final remaining users
> are proving stubbornly difficult to clean up, so kick that can down
> the road and just rework it to preserve the current behaviour without
> depending on bus ops.
>
> Signed-off-by: Robin Murphy <[email protected]>
> ---
> drivers/iommu/iommu.c | 17 ++++++++++++++++-
> 1 file changed, 16 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index b189ed345057..a77d58e1b976 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -1871,9 +1871,24 @@ int bus_iommu_probe(struct bus_type *bus)
> return ret;
> }
>
> +static int __iommu_present(struct device *dev, void *unused)
> +{
> + return device_iommu_mapped(dev);
> +}

/**
* device_iommu_mapped - Returns true when the device DMA is translated
* by an IOMMU
* @dev: Device to perform the check on
*/
static inline bool device_iommu_mapped(struct device *dev)
{
return (dev->iommu_group != NULL);
}

Perhaps device_iommu_mapped() should be improved. In some cases, the
device has an iommu_group filled is not enough to indicate that the
device has IOMMU hardware for DMA translation.

For example, VFIO could allocate an iommu_group and add a device into
the iommu_group even there's no IOMMU hardware in
vfio_noiommu_group_alloc().

Basically iommu_group_add_device() doesn't check the presence of an
IOMMU.

> +
> +/**
> + * iommu_present() - make platform-specific assumptions about an IOMMU
> + * @bus: bus to check
> + *
> + * Do not use this function. You want device_iommu_mapped() instead.
> + *
> + * Return: true if some IOMMU is present for some device on the given bus. In
> + * general it may not be the only IOMMU, and it may not be for the device you
> + * are ultimately interested in.
> + */
> bool iommu_present(struct bus_type *bus)
> {
> - return bus->iommu_ops != NULL;
> + return bus_for_each_dev(bus, NULL, NULL, __iommu_present) > 0;
> }
> EXPORT_SYMBOL_GPL(iommu_present);
>

--
Best regards,
baolu

2023-01-26 14:21:43

by Robin Murphy

[permalink] [raw]
Subject: Re: [PATCH 1/8] iommu: Decouple iommu_present() from bus ops

On 2023-01-26 13:13, Baolu Lu wrote:
> On 2023/1/20 3:18, Robin Murphy wrote:
>> Much as I'd like to remove iommu_present(), the final remaining users
>> are proving stubbornly difficult to clean up, so kick that can down
>> the road and just rework it to preserve the current behaviour without
>> depending on bus ops.
>>
>> Signed-off-by: Robin Murphy <[email protected]>
>> ---
>>   drivers/iommu/iommu.c | 17 ++++++++++++++++-
>>   1 file changed, 16 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>> index b189ed345057..a77d58e1b976 100644
>> --- a/drivers/iommu/iommu.c
>> +++ b/drivers/iommu/iommu.c
>> @@ -1871,9 +1871,24 @@ int bus_iommu_probe(struct bus_type *bus)
>>       return ret;
>>   }
>> +static int __iommu_present(struct device *dev, void *unused)
>> +{
>> +    return device_iommu_mapped(dev);
>> +}
>
> /**
>  * device_iommu_mapped - Returns true when the device DMA is translated
>  *                       by an IOMMU
>  * @dev: Device to perform the check on
>  */
> static inline bool device_iommu_mapped(struct device *dev)
> {
>         return (dev->iommu_group != NULL);
> }
>
> Perhaps device_iommu_mapped() should be improved. In some cases, the
> device has an iommu_group filled is not enough to indicate that the
> device has IOMMU hardware for DMA translation.
>
> For example, VFIO could allocate an iommu_group and add a device into
> the iommu_group even there's no IOMMU hardware in
> vfio_noiommu_group_alloc().
>
> Basically iommu_group_add_device() doesn't check the presence of an
> IOMMU.

/**
* iommu_group_add_device [...]
*
* This function is called by an iommu driver [...]
*/

The "check" is inherent in the fact that it's been called at all. VFIO
noiommu *is* an IOMMU driver in the sense that it provides a bare
minimum of IOMMU API functionality (i.e. creating groups), sufficient to
support (careful) usage by VFIO drivers. There would not seem to be a
legitimate reason for some *other* driver to be specifically querying a
device while it is already bound to a VFIO driver (and thus may have a
noiommu group).

In terms of this patch, I'm confident that nobody is using VFIO noiommu
on old Tegra SoCs; I'm even more confident that they wouldn't be doing
it with platform devices; and I'm supremely confident that they're not
loading the GPU drivers while already in the middle of using noiommu
vfio_platform. Basically "not using VFIO noiommu" is one of the inherent
platform-specific assumptions. If anyone else now ignores the first
sentence of the documentation and tries to use iommu_present() somewhere
that assumption might not hold, returning a meaningless wrong answer is
the documented behaviour :)

Cheers,
Robin.

>
>> +
>> +/**
>> + * iommu_present() - make platform-specific assumptions about an IOMMU
>> + * @bus: bus to check
>> + *
>> + * Do not use this function. You want device_iommu_mapped() instead.
>> + *
>> + * Return: true if some IOMMU is present for some device on the given
>> bus. In
>> + * general it may not be the only IOMMU, and it may not be for the
>> device you
>> + * are ultimately interested in.
>> + */
>>   bool iommu_present(struct bus_type *bus)
>>   {
>> -    return bus->iommu_ops != NULL;
>> +    return bus_for_each_dev(bus, NULL, NULL, __iommu_present) > 0;
>>   }
>>   EXPORT_SYMBOL_GPL(iommu_present);
>
> --
> Best regards,
> baolu

2023-01-26 14:41:46

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH 1/8] iommu: Decouple iommu_present() from bus ops

On Thu, Jan 26, 2023 at 02:21:29PM +0000, Robin Murphy wrote:

> The "check" is inherent in the fact that it's been called at all. VFIO
> noiommu *is* an IOMMU driver in the sense that it provides a bare minimum of
> IOMMU API functionality (i.e. creating groups), sufficient to support
> (careful) usage by VFIO drivers. There would not seem to be a legitimate
> reason for some *other* driver to be specifically querying a device while it
> is already bound to a VFIO driver (and thus may have a noiommu group).

Yes, the devices that VFIO assigns to its internal groups never leak
outside VFIO's control during their assignment - ie they are
continuously bound to VFIO never another driver.

So no other driver can ever see the internal groups unless it is
messing around with devices it is not bound to :)

Jason

2023-01-27 13:50:39

by Baolu Lu

[permalink] [raw]
Subject: Re: [PATCH 1/8] iommu: Decouple iommu_present() from bus ops

On 2023/1/26 22:41, Jason Gunthorpe wrote:
> On Thu, Jan 26, 2023 at 02:21:29PM +0000, Robin Murphy wrote:
>
>> The "check" is inherent in the fact that it's been called at all. VFIO
>> noiommu*is* an IOMMU driver in the sense that it provides a bare minimum of
>> IOMMU API functionality (i.e. creating groups), sufficient to support
>> (careful) usage by VFIO drivers. There would not seem to be a legitimate
>> reason for some*other* driver to be specifically querying a device while it
>> is already bound to a VFIO driver (and thus may have a noiommu group).
> Yes, the devices that VFIO assigns to its internal groups never leak
> outside VFIO's control during their assignment - ie they are
> continuously bound to VFIO never another driver.
>
> So no other driver can ever see the internal groups unless it is
> messing around with devices it is not bound to ????

Fair enough. I was thinking that probably we could make it like below:

/**
* device_iommu_mapped - Returns true when the device DMA is translated
* by an IOMMU
* @dev: Device to perform the check on
*/
static inline bool device_iommu_mapped(struct device *dev)
{
return (dev->iommu && dev->iommu->iommu_dev);
}

The iommu probe device code guarantees that dev->iommu->iommu_dev is
valid only after the IOMMU driver's .probe_device returned successfully.

Any thoughts?

Best regards,
baolu

2023-01-27 14:02:02

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH 1/8] iommu: Decouple iommu_present() from bus ops

On Fri, Jan 27, 2023 at 09:50:29PM +0800, Baolu Lu wrote:
> On 2023/1/26 22:41, Jason Gunthorpe wrote:
> > On Thu, Jan 26, 2023 at 02:21:29PM +0000, Robin Murphy wrote:
> >
> > > The "check" is inherent in the fact that it's been called at all. VFIO
> > > noiommu*is* an IOMMU driver in the sense that it provides a bare minimum of
> > > IOMMU API functionality (i.e. creating groups), sufficient to support
> > > (careful) usage by VFIO drivers. There would not seem to be a legitimate
> > > reason for some*other* driver to be specifically querying a device while it
> > > is already bound to a VFIO driver (and thus may have a noiommu group).
> > Yes, the devices that VFIO assigns to its internal groups never leak
> > outside VFIO's control during their assignment - ie they are
> > continuously bound to VFIO never another driver.
> >
> > So no other driver can ever see the internal groups unless it is
> > messing around with devices it is not bound to ????
>
> Fair enough. I was thinking that probably we could make it like below:
>
> /**
> * device_iommu_mapped - Returns true when the device DMA is translated
> * by an IOMMU
> * @dev: Device to perform the check on
> */
> static inline bool device_iommu_mapped(struct device *dev)
> {
> return (dev->iommu && dev->iommu->iommu_dev);
> }
>
> The iommu probe device code guarantees that dev->iommu->iommu_dev is
> valid only after the IOMMU driver's .probe_device returned successfully.
>
> Any thoughts?

I find the above much clearer if it can work

Jason

2023-01-27 15:20:12

by Robin Murphy

[permalink] [raw]
Subject: Re: [PATCH 1/8] iommu: Decouple iommu_present() from bus ops

On 2023-01-27 13:50, Baolu Lu wrote:
> On 2023/1/26 22:41, Jason Gunthorpe wrote:
>> On Thu, Jan 26, 2023 at 02:21:29PM +0000, Robin Murphy wrote:
>>
>>> The "check" is inherent in the fact that it's been called at all. VFIO
>>> noiommu*is*  an IOMMU driver in the sense that it provides a bare
>>> minimum of
>>> IOMMU API functionality (i.e. creating groups), sufficient to support
>>> (careful) usage by VFIO drivers. There would not seem to be a legitimate
>>> reason for some*other*  driver to be specifically querying a device
>>> while it
>>> is already bound to a VFIO driver (and thus may have a noiommu group).
>> Yes, the devices that VFIO assigns to its internal groups never leak
>> outside VFIO's control during their assignment - ie they are
>> continuously bound to VFIO never another driver.
>>
>> So no other driver can ever see the internal groups unless it is
>> messing around with devices it is not bound to ????
>
> Fair enough. I was thinking that probably we could make it like below:
>
> /**
>  * device_iommu_mapped - Returns true when the device DMA is translated
>  *                       by an IOMMU
>  * @dev: Device to perform the check on
>  */
> static inline bool device_iommu_mapped(struct device *dev)
> {
>         return (dev->iommu && dev->iommu->iommu_dev);
> }
>
> The iommu probe device code guarantees that dev->iommu->iommu_dev is
> valid only after the IOMMU driver's .probe_device returned successfully.
>
> Any thoughts?

Heh, I actually wrote that helper yesterday for v2, but as an internal
check for valid ops :)

The current implementation of device_iommu_mapped() just dates back to
when dev->iommu_group was the only per-device thing we had, so in
principle I don't have any conceptual objection to redefining it in
terms of "device has ops" rather than "device has a group", but as
things stand you'd still have to do something about PPC first (I know
Jason had been pushing on that, but I've not kept track of where it got to).

Thanks,
Robin.

2023-01-27 15:43:42

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH 1/8] iommu: Decouple iommu_present() from bus ops

On Fri, Jan 27, 2023 at 03:19:55PM +0000, Robin Murphy wrote:

> The current implementation of device_iommu_mapped() just dates back to when
> dev->iommu_group was the only per-device thing we had, so in principle I
> don't have any conceptual objection to redefining it in terms of "device has
> ops" rather than "device has a group", but as things stand you'd still have
> to do something about PPC first (I know Jason had been pushing on that, but
> I've not kept track of where it got to).

PPC hasn't moved at all, AFAICT. In a few more months I'm going to
suggest we delete the special VFIO support due to it being broken,
distros already having turned it off and nobody caring enough to fix
it..

What does device_iommu_mapped() even really mean?

Looking at usages..

These are fixing SOC HW bugs/issues - the desire seems to be "is the SOC's
IOMMU enabled"

drivers/char/agp/intel-gtt.c: device_iommu_mapped(&intel_private.pcidev->dev));
drivers/dma/sh/rcar-dmac.c: if (device_iommu_mapped(&pdev->dev))
drivers/gpu/drm/i915/i915_utils.c: if (device_iommu_mapped(i915->drm.dev))
?
drivers/usb/dwc3/dwc3-xilinx.c: if (of_dma_is_coherent(dev->of_node) || device_iommu_mapped(dev)) {
drivers/usb/host/xhci.c: if (!(xhci->quirks & XHCI_ZERO_64B_REGS) || !device_iommu_mapped(dev))
drivers/crypto/qat/qat_common/adf_sriov.c: if (!device_iommu_mapped(&pdev->dev))
?

These seem to be trying to decide if iommu_domain's can be used (and
they can't be on power):

drivers/gpu/drm/msm/msm_drv.c: if (device_iommu_mapped(mdp_dev))
drivers/gpu/drm/msm/msm_drv.c: device_iommu_mapped(dev->dev) ||
drivers/gpu/drm/msm/msm_drv.c: device_iommu_mapped(dev->dev->parent);
drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c: if (device_iommu_mapped(dev)) {
drivers/gpu/drm/rockchip/rockchip_drm_drv.c: if (!device_iommu_mapped(dev))
drivers/gpu/drm/tegra/uapi.c: if (device_iommu_mapped(client->base.dev) && client->ops->can_use_memory_ctx) {
drivers/gpu/host1x/context.c: if (!fwspec || !device_iommu_mapped(&ctx->dev)) {
drivers/infiniband/hw/usnic/usnic_ib_main.c: if (!device_iommu_mapped(&pdev->dev)) {

Yikes, trying to map DMA addresses programmed into devices back to CPU addresses:

drivers/misc/habanalabs/common/debugfs.c: if (!user_address || device_iommu_mapped(&hdev->pdev->dev)) {
drivers/misc/habanalabs/gaudi2/gaudi2.c: if (!device_iommu_mapped(&hdev->pdev->dev))

And then sequencing the call to iommu_probe_device() which doesn't
apply to power:

drivers/acpi/scan.c: if (!err && dev->bus && !device_iommu_mapped(dev))
drivers/iommu/of_iommu.c: if (!err && dev->bus && !device_iommu_mapped(dev))

Leaving these:

arch/powerpc/kernel/eeh.c: if (device_iommu_mapped(dev)) {

This is only used to support eeh_iommu_group_to_pe which is only
caleld by vfio_iommu_spapr_tce.c. Since power vfio doesn't work right
now this is uncallable, and when power is fixed this will work
properly.

arch/powerpc/kernel/iommu.c: if (device_iommu_mapped(dev)) {
arch/powerpc/kernel/iommu.c: if (!device_iommu_mapped(dev)) {

These should both be replaced with some kind of 'device has iommu group', since
it is really driving ppc unique group logic.

So, I'd say Baolu's approach is the right thing, just replace the
above two in ppc with something else.

Jason

2023-01-28 08:49:23

by Baolu Lu

[permalink] [raw]
Subject: Re: [PATCH 1/8] iommu: Decouple iommu_present() from bus ops

On 2023/1/27 23:43, Jason Gunthorpe wrote:
> On Fri, Jan 27, 2023 at 03:19:55PM +0000, Robin Murphy wrote:
>
>> The current implementation of device_iommu_mapped() just dates back to when
>> dev->iommu_group was the only per-device thing we had, so in principle I
>> don't have any conceptual objection to redefining it in terms of "device has
>> ops" rather than "device has a group", but as things stand you'd still have
>> to do something about PPC first (I know Jason had been pushing on that, but
>> I've not kept track of where it got to).
> PPC hasn't moved at all, AFAICT. In a few more months I'm going to
> suggest we delete the special VFIO support due to it being broken,
> distros already having turned it off and nobody caring enough to fix
> it..
>
> What does device_iommu_mapped() even really mean?
>
> Looking at usages..
>
> These are fixing SOC HW bugs/issues - the desire seems to be "is the SOC's
> IOMMU enabled"
>
> drivers/char/agp/intel-gtt.c: device_iommu_mapped(&intel_private.pcidev->dev));
> drivers/dma/sh/rcar-dmac.c: if (device_iommu_mapped(&pdev->dev))
> drivers/gpu/drm/i915/i915_utils.c: if (device_iommu_mapped(i915->drm.dev))
> ?
> drivers/usb/dwc3/dwc3-xilinx.c: if (of_dma_is_coherent(dev->of_node) || device_iommu_mapped(dev)) {
> drivers/usb/host/xhci.c: if (!(xhci->quirks & XHCI_ZERO_64B_REGS) || !device_iommu_mapped(dev))
> drivers/crypto/qat/qat_common/adf_sriov.c: if (!device_iommu_mapped(&pdev->dev))
> ?
>
> These seem to be trying to decide if iommu_domain's can be used (and
> they can't be on power):
>
> drivers/gpu/drm/msm/msm_drv.c: if (device_iommu_mapped(mdp_dev))
> drivers/gpu/drm/msm/msm_drv.c: device_iommu_mapped(dev->dev) ||
> drivers/gpu/drm/msm/msm_drv.c: device_iommu_mapped(dev->dev->parent);
> drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c: if (device_iommu_mapped(dev)) {
> drivers/gpu/drm/rockchip/rockchip_drm_drv.c: if (!device_iommu_mapped(dev))
> drivers/gpu/drm/tegra/uapi.c: if (device_iommu_mapped(client->base.dev) && client->ops->can_use_memory_ctx) {
> drivers/gpu/host1x/context.c: if (!fwspec || !device_iommu_mapped(&ctx->dev)) {
> drivers/infiniband/hw/usnic/usnic_ib_main.c: if (!device_iommu_mapped(&pdev->dev)) {
>
> Yikes, trying to map DMA addresses programmed into devices back to CPU addresses:
>
> drivers/misc/habanalabs/common/debugfs.c: if (!user_address || device_iommu_mapped(&hdev->pdev->dev)) {
> drivers/misc/habanalabs/gaudi2/gaudi2.c: if (!device_iommu_mapped(&hdev->pdev->dev))
>
> And then sequencing the call to iommu_probe_device() which doesn't
> apply to power:
>
> drivers/acpi/scan.c: if (!err && dev->bus && !device_iommu_mapped(dev))
> drivers/iommu/of_iommu.c: if (!err && dev->bus && !device_iommu_mapped(dev))
>
> Leaving these:
>
> arch/powerpc/kernel/eeh.c: if (device_iommu_mapped(dev)) {
>
> This is only used to support eeh_iommu_group_to_pe which is only
> caleld by vfio_iommu_spapr_tce.c. Since power vfio doesn't work right
> now this is uncallable, and when power is fixed this will work
> properly.
>
> arch/powerpc/kernel/iommu.c: if (device_iommu_mapped(dev)) {
> arch/powerpc/kernel/iommu.c: if (!device_iommu_mapped(dev)) {
>
> These should both be replaced with some kind of 'device has iommu group', since
> it is really driving ppc unique group logic.
>
> So, I'd say Baolu's approach is the right thing, just replace the
> above two in ppc with something else.

Thank you both. I will follow up a series later.

Best regards,
baolu

2023-01-30 13:49:45

by Robin Murphy

[permalink] [raw]
Subject: Re: [PATCH 1/8] iommu: Decouple iommu_present() from bus ops

On 2023-01-28 08:49, Baolu Lu wrote:
> On 2023/1/27 23:43, Jason Gunthorpe wrote:
>> On Fri, Jan 27, 2023 at 03:19:55PM +0000, Robin Murphy wrote:
>>
>>> The current implementation of device_iommu_mapped() just dates back
>>> to when
>>> dev->iommu_group was the only per-device thing we had, so in principle I
>>> don't have any conceptual objection to redefining it in terms of
>>> "device has
>>> ops" rather than "device has a group", but as things stand you'd
>>> still have
>>> to do something about PPC first (I know Jason had been pushing on
>>> that, but
>>> I've not kept track of where it got to).
>> PPC hasn't moved at all, AFAICT. In a few more months I'm going to
>> suggest we delete the special VFIO support due to it being broken,
>> distros already having turned it off and nobody caring enough to fix
>> it..
>>
>> What does device_iommu_mapped() even really mean?
>>
>> Looking at usages..
>>
>> These are fixing SOC HW bugs/issues - the desire seems to be "is the
>> SOC's
>> IOMMU enabled"
>>
>> drivers/char/agp/intel-gtt.c:
>> device_iommu_mapped(&intel_private.pcidev->dev));
>> drivers/dma/sh/rcar-dmac.c:     if (device_iommu_mapped(&pdev->dev))
>> drivers/gpu/drm/i915/i915_utils.c:      if
>> (device_iommu_mapped(i915->drm.dev))
>> ?
>> drivers/usb/dwc3/dwc3-xilinx.c: if (of_dma_is_coherent(dev->of_node)
>> || device_iommu_mapped(dev)) {
>> drivers/usb/host/xhci.c:        if (!(xhci->quirks &
>> XHCI_ZERO_64B_REGS) || !device_iommu_mapped(dev))
>> drivers/crypto/qat/qat_common/adf_sriov.c:      if
>> (!device_iommu_mapped(&pdev->dev))
>> ?
>>
>> These seem to be trying to decide if iommu_domain's can be used (and
>> they can't be on power):
>>
>> drivers/gpu/drm/msm/msm_drv.c:  if (device_iommu_mapped(mdp_dev))
>> drivers/gpu/drm/msm/msm_drv.c:          device_iommu_mapped(dev->dev) ||
>> drivers/gpu/drm/msm/msm_drv.c:
>> device_iommu_mapped(dev->dev->parent);
>> drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c:     if
>> (device_iommu_mapped(dev)) {
>> drivers/gpu/drm/rockchip/rockchip_drm_drv.c:    if
>> (!device_iommu_mapped(dev))
>> drivers/gpu/drm/tegra/uapi.c:   if
>> (device_iommu_mapped(client->base.dev) &&
>> client->ops->can_use_memory_ctx) {
>> drivers/gpu/host1x/context.c:           if (!fwspec ||
>> !device_iommu_mapped(&ctx->dev)) {
>> drivers/infiniband/hw/usnic/usnic_ib_main.c:    if
>> (!device_iommu_mapped(&pdev->dev)) {
>>
>> Yikes, trying to map DMA addresses programmed into devices back to CPU
>> addresses:
>>
>> drivers/misc/habanalabs/common/debugfs.c: if (!user_address ||
>> device_iommu_mapped(&hdev->pdev->dev)) {
>> drivers/misc/habanalabs/gaudi2/gaudi2.c:                if
>> (!device_iommu_mapped(&hdev->pdev->dev))
>>
>> And then sequencing the call to iommu_probe_device() which doesn't
>> apply to power:
>>
>> drivers/acpi/scan.c:    if (!err && dev->bus &&
>> !device_iommu_mapped(dev))
>> drivers/iommu/of_iommu.c:       if (!err && dev->bus &&
>> !device_iommu_mapped(dev))
>>
>> Leaving these:
>>
>> arch/powerpc/kernel/eeh.c:      if (device_iommu_mapped(dev)) {
>>
>> This is only used to support eeh_iommu_group_to_pe which is only
>> caleld by vfio_iommu_spapr_tce.c. Since power vfio doesn't work right
>> now this is uncallable, and when power is fixed this will work
>> properly.

Oh wow, I should have looked at more context... Even better, this one is
already just an elaborate "if (true)" - it has been impossible for
dev_has_iommu_table() to return 0 since at least 2015 :D
>> arch/powerpc/kernel/iommu.c:    if (device_iommu_mapped(dev)) {
>> arch/powerpc/kernel/iommu.c:    if (!device_iommu_mapped(dev)) {
>>
>> These should both be replaced with some kind of 'device has iommu
>> group', since
>> it is really driving ppc unique group logic.

And in fact those appear to mostly serve for printing debug messages; if
we made iommu_group_add_device() return -EBUSY for duplicate calls (not
necessarily a bad idea anyway vs. relying on the noisy failure of
sysfs_create_link()), they could arguably just go.

All in all, it's only actually the habanalabs ones that I'm slightly
wary of, since they're effectively (mis)using device_iommu_mapped() to
infer the DMA ops implementation, which could potentially go wrong (or
at least *more* wrong) on POWER with this change. I guess the saving
grace is that although they are available on PCIe-interfaced modules,
the userspace driver stack seems to be implicitly x86_64-only - as far
as I could tell from a quick poke around their site and documentation,
which doesn't appear to acknowledge the concept of CPU architectures at
all - so the chances of anyone actually trying to use the kernel drivers
in anger on POWER seem minimal.

Thanks,
Robin.

>> So, I'd say Baolu's approach is the right thing, just replace the
>> above two in ppc with something else.
>
> Thank you both. I will follow up a series later.
>
> Best regards,
> baolu

2023-01-30 13:53:56

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH 1/8] iommu: Decouple iommu_present() from bus ops

On Mon, Jan 30, 2023 at 01:49:20PM +0000, Robin Murphy wrote:

> All in all, it's only actually the habanalabs ones that I'm slightly wary
> of, since they're effectively (mis)using device_iommu_mapped() to infer the
> DMA ops implementation, which could potentially go wrong (or at least *more*
> wrong) on POWER with this change. I guess the saving grace is that
> although

IMHO habana is not using the DMA API abstraction properly. If it
doesn't work on some archs is their bugs to deal with - we don't need
to complexify the core code to tiptoe around around such an abuse in
an obscure driver.

Jason

2023-01-30 14:25:35

by Oded Gabbay

[permalink] [raw]
Subject: Re: [PATCH 1/8] iommu: Decouple iommu_present() from bus ops

On Mon, Jan 30, 2023 at 3:53 PM Jason Gunthorpe <[email protected]> wrote:
>
> On Mon, Jan 30, 2023 at 01:49:20PM +0000, Robin Murphy wrote:
>
> > All in all, it's only actually the habanalabs ones that I'm slightly wary
> > of, since they're effectively (mis)using device_iommu_mapped() to infer the
> > DMA ops implementation, which could potentially go wrong (or at least *more*
> > wrong) on POWER with this change. I guess the saving grace is that
> > although
>
> IMHO habana is not using the DMA API abstraction properly. If it
> doesn't work on some archs is their bugs to deal with - we don't need
> to complexify the core code to tiptoe around around such an abuse in
> an obscure driver.
>
> Jason
Agreed, feel free to change the kapi as you see fit. Do the right
thing for the kernel.
In any case, we limit ourselves to x86-64 arch in the 6.3 merge cycle.

Oded