2020-04-29 13:43:42

by Joerg Roedel

[permalink] [raw]
Subject: [PATCH v3 00/34] iommu: Move iommu_group setup to IOMMU core code

Hi,

here is the third version of this patch-set. Older versions can be found
here:

v1: https://lore.kernel.org/lkml/[email protected]/
(Has some more introductory text)

v2: https://lore.kernel.org/lkml/[email protected]/

Changes v2 -> v3:

* Rebased v5.7-rc3

* Added a missing iommu_group_put() as reported by Lu Baolu.

* Added a patch to consolidate more initialization work in
__iommu_probe_device(), fixing a bug where no 'struct
device_iommu' was allocated in the hotplug path.

There is also a git-branch available with these patches applied:

https://git.kernel.org/pub/scm/linux/kernel/git/joro/linux.git/log/?h=iommu-probe-device-v3

Please review. If there are no objections I plan to put these patches
into the IOMMU tree early next week.

Thanks,

Joerg

Joerg Roedel (33):
iommu: Move default domain allocation to separate function
iommu/amd: Implement iommu_ops->def_domain_type call-back
iommu/vt-d: Wire up iommu_ops->def_domain_type
iommu/amd: Remove dma_mask check from check_device()
iommu/amd: Return -ENODEV in add_device when device is not handled by
IOMMU
iommu: Add probe_device() and release_device() call-backs
iommu: Move default domain allocation to iommu_probe_device()
iommu: Keep a list of allocated groups in __iommu_probe_device()
iommu: Move new probe_device path to separate function
iommu: Split off default domain allocation from group assignment
iommu: Move iommu_group_create_direct_mappings() out of
iommu_group_add_device()
iommu: Export bus_iommu_probe() and make is safe for re-probing
iommu/amd: Remove dev_data->passthrough
iommu/amd: Convert to probe/release_device() call-backs
iommu/vt-d: Convert to probe/release_device() call-backs
iommu/arm-smmu: Convert to probe/release_device() call-backs
iommu/pamu: Convert to probe/release_device() call-backs
iommu/s390: Convert to probe/release_device() call-backs
iommu/virtio: Convert to probe/release_device() call-backs
iommu/msm: Convert to probe/release_device() call-backs
iommu/mediatek: Convert to probe/release_device() call-backs
iommu/mediatek-v1 Convert to probe/release_device() call-backs
iommu/qcom: Convert to probe/release_device() call-backs
iommu/rockchip: Convert to probe/release_device() call-backs
iommu/tegra: Convert to probe/release_device() call-backs
iommu/renesas: Convert to probe/release_device() call-backs
iommu/omap: Remove orphan_dev tracking
iommu/omap: Convert to probe/release_device() call-backs
iommu/exynos: Use first SYSMMU in controllers list for IOMMU core
iommu/exynos: Convert to probe/release_device() call-backs
iommu: Remove add_device()/remove_device() code-paths
iommu: Move more initialization to __iommu_probe_device()
iommu: Unexport iommu_group_get_for_dev()

Sai Praneeth Prakhya (1):
iommu: Add def_domain_type() callback in iommu_ops

drivers/iommu/amd_iommu.c | 97 ++++----
drivers/iommu/amd_iommu_types.h | 1 -
drivers/iommu/arm-smmu-v3.c | 38 +---
drivers/iommu/arm-smmu.c | 39 ++--
drivers/iommu/exynos-iommu.c | 24 +-
drivers/iommu/fsl_pamu_domain.c | 22 +-
drivers/iommu/intel-iommu.c | 68 +-----
drivers/iommu/iommu.c | 387 +++++++++++++++++++++++++-------
drivers/iommu/ipmmu-vmsa.c | 60 ++---
drivers/iommu/msm_iommu.c | 34 +--
drivers/iommu/mtk_iommu.c | 24 +-
drivers/iommu/mtk_iommu_v1.c | 50 ++---
drivers/iommu/omap-iommu.c | 99 ++------
drivers/iommu/qcom_iommu.c | 24 +-
drivers/iommu/rockchip-iommu.c | 26 +--
drivers/iommu/s390-iommu.c | 22 +-
drivers/iommu/tegra-gart.c | 24 +-
drivers/iommu/tegra-smmu.c | 31 +--
drivers/iommu/virtio-iommu.c | 41 +---
include/linux/iommu.h | 21 +-
20 files changed, 531 insertions(+), 601 deletions(-)

--
2.17.1


2020-04-29 13:43:56

by Joerg Roedel

[permalink] [raw]
Subject: [PATCH v3 02/34] iommu: Add def_domain_type() callback in iommu_ops

From: Sai Praneeth Prakhya <[email protected]>

Some devices are reqired to use a specific type (identity or dma)
of default domain when they are used with a vendor iommu. When the
system level default domain type is different from it, the vendor
iommu driver has to request a new default domain with
iommu_request_dma_domain_for_dev() and iommu_request_dm_for_dev()
in the add_dev() callback. Unfortunately, these two helpers only
work when the group hasn't been assigned to any other devices,
hence, some vendor iommu driver has to use a private domain if
it fails to request a new default one.

This adds def_domain_type() callback in the iommu_ops, so that
any special requirement of default domain for a device could be
aware by the iommu generic layer.

Signed-off-by: Sai Praneeth Prakhya <[email protected]>
Signed-off-by: Lu Baolu <[email protected]>
[ [email protected]: Added iommu_get_def_domain_type() function and use
it to allocate the default domain ]
Co-developed-by: Joerg Roedel <[email protected]>
Tested-by: Marek Szyprowski <[email protected]>
Acked-by: Marek Szyprowski <[email protected]>
Signed-off-by: Joerg Roedel <[email protected]>
---
drivers/iommu/iommu.c | 20 +++++++++++++++++---
include/linux/iommu.h | 6 ++++++
2 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index bfe011760ed1..5877abd9b693 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1361,21 +1361,35 @@ struct iommu_group *fsl_mc_device_group(struct device *dev)
}
EXPORT_SYMBOL_GPL(fsl_mc_device_group);

+static int iommu_get_def_domain_type(struct device *dev)
+{
+ const struct iommu_ops *ops = dev->bus->iommu_ops;
+ unsigned int type = 0;
+
+ if (ops->def_domain_type)
+ type = ops->def_domain_type(dev);
+
+ return (type == 0) ? iommu_def_domain_type : type;
+}
+
static int iommu_alloc_default_domain(struct device *dev,
struct iommu_group *group)
{
struct iommu_domain *dom;
+ unsigned int type;

if (group->default_domain)
return 0;

- dom = __iommu_domain_alloc(dev->bus, iommu_def_domain_type);
- if (!dom && iommu_def_domain_type != IOMMU_DOMAIN_DMA) {
+ type = iommu_get_def_domain_type(dev);
+
+ dom = __iommu_domain_alloc(dev->bus, type);
+ if (!dom && type != IOMMU_DOMAIN_DMA) {
dom = __iommu_domain_alloc(dev->bus, IOMMU_DOMAIN_DMA);
if (dom) {
dev_warn(dev,
"failed to allocate default IOMMU domain of type %u; falling back to IOMMU_DOMAIN_DMA",
- iommu_def_domain_type);
+ type);
}
}

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 7ef8b0bda695..1f027b07e499 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -248,6 +248,10 @@ struct iommu_iotlb_gather {
* @cache_invalidate: invalidate translation caches
* @sva_bind_gpasid: bind guest pasid and mm
* @sva_unbind_gpasid: unbind guest pasid and mm
+ * @def_domain_type: device default domain type, return value:
+ * - IOMMU_DOMAIN_IDENTITY: must use an identity domain
+ * - IOMMU_DOMAIN_DMA: must use a dma domain
+ * - 0: use the default setting
* @pgsize_bitmap: bitmap of all possible supported page sizes
* @owner: Driver module providing these ops
*/
@@ -318,6 +322,8 @@ struct iommu_ops {

int (*sva_unbind_gpasid)(struct device *dev, int pasid);

+ int (*def_domain_type)(struct device *dev);
+
unsigned long pgsize_bitmap;
struct module *owner;
};
--
2.17.1

2020-04-29 13:43:57

by Joerg Roedel

[permalink] [raw]
Subject: [PATCH v3 04/34] iommu/vt-d: Wire up iommu_ops->def_domain_type

From: Joerg Roedel <[email protected]>

The Intel VT-d driver already has a matching function to determine the
default domain type for a device. Wire it up in intel_iommu_ops.

Signed-off-by: Joerg Roedel <[email protected]>
---
drivers/iommu/intel-iommu.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index ef0a5246700e..b9f905a55dda 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -6209,6 +6209,7 @@ const struct iommu_ops intel_iommu_ops = {
.dev_enable_feat = intel_iommu_dev_enable_feat,
.dev_disable_feat = intel_iommu_dev_disable_feat,
.is_attach_deferred = intel_iommu_is_attach_deferred,
+ .def_domain_type = device_def_domain_type,
.pgsize_bitmap = INTEL_IOMMU_PGSIZES,
};

--
2.17.1

2020-04-29 13:44:21

by Joerg Roedel

[permalink] [raw]
Subject: [PATCH v3 17/34] iommu/arm-smmu: Convert to probe/release_device() call-backs

From: Joerg Roedel <[email protected]>

Convert the arm-smmu and arm-smmu-v3 drivers to use the probe_device() and
release_device() call-backs of iommu_ops, so that the iommu core code does the
group and sysfs setup.

Signed-off-by: Joerg Roedel <[email protected]>
---
drivers/iommu/arm-smmu-v3.c | 38 ++++++++++--------------------------
drivers/iommu/arm-smmu.c | 39 ++++++++++++++-----------------------
2 files changed, 25 insertions(+), 52 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 82508730feb7..42e1ee7e5197 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -2914,27 +2914,26 @@ static bool arm_smmu_sid_in_range(struct arm_smmu_device *smmu, u32 sid)

static struct iommu_ops arm_smmu_ops;

-static int arm_smmu_add_device(struct device *dev)
+static struct iommu_device *arm_smmu_probe_device(struct device *dev)
{
int i, ret;
struct arm_smmu_device *smmu;
struct arm_smmu_master *master;
struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
- struct iommu_group *group;

if (!fwspec || fwspec->ops != &arm_smmu_ops)
- return -ENODEV;
+ return ERR_PTR(-ENODEV);

if (WARN_ON_ONCE(dev_iommu_priv_get(dev)))
- return -EBUSY;
+ return ERR_PTR(-EBUSY);

smmu = arm_smmu_get_by_fwnode(fwspec->iommu_fwnode);
if (!smmu)
- return -ENODEV;
+ return ERR_PTR(-ENODEV);

master = kzalloc(sizeof(*master), GFP_KERNEL);
if (!master)
- return -ENOMEM;
+ return ERR_PTR(-ENOMEM);

master->dev = dev;
master->smmu = smmu;
@@ -2975,30 +2974,15 @@ static int arm_smmu_add_device(struct device *dev)
master->ssid_bits = min_t(u8, master->ssid_bits,
CTXDESC_LINEAR_CDMAX);

- ret = iommu_device_link(&smmu->iommu, dev);
- if (ret)
- goto err_disable_pasid;
+ return &smmu->iommu;

- group = iommu_group_get_for_dev(dev);
- if (IS_ERR(group)) {
- ret = PTR_ERR(group);
- goto err_unlink;
- }
-
- iommu_group_put(group);
- return 0;
-
-err_unlink:
- iommu_device_unlink(&smmu->iommu, dev);
-err_disable_pasid:
- arm_smmu_disable_pasid(master);
err_free_master:
kfree(master);
dev_iommu_priv_set(dev, NULL);
- return ret;
+ return ERR_PTR(ret);
}

-static void arm_smmu_remove_device(struct device *dev)
+static void arm_smmu_release_device(struct device *dev)
{
struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
struct arm_smmu_master *master;
@@ -3010,8 +2994,6 @@ static void arm_smmu_remove_device(struct device *dev)
master = dev_iommu_priv_get(dev);
smmu = master->smmu;
arm_smmu_detach_dev(master);
- iommu_group_remove_device(dev);
- iommu_device_unlink(&smmu->iommu, dev);
arm_smmu_disable_pasid(master);
kfree(master);
iommu_fwspec_free(dev);
@@ -3138,8 +3120,8 @@ static struct iommu_ops arm_smmu_ops = {
.flush_iotlb_all = arm_smmu_flush_iotlb_all,
.iotlb_sync = arm_smmu_iotlb_sync,
.iova_to_phys = arm_smmu_iova_to_phys,
- .add_device = arm_smmu_add_device,
- .remove_device = arm_smmu_remove_device,
+ .probe_device = arm_smmu_probe_device,
+ .release_device = arm_smmu_release_device,
.device_group = arm_smmu_device_group,
.domain_get_attr = arm_smmu_domain_get_attr,
.domain_set_attr = arm_smmu_domain_set_attr,
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index a6a5796e9c41..e622f4e33379 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -220,7 +220,7 @@ static int arm_smmu_register_legacy_master(struct device *dev,
* With the legacy DT binding in play, we have no guarantees about
* probe order, but then we're also not doing default domains, so we can
* delay setting bus ops until we're sure every possible SMMU is ready,
- * and that way ensure that no add_device() calls get missed.
+ * and that way ensure that no probe_device() calls get missed.
*/
static int arm_smmu_legacy_bus_init(void)
{
@@ -1062,7 +1062,6 @@ static int arm_smmu_master_alloc_smes(struct device *dev)
struct arm_smmu_master_cfg *cfg = dev_iommu_priv_get(dev);
struct arm_smmu_device *smmu = cfg->smmu;
struct arm_smmu_smr *smrs = smmu->smrs;
- struct iommu_group *group;
int i, idx, ret;

mutex_lock(&smmu->stream_map_mutex);
@@ -1090,18 +1089,9 @@ static int arm_smmu_master_alloc_smes(struct device *dev)
cfg->smendx[i] = (s16)idx;
}

- group = iommu_group_get_for_dev(dev);
- if (IS_ERR(group)) {
- ret = PTR_ERR(group);
- goto out_err;
- }
- iommu_group_put(group);
-
/* It worked! Now, poke the actual hardware */
- for_each_cfg_sme(cfg, fwspec, i, idx) {
+ for_each_cfg_sme(cfg, fwspec, i, idx)
arm_smmu_write_sme(smmu, idx);
- smmu->s2crs[idx].group = group;
- }

mutex_unlock(&smmu->stream_map_mutex);
return 0;
@@ -1172,7 +1162,7 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)

/*
* FIXME: The arch/arm DMA API code tries to attach devices to its own
- * domains between of_xlate() and add_device() - we have no way to cope
+ * domains between of_xlate() and probe_device() - we have no way to cope
* with that, so until ARM gets converted to rely on groups and default
* domains, just say no (but more politely than by dereferencing NULL).
* This should be at least a WARN_ON once that's sorted.
@@ -1382,7 +1372,7 @@ struct arm_smmu_device *arm_smmu_get_by_fwnode(struct fwnode_handle *fwnode)
return dev ? dev_get_drvdata(dev) : NULL;
}

-static int arm_smmu_add_device(struct device *dev)
+static struct iommu_device *arm_smmu_probe_device(struct device *dev)
{
struct arm_smmu_device *smmu = NULL;
struct arm_smmu_master_cfg *cfg;
@@ -1403,7 +1393,7 @@ static int arm_smmu_add_device(struct device *dev)
} else if (fwspec && fwspec->ops == &arm_smmu_ops) {
smmu = arm_smmu_get_by_fwnode(fwspec->iommu_fwnode);
} else {
- return -ENODEV;
+ return ERR_PTR(-ENODEV);
}

ret = -EINVAL;
@@ -1444,21 +1434,19 @@ static int arm_smmu_add_device(struct device *dev)
if (ret)
goto out_cfg_free;

- iommu_device_link(&smmu->iommu, dev);
-
device_link_add(dev, smmu->dev,
DL_FLAG_PM_RUNTIME | DL_FLAG_AUTOREMOVE_SUPPLIER);

- return 0;
+ return &smmu->iommu;

out_cfg_free:
kfree(cfg);
out_free:
iommu_fwspec_free(dev);
- return ret;
+ return ERR_PTR(ret);
}

-static void arm_smmu_remove_device(struct device *dev)
+static void arm_smmu_release_device(struct device *dev)
{
struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
struct arm_smmu_master_cfg *cfg;
@@ -1475,13 +1463,11 @@ static void arm_smmu_remove_device(struct device *dev)
if (ret < 0)
return;

- iommu_device_unlink(&smmu->iommu, dev);
arm_smmu_master_free_smes(cfg, fwspec);

arm_smmu_rpm_put(smmu);

dev_iommu_priv_set(dev, NULL);
- iommu_group_remove_device(dev);
kfree(cfg);
iommu_fwspec_free(dev);
}
@@ -1512,6 +1498,11 @@ static struct iommu_group *arm_smmu_device_group(struct device *dev)
else
group = generic_device_group(dev);

+ /* Remember group for faster lookups */
+ if (!IS_ERR(group))
+ for_each_cfg_sme(cfg, fwspec, i, idx)
+ smmu->s2crs[idx].group = group;
+
return group;
}

@@ -1628,8 +1619,8 @@ static struct iommu_ops arm_smmu_ops = {
.flush_iotlb_all = arm_smmu_flush_iotlb_all,
.iotlb_sync = arm_smmu_iotlb_sync,
.iova_to_phys = arm_smmu_iova_to_phys,
- .add_device = arm_smmu_add_device,
- .remove_device = arm_smmu_remove_device,
+ .probe_device = arm_smmu_probe_device,
+ .release_device = arm_smmu_release_device,
.device_group = arm_smmu_device_group,
.domain_get_attr = arm_smmu_domain_get_attr,
.domain_set_attr = arm_smmu_domain_set_attr,
--
2.17.1

2020-04-29 13:44:22

by Joerg Roedel

[permalink] [raw]
Subject: [PATCH v3 15/34] iommu/amd: Convert to probe/release_device() call-backs

From: Joerg Roedel <[email protected]>

Convert the AMD IOMMU Driver to use the probe_device() and
release_device() call-backs of iommu_ops, so that the iommu core code
does the group and sysfs setup.

Signed-off-by: Joerg Roedel <[email protected]>
---
drivers/iommu/amd_iommu.c | 71 ++++++++++++---------------------------
1 file changed, 22 insertions(+), 49 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 0b4b4faa876d..c30367413683 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -343,21 +343,9 @@ static bool check_device(struct device *dev)
return true;
}

-static void init_iommu_group(struct device *dev)
-{
- struct iommu_group *group;
-
- group = iommu_group_get_for_dev(dev);
- if (IS_ERR(group))
- return;
-
- iommu_group_put(group);
-}
-
static int iommu_init_device(struct device *dev)
{
struct iommu_dev_data *dev_data;
- struct amd_iommu *iommu;
int devid;

if (dev->archdata.iommu)
@@ -367,8 +355,6 @@ static int iommu_init_device(struct device *dev)
if (devid < 0)
return devid;

- iommu = amd_iommu_rlookup_table[devid];
-
dev_data = find_dev_data(devid);
if (!dev_data)
return -ENOMEM;
@@ -391,8 +377,6 @@ static int iommu_init_device(struct device *dev)

dev->archdata.iommu = dev_data;

- iommu_device_link(&iommu->iommu, dev);
-
return 0;
}

@@ -410,7 +394,7 @@ static void iommu_ignore_device(struct device *dev)
setup_aliases(dev);
}

-static void iommu_uninit_device(struct device *dev)
+static void amd_iommu_uninit_device(struct device *dev)
{
struct iommu_dev_data *dev_data;
struct amd_iommu *iommu;
@@ -429,13 +413,6 @@ static void iommu_uninit_device(struct device *dev)
if (dev_data->domain)
detach_device(dev);

- iommu_device_unlink(&iommu->iommu, dev);
-
- iommu_group_remove_device(dev);
-
- /* Remove dma-ops */
- dev->dma_ops = NULL;
-
/*
* We keep dev_data around for unplugged devices and reuse it when the
* device is re-plugged - not doing so would introduce a ton of races.
@@ -2152,55 +2129,50 @@ static void detach_device(struct device *dev)
spin_unlock_irqrestore(&domain->lock, flags);
}

-static int amd_iommu_add_device(struct device *dev)
+static struct iommu_device *amd_iommu_probe_device(struct device *dev)
{
- struct iommu_dev_data *dev_data;
- struct iommu_domain *domain;
+ struct iommu_device *iommu_dev;
struct amd_iommu *iommu;
int ret, devid;

- if (get_dev_data(dev))
- return 0;
-
if (!check_device(dev))
- return -ENODEV;
+ return ERR_PTR(-ENODEV);

devid = get_device_id(dev);
if (devid < 0)
- return devid;
+ return ERR_PTR(devid);

iommu = amd_iommu_rlookup_table[devid];

+ if (get_dev_data(dev))
+ return &iommu->iommu;
+
ret = iommu_init_device(dev);
if (ret) {
if (ret != -ENOTSUPP)
dev_err(dev, "Failed to initialize - trying to proceed anyway\n");
-
+ iommu_dev = ERR_PTR(ret);
iommu_ignore_device(dev);
- dev->dma_ops = NULL;
- goto out;
+ } else {
+ iommu_dev = &iommu->iommu;
}
- init_iommu_group(dev);

- dev_data = get_dev_data(dev);
+ iommu_completion_wait(iommu);

- BUG_ON(!dev_data);
+ return iommu_dev;
+}

- if (dev_data->iommu_v2)
- iommu_request_dm_for_dev(dev);
+static void amd_iommu_probe_finalize(struct device *dev)
+{
+ struct iommu_domain *domain;

/* Domains are initialized for this device - have a look what we ended up with */
domain = iommu_get_domain_for_dev(dev);
if (domain->type == IOMMU_DOMAIN_DMA)
iommu_setup_dma_ops(dev, IOVA_START_PFN << PAGE_SHIFT, 0);
-
-out:
- iommu_completion_wait(iommu);
-
- return 0;
}

-static void amd_iommu_remove_device(struct device *dev)
+static void amd_iommu_release_device(struct device *dev)
{
struct amd_iommu *iommu;
int devid;
@@ -2214,7 +2186,7 @@ static void amd_iommu_remove_device(struct device *dev)

iommu = amd_iommu_rlookup_table[devid];

- iommu_uninit_device(dev);
+ amd_iommu_uninit_device(dev);
iommu_completion_wait(iommu);
}

@@ -2687,8 +2659,9 @@ const struct iommu_ops amd_iommu_ops = {
.map = amd_iommu_map,
.unmap = amd_iommu_unmap,
.iova_to_phys = amd_iommu_iova_to_phys,
- .add_device = amd_iommu_add_device,
- .remove_device = amd_iommu_remove_device,
+ .probe_device = amd_iommu_probe_device,
+ .release_device = amd_iommu_release_device,
+ .probe_finalize = amd_iommu_probe_finalize,
.device_group = amd_iommu_device_group,
.domain_get_attr = amd_iommu_domain_get_attr,
.get_resv_regions = amd_iommu_get_resv_regions,
--
2.17.1

2020-04-29 13:45:00

by Joerg Roedel

[permalink] [raw]
Subject: [PATCH v3 11/34] iommu: Split off default domain allocation from group assignment

From: Joerg Roedel <[email protected]>

When a bus is initialized with iommu-ops, all devices on the bus are
scanned and iommu-groups are allocated for them, and each groups will
also get a default domain allocated.

Until now this happened as soon as the group was created and the first
device added to it. When other devices with different default domain
requirements were added to the group later on, the default domain was
re-allocated, if possible.

This resulted in some back and forth and unnecessary allocations, so
change the flow to defer default domain allocation until all devices
have been added to their respective IOMMU groups.

The default domains are allocated for newly allocated groups after
each device on the bus is handled and was probed by the IOMMU driver.

Tested-by: Marek Szyprowski <[email protected]>
Acked-by: Marek Szyprowski <[email protected]>
Signed-off-by: Joerg Roedel <[email protected]>
---
drivers/iommu/iommu.c | 154 +++++++++++++++++++++++++++++++++++++++++-
1 file changed, 151 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 8be047a4808f..7de0e29db333 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -199,7 +199,7 @@ static int __iommu_probe_device(struct device *dev, struct list_head *group_list
dev->iommu->iommu_dev = iommu_dev;

group = iommu_group_get_for_dev(dev);
- if (!IS_ERR(group)) {
+ if (IS_ERR(group)) {
ret = PTR_ERR(group);
goto out_release;
}
@@ -1599,6 +1599,37 @@ static int add_iommu_group(struct device *dev, void *data)
return ret;
}

+static int probe_iommu_group(struct device *dev, void *data)
+{
+ const struct iommu_ops *ops = dev->bus->iommu_ops;
+ struct list_head *group_list = data;
+ int ret;
+
+ if (!dev_iommu_get(dev))
+ return -ENOMEM;
+
+ if (!try_module_get(ops->owner)) {
+ ret = -EINVAL;
+ goto err_free_dev_iommu;
+ }
+
+ ret = __iommu_probe_device(dev, group_list);
+ if (ret)
+ goto err_module_put;
+
+ return 0;
+
+err_module_put:
+ module_put(ops->owner);
+err_free_dev_iommu:
+ dev_iommu_free(dev);
+
+ if (ret == -ENODEV)
+ ret = 0;
+
+ return ret;
+}
+
static int remove_iommu_group(struct device *dev, void *data)
{
iommu_release_device(dev);
@@ -1658,10 +1689,127 @@ static int iommu_bus_notifier(struct notifier_block *nb,
return 0;
}

+struct __group_domain_type {
+ struct device *dev;
+ unsigned int type;
+};
+
+static int probe_get_default_domain_type(struct device *dev, void *data)
+{
+ const struct iommu_ops *ops = dev->bus->iommu_ops;
+ struct __group_domain_type *gtype = data;
+ unsigned int type = 0;
+
+ if (ops->def_domain_type)
+ type = ops->def_domain_type(dev);
+
+ if (type) {
+ if (gtype->type && gtype->type != type) {
+ dev_warn(dev, "Device needs domain type %s, but device %s in the same iommu group requires type %s - using default\n",
+ iommu_domain_type_str(type),
+ dev_name(gtype->dev),
+ iommu_domain_type_str(gtype->type));
+ gtype->type = 0;
+ }
+
+ if (!gtype->dev) {
+ gtype->dev = dev;
+ gtype->type = type;
+ }
+ }
+
+ return 0;
+}
+
+static void probe_alloc_default_domain(struct bus_type *bus,
+ struct iommu_group *group)
+{
+ struct __group_domain_type gtype;
+
+ memset(&gtype, 0, sizeof(gtype));
+
+ /* Ask for default domain requirements of all devices in the group */
+ __iommu_group_for_each_dev(group, &gtype,
+ probe_get_default_domain_type);
+
+ if (!gtype.type)
+ gtype.type = iommu_def_domain_type;
+
+ iommu_group_alloc_default_domain(bus, group, gtype.type);
+}
+
+static int iommu_group_do_dma_attach(struct device *dev, void *data)
+{
+ struct iommu_domain *domain = data;
+ const struct iommu_ops *ops;
+ int ret;
+
+ ret = __iommu_attach_device(domain, dev);
+
+ ops = domain->ops;
+
+ if (ret == 0 && ops->probe_finalize)
+ ops->probe_finalize(dev);
+
+ return ret;
+}
+
+static int __iommu_group_dma_attach(struct iommu_group *group)
+{
+ return __iommu_group_for_each_dev(group, group->default_domain,
+ iommu_group_do_dma_attach);
+}
+
+static int bus_iommu_probe(struct bus_type *bus)
+{
+ const struct iommu_ops *ops = bus->iommu_ops;
+ int ret;
+
+ if (ops->probe_device) {
+ struct iommu_group *group, *next;
+ LIST_HEAD(group_list);
+
+ /*
+ * This code-path does not allocate the default domain when
+ * creating the iommu group, so do it after the groups are
+ * created.
+ */
+ ret = bus_for_each_dev(bus, NULL, &group_list, probe_iommu_group);
+ if (ret)
+ return ret;
+
+ list_for_each_entry_safe(group, next, &group_list, entry) {
+ /* Remove item from the list */
+ list_del_init(&group->entry);
+
+ mutex_lock(&group->mutex);
+
+ /* Try to allocate default domain */
+ probe_alloc_default_domain(bus, group);
+
+ if (!group->default_domain) {
+ mutex_unlock(&group->mutex);
+ continue;
+ }
+
+ ret = __iommu_group_dma_attach(group);
+
+ mutex_unlock(&group->mutex);
+
+ if (ret)
+ break;
+ }
+ } else {
+ ret = bus_for_each_dev(bus, NULL, NULL, add_iommu_group);
+ }
+
+ return ret;
+}
+
static int iommu_bus_init(struct bus_type *bus, const struct iommu_ops *ops)
{
- int err;
struct notifier_block *nb;
+ int err;

nb = kzalloc(sizeof(struct notifier_block), GFP_KERNEL);
if (!nb)
@@ -1673,7 +1821,7 @@ static int iommu_bus_init(struct bus_type *bus, const struct iommu_ops *ops)
if (err)
goto out_free;

- err = bus_for_each_dev(bus, NULL, NULL, add_iommu_group);
+ err = bus_iommu_probe(bus);
if (err)
goto out_err;

--
2.17.1

2020-04-29 13:45:06

by Joerg Roedel

[permalink] [raw]
Subject: [PATCH v3 06/34] iommu/amd: Return -ENODEV in add_device when device is not handled by IOMMU

From: Joerg Roedel <[email protected]>

When check_device() fails on the device, it is not handled by the
IOMMU and amd_iommu_add_device() needs to return -ENODEV.

Signed-off-by: Joerg Roedel <[email protected]>
---
drivers/iommu/amd_iommu.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 504f2db75eda..3e0d27f7622e 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -2157,9 +2157,12 @@ static int amd_iommu_add_device(struct device *dev)
struct amd_iommu *iommu;
int ret, devid;

- if (!check_device(dev) || get_dev_data(dev))
+ if (get_dev_data(dev))
return 0;

+ if (!check_device(dev))
+ return -ENODEV;
+
devid = get_device_id(dev);
if (devid < 0)
return devid;
--
2.17.1

2020-04-29 13:45:25

by Joerg Roedel

[permalink] [raw]
Subject: [PATCH v3 05/34] iommu/amd: Remove dma_mask check from check_device()

From: Joerg Roedel <[email protected]>

The check was only needed for the DMA-API implementation in the AMD
IOMMU driver, which no longer exists.

Signed-off-by: Joerg Roedel <[email protected]>
---
drivers/iommu/amd_iommu.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 73b4f84cf449..504f2db75eda 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -326,7 +326,7 @@ static bool check_device(struct device *dev)
{
int devid;

- if (!dev || !dev->dma_mask)
+ if (!dev)
return false;

devid = get_device_id(dev);
--
2.17.1

2020-04-30 00:00:05

by Lu Baolu

[permalink] [raw]
Subject: Re: [PATCH v3 04/34] iommu/vt-d: Wire up iommu_ops->def_domain_type

On 2020/4/29 21:36, Joerg Roedel wrote:
> From: Joerg Roedel <[email protected]>
>
> The Intel VT-d driver already has a matching function to determine the
> default domain type for a device. Wire it up in intel_iommu_ops.
>
> Signed-off-by: Joerg Roedel <[email protected]>

Reviewed-by: Lu Baolu <[email protected]>

Best regards,
baolu

> ---
> drivers/iommu/intel-iommu.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index ef0a5246700e..b9f905a55dda 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -6209,6 +6209,7 @@ const struct iommu_ops intel_iommu_ops = {
> .dev_enable_feat = intel_iommu_dev_enable_feat,
> .dev_disable_feat = intel_iommu_dev_disable_feat,
> .is_attach_deferred = intel_iommu_is_attach_deferred,
> + .def_domain_type = device_def_domain_type,
> .pgsize_bitmap = INTEL_IOMMU_PGSIZES,
> };
>
>

2020-05-05 12:40:01

by Joerg Roedel

[permalink] [raw]
Subject: Re: [PATCH v3 00/34] iommu: Move iommu_group setup to IOMMU core code

On Wed, Apr 29, 2020 at 03:36:38PM +0200, Joerg Roedel wrote:
> Please review. If there are no objections I plan to put these patches
> into the IOMMU tree early next week.

Series is now applied.

2020-07-01 00:41:39

by Qian Cai

[permalink] [raw]
Subject: Re: [PATCH v3 00/34] iommu: Move iommu_group setup to IOMMU core code

On Wed, Apr 29, 2020 at 03:36:38PM +0200, Joerg Roedel wrote:
> Hi,
>
> here is the third version of this patch-set. Older versions can be found
> here:
>
> v1: https://lore.kernel.org/lkml/[email protected]/
> (Has some more introductory text)
>
> v2: https://lore.kernel.org/lkml/[email protected]/
>
> Changes v2 -> v3:
>
> * Rebased v5.7-rc3
>
> * Added a missing iommu_group_put() as reported by Lu Baolu.
>
> * Added a patch to consolidate more initialization work in
> __iommu_probe_device(), fixing a bug where no 'struct
> device_iommu' was allocated in the hotplug path.
>
> There is also a git-branch available with these patches applied:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/joro/linux.git/log/?h=iommu-probe-device-v3
>
> Please review. If there are no objections I plan to put these patches
> into the IOMMU tree early next week.

Looks like this patchset introduced an use-after-free on arm-smmu-v3.

Reproduced using mlx5,

# echo 1 > /sys/class/net/enp11s0f1np1/device/sriov_numvfs
# echo 0 > /sys/class/net/enp11s0f1np1/device/sriov_numvfs

The .config,
https://github.com/cailca/linux-mm/blob/master/arm64.config

Looking at the free stack,

iommu_release_device->iommu_group_remove_device

was introduced in 07/34 ("iommu: Add probe_device() and release_device()
call-backs").

[ 9426.724641][ T3356] pci 0000:0b:01.2: Removing from iommu group 3
[ 9426.731347][ T3356] ==================================================================
[ 9426.739263][ T3356] BUG: KASAN: use-after-free in __lock_acquire+0x3458/0x4440
__lock_acquire at kernel/locking/lockdep.c:4250
[ 9426.746477][ T3356] Read of size 8 at addr ffff0089df1a6f68 by task bash/3356
[ 9426.753601][ T3356]
[ 9426.755782][ T3356] CPU: 5 PID: 3356 Comm: bash Not tainted 5.8.0-rc3-next-20200630 #2
[ 9426.763687][ T3356] Hardware name: HPE Apollo 70 /C01_APACHE_MB , BIOS L50_5.13_1.11 06/18/2019
[ 9426.774111][ T3356] Call trace:
[ 9426.777245][ T3356] dump_backtrace+0x0/0x398
[ 9426.781593][ T3356] show_stack+0x14/0x20
[ 9426.785596][ T3356] dump_stack+0x140/0x1b8
[ 9426.789772][ T3356] print_address_description.isra.12+0x54/0x4a8
[ 9426.795855][ T3356] kasan_report+0x134/0x1b8
[ 9426.800203][ T3356] __asan_report_load8_noabort+0x2c/0x50
[ 9426.805679][ T3356] __lock_acquire+0x3458/0x4440
[ 9426.810373][ T3356] lock_acquire+0x204/0xf10
[ 9426.814722][ T3356] _raw_spin_lock_irqsave+0xf8/0x180
[ 9426.819853][ T3356] arm_smmu_detach_dev+0xd8/0x4a0
arm_smmu_detach_dev at drivers/iommu/arm-smmu-v3.c:2776
[ 9426.824721][ T3356] arm_smmu_release_device+0xb4/0x1c8
arm_smmu_disable_pasid at drivers/iommu/arm-smmu-v3.c:2754
(inlined by) arm_smmu_release_device at drivers/iommu/arm-smmu-v3.c:3000
[ 9426.829937][ T3356] iommu_release_device+0xc0/0x178
iommu_release_device at drivers/iommu/iommu.c:302
[ 9426.834892][ T3356] iommu_bus_notifier+0x118/0x160
[ 9426.839762][ T3356] notifier_call_chain+0xa4/0x128
[ 9426.844630][ T3356] __blocking_notifier_call_chain+0x70/0xa8
[ 9426.850367][ T3356] blocking_notifier_call_chain+0x14/0x20
[ 9426.855929][ T3356] device_del+0x618/0xa00
[ 9426.860105][ T3356] pci_remove_bus_device+0x108/0x2d8
[ 9426.865233][ T3356] pci_stop_and_remove_bus_device+0x1c/0x28
[ 9426.870972][ T3356] pci_iov_remove_virtfn+0x228/0x368
[ 9426.876100][ T3356] sriov_disable+0x8c/0x348
[ 9426.880447][ T3356] pci_disable_sriov+0x5c/0x70
[ 9426.885117][ T3356] mlx5_core_sriov_configure+0xd8/0x260 [mlx5_core]
[ 9426.891549][ T3356] sriov_numvfs_store+0x240/0x318
[ 9426.896417][ T3356] dev_attr_store+0x38/0x68
[ 9426.900766][ T3356] sysfs_kf_write+0xdc/0x128
[ 9426.905200][ T3356] kernfs_fop_write+0x23c/0x448
[ 9426.909897][ T3356] __vfs_write+0x54/0xe8
[ 9426.913984][ T3356] vfs_write+0x124/0x3f0
[ 9426.918070][ T3356] ksys_write+0xe8/0x1b8
[ 9426.922157][ T3356] __arm64_sys_write+0x68/0x98
[ 9426.926766][ T3356] do_el0_svc+0x124/0x220
[ 9426.930941][ T3356] el0_sync_handler+0x260/0x408
[ 9426.935634][ T3356] el0_sync+0x140/0x180
[ 9426.939633][ T3356]
[ 9426.941810][ T3356] Allocated by task 3356:
[ 9426.945985][ T3356] save_stack+0x24/0x50
[ 9426.949986][ T3356] __kasan_kmalloc.isra.13+0xc4/0xe0
[ 9426.955114][ T3356] kasan_kmalloc+0xc/0x18
[ 9426.959288][ T3356] kmem_cache_alloc_trace+0x1ec/0x318
[ 9426.964503][ T3356] arm_smmu_domain_alloc+0x54/0x148
[ 9426.969545][ T3356] iommu_group_alloc_default_domain+0xc0/0x440
[ 9426.975541][ T3356] iommu_probe_device+0x1c0/0x308
[ 9426.980409][ T3356] iort_iommu_configure+0x434/0x518
[ 9426.985452][ T3356] acpi_dma_configure+0xf0/0x128
[ 9426.990235][ T3356] pci_dma_configure+0x114/0x160
[ 9426.995017][ T3356] really_probe+0x124/0x6d8
[ 9426.999364][ T3356] driver_probe_device+0xc4/0x180
[ 9427.004232][ T3356] __device_attach_driver+0x184/0x1e8
[ 9427.009447][ T3356] bus_for_each_drv+0x114/0x1a0
[ 9427.014142][ T3356] __device_attach+0x19c/0x2a8
[ 9427.018749][ T3356] device_attach+0x10/0x18
[ 9427.023009][ T3356] pci_bus_add_device+0x70/0xf8
[ 9427.027704][ T3356] pci_iov_add_virtfn+0x7b4/0xb40
[ 9427.032571][ T3356] sriov_enable+0x5c8/0xc30
[ 9427.036918][ T3356] pci_enable_sriov+0x64/0x80
[ 9427.041485][ T3356] mlx5_core_sriov_configure+0x58/0x260 [mlx5_core]
[ 9427.047917][ T3356] sriov_numvfs_store+0x1c0/0x318
[ 9427.052784][ T3356] dev_attr_store+0x38/0x68
[ 9427.057131][ T3356] sysfs_kf_write+0xdc/0x128
[ 9427.061565][ T3356] kernfs_fop_write+0x23c/0x448
[ 9427.066260][ T3356] __vfs_write+0x54/0xe8
[ 9427.070346][ T3356] vfs_write+0x124/0x3f0
[ 9427.074433][ T3356] ksys_write+0xe8/0x1b8
[ 9427.078519][ T3356] __arm64_sys_write+0x68/0x98
[ 9427.083127][ T3356] do_el0_svc+0x124/0x220
[ 9427.087300][ T3356] el0_sync_handler+0x260/0x408
[ 9427.091994][ T3356] el0_sync+0x140/0x180
[ 9427.095992][ T3356]
[ 9427.098168][ T3356] Freed by task 3356:
[ 9427.101995][ T3356] save_stack+0x24/0x50
[ 9427.105996][ T3356] __kasan_slab_free+0x124/0x198
[ 9427.110777][ T3356] kasan_slab_free+0x10/0x18
[ 9427.115210][ T3356] slab_free_freelist_hook+0x110/0x298
[ 9427.120512][ T3356] kfree+0x128/0x668
[ 9427.124252][ T3356] arm_smmu_domain_free+0xf4/0x1a0
[ 9427.129206][ T3356] iommu_group_release+0xec/0x160
[ 9427.134074][ T3356] kobject_put+0xf4/0x238
[ 9427.138247][ T3356] kobject_del+0x110/0x190
[ 9427.142507][ T3356] kobject_put+0x1e4/0x238
[ 9427.146767][ T3356] iommu_group_remove_device+0x394/0x938
[ 9427.152242][ T3356] iommu_release_device+0x9c/0x178
iommu_release_device at drivers/iommu/iommu.c:300
[ 9427.157196][ T3356] iommu_bus_notifier+0x118/0x160
[ 9427.162065][ T3356] notifier_call_chain+0xa4/0x128
[ 9427.166934][ T3356] __blocking_notifier_call_chain+0x70/0xa8
[ 9427.172670][ T3356] blocking_notifier_call_chain+0x14/0x20
[ 9427.178233][ T3356] device_del+0x618/0xa00
[ 9427.182406][ T3356] pci_remove_bus_device+0x108/0x2d8
[ 9427.187535][ T3356] pci_stop_and_remove_bus_device+0x1c/0x28
[ 9427.193271][ T3356] pci_iov_remove_virtfn+0x228/0x368
[ 9427.198399][ T3356] sriov_disable+0x8c/0x348
[ 9427.202746][ T3356] pci_disable_sriov+0x5c/0x70
[ 9427.207398][ T3356] mlx5_core_sriov_configure+0xd8/0x260 [mlx5_core]
[ 9427.213830][ T3356] sriov_numvfs_store+0x240/0x318
[ 9427.218698][ T3356] dev_attr_store+0x38/0x68
[ 9427.223045][ T3356] sysfs_kf_write+0xdc/0x128
[ 9427.227478][ T3356] kernfs_fop_write+0x23c/0x448
[ 9427.232173][ T3356] __vfs_write+0x54/0xe8
[ 9427.236259][ T3356] vfs_write+0x124/0x3f0
[ 9427.240346][ T3356] ksys_write+0xe8/0x1b8
[ 9427.244433][ T3356] __arm64_sys_write+0x68/0x98
[ 9427.249041][ T3356] do_el0_svc+0x124/0x220
[ 9427.253215][ T3356] el0_sync_handler+0x260/0x408
[ 9427.257908][ T3356] el0_sync+0x140/0x180
[ 9427.261907][ T3356]
[ 9427.264084][ T3356] The buggy address belongs to the object at ffff0089df1a6e00
[ 9427.264084][ T3356] which belongs to the cache kmalloc-512 of size 512
[ 9427.277980][ T3356] The buggy address is located 360 bytes inside of
[ 9427.277980][ T3356] 512-byte region [ffff0089df1a6e00, ffff0089df1a7000)
[ 9427.291094][ T3356] The buggy address belongs to the page:
[ 9427.296571][ T3356] page:ffffffe02257c680 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff0089df1a1400
[ 9427.306823][ T3356] flags: 0x7ffff800000200(slab)
[ 9427.311520][ T3356] raw: 007ffff800000200 ffffffe02246b8c8 ffffffe02257ff88 ffff000000320680
[ 9427.319949][ T3356] raw: ffff0089df1a1400 00000000002a000e 00000001ffffffff ffff0089df1a5001
[ 9427.328374][ T3356] page dumped because: kasan: bad access detected
[ 9427.334630][ T3356] page->mem_cgroup:ffff0089df1a5001
[ 9427.339670][ T3356]
[ 9427.341846][ T3356] Memory state around the buggy address:
[ 9427.347322][ T3356] ffff0089df1a6e00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 9427.355228][ T3356] ffff0089df1a6e80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 9427.363133][ T3356] >ffff0089df1a6f00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 9427.371038][ T3356] ^
[ 9427.378337][ T3356] ffff0089df1a6f80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 9427.386242][ T3356] ffff0089df1a7000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 9427.394146][ T3356] ==================================================================
[ 9427.402052][ T3356] Disabling lock debugging due to kernel taint

>
> Thanks,
>
> Joerg
>
> Joerg Roedel (33):
> iommu: Move default domain allocation to separate function
> iommu/amd: Implement iommu_ops->def_domain_type call-back
> iommu/vt-d: Wire up iommu_ops->def_domain_type
> iommu/amd: Remove dma_mask check from check_device()
> iommu/amd: Return -ENODEV in add_device when device is not handled by
> IOMMU
> iommu: Add probe_device() and release_device() call-backs
> iommu: Move default domain allocation to iommu_probe_device()
> iommu: Keep a list of allocated groups in __iommu_probe_device()
> iommu: Move new probe_device path to separate function
> iommu: Split off default domain allocation from group assignment
> iommu: Move iommu_group_create_direct_mappings() out of
> iommu_group_add_device()
> iommu: Export bus_iommu_probe() and make is safe for re-probing
> iommu/amd: Remove dev_data->passthrough
> iommu/amd: Convert to probe/release_device() call-backs
> iommu/vt-d: Convert to probe/release_device() call-backs
> iommu/arm-smmu: Convert to probe/release_device() call-backs
> iommu/pamu: Convert to probe/release_device() call-backs
> iommu/s390: Convert to probe/release_device() call-backs
> iommu/virtio: Convert to probe/release_device() call-backs
> iommu/msm: Convert to probe/release_device() call-backs
> iommu/mediatek: Convert to probe/release_device() call-backs
> iommu/mediatek-v1 Convert to probe/release_device() call-backs
> iommu/qcom: Convert to probe/release_device() call-backs
> iommu/rockchip: Convert to probe/release_device() call-backs
> iommu/tegra: Convert to probe/release_device() call-backs
> iommu/renesas: Convert to probe/release_device() call-backs
> iommu/omap: Remove orphan_dev tracking
> iommu/omap: Convert to probe/release_device() call-backs
> iommu/exynos: Use first SYSMMU in controllers list for IOMMU core
> iommu/exynos: Convert to probe/release_device() call-backs
> iommu: Remove add_device()/remove_device() code-paths
> iommu: Move more initialization to __iommu_probe_device()
> iommu: Unexport iommu_group_get_for_dev()
>
> Sai Praneeth Prakhya (1):
> iommu: Add def_domain_type() callback in iommu_ops
>
> drivers/iommu/amd_iommu.c | 97 ++++----
> drivers/iommu/amd_iommu_types.h | 1 -
> drivers/iommu/arm-smmu-v3.c | 38 +---
> drivers/iommu/arm-smmu.c | 39 ++--
> drivers/iommu/exynos-iommu.c | 24 +-
> drivers/iommu/fsl_pamu_domain.c | 22 +-
> drivers/iommu/intel-iommu.c | 68 +-----
> drivers/iommu/iommu.c | 387 +++++++++++++++++++++++++-------
> drivers/iommu/ipmmu-vmsa.c | 60 ++---
> drivers/iommu/msm_iommu.c | 34 +--
> drivers/iommu/mtk_iommu.c | 24 +-
> drivers/iommu/mtk_iommu_v1.c | 50 ++---
> drivers/iommu/omap-iommu.c | 99 ++------
> drivers/iommu/qcom_iommu.c | 24 +-
> drivers/iommu/rockchip-iommu.c | 26 +--
> drivers/iommu/s390-iommu.c | 22 +-
> drivers/iommu/tegra-gart.c | 24 +-
> drivers/iommu/tegra-smmu.c | 31 +--
> drivers/iommu/virtio-iommu.c | 41 +---
> include/linux/iommu.h | 21 +-
> 20 files changed, 531 insertions(+), 601 deletions(-)
>
> --
> 2.17.1
>
> _______________________________________________
> iommu mailing list
> [email protected]
> https://lists.linuxfoundation.org/mailman/listinfo/iommu

2020-07-01 10:56:01

by Robin Murphy

[permalink] [raw]
Subject: Re: [PATCH v3 00/34] iommu: Move iommu_group setup to IOMMU core code

On 2020-07-01 01:40, Qian Cai wrote:
> Looks like this patchset introduced an use-after-free on arm-smmu-v3.
>
> Reproduced using mlx5,
>
> # echo 1 > /sys/class/net/enp11s0f1np1/device/sriov_numvfs
> # echo 0 > /sys/class/net/enp11s0f1np1/device/sriov_numvfs
>
> The .config,
> https://github.com/cailca/linux-mm/blob/master/arm64.config
>
> Looking at the free stack,
>
> iommu_release_device->iommu_group_remove_device
>
> was introduced in 07/34 ("iommu: Add probe_device() and release_device()
> call-backs").

Right, iommu_group_remove_device can tear down the group and call
->domain_free before the driver has any knowledge of the last device
going away via the ->release_device call.

I guess the question is do we simply flip the call order in
iommu_release_device() so drivers can easily clean up their internal
per-device state first, or do we now want them to be robust against
freeing domains with devices still nominally attached?

Robin.

2020-07-04 00:17:59

by Qian Cai

[permalink] [raw]
Subject: Re: [PATCH v3 00/34] iommu: Move iommu_group setup to IOMMU core code

On Tue, Jun 30, 2020 at 08:40:28PM -0400, Qian Cai wrote:
> On Wed, Apr 29, 2020 at 03:36:38PM +0200, Joerg Roedel wrote:
> > Hi,
> >
> > here is the third version of this patch-set. Older versions can be found
> > here:
> >
> > v1: https://lore.kernel.org/lkml/[email protected]/
> > (Has some more introductory text)
> >
> > v2: https://lore.kernel.org/lkml/[email protected]/
> >
> > Changes v2 -> v3:
> >
> > * Rebased v5.7-rc3
> >
> > * Added a missing iommu_group_put() as reported by Lu Baolu.
> >
> > * Added a patch to consolidate more initialization work in
> > __iommu_probe_device(), fixing a bug where no 'struct
> > device_iommu' was allocated in the hotplug path.
> >
> > There is also a git-branch available with these patches applied:
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/joro/linux.git/log/?h=iommu-probe-device-v3
> >
> > Please review. If there are no objections I plan to put these patches
> > into the IOMMU tree early next week.
>
> Looks like this patchset introduced an use-after-free on arm-smmu-v3.
>
> Reproduced using mlx5,
>
> # echo 1 > /sys/class/net/enp11s0f1np1/device/sriov_numvfs
> # echo 0 > /sys/class/net/enp11s0f1np1/device/sriov_numvfs
>
> The .config,
> https://github.com/cailca/linux-mm/blob/master/arm64.config
>
> Looking at the free stack,
>
> iommu_release_device->iommu_group_remove_device
>
> was introduced in 07/34 ("iommu: Add probe_device() and release_device()
> call-backs").

FYI, I have just sent a patch to fix this,

https://lore.kernel.org/linux-iommu/[email protected]/

>
> [ 9426.724641][ T3356] pci 0000:0b:01.2: Removing from iommu group 3
> [ 9426.731347][ T3356] ==================================================================
> [ 9426.739263][ T3356] BUG: KASAN: use-after-free in __lock_acquire+0x3458/0x4440
> __lock_acquire at kernel/locking/lockdep.c:4250
> [ 9426.746477][ T3356] Read of size 8 at addr ffff0089df1a6f68 by task bash/3356
> [ 9426.753601][ T3356]
> [ 9426.755782][ T3356] CPU: 5 PID: 3356 Comm: bash Not tainted 5.8.0-rc3-next-20200630 #2
> [ 9426.763687][ T3356] Hardware name: HPE Apollo 70 /C01_APACHE_MB , BIOS L50_5.13_1.11 06/18/2019
> [ 9426.774111][ T3356] Call trace:
> [ 9426.777245][ T3356] dump_backtrace+0x0/0x398
> [ 9426.781593][ T3356] show_stack+0x14/0x20
> [ 9426.785596][ T3356] dump_stack+0x140/0x1b8
> [ 9426.789772][ T3356] print_address_description.isra.12+0x54/0x4a8
> [ 9426.795855][ T3356] kasan_report+0x134/0x1b8
> [ 9426.800203][ T3356] __asan_report_load8_noabort+0x2c/0x50
> [ 9426.805679][ T3356] __lock_acquire+0x3458/0x4440
> [ 9426.810373][ T3356] lock_acquire+0x204/0xf10
> [ 9426.814722][ T3356] _raw_spin_lock_irqsave+0xf8/0x180
> [ 9426.819853][ T3356] arm_smmu_detach_dev+0xd8/0x4a0
> arm_smmu_detach_dev at drivers/iommu/arm-smmu-v3.c:2776
> [ 9426.824721][ T3356] arm_smmu_release_device+0xb4/0x1c8
> arm_smmu_disable_pasid at drivers/iommu/arm-smmu-v3.c:2754
> (inlined by) arm_smmu_release_device at drivers/iommu/arm-smmu-v3.c:3000
> [ 9426.829937][ T3356] iommu_release_device+0xc0/0x178
> iommu_release_device at drivers/iommu/iommu.c:302
> [ 9426.834892][ T3356] iommu_bus_notifier+0x118/0x160
> [ 9426.839762][ T3356] notifier_call_chain+0xa4/0x128
> [ 9426.844630][ T3356] __blocking_notifier_call_chain+0x70/0xa8
> [ 9426.850367][ T3356] blocking_notifier_call_chain+0x14/0x20
> [ 9426.855929][ T3356] device_del+0x618/0xa00
> [ 9426.860105][ T3356] pci_remove_bus_device+0x108/0x2d8
> [ 9426.865233][ T3356] pci_stop_and_remove_bus_device+0x1c/0x28
> [ 9426.870972][ T3356] pci_iov_remove_virtfn+0x228/0x368
> [ 9426.876100][ T3356] sriov_disable+0x8c/0x348
> [ 9426.880447][ T3356] pci_disable_sriov+0x5c/0x70
> [ 9426.885117][ T3356] mlx5_core_sriov_configure+0xd8/0x260 [mlx5_core]
> [ 9426.891549][ T3356] sriov_numvfs_store+0x240/0x318
> [ 9426.896417][ T3356] dev_attr_store+0x38/0x68
> [ 9426.900766][ T3356] sysfs_kf_write+0xdc/0x128
> [ 9426.905200][ T3356] kernfs_fop_write+0x23c/0x448
> [ 9426.909897][ T3356] __vfs_write+0x54/0xe8
> [ 9426.913984][ T3356] vfs_write+0x124/0x3f0
> [ 9426.918070][ T3356] ksys_write+0xe8/0x1b8
> [ 9426.922157][ T3356] __arm64_sys_write+0x68/0x98
> [ 9426.926766][ T3356] do_el0_svc+0x124/0x220
> [ 9426.930941][ T3356] el0_sync_handler+0x260/0x408
> [ 9426.935634][ T3356] el0_sync+0x140/0x180
> [ 9426.939633][ T3356]
> [ 9426.941810][ T3356] Allocated by task 3356:
> [ 9426.945985][ T3356] save_stack+0x24/0x50
> [ 9426.949986][ T3356] __kasan_kmalloc.isra.13+0xc4/0xe0
> [ 9426.955114][ T3356] kasan_kmalloc+0xc/0x18
> [ 9426.959288][ T3356] kmem_cache_alloc_trace+0x1ec/0x318
> [ 9426.964503][ T3356] arm_smmu_domain_alloc+0x54/0x148
> [ 9426.969545][ T3356] iommu_group_alloc_default_domain+0xc0/0x440
> [ 9426.975541][ T3356] iommu_probe_device+0x1c0/0x308
> [ 9426.980409][ T3356] iort_iommu_configure+0x434/0x518
> [ 9426.985452][ T3356] acpi_dma_configure+0xf0/0x128
> [ 9426.990235][ T3356] pci_dma_configure+0x114/0x160
> [ 9426.995017][ T3356] really_probe+0x124/0x6d8
> [ 9426.999364][ T3356] driver_probe_device+0xc4/0x180
> [ 9427.004232][ T3356] __device_attach_driver+0x184/0x1e8
> [ 9427.009447][ T3356] bus_for_each_drv+0x114/0x1a0
> [ 9427.014142][ T3356] __device_attach+0x19c/0x2a8
> [ 9427.018749][ T3356] device_attach+0x10/0x18
> [ 9427.023009][ T3356] pci_bus_add_device+0x70/0xf8
> [ 9427.027704][ T3356] pci_iov_add_virtfn+0x7b4/0xb40
> [ 9427.032571][ T3356] sriov_enable+0x5c8/0xc30
> [ 9427.036918][ T3356] pci_enable_sriov+0x64/0x80
> [ 9427.041485][ T3356] mlx5_core_sriov_configure+0x58/0x260 [mlx5_core]
> [ 9427.047917][ T3356] sriov_numvfs_store+0x1c0/0x318
> [ 9427.052784][ T3356] dev_attr_store+0x38/0x68
> [ 9427.057131][ T3356] sysfs_kf_write+0xdc/0x128
> [ 9427.061565][ T3356] kernfs_fop_write+0x23c/0x448
> [ 9427.066260][ T3356] __vfs_write+0x54/0xe8
> [ 9427.070346][ T3356] vfs_write+0x124/0x3f0
> [ 9427.074433][ T3356] ksys_write+0xe8/0x1b8
> [ 9427.078519][ T3356] __arm64_sys_write+0x68/0x98
> [ 9427.083127][ T3356] do_el0_svc+0x124/0x220
> [ 9427.087300][ T3356] el0_sync_handler+0x260/0x408
> [ 9427.091994][ T3356] el0_sync+0x140/0x180
> [ 9427.095992][ T3356]
> [ 9427.098168][ T3356] Freed by task 3356:
> [ 9427.101995][ T3356] save_stack+0x24/0x50
> [ 9427.105996][ T3356] __kasan_slab_free+0x124/0x198
> [ 9427.110777][ T3356] kasan_slab_free+0x10/0x18
> [ 9427.115210][ T3356] slab_free_freelist_hook+0x110/0x298
> [ 9427.120512][ T3356] kfree+0x128/0x668
> [ 9427.124252][ T3356] arm_smmu_domain_free+0xf4/0x1a0
> [ 9427.129206][ T3356] iommu_group_release+0xec/0x160
> [ 9427.134074][ T3356] kobject_put+0xf4/0x238
> [ 9427.138247][ T3356] kobject_del+0x110/0x190
> [ 9427.142507][ T3356] kobject_put+0x1e4/0x238
> [ 9427.146767][ T3356] iommu_group_remove_device+0x394/0x938
> [ 9427.152242][ T3356] iommu_release_device+0x9c/0x178
> iommu_release_device at drivers/iommu/iommu.c:300
> [ 9427.157196][ T3356] iommu_bus_notifier+0x118/0x160
> [ 9427.162065][ T3356] notifier_call_chain+0xa4/0x128
> [ 9427.166934][ T3356] __blocking_notifier_call_chain+0x70/0xa8
> [ 9427.172670][ T3356] blocking_notifier_call_chain+0x14/0x20
> [ 9427.178233][ T3356] device_del+0x618/0xa00
> [ 9427.182406][ T3356] pci_remove_bus_device+0x108/0x2d8
> [ 9427.187535][ T3356] pci_stop_and_remove_bus_device+0x1c/0x28
> [ 9427.193271][ T3356] pci_iov_remove_virtfn+0x228/0x368
> [ 9427.198399][ T3356] sriov_disable+0x8c/0x348
> [ 9427.202746][ T3356] pci_disable_sriov+0x5c/0x70
> [ 9427.207398][ T3356] mlx5_core_sriov_configure+0xd8/0x260 [mlx5_core]
> [ 9427.213830][ T3356] sriov_numvfs_store+0x240/0x318
> [ 9427.218698][ T3356] dev_attr_store+0x38/0x68
> [ 9427.223045][ T3356] sysfs_kf_write+0xdc/0x128
> [ 9427.227478][ T3356] kernfs_fop_write+0x23c/0x448
> [ 9427.232173][ T3356] __vfs_write+0x54/0xe8
> [ 9427.236259][ T3356] vfs_write+0x124/0x3f0
> [ 9427.240346][ T3356] ksys_write+0xe8/0x1b8
> [ 9427.244433][ T3356] __arm64_sys_write+0x68/0x98
> [ 9427.249041][ T3356] do_el0_svc+0x124/0x220
> [ 9427.253215][ T3356] el0_sync_handler+0x260/0x408
> [ 9427.257908][ T3356] el0_sync+0x140/0x180
> [ 9427.261907][ T3356]
> [ 9427.264084][ T3356] The buggy address belongs to the object at ffff0089df1a6e00
> [ 9427.264084][ T3356] which belongs to the cache kmalloc-512 of size 512
> [ 9427.277980][ T3356] The buggy address is located 360 bytes inside of
> [ 9427.277980][ T3356] 512-byte region [ffff0089df1a6e00, ffff0089df1a7000)
> [ 9427.291094][ T3356] The buggy address belongs to the page:
> [ 9427.296571][ T3356] page:ffffffe02257c680 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff0089df1a1400
> [ 9427.306823][ T3356] flags: 0x7ffff800000200(slab)
> [ 9427.311520][ T3356] raw: 007ffff800000200 ffffffe02246b8c8 ffffffe02257ff88 ffff000000320680
> [ 9427.319949][ T3356] raw: ffff0089df1a1400 00000000002a000e 00000001ffffffff ffff0089df1a5001
> [ 9427.328374][ T3356] page dumped because: kasan: bad access detected
> [ 9427.334630][ T3356] page->mem_cgroup:ffff0089df1a5001
> [ 9427.339670][ T3356]
> [ 9427.341846][ T3356] Memory state around the buggy address:
> [ 9427.347322][ T3356] ffff0089df1a6e00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> [ 9427.355228][ T3356] ffff0089df1a6e80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> [ 9427.363133][ T3356] >ffff0089df1a6f00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> [ 9427.371038][ T3356] ^
> [ 9427.378337][ T3356] ffff0089df1a6f80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> [ 9427.386242][ T3356] ffff0089df1a7000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> [ 9427.394146][ T3356] ==================================================================
> [ 9427.402052][ T3356] Disabling lock debugging due to kernel taint
>
> >
> > Thanks,
> >
> > Joerg
> >
> > Joerg Roedel (33):
> > iommu: Move default domain allocation to separate function
> > iommu/amd: Implement iommu_ops->def_domain_type call-back
> > iommu/vt-d: Wire up iommu_ops->def_domain_type
> > iommu/amd: Remove dma_mask check from check_device()
> > iommu/amd: Return -ENODEV in add_device when device is not handled by
> > IOMMU
> > iommu: Add probe_device() and release_device() call-backs
> > iommu: Move default domain allocation to iommu_probe_device()
> > iommu: Keep a list of allocated groups in __iommu_probe_device()
> > iommu: Move new probe_device path to separate function
> > iommu: Split off default domain allocation from group assignment
> > iommu: Move iommu_group_create_direct_mappings() out of
> > iommu_group_add_device()
> > iommu: Export bus_iommu_probe() and make is safe for re-probing
> > iommu/amd: Remove dev_data->passthrough
> > iommu/amd: Convert to probe/release_device() call-backs
> > iommu/vt-d: Convert to probe/release_device() call-backs
> > iommu/arm-smmu: Convert to probe/release_device() call-backs
> > iommu/pamu: Convert to probe/release_device() call-backs
> > iommu/s390: Convert to probe/release_device() call-backs
> > iommu/virtio: Convert to probe/release_device() call-backs
> > iommu/msm: Convert to probe/release_device() call-backs
> > iommu/mediatek: Convert to probe/release_device() call-backs
> > iommu/mediatek-v1 Convert to probe/release_device() call-backs
> > iommu/qcom: Convert to probe/release_device() call-backs
> > iommu/rockchip: Convert to probe/release_device() call-backs
> > iommu/tegra: Convert to probe/release_device() call-backs
> > iommu/renesas: Convert to probe/release_device() call-backs
> > iommu/omap: Remove orphan_dev tracking
> > iommu/omap: Convert to probe/release_device() call-backs
> > iommu/exynos: Use first SYSMMU in controllers list for IOMMU core
> > iommu/exynos: Convert to probe/release_device() call-backs
> > iommu: Remove add_device()/remove_device() code-paths
> > iommu: Move more initialization to __iommu_probe_device()
> > iommu: Unexport iommu_group_get_for_dev()
> >
> > Sai Praneeth Prakhya (1):
> > iommu: Add def_domain_type() callback in iommu_ops
> >
> > drivers/iommu/amd_iommu.c | 97 ++++----
> > drivers/iommu/amd_iommu_types.h | 1 -
> > drivers/iommu/arm-smmu-v3.c | 38 +---
> > drivers/iommu/arm-smmu.c | 39 ++--
> > drivers/iommu/exynos-iommu.c | 24 +-
> > drivers/iommu/fsl_pamu_domain.c | 22 +-
> > drivers/iommu/intel-iommu.c | 68 +-----
> > drivers/iommu/iommu.c | 387 +++++++++++++++++++++++++-------
> > drivers/iommu/ipmmu-vmsa.c | 60 ++---
> > drivers/iommu/msm_iommu.c | 34 +--
> > drivers/iommu/mtk_iommu.c | 24 +-
> > drivers/iommu/mtk_iommu_v1.c | 50 ++---
> > drivers/iommu/omap-iommu.c | 99 ++------
> > drivers/iommu/qcom_iommu.c | 24 +-
> > drivers/iommu/rockchip-iommu.c | 26 +--
> > drivers/iommu/s390-iommu.c | 22 +-
> > drivers/iommu/tegra-gart.c | 24 +-
> > drivers/iommu/tegra-smmu.c | 31 +--
> > drivers/iommu/virtio-iommu.c | 41 +---
> > include/linux/iommu.h | 21 +-
> > 20 files changed, 531 insertions(+), 601 deletions(-)
> >
> > --
> > 2.17.1
> >
> > _______________________________________________
> > iommu mailing list
> > [email protected]
> > https://lists.linuxfoundation.org/mailman/listinfo/iommu

2020-07-09 15:25:26

by Joerg Roedel

[permalink] [raw]
Subject: Re: [PATCH v3 00/34] iommu: Move iommu_group setup to IOMMU core code

On Fri, Jul 03, 2020 at 08:17:09PM -0400, Qian Cai wrote:
> FYI, I have just sent a patch to fix this,
>
> https://lore.kernel.org/linux-iommu/[email protected]/

Just queued that fix, thanks. Please don't send patches to my suse
email address, use only the 8bytes.org one.

Thanks,

Joerg