2022-04-18 10:47:41

by Baolu Lu

Subject: [RESEND PATCH v8 00/11] Fix BUG_ON in vfio_iommu_group_notifier()

Hi Joerg,

This is a resend of the v8 series posted here:
https://lore.kernel.org/linux-iommu/[email protected]/
as we discussed in this thread:
https://lore.kernel.org/linux-iommu/Yk%[email protected]/

All patches apply cleanly except this one:
- [PATCH v8 02/11] driver core: Add dma_cleanup callback in bus_type
It conflicts with the following refactoring commit:
- 4b775aaf1ea99 "driver core: Refactor sysfs and drv/bus remove hooks"
The conflict has been resolved in this post.

There are no functional changes relative to v8. To avoid spam, I have
not cc'd all of the v8 reviewers on this series.

Please consider it for your iommu tree.

Best regards,
baolu

Change log:
- v8 and before:
  - Please refer to the v8 post for the full change history.

- v8-resend
  - Rebase the series on top of v5.18-rc3.
  - Add Reviewed-by's granted by Robin.
  - Add a Tested-by granted by Eric.

Jason Gunthorpe (1):
vfio: Delete the unbound_list

Lu Baolu (10):
iommu: Add DMA ownership management interfaces
driver core: Add dma_cleanup callback in bus_type
amba: Stop sharing platform_dma_configure()
bus: platform,amba,fsl-mc,PCI: Add device DMA ownership management
PCI: pci_stub: Set driver_managed_dma
PCI: portdrv: Set driver_managed_dma
vfio: Set DMA ownership for VFIO devices
vfio: Remove use of vfio_group_viable()
vfio: Remove iommu group notifier
iommu: Remove iommu group changes notifier

include/linux/amba/bus.h | 8 +
include/linux/device/bus.h | 3 +
include/linux/fsl/mc.h | 8 +
include/linux/iommu.h | 54 +++---
include/linux/pci.h | 8 +
include/linux/platform_device.h | 10 +-
drivers/amba/bus.c | 37 +++-
drivers/base/dd.c | 5 +
drivers/base/platform.c | 21 ++-
drivers/bus/fsl-mc/fsl-mc-bus.c | 24 ++-
drivers/iommu/iommu.c | 228 ++++++++++++++++--------
drivers/pci/pci-driver.c | 18 ++
drivers/pci/pci-stub.c | 1 +
drivers/pci/pcie/portdrv_pci.c | 2 +
drivers/vfio/fsl-mc/vfio_fsl_mc.c | 1 +
drivers/vfio/pci/vfio_pci.c | 1 +
drivers/vfio/platform/vfio_amba.c | 1 +
drivers/vfio/platform/vfio_platform.c | 1 +
drivers/vfio/vfio.c | 245 ++------------------------
19 files changed, 338 insertions(+), 338 deletions(-)

--
2.25.1


2022-04-18 12:22:28

by Baolu Lu

Subject: [RESEND PATCH v8 04/11] bus: platform,amba,fsl-mc,PCI: Add device DMA ownership management

Devices on the platform/amba/fsl-mc/PCI buses can be bound to drivers
whose device DMA is managed either by kernel drivers or by user-space
applications. Unfortunately, multiple devices may be placed in the same
IOMMU group because they cannot be isolated from each other. DMA on these
devices must be entirely under kernel control or entirely under userspace
control, never a mixture. Otherwise driver integrity cannot be guaranteed,
because the devices could access each other through peer-to-peer
transactions that bypass IOMMU protection.

This patch checks and sets the default DMA mode during driver binding, and
cleans up during driver unbinding. In the default mode, device DMA is
managed by the device driver, which handles DMA operations through the
kernel DMA API (see Documentation/core-api/dma-api.rst).

For cases where devices are assigned to userspace control through a
userspace driver framework (i.e. VFIO), the drivers (for example,
vfio_pci/vfio_platform etc.) may set a new flag (driver_managed_dma) to
skip this default setting, on the assumption that these drivers know what
they are doing with the device DMA.
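The binding-time logic described above can be sketched as a small user-space C model. Everything in it is hypothetical: `struct model_group`, `struct model_driver` and the `model_*` functions only mimic the shape of the bus `dma_configure`/`dma_cleanup` hooks in this patch and are not kernel APIs; locking and the real domain bookkeeping are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical user-space model; none of these types are kernel APIs. */
struct model_group {
	unsigned int kernel_users; /* drivers using the default domain */
	bool user_owned;           /* claimed for userspace (e.g. by VFIO) */
};

struct model_driver {
	const char *name;
	bool driver_managed_dma;   /* mirrors the new per-bus-driver flag */
};

/* Stand-in for iommu_device_use_default_domain(). */
static int model_use_default_domain(struct model_group *g)
{
	if (g->user_owned)
		return -EBUSY;
	g->kernel_users++;
	return 0;
}

/* Stand-in for iommu_device_unuse_default_domain(). */
static void model_unuse_default_domain(struct model_group *g)
{
	if (g->kernel_users)
		g->kernel_users--;
}

/* Shape of a bus dma_configure hook at bind time. */
static int model_bus_dma_configure(struct model_group *g,
				   const struct model_driver *drv)
{
	if (drv->driver_managed_dma)
		return 0; /* driver manages DMA itself: skip the default claim */
	return model_use_default_domain(g);
}

/* Shape of the paired dma_cleanup hook at unbind time. */
static void model_bus_dma_cleanup(struct model_group *g,
				  const struct model_driver *drv)
{
	if (!drv->driver_managed_dma)
		model_unuse_default_domain(g);
}
```

With this shape, a driver that sets `driver_managed_dma` never touches the group's kernel-user count, so binding it neither blocks nor is blocked by userspace ownership.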

Calling iommu_device_use_default_domain() before {of,acpi}_dma_configure()
is currently a problem. As things stand, the IOMMU driver ignores the
initial iommu_probe_device() call made when the device is added, since at
that point the device has no fwspec yet. {of,acpi}_iommu_configure() then
retrigger iommu_probe_device() after the IOMMU driver has seen the
firmware data via .of_xlate and learned that it is actually responsible
for the given device. As a result, until that gets fixed,
iommu_device_use_default_domain() is called at the end of each bus
dma_configure callback, and arch_teardown_dma_ops() is called if it fails.
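To illustrate why the claim is done last, here is a hypothetical model in which the IOMMU probe ignores a device until firmware parsing has given it a fwspec. The `model_*` names and fields are invented for this sketch; clearing `dma_ops_set` merely stands in for `arch_teardown_dma_ops()`.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of one device; not kernel code. */
struct model_dev {
	bool has_fwspec;    /* set once firmware data has been parsed */
	bool in_group;      /* IOMMU driver has accepted it into a group */
	bool dma_ops_set;   /* arch DMA ops installed */
	bool group_claimed; /* the group it joins is already owned by userspace */
};

/* Stand-in for iommu_probe_device(): devices without a fwspec are ignored. */
static void model_iommu_probe(struct model_dev *d)
{
	if (d->has_fwspec)
		d->in_group = true;
}

/* Stand-in for {of,acpi}_dma_configure(): parse firmware data, retrigger
 * the IOMMU probe, then install DMA ops. */
static int model_fw_dma_configure(struct model_dev *d)
{
	d->has_fwspec = true;
	model_iommu_probe(d);
	d->dma_ops_set = true;
	return 0;
}

/* Stand-in for iommu_device_use_default_domain(): no group, no check. */
static int model_use_default_domain(struct model_dev *d)
{
	if (!d->in_group)
		return 0;
	return d->group_claimed ? -EBUSY : 0;
}

/* Order used by the patch: firmware setup first, claim last, undo on error. */
static int model_dma_configure(struct model_dev *d, bool driver_managed_dma)
{
	int ret = model_fw_dma_configure(d);

	if (!ret && !driver_managed_dma) {
		ret = model_use_default_domain(d);
		if (ret)
			d->dma_ops_set = false; /* like arch_teardown_dma_ops() */
	}
	return ret;
}

/* Hypothetical wrong order: the claim runs before the device is in a
 * group, so a conflict with a userspace-owned group goes unnoticed. */
static int model_dma_configure_claim_first(struct model_dev *d)
{
	int ret = model_use_default_domain(d);

	if (!ret)
		ret = model_fw_dma_configure(d);
	return ret;
}
```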

Cc: Greg Kroah-Hartman <[email protected]>
Cc: Bjorn Helgaas <[email protected]>
Cc: Stuart Yoder <[email protected]>
Cc: Laurentiu Tudor <[email protected]>
Signed-off-by: Lu Baolu <[email protected]>
Reviewed-by: Greg Kroah-Hartman <[email protected]>
Reviewed-by: Jason Gunthorpe <[email protected]>
Reviewed-by: Robin Murphy <[email protected]>
Tested-by: Eric Auger <[email protected]>
---
include/linux/amba/bus.h | 8 ++++++++
include/linux/fsl/mc.h | 8 ++++++++
include/linux/pci.h | 8 ++++++++
include/linux/platform_device.h | 8 ++++++++
drivers/amba/bus.c | 18 ++++++++++++++++++
drivers/base/platform.c | 18 ++++++++++++++++++
drivers/bus/fsl-mc/fsl-mc-bus.c | 24 ++++++++++++++++++++++--
drivers/pci/pci-driver.c | 18 ++++++++++++++++++
8 files changed, 108 insertions(+), 2 deletions(-)

diff --git a/include/linux/amba/bus.h b/include/linux/amba/bus.h
index 6562f543c3e0..2ddce9bcd00e 100644
--- a/include/linux/amba/bus.h
+++ b/include/linux/amba/bus.h
@@ -79,6 +79,14 @@ struct amba_driver {
void (*remove)(struct amba_device *);
void (*shutdown)(struct amba_device *);
const struct amba_id *id_table;
+ /*
+ * For most device drivers, no need to care about this flag as long as
+ * all DMAs are handled through the kernel DMA API. For some special
+ * ones, for example VFIO drivers, they know how to manage the DMA
+ * themselves and set this flag so that the IOMMU layer will allow them
+ * to setup and manage their own I/O address space.
+ */
+ bool driver_managed_dma;
};

/*
diff --git a/include/linux/fsl/mc.h b/include/linux/fsl/mc.h
index 7b6c42bfb660..27efef8affb1 100644
--- a/include/linux/fsl/mc.h
+++ b/include/linux/fsl/mc.h
@@ -32,6 +32,13 @@ struct fsl_mc_io;
* @shutdown: Function called at shutdown time to quiesce the device
* @suspend: Function called when a device is stopped
* @resume: Function called when a device is resumed
+ * @driver_managed_dma: Device driver doesn't use kernel DMA API for DMA.
+ * For most device drivers, no need to care about this flag
+ * as long as all DMAs are handled through the kernel DMA API.
+ * For some special ones, for example VFIO drivers, they know
+ * how to manage the DMA themselves and set this flag so that
+ * the IOMMU layer will allow them to setup and manage their
+ * own I/O address space.
*
* Generic DPAA device driver object for device drivers that are registered
* with a DPRC bus. This structure is to be embedded in each device-specific
@@ -45,6 +52,7 @@ struct fsl_mc_driver {
void (*shutdown)(struct fsl_mc_device *dev);
int (*suspend)(struct fsl_mc_device *dev, pm_message_t state);
int (*resume)(struct fsl_mc_device *dev);
+ bool driver_managed_dma;
};

#define to_fsl_mc_driver(_drv) \
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 60adf42460ab..b933d2b08d4d 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -895,6 +895,13 @@ struct module;
* created once it is bound to the driver.
* @driver: Driver model structure.
* @dynids: List of dynamically added device IDs.
+ * @driver_managed_dma: Device driver doesn't use kernel DMA API for DMA.
+ * For most device drivers, no need to care about this flag
+ * as long as all DMAs are handled through the kernel DMA API.
+ * For some special ones, for example VFIO drivers, they know
+ * how to manage the DMA themselves and set this flag so that
+ * the IOMMU layer will allow them to setup and manage their
+ * own I/O address space.
*/
struct pci_driver {
struct list_head node;
@@ -913,6 +920,7 @@ struct pci_driver {
const struct attribute_group **dev_groups;
struct device_driver driver;
struct pci_dynids dynids;
+ bool driver_managed_dma;
};

static inline struct pci_driver *to_pci_driver(struct device_driver *drv)
diff --git a/include/linux/platform_device.h b/include/linux/platform_device.h
index 17fde717df68..b3d9c744f1e5 100644
--- a/include/linux/platform_device.h
+++ b/include/linux/platform_device.h
@@ -210,6 +210,14 @@ struct platform_driver {
struct device_driver driver;
const struct platform_device_id *id_table;
bool prevent_deferred_probe;
+ /*
+ * For most device drivers, no need to care about this flag as long as
+ * all DMAs are handled through the kernel DMA API. For some special
+ * ones, for example VFIO drivers, they know how to manage the DMA
+ * themselves and set this flag so that the IOMMU layer will allow them
+ * to setup and manage their own I/O address space.
+ */
+ bool driver_managed_dma;
};

#define to_platform_driver(drv) (container_of((drv), struct platform_driver, \
diff --git a/drivers/amba/bus.c b/drivers/amba/bus.c
index 76b52bd2c2a4..a0ec61232b6c 100644
--- a/drivers/amba/bus.c
+++ b/drivers/amba/bus.c
@@ -22,6 +22,8 @@
#include <linux/of_irq.h>
#include <linux/of_device.h>
#include <linux/acpi.h>
+#include <linux/iommu.h>
+#include <linux/dma-map-ops.h>

#define to_amba_driver(d) container_of(d, struct amba_driver, drv)

@@ -277,6 +279,7 @@ static void amba_shutdown(struct device *dev)

static int amba_dma_configure(struct device *dev)
{
+ struct amba_driver *drv = to_amba_driver(dev->driver);
enum dev_dma_attr attr;
int ret = 0;

@@ -287,9 +290,23 @@ static int amba_dma_configure(struct device *dev)
ret = acpi_dma_configure(dev, attr);
}

+ if (!ret && !drv->driver_managed_dma) {
+ ret = iommu_device_use_default_domain(dev);
+ if (ret)
+ arch_teardown_dma_ops(dev);
+ }
+
return ret;
}

+static void amba_dma_cleanup(struct device *dev)
+{
+ struct amba_driver *drv = to_amba_driver(dev->driver);
+
+ if (!drv->driver_managed_dma)
+ iommu_device_unuse_default_domain(dev);
+}
+
#ifdef CONFIG_PM
/*
* Hooks to provide runtime PM of the pclk (bus clock). It is safe to
@@ -359,6 +376,7 @@ struct bus_type amba_bustype = {
.remove = amba_remove,
.shutdown = amba_shutdown,
.dma_configure = amba_dma_configure,
+ .dma_cleanup = amba_dma_cleanup,
.pm = &amba_pm,
};
EXPORT_SYMBOL_GPL(amba_bustype);
diff --git a/drivers/base/platform.c b/drivers/base/platform.c
index d7915734d931..70bc30cf575c 100644
--- a/drivers/base/platform.c
+++ b/drivers/base/platform.c
@@ -30,6 +30,8 @@
#include <linux/property.h>
#include <linux/kmemleak.h>
#include <linux/types.h>
+#include <linux/iommu.h>
+#include <linux/dma-map-ops.h>

#include "base.h"
#include "power/power.h"
@@ -1456,6 +1458,7 @@ static void platform_shutdown(struct device *_dev)

static int platform_dma_configure(struct device *dev)
{
+ struct platform_driver *drv = to_platform_driver(dev->driver);
enum dev_dma_attr attr;
int ret = 0;

@@ -1466,9 +1469,23 @@ static int platform_dma_configure(struct device *dev)
ret = acpi_dma_configure(dev, attr);
}

+ if (!ret && !drv->driver_managed_dma) {
+ ret = iommu_device_use_default_domain(dev);
+ if (ret)
+ arch_teardown_dma_ops(dev);
+ }
+
return ret;
}

+static void platform_dma_cleanup(struct device *dev)
+{
+ struct platform_driver *drv = to_platform_driver(dev->driver);
+
+ if (!drv->driver_managed_dma)
+ iommu_device_unuse_default_domain(dev);
+}
+
static const struct dev_pm_ops platform_dev_pm_ops = {
SET_RUNTIME_PM_OPS(pm_generic_runtime_suspend, pm_generic_runtime_resume, NULL)
USE_PLATFORM_PM_SLEEP_OPS
@@ -1483,6 +1500,7 @@ struct bus_type platform_bus_type = {
.remove = platform_remove,
.shutdown = platform_shutdown,
.dma_configure = platform_dma_configure,
+ .dma_cleanup = platform_dma_cleanup,
.pm = &platform_dev_pm_ops,
};
EXPORT_SYMBOL_GPL(platform_bus_type);
diff --git a/drivers/bus/fsl-mc/fsl-mc-bus.c b/drivers/bus/fsl-mc/fsl-mc-bus.c
index 8fd4a356a86e..76648c4fdaf4 100644
--- a/drivers/bus/fsl-mc/fsl-mc-bus.c
+++ b/drivers/bus/fsl-mc/fsl-mc-bus.c
@@ -21,6 +21,7 @@
#include <linux/dma-mapping.h>
#include <linux/acpi.h>
#include <linux/iommu.h>
+#include <linux/dma-map-ops.h>

#include "fsl-mc-private.h"

@@ -140,15 +141,33 @@ static int fsl_mc_dma_configure(struct device *dev)
{
struct device *dma_dev = dev;
struct fsl_mc_device *mc_dev = to_fsl_mc_device(dev);
+ struct fsl_mc_driver *mc_drv = to_fsl_mc_driver(dev->driver);
u32 input_id = mc_dev->icid;
+ int ret;

while (dev_is_fsl_mc(dma_dev))
dma_dev = dma_dev->parent;

if (dev_of_node(dma_dev))
- return of_dma_configure_id(dev, dma_dev->of_node, 0, &input_id);
+ ret = of_dma_configure_id(dev, dma_dev->of_node, 0, &input_id);
+ else
+ ret = acpi_dma_configure_id(dev, DEV_DMA_COHERENT, &input_id);
+
+ if (!ret && !mc_drv->driver_managed_dma) {
+ ret = iommu_device_use_default_domain(dev);
+ if (ret)
+ arch_teardown_dma_ops(dev);
+ }
+
+ return ret;
+}
+
+static void fsl_mc_dma_cleanup(struct device *dev)
+{
+ struct fsl_mc_driver *mc_drv = to_fsl_mc_driver(dev->driver);

- return acpi_dma_configure_id(dev, DEV_DMA_COHERENT, &input_id);
+ if (!mc_drv->driver_managed_dma)
+ iommu_device_unuse_default_domain(dev);
}

static ssize_t modalias_show(struct device *dev, struct device_attribute *attr,
@@ -312,6 +331,7 @@ struct bus_type fsl_mc_bus_type = {
.match = fsl_mc_bus_match,
.uevent = fsl_mc_bus_uevent,
.dma_configure = fsl_mc_dma_configure,
+ .dma_cleanup = fsl_mc_dma_cleanup,
.dev_groups = fsl_mc_dev_groups,
.bus_groups = fsl_mc_bus_groups,
};
diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index 4ceeb75fc899..f83f7fbac68f 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -20,6 +20,7 @@
#include <linux/of_device.h>
#include <linux/acpi.h>
#include <linux/dma-map-ops.h>
+#include <linux/iommu.h>
#include "pci.h"
#include "pcie/portdrv.h"

@@ -1601,6 +1602,7 @@ static int pci_bus_num_vf(struct device *dev)
*/
static int pci_dma_configure(struct device *dev)
{
+ struct pci_driver *driver = to_pci_driver(dev->driver);
struct device *bridge;
int ret = 0;

@@ -1616,9 +1618,24 @@ static int pci_dma_configure(struct device *dev)
}

pci_put_host_bridge_device(bridge);
+
+ if (!ret && !driver->driver_managed_dma) {
+ ret = iommu_device_use_default_domain(dev);
+ if (ret)
+ arch_teardown_dma_ops(dev);
+ }
+
return ret;
}

+static void pci_dma_cleanup(struct device *dev)
+{
+ struct pci_driver *driver = to_pci_driver(dev->driver);
+
+ if (!driver->driver_managed_dma)
+ iommu_device_unuse_default_domain(dev);
+}
+
struct bus_type pci_bus_type = {
.name = "pci",
.match = pci_bus_match,
@@ -1632,6 +1649,7 @@ struct bus_type pci_bus_type = {
.pm = PCI_PM_OPS_PTR,
.num_vf = pci_bus_num_vf,
.dma_configure = pci_dma_configure,
+ .dma_cleanup = pci_dma_cleanup,
};
EXPORT_SYMBOL(pci_bus_type);

--
2.25.1

2022-04-18 12:31:33

by Baolu Lu

Subject: [RESEND PATCH v8 03/11] amba: Stop sharing platform_dma_configure()

Stop sharing the platform_dma_configure() helper, as the platform and amba
buses are about to get their own bus dma_configure callbacks.

Signed-off-by: Lu Baolu <[email protected]>
Reviewed-by: Jason Gunthorpe <[email protected]>
---
include/linux/platform_device.h | 2 --
drivers/amba/bus.c | 19 ++++++++++++++++++-
drivers/base/platform.c | 3 +--
3 files changed, 19 insertions(+), 5 deletions(-)

diff --git a/include/linux/platform_device.h b/include/linux/platform_device.h
index 7c96f169d274..17fde717df68 100644
--- a/include/linux/platform_device.h
+++ b/include/linux/platform_device.h
@@ -328,8 +328,6 @@ extern int platform_pm_restore(struct device *dev);
#define platform_pm_restore NULL
#endif

-extern int platform_dma_configure(struct device *dev);
-
#ifdef CONFIG_PM_SLEEP
#define USE_PLATFORM_PM_SLEEP_OPS \
.suspend = platform_pm_suspend, \
diff --git a/drivers/amba/bus.c b/drivers/amba/bus.c
index d3bd14aaabf6..76b52bd2c2a4 100644
--- a/drivers/amba/bus.c
+++ b/drivers/amba/bus.c
@@ -20,6 +20,8 @@
#include <linux/platform_device.h>
#include <linux/reset.h>
#include <linux/of_irq.h>
+#include <linux/of_device.h>
+#include <linux/acpi.h>

#define to_amba_driver(d) container_of(d, struct amba_driver, drv)

@@ -273,6 +275,21 @@ static void amba_shutdown(struct device *dev)
drv->shutdown(to_amba_device(dev));
}

+static int amba_dma_configure(struct device *dev)
+{
+ enum dev_dma_attr attr;
+ int ret = 0;
+
+ if (dev->of_node) {
+ ret = of_dma_configure(dev, dev->of_node, true);
+ } else if (has_acpi_companion(dev)) {
+ attr = acpi_get_dma_attr(to_acpi_device_node(dev->fwnode));
+ ret = acpi_dma_configure(dev, attr);
+ }
+
+ return ret;
+}
+
#ifdef CONFIG_PM
/*
* Hooks to provide runtime PM of the pclk (bus clock). It is safe to
@@ -341,7 +358,7 @@ struct bus_type amba_bustype = {
.probe = amba_probe,
.remove = amba_remove,
.shutdown = amba_shutdown,
- .dma_configure = platform_dma_configure,
+ .dma_configure = amba_dma_configure,
.pm = &amba_pm,
};
EXPORT_SYMBOL_GPL(amba_bustype);
diff --git a/drivers/base/platform.c b/drivers/base/platform.c
index 8cc272fd5c99..d7915734d931 100644
--- a/drivers/base/platform.c
+++ b/drivers/base/platform.c
@@ -1454,8 +1454,7 @@ static void platform_shutdown(struct device *_dev)
drv->shutdown(dev);
}

-
-int platform_dma_configure(struct device *dev)
+static int platform_dma_configure(struct device *dev)
{
enum dev_dma_attr attr;
int ret = 0;
--
2.25.1

2022-04-18 12:31:41

by Baolu Lu

Subject: [RESEND PATCH v8 05/11] PCI: pci_stub: Set driver_managed_dma

The current VFIO implementation allows the pci-stub driver to be bound to
a PCI device while other devices in the same IOMMU group are assigned to
userspace. The pci-stub driver has no dependencies on DMA or on the IOVA
mapping of the device, but it does prevent the user from having direct
access to the device, which is useful in some circumstances.

pci_dma_configure() marks the iommu_group as containing only devices with
kernel drivers that manage DMA. For compatibility with VFIO usage, skip
this default behavior for pci_stub. This allows the admin to keep using
pci_stub to block driver binding after DMA ownership has been claimed by
VFIO.
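A hypothetical sketch of this compatibility case: a group whose only bound kernel driver is pci-stub can still be claimed for userspace, while a group with an ordinary kernel-DMA driver cannot. All names are invented for illustration; the -EPERM/-EBUSY choices follow the iommu_group_claim_dma_owner() and iommu_device_use_default_domain() code in patch 01 of this series.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model, not kernel code. */
struct model_driver {
	const char *name;
	bool driver_managed_dma;
};

struct model_group {
	unsigned int owner_cnt; /* kernel-DMA users, or 1 while user-owned */
	void *owner;            /* non-NULL while claimed for userspace */
};

/* Bind-time check: drivers with driver_managed_dma skip the accounting. */
static int model_bind(struct model_group *g, const struct model_driver *drv)
{
	if (drv->driver_managed_dma)
		return 0;
	if (g->owner)
		return -EBUSY;
	g->owner_cnt++;
	return 0;
}

/* Stand-in for iommu_group_claim_dma_owner(). */
static int model_claim(struct model_group *g, void *owner)
{
	if (g->owner_cnt)
		return -EPERM; /* a kernel driver is doing DMA in this group */
	g->owner = owner;
	g->owner_cnt = 1;
	return 0;
}
```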

Signed-off-by: Lu Baolu <[email protected]>
Reviewed-by: Jason Gunthorpe <[email protected]>
Acked-by: Bjorn Helgaas <[email protected]>
---
drivers/pci/pci-stub.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/drivers/pci/pci-stub.c b/drivers/pci/pci-stub.c
index e408099fea52..d1f4c1ce7bd1 100644
--- a/drivers/pci/pci-stub.c
+++ b/drivers/pci/pci-stub.c
@@ -36,6 +36,7 @@ static struct pci_driver stub_driver = {
.name = "pci-stub",
.id_table = NULL, /* only dynamic id's */
.probe = pci_stub_probe,
+ .driver_managed_dma = true,
};

static int __init pci_stub_init(void)
--
2.25.1

2022-04-18 12:38:58

by Baolu Lu

Subject: [RESEND PATCH v8 02/11] driver core: Add dma_cleanup callback in bus_type

The bus_type structure defines a dma_configure() callback for bus drivers
to configure DMA on their devices. This adds the paired dma_cleanup()
callback and calls it during driver unbinding (and on the really_probe()
failure path) so that bus drivers can do some cleanup work.

One use case for these paired DMA callbacks is for the bus driver to check
for DMA ownership conflicts during driver binding, where multiple devices
belonging to the same IOMMU group (the minimum granularity of isolation
and protection) may be assigned to kernel drivers or to user space
respectively.

Without this change, for example, the vfio driver has to listen for a bus
BOUND_DRIVER event and then BUG_ON() in case of a DMA ownership conflict.
This leads to a bad user experience, since a careless driver binding
operation may crash the system if the admin overlooks the group
restriction. Aside from being bad design, this is a security problem: a
root user, even with lockdown=integrity, can force the kernel to BUG.

With this change, the bus driver can check and set the DMA ownership
during the driver binding process and fail on ownership conflicts. The DMA
ownership should be released during driver unbinding.
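The pairing can be sketched with a minimal user-space model of the driver core. The structs and `model_*` functions are hypothetical stand-ins for `struct bus_type`, really_probe() and __device_release_driver(). The sketch assumes that a failing dma_configure() has already undone its own work, so only a failed driver probe and a later unbind invoke the cleanup hook; it does not reproduce the exact error labels in drivers/base/dd.c.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct model_device;

/* Hypothetical mirror of the two bus_type hooks; not the kernel struct. */
struct model_bus_type {
	int  (*dma_configure)(struct model_device *dev);
	void (*dma_cleanup)(struct model_device *dev); /* optional */
};

struct model_device {
	const struct model_bus_type *bus;
	int (*drv_probe)(struct model_device *dev);
	int dma_refs; /* bookkeeping so the pairing can be checked */
};

/* Probe path: configure DMA, run the driver probe, undo DMA on failure. */
static int model_really_probe(struct model_device *dev)
{
	int ret;

	if (dev->bus && dev->bus->dma_configure) {
		ret = dev->bus->dma_configure(dev);
		if (ret)
			return ret; /* assumed already undone by the bus */
	}
	ret = dev->drv_probe(dev);
	if (ret && dev->bus && dev->bus->dma_cleanup)
		dev->bus->dma_cleanup(dev);
	return ret;
}

/* Unbind path: the cleanup hook runs after the driver has been removed. */
static void model_release_driver(struct model_device *dev)
{
	if (dev->bus && dev->bus->dma_cleanup)
		dev->bus->dma_cleanup(dev);
}

/* Example hooks and probes used to exercise the model. */
static int count_configure(struct model_device *dev) { dev->dma_refs++; return 0; }
static void count_cleanup(struct model_device *dev) { dev->dma_refs--; }
static int probe_ok(struct model_device *dev) { (void)dev; return 0; }
static int probe_fail(struct model_device *dev) { (void)dev; return -ENODEV; }
```

Because the NULL check mirrors the one in the patch, a bus that never sets `dma_cleanup` keeps its old behaviour.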

Signed-off-by: Lu Baolu <[email protected]>
Reviewed-by: Greg Kroah-Hartman <[email protected]>
Reviewed-by: Jason Gunthorpe <[email protected]>
---
include/linux/device/bus.h | 3 +++
drivers/base/dd.c | 5 +++++
2 files changed, 8 insertions(+)

diff --git a/include/linux/device/bus.h b/include/linux/device/bus.h
index a039ab809753..d8b29ccd07e5 100644
--- a/include/linux/device/bus.h
+++ b/include/linux/device/bus.h
@@ -59,6 +59,8 @@ struct fwnode_handle;
* bus supports.
* @dma_configure: Called to setup DMA configuration on a device on
* this bus.
+ * @dma_cleanup: Called to cleanup DMA configuration on a device on
+ * this bus.
* @pm: Power management operations of this bus, callback the specific
* device driver's pm-ops.
* @iommu_ops: IOMMU specific operations for this bus, used to attach IOMMU
@@ -103,6 +105,7 @@ struct bus_type {
int (*num_vf)(struct device *dev);

int (*dma_configure)(struct device *dev);
+ void (*dma_cleanup)(struct device *dev);

const struct dev_pm_ops *pm;

diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index 3fc3b5940bb3..94b7ac9bf459 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -671,6 +671,8 @@ static int really_probe(struct device *dev, struct device_driver *drv)
if (dev->bus)
blocking_notifier_call_chain(&dev->bus->p->bus_notifier,
BUS_NOTIFY_DRIVER_NOT_BOUND, dev);
+ if (dev->bus && dev->bus->dma_cleanup)
+ dev->bus->dma_cleanup(dev);
pinctrl_bind_failed:
device_links_no_driver(dev);
device_unbind_cleanup(dev);
@@ -1199,6 +1201,9 @@ static void __device_release_driver(struct device *dev, struct device *parent)

device_remove(dev);

+ if (dev->bus && dev->bus->dma_cleanup)
+ dev->bus->dma_cleanup(dev);
+
device_links_driver_cleanup(dev);
device_unbind_cleanup(dev);

--
2.25.1

2022-04-18 12:44:50

by Baolu Lu

Subject: [RESEND PATCH v8 01/11] iommu: Add DMA ownership management interfaces

Multiple devices may be placed in the same IOMMU group because they
cannot be isolated from each other. These devices must be entirely under
kernel control or entirely under userspace control, never a mixture.

This adds DMA ownership management to the IOMMU core and exposes several
interfaces for device drivers and for the device userspace assignment
framework (i.e. VFIO), so that any conflict between user- and
kernel-controlled DMA can be detected up front.

The device-driver-oriented interfaces are:

int iommu_device_use_default_domain(struct device *dev);
void iommu_device_unuse_default_domain(struct device *dev);

By calling iommu_device_use_default_domain(), the device driver tells
the IOMMU layer that the device DMA is handled through the kernel DMA
API. The IOMMU layer will manage the IOVA and use the default domain
for DMA address translation.

The interfaces oriented to the device userspace assignment framework are:

int iommu_group_claim_dma_owner(struct iommu_group *group,
void *owner);
void iommu_group_release_dma_owner(struct iommu_group *group);
bool iommu_group_dma_owner_claimed(struct iommu_group *group);

Userspace assignment of the devices must be disallowed if the DMA owner
claiming interface returns a failure.
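A single-threaded model of the ownership bookkeeping, following the owner_cnt/owner logic in the drivers/iommu/iommu.c hunk of this patch. The `model_*` names are hypothetical, group->mutex is omitted, and attaching or detaching a domain is reduced to updating the `domain` pointer.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model of the fields the patch adds to
 * struct iommu_group; locking (group->mutex) is omitted. */
struct model_domain { int id; };

struct model_group {
	struct model_domain *default_domain;
	struct model_domain *domain;  /* currently attached domain */
	unsigned int owner_cnt;
	void *owner;
};

static int model_use_default_domain(struct model_group *g)
{
	if (g->owner_cnt && (g->domain != g->default_domain || g->owner))
		return -EBUSY;
	g->owner_cnt++;
	return 0;
}

static void model_unuse_default_domain(struct model_group *g)
{
	if (g->owner_cnt)  /* the kernel version WARNs on underflow */
		g->owner_cnt--;
}

static int model_claim_dma_owner(struct model_group *g, void *owner)
{
	if (g->owner_cnt)
		return -EPERM;
	if (g->domain && g->domain != g->default_domain)
		return -EBUSY;
	g->owner = owner;
	g->domain = NULL;  /* detach the default domain */
	g->owner_cnt++;
	return 0;
}

static void model_release_dma_owner(struct model_group *g)
{
	if (!g->owner_cnt || !g->owner)
		return;
	g->owner_cnt = 0;
	if (!g->domain)    /* the user domain must already be detached */
		g->domain = g->default_domain;
	g->owner = NULL;
}

static bool model_dma_owner_claimed(const struct model_group *g)
{
	return g->owner_cnt != 0;
}
```

One counter serves both kinds of users: kernel drivers increment it freely as long as nobody owns the group, while a userspace claim requires it to be zero and then holds it at one.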

Signed-off-by: Jason Gunthorpe <[email protected]>
Signed-off-by: Kevin Tian <[email protected]>
Signed-off-by: Lu Baolu <[email protected]>
Reviewed-by: Robin Murphy <[email protected]>
---
include/linux/iommu.h | 31 +++++++++
drivers/iommu/iommu.c | 153 +++++++++++++++++++++++++++++++++++++++++-
2 files changed, 181 insertions(+), 3 deletions(-)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 9208eca4b0d1..77972ef978b5 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -675,6 +675,13 @@ struct iommu_sva *iommu_sva_bind_device(struct device *dev,
void iommu_sva_unbind_device(struct iommu_sva *handle);
u32 iommu_sva_get_pasid(struct iommu_sva *handle);

+int iommu_device_use_default_domain(struct device *dev);
+void iommu_device_unuse_default_domain(struct device *dev);
+
+int iommu_group_claim_dma_owner(struct iommu_group *group, void *owner);
+void iommu_group_release_dma_owner(struct iommu_group *group);
+bool iommu_group_dma_owner_claimed(struct iommu_group *group);
+
#else /* CONFIG_IOMMU_API */

struct iommu_ops {};
@@ -1031,6 +1038,30 @@ static inline struct iommu_fwspec *dev_iommu_fwspec_get(struct device *dev)
{
return NULL;
}
+
+static inline int iommu_device_use_default_domain(struct device *dev)
+{
+ return 0;
+}
+
+static inline void iommu_device_unuse_default_domain(struct device *dev)
+{
+}
+
+static inline int
+iommu_group_claim_dma_owner(struct iommu_group *group, void *owner)
+{
+ return -ENODEV;
+}
+
+static inline void iommu_group_release_dma_owner(struct iommu_group *group)
+{
+}
+
+static inline bool iommu_group_dma_owner_claimed(struct iommu_group *group)
+{
+ return false;
+}
#endif /* CONFIG_IOMMU_API */

/**
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index f2c45b85b9fc..eba8e8ccf19d 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -48,6 +48,8 @@ struct iommu_group {
struct iommu_domain *default_domain;
struct iommu_domain *domain;
struct list_head entry;
+ unsigned int owner_cnt;
+ void *owner;
};

struct group_device {
@@ -294,7 +296,11 @@ int iommu_probe_device(struct device *dev)
mutex_lock(&group->mutex);
iommu_alloc_default_domain(group, dev);

- if (group->default_domain) {
+ /*
+ * If device joined an existing group which has been claimed, don't
+ * attach the default domain.
+ */
+ if (group->default_domain && !group->owner) {
ret = __iommu_attach_device(group->default_domain, dev);
if (ret) {
mutex_unlock(&group->mutex);
@@ -2109,7 +2115,7 @@ static int __iommu_attach_group(struct iommu_domain *domain,
{
int ret;

- if (group->default_domain && group->domain != group->default_domain)
+ if (group->domain && group->domain != group->default_domain)
return -EBUSY;

ret = __iommu_group_for_each_dev(group, domain,
@@ -2146,7 +2152,11 @@ static void __iommu_detach_group(struct iommu_domain *domain,
{
int ret;

- if (!group->default_domain) {
+ /*
+ * If the group has been claimed already, do not re-attach the default
+ * domain.
+ */
+ if (!group->default_domain || group->owner) {
__iommu_group_for_each_dev(group, domain,
iommu_group_do_detach_device);
group->domain = NULL;
@@ -3095,3 +3105,140 @@ static ssize_t iommu_group_store_type(struct iommu_group *group,

return ret;
}
+
+/**
+ * iommu_device_use_default_domain() - Device driver wants to handle device
+ * DMA through the kernel DMA API.
+ * @dev: The device.
+ *
+ * The device driver about to bind @dev wants to do DMA through the kernel
+ * DMA API. Return 0 if it is allowed, otherwise an error.
+ */
+int iommu_device_use_default_domain(struct device *dev)
+{
+ struct iommu_group *group = iommu_group_get(dev);
+ int ret = 0;
+
+ if (!group)
+ return 0;
+
+ mutex_lock(&group->mutex);
+ if (group->owner_cnt) {
+ if (group->domain != group->default_domain ||
+ group->owner) {
+ ret = -EBUSY;
+ goto unlock_out;
+ }
+ }
+
+ group->owner_cnt++;
+
+unlock_out:
+ mutex_unlock(&group->mutex);
+ iommu_group_put(group);
+
+ return ret;
+}
+
+/**
+ * iommu_device_unuse_default_domain() - Device driver stops handling device
+ * DMA through the kernel DMA API.
+ * @dev: The device.
+ *
+ * The device driver doesn't want to do DMA through kernel DMA API anymore.
+ * It must be called after iommu_device_use_default_domain().
+ */
+void iommu_device_unuse_default_domain(struct device *dev)
+{
+ struct iommu_group *group = iommu_group_get(dev);
+
+ if (!group)
+ return;
+
+ mutex_lock(&group->mutex);
+ if (!WARN_ON(!group->owner_cnt))
+ group->owner_cnt--;
+
+ mutex_unlock(&group->mutex);
+ iommu_group_put(group);
+}
+
+/**
+ * iommu_group_claim_dma_owner() - Set DMA ownership of a group
+ * @group: The group.
+ * @owner: Caller specified pointer. Used for exclusive ownership.
+ *
+ * This is to support backward compatibility for vfio which manages
+ * the dma ownership in iommu_group level. New invocations on this
+ * interface should be prohibited.
+ */
+int iommu_group_claim_dma_owner(struct iommu_group *group, void *owner)
+{
+ int ret = 0;
+
+ mutex_lock(&group->mutex);
+ if (group->owner_cnt) {
+ ret = -EPERM;
+ goto unlock_out;
+ } else {
+ if (group->domain && group->domain != group->default_domain) {
+ ret = -EBUSY;
+ goto unlock_out;
+ }
+
+ group->owner = owner;
+ if (group->domain)
+ __iommu_detach_group(group->domain, group);
+ }
+
+ group->owner_cnt++;
+unlock_out:
+ mutex_unlock(&group->mutex);
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(iommu_group_claim_dma_owner);
+
+/**
+ * iommu_group_release_dma_owner() - Release DMA ownership of a group
+ * @group: The group.
+ *
+ * Release the DMA ownership claimed by iommu_group_claim_dma_owner().
+ */
+void iommu_group_release_dma_owner(struct iommu_group *group)
+{
+ mutex_lock(&group->mutex);
+ if (WARN_ON(!group->owner_cnt || !group->owner))
+ goto unlock_out;
+
+ group->owner_cnt = 0;
+ /*
+ * The UNMANAGED domain should be detached before all USER
+ * owners have been released.
+ */
+ if (!WARN_ON(group->domain) && group->default_domain)
+ __iommu_attach_group(group->default_domain, group);
+ group->owner = NULL;
+unlock_out:
+ mutex_unlock(&group->mutex);
+}
+EXPORT_SYMBOL_GPL(iommu_group_release_dma_owner);
+
+/**
+ * iommu_group_dma_owner_claimed() - Query group dma ownership status
+ * @group: The group.
+ *
+ * This provides status query on a given group. It is racy and only for
+ * non-binding status reporting.
+ */
+bool iommu_group_dma_owner_claimed(struct iommu_group *group)
+{
+ unsigned int user;
+
+ mutex_lock(&group->mutex);
+ user = group->owner_cnt;
+ mutex_unlock(&group->mutex);
+
+ return user;
+}
+EXPORT_SYMBOL_GPL(iommu_group_dma_owner_claimed);
--
2.25.1

2022-04-28 12:52:12

by Jason Gunthorpe

Subject: Re: [RESEND PATCH v8 00/11] Fix BUG_ON in vfio_iommu_group_notifier()

On Thu, Apr 28, 2022 at 11:32:04AM +0200, Joerg Roedel wrote:
> On Mon, Apr 18, 2022 at 08:49:49AM +0800, Lu Baolu wrote:
> > Lu Baolu (10):
> > iommu: Add DMA ownership management interfaces
> > driver core: Add dma_cleanup callback in bus_type
> > amba: Stop sharing platform_dma_configure()
> > bus: platform,amba,fsl-mc,PCI: Add device DMA ownership management
> > PCI: pci_stub: Set driver_managed_dma
> > PCI: portdrv: Set driver_managed_dma
> > vfio: Set DMA ownership for VFIO devices
> > vfio: Remove use of vfio_group_viable()
> > vfio: Remove iommu group notifier
> > iommu: Remove iommu group changes notifier
>
> Applied to core branch, thanks Baolu.

Can we get this on a topic branch so Alex can pull it? There are
conflicts with other VFIO patches

Thanks!
Jason

2022-04-28 13:26:35

by Joerg Roedel

Subject: Re: [RESEND PATCH v8 00/11] Fix BUG_ON in vfio_iommu_group_notifier()

On Mon, Apr 18, 2022 at 08:49:49AM +0800, Lu Baolu wrote:
> Lu Baolu (10):
> iommu: Add DMA ownership management interfaces
> driver core: Add dma_cleanup callback in bus_type
> amba: Stop sharing platform_dma_configure()
> bus: platform,amba,fsl-mc,PCI: Add device DMA ownership management
> PCI: pci_stub: Set driver_managed_dma
> PCI: portdrv: Set driver_managed_dma
> vfio: Set DMA ownership for VFIO devices
> vfio: Remove use of vfio_group_viable()
> vfio: Remove iommu group notifier
> iommu: Remove iommu group changes notifier

Applied to core branch, thanks Baolu.

2022-04-29 13:56:59

by Joerg Roedel

Subject: Re: [RESEND PATCH v8 00/11] Fix BUG_ON in vfio_iommu_group_notifier()

On Thu, Apr 28, 2022 at 08:54:11AM -0300, Jason Gunthorpe wrote:
> Can we get this on a topic branch so Alex can pull it? There are
> conflicts with other VFIO patches

Right, we already discussed this. I moved the patches to a separate topic
branch. It will appear as 'vfio-notifier-fix' once I push the changes.

Regards,

Joerg

2022-05-02 23:55:46

by Jason Gunthorpe

Subject: Re: [RESEND PATCH v8 00/11] Fix BUG_ON in vfio_iommu_group_notifier()

On Mon, May 02, 2022 at 12:12:04PM -0400, Qian Cai wrote:
> On Mon, Apr 18, 2022 at 08:49:49AM +0800, Lu Baolu wrote:
> > Hi Joerg,
> >
> > This is a resend version of v8 posted here:
> > https://lore.kernel.org/linux-iommu/[email protected]/
> > as we discussed in this thread:
> > https://lore.kernel.org/linux-iommu/Yk%[email protected]/
> >
> > All patches can be applied perfectly except this one:
> > - [PATCH v8 02/11] driver core: Add dma_cleanup callback in bus_type
> > It conflicts with below refactoring commit:
> > - 4b775aaf1ea99 "driver core: Refactor sysfs and drv/bus remove hooks"
> > The conflict has been fixed in this post.
> >
> > No functional changes in this series. I suppress cc-ing this series to
> > all v8 reviewers in order to avoid spam.
> >
> > Please consider it for your iommu tree.
>
> Reverting this series fixed a use-after-free while doing SR-IOV.
>
> BUG: KASAN: use-after-free in __lock_acquire
> Read of size 8 at addr ffff080279825d78 by task qemu-system-aar/22429
> CPU: 24 PID: 22429 Comm: qemu-system-aar Not tainted 5.18.0-rc5-next-20220502 #69
> Call trace:
> dump_backtrace
> show_stack
> dump_stack_lvl
> print_address_description.constprop.0
> print_report
> kasan_report
> __asan_report_load8_noabort
> __lock_acquire
> lock_acquire.part.0
> lock_acquire
> _raw_spin_lock_irqsave
> arm_smmu_detach_dev
> arm_smmu_detach_dev at drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c:2377
> arm_smmu_attach_dev

Hum.

So what has happened is that VFIO does this sequence:

iommu_detach_group()
iommu_domain_free()
iommu_group_release_dma_owner()

Which, I think, should be valid API-wise.

From what I can see reading the code, SMMUv3 blows up above because it
doesn't have a detach_dev op:

	.default_domain_ops = &(const struct iommu_domain_ops) {
		.attach_dev = arm_smmu_attach_dev,
		.map_pages = arm_smmu_map_pages,
		.unmap_pages = arm_smmu_unmap_pages,
		.flush_iotlb_all = arm_smmu_flush_iotlb_all,
		.iotlb_sync = arm_smmu_iotlb_sync,
		.iova_to_phys = arm_smmu_iova_to_phys,
		.enable_nesting = arm_smmu_enable_nesting,
		.free = arm_smmu_domain_free,
	}

But it is internally tracking the domain inside the master - so when
the next domain is attached it does this:

static void arm_smmu_detach_dev(struct arm_smmu_master *master)
{
	struct arm_smmu_domain *smmu_domain = master->domain;

	spin_lock_irqsave(&smmu_domain->devices_lock, flags);

And explodes as the domain has been freed but master->domain was not
NULL'd.

It worked before because iommu_detach_group() used to attach the
default domain, and that happened before the domain was freed in the
above sequence.
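A userspace sketch of the ordering hazard described above, assuming the driver caches the attached domain per device as in the `arm_smmu_detach_dev()` excerpt. The `model_*` types and functions are hypothetical and only mirror the roles of `arm_smmu_domain`, `arm_smmu_master` and the attach/detach paths. It shows the sequence working when the detach step clears the cached pointer before the domain is freed; if `model_detach_dev()` were skipped at that step, the final attach would dereference freed memory, which is what the KASAN report above shows.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for arm_smmu_domain and arm_smmu_master. */
struct model_domain {
	int ndevices;                /* models smmu_domain->devices */
};

struct model_master {
	struct model_domain *domain; /* cached copy, like master->domain */
};

/* Models arm_smmu_detach_dev(): uses the cached domain, then clears it. */
void model_detach_dev(struct model_master *master)
{
	struct model_domain *d = master->domain;

	if (!d)
		return;
	d->ndevices--;          /* a use-after-free if d had already been freed */
	master->domain = NULL;
}

/* Models .attach_dev, which first detaches from the previous domain. */
void model_attach_dev(struct model_master *master, struct model_domain *d)
{
	model_detach_dev(master);
	d->ndevices++;
	master->domain = d;
}

/*
 * Models the VFIO teardown order with a working detach step. Returns the
 * default domain's device count afterwards, or -1 on allocation failure.
 */
int model_vfio_teardown(void)
{
	struct model_domain *dflt = calloc(1, sizeof(*dflt));
	struct model_domain *user = calloc(1, sizeof(*user));
	struct model_master master = { NULL };
	int n;

	if (!dflt || !user) {
		free(dflt);
		free(user);
		return -1;
	}
	model_attach_dev(&master, dflt); /* default domain attached at probe */
	model_attach_dev(&master, user); /* VFIO attaches its own domain */
	model_detach_dev(&master);       /* iommu_detach_group() */
	free(user);                      /* iommu_domain_free() */
	model_attach_dev(&master, dflt); /* iommu_group_release_dma_owner() */
	n = dflt->ndevices;
	free(dflt);
	return n;
}
```

The design point is only the ordering: whichever layer frees a domain must first make sure no per-device cache still points at it.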

I'm guessing SMMUv3 needs to call its arm_smmu_detach_dev(master) from
the detach_dev op and NULL its cached copy of the domain, but I don't
know this driver. Robin?

Thanks,
Jason

2022-05-03 00:54:39

by Qian Cai

Subject: Re: [RESEND PATCH v8 00/11] Fix BUG_ON in vfio_iommu_group_notifier()

On Mon, Apr 18, 2022 at 08:49:49AM +0800, Lu Baolu wrote:
> Hi Joerg,
>
> This is a resend version of v8 posted here:
> https://lore.kernel.org/linux-iommu/[email protected]/
> as we discussed in this thread:
> https://lore.kernel.org/linux-iommu/Yk%[email protected]/
>
> All patches can be applied perfectly except this one:
> - [PATCH v8 02/11] driver core: Add dma_cleanup callback in bus_type
> It conflicts with below refactoring commit:
> - 4b775aaf1ea99 "driver core: Refactor sysfs and drv/bus remove hooks"
> The conflict has been fixed in this post.
>
> No functional changes in this series. I suppress cc-ing this series to
> all v8 reviewers in order to avoid spam.
>
> Please consider it for your iommu tree.

Reverting this series fixed a use-after-free while doing SR-IOV.

BUG: KASAN: use-after-free in __lock_acquire
Read of size 8 at addr ffff080279825d78 by task qemu-system-aar/22429
CPU: 24 PID: 22429 Comm: qemu-system-aar Not tainted 5.18.0-rc5-next-20220502 #69
Call trace:
dump_backtrace
show_stack
dump_stack_lvl
print_address_description.constprop.0
print_report
kasan_report
__asan_report_load8_noabort
__lock_acquire
lock_acquire.part.0
lock_acquire
_raw_spin_lock_irqsave
arm_smmu_detach_dev
arm_smmu_detach_dev at drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c:2377
arm_smmu_attach_dev
__iommu_attach_group
__iommu_attach_device at drivers/iommu/iommu.c:1942
(inlined by) iommu_group_do_attach_device at drivers/iommu/iommu.c:2058
(inlined by) __iommu_group_for_each_dev at drivers/iommu/iommu.c:989
(inlined by) __iommu_attach_group at drivers/iommu/iommu.c:2069
iommu_group_release_dma_owner
__vfio_group_unset_container
vfio_group_try_dissolve_container
vfio_group_put_external_user
kvm_vfio_destroy
kvm_destroy_vm
kvm_vm_release
__fput
____fput
task_work_run
do_exit
do_group_exit
get_signal
do_signal
do_notify_resume
el0_svc
el0t_64_sync_handler
el0t_64_sync

Allocated by task 22427:
kasan_save_stack
__kasan_kmalloc
kmem_cache_alloc_trace
arm_smmu_domain_alloc
iommu_domain_alloc
vfio_iommu_type1_attach_group
vfio_ioctl_set_iommu
vfio_fops_unl_ioctl
__arm64_sys_ioctl
invoke_syscall
el0_svc_common.constprop.0
do_el0_svc
el0_svc
el0t_64_sync_handler
el0t_64_sync

Freed by task 22429:
kasan_save_stack
kasan_set_track
kasan_set_free_info
____kasan_slab_free
__kasan_slab_free
slab_free_freelist_hook
kfree
arm_smmu_domain_free
arm_smmu_domain_free at iommu/arm/arm-smmu-v3/arm-smmu-v3.c:2067
iommu_domain_free
vfio_iommu_type1_detach_group
__vfio_group_unset_container
vfio_group_try_dissolve_container
vfio_group_put_external_user
kvm_vfio_destroy
kvm_destroy_vm
kvm_vm_release
__fput
____fput
task_work_run
do_exit
do_group_exit
get_signal
do_signal
do_notify_resume
el0_svc
el0t_64_sync_handler
el0t_64_sync

2022-05-03 18:02:32

by Jason Gunthorpe

Subject: Re: [RESEND PATCH v8 00/11] Fix BUG_ON in vfio_iommu_group_notifier()

On Tue, May 03, 2022 at 02:04:37PM +0100, Robin Murphy wrote:

> > I'm guessing SMMUv3 needs to call its arm_smmu_detach_dev(master) from
> > the detach_dev op and NULL its cached copy of the domain, but I don't
> > know this driver. Robin?
>
> The original intent was that .detach_dev is deprecated in favour of default
> domains, and when the latter are in use, a device is always attached
> *somewhere* once probed (i.e. group->domain is never NULL). At face value,
> the neatest fix IMO would probably be for SMMUv3's .domain_free to handle
> smmu_domain->devices being non-empty and detach them at that point. However
> that wouldn't be viable for virtio-iommu or anyone else keeping an internal
> one-way association of devices to their current domains.

Oh wow, that is not obvious.

Actually, I think it is much worse than this because
iommu_group_claim_dma_owner() does a __iommu_detach_group() with the
expectation that this would actually result in DMA being blocked
immediately. The idea that __iommu_detach_group() is a NOP is kind of
scary.

Leaving the group attached to the kernel DMA domain will allow
userspace to DMA to all kernel memory :\

So one approach could be to block use of iommu_group_claim_dma_owner()
if no detach_dev op is present and then go through and put them back
or do something else. This could be OK short-term if we add an op to
SMMUv3, but long term everything would have to be fixed.

Or we can allocate a dummy empty/blocked domain during
iommu_group_claim_dma_owner() and attach it whenever.

The really ugly trick is that detach cannot fail, so attach to this
blocking domain must also not fail - IMHO this is a very complicated
API to expect the driver to implement correctly... I see there is
already a WARN_ON that attaching to the default domain cannot
fail. Maybe this warrants an actual no-fail attach op so the driver
can be more aware of this.

And some of these internal APIs could stand some adjusting if we
really never want a true "detach": it is always some kind of
replace/swap type operation, either to the default domain or to the
blocking domain.

> We *could* stay true to the original paradigm by introducing some real usage
> of IOMMU_DOMAIN_BLOCKED, such that we could keep one or more of those around
> to actively attach to instead of having groups in this unattached limbo
> state, but that's a bigger job involving adding support to drivers as well;
> too much for a quick fix now...

I suspect for the short term we can get by with an empty mapping
domain - using DOMAIN_BLOCKED is a bit of a refinement.
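A minimal model of the "replace, never detach" semantics discussed above, assuming a blocking (or empty) domain already exists for the group before ownership is claimed. The `model_*` names are hypothetical; reference counting, locking, allocation and per-device iteration are all left out. The invariant it illustrates is that `domain` only ever moves between the default, blocking and user domains and never becomes NULL, so there is no window in which the kernel DMA domain stays live under a user owner.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical model; not the kernel's struct iommu_group. */
struct model_domain {
	const char *name;
};

struct model_group {
	struct model_domain *default_domain;  /* kernel DMA API domain */
	struct model_domain *blocking_domain; /* empty/blocked, assumed preallocated */
	struct model_domain *domain;          /* what the hardware currently uses */
	void *owner;                          /* user owner, NULL for kernel use */
};

/* Claim: swap to the blocking domain instead of a possibly no-op detach. */
int model_claim(struct model_group *g, void *owner)
{
	if (!owner)
		return -EINVAL;
	if (!g->blocking_domain)
		return -ENOMEM;      /* nothing safe to swap to */
	if (g->owner)
		return g->owner == owner ? 0 : -EBUSY;
	if (g->domain != g->default_domain)
		return -EBUSY;       /* some other domain is already attached */
	g->owner = owner;
	g->domain = g->blocking_domain;
	return 0;
}

/* A user domain may only replace the blocking domain. */
int model_attach_user(struct model_group *g, struct model_domain *d)
{
	if (!g->owner || g->domain != g->blocking_domain)
		return -EBUSY;
	g->domain = d;
	return 0;
}

/* "Detaching" a user domain is a swap back to blocking, never to NULL. */
void model_detach_user(struct model_group *g)
{
	if (g->owner)
		g->domain = g->blocking_domain;
}

/* Release ownership: swap back to the kernel's default domain. */
void model_release(struct model_group *g)
{
	g->domain = g->default_domain;
	g->owner = NULL;
}
```

Because every transition is a replace, a driver only needs an attach path that cannot fail for the default and blocking domains, which matches the no-fail attach op idea above.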

Thanks,
Jason

2022-05-03 18:59:52

by Robin Murphy

Subject: Re: [RESEND PATCH v8 00/11] Fix BUG_ON in vfio_iommu_group_notifier()

On 2022-05-03 16:23, Jason Gunthorpe wrote:
> On Tue, May 03, 2022 at 02:04:37PM +0100, Robin Murphy wrote:
>
>>> I'm guessing SMMUv3 needs to call its arm_smmu_detach_dev(master) from
>>> the detach_dev op and NULL its cached copy of the domain, but I don't
>>> know this driver. Robin?
>>
>> The original intent was that .detach_dev is deprecated in favour of default
>> domains, and when the latter are in use, a device is always attached
>> *somewhere* once probed (i.e. group->domain is never NULL). At face value,
>> the neatest fix IMO would probably be for SMMUv3's .domain_free to handle
>> smmu_domain->devices being non-empty and detach them at that point. However
>> that wouldn't be viable for virtio-iommu or anyone else keeping an internal
>> one-way association of devices to their current domains.
>
> Oh wow, that is not obvious.
>
> Actually, I think it is much worse than this because
> iommu_group_claim_dma_owner() does a __iommu_detach_group() with the
> expectation that this would actually result in DMA being blocked
> immediately. The idea that __iommu_detach_group() is a NOP is kind of
> scary.

Scarier than the fact that even where it *is* implemented, .detach_dev
has never had a well-defined behaviour either, and plenty of drivers
treat it as a "remove the IOMMU from the picture altogether" operation
which ends up with the device in bypass rather than blocked?

> Leaving the group attached to the kernel DMA domain will allow
> userspace to DMA to all kernel memory :\

Note that a fair amount of IOMMU hardware only has two states, thus
could only actually achieve a blocking behaviour by enabling translation
with an empty pagetable anyway. (Trivia: and technically some of them
aren't even capable of blocking invalid accesses *when* translating -
they can only apply a "default" translation targeting some scratch page)
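To illustrate the point about two-state hardware, here is a toy single-level translation model. It is an assumption for illustration only and not any real IOMMU's page-table format. With translation enabled, an empty table faults on every access, which is how such hardware can emulate a blocking domain; hardware that cannot fault on a miss has to redirect it to a scratch page instead.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_NPAGES     16u
#define MODEL_PAGE_SHIFT 12

/* Hypothetical single-level IOMMU page table: a zero entry means unmapped. */
struct model_pgtable {
	uint64_t pfn[MODEL_NPAGES];
};

enum model_result { MODEL_XLATE_OK, MODEL_XLATE_FAULT, MODEL_XLATE_SCRATCH };

/*
 * Translate an IOVA through the table. On a miss, fault if the hardware
 * can (scratch_pfn == 0), otherwise redirect the access to the scratch page.
 */
enum model_result model_translate(const struct model_pgtable *pt,
				  uint64_t iova, uint64_t scratch_pfn,
				  uint64_t *phys)
{
	uint64_t idx = iova >> MODEL_PAGE_SHIFT;
	uint64_t off = iova & ((1u << MODEL_PAGE_SHIFT) - 1);

	if (idx < MODEL_NPAGES && pt->pfn[idx]) {
		*phys = (pt->pfn[idx] << MODEL_PAGE_SHIFT) | off;
		return MODEL_XLATE_OK;
	}
	if (scratch_pfn) {
		*phys = (scratch_pfn << MODEL_PAGE_SHIFT) | off;
		return MODEL_XLATE_SCRATCH;
	}
	return MODEL_XLATE_FAULT;
}
```

Either way, an "unmanaged domain with nothing mapped" gives the isolation property that matters for user ownership, even on hardware with no explicit blocking state.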

> So one approach could be to block use of iommu_group_claim_dma_owner()
> if no detach_dev op is present and then go through and put them back
> or do something else. This could be OK short-term if we add an op to
> SMMUv3, but long term everything would have to be fixed.
>
> Or we can allocate a dummy empty/blocked domain during
> iommu_group_claim_dma_owner() and attach it whenever.

How does the compile-tested diff below seem? There's a fair chance it's
still broken, but I don't have the bandwidth to give it much more
thought right now.

Cheers,
Robin.

----->8-----
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 29906bc16371..597d70ed7007 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -45,6 +45,7 @@ struct iommu_group {
int id;
struct iommu_domain *default_domain;
struct iommu_domain *domain;
+ struct iommu_domain *purgatory;
struct list_head entry;
unsigned int owner_cnt;
void *owner;
@@ -596,6 +597,8 @@ static void iommu_group_release(struct kobject *kobj)

if (group->default_domain)
iommu_domain_free(group->default_domain);
+ if (group->purgatory)
+ iommu_domain_free(group->purgatory);

kfree(group->name);
kfree(group);
@@ -2041,6 +2044,12 @@ struct iommu_domain *iommu_get_dma_domain(struct device *dev)
return dev->iommu_group->default_domain;
}

+static bool iommu_group_user_attached(struct iommu_group *group)
+{
+ return group->domain && group->domain != group->default_domain &&
+ group->domain != group->purgatory;
+}
+
/*
* IOMMU groups are really the natural working unit of the IOMMU, but
* the IOMMU API works on domains and devices. Bridge that gap by
@@ -2063,7 +2072,7 @@ static int __iommu_attach_group(struct iommu_domain *domain,
{
int ret;

- if (group->domain && group->domain != group->default_domain)
+ if (iommu_group_user_attached(group))
return -EBUSY;

ret = __iommu_group_for_each_dev(group, domain,
@@ -2104,7 +2113,12 @@ static void __iommu_detach_group(struct iommu_domain *domain,
* If the group has been claimed already, do not re-attach the default
* domain.
*/
- if (!group->default_domain || group->owner) {
+ if (group->owner) {
+ WARN_ON(__iommu_attach_group(group->purgatory, group));
+ return;
+ }
+
+ if (!group->default_domain) {
__iommu_group_for_each_dev(group, domain,
iommu_group_do_detach_device);
group->domain = NULL;
@@ -3111,6 +3125,25 @@ void iommu_device_unuse_default_domain(struct device *dev)
iommu_group_put(group);
}

+static struct iommu_domain *iommu_group_get_purgatory(struct iommu_group *group)
+{
+ struct group_device *dev;
+
+ mutex_lock(&group->mutex);
+ if (group->purgatory)
+ goto out;
+
+ dev = list_first_entry(&group->devices, struct group_device, list);
+ group->purgatory = __iommu_domain_alloc(dev->dev->bus,
+ IOMMU_DOMAIN_BLOCKED);
+ if (!group->purgatory)
+ group->purgatory = __iommu_domain_alloc(dev->dev->bus,
+ IOMMU_DOMAIN_UNMANAGED);
+out:
+ mutex_unlock(&group->mutex);
+ return group->purgatory;
+}
+
/**
* iommu_group_claim_dma_owner() - Set DMA ownership of a group
* @group: The group.
@@ -3122,6 +3155,7 @@ void iommu_device_unuse_default_domain(struct device *dev)
*/
int iommu_group_claim_dma_owner(struct iommu_group *group, void *owner)
{
+ struct iommu_domain *pd;
int ret = 0;

mutex_lock(&group->mutex);
@@ -3133,10 +3167,13 @@ int iommu_group_claim_dma_owner(struct iommu_group *group, void *owner)
ret = -EBUSY;
goto unlock_out;
}
+ pd = iommu_group_get_purgatory(group);
+ if (!pd)
+ return -ENOMEM;

group->owner = owner;
- if (group->domain)
- __iommu_detach_group(group->domain, group);
+ if (group->domain && group->domain != pd)
+ __iommu_attach_group(pd, group);
}

group->owner_cnt++;
@@ -3164,7 +3201,7 @@ void iommu_group_release_dma_owner(struct iommu_group *group)
* The UNMANAGED domain should be detached before all USER
* owners have been released.
*/
- if (!WARN_ON(group->domain) && group->default_domain)
+ if (!WARN_ON(iommu_group_user_attached(group) && group->default_domain))
__iommu_attach_group(group->default_domain, group);
group->owner = NULL;
unlock_out:

2022-05-03 21:47:33

by Robin Murphy

Subject: Re: [RESEND PATCH v8 00/11] Fix BUG_ON in vfio_iommu_group_notifier()

On 2022-05-02 17:42, Jason Gunthorpe wrote:
> On Mon, May 02, 2022 at 12:12:04PM -0400, Qian Cai wrote:
>> On Mon, Apr 18, 2022 at 08:49:49AM +0800, Lu Baolu wrote:
>>> Hi Joerg,
>>>
>>> This is a resend version of v8 posted here:
>>> https://lore.kernel.org/linux-iommu/[email protected]/
>>> as we discussed in this thread:
>>> https://lore.kernel.org/linux-iommu/Yk%[email protected]/
>>>
>>> All patches can be applied perfectly except this one:
>>> - [PATCH v8 02/11] driver core: Add dma_cleanup callback in bus_type
>>> It conflicts with below refactoring commit:
>>> - 4b775aaf1ea99 "driver core: Refactor sysfs and drv/bus remove hooks"
>>> The conflict has been fixed in this post.
>>>
>>> No functional changes in this series. I suppress cc-ing this series to
>>> all v8 reviewers in order to avoid spam.
>>>
>>> Please consider it for your iommu tree.
>>
>> Reverting this series fixed a use-after-free while doing SR-IOV.
>>
>> BUG: KASAN: use-after-free in __lock_acquire
>> Read of size 8 at addr ffff080279825d78 by task qemu-system-aar/22429
>> CPU: 24 PID: 22429 Comm: qemu-system-aar Not tainted 5.18.0-rc5-next-20220502 #69
>> Call trace:
>> dump_backtrace
>> show_stack
>> dump_stack_lvl
>> print_address_description.constprop.0
>> print_report
>> kasan_report
>> __asan_report_load8_noabort
>> __lock_acquire
>> lock_acquire.part.0
>> lock_acquire
>> _raw_spin_lock_irqsave
>> arm_smmu_detach_dev
>> arm_smmu_detach_dev at drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c:2377
>> arm_smmu_attach_dev
>
> Hum.
>
> So what has happened is that VFIO does this sequence:
>
> iommu_detach_group()
> iommu_domain_free()
> iommu_group_release_dma_owner()
>
> Which, I think, should be valid API-wise.
>
> From what I can see reading the code, SMMUv3 blows up above because it
> doesn't have a detach_dev op:
>
> 	.default_domain_ops = &(const struct iommu_domain_ops) {
> 		.attach_dev = arm_smmu_attach_dev,
> 		.map_pages = arm_smmu_map_pages,
> 		.unmap_pages = arm_smmu_unmap_pages,
> 		.flush_iotlb_all = arm_smmu_flush_iotlb_all,
> 		.iotlb_sync = arm_smmu_iotlb_sync,
> 		.iova_to_phys = arm_smmu_iova_to_phys,
> 		.enable_nesting = arm_smmu_enable_nesting,
> 		.free = arm_smmu_domain_free,
> 	}
>
> But it is internally tracking the domain inside the master - so when
> the next domain is attached it does this:
>
> static void arm_smmu_detach_dev(struct arm_smmu_master *master)
> {
> 	struct arm_smmu_domain *smmu_domain = master->domain;
>
> 	spin_lock_irqsave(&smmu_domain->devices_lock, flags);
>
> And explodes as the domain has been freed but master->domain was not
> NULL'd.
>
> It worked before because iommu_detach_group() used to attach the
> default domain, and that happened before the domain was freed in the
> above sequence.

Oof, I totally overlooked the significance of that little subtlety in
review :(

> I'm guessing SMMUv3 needs to call its arm_smmu_detach_dev(master) from
> the detach_dev op and NULL its cached copy of the domain, but I don't
> know this driver. Robin?

The original intent was that .detach_dev is deprecated in favour of
default domains, and when the latter are in use, a device is always
attached *somewhere* once probed (i.e. group->domain is never NULL). At
face value, the neatest fix IMO would probably be for SMMUv3's
.domain_free to handle smmu_domain->devices being non-empty and detach
them at that point. However that wouldn't be viable for virtio-iommu or
anyone else keeping an internal one-way association of devices to their
current domains.

If we're giving up entirely on that notion of .detach_dev going away
then all default-domain-supporting drivers probably want checking to
make sure that path hasn't bitrotted; both Arm SMMU drivers had it
proactively removed 6 years ago; virtio-iommu never had it at all; newer
drivers like apple-dart have some code there, but it won't have ever run
until now.

We *could* stay true to the original paradigm by introducing some real
usage of IOMMU_DOMAIN_BLOCKED, such that we could keep one or more of
those around to actively attach to instead of having groups in this
unattached limbo state, but that's a bigger job involving adding support
to drivers as well; too much for a quick fix now...

Robin.

2022-05-04 17:29:28

by Joerg Roedel

Subject: Re: [RESEND PATCH v8 00/11] Fix BUG_ON in vfio_iommu_group_notifier()

On Mon, May 02, 2022 at 12:12:04PM -0400, Qian Cai wrote:
> Reverting this series fixed a use-after-free while doing SR-IOV.
>
> BUG: KASAN: use-after-free in __lock_acquire

Hrm, okay. I am going to exclude this series from my next branch for now
until this has been sorted out.

Alex, I suggest you do the same.

Regards,

Joerg

2022-05-04 17:47:29

by Jason Gunthorpe

Subject: Re: [RESEND PATCH v8 00/11] Fix BUG_ON in vfio_iommu_group_notifier()

On Wed, May 04, 2022 at 10:42:07AM +0200, Joerg Roedel wrote:
> On Mon, May 02, 2022 at 12:12:04PM -0400, Qian Cai wrote:
> > Reverting this series fixed a use-after-free while doing SR-IOV.
> >
> > BUG: KASAN: use-after-free in __lock_acquire
>
> Hrm, okay. I am going to exclude this series from my next branch for now
> until this has been sorted out.

This is going to blow up everything going on in vfio right now, let's
not do something so drastic please.

There is already a patch to fix it; let's wait for it to get sorted
out.

Nicolin and Eric have been testing with this series on ARM for a long
time now, it is not like it is completely broken.

Thanks,
Jason

2022-05-04 20:36:03

by Joerg Roedel

Subject: Re: [RESEND PATCH v8 00/11] Fix BUG_ON in vfio_iommu_group_notifier()

On Wed, May 04, 2022 at 08:51:35AM -0300, Jason Gunthorpe wrote:
> Nicolin and Eric have been testing with this series on ARM for a long
> time now, it is not like it is completely broken.

Yeah, I am also optimistic this can be fixed soon. But the rule is that
the next branch should only contain patches which I would send to Linus.
And with a known issue in it I wouldn't, so it is excluded at least from
my next branch for now. The topic branch is still alive and I will merge
it again when the fix is in.

Regards,

Joerg

2022-05-09 05:25:20

by Alex Williamson

Subject: Re: [RESEND PATCH v8 00/11] Fix BUG_ON in vfio_iommu_group_notifier()

On Wed, 4 May 2022 10:42:07 +0200
Joerg Roedel <[email protected]> wrote:

> On Mon, May 02, 2022 at 12:12:04PM -0400, Qian Cai wrote:
> > Reverting this series fixed a use-after-free while doing SR-IOV.
> >
> > BUG: KASAN: use-after-free in __lock_acquire
>
> Hrm, okay. I am going to exclude this series from my next branch for now
> until this has been sorted out.
>
> Alex, I suggest you do the same.

Done, and thanks for the heads-up. Please try to cc me when the
vfio-notifier-fix branch is merged back into your next branch. Thanks,

Alex


2022-05-09 18:39:21

by Jason Gunthorpe

Subject: Re: [RESEND PATCH v8 00/11] Fix BUG_ON in vfio_iommu_group_notifier()

On Wed, May 04, 2022 at 01:57:05PM +0200, Joerg Roedel wrote:
> On Wed, May 04, 2022 at 08:51:35AM -0300, Jason Gunthorpe wrote:
> > Nicolin and Eric have been testing with this series on ARM for a long
> > time now, it is not like it is completely broken.
>
> Yeah, I am also optimistic this can be fixed soon. But the rule is that
> the next branch should only contain patches which I would send to Linus.
> And with a known issue in it I wouldn't, so it is excluded at least from
> my next branch for now. The topic branch is still alive and I will merge
> it again when the fix is in.

The fix is out; let's merge it back in so we can have some more time to
discover any additional issues. People seem to test when it is in your
branch.

Thanks,
Jason

2022-05-14 00:22:11

by Tian, Kevin

Subject: RE: [RESEND PATCH v8 00/11] Fix BUG_ON in vfio_iommu_group_notifier()

> From: Jason Gunthorpe
> Sent: Tuesday, May 10, 2022 2:33 AM
>
> On Wed, May 04, 2022 at 01:57:05PM +0200, Joerg Roedel wrote:
> > On Wed, May 04, 2022 at 08:51:35AM -0300, Jason Gunthorpe wrote:
> > > Nicolin and Eric have been testing with this series on ARM for a long
> > > time now, it is not like it is completely broken.
> >
> > Yeah, I am also optimistic this can be fixed soon. But the rule is that
> > the next branch should only contain patches which I would send to Linus.
> > And with a known issue in it I wouldn't, so it is excluded at least from
> > my next branch for now. The topic branch is still alive and I will merge
> > it again when the fix is in.
>
> The fix is out; let's merge it back in so we can have some more time to
> discover any additional issues. People seem to test when it is in your
> branch.
>

Joerg, any chance you could give it priority? This is the first step of
a long refactoring effort and it has been gating quite a few
well-reviewed improvements down the road. Having it tested earlier
in your branch would be much appreciated.

Thanks
Kevin

2022-05-14 01:33:55

by Alex Williamson

Subject: Re: [RESEND PATCH v8 00/11] Fix BUG_ON in vfio_iommu_group_notifier()

On Fri, 13 May 2022 17:49:44 +0200
Joerg Roedel <[email protected]> wrote:

> Hi Alex,
>
> On Wed, May 04, 2022 at 10:29:56AM -0600, Alex Williamson wrote:
> > Done, and thanks for the heads-up. Please try to cc me when the
> > vfio-notifier-fix branch is merged back into your next branch. Thanks,
>
> This has happened now, the vfio-notifier-fix branch got the fix and is
> merged back into my next branch.

Thanks, Joerg!

Jason, I'll push a merge of this with

Subject: [PATCH] vfio: Delete container_q
[email protected]

and

Subject: [PATCH v3 0/8] Remove vfio_group from the struct file facing VFIO API
[email protected]

as soon as my sanity build finishes. Thanks,

Alex


2022-05-14 03:14:01

by Jason Gunthorpe

Subject: Re: [RESEND PATCH v8 00/11] Fix BUG_ON in vfio_iommu_group_notifier()

On Fri, May 13, 2022 at 10:25:48AM -0600, Alex Williamson wrote:
> On Fri, 13 May 2022 17:49:44 +0200
> Joerg Roedel <[email protected]> wrote:
>
> > Hi Alex,
> >
> > On Wed, May 04, 2022 at 10:29:56AM -0600, Alex Williamson wrote:
> > > Done, and thanks for the heads-up. Please try to cc me when the
> > > vfio-notifier-fix branch is merged back into your next branch. Thanks,
> >
> > This has happened now, the vfio-notifier-fix branch got the fix and is
> > merged back into my next branch.
>
> Thanks, Joerg!
>
> Jason, I'll push a merge of this with
>
> Subject: [PATCH] vfio: Delete container_q
> [email protected]
>
> and
>
> Subject: [PATCH v3 0/8] Remove vfio_group from the struct file facing VFIO API
> [email protected]
>
> as soon as my sanity build finishes. Thanks,

Thanks, I'll rebase and repost the remaining vfio series.

Jason

2022-05-14 03:52:53

by Joerg Roedel

Subject: Re: [RESEND PATCH v8 00/11] Fix BUG_ON in vfio_iommu_group_notifier()

Hi Alex,

On Wed, May 04, 2022 at 10:29:56AM -0600, Alex Williamson wrote:
> Done, and thanks for the heads-up. Please try to cc me when the
> vfio-notifier-fix branch is merged back into your next branch. Thanks,

This has happened now, the vfio-notifier-fix branch got the fix and is
merged back into my next branch.

Regards,

Joerg

2022-06-15 10:33:45

by Steven Price

Subject: Re: [RESEND PATCH v8 01/11] iommu: Add DMA ownership management interfaces

On 18/04/2022 01:49, Lu Baolu wrote:
> Multiple devices may be placed in the same IOMMU group because they
> cannot be isolated from each other. These devices must either be
> entirely under kernel control or userspace control, never a mixture.
>
> This adds dma ownership management in iommu core and exposes several
> interfaces for the device drivers and the device userspace assignment
> framework (i.e. VFIO), so that any conflict between user and kernel
> controlled dma could be detected at the beginning.
>
> The device driver oriented interfaces are,
>
> int iommu_device_use_default_domain(struct device *dev);
> void iommu_device_unuse_default_domain(struct device *dev);
>
> By calling iommu_device_use_default_domain(), the device driver tells
> the iommu layer that the device dma is handled through the kernel DMA
> APIs. The iommu layer will manage the IOVA and use the default domain
> for DMA address translation.
>
> The device user-space assignment framework oriented interfaces are,
>
> int iommu_group_claim_dma_owner(struct iommu_group *group,
> void *owner);
> void iommu_group_release_dma_owner(struct iommu_group *group);
> bool iommu_group_dma_owner_claimed(struct iommu_group *group);
>
> The device userspace assignment must be disallowed if the DMA owner
> claiming interface returns failure.
>
> Signed-off-by: Jason Gunthorpe <[email protected]>
> Signed-off-by: Kevin Tian <[email protected]>
> Signed-off-by: Lu Baolu <[email protected]>
> Reviewed-by: Robin Murphy <[email protected]>
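A userspace model of how the two groups of interfaces exclude each other, assuming (as the description implies) a single per-group count shared by bound kernel drivers and the userspace owner. The `model_*` functions are hypothetical stand-ins for the calls listed above; locking, the domain switching done on claim and release, and the exact error codes of the real code are not modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical simplification of the ownership fields in struct iommu_group. */
struct model_group {
	unsigned int owner_cnt; /* bound kernel drivers, or user-owner references */
	void *owner;            /* NULL while the group is in kernel DMA API use */
};

/* Models iommu_device_use_default_domain(): a kernel driver binds. */
int model_use_default_domain(struct model_group *g)
{
	if (g->owner_cnt && g->owner)
		return -EBUSY;   /* group is owned by userspace */
	g->owner_cnt++;
	return 0;
}

/* Models iommu_device_unuse_default_domain(). */
void model_unuse_default_domain(struct model_group *g)
{
	assert(g->owner_cnt > 0 && !g->owner);
	g->owner_cnt--;
}

/* Models iommu_group_claim_dma_owner(); owner must be non-NULL. */
int model_claim_dma_owner(struct model_group *g, void *owner)
{
	if (!owner)
		return -EINVAL;
	if (g->owner_cnt && g->owner != owner)
		return -EBUSY;   /* kernel drivers bound, or a different owner */
	g->owner = owner;
	g->owner_cnt++;
	return 0;
}

/* Models iommu_group_release_dma_owner(). */
void model_release_dma_owner(struct model_group *g)
{
	assert(g->owner_cnt > 0 && g->owner);
	if (--g->owner_cnt == 0)
		g->owner = NULL;
}

/* Models iommu_group_dma_owner_claimed(). */
bool model_dma_owner_claimed(const struct model_group *g)
{
	return g->owner != NULL;
}
```

With one counter, "kernel drivers bound" and "user owner present" are mutually exclusive by construction, which is the property VFIO relies on instead of the old group notifier.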

I'm seeing a regression that I've bisected to this commit on a Firefly
RK3288 board. The display driver fails to probe properly because
__iommu_attach_group() returns -EBUSY. This causes long hangs and splats
as the display flips time out.

The call stack to __iommu_attach_group() is:

__iommu_attach_group from iommu_attach_device+0x64/0xb4
iommu_attach_device from rockchip_drm_dma_attach_device+0x20/0x50
rockchip_drm_dma_attach_device from vop_crtc_atomic_enable+0x10c/0xa64
vop_crtc_atomic_enable from drm_atomic_helper_commit_modeset_enables+0xa8/0x290
drm_atomic_helper_commit_modeset_enables from drm_atomic_helper_commit_tail_rpm+0x44/0x8c
drm_atomic_helper_commit_tail_rpm from commit_tail+0x9c/0x180
commit_tail from drm_atomic_helper_commit+0x164/0x18c
drm_atomic_helper_commit from drm_atomic_commit+0xac/0xe4
drm_atomic_commit from drm_client_modeset_commit_atomic+0x23c/0x284
drm_client_modeset_commit_atomic from drm_client_modeset_commit_locked+0x60/0x1c8
drm_client_modeset_commit_locked from drm_client_modeset_commit+0x24/0x40
drm_client_modeset_commit from drm_fb_helper_set_par+0xb8/0xf8
drm_fb_helper_set_par from drm_fb_helper_hotplug_event.part.0+0xa8/0xc0
drm_fb_helper_hotplug_event.part.0 from output_poll_execute+0xb8/0x224

> @@ -2109,7 +2115,7 @@ static int __iommu_attach_group(struct iommu_domain *domain,
> {
> int ret;
>
> - if (group->default_domain && group->domain != group->default_domain)
> + if (group->domain && group->domain != group->default_domain)
> return -EBUSY;
>
> ret = __iommu_group_for_each_dev(group, domain,

Reverting this 'fixes' the problem for me. The follow-up 0286300e6045
("iommu: iommu_group_claim_dma_owner() must always assign a domain")
doesn't help.

Adding some debug printks I can see that domain is a valid pointer, but
both default_domain and blocking_domain are NULL.
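The state reported here, a group with no default domain but some other domain already attached, is exactly where the old and new checks in the quoted hunk disagree. A reduced model of the two conditions, using hypothetical `model_*` types that keep only the two fields the checks read:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_domain {
	int unused;
};

/* Hypothetical subset of struct iommu_group. */
struct model_group {
	struct model_domain *default_domain;
	struct model_domain *domain;
};

/* Check before the patch: busy only if a default domain exists but is not attached. */
bool model_busy_old(const struct model_group *g)
{
	return g->default_domain && g->domain != g->default_domain;
}

/* Check after the patch: busy whenever any non-default domain is attached. */
bool model_busy_new(const struct model_group *g)
{
	return g->domain && g->domain != g->default_domain;
}
```

With `default_domain == NULL` and `domain` pointing at whatever was attached earlier, the old condition is false and the new one is true, so `__iommu_attach_group()` now returns -EBUSY.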

I'm using the DTB from the kernel tree (rk3288-firefly.dtb).

Any ideas?

Thanks,

Steve

2022-06-15 11:51:29

by Robin Murphy

Subject: Re: [RESEND PATCH v8 01/11] iommu: Add DMA ownership management interfaces

On 2022-06-15 10:53, Steven Price wrote:
> On 18/04/2022 01:49, Lu Baolu wrote:
>> Multiple devices may be placed in the same IOMMU group because they
>> cannot be isolated from each other. These devices must either be
>> entirely under kernel control or userspace control, never a mixture.
>>
>> This adds dma ownership management in iommu core and exposes several
>> interfaces for the device drivers and the device userspace assignment
>> framework (i.e. VFIO), so that any conflict between user and kernel
>> controlled dma could be detected at the beginning.
>>
>> The device driver oriented interfaces are,
>>
>> int iommu_device_use_default_domain(struct device *dev);
>> void iommu_device_unuse_default_domain(struct device *dev);
>>
>> By calling iommu_device_use_default_domain(), the device driver tells
>> the iommu layer that the device dma is handled through the kernel DMA
>> APIs. The iommu layer will manage the IOVA and use the default domain
>> for DMA address translation.
>>
>> The device user-space assignment framework oriented interfaces are,
>>
>> int iommu_group_claim_dma_owner(struct iommu_group *group,
>> void *owner);
>> void iommu_group_release_dma_owner(struct iommu_group *group);
>> bool iommu_group_dma_owner_claimed(struct iommu_group *group);
>>
>> The device userspace assignment must be disallowed if the DMA owner
>> claiming interface returns failure.
>>
>> Signed-off-by: Jason Gunthorpe <[email protected]>
>> Signed-off-by: Kevin Tian <[email protected]>
>> Signed-off-by: Lu Baolu <[email protected]>
>> Reviewed-by: Robin Murphy <[email protected]>
>
> I'm seeing a regression that I've bisected to this commit on a Firefly
> RK3288 board. The display driver fails to probe properly because
> __iommu_attach_group() returns -EBUSY. This causes long hangs and splats
> as the display flips time out.
>
> The call stack to __iommu_attach_group() is:
>
> __iommu_attach_group from iommu_attach_device+0x64/0xb4
> iommu_attach_device from rockchip_drm_dma_attach_device+0x20/0x50
> rockchip_drm_dma_attach_device from vop_crtc_atomic_enable+0x10c/0xa64
> vop_crtc_atomic_enable from drm_atomic_helper_commit_modeset_enables+0xa8/0x290
> drm_atomic_helper_commit_modeset_enables from drm_atomic_helper_commit_tail_rpm+0x44/0x8c
> drm_atomic_helper_commit_tail_rpm from commit_tail+0x9c/0x180
> commit_tail from drm_atomic_helper_commit+0x164/0x18c
> drm_atomic_helper_commit from drm_atomic_commit+0xac/0xe4
> drm_atomic_commit from drm_client_modeset_commit_atomic+0x23c/0x284
> drm_client_modeset_commit_atomic from drm_client_modeset_commit_locked+0x60/0x1c8
> drm_client_modeset_commit_locked from drm_client_modeset_commit+0x24/0x40
> drm_client_modeset_commit from drm_fb_helper_set_par+0xb8/0xf8
> drm_fb_helper_set_par from drm_fb_helper_hotplug_event.part.0+0xa8/0xc0
> drm_fb_helper_hotplug_event.part.0 from output_poll_execute+0xb8/0x224
>
>> @@ -2109,7 +2115,7 @@ static int __iommu_attach_group(struct iommu_domain *domain,
>>  {
>>  	int ret;
>>
>> -	if (group->default_domain && group->domain != group->default_domain)
>> +	if (group->domain && group->domain != group->default_domain)
>>  		return -EBUSY;
>>
>>  	ret = __iommu_group_for_each_dev(group, domain,
>
> Reverting this 'fixes' the problem for me. The follow up 0286300e6045
> ("iommu: iommu_group_claim_dma_owner() must always assign a domain")
> doesn't help.
>
> Adding some debug printks I can see that domain is a valid pointer, but
> both default_domain and blocking_domain are NULL.
>
> I'm using the DTB from the kernel tree (rk3288-firefly.dtb).
>
> Any ideas?

Hmm, TBH I'm not sure how that worked previously... it'll be complaining
because the ARM DMA domain is still attached, but even when the attach
goes ahead and replaces the ARM domain with the driver's new one, it's
not using the special arm_iommu_detach_device() interface anywhere so
the device would still be left with the wrong DMA ops :/

I guess the most pragmatic option is probably to give rockchip-drm a
similar bodge to exynos and tegra, to explicitly remove the ARM domain
before attaching its own.

Thanks,
Robin.

2022-06-15 15:14:36

by Steven Price

Subject: Re: [RESEND PATCH v8 01/11] iommu: Add DMA ownership management interfaces

On 15/06/2022 11:57, Robin Murphy wrote:
> On 2022-06-15 10:53, Steven Price wrote:
>> On 18/04/2022 01:49, Lu Baolu wrote:
>>> Multiple devices may be placed in the same IOMMU group because they
>>> cannot be isolated from each other. These devices must either be
>>> entirely under kernel control or userspace control, never a mixture.
>>>
>>> This adds dma ownership management in iommu core and exposes several
>>> interfaces for the device drivers and the device userspace assignment
>>> framework (i.e. VFIO), so that any conflict between user and kernel
>>> controlled dma could be detected at the beginning.
>>>
>>> The device driver oriented interfaces are,
>>>
>>>     int iommu_device_use_default_domain(struct device *dev);
>>>     void iommu_device_unuse_default_domain(struct device *dev);
>>>
>>> By calling iommu_device_use_default_domain(), the device driver tells
>>> the iommu layer that the device dma is handled through the kernel DMA
>>> APIs. The iommu layer will manage the IOVA and use the default domain
>>> for DMA address translation.
>>>
>>> The device user-space assignment framework oriented interfaces are,
>>>
>>>     int iommu_group_claim_dma_owner(struct iommu_group *group,
>>>                     void *owner);
>>>     void iommu_group_release_dma_owner(struct iommu_group *group);
>>>     bool iommu_group_dma_owner_claimed(struct iommu_group *group);
>>>
>>> The device userspace assignment must be disallowed if the DMA owner
>>> claiming interface returns failure.
>>>
>>> Signed-off-by: Jason Gunthorpe <[email protected]>
>>> Signed-off-by: Kevin Tian <[email protected]>
>>> Signed-off-by: Lu Baolu <[email protected]>
>>> Reviewed-by: Robin Murphy <[email protected]>
>>
>> I'm seeing a regression that I've bisected to this commit on a Firefly
>> RK3288 board. The display driver fails to probe properly because
>> __iommu_attach_group() returns -EBUSY. This causes long hangs and splats
>> as the display flips timeout.
>>
>> The call stack to __iommu_attach_group() is:
>>
>>   __iommu_attach_group from iommu_attach_device+0x64/0xb4
>>   iommu_attach_device from rockchip_drm_dma_attach_device+0x20/0x50
>>   rockchip_drm_dma_attach_device from vop_crtc_atomic_enable+0x10c/0xa64
>>   vop_crtc_atomic_enable from drm_atomic_helper_commit_modeset_enables+0xa8/0x290
>>   drm_atomic_helper_commit_modeset_enables from drm_atomic_helper_commit_tail_rpm+0x44/0x8c
>>   drm_atomic_helper_commit_tail_rpm from commit_tail+0x9c/0x180
>>   commit_tail from drm_atomic_helper_commit+0x164/0x18c
>>   drm_atomic_helper_commit from drm_atomic_commit+0xac/0xe4
>>   drm_atomic_commit from drm_client_modeset_commit_atomic+0x23c/0x284
>>   drm_client_modeset_commit_atomic from drm_client_modeset_commit_locked+0x60/0x1c8
>>   drm_client_modeset_commit_locked from drm_client_modeset_commit+0x24/0x40
>>   drm_client_modeset_commit from drm_fb_helper_set_par+0xb8/0xf8
>>   drm_fb_helper_set_par from drm_fb_helper_hotplug_event.part.0+0xa8/0xc0
>>   drm_fb_helper_hotplug_event.part.0 from output_poll_execute+0xb8/0x224
>>
>>> @@ -2109,7 +2115,7 @@ static int __iommu_attach_group(struct iommu_domain *domain,
>>>  {
>>>  	int ret;
>>>
>>> -	if (group->default_domain && group->domain != group->default_domain)
>>> +	if (group->domain && group->domain != group->default_domain)
>>>  		return -EBUSY;
>>>
>>>  	ret = __iommu_group_for_each_dev(group, domain,
>>
>> Reverting this 'fixes' the problem for me. The follow up 0286300e6045
>> ("iommu: iommu_group_claim_dma_owner() must always assign a domain")
>> doesn't help.
>>
>> Adding some debug printks I can see that domain is a valid pointer, but
>> both default_domain and blocking_domain are NULL.
>>
>> I'm using the DTB from the kernel tree (rk3288-firefly.dtb).
>>
>> Any ideas?
>
> Hmm, TBH I'm not sure how that worked previously... it'll be complaining
> because the ARM DMA domain is still attached, but even when the attach
> goes ahead and replaces the ARM domain with the driver's new one, it's
> not using the special arm_iommu_detach_device() interface anywhere so
> the device would still be left with the wrong DMA ops :/
>
> I guess the most pragmatic option is probably to give rockchip-drm a
> similar bodge to exynos and tegra, to explicitly remove the ARM domain
> before attaching its own.

A bodge like the one below does indeed 'fix' the problem:

---8<---
diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_drv.c b/drivers/gpu/drm/rockchip/rockchip_drm_drv.c
index 67d38f53d3e5..cbc6a5121296 100644
--- a/drivers/gpu/drm/rockchip/rockchip_drm_drv.c
+++ b/drivers/gpu/drm/rockchip/rockchip_drm_drv.c
@@ -23,6 +23,14 @@
 #include <drm/drm_probe_helper.h>
 #include <drm/drm_vblank.h>

+#if defined(CONFIG_ARM_DMA_USE_IOMMU)
+#include <asm/dma-iommu.h>
+#else
+#define arm_iommu_detach_device(...) ({ })
+#define arm_iommu_release_mapping(...) ({ })
+#define to_dma_iommu_mapping(dev) NULL
+#endif
+
 #include "rockchip_drm_drv.h"
 #include "rockchip_drm_fb.h"
 #include "rockchip_drm_gem.h"
@@ -49,6 +57,14 @@ int rockchip_drm_dma_attach_device(struct drm_device *drm_dev,
 	if (!private->domain)
 		return 0;

+	if (IS_ENABLED(CONFIG_ARM_DMA_USE_IOMMU)) {
+		struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
+		if (mapping) {
+			arm_iommu_detach_device(dev);
+			arm_iommu_release_mapping(mapping);
+		}
+	}
+
 	ret = iommu_attach_device(private->domain, dev);
 	if (ret) {
 		DRM_DEV_ERROR(dev, "Failed to attach iommu device\n");
---8<---

I'll type up a proper commit message and see what the DRM maintainers think.

Thanks,

Steve

2023-06-26 13:19:59

by Zenghui Yu

Subject: Re: [RESEND PATCH v8 04/11] bus: platform, amba, fsl-mc, PCI: Add device DMA ownership management

On 2022/4/18 8:49, Lu Baolu wrote:
> The devices on platform/amba/fsl-mc/PCI buses could be bound to drivers
> with the device DMA managed by kernel drivers or user-space applications.
> Unfortunately, multiple devices may be placed in the same IOMMU group
> because they cannot be isolated from each other. The DMA on these devices
> must either be entirely under kernel control or userspace control, never
> a mixture. Otherwise the driver integrity is not guaranteed because they
> could access each other through peer-to-peer accesses which bypass
> the IOMMU protection.
>
> This checks and sets the default DMA mode during driver binding, and
> cleans up during driver unbinding. In the default mode, the device DMA is
> managed by the device driver which handles DMA operations through the
> kernel DMA APIs (see Documentation/core-api/dma-api.rst).
>
> For cases where the devices are assigned for userspace control through the
> userspace driver framework (i.e. VFIO), the drivers (for example, vfio_pci/
> vfio_platform etc.) may set a new flag (driver_managed_dma) to skip this
> default setting on the assumption that the drivers know what they are
> doing with the device DMA.
>
> Calling iommu_device_use_default_domain() before {of,acpi}_dma_configure
> is currently a problem. As things stand, the IOMMU driver ignored the
> initial iommu_probe_device() call when the device was added, since at
> that point it had no fwspec yet. In this situation,
> {of,acpi}_iommu_configure() retrigger iommu_probe_device() after
> the IOMMU driver has seen the firmware data via .of_xlate to learn that
> it is actually responsible for the given device. As a result, until
> that gets fixed, iommu_device_use_default_domain() goes at the end, and
> arch_teardown_dma_ops() is called if it fails.
>
> Cc: Greg Kroah-Hartman <[email protected]>
> Cc: Bjorn Helgaas <[email protected]>
> Cc: Stuart Yoder <[email protected]>
> Cc: Laurentiu Tudor <[email protected]>
> Signed-off-by: Lu Baolu <[email protected]>
> Reviewed-by: Greg Kroah-Hartman <[email protected]>
> Reviewed-by: Jason Gunthorpe <[email protected]>
> Reviewed-by: Robin Murphy <[email protected]>
> Tested-by: Eric Auger <[email protected]>

[...]

> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 60adf42460ab..b933d2b08d4d 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -895,6 +895,13 @@ struct module;
> * created once it is bound to the driver.
> * @driver: Driver model structure.
> * @dynids: List of dynamically added device IDs.
> + * @driver_managed_dma: Device driver doesn't use kernel DMA API for DMA.
> + * For most device drivers, no need to care about this flag
> + * as long as all DMAs are handled through the kernel DMA API.
> + * For some special ones, for example VFIO drivers, they know
> + * how to manage the DMA themselves and set this flag so that
> + * the IOMMU layer will allow them to setup and manage their
> + * own I/O address space.
> */
>  struct pci_driver {
>  	struct list_head node;
> @@ -913,6 +920,7 @@ struct pci_driver {
>  	const struct attribute_group **dev_groups;
>  	struct device_driver driver;
>  	struct pci_dynids dynids;
> +	bool driver_managed_dma;
>  };
>
> static inline struct pci_driver *to_pci_driver(struct device_driver *drv)

[...]

> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> index 4ceeb75fc899..f83f7fbac68f 100644
> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -20,6 +20,7 @@
> #include <linux/of_device.h>
> #include <linux/acpi.h>
> #include <linux/dma-map-ops.h>
> +#include <linux/iommu.h>
> #include "pci.h"
> #include "pcie/portdrv.h"
>
> @@ -1601,6 +1602,7 @@ static int pci_bus_num_vf(struct device *dev)
>   */
>  static int pci_dma_configure(struct device *dev)
>  {
> +	struct pci_driver *driver = to_pci_driver(dev->driver);
>  	struct device *bridge;
>  	int ret = 0;
>
> @@ -1616,9 +1618,24 @@ static int pci_dma_configure(struct device *dev)
>  	}
>
>  	pci_put_host_bridge_device(bridge);
> +
> +	if (!ret && !driver->driver_managed_dma) {
> +		ret = iommu_device_use_default_domain(dev);
> +		if (ret)
> +			arch_teardown_dma_ops(dev);
> +	}
> +
>  	return ret;
>  }
>
> +static void pci_dma_cleanup(struct device *dev)
> +{
> +	struct pci_driver *driver = to_pci_driver(dev->driver);
> +
> +	if (!driver->driver_managed_dma)
> +		iommu_device_unuse_default_domain(dev);
> +}
> +
>  struct bus_type pci_bus_type = {
>  	.name = "pci",
>  	.match = pci_bus_match,
> @@ -1632,6 +1649,7 @@ struct bus_type pci_bus_type = {
>  	.pm = PCI_PM_OPS_PTR,
>  	.num_vf = pci_bus_num_vf,
>  	.dma_configure = pci_dma_configure,
> +	.dma_cleanup = pci_dma_cleanup,
>  };
> EXPORT_SYMBOL(pci_bus_type);

I had somehow forgotten to remove DEBUG_TEST_DRIVER_REMOVE from my .config,
and as a result failed to start the guest with an assigned PCI device, with
an error like:

| qemu-system-aarch64: -device vfio-pci,host=0000:03:00.1,id=hostdev0,bus=pci.8,addr=0x0: vfio 0000:03:00.1: group 45 is not viable
| Please ensure all devices within the iommu_group are bound to their vfio bus driver.

It looks like on device probe, with DEBUG_TEST_DRIVER_REMOVE,
.dma_configure() will be executed *twice* via the
really_probe()/re_probe path, and *no* .dma_cleanup() will be executed.
The resulting dev::iommu_group::owner_cnt is 2, which will confuse the
later iommu_group_dma_owner_claimed() call from VFIO on guest startup.

I can work around the problem locally by disabling the DEBUG option or by
performing a .dma_cleanup() during the test removal, but it'd be great if
you could take a look.

Thanks,
Zenghui

2023-06-28 14:43:17

by Jason Gunthorpe

Subject: Re: [RESEND PATCH v8 04/11] bus: platform, amba, fsl-mc, PCI: Add device DMA ownership management

On Mon, Jun 26, 2023 at 09:02:40PM +0800, Zenghui Yu wrote:

> It looks like on device probe, with DEBUG_TEST_DRIVER_REMOVE,
> .dma_configure() will be executed *twice* via the
> really_probe()/re_probe path, and *no* .dma_cleanup() will be executed.
> The resulting dev::iommu_group::owner_cnt is 2, which will confuse the
> later iommu_group_dma_owner_claimed() call from VFIO on guest startup.

Does this work for you?

diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index 9c09ca5c4ab68e..7145d9b940b14b 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -693,6 +693,8 @@ static int really_probe(struct device *dev, struct device_driver *drv)

 		device_remove(dev);
 		driver_sysfs_remove(dev);
+		if (dev->bus && dev->bus->dma_cleanup)
+			dev->bus->dma_cleanup(dev);
 		device_unbind_cleanup(dev);

 		goto re_probe;


2023-06-29 03:21:46

by Zenghui Yu

Subject: Re: [RESEND PATCH v8 04/11] bus: platform, amba, fsl-mc, PCI: Add device DMA ownership management

On 2023/6/28 22:36, Jason Gunthorpe wrote:
> On Mon, Jun 26, 2023 at 09:02:40PM +0800, Zenghui Yu wrote:
>
>> It looks like on device probe, with DEBUG_TEST_DRIVER_REMOVE,
>> .dma_configure() will be executed *twice* via the
>> really_probe()/re_probe path, and *no* .dma_cleanup() will be executed.
>> The resulting dev::iommu_group::owner_cnt is 2, which will confuse the
>> later iommu_group_dma_owner_claimed() call from VFIO on guest startup.
>
> Does this work for you?

It works. Please feel free to add my Tested-by if you send it as a
formal patch. Thanks.

> diff --git a/drivers/base/dd.c b/drivers/base/dd.c
> index 9c09ca5c4ab68e..7145d9b940b14b 100644
> --- a/drivers/base/dd.c
> +++ b/drivers/base/dd.c
> @@ -693,6 +693,8 @@ static int really_probe(struct device *dev, struct device_driver *drv)
>
>  		device_remove(dev);
>  		driver_sysfs_remove(dev);
> +		if (dev->bus && dev->bus->dma_cleanup)
> +			dev->bus->dma_cleanup(dev);
>  		device_unbind_cleanup(dev);
>
>  		goto re_probe;