2024-03-05 01:39:26

by Baolu Lu

Subject: [PATCH v3 0/5] iommu: Fix domain check on release

This is a follow-up to the discussion thread here:

https://lore.kernel.org/linux-iommu/[email protected]/

It fixes a NULL pointer dereference issue in the Intel iommu driver and
strengthens the iommu core to possibly prevent similar failures in other
iommu drivers.

There are two parts to this topic:
[x] Introduce release_domain and fix a kernel NULL pointer dereference
issue in the Intel iommu driver.
[x] Follow-up patches to clean up the Intel iommu driver.

Best regards,
baolu

Change log:
v3:
- Avoid global IOTLB and PASID cache invalidation in normal release
path to mitigate the impact on other devices.
- Comment and code refinements.

v2:
- https://lore.kernel.org/linux-iommu/[email protected]/
- https://lore.kernel.org/linux-iommu/[email protected]/
- The scalable mode context entry should be removed in the release path
as it's not part of the blocking domain.

v1: https://lore.kernel.org/linux-iommu/[email protected]/

Lu Baolu (5):
iommu: Add static iommu_ops->release_domain
iommu/vt-d: Fix NULL domain on device release
iommu/vt-d: Setup scalable mode context entry in probe path
iommu/vt-d: Remove scalable mode context entry setup from attach_dev
iommu/vt-d: Remove scalable mode in domain_context_clear_one()

include/linux/iommu.h | 1 +
drivers/iommu/intel/pasid.h | 2 +
drivers/iommu/intel/iommu.c | 214 +++++++++++-------------------------
drivers/iommu/intel/pasid.c | 202 ++++++++++++++++++++++++++++++++++
drivers/iommu/iommu.c | 19 +++-
5 files changed, 283 insertions(+), 155 deletions(-)

--
2.34.1



2024-03-05 01:39:38

by Baolu Lu

Subject: [PATCH v3 1/5] iommu: Add static iommu_ops->release_domain

The current device_release callback for individual iommu drivers does the
following:

1) Silencing IOMMU DMA translation: It detaches any existing domain from
the device and puts the device into a blocking state (some drivers might
use the identity state).
2) Resource release: It releases resources allocated during the
device_probe callback and restores the device to its pre-probe state.

Step 1 is challenging for individual iommu drivers because each must check
if a domain is already attached to the device. Additionally, if a deferred
attach never occurred, the device_release should avoid modifying hardware
configuration regardless of the reason for its call.

To simplify this process, introduce a static release_domain within the
iommu_ops structure. It can be either a blocking or identity domain
depending on the iommu hardware. The iommu core will decide whether to
attach this domain before the device_release callback, eliminating the
need for repetitive code in various drivers.

Consequently, the device_release callback can focus solely on the opposite
operations of device_probe, including releasing all resources allocated
during that callback.
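
For illustration, a driver that already has a static blocked domain can
opt in with a one-line change. This is only a sketch; the foo_* names
are placeholders, not code from this series:

	static const struct iommu_ops foo_iommu_ops = {
		.blocked_domain = &foo_blocked_domain,
		/* the core attaches this before calling release_device() */
		.release_domain = &foo_blocked_domain,
		/* now only undoes what probe_device() allocated */
		.release_device = foo_release_device,
		...
	};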

Co-developed-by: Jason Gunthorpe <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
Signed-off-by: Lu Baolu <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
---
include/linux/iommu.h | 1 +
drivers/iommu/iommu.c | 19 +++++++++++++++----
2 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index af6c367ed673..2e925b5eba53 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -585,6 +585,7 @@ struct iommu_ops {
struct module *owner;
struct iommu_domain *identity_domain;
struct iommu_domain *blocked_domain;
+ struct iommu_domain *release_domain;
struct iommu_domain *default_domain;
};

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index eb50543bf956..098869007c69 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -462,13 +462,24 @@ static void iommu_deinit_device(struct device *dev)

/*
* release_device() must stop using any attached domain on the device.
- * If there are still other devices in the group they are not effected
+ * If there are still other devices in the group, they are not affected
* by this callback.
*
- * The IOMMU driver must set the device to either an identity or
- * blocking translation and stop using any domain pointer, as it is
- * going to be freed.
+ * If the iommu driver provides release_domain, the core code ensures
+ * that domain is attached prior to calling release_device. Drivers can
+ * use this to enforce a translation on the idle iommu. Typically, the
+ * global static blocked_domain is a good choice.
+ *
+ * Otherwise, the iommu driver must set the device to either an identity
+ * or a blocking translation in release_device() and stop using any
+ * domain pointer, as it is going to be freed.
+ *
+ * Regardless, if a delayed attach never occurred, then the release
+ * should still avoid touching any hardware configuration either.
*/
+ if (!dev->iommu->attach_deferred && ops->release_domain)
+ ops->release_domain->ops->attach_dev(ops->release_domain, dev);
+
if (ops->release_device)
ops->release_device(dev);

--
2.34.1


2024-03-05 01:40:13

by Baolu Lu

Subject: [PATCH v3 3/5] iommu/vt-d: Setup scalable mode context entry in probe path

In contrast to legacy mode, scalable mode configures the DMA translation
table in the PASID table entry rather than the context entry. For this
reason, it is more appropriate to set up the scalable mode context entry
in the device_probe callback and direct it to the appropriate PASID
table.

The iommu domain attach/detach operations only affect the PASID table
entry. Therefore, there is no need to modify the context entry when
configuring the translation type and page table.

The only exception is the kdump case, where context entry setup is
postponed until the device driver invokes the first DMA interface.
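
As a rough sketch, the resulting flow is (function names are from the
code added below; this is an outline, not a verbatim call chain):

	intel_iommu_probe_device()
	    intel_pasid_setup_sm_context()          /* once per DMA alias */
	        device_pasid_table_setup()
	            context_entry_set_pasid_table() /* context -> PASID table */

After this, domain attach/detach only touches the PASID table entry; the
context entry is only revisited in the kdump (context_copied) case via
prepare_domain_attach_device().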

Signed-off-by: Lu Baolu <[email protected]>
---
drivers/iommu/intel/pasid.h | 1 +
drivers/iommu/intel/iommu.c | 12 ++++
drivers/iommu/intel/pasid.c | 138 ++++++++++++++++++++++++++++++++++++
3 files changed, 151 insertions(+)

diff --git a/drivers/iommu/intel/pasid.h b/drivers/iommu/intel/pasid.h
index 42fda97fd851..da9978fef7ac 100644
--- a/drivers/iommu/intel/pasid.h
+++ b/drivers/iommu/intel/pasid.h
@@ -318,5 +318,6 @@ void intel_pasid_tear_down_entry(struct intel_iommu *iommu,
bool fault_ignore);
void intel_pasid_setup_page_snoop_control(struct intel_iommu *iommu,
struct device *dev, u32 pasid);
+int intel_pasid_setup_sm_context(struct device *dev);
void intel_pasid_teardown_sm_context(struct device *dev);
#endif /* __INTEL_PASID_H */
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index f74d42d3258f..9b96d36b9d2a 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -4073,6 +4073,10 @@ int prepare_domain_attach_device(struct iommu_domain *domain,
dmar_domain->agaw--;
}

+ if (sm_supported(iommu) && !dev_is_real_dma_subdevice(dev) &&
+ context_copied(iommu, info->bus, info->devfn))
+ return intel_pasid_setup_sm_context(dev);
+
return 0;
}

@@ -4386,11 +4390,19 @@ static struct iommu_device *intel_iommu_probe_device(struct device *dev)
dev_err(dev, "PASID table allocation failed\n");
goto clear_rbtree;
}
+
+ if (!context_copied(iommu, info->bus, info->devfn)) {
+ ret = intel_pasid_setup_sm_context(dev);
+ if (ret)
+ goto free_table;
+ }
}

intel_iommu_debugfs_create_dev(info);

return &iommu->iommu;
+free_table:
+ intel_pasid_free_table(dev);
clear_rbtree:
device_rbtree_remove(info);
free:
diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
index 9261ea986dbf..625d5cff49c3 100644
--- a/drivers/iommu/intel/pasid.c
+++ b/drivers/iommu/intel/pasid.c
@@ -731,3 +731,141 @@ void intel_pasid_teardown_sm_context(struct device *dev)

pci_for_each_dma_alias(to_pci_dev(dev), pci_pasid_table_teardown, dev);
}
+
+/*
+ * Get the PASID directory size for scalable mode context entry.
+ * Value of X in the PDTS field of a scalable mode context entry
+ * indicates PASID directory with 2^(X + 7) entries.
+ */
+static unsigned long context_get_sm_pds(struct pasid_table *table)
+{
+ unsigned long pds, max_pde;
+
+ max_pde = table->max_pasid >> PASID_PDE_SHIFT;
+ pds = find_first_bit(&max_pde, MAX_NR_PASID_BITS);
+ if (pds < 7)
+ return 0;
+
+ return pds - 7;
+}
+
+static int context_entry_set_pasid_table(struct context_entry *context,
+ struct device *dev)
+{
+ struct device_domain_info *info = dev_iommu_priv_get(dev);
+ struct pasid_table *table = info->pasid_table;
+ struct intel_iommu *iommu = info->iommu;
+ unsigned long pds;
+
+ context_clear_entry(context);
+
+ pds = context_get_sm_pds(table);
+ context->lo = (u64)virt_to_phys(table->table) | context_pdts(pds);
+ context_set_sm_rid2pasid(context, IOMMU_NO_PASID);
+
+ if (info->ats_supported)
+ context_set_sm_dte(context);
+ if (info->pri_supported)
+ context_set_sm_pre(context);
+ if (info->pasid_supported)
+ context_set_pasid(context);
+
+ context_set_fault_enable(context);
+ context_set_present(context);
+ __iommu_flush_cache(iommu, context, sizeof(*context));
+
+ return 0;
+}
+
+static int device_pasid_table_setup(struct device *dev, u8 bus, u8 devfn)
+{
+ struct device_domain_info *info = dev_iommu_priv_get(dev);
+ struct intel_iommu *iommu = info->iommu;
+ struct context_entry *context;
+
+ spin_lock(&iommu->lock);
+ context = iommu_context_addr(iommu, bus, devfn, true);
+ if (!context) {
+ spin_unlock(&iommu->lock);
+ return -ENOMEM;
+ }
+
+ if (context_present(context) && !context_copied(iommu, bus, devfn)) {
+ spin_unlock(&iommu->lock);
+ return 0;
+ }
+
+ if (context_copied(iommu, bus, devfn)) {
+ context_clear_entry(context);
+ __iommu_flush_cache(iommu, context, sizeof(*context));
+
+ /*
+ * For kdump cases, old valid entries may be cached due to
+ * the in-flight DMA and copied pgtable, but there is no
+ * unmapping behaviour for them, thus we need explicit cache
+ * flushes for all affected domain IDs and PASIDs used in
+ * the copied PASID table. Given that we have no idea about
+ * which domain IDs and PASIDs were used in the copied tables,
+ * upgrade them to global PASID and IOTLB cache invalidation.
+ */
+ iommu->flush.flush_context(iommu, 0,
+ PCI_DEVID(bus, devfn),
+ DMA_CCMD_MASK_NOBIT,
+ DMA_CCMD_DEVICE_INVL);
+ qi_flush_pasid_cache(iommu, 0, QI_PC_GLOBAL, 0);
+ iommu->flush.flush_iotlb(iommu, 0, 0, 0, DMA_TLB_GLOBAL_FLUSH);
+ devtlb_invalidation_with_pasid(iommu, dev, IOMMU_NO_PASID);
+
+ /*
+ * At this point, the device is supposed to finish reset at
+ * its driver probe stage, so no in-flight DMA will exist,
+ * and we don't need to worry anymore hereafter.
+ */
+ clear_context_copied(iommu, bus, devfn);
+ }
+
+ context_entry_set_pasid_table(context, dev);
+ spin_unlock(&iommu->lock);
+
+ /*
+ * It's a non-present to present mapping. If hardware doesn't cache
+ * non-present entry we don't need to flush the caches. If it does
+ * cache non-present entries, then it does so in the special
+ * domain #0, which we have to flush:
+ */
+ if (cap_caching_mode(iommu->cap)) {
+ iommu->flush.flush_context(iommu, 0,
+ PCI_DEVID(bus, devfn),
+ DMA_CCMD_MASK_NOBIT,
+ DMA_CCMD_DEVICE_INVL);
+ iommu->flush.flush_iotlb(iommu, 0, 0, 0, DMA_TLB_DSI_FLUSH);
+ }
+
+ return 0;
+}
+
+static int pci_pasid_table_setup(struct pci_dev *pdev, u16 alias, void *data)
+{
+ struct device *dev = data;
+
+ if (dev != &pdev->dev)
+ return 0;
+
+ return device_pasid_table_setup(dev, PCI_BUS_NUM(alias), alias & 0xff);
+}
+
+/*
+ * Set the device's PASID table to its context table entry.
+ *
+ * The PASID table is set to the context entries of both device itself
+ * and its alias requester ID for DMA.
+ */
+int intel_pasid_setup_sm_context(struct device *dev)
+{
+ struct device_domain_info *info = dev_iommu_priv_get(dev);
+
+ if (!dev_is_pci(dev))
+ return device_pasid_table_setup(dev, info->bus, info->devfn);
+
+ return pci_for_each_dma_alias(to_pci_dev(dev), pci_pasid_table_setup, dev);
+}
--
2.34.1


2024-03-05 01:40:12

by Baolu Lu

Subject: [PATCH v3 2/5] iommu/vt-d: Fix NULL domain on device release

In the kdump kernel, the IOMMU operates in deferred_attach mode. In this
mode, info->domain may not yet be assigned by the time the release_device
function is called. It leads to the following crash in the crash kernel:

BUG: kernel NULL pointer dereference, address: 000000000000003c
...
RIP: 0010:do_raw_spin_lock+0xa/0xa0
...
_raw_spin_lock_irqsave+0x1b/0x30
intel_iommu_release_device+0x96/0x170
iommu_deinit_device+0x39/0xf0
__iommu_group_remove_device+0xa0/0xd0
iommu_bus_notifier+0x55/0xb0
notifier_call_chain+0x5a/0xd0
blocking_notifier_call_chain+0x41/0x60
bus_notify+0x34/0x50
device_del+0x269/0x3d0
pci_remove_bus_device+0x77/0x100
p2sb_bar+0xae/0x1d0
...
i801_probe+0x423/0x740

Use the release_domain mechanism to fix it. The scalable mode context
entry, which is not part of the release_domain, should be cleared in
release_device().
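
The crash corresponds to the unconditional dereference of info->domain in
the old dmar_remove_one_dev_info(), removed below. A minimal sketch of
the failing path, assuming a deferred attach that never happened:

	struct device_domain_info *info = dev_iommu_priv_get(dev);
	struct dmar_domain *domain = info->domain; /* still NULL */

	spin_lock_irqsave(&domain->lock, flags);   /* NULL pointer deref */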

Fixes: 586081d3f6b1 ("iommu/vt-d: Remove DEFER_DEVICE_DOMAIN_INFO")
Reported-by: Eric Badger <[email protected]>
Closes: https://lore.kernel.org/r/[email protected]
Signed-off-by: Lu Baolu <[email protected]>
---
drivers/iommu/intel/pasid.h | 1 +
drivers/iommu/intel/iommu.c | 31 ++++--------------
drivers/iommu/intel/pasid.c | 64 +++++++++++++++++++++++++++++++++++++
3 files changed, 71 insertions(+), 25 deletions(-)

diff --git a/drivers/iommu/intel/pasid.h b/drivers/iommu/intel/pasid.h
index 487ede039bdd..42fda97fd851 100644
--- a/drivers/iommu/intel/pasid.h
+++ b/drivers/iommu/intel/pasid.h
@@ -318,4 +318,5 @@ void intel_pasid_tear_down_entry(struct intel_iommu *iommu,
bool fault_ignore);
void intel_pasid_setup_page_snoop_control(struct intel_iommu *iommu,
struct device *dev, u32 pasid);
+void intel_pasid_teardown_sm_context(struct device *dev);
#endif /* __INTEL_PASID_H */
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index cc3994efd362..f74d42d3258f 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -3869,30 +3869,6 @@ static void domain_context_clear(struct device_domain_info *info)
&domain_context_clear_one_cb, info);
}

-static void dmar_remove_one_dev_info(struct device *dev)
-{
- struct device_domain_info *info = dev_iommu_priv_get(dev);
- struct dmar_domain *domain = info->domain;
- struct intel_iommu *iommu = info->iommu;
- unsigned long flags;
-
- if (!dev_is_real_dma_subdevice(info->dev)) {
- if (dev_is_pci(info->dev) && sm_supported(iommu))
- intel_pasid_tear_down_entry(iommu, info->dev,
- IOMMU_NO_PASID, false);
-
- iommu_disable_pci_caps(info);
- domain_context_clear(info);
- }
-
- spin_lock_irqsave(&domain->lock, flags);
- list_del(&info->link);
- spin_unlock_irqrestore(&domain->lock, flags);
-
- domain_detach_iommu(domain, iommu);
- info->domain = NULL;
-}
-
/*
* Clear the page table pointer in context or pasid table entries so that
* all DMA requests without PASID from the device are blocked. If the page
@@ -4431,7 +4407,11 @@ static void intel_iommu_release_device(struct device *dev)
mutex_lock(&iommu->iopf_lock);
device_rbtree_remove(info);
mutex_unlock(&iommu->iopf_lock);
- dmar_remove_one_dev_info(dev);
+
+ if (sm_supported(iommu) && !dev_is_real_dma_subdevice(dev) &&
+ !context_copied(iommu, info->bus, info->devfn))
+ intel_pasid_teardown_sm_context(dev);
+
intel_pasid_free_table(dev);
intel_iommu_debugfs_remove_dev(info);
kfree(info);
@@ -4922,6 +4902,7 @@ static const struct iommu_dirty_ops intel_dirty_ops = {

const struct iommu_ops intel_iommu_ops = {
.blocked_domain = &blocking_domain,
+ .release_domain = &blocking_domain,
.capable = intel_iommu_capable,
.hw_info = intel_iommu_hw_info,
.domain_alloc = intel_iommu_domain_alloc,
diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
index 108158e2b907..9261ea986dbf 100644
--- a/drivers/iommu/intel/pasid.c
+++ b/drivers/iommu/intel/pasid.c
@@ -667,3 +667,67 @@ int intel_pasid_setup_nested(struct intel_iommu *iommu, struct device *dev,

return 0;
}
+
+/*
+ * Interfaces to setup or teardown a pasid table to the scalable-mode
+ * context table entry:
+ */
+
+static void device_pasid_table_teardown(struct device *dev, u8 bus, u8 devfn)
+{
+ struct device_domain_info *info = dev_iommu_priv_get(dev);
+ struct intel_iommu *iommu = info->iommu;
+ struct context_entry *context;
+
+ spin_lock(&iommu->lock);
+ context = iommu_context_addr(iommu, bus, devfn, false);
+ if (!context) {
+ spin_unlock(&iommu->lock);
+ return;
+ }
+
+ context_clear_entry(context);
+ __iommu_flush_cache(iommu, context, sizeof(*context));
+ spin_unlock(&iommu->lock);
+
+ /*
+ * Cache invalidation for changes to a scalable-mode context table
+ * entry.
+ *
+ * Section 6.5.3.3 of the VT-d spec:
+ * - Device-selective context-cache invalidation;
+ * - Domain-selective PASID-cache invalidation to affected domains
+ * (can be skipped if all PASID entries were not-present);
+ * - Domain-selective IOTLB invalidation to affected domains;
+ * - Global Device-TLB invalidation to affected functions.
+ *
+ * The iommu has been parked in the blocking state. All domains have
+ * been detached from the device or PASID. The PASID and IOTLB caches
+ * have been invalidated during the domain detach path.
+ */
+ iommu->flush.flush_context(iommu, 0, PCI_DEVID(bus, devfn),
+ DMA_CCMD_MASK_NOBIT, DMA_CCMD_DEVICE_INVL);
+ devtlb_invalidation_with_pasid(iommu, dev, IOMMU_NO_PASID);
+}
+
+static int pci_pasid_table_teardown(struct pci_dev *pdev, u16 alias, void *data)
+{
+ struct device *dev = data;
+
+ if (dev == &pdev->dev)
+ device_pasid_table_teardown(dev, PCI_BUS_NUM(alias), alias & 0xff);
+
+ return 0;
+}
+
+void intel_pasid_teardown_sm_context(struct device *dev)
+{
+ struct device_domain_info *info = dev_iommu_priv_get(dev);
+
+ if (!dev_is_pci(dev)) {
+ device_pasid_table_teardown(dev, info->bus, info->devfn);
+ return;
+ }
+
+ pci_for_each_dma_alias(to_pci_dev(dev), pci_pasid_table_teardown, dev);
+}
--
2.34.1


2024-03-05 01:40:31

by Baolu Lu

Subject: [PATCH v3 4/5] iommu/vt-d: Remove scalable mode context entry setup from attach_dev

The scalable mode context entry is now set up in the probe_device path,
eliminating the need to configure it in the attach_dev path. Remove the
redundant code from the attach_dev path to avoid dead code.

Signed-off-by: Lu Baolu <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
---
drivers/iommu/intel/iommu.c | 156 ++++++++++--------------------------
1 file changed, 44 insertions(+), 112 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 9b96d36b9d2a..d682eb6ad4d2 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -1850,34 +1850,17 @@ static void domain_exit(struct dmar_domain *domain)
kfree(domain);
}

-/*
- * Get the PASID directory size for scalable mode context entry.
- * Value of X in the PDTS field of a scalable mode context entry
- * indicates PASID directory with 2^(X + 7) entries.
- */
-static unsigned long context_get_sm_pds(struct pasid_table *table)
-{
- unsigned long pds, max_pde;
-
- max_pde = table->max_pasid >> PASID_PDE_SHIFT;
- pds = find_first_bit(&max_pde, MAX_NR_PASID_BITS);
- if (pds < 7)
- return 0;
-
- return pds - 7;
-}
-
static int domain_context_mapping_one(struct dmar_domain *domain,
struct intel_iommu *iommu,
- struct pasid_table *table,
u8 bus, u8 devfn)
{
struct device_domain_info *info =
domain_lookup_dev_info(domain, iommu, bus, devfn);
u16 did = domain_id_iommu(domain, iommu);
int translation = CONTEXT_TT_MULTI_LEVEL;
+ struct dma_pte *pgd = domain->pgd;
struct context_entry *context;
- int ret;
+ int agaw, ret;

if (hw_pass_through && domain_type_is_si(domain))
translation = CONTEXT_TT_PASS_THROUGH;
@@ -1920,65 +1903,37 @@ static int domain_context_mapping_one(struct dmar_domain *domain,
}

context_clear_entry(context);
+ context_set_domain_id(context, did);

- if (sm_supported(iommu)) {
- unsigned long pds;
-
- /* Setup the PASID DIR pointer: */
- pds = context_get_sm_pds(table);
- context->lo = (u64)virt_to_phys(table->table) |
- context_pdts(pds);
-
- /* Setup the RID_PASID field: */
- context_set_sm_rid2pasid(context, IOMMU_NO_PASID);
-
+ if (translation != CONTEXT_TT_PASS_THROUGH) {
/*
- * Setup the Device-TLB enable bit and Page request
- * Enable bit:
+ * Skip top levels of page tables for iommu which has
+ * less agaw than default. Unnecessary for PT mode.
*/
+ for (agaw = domain->agaw; agaw > iommu->agaw; agaw--) {
+ ret = -ENOMEM;
+ pgd = phys_to_virt(dma_pte_addr(pgd));
+ if (!dma_pte_present(pgd))
+ goto out_unlock;
+ }
+
if (info && info->ats_supported)
- context_set_sm_dte(context);
- if (info && info->pri_supported)
- context_set_sm_pre(context);
- if (info && info->pasid_supported)
- context_set_pasid(context);
+ translation = CONTEXT_TT_DEV_IOTLB;
+ else
+ translation = CONTEXT_TT_MULTI_LEVEL;
+
+ context_set_address_root(context, virt_to_phys(pgd));
+ context_set_address_width(context, agaw);
} else {
- struct dma_pte *pgd = domain->pgd;
- int agaw;
-
- context_set_domain_id(context, did);
-
- if (translation != CONTEXT_TT_PASS_THROUGH) {
- /*
- * Skip top levels of page tables for iommu which has
- * less agaw than default. Unnecessary for PT mode.
- */
- for (agaw = domain->agaw; agaw > iommu->agaw; agaw--) {
- ret = -ENOMEM;
- pgd = phys_to_virt(dma_pte_addr(pgd));
- if (!dma_pte_present(pgd))
- goto out_unlock;
- }
-
- if (info && info->ats_supported)
- translation = CONTEXT_TT_DEV_IOTLB;
- else
- translation = CONTEXT_TT_MULTI_LEVEL;
-
- context_set_address_root(context, virt_to_phys(pgd));
- context_set_address_width(context, agaw);
- } else {
- /*
- * In pass through mode, AW must be programmed to
- * indicate the largest AGAW value supported by
- * hardware. And ASR is ignored by hardware.
- */
- context_set_address_width(context, iommu->msagaw);
- }
-
- context_set_translation_type(context, translation);
+ /*
+ * In pass through mode, AW must be programmed to
+ * indicate the largest AGAW value supported by
+ * hardware. And ASR is ignored by hardware.
+ */
+ context_set_address_width(context, iommu->msagaw);
}

+ context_set_translation_type(context, translation);
context_set_fault_enable(context);
context_set_present(context);
if (!ecap_coherent(iommu->ecap))
@@ -2008,43 +1963,29 @@ static int domain_context_mapping_one(struct dmar_domain *domain,
return ret;
}

-struct domain_context_mapping_data {
- struct dmar_domain *domain;
- struct intel_iommu *iommu;
- struct pasid_table *table;
-};
-
static int domain_context_mapping_cb(struct pci_dev *pdev,
u16 alias, void *opaque)
{
- struct domain_context_mapping_data *data = opaque;
+ struct device_domain_info *info = dev_iommu_priv_get(&pdev->dev);
+ struct intel_iommu *iommu = info->iommu;
+ struct dmar_domain *domain = opaque;

- return domain_context_mapping_one(data->domain, data->iommu,
- data->table, PCI_BUS_NUM(alias),
- alias & 0xff);
+ return domain_context_mapping_one(domain, iommu,
+ PCI_BUS_NUM(alias), alias & 0xff);
}

static int
domain_context_mapping(struct dmar_domain *domain, struct device *dev)
{
struct device_domain_info *info = dev_iommu_priv_get(dev);
- struct domain_context_mapping_data data;
struct intel_iommu *iommu = info->iommu;
u8 bus = info->bus, devfn = info->devfn;
- struct pasid_table *table;
-
- table = intel_pasid_get_table(dev);

if (!dev_is_pci(dev))
- return domain_context_mapping_one(domain, iommu, table,
- bus, devfn);
-
- data.domain = domain;
- data.iommu = iommu;
- data.table = table;
+ return domain_context_mapping_one(domain, iommu, bus, devfn);

return pci_for_each_dma_alias(to_pci_dev(dev),
- &domain_context_mapping_cb, &data);
+ domain_context_mapping_cb, domain);
}

/* Returns a number of VTD pages, but aligned to MM page size */
@@ -2404,28 +2345,19 @@ static int dmar_domain_attach_device(struct dmar_domain *domain,
list_add(&info->link, &domain->devices);
spin_unlock_irqrestore(&domain->lock, flags);

- /* PASID table is mandatory for a PCI device in scalable mode. */
- if (sm_supported(iommu) && !dev_is_real_dma_subdevice(dev)) {
- /* Setup the PASID entry for requests without PASID: */
- if (hw_pass_through && domain_type_is_si(domain))
- ret = intel_pasid_setup_pass_through(iommu,
- dev, IOMMU_NO_PASID);
- else if (domain->use_first_level)
- ret = domain_setup_first_level(iommu, domain, dev,
- IOMMU_NO_PASID);
- else
- ret = intel_pasid_setup_second_level(iommu, domain,
- dev, IOMMU_NO_PASID);
- if (ret) {
- dev_err(dev, "Setup RID2PASID failed\n");
- device_block_translation(dev);
- return ret;
- }
- }
+ if (dev_is_real_dma_subdevice(dev))
+ return 0;
+
+ if (!sm_supported(iommu))
+ ret = domain_context_mapping(domain, dev);
+ else if (hw_pass_through && domain_type_is_si(domain))
+ ret = intel_pasid_setup_pass_through(iommu, dev, IOMMU_NO_PASID);
+ else if (domain->use_first_level)
+ ret = domain_setup_first_level(iommu, domain, dev, IOMMU_NO_PASID);
+ else
+ ret = intel_pasid_setup_second_level(iommu, domain, dev, IOMMU_NO_PASID);

- ret = domain_context_mapping(domain, dev);
if (ret) {
- dev_err(dev, "Domain context map failed\n");
device_block_translation(dev);
return ret;
}
--
2.34.1


2024-03-05 01:49:31

by Tian, Kevin

Subject: RE: [PATCH v3 2/5] iommu/vt-d: Fix NULL domain on device release

> From: Lu Baolu <[email protected]>
> Sent: Tuesday, March 5, 2024 9:33 AM
>
> In the kdump kernel, the IOMMU operates in deferred_attach mode. In this
> mode, info->domain may not yet be assigned by the time the release_device
> function is called. It leads to the following crash in the crash kernel:
>
> BUG: kernel NULL pointer dereference, address: 000000000000003c
> ...
> RIP: 0010:do_raw_spin_lock+0xa/0xa0
> ...
> _raw_spin_lock_irqsave+0x1b/0x30
> intel_iommu_release_device+0x96/0x170
> iommu_deinit_device+0x39/0xf0
> __iommu_group_remove_device+0xa0/0xd0
> iommu_bus_notifier+0x55/0xb0
> notifier_call_chain+0x5a/0xd0
> blocking_notifier_call_chain+0x41/0x60
> bus_notify+0x34/0x50
> device_del+0x269/0x3d0
> pci_remove_bus_device+0x77/0x100
> p2sb_bar+0xae/0x1d0
> ...
> i801_probe+0x423/0x740
>
> Use the release_domain mechanism to fix it. The scalable mode context
> entry, which is not part of the release_domain, should be cleared in
> release_device().
>
> Fixes: 586081d3f6b1 ("iommu/vt-d: Remove DEFER_DEVICE_DOMAIN_INFO")
> Reported-by: Eric Badger <[email protected]>
> Closes: https://lore.kernel.org/r/[email protected]
> Signed-off-by: Lu Baolu <[email protected]>

Reviewed-by: Kevin Tian <[email protected]>

2024-03-05 01:50:24

by Tian, Kevin

Subject: RE: [PATCH v3 3/5] iommu/vt-d: Setup scalable mode context entry in probe path

> From: Lu Baolu <[email protected]>
> Sent: Tuesday, March 5, 2024 9:33 AM
>
> In contrast to legacy mode, scalable mode configures the DMA translation
> table in the PASID table entry rather than the context entry. For this
> reason, it is more appropriate to set up the scalable mode context entry
> in the device_probe callback and direct it to the appropriate PASID
> table.
>
> The iommu domain attach/detach operations only affect the PASID table
> entry. Therefore, there is no need to modify the context entry when
> configuring the translation type and page table.
>
> The only exception is the kdump case, where context entry setup is
> postponed until the device driver invokes the first DMA interface.
>
> Signed-off-by: Lu Baolu <[email protected]>

Reviewed-by: Kevin Tian <[email protected]>

2024-03-05 02:42:15

by Baolu Lu

Subject: [PATCH v3 5/5] iommu/vt-d: Remove scalable mode in domain_context_clear_one()

domain_context_clear_one() now only handles context entry teardown in
legacy mode. Remove the scalable mode checks from it to avoid dead code.

Also remove the unnecessary NULL check on iommu.

Signed-off-by: Lu Baolu <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
---
drivers/iommu/intel/iommu.c | 15 +--------------
1 file changed, 1 insertion(+), 14 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index d682eb6ad4d2..50eb9aed47cc 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -2175,9 +2175,6 @@ static void domain_context_clear_one(struct device_domain_info *info, u8 bus, u8 devfn)
struct context_entry *context;
u16 did_old;

- if (!iommu)
- return;
-
spin_lock(&iommu->lock);
context = iommu_context_addr(iommu, bus, devfn, 0);
if (!context) {
@@ -2185,14 +2182,7 @@ static void domain_context_clear_one(struct device_domain_info *info, u8 bus, u8 devfn)
return;
}

- if (sm_supported(iommu)) {
- if (hw_pass_through && domain_type_is_si(info->domain))
- did_old = FLPT_DEFAULT_DID;
- else
- did_old = domain_id_iommu(info->domain, iommu);
- } else {
- did_old = context_domain_id(context);
- }
+ did_old = context_domain_id(context);

context_clear_entry(context);
__iommu_flush_cache(iommu, context, sizeof(*context));
@@ -2203,9 +2193,6 @@ static void domain_context_clear_one(struct device_domain_info *info, u8 bus, u8 devfn)
DMA_CCMD_MASK_NOBIT,
DMA_CCMD_DEVICE_INVL);

- if (sm_supported(iommu))
- qi_flush_pasid_cache(iommu, did_old, QI_PC_ALL_PASIDS, 0);
-
iommu->flush.flush_iotlb(iommu,
did_old,
0,
--
2.34.1


2024-03-05 07:56:18

by Joerg Roedel

Subject: Re: [PATCH v3 0/5] iommu: Fix domain check on release

Hi Baolu,

On Tue, Mar 05, 2024 at 09:33:00AM +0800, Lu Baolu wrote:
> It fixes a NULL pointer dereference issue in the Intel iommu driver and
> strengthens the iommu core to possibly prevent similar failures in other
> iommu drivers.

Please send me another pull request once you consider this ready. I
guess it is v6.9 material.

Regards,

Joerg

2024-03-05 12:02:09

by Baolu Lu

Subject: Re: [PATCH v3 0/5] iommu: Fix domain check on release

On 2024/3/5 15:54, Joerg Roedel wrote:
> On Tue, Mar 05, 2024 at 09:33:00AM +0800, Lu Baolu wrote:
>> It fixes a NULL pointer dereference issue in the Intel iommu driver and
>> strengthens the iommu core to possibly prevent similar failures in other
>> iommu drivers.
> Please send me another pull request once you consider this ready. I
> guess it is v6.9 material.

Sure. I will queue this series in a pull request.

Best regards,
baolu

2024-04-10 15:57:38

by Jason Gunthorpe

Subject: Re: [PATCH v3 1/5] iommu: Add static iommu_ops->release_domain

On Tue, Mar 05, 2024 at 09:33:01AM +0800, Lu Baolu wrote:
> The current device_release callback for individual iommu drivers does the
> following:
>
> 1) Silencing IOMMU DMA translation: It detaches any existing domain from
> the device and puts the device into a blocking state (some drivers might
> use the identity state).
> 2) Resource release: It releases resources allocated during the
> device_probe callback and restores the device to its pre-probe state.
>
> Step 1 is challenging for individual iommu drivers because each must check
> if a domain is already attached to the device. Additionally, if a deferred
> attach never occurred, the device_release should avoid modifying hardware
> configuration regardless of the reason for its call.
>
> To simplify this process, introduce a static release_domain within the
> iommu_ops structure. It can be either a blocking or identity domain
> depending on the iommu hardware. The iommu core will decide whether to
> attach this domain before the device_release callback, eliminating the
> need for repetitive code in various drivers.
>
> Consequently, the device_release callback can focus solely on the opposite
> operations of device_probe, including releasing all resources allocated
> during that callback.
>
> Co-developed-by: Jason Gunthorpe <[email protected]>
> Signed-off-by: Jason Gunthorpe <[email protected]>
> Signed-off-by: Lu Baolu <[email protected]>
> Reviewed-by: Kevin Tian <[email protected]>
> ---
> include/linux/iommu.h | 1 +
> drivers/iommu/iommu.c | 19 +++++++++++++++----
> 2 files changed, 16 insertions(+), 4 deletions(-)

I looked at all the drivers:
1) Didn't spend time to guess what ipmmu-vmsa is doing
2) Several drivers are obviously missing the release_domain behavior
and can now be trivially fixed
3) amd, s390, virtio-iommu and arm-smmu should probably support
blocked_domain (I assume that is what their detach does)
4) arm-smmuv3 can use it too once disable_bypass is removed
5) Several drivers don't even have release_device functions.
Probably all of them can do their identity domains too.

This seems like a pretty reasonable thing to add to this series too:

diff --git a/drivers/iommu/apple-dart.c b/drivers/iommu/apple-dart.c
index eb1e62cd499a58..3ddc4b4418a049 100644
--- a/drivers/iommu/apple-dart.c
+++ b/drivers/iommu/apple-dart.c
@@ -979,6 +979,7 @@ static void apple_dart_get_resv_regions(struct device *dev,
static const struct iommu_ops apple_dart_iommu_ops = {
.identity_domain = &apple_dart_identity_domain,
.blocked_domain = &apple_dart_blocked_domain,
+ .release_domain = &apple_dart_blocked_domain,
.domain_alloc_paging = apple_dart_domain_alloc_paging,
.probe_device = apple_dart_probe_device,
.release_device = apple_dart_release_device,
diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c
index d98c9161948a25..902dc4da44f987 100644
--- a/drivers/iommu/exynos-iommu.c
+++ b/drivers/iommu/exynos-iommu.c
@@ -1424,8 +1424,6 @@ static void exynos_iommu_release_device(struct device *dev)
struct exynos_iommu_owner *owner = dev_iommu_priv_get(dev);
struct sysmmu_drvdata *data;

- WARN_ON(exynos_iommu_identity_attach(&exynos_identity_domain, dev));
-
list_for_each_entry(data, &owner->controllers, owner_node)
device_link_del(data->link);
}
@@ -1471,6 +1469,7 @@ static int exynos_iommu_of_xlate(struct device *dev,

static const struct iommu_ops exynos_iommu_ops = {
.identity_domain = &exynos_identity_domain,
+ .release_domain = &exynos_identity_domain,
.domain_alloc_paging = exynos_iommu_domain_alloc_paging,
.device_group = generic_device_group,
.probe_device = exynos_iommu_probe_device,
diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index b8c47f18bc2612..b5693041b18dd4 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -1012,6 +1012,7 @@ static void mtk_iommu_get_resv_regions(struct device *dev,

static const struct iommu_ops mtk_iommu_ops = {
.identity_domain = &mtk_iommu_identity_domain,
+ .release_domain = &mtk_iommu_identity_domain,
.domain_alloc_paging = mtk_iommu_domain_alloc_paging,
.probe_device = mtk_iommu_probe_device,
.release_device = mtk_iommu_release_device,
diff --git a/drivers/iommu/mtk_iommu_v1.c b/drivers/iommu/mtk_iommu_v1.c
index a9fa2a54dc9b39..9e7205af7d7316 100644
--- a/drivers/iommu/mtk_iommu_v1.c
+++ b/drivers/iommu/mtk_iommu_v1.c
@@ -580,6 +580,7 @@ static int mtk_iommu_v1_hw_init(const struct mtk_iommu_v1_data *data)

static const struct iommu_ops mtk_iommu_v1_ops = {
.identity_domain = &mtk_iommu_v1_identity_domain,
+ .release_domain = &mtk_iommu_v1_identity_domain,
.domain_alloc_paging = mtk_iommu_v1_domain_alloc_paging,
.probe_device = mtk_iommu_v1_probe_device,
.probe_finalize = mtk_iommu_v1_probe_finalize,
diff --git a/drivers/iommu/omap-iommu.c b/drivers/iommu/omap-iommu.c
index c9528065a59afa..c4c76aaec19e50 100644
--- a/drivers/iommu/omap-iommu.c
+++ b/drivers/iommu/omap-iommu.c
@@ -1725,6 +1725,7 @@ static void omap_iommu_release_device(struct device *dev)

static const struct iommu_ops omap_iommu_ops = {
.identity_domain = &omap_iommu_identity_domain,
+ .release_domain = &omap_iommu_identity_domain,
.domain_alloc_paging = omap_iommu_domain_alloc_paging,
.probe_device = omap_iommu_probe_device,
.release_device = omap_iommu_release_device,
diff --git a/drivers/iommu/rockchip-iommu.c b/drivers/iommu/rockchip-iommu.c
index da79d9f4cf6371..e551c5b143bbd3 100644
--- a/drivers/iommu/rockchip-iommu.c
+++ b/drivers/iommu/rockchip-iommu.c
@@ -1162,6 +1162,7 @@ static int rk_iommu_of_xlate(struct device *dev,

static const struct iommu_ops rk_iommu_ops = {
.identity_domain = &rk_identity_domain,
+ .release_domain = &rk_identity_domain,
.domain_alloc_paging = rk_iommu_domain_alloc_paging,
.probe_device = rk_iommu_probe_device,
.release_device = rk_iommu_release_device,

> + if (!dev->iommu->attach_deferred && ops->release_domain)
> + ops->release_domain->ops->attach_dev(ops->release_domain, dev);

We should probably be sensitive to the
dev->iommu->require_direct flag - generally drivers should prefer the
blocked domain for the release domain, but in case the FW is asking for
require_direct we need to switch to identity.

Also, may as well avoid switching a domain if the group is already
correct and use the common attach function to get the tracing.. So
like this?

if (!dev->iommu->attach_deferred) {
struct iommu_domain *release_domain = ops->release_domain;

if (dev->iommu->require_direct && ops->identity_domain)
release_domain = ops->identity_domain;

if (release_domain && group->domain != release_domain)
WARN_ON(__iommu_attach_device(release_domain, dev));
}

Jason

2024-04-10 16:37:53

by Robin Murphy

Subject: Re: [PATCH v3 1/5] iommu: Add static iommu_ops->release_domain

On 2024-04-10 4:26 pm, Jason Gunthorpe wrote:
> On Tue, Mar 05, 2024 at 09:33:01AM +0800, Lu Baolu wrote:
>> The current device_release callback for individual iommu drivers does the
>> following:
>>
>> 1) Silencing IOMMU DMA translation: It detaches any existing domain from
>> the device and puts the device into a blocking state (some drivers might
>> use the identity state).
>> 2) Resource release: It releases resources allocated during the
>> device_probe callback and restores the device to its pre-probe state.
>>
>> Step 1 is challenging for individual iommu drivers because each must check
>> if a domain is already attached to the device. Additionally, if a deferred
>> attach never occurred, the device_release should avoid modifying hardware
>> configuration regardless of the reason for its call.
>>
>> To simplify this process, introduce a static release_domain within the
>> iommu_ops structure. It can be either a blocking or identity domain
>> depending on the iommu hardware. The iommu core will decide whether to
>> attach this domain before the device_release callback, eliminating the
>> need for repetitive code in various drivers.
>>
>> Consequently, the device_release callback can focus solely on the opposite
>> operations of device_probe, including releasing all resources allocated
>> during that callback.
>>
>> Co-developed-by: Jason Gunthorpe <[email protected]>
>> Signed-off-by: Jason Gunthorpe <[email protected]>
>> Signed-off-by: Lu Baolu <[email protected]>
>> Reviewed-by: Kevin Tian <[email protected]>
>> ---
>> include/linux/iommu.h | 1 +
>> drivers/iommu/iommu.c | 19 +++++++++++++++----
>> 2 files changed, 16 insertions(+), 4 deletions(-)
>
> I looked at all the drivers:
> 1) Didn't spend time to guess what ipmmu-vmsa is doing
> 2) Several drivers are obviously missing the release_domain behavior
> and can now be trivially fixed
> 3) amd, s390, virtio-iommu and arm-smmu should probably support
> blocked_domain (I assume that is what their detach does)
> 4) arm-smmuv3 can use it too once disable_bypass is removed
> 5) Several drivers don't even have release_device functions.
> Probably all of them can do their identity domains too.
>
> This seems like a pretty reasonable thing to add to this series too:
>
> diff --git a/drivers/iommu/apple-dart.c b/drivers/iommu/apple-dart.c
> index eb1e62cd499a58..3ddc4b4418a049 100644
> --- a/drivers/iommu/apple-dart.c
> +++ b/drivers/iommu/apple-dart.c
> @@ -979,6 +979,7 @@ static void apple_dart_get_resv_regions(struct device *dev,
> static const struct iommu_ops apple_dart_iommu_ops = {
> .identity_domain = &apple_dart_identity_domain,
> .blocked_domain = &apple_dart_blocked_domain,
> + .release_domain = &apple_dart_blocked_domain,
> .domain_alloc_paging = apple_dart_domain_alloc_paging,
> .probe_device = apple_dart_probe_device,
> .release_device = apple_dart_release_device,
> diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c
> index d98c9161948a25..902dc4da44f987 100644
> --- a/drivers/iommu/exynos-iommu.c
> +++ b/drivers/iommu/exynos-iommu.c
> @@ -1424,8 +1424,6 @@ static void exynos_iommu_release_device(struct device *dev)
> struct exynos_iommu_owner *owner = dev_iommu_priv_get(dev);
> struct sysmmu_drvdata *data;
>
> - WARN_ON(exynos_iommu_identity_attach(&exynos_identity_domain, dev));
> -
> list_for_each_entry(data, &owner->controllers, owner_node)
> device_link_del(data->link);
> }
> @@ -1471,6 +1469,7 @@ static int exynos_iommu_of_xlate(struct device *dev,
>
> static const struct iommu_ops exynos_iommu_ops = {
> .identity_domain = &exynos_identity_domain,
> + .release_domain = &exynos_identity_domain,
> .domain_alloc_paging = exynos_iommu_domain_alloc_paging,
> .device_group = generic_device_group,
> .probe_device = exynos_iommu_probe_device,
> diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
> index b8c47f18bc2612..b5693041b18dd4 100644
> --- a/drivers/iommu/mtk_iommu.c
> +++ b/drivers/iommu/mtk_iommu.c
> @@ -1012,6 +1012,7 @@ static void mtk_iommu_get_resv_regions(struct device *dev,
>
> static const struct iommu_ops mtk_iommu_ops = {
> .identity_domain = &mtk_iommu_identity_domain,
> + .release_domain = &mtk_iommu_identity_domain,
> .domain_alloc_paging = mtk_iommu_domain_alloc_paging,
> .probe_device = mtk_iommu_probe_device,
> .release_device = mtk_iommu_release_device,
> diff --git a/drivers/iommu/mtk_iommu_v1.c b/drivers/iommu/mtk_iommu_v1.c
> index a9fa2a54dc9b39..9e7205af7d7316 100644
> --- a/drivers/iommu/mtk_iommu_v1.c
> +++ b/drivers/iommu/mtk_iommu_v1.c
> @@ -580,6 +580,7 @@ static int mtk_iommu_v1_hw_init(const struct mtk_iommu_v1_data *data)
>
> static const struct iommu_ops mtk_iommu_v1_ops = {
> .identity_domain = &mtk_iommu_v1_identity_domain,
> + .release_domain = &mtk_iommu_v1_identity_domain,
> .domain_alloc_paging = mtk_iommu_v1_domain_alloc_paging,
> .probe_device = mtk_iommu_v1_probe_device,
> .probe_finalize = mtk_iommu_v1_probe_finalize,
> diff --git a/drivers/iommu/omap-iommu.c b/drivers/iommu/omap-iommu.c
> index c9528065a59afa..c4c76aaec19e50 100644
> --- a/drivers/iommu/omap-iommu.c
> +++ b/drivers/iommu/omap-iommu.c
> @@ -1725,6 +1725,7 @@ static void omap_iommu_release_device(struct device *dev)
>
> static const struct iommu_ops omap_iommu_ops = {
> .identity_domain = &omap_iommu_identity_domain,
> + .release_domain = &omap_iommu_identity_domain,
> .domain_alloc_paging = omap_iommu_domain_alloc_paging,
> .probe_device = omap_iommu_probe_device,
> .release_device = omap_iommu_release_device,
> diff --git a/drivers/iommu/rockchip-iommu.c b/drivers/iommu/rockchip-iommu.c
> index da79d9f4cf6371..e551c5b143bbd3 100644
> --- a/drivers/iommu/rockchip-iommu.c
> +++ b/drivers/iommu/rockchip-iommu.c
> @@ -1162,6 +1162,7 @@ static int rk_iommu_of_xlate(struct device *dev,
>
> static const struct iommu_ops rk_iommu_ops = {
> .identity_domain = &rk_identity_domain,
> + .release_domain = &rk_identity_domain,
> .domain_alloc_paging = rk_iommu_domain_alloc_paging,
> .probe_device = rk_iommu_probe_device,
> .release_device = rk_iommu_release_device,
>
>> + if (!dev->iommu->attach_deferred && ops->release_domain)
>> + ops->release_domain->ops->attach_dev(ops->release_domain, dev);
>
> > We should probably be sensitive to the
> > dev->iommu->require_direct flag - generally drivers should prefer the
> > blocked domain for the release domain, but in case the FW is asking for
> > require_direct we need to switch to identity.

At this point do we even need release_domain? It sounds like the logic
you want to enforce is going to be trivial to resolve directly in the
core code. As typed-in-mail-client pseudocode, roughly:

static void iommu_set_release_domain(struct device *dev)
{
const struct iommu_ops *ops = dev_iommu_ops(dev);
struct iommu_domain *rd;

/*
* Static domains are expected not to track any device state,
* and thus be tolerant of devices disappearing once "attached"
*/
if (ops->blocked_domain && !(dev->iommu->require_direct ||
other_arch_or_platform_reason))
rd = ops->blocked_domain;
else if (ops->identity_domain)
rd = ops->identity_domain;
else /* Hope release_device does the right thing! */
return;

if (!dev->iommu->attach_deferred && rd != dev->iommu_group->domain)
__iommu_attach_device(rd, dev);
}

...no driver churn necessary. (And frankly I think it's a further bonus
to avoid risking any notion of release_domain being implementable as its
own distinct special thing.)

Thanks,
Robin.

>
> Also, may as well avoid switching a domain if the group is already
> correct and use the common attach function to get the tracing.. So
> like this?
>
> if (!dev->iommu->attach_deferred) {
> struct iommu_domain *release_domain = ops->release_domain;
>
> if (dev->iommu->require_direct && ops->identity_domain)
> release_domain = ops->identity_domain;
>
> if (release_domain && group->domain != release_domain)
> WARN_ON(__iommu_attach_device(release_domain, dev));
> }
>
> Jason

2024-04-11 13:47:25

by Jason Gunthorpe

Subject: Re: [PATCH v3 1/5] iommu: Add static iommu_ops->release_domain

On Wed, Apr 10, 2024 at 05:37:06PM +0100, Robin Murphy wrote:
> > We should probably be sensitive to the
> > dev->iommu->require_direct flag - generally drivers should prefer the
> > blocked domain for the release domain, but in case the FW is asking for
> > require_direct we need to switch to identity.
>
> At this point do we even need release_domain?

Ultimately, ideally not, but I feel better going through the exercise
driver-by-driver before we just make the core code do it
automatically. Maybe I'm being overly pessimistic about the drivers...

Have all the drivers that set an identity/blocked domain also set the
release domain before we switch to this unconditional method.

Anyhow, I just noticed it went into -rc1 already, so may as well keep
going.

> static void iommu_set_release_domain(struct device *dev)
> {
> const struct iommu_ops *ops = dev_iommu_ops(dev);
> struct iommu_domain *rd;
>
> /*
> * Static domains are expected not to track any device state,
> * and thus be tolerant of devices disappearing once "attached"
> */
> if (ops->blocked_domain && !(dev->iommu->require_direct ||
> other_arch_or_platform_reason))
> rd = ops->blocked_domain;
> else if (ops->identity_domain)
> rd = ops->identity_domain;
> else /* Hope release_device does the right thing! */
> return;
>
> if (!dev->iommu->attach_deferred && rd != dev->iommu_group->domain)
> __iommu_attach_device(rd, dev);
> }

Yeah, this is a good end goal.

Jason