2023-12-05 01:26:58

by Baolu Lu

[permalink] [raw]
Subject: [PATCH v2 0/6] iommu/vt-d: Convert to use static identity domain

Intel's IOMMU driver used a special domain called 1:1 mapping domain to
support the domain of type IOMMU_DOMAIN_IDENTITY, which enables device
drivers to directly utilize physical addresses for DMA access despite
the presence of IOMMU units.

The implementation of the 1:1 mapping domain is influenced by hardware
differences. While modern Intel VT-d implementations support hardware
passthrough translation mode, earlier versions lacked this feature,
which requires a more complex implementation approach.

The 1:1 mapping domain for earlier hardware was implemented by associating
a DMA domain with an IOVA (IO Virtual Address) equivalent to the
physical address. While, for most hardware supporting passthrough mode,
simply setting the hardware's passthrough mode is sufficient. These two
modes were merged together in si_domain, which is a special DMA domain
sharing the domain ops of an ordinary DMA domain.

As the iommu core has evolved, it has introduced global static identity
domain with "never fail" attach semantics. This means that the domain is
always available and cannot fail to attach. The iommu driver now assigns
this domain directly at iommu_ops->identity_domain instead of allocating
it through the domain allocation interface.

This converts the Intel IOMMU driver to embrace the global static
identity domain. For early legacy hardwares that don't support
passthrough translation mode, ask the iommu core to use a DMA type of
default domain. For modern hardwares that support passthrough
translation mode, implement a static global identity domain.

The whole series is also avaiable at

https://github.com/LuBaolu/intel-iommu/commits/vtd-static-identity-domain-v2

Very appreciated for your review comments and suggestions.

Change log:
v2:
- Re-orgnize the patches by removing 1:1 mappings before implementing
global static domain.

v1: https://lore.kernel.org/linux-iommu/[email protected]/

Lu Baolu (6):
iommu/vt-d: Setup scalable mode context entry in probe path
iommu/vt-d: Remove scalable mode context entry setup from attach_dev
iommu/vt-d: Refactor domain_context_mapping_one() to be reusable
iommu/vt-d: Remove 1:1 mappings from identity domain
iommu/vt-d: Add support for static identity domain
iommu/vt-d: Cleanup si_domain

drivers/iommu/intel/pasid.h | 1 +
drivers/iommu/intel/iommu.c | 568 +++++++++++++++---------------------
drivers/iommu/intel/pasid.c | 180 ++++++++++++
drivers/iommu/intel/svm.c | 2 +-
4 files changed, 415 insertions(+), 336 deletions(-)

--
2.34.1


2023-12-05 01:27:02

by Baolu Lu

[permalink] [raw]
Subject: [PATCH v2 3/6] iommu/vt-d: Refactor domain_context_mapping_one() to be reusable

Extract common code from domain_context_mapping_one() into new functions,
making it reusable by other functions such as the upcoming identity domain
implementation. No intentional functional changes.

Signed-off-by: Lu Baolu <[email protected]>
---
drivers/iommu/intel/iommu.c | 99 ++++++++++++++++++++++---------------
1 file changed, 58 insertions(+), 41 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index a324b3a3a005..605cd1c52e95 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -1727,6 +1727,61 @@ static void domain_exit(struct dmar_domain *domain)
kfree(domain);
}

+/*
+ * For kdump cases, old valid entries may be cached due to the
+ * in-flight DMA and copied pgtable, but there is no unmapping
+ * behaviour for them, thus we need an explicit cache flush for
+ * the newly-mapped device. For kdump, at this point, the device
+ * is supposed to finish reset at its driver probe stage, so no
+ * in-flight DMA will exist, and we don't need to worry anymore
+ * hereafter.
+ */
+static void copied_context_tear_down(struct intel_iommu *iommu,
+ struct context_entry *context,
+ u8 bus, u8 devfn)
+{
+ u16 did_old;
+
+ if (!context_copied(iommu, bus, devfn))
+ return;
+
+ assert_spin_locked(&iommu->lock);
+
+ did_old = context_domain_id(context);
+ context_clear_entry(context);
+
+ if (did_old < cap_ndoms(iommu->cap)) {
+ iommu->flush.flush_context(iommu, did_old,
+ (((u16)bus) << 8) | devfn,
+ DMA_CCMD_MASK_NOBIT,
+ DMA_CCMD_DEVICE_INVL);
+ iommu->flush.flush_iotlb(iommu, did_old, 0, 0,
+ DMA_TLB_DSI_FLUSH);
+ }
+
+ clear_context_copied(iommu, bus, devfn);
+}
+
+/*
+ * It's a non-present to present mapping. If hardware doesn't cache
+ * non-present entry we only need to flush the write-buffer. If the
+ * _does_ cache non-present entries, then it does so in the special
+ * domain #0, which we have to flush:
+ */
+static void context_present_cache_flush(struct intel_iommu *iommu, u16 did,
+ u8 bus, u8 devfn)
+{
+ if (cap_caching_mode(iommu->cap)) {
+ iommu->flush.flush_context(iommu, 0,
+ (((u16)bus) << 8) | devfn,
+ DMA_CCMD_MASK_NOBIT,
+ DMA_CCMD_DEVICE_INVL);
+ iommu->flush.flush_iotlb(iommu, did, 0, 0, DMA_TLB_DSI_FLUSH);
+ } else {
+ iommu_flush_write_buffer(iommu);
+ }
+}
+
static int domain_context_mapping_one(struct dmar_domain *domain,
struct intel_iommu *iommu,
u8 bus, u8 devfn)
@@ -1755,31 +1810,9 @@ static int domain_context_mapping_one(struct dmar_domain *domain,
if (context_present(context) && !context_copied(iommu, bus, devfn))
goto out_unlock;

- /*
- * For kdump cases, old valid entries may be cached due to the
- * in-flight DMA and copied pgtable, but there is no unmapping
- * behaviour for them, thus we need an explicit cache flush for
- * the newly-mapped device. For kdump, at this point, the device
- * is supposed to finish reset at its driver probe stage, so no
- * in-flight DMA will exist, and we don't need to worry anymore
- * hereafter.
- */
- if (context_copied(iommu, bus, devfn)) {
- u16 did_old = context_domain_id(context);
-
- if (did_old < cap_ndoms(iommu->cap)) {
- iommu->flush.flush_context(iommu, did_old,
- (((u16)bus) << 8) | devfn,
- DMA_CCMD_MASK_NOBIT,
- DMA_CCMD_DEVICE_INVL);
- iommu->flush.flush_iotlb(iommu, did_old, 0, 0,
- DMA_TLB_DSI_FLUSH);
- }
-
- clear_context_copied(iommu, bus, devfn);
- }
-
+ copied_context_tear_down(iommu, context, bus, devfn);
context_clear_entry(context);
+
context_set_domain_id(context, did);

if (translation != CONTEXT_TT_PASS_THROUGH) {
@@ -1815,23 +1848,7 @@ static int domain_context_mapping_one(struct dmar_domain *domain,
context_set_present(context);
if (!ecap_coherent(iommu->ecap))
clflush_cache_range(context, sizeof(*context));
-
- /*
- * It's a non-present to present mapping. If hardware doesn't cache
- * non-present entry we only need to flush the write-buffer. If the
- * _does_ cache non-present entries, then it does so in the special
- * domain #0, which we have to flush:
- */
- if (cap_caching_mode(iommu->cap)) {
- iommu->flush.flush_context(iommu, 0,
- (((u16)bus) << 8) | devfn,
- DMA_CCMD_MASK_NOBIT,
- DMA_CCMD_DEVICE_INVL);
- iommu->flush.flush_iotlb(iommu, did, 0, 0, DMA_TLB_DSI_FLUSH);
- } else {
- iommu_flush_write_buffer(iommu);
- }
-
+ context_present_cache_flush(iommu, did, bus, devfn);
ret = 0;

out_unlock:
--
2.34.1

2023-12-05 01:27:21

by Baolu Lu

[permalink] [raw]
Subject: [PATCH v2 2/6] iommu/vt-d: Remove scalable mode context entry setup from attach_dev

The scalable mode context entry is now setup in the probe_device path,
eliminating the need to configure it in the attach_dev path. Removes the
redundant code from the attach_dev path to avoid dead code.

Signed-off-by: Lu Baolu <[email protected]>
---
drivers/iommu/intel/iommu.c | 156 ++++++++++--------------------------
1 file changed, 44 insertions(+), 112 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 9dc005031dd2..a324b3a3a005 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -1727,34 +1727,17 @@ static void domain_exit(struct dmar_domain *domain)
kfree(domain);
}

-/*
- * Get the PASID directory size for scalable mode context entry.
- * Value of X in the PDTS field of a scalable mode context entry
- * indicates PASID directory with 2^(X + 7) entries.
- */
-static unsigned long context_get_sm_pds(struct pasid_table *table)
-{
- unsigned long pds, max_pde;
-
- max_pde = table->max_pasid >> PASID_PDE_SHIFT;
- pds = find_first_bit(&max_pde, MAX_NR_PASID_BITS);
- if (pds < 7)
- return 0;
-
- return pds - 7;
-}
-
static int domain_context_mapping_one(struct dmar_domain *domain,
struct intel_iommu *iommu,
- struct pasid_table *table,
u8 bus, u8 devfn)
{
struct device_domain_info *info =
domain_lookup_dev_info(domain, iommu, bus, devfn);
u16 did = domain_id_iommu(domain, iommu);
int translation = CONTEXT_TT_MULTI_LEVEL;
+ struct dma_pte *pgd = domain->pgd;
struct context_entry *context;
- int ret;
+ int agaw, ret;

if (hw_pass_through && domain_type_is_si(domain))
translation = CONTEXT_TT_PASS_THROUGH;
@@ -1797,65 +1780,37 @@ static int domain_context_mapping_one(struct dmar_domain *domain,
}

context_clear_entry(context);
+ context_set_domain_id(context, did);

- if (sm_supported(iommu)) {
- unsigned long pds;
-
- /* Setup the PASID DIR pointer: */
- pds = context_get_sm_pds(table);
- context->lo = (u64)virt_to_phys(table->table) |
- context_pdts(pds);
-
- /* Setup the RID_PASID field: */
- context_set_sm_rid2pasid(context, IOMMU_NO_PASID);
-
+ if (translation != CONTEXT_TT_PASS_THROUGH) {
/*
- * Setup the Device-TLB enable bit and Page request
- * Enable bit:
+ * Skip top levels of page tables for iommu which has
+ * less agaw than default. Unnecessary for PT mode.
*/
+ for (agaw = domain->agaw; agaw > iommu->agaw; agaw--) {
+ ret = -ENOMEM;
+ pgd = phys_to_virt(dma_pte_addr(pgd));
+ if (!dma_pte_present(pgd))
+ goto out_unlock;
+ }
+
if (info && info->ats_supported)
- context_set_sm_dte(context);
- if (info && info->pri_supported)
- context_set_sm_pre(context);
- if (info && info->pasid_supported)
- context_set_pasid(context);
+ translation = CONTEXT_TT_DEV_IOTLB;
+ else
+ translation = CONTEXT_TT_MULTI_LEVEL;
+
+ context_set_address_root(context, virt_to_phys(pgd));
+ context_set_address_width(context, agaw);
} else {
- struct dma_pte *pgd = domain->pgd;
- int agaw;
-
- context_set_domain_id(context, did);
-
- if (translation != CONTEXT_TT_PASS_THROUGH) {
- /*
- * Skip top levels of page tables for iommu which has
- * less agaw than default. Unnecessary for PT mode.
- */
- for (agaw = domain->agaw; agaw > iommu->agaw; agaw--) {
- ret = -ENOMEM;
- pgd = phys_to_virt(dma_pte_addr(pgd));
- if (!dma_pte_present(pgd))
- goto out_unlock;
- }
-
- if (info && info->ats_supported)
- translation = CONTEXT_TT_DEV_IOTLB;
- else
- translation = CONTEXT_TT_MULTI_LEVEL;
-
- context_set_address_root(context, virt_to_phys(pgd));
- context_set_address_width(context, agaw);
- } else {
- /*
- * In pass through mode, AW must be programmed to
- * indicate the largest AGAW value supported by
- * hardware. And ASR is ignored by hardware.
- */
- context_set_address_width(context, iommu->msagaw);
- }
-
- context_set_translation_type(context, translation);
+ /*
+ * In pass through mode, AW must be programmed to
+ * indicate the largest AGAW value supported by
+ * hardware. And ASR is ignored by hardware.
+ */
+ context_set_address_width(context, iommu->msagaw);
}

+ context_set_translation_type(context, translation);
context_set_fault_enable(context);
context_set_present(context);
if (!ecap_coherent(iommu->ecap))
@@ -1885,43 +1840,29 @@ static int domain_context_mapping_one(struct dmar_domain *domain,
return ret;
}

-struct domain_context_mapping_data {
- struct dmar_domain *domain;
- struct intel_iommu *iommu;
- struct pasid_table *table;
-};
-
static int domain_context_mapping_cb(struct pci_dev *pdev,
u16 alias, void *opaque)
{
- struct domain_context_mapping_data *data = opaque;
+ struct device_domain_info *info = dev_iommu_priv_get(&pdev->dev);
+ struct intel_iommu *iommu = info->iommu;
+ struct dmar_domain *domain = opaque;

- return domain_context_mapping_one(data->domain, data->iommu,
- data->table, PCI_BUS_NUM(alias),
- alias & 0xff);
+ return domain_context_mapping_one(domain, iommu,
+ PCI_BUS_NUM(alias), alias & 0xff);
}

static int
domain_context_mapping(struct dmar_domain *domain, struct device *dev)
{
struct device_domain_info *info = dev_iommu_priv_get(dev);
- struct domain_context_mapping_data data;
struct intel_iommu *iommu = info->iommu;
u8 bus = info->bus, devfn = info->devfn;
- struct pasid_table *table;
-
- table = intel_pasid_get_table(dev);

if (!dev_is_pci(dev))
- return domain_context_mapping_one(domain, iommu, table,
- bus, devfn);
-
- data.domain = domain;
- data.iommu = iommu;
- data.table = table;
+ return domain_context_mapping_one(domain, iommu, bus, devfn);

return pci_for_each_dma_alias(to_pci_dev(dev),
- &domain_context_mapping_cb, &data);
+ domain_context_mapping_cb, domain);
}

/* Returns a number of VTD pages, but aligned to MM page size */
@@ -2278,28 +2219,19 @@ static int dmar_domain_attach_device(struct dmar_domain *domain,
list_add(&info->link, &domain->devices);
spin_unlock_irqrestore(&domain->lock, flags);

- /* PASID table is mandatory for a PCI device in scalable mode. */
- if (sm_supported(iommu) && !dev_is_real_dma_subdevice(dev)) {
- /* Setup the PASID entry for requests without PASID: */
- if (hw_pass_through && domain_type_is_si(domain))
- ret = intel_pasid_setup_pass_through(iommu,
- dev, IOMMU_NO_PASID);
- else if (domain->use_first_level)
- ret = domain_setup_first_level(iommu, domain, dev,
- IOMMU_NO_PASID);
- else
- ret = intel_pasid_setup_second_level(iommu, domain,
- dev, IOMMU_NO_PASID);
- if (ret) {
- dev_err(dev, "Setup RID2PASID failed\n");
- device_block_translation(dev);
- return ret;
- }
- }
+ if (dev_is_real_dma_subdevice(dev))
+ return 0;
+
+ if (!sm_supported(iommu))
+ ret = domain_context_mapping(domain, dev);
+ else if (hw_pass_through && domain_type_is_si(domain))
+ ret = intel_pasid_setup_pass_through(iommu, dev, IOMMU_NO_PASID);
+ else if (domain->use_first_level)
+ ret = domain_setup_first_level(iommu, domain, dev, IOMMU_NO_PASID);
+ else
+ ret = intel_pasid_setup_second_level(iommu, domain, dev, IOMMU_NO_PASID);

- ret = domain_context_mapping(domain, dev);
if (ret) {
- dev_err(dev, "Domain context map failed\n");
device_block_translation(dev);
return ret;
}
--
2.34.1

2023-12-05 01:27:25

by Baolu Lu

[permalink] [raw]
Subject: [PATCH v2 4/6] iommu/vt-d: Remove 1:1 mappings from identity domain

Older VT-d hardware implementations did not support pass-through
translation mode. The iommu driver relied on a DMA domain with all
physical memory addresses identically mapped to the same IOVA to
simulate pass-through translation.

This workaround is no longer necessary due to the evolution of iommu
core. The core has introduced def_domain_type op, allowing the iommu
driver to specify its capabilities. Additionally, the identity domain
has become a static system domain with "never fail" attach semantics.

Eliminate support for the 1:1 mapping domain on older hardware and
removes the unused code that created the 1:1 page table. This paves a
way for the implementation of a global static identity domain.

Signed-off-by: Lu Baolu <[email protected]>
---
drivers/iommu/intel/iommu.c | 118 +++---------------------------------
1 file changed, 10 insertions(+), 108 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 605cd1c52e95..7022cc183120 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -2146,29 +2146,10 @@ static bool dev_is_real_dma_subdevice(struct device *dev)
pci_real_dma_dev(to_pci_dev(dev)) != to_pci_dev(dev);
}

-static int iommu_domain_identity_map(struct dmar_domain *domain,
- unsigned long first_vpfn,
- unsigned long last_vpfn)
-{
- /*
- * RMRR range might have overlap with physical memory range,
- * clear it first
- */
- dma_pte_clear_range(domain, first_vpfn, last_vpfn);
-
- return __domain_mapping(domain, first_vpfn,
- first_vpfn, last_vpfn - first_vpfn + 1,
- DMA_PTE_READ|DMA_PTE_WRITE, GFP_KERNEL);
-}
-
static int md_domain_init(struct dmar_domain *domain, int guest_width);

static int __init si_domain_init(int hw)
{
- struct dmar_rmrr_unit *rmrr;
- struct device *dev;
- int i, nid, ret;
-
si_domain = alloc_domain(IOMMU_DOMAIN_IDENTITY);
if (!si_domain)
return -EFAULT;
@@ -2179,44 +2160,6 @@ static int __init si_domain_init(int hw)
return -EFAULT;
}

- if (hw)
- return 0;
-
- for_each_online_node(nid) {
- unsigned long start_pfn, end_pfn;
- int i;
-
- for_each_mem_pfn_range(i, nid, &start_pfn, &end_pfn, NULL) {
- ret = iommu_domain_identity_map(si_domain,
- mm_to_dma_pfn_start(start_pfn),
- mm_to_dma_pfn_end(end_pfn));
- if (ret)
- return ret;
- }
- }
-
- /*
- * Identity map the RMRRs so that devices with RMRRs could also use
- * the si_domain.
- */
- for_each_rmrr_units(rmrr) {
- for_each_active_dev_scope(rmrr->devices, rmrr->devices_cnt,
- i, dev) {
- unsigned long long start = rmrr->base_address;
- unsigned long long end = rmrr->end_address;
-
- if (WARN_ON(end < start ||
- end >> agaw_to_width(si_domain->agaw)))
- continue;
-
- ret = iommu_domain_identity_map(si_domain,
- mm_to_dma_pfn_start(start >> PAGE_SHIFT),
- mm_to_dma_pfn_end(end >> PAGE_SHIFT));
- if (ret)
- return ret;
- }
- }
-
return 0;
}

@@ -2301,6 +2244,9 @@ static bool device_rmrr_is_relaxable(struct device *dev)
*/
static int device_def_domain_type(struct device *dev)
{
+ struct device_domain_info *info = dev_iommu_priv_get(dev);
+ struct intel_iommu *iommu = info->iommu;
+
if (dev_is_pci(dev)) {
struct pci_dev *pdev = to_pci_dev(dev);

@@ -2311,6 +2257,13 @@ static int device_def_domain_type(struct device *dev)
return IOMMU_DOMAIN_IDENTITY;
}

+ /*
+ * Hardware does not support the passthrough translation mode.
+ * Always use a dynamaic mapping domain.
+ */
+ if (!ecap_pass_through(iommu->ecap))
+ return IOMMU_DOMAIN_DMA;
+
return 0;
}

@@ -3301,52 +3254,6 @@ int dmar_iommu_notify_scope_dev(struct dmar_pci_notify_info *info)
return 0;
}

-static int intel_iommu_memory_notifier(struct notifier_block *nb,
- unsigned long val, void *v)
-{
- struct memory_notify *mhp = v;
- unsigned long start_vpfn = mm_to_dma_pfn_start(mhp->start_pfn);
- unsigned long last_vpfn = mm_to_dma_pfn_end(mhp->start_pfn +
- mhp->nr_pages - 1);
-
- switch (val) {
- case MEM_GOING_ONLINE:
- if (iommu_domain_identity_map(si_domain,
- start_vpfn, last_vpfn)) {
- pr_warn("Failed to build identity map for [%lx-%lx]\n",
- start_vpfn, last_vpfn);
- return NOTIFY_BAD;
- }
- break;
-
- case MEM_OFFLINE:
- case MEM_CANCEL_ONLINE:
- {
- struct dmar_drhd_unit *drhd;
- struct intel_iommu *iommu;
- LIST_HEAD(freelist);
-
- domain_unmap(si_domain, start_vpfn, last_vpfn, &freelist);
-
- rcu_read_lock();
- for_each_active_iommu(iommu, drhd)
- iommu_flush_iotlb_psi(iommu, si_domain,
- start_vpfn, mhp->nr_pages,
- list_empty(&freelist), 0);
- rcu_read_unlock();
- put_pages_list(&freelist);
- }
- break;
- }
-
- return NOTIFY_OK;
-}
-
-static struct notifier_block intel_iommu_memory_nb = {
- .notifier_call = intel_iommu_memory_notifier,
- .priority = 0
-};
-
static void intel_disable_iommus(void)
{
struct intel_iommu *iommu = NULL;
@@ -3643,12 +3550,7 @@ int __init intel_iommu_init(void)

iommu_pmu_register(iommu);
}
- up_read(&dmar_global_lock);

- if (si_domain && !hw_pass_through)
- register_memory_notifier(&intel_iommu_memory_nb);
-
- down_read(&dmar_global_lock);
if (probe_acpi_namespace_devices())
pr_warn("ACPI name space devices didn't probe correctly\n");

--
2.34.1

2023-12-05 01:27:39

by Baolu Lu

[permalink] [raw]
Subject: [PATCH v2 5/6] iommu/vt-d: Add support for static identity domain

Add a global static identity domain with guaranteed attach semantics.

Software determines VT-d hardware support for pass-through translation by
inspecting the capability register. If pass-through translation is not
supported, the device is instructed to use DMA domain for its default
domain. While most recent VT-d hardware implementations support pass-
through translation, this capability is only lacking in some early VT-d
implementations.

With the static identity domain in place, the domain field of per-device
iommu driver data can be either a pointer to a DMA translation domain, or
NULL, indicating that the static identity domain is attached. Refine some
code to accommodate this change.

Signed-off-by: Lu Baolu <[email protected]>
---
drivers/iommu/intel/iommu.c | 122 +++++++++++++++++++++++++++++++++---
drivers/iommu/intel/svm.c | 2 +-
2 files changed, 116 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 7022cc183120..3c747d19495e 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -1282,7 +1282,8 @@ static void iommu_enable_pci_caps(struct device_domain_info *info)
if (info->ats_supported && pci_ats_page_aligned(pdev) &&
!pci_enable_ats(pdev, VTD_PAGE_SHIFT)) {
info->ats_enabled = 1;
- domain_update_iotlb(info->domain);
+ if (info->domain)
+ domain_update_iotlb(info->domain);
}
}

@@ -1298,7 +1299,8 @@ static void iommu_disable_pci_caps(struct device_domain_info *info)
if (info->ats_enabled) {
pci_disable_ats(pdev);
info->ats_enabled = 0;
- domain_update_iotlb(info->domain);
+ if (info->domain)
+ domain_update_iotlb(info->domain);
}

if (info->pasid_enabled) {
@@ -1549,8 +1551,7 @@ static int iommu_init_domains(struct intel_iommu *iommu)
* second-level or nested translation. We reserve a domain id for
* this purpose.
*/
- if (sm_supported(iommu))
- set_bit(FLPT_DEFAULT_DID, iommu->domain_ids);
+ set_bit(FLPT_DEFAULT_DID, iommu->domain_ids);

return 0;
}
@@ -3614,6 +3615,9 @@ static void dmar_remove_one_dev_info(struct device *dev)
domain_context_clear(info);
}

+ if (!domain)
+ return;
+
spin_lock_irqsave(&domain->lock, flags);
list_del(&info->link);
spin_unlock_irqrestore(&domain->lock, flags);
@@ -3822,11 +3826,9 @@ int prepare_domain_attach_device(struct iommu_domain *domain,
static int intel_iommu_attach_device(struct iommu_domain *domain,
struct device *dev)
{
- struct device_domain_info *info = dev_iommu_priv_get(dev);
int ret;

- if (info->domain)
- device_block_translation(dev);
+ device_block_translation(dev);

ret = prepare_domain_attach_device(domain, dev);
if (ret)
@@ -4431,6 +4433,9 @@ static void intel_iommu_remove_dev_pasid(struct device *dev, ioasid_t pasid)
goto out_tear_down;
}

+ if (domain->type == IOMMU_DOMAIN_IDENTITY)
+ goto out_tear_down;
+
dmar_domain = to_dmar_domain(domain);
spin_lock_irqsave(&dmar_domain->lock, flags);
list_for_each_entry(curr, &dmar_domain->dev_pasids, link_domain) {
@@ -4605,8 +4610,111 @@ static const struct iommu_dirty_ops intel_dirty_ops = {
.read_and_clear_dirty = intel_iommu_read_and_clear_dirty,
};

+static int context_setup_pass_through(struct device *dev, u8 bus, u8 devfn)
+{
+ struct device_domain_info *info = dev_iommu_priv_get(dev);
+ struct intel_iommu *iommu = info->iommu;
+ struct context_entry *context;
+
+ spin_lock(&iommu->lock);
+ context = iommu_context_addr(iommu, bus, devfn, 1);
+ if (!context) {
+ spin_unlock(&iommu->lock);
+ return -ENOMEM;
+ }
+
+ if (context_present(context) && !context_copied(iommu, bus, devfn)) {
+ spin_unlock(&iommu->lock);
+ return 0;
+ }
+
+ copied_context_tear_down(iommu, context, bus, devfn);
+ context_clear_entry(context);
+ context_set_domain_id(context, FLPT_DEFAULT_DID);
+
+ /*
+ * In pass through mode, AW must be programmed to indicate the largest
+ * AGAW value supported by hardware. And ASR is ignored by hardware.
+ */
+ context_set_address_width(context, iommu->msagaw);
+ context_set_translation_type(context, CONTEXT_TT_PASS_THROUGH);
+ context_set_fault_enable(context);
+ context_set_present(context);
+ if (!ecap_coherent(iommu->ecap))
+ clflush_cache_range(context, sizeof(*context));
+ context_present_cache_flush(iommu, FLPT_DEFAULT_DID, bus, devfn);
+ spin_unlock(&iommu->lock);
+
+ return 0;
+}
+
+static int context_setup_pass_through_cb(struct pci_dev *pdev, u16 alias, void *data)
+{
+ struct device *dev = data;
+
+ if (dev != &pdev->dev)
+ return 0;
+
+ return context_setup_pass_through(dev, PCI_BUS_NUM(alias), alias & 0xff);
+}
+
+static int device_setup_pass_through(struct device *dev)
+{
+ struct device_domain_info *info = dev_iommu_priv_get(dev);
+
+ if (!dev_is_pci(dev))
+ return context_setup_pass_through(dev, info->bus, info->devfn);
+
+ return pci_for_each_dma_alias(to_pci_dev(dev),
+ context_setup_pass_through_cb, dev);
+}
+
+static int identity_domain_attach_dev(struct iommu_domain *domain,
+ struct device *dev)
+{
+ struct device_domain_info *info = dev_iommu_priv_get(dev);
+ struct intel_iommu *iommu = info->iommu;
+ int ret;
+
+ device_block_translation(dev);
+
+ if (dev_is_real_dma_subdevice(dev))
+ return 0;
+
+ if (sm_supported(iommu)) {
+ ret = intel_pasid_setup_pass_through(iommu, dev, IOMMU_NO_PASID);
+ if (!ret)
+ iommu_enable_pci_caps(info);
+ } else {
+ ret = device_setup_pass_through(dev);
+ }
+
+ return ret;
+}
+
+static int identity_domain_set_dev_pasid(struct iommu_domain *domain,
+ struct device *dev, ioasid_t pasid)
+{
+ struct device_domain_info *info = dev_iommu_priv_get(dev);
+ struct intel_iommu *iommu = info->iommu;
+
+ if (!pasid_supported(iommu) || dev_is_real_dma_subdevice(dev))
+ return -EOPNOTSUPP;
+
+ return intel_pasid_setup_pass_through(iommu, dev, pasid);
+}
+
+static struct iommu_domain identity_domain = {
+ .type = IOMMU_DOMAIN_IDENTITY,
+ .ops = &(const struct iommu_domain_ops) {
+ .attach_dev = identity_domain_attach_dev,
+ .set_dev_pasid = identity_domain_set_dev_pasid,
+ },
+};
+
const struct iommu_ops intel_iommu_ops = {
.blocked_domain = &blocking_domain,
+ .identity_domain = &identity_domain,
.capable = intel_iommu_capable,
.hw_info = intel_iommu_hw_info,
.domain_alloc = intel_iommu_domain_alloc,
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 442ff9905ca9..d78beb132f5b 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -493,7 +493,7 @@ void intel_drain_pasid_prq(struct device *dev, u32 pasid)
domain = info->domain;
pdev = to_pci_dev(dev);
sid = PCI_DEVID(info->bus, info->devfn);
- did = domain_id_iommu(domain, iommu);
+ did = domain ? domain_id_iommu(domain, iommu) : FLPT_DEFAULT_DID;
qdep = pci_ats_queue_depth(pdev);

/*
--
2.34.1

2023-12-05 01:27:41

by Baolu Lu

[permalink] [raw]
Subject: [PATCH v2 6/6] iommu/vt-d: Cleanup si_domain

The static identity domain has been introduced, rendering the si_domain
obsolete. Remove si_domain and cleanup the code accordingly.

Signed-off-by: Lu Baolu <[email protected]>
---
drivers/iommu/intel/iommu.c | 118 +++++++-----------------------------
1 file changed, 23 insertions(+), 95 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 3c747d19495e..91443b34111b 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -97,15 +97,6 @@ static phys_addr_t root_entry_uctp(struct root_entry *re)
return re->hi & VTD_PAGE_MASK;
}

-/*
- * This domain is a statically identity mapping domain.
- * 1. This domain creats a static 1:1 mapping to all usable memory.
- * 2. It maps to each iommu if successful.
- * 3. Each iommu mapps to this domain if successful.
- */
-static struct dmar_domain *si_domain;
-static int hw_pass_through = 1;
-
struct dmar_rmrr_unit {
struct list_head list; /* list of rmrr units */
struct acpi_dmar_header *hdr; /* ACPI header */
@@ -240,11 +231,6 @@ void free_pgtable_page(void *vaddr)
free_page((unsigned long)vaddr);
}

-static int domain_type_is_si(struct dmar_domain *domain)
-{
- return domain->domain.type == IOMMU_DOMAIN_IDENTITY;
-}
-
static int domain_pfn_supported(struct dmar_domain *domain, unsigned long pfn)
{
int addr_width = agaw_to_width(domain->agaw) - VTD_PAGE_SHIFT;
@@ -1795,9 +1781,6 @@ static int domain_context_mapping_one(struct dmar_domain *domain,
struct context_entry *context;
int agaw, ret;

- if (hw_pass_through && domain_type_is_si(domain))
- translation = CONTEXT_TT_PASS_THROUGH;
-
pr_debug("Set context mapping for %02x:%02x.%d\n",
bus, PCI_SLOT(devfn), PCI_FUNC(devfn));

@@ -1816,34 +1799,24 @@ static int domain_context_mapping_one(struct dmar_domain *domain,

context_set_domain_id(context, did);

- if (translation != CONTEXT_TT_PASS_THROUGH) {
- /*
- * Skip top levels of page tables for iommu which has
- * less agaw than default. Unnecessary for PT mode.
- */
- for (agaw = domain->agaw; agaw > iommu->agaw; agaw--) {
- ret = -ENOMEM;
- pgd = phys_to_virt(dma_pte_addr(pgd));
- if (!dma_pte_present(pgd))
- goto out_unlock;
- }
-
- if (info && info->ats_supported)
- translation = CONTEXT_TT_DEV_IOTLB;
- else
- translation = CONTEXT_TT_MULTI_LEVEL;
-
- context_set_address_root(context, virt_to_phys(pgd));
- context_set_address_width(context, agaw);
- } else {
- /*
- * In pass through mode, AW must be programmed to
- * indicate the largest AGAW value supported by
- * hardware. And ASR is ignored by hardware.
- */
- context_set_address_width(context, iommu->msagaw);
+ /*
+ * Skip top levels of page tables for iommu which has
+ * less agaw than default. Unnecessary for PT mode.
+ */
+ for (agaw = domain->agaw; agaw > iommu->agaw; agaw--) {
+ ret = -ENOMEM;
+ pgd = phys_to_virt(dma_pte_addr(pgd));
+ if (!dma_pte_present(pgd))
+ goto out_unlock;
}

+ if (info && info->ats_supported)
+ translation = CONTEXT_TT_DEV_IOTLB;
+ else
+ translation = CONTEXT_TT_MULTI_LEVEL;
+
+ context_set_address_root(context, virt_to_phys(pgd));
+ context_set_address_width(context, agaw);
context_set_translation_type(context, translation);
context_set_fault_enable(context);
context_set_present(context);
@@ -2077,14 +2050,10 @@ static void domain_context_clear_one(struct device_domain_info *info, u8 bus, u8
return;
}

- if (sm_supported(iommu)) {
- if (hw_pass_through && domain_type_is_si(info->domain))
- did_old = FLPT_DEFAULT_DID;
- else
- did_old = domain_id_iommu(info->domain, iommu);
- } else {
- did_old = context_domain_id(context);
- }
+ if (info->domain)
+ did_old = domain_id_iommu(info->domain, iommu);
+ else
+ did_old = FLPT_DEFAULT_DID;

context_clear_entry(context);
__iommu_flush_cache(iommu, context, sizeof(*context));
@@ -2147,23 +2116,6 @@ static bool dev_is_real_dma_subdevice(struct device *dev)
pci_real_dma_dev(to_pci_dev(dev)) != to_pci_dev(dev);
}

-static int md_domain_init(struct dmar_domain *domain, int guest_width);
-
-static int __init si_domain_init(int hw)
-{
- si_domain = alloc_domain(IOMMU_DOMAIN_IDENTITY);
- if (!si_domain)
- return -EFAULT;
-
- if (md_domain_init(si_domain, DEFAULT_DOMAIN_ADDRESS_WIDTH)) {
- domain_exit(si_domain);
- si_domain = NULL;
- return -EFAULT;
- }
-
- return 0;
-}
-
static int dmar_domain_attach_device(struct dmar_domain *domain,
struct device *dev)
{
@@ -2185,8 +2137,6 @@ static int dmar_domain_attach_device(struct dmar_domain *domain,

if (!sm_supported(iommu))
ret = domain_context_mapping(domain, dev);
- else if (hw_pass_through && domain_type_is_si(domain))
- ret = intel_pasid_setup_pass_through(iommu, dev, IOMMU_NO_PASID);
else if (domain->use_first_level)
ret = domain_setup_first_level(iommu, domain, dev, IOMMU_NO_PASID);
else
@@ -2197,8 +2147,7 @@ static int dmar_domain_attach_device(struct dmar_domain *domain,
return ret;
}

- if (sm_supported(info->iommu) || !domain_type_is_si(info->domain))
- iommu_enable_pci_caps(info);
+ iommu_enable_pci_caps(info);

return 0;
}
@@ -2548,8 +2497,6 @@ static int __init init_dmars(void)
}
}

- if (!ecap_pass_through(iommu->ecap))
- hw_pass_through = 0;
intel_svm_check(iommu);
}

@@ -2572,10 +2519,6 @@ static int __init init_dmars(void)

check_tylersburg_isoch();

- ret = si_domain_init(hw_pass_through);
- if (ret)
- goto free_iommu;
-
/*
* for each drhd
* enable fault log
@@ -2621,10 +2564,6 @@ static int __init init_dmars(void)
disable_dmar_iommu(iommu);
free_dmar_iommu(iommu);
}
- if (si_domain) {
- domain_exit(si_domain);
- si_domain = NULL;
- }

return ret;
}
@@ -2999,12 +2938,6 @@ static int intel_iommu_add(struct dmar_drhd_unit *dmaru)
if (ret)
goto out;

- if (hw_pass_through && !ecap_pass_through(iommu->ecap)) {
- pr_warn("%s: Doesn't support hardware pass through.\n",
- iommu->name);
- return -ENXIO;
- }
-
sp = domain_update_iommu_superpage(NULL, iommu) - 1;
if (sp >= 0 && !(cap_super_page_val(iommu->cap) & (1 << sp))) {
pr_warn("%s: Doesn't support large page.\n",
@@ -3718,8 +3651,6 @@ static struct iommu_domain *intel_iommu_domain_alloc(unsigned type)
domain->geometry.force_aperture = true;

return domain;
- case IOMMU_DOMAIN_IDENTITY:
- return &si_domain->domain;
case IOMMU_DOMAIN_SVA:
return intel_svm_domain_alloc();
default:
@@ -3779,8 +3710,7 @@ intel_iommu_domain_alloc_user(struct device *dev, u32 flags,

static void intel_iommu_domain_free(struct iommu_domain *domain)
{
- if (domain != &si_domain->domain)
- domain_exit(to_dmar_domain(domain));
+ domain_exit(to_dmar_domain(domain));
}

int prepare_domain_attach_device(struct iommu_domain *domain,
@@ -4487,9 +4417,7 @@ static int intel_iommu_set_dev_pasid(struct iommu_domain *domain,
if (ret)
goto out_free;

- if (domain_type_is_si(dmar_domain))
- ret = intel_pasid_setup_pass_through(iommu, dev, pasid);
- else if (dmar_domain->use_first_level)
+ if (dmar_domain->use_first_level)
ret = domain_setup_first_level(iommu, dmar_domain,
dev, pasid);
else
--
2.34.1

2023-12-08 08:57:10

by Tian, Kevin

[permalink] [raw]
Subject: RE: [PATCH v2 2/6] iommu/vt-d: Remove scalable mode context entry setup from attach_dev

> From: Lu Baolu <[email protected]>
> Sent: Tuesday, December 5, 2023 9:22 AM
>
> static int domain_context_mapping_one(struct dmar_domain *domain,
> struct intel_iommu *iommu,
> - struct pasid_table *table,
> u8 bus, u8 devfn)

since this is called in legacy now it's clearer to add a legacy_ prefix
to this and other related functions.

>
> - /* PASID table is mandatory for a PCI device in scalable mode. */
> - if (sm_supported(iommu) && !dev_is_real_dma_subdevice(dev)) {
> - /* Setup the PASID entry for requests without PASID: */
> - if (hw_pass_through && domain_type_is_si(domain))
> - ret = intel_pasid_setup_pass_through(iommu,
> - dev, IOMMU_NO_PASID);
> - else if (domain->use_first_level)
> - ret = domain_setup_first_level(iommu, domain, dev,
> - IOMMU_NO_PASID);
> - else
> - ret = intel_pasid_setup_second_level(iommu,
> domain,
> - dev, IOMMU_NO_PASID);
> - if (ret) {
> - dev_err(dev, "Setup RID2PASID failed\n");
> - device_block_translation(dev);
> - return ret;
> - }
> - }
> + if (dev_is_real_dma_subdevice(dev))
> + return 0;

is it a functional change? old code doesn't early exit for such device.

2023-12-08 09:09:21

by Tian, Kevin

[permalink] [raw]
Subject: RE: [PATCH v2 4/6] iommu/vt-d: Remove 1:1 mappings from identity domain

> From: Lu Baolu <[email protected]>
> Sent: Tuesday, December 5, 2023 9:22 AM
>
> Older VT-d hardware implementations did not support pass-through
> translation mode. The iommu driver relied on a DMA domain with all
> physical memory addresses identically mapped to the same IOVA to
> simulate pass-through translation.
>
> This workaround is no longer necessary due to the evolution of iommu
> core. The core has introduced def_domain_type op, allowing the iommu
> driver to specify its capabilities. Additionally, the identity domain
> has become a static system domain with "never fail" attach semantics.

I'm not sure above explains the reason for removing the identity support
on older hardware. Looks the reason is simply that continuing to maintain
that debt prevents intel-iommu driver from catching up with iommu core
evolution so we decide to remove it.

>
> Eliminate support for the 1:1 mapping domain on older hardware and
> removes the unused code that created the 1:1 page table. This paves a
> way for the implementation of a global static identity domain.

Do you know how old such hardware is?

> @@ -2311,6 +2257,13 @@ static int device_def_domain_type(struct device
> *dev)
> return IOMMU_DOMAIN_IDENTITY;
> }
>
> + /*
> + * Hardware does not support the passthrough translation mode.
> + * Always use a dynamaic mapping domain.
> + */
> + if (!ecap_pass_through(iommu->ecap))
> + return IOMMU_DOMAIN_DMA;
> +
> return 0;

there are two cases above which mandates IDENTITY. Have you confirmed
that those platforms support hardware passthrough? otherwise this change
is broken.

2023-12-08 12:46:22

by Baolu Lu

[permalink] [raw]
Subject: Re: [PATCH v2 4/6] iommu/vt-d: Remove 1:1 mappings from identity domain

On 2023/12/8 17:09, Tian, Kevin wrote:
>> From: Lu Baolu <[email protected]>
>> Sent: Tuesday, December 5, 2023 9:22 AM
>>
>> Older VT-d hardware implementations did not support pass-through
>> translation mode. The iommu driver relied on a DMA domain with all
>> physical memory addresses identically mapped to the same IOVA to
>> simulate pass-through translation.
>>
>> This workaround is no longer necessary due to the evolution of iommu
>> core. The core has introduced def_domain_type op, allowing the iommu
>> driver to specify its capabilities. Additionally, the identity domain
>> has become a static system domain with "never fail" attach semantics.
>
> I'm not sure above explains the reason for removing the identity support
> on older hardware. Looks the reason is simply that continuing to maintain
> that debt prevents intel-iommu driver from catching up with iommu core
> evolution so we decide to remove it.

It is not that simple. I should put more words here.

Generally speaking, hardware lacking passthrough translation support,
but the iommu driver fakes it by using a DMA domain with 1:1 mappings,
makes no sense because it doesn't mitigate any hardware overheads.

If the device driver uses the kernel DMA API to do DMA, it does not need
to care about the DMA translation type. This is a user-decided policy.
The iommu subsystem has already provided this support to users through
kernel build options, kernel boot commands, and sysfs nodes.

If the device driver doesn't use kernel API for DMA. While we discourage
this behavior, the iommu subsystem provides the DMA ownership mechanism.
This allows the driver to take over the DMA ownership, allocate and
manage its own domain, and replace it with the default domain, as the
iommu default domain is only designed for kernel DMA with the DMA API.

In summary, whether or not to use a DMA domain with 1:1 mappings should
be a design decision made in the device driver, not a common behavior
for a modern iommu driver.

>
>>
>> Eliminate support for the 1:1 mapping domain on older hardware and
>> removes the unused code that created the 1:1 page table. This paves a
>> way for the implementation of a global static identity domain.
>
> Do you know how old such hardware is?

I am not sure, but I have a desktop that is over 10 years old and
supports passthrough translation. :-)

>
>> @@ -2311,6 +2257,13 @@ static int device_def_domain_type(struct device
>> *dev)
>> return IOMMU_DOMAIN_IDENTITY;
>> }
>>
>> + /*
>> + * Hardware does not support the passthrough translation mode.
>> + * Always use a dynamaic mapping domain.
>> + */
>> + if (!ecap_pass_through(iommu->ecap))
>> + return IOMMU_DOMAIN_DMA;
>> +
>> return 0;
>
> there are two cases above which mandates IDENTITY. Have you confirmed
> that those platforms support hardware passthrough? otherwise this change
> is broken.

Those two cases should be hardware quirks for SoC-integrated devices. It
makes no reason that a quirk requires IOMMU passthrough translation, but
the hardware doesn't support it.

If, unfortunately, those quirks turn out to be workarounds for a poorly
designed device driver, we should remove those quirks and request the
device driver to utilize the DMA ownership framework to achieve the same
functionality within the driver itself.

Best regards,
baolu

2023-12-09 08:02:36

by Baolu Lu

[permalink] [raw]
Subject: Re: [PATCH v2 2/6] iommu/vt-d: Remove scalable mode context entry setup from attach_dev

On 12/8/23 4:56 PM, Tian, Kevin wrote:
>> From: Lu Baolu <[email protected]>
>> Sent: Tuesday, December 5, 2023 9:22 AM
>>
>> static int domain_context_mapping_one(struct dmar_domain *domain,
>> struct intel_iommu *iommu,
>> - struct pasid_table *table,
>> u8 bus, u8 devfn)
>
> since this is called in legacy now it's clearer to add a legacy_ prefix
> to this and other related functions.

Okay, that would make the code more readable.

>>
>> - /* PASID table is mandatory for a PCI device in scalable mode. */
>> - if (sm_supported(iommu) && !dev_is_real_dma_subdevice(dev)) {
>> - /* Setup the PASID entry for requests without PASID: */
>> - if (hw_pass_through && domain_type_is_si(domain))
>> - ret = intel_pasid_setup_pass_through(iommu,
>> - dev, IOMMU_NO_PASID);
>> - else if (domain->use_first_level)
>> - ret = domain_setup_first_level(iommu, domain, dev,
>> - IOMMU_NO_PASID);
>> - else
>> - ret = intel_pasid_setup_second_level(iommu,
>> domain,
>> - dev, IOMMU_NO_PASID);
>> - if (ret) {
>> - dev_err(dev, "Setup RID2PASID failed\n");
>> - device_block_translation(dev);
>> - return ret;
>> - }
>> - }
>> + if (dev_is_real_dma_subdevice(dev))
>> + return 0;
>
> is it a functional change? old code doesn't early exit for such device.

This type of device is aliased to a RID on another bus. Therefore,
there's no need to set up the context and PASID table for them. I should
make the change mentioned above in a separate patch to make the
background clear.

Best regards,
baolu

2023-12-11 03:59:02

by Tian, Kevin

[permalink] [raw]
Subject: RE: [PATCH v2 4/6] iommu/vt-d: Remove 1:1 mappings from identity domain

> From: Baolu Lu <[email protected]>
> Sent: Friday, December 8, 2023 8:46 PM
>
> On 2023/12/8 17:09, Tian, Kevin wrote:
> >> From: Lu Baolu <[email protected]>
> >> Sent: Tuesday, December 5, 2023 9:22 AM
> >>
> >> Older VT-d hardware implementations did not support pass-through
> >> translation mode. The iommu driver relied on a DMA domain with all
> >> physical memory addresses identically mapped to the same IOVA to
> >> simulate pass-through translation.
> >>
> >> This workaround is no longer necessary due to the evolution of iommu
> >> core. The core has introduced def_domain_type op, allowing the iommu
> >> driver to specify its capabilities. Additionally, the identity domain
> >> has become a static system domain with "never fail" attach semantics.
> >
> > I'm not sure above explains the reason for removing the identity support
> > on older hardware. Looks the reason is simply that continuing to maintain
> > that debt prevents intel-iommu driver from catching up with iommu core
> > evolution so we decide to remove it.
>
> It is not that simple. I should put more words here.
>
> Generally speaking, hardware lacking passthrough translation support,
> but the iommu driver fakes it by using a DMA domain with 1:1 mappings,
> makes no sense because it doesn't mitigate any hardware overheads.

Not about overhead. Might just be that device or driver is not capable of
handling remapped dma address.

>
> If the device driver uses the kernel DMA API to do DMA, it does not need
> to care about the DMA translation type. This is a user-decided policy.
> The iommu subsystem has already provided this support to users through
> kernel build options, kernel boot commands, and sysfs nodes.
>
> If the device driver doesn't use kernel API for DMA. While we discourage
> this behavior, the iommu subsystem provides the DMA ownership
> mechanism.
> This allows the driver to take over the DMA ownership, allocate and
> manage its own domain, and replace it with the default domain, as the
> iommu default domain is only designed for kernel DMA with the DMA API.
>
> In summary, whether or not to use a DMA domain with 1:1 mappings
> should
> be a design decision made in the device driver, not a common behavior
> for a modern iommu driver.
>
> >
> >>
> >> Eliminate support for the 1:1 mapping domain on older hardware and
> >> removes the unused code that created the 1:1 page table. This paves a
> >> way for the implementation of a global static identity domain.
> >
> > Do you know how old such hardware is?
>
> I am not sure, but I have a desktop that is over 10 years old and
> supports passthrough translation. :-)
>
> >
> >> @@ -2311,6 +2257,13 @@ static int device_def_domain_type(struct
> device
> >> *dev)
> >> return IOMMU_DOMAIN_IDENTITY;
> >> }
> >>
> >> + /*
> >> + * Hardware does not support the passthrough translation mode.
> >> + * Always use a dynamaic mapping domain.
> >> + */
> >> + if (!ecap_pass_through(iommu->ecap))
> >> + return IOMMU_DOMAIN_DMA;
> >> +
> >> return 0;
> >
> > there are two cases above which mandates IDENTITY. Have you confirmed
> > that those platforms support hardware passthrough? otherwise this
> change
> > is broken.
>
> Those two cases should be hardware quirks for SoC-integrated devices. It
> makes no reason that a quirk requires IOMMU passthrough translation, but
> the hardware doesn't support it.
>
> If, unfortunately, those quirks turn out to be workarounds for a poorly
> designed device driver, we should remove those quirks and request the
> device driver to utilize the DMA ownership framework to achieve the same
> functionality within the driver itself.
>

if that is the case you should fix the drivers first before breaking them.

But at a glance looks those two quirks are just fine.

For Azalia sound device the problem is that BIOS enables a dedicated
DMAR for it but allocates zero TLB entries to cause deadlock. This
implies a hw passthrough mode otherwise it's still broken.

For GFX it's a workaround added since day one. there is even still
an option CONFIG_INTEL_IOMMU_BROKEN_GFX_WA available. But
now its meaning is really disabling IOMMU instead of using identity.

sounds like IDENTMAP_GFX can be fully removed now:

#ifdef CONFIG_INTEL_IOMMU_BROKEN_GFX_WA
dmar_map_gfx = 0;
#endif

if (!dmar_map_gfx)
iommu_identity_mapping |= IDENTMAP_GFX;

2023-12-12 06:26:02

by Baolu Lu

[permalink] [raw]
Subject: Re: [PATCH v2 4/6] iommu/vt-d: Remove 1:1 mappings from identity domain

On 12/11/23 11:58 AM, Tian, Kevin wrote:
>>> there are two cases above which mandates IDENTITY. Have you confirmed
>>> that those platforms support hardware passthrough? otherwise this
>> change
>>> is broken.
>> Those two cases should be hardware quirks for SoC-integrated devices. It
>> makes no reason that a quirk requires IOMMU passthrough translation, but
>> the hardware doesn't support it.
>>
>> If, unfortunately, those quirks turn out to be workarounds for a poorly
>> designed device driver, we should remove those quirks and request the
>> device driver to utilize the DMA ownership framework to achieve the same
>> functionality within the driver itself.
>>
> if that is the case you should fix the drivers first before breaking them.
>
> But at a glance looks those two quirks are just fine.
>
> For Azalia sound device the problem is that BIOS enables a dedicated
> DMAR for it but allocates zero TLB entries to cause deadlock. This
> implies a hw passthrough mode otherwise it's still broken.

Yes. It's safe for Azalia sound device.

>
> For GFX it's a workaround added since day one. there is even still
> an option CONFIG_INTEL_IOMMU_BROKEN_GFX_WA available. But
> now its meaning is really disabling IOMMU instead of using identity.
>
> sounds like IDENTMAP_GFX can be fully removed now:
>
> #ifdef CONFIG_INTEL_IOMMU_BROKEN_GFX_WA
> dmar_map_gfx = 0;
> #endif

We should already remove the workaround for the 2.6 kernel. :-)

It's default "n". Therefore, if any gfx driver still needs this
workaround, there should already be a bug report.

>
> if (!dmar_map_gfx)
> iommu_identity_mapping |= IDENTMAP_GFX;

So with above cleaned up, we have no need to worry about drivers that
are not capable of handling remapped dma address any more.

Did I miss anything?

Best regards,
baolu

2023-12-13 02:21:50

by Tian, Kevin

[permalink] [raw]
Subject: RE: [PATCH v2 4/6] iommu/vt-d: Remove 1:1 mappings from identity domain

> From: Baolu Lu <[email protected]>
> Sent: Tuesday, December 12, 2023 2:21 PM
> >
> > For GFX it's a workaround added since day one. there is even still
> > an option CONFIG_INTEL_IOMMU_BROKEN_GFX_WA available. But
> > now its meaning is really disabling IOMMU instead of using identity.
> >
> > sounds like IDENTMAP_GFX can be fully removed now:
> >
> > #ifdef CONFIG_INTEL_IOMMU_BROKEN_GFX_WA
> > dmar_map_gfx = 0;
> > #endif
>
> We should already remove the workaround for the 2.6 kernel. :-)

I don't know what happened to ignore that temporary option.

>
> It's default "n". Therefore, if any gfx driver still needs this
> workaround, there should already be a bug report.

It depends on how it is set in distros.

>
> >
> > if (!dmar_map_gfx)
> > iommu_identity_mapping |= IDENTMAP_GFX;
>
> So with above cleaned up, we have no need to worry about drivers that
> are not capable of handling remapped dma address any more.
>
> Did I miss anything?

I prefer to removing IDENTMAP_GFX in this series and put a comment
explaining why Azalia device has no problem.

Then send a separate patch to remove the GFX workaround option.
If there is any valid usage still relying on that, it's easy to revert.

2023-12-13 02:48:41

by Baolu Lu

[permalink] [raw]
Subject: Re: [PATCH v2 4/6] iommu/vt-d: Remove 1:1 mappings from identity domain

On 12/13/23 10:20 AM, Tian, Kevin wrote:
>>> if (!dmar_map_gfx)
>>> iommu_identity_mapping |= IDENTMAP_GFX;
>> So with above cleaned up, we have no need to worry about drivers that
>> are not capable of handling remapped dma address any more.
>>
>> Did I miss anything?
> I prefer to removing IDENTMAP_GFX in this series and put a comment
> explaining why Azalia device has no problem.
>
> Then send a separate patch to remove the GFX workaround option.
> If there is any valid usage still relying on that, it's easy to revert.

Agreed. We should be more cautious. Perhaps I will postpone this series
to a time when we are sure that graphic drivers are okay with this
change. As a first step, perhaps we can make a change to remove the
workaround for graphic drivers, so that any hidden bugs in the graphic
driver could be reported.

The patch looks like,

iommu/vt-d: Remove INTEL_IOMMU_BROKEN_GFX_WA

Commit 62edf5dc4a524 ("intel-iommu: Restore DMAR_BROKEN_GFX_WA option for
broken graphics drivers") was introduced 24 years ago as a temporary
workaround for graphics drivers that used physical addresses for DMA and
avoided DMA APIs. This workaround was disabled by default.

As 24 years have passed, it is expected that graphics driver developers
have migrated their drivers to use kernel DMA APIs. Therefore, this
workaround is no longer required and could been removed.

Suggested-by: Kevin Tian <[email protected]>
Signed-off-by: Lu Baolu <[email protected]>
---
drivers/iommu/intel/iommu.c | 10 ----------
drivers/iommu/intel/Kconfig | 11 -----------
2 files changed, 21 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 84b78e42a470..27b8638291f2 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -2357,9 +2357,6 @@ static int device_def_domain_type(struct device *dev)

if ((iommu_identity_mapping & IDENTMAP_AZALIA) && IS_AZALIA(pdev))
return IOMMU_DOMAIN_IDENTITY;
-
- if ((iommu_identity_mapping & IDENTMAP_GFX) && IS_GFX_DEVICE(pdev))
- return IOMMU_DOMAIN_IDENTITY;
}

return 0;
@@ -2660,13 +2657,6 @@ static int __init init_dmars(void)
iommu_set_root_entry(iommu);
}

-#ifdef CONFIG_INTEL_IOMMU_BROKEN_GFX_WA
- dmar_map_gfx = 0;
-#endif
-
- if (!dmar_map_gfx)
- iommu_identity_mapping |= IDENTMAP_GFX;
-
check_tylersburg_isoch();

ret = si_domain_init(hw_pass_through);
diff --git a/drivers/iommu/intel/Kconfig b/drivers/iommu/intel/Kconfig
index 012cd2541a68..d2d34eb28d94 100644
--- a/drivers/iommu/intel/Kconfig
+++ b/drivers/iommu/intel/Kconfig
@@ -64,17 +64,6 @@ config INTEL_IOMMU_DEFAULT_ON
one is found. If this option is not selected, DMAR support can
be enabled by passing intel_iommu=on to the kernel.

-config INTEL_IOMMU_BROKEN_GFX_WA
- bool "Workaround broken graphics drivers (going away soon)"
- depends on BROKEN && X86
- help
- Current Graphics drivers tend to use physical address
- for DMA and avoid using DMA APIs. Setting this config
- option permits the IOMMU driver to set a unity map for
- all the OS-visible memory. Hence the driver can continue
- to use physical addresses for DMA, at least until this
- option is removed in the 2.6.32 kernel.
-
config INTEL_IOMMU_FLOPPY_WA
def_bool y
depends on X86

Best regards,
baolu

2023-12-13 03:04:35

by Tian, Kevin

[permalink] [raw]
Subject: RE: [PATCH v2 4/6] iommu/vt-d: Remove 1:1 mappings from identity domain

> From: Baolu Lu <[email protected]>
> Sent: Wednesday, December 13, 2023 10:44 AM
>
> On 12/13/23 10:20 AM, Tian, Kevin wrote:
> >>> if (!dmar_map_gfx)
> >>> iommu_identity_mapping |= IDENTMAP_GFX;
> >> So with above cleaned up, we have no need to worry about drivers that
> >> are not capable of handling remapped dma address any more.
> >>
> >> Did I miss anything?
> > I prefer to removing IDENTMAP_GFX in this series and put a comment
> > explaining why Azalia device has no problem.
> >
> > Then send a separate patch to remove the GFX workaround option.
> > If there is any valid usage still relying on that, it's easy to revert.
>
> Agreed. We should be more cautious. Perhaps I will postpone this series
> to a time when we are sure that graphic drivers are okay with this
> change. As a first step, perhaps we can make a change to remove the
> workaround for graphic drivers, so that any hidden bugs in the graphic
> driver could be reported.
>
> The patch looks like,
>
> iommu/vt-d: Remove INTEL_IOMMU_BROKEN_GFX_WA
>
> Commit 62edf5dc4a524 ("intel-iommu: Restore DMAR_BROKEN_GFX_WA
> option for
> broken graphics drivers") was introduced 24 years ago as a temporary
> workaround for graphics drivers that used physical addresses for DMA and
> avoided DMA APIs. This workaround was disabled by default.
>
> As 24 years have passed, it is expected that graphics driver developers
> have migrated their drivers to use kernel DMA APIs. Therefore, this
> workaround is no longer required and could been removed.

and igfx_off option is still available just in case.

>
> Suggested-by: Kevin Tian <[email protected]>
> Signed-off-by: Lu Baolu <[email protected]>
> ---
> drivers/iommu/intel/iommu.c | 10 ----------
> drivers/iommu/intel/Kconfig | 11 -----------
> 2 files changed, 21 deletions(-)
>
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index 84b78e42a470..27b8638291f2 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -2357,9 +2357,6 @@ static int device_def_domain_type(struct device
> *dev)
>
> if ((iommu_identity_mapping & IDENTMAP_AZALIA) &&
> IS_AZALIA(pdev))
> return IOMMU_DOMAIN_IDENTITY;
> -
> - if ((iommu_identity_mapping & IDENTMAP_GFX) &&
> IS_GFX_DEVICE(pdev))
> - return IOMMU_DOMAIN_IDENTITY;
> }
>
> return 0;
> @@ -2660,13 +2657,6 @@ static int __init init_dmars(void)
> iommu_set_root_entry(iommu);
> }
>
> -#ifdef CONFIG_INTEL_IOMMU_BROKEN_GFX_WA
> - dmar_map_gfx = 0;
> -#endif
> -
> - if (!dmar_map_gfx)
> - iommu_identity_mapping |= IDENTMAP_GFX;
> -

let's remove IDENTMAP_GFX (and all its references) in a separate patch.

this patch is for removing the workaround option.

another patch removes IDENTMAP_GFX as it's meaningless when
dmar_map_gfx simply implies disabling the gfx iommu.

> check_tylersburg_isoch();
>
> ret = si_domain_init(hw_pass_through);
> diff --git a/drivers/iommu/intel/Kconfig b/drivers/iommu/intel/Kconfig
> index 012cd2541a68..d2d34eb28d94 100644
> --- a/drivers/iommu/intel/Kconfig
> +++ b/drivers/iommu/intel/Kconfig
> @@ -64,17 +64,6 @@ config INTEL_IOMMU_DEFAULT_ON
> one is found. If this option is not selected, DMAR support can
> be enabled by passing intel_iommu=on to the kernel.
>
> -config INTEL_IOMMU_BROKEN_GFX_WA
> - bool "Workaround broken graphics drivers (going away soon)"
> - depends on BROKEN && X86
> - help
> - Current Graphics drivers tend to use physical address
> - for DMA and avoid using DMA APIs. Setting this config
> - option permits the IOMMU driver to set a unity map for
> - all the OS-visible memory. Hence the driver can continue
> - to use physical addresses for DMA, at least until this
> - option is removed in the 2.6.32 kernel.
> -
> config INTEL_IOMMU_FLOPPY_WA
> def_bool y
> depends on X86
>
> Best regards,
> baolu

2023-12-13 03:07:05

by Baolu Lu

[permalink] [raw]
Subject: Re: [PATCH v2 4/6] iommu/vt-d: Remove 1:1 mappings from identity domain

On 12/13/23 11:04 AM, Tian, Kevin wrote:
>> From: Baolu Lu<[email protected]>
>> Sent: Wednesday, December 13, 2023 10:44 AM
>>
>> On 12/13/23 10:20 AM, Tian, Kevin wrote:
>>>>> if (!dmar_map_gfx)
>>>>> iommu_identity_mapping |= IDENTMAP_GFX;
>>>> So with above cleaned up, we have no need to worry about drivers that
>>>> are not capable of handling remapped dma address any more.
>>>>
>>>> Did I miss anything?
>>> I prefer to removing IDENTMAP_GFX in this series and put a comment
>>> explaining why Azalia device has no problem.
>>>
>>> Then send a separate patch to remove the GFX workaround option.
>>> If there is any valid usage still relying on that, it's easy to revert.
>> Agreed. We should be more cautious. Perhaps I will postpone this series
>> to a time when we are sure that graphic drivers are okay with this
>> change. As a first step, perhaps we can make a change to remove the
>> workaround for graphic drivers, so that any hidden bugs in the graphic
>> driver could be reported.
>>
>> The patch looks like,
>>
>> iommu/vt-d: Remove INTEL_IOMMU_BROKEN_GFX_WA
>>
>> Commit 62edf5dc4a524 ("intel-iommu: Restore DMAR_BROKEN_GFX_WA
>> option for
>> broken graphics drivers") was introduced 24 years ago as a temporary
>> workaround for graphics drivers that used physical addresses for DMA and
>> avoided DMA APIs. This workaround was disabled by default.
>>
>> As 24 years have passed, it is expected that graphics driver developers
>> have migrated their drivers to use kernel DMA APIs. Therefore, this
>> workaround is no longer required and could been removed.
> and igfx_off option is still available just in case.

Yeah!

>
>> Suggested-by: Kevin Tian<[email protected]>
>> Signed-off-by: Lu Baolu<[email protected]>
>> ---
>> drivers/iommu/intel/iommu.c | 10 ----------
>> drivers/iommu/intel/Kconfig | 11 -----------
>> 2 files changed, 21 deletions(-)
>>
>> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
>> index 84b78e42a470..27b8638291f2 100644
>> --- a/drivers/iommu/intel/iommu.c
>> +++ b/drivers/iommu/intel/iommu.c
>> @@ -2357,9 +2357,6 @@ static int device_def_domain_type(struct device
>> *dev)
>>
>> if ((iommu_identity_mapping & IDENTMAP_AZALIA) &&
>> IS_AZALIA(pdev))
>> return IOMMU_DOMAIN_IDENTITY;
>> -
>> - if ((iommu_identity_mapping & IDENTMAP_GFX) &&
>> IS_GFX_DEVICE(pdev))
>> - return IOMMU_DOMAIN_IDENTITY;
>> }
>>
>> return 0;
>> @@ -2660,13 +2657,6 @@ static int __init init_dmars(void)
>> iommu_set_root_entry(iommu);
>> }
>>
>> -#ifdef CONFIG_INTEL_IOMMU_BROKEN_GFX_WA
>> - dmar_map_gfx = 0;
>> -#endif
>> -
>> - if (!dmar_map_gfx)
>> - iommu_identity_mapping |= IDENTMAP_GFX;
>> -
> let's remove IDENTMAP_GFX (and all its references) in a separate patch.
>
> this patch is for removing the workaround option.
>
> another patch removes IDENTMAP_GFX as it's meaningless when
> dmar_map_gfx simply implies disabling the gfx iommu.

Okay, it's fine to me.

Best regards,
baolu