2023-11-20 11:36:55

by Lu Baolu

Subject: [PATCH 0/5] iommu/vt-d: Convert to use static identity domain

The Intel IOMMU driver has used a special domain, the 1:1 mapping domain,
to support domains of type IOMMU_DOMAIN_IDENTITY, which allow device
drivers to use physical addresses directly for DMA access despite the
presence of IOMMU units.

How the 1:1 mapping domain is implemented depends on the hardware. Modern
Intel VT-d implementations support a hardware passthrough translation
mode, while earlier versions lack this feature and therefore require a
more complex software approach.

On earlier hardware, the 1:1 mapping domain is implemented by building a
DMA domain whose IOVA (IO Virtual Address) space maps 1:1 to the physical
addresses. On hardware that supports passthrough mode, simply programming
the hardware's passthrough translation is sufficient. Both cases are
merged into si_domain, a special DMA domain sharing the domain ops of an
ordinary DMA domain.
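
For illustration, the legacy scheme boils down to the loop below. This is
a condensed sketch of the si_domain_init() code that patch 5 removes, not
new functionality; only the core mapping loop over usable memory is shown.

	/*
	 * Legacy 1:1 mapping: map every usable memory range into the
	 * domain with IOVA equal to the physical address (condensed
	 * from si_domain_init(), removed by patch 5).
	 */
	for_each_online_node(nid) {
		for_each_mem_pfn_range(i, nid, &start_pfn, &end_pfn, NULL) {
			ret = iommu_domain_identity_map(si_domain,
					mm_to_dma_pfn_start(start_pfn),
					mm_to_dma_pfn_end(end_pfn));
			if (ret)
				return ret;
		}
	}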

As the iommu core has evolved, it has introduced a global static identity
domain with "never fail" attach semantics, meaning that the domain is
always available and attaching it cannot fail. A driver now assigns this
domain directly at iommu_ops->identity_domain instead of allocating it
through the domain allocation interface.
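
The registration pattern, roughly as patch 4 below implements it for
VT-d, is a static iommu_domain of identity type hooked into the driver's
iommu_ops (minimal sketch; the attach and set_dev_pasid callbacks are
driver specific and must not fail):

	static struct iommu_domain identity_domain = {
		.type = IOMMU_DOMAIN_IDENTITY,
		.ops = &(const struct iommu_domain_ops) {
			.attach_dev = identity_domain_attach_dev,
			.set_dev_pasid = identity_domain_set_dev_pasid,
		},
	};

	const struct iommu_ops intel_iommu_ops = {
		.identity_domain = &identity_domain,
		/* ... other callbacks ... */
	};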

This series converts the Intel IOMMU driver to use the global static
identity domain. For early legacy hardware that doesn't support the
passthrough translation mode, the driver asks the iommu core to use a
DMA-type default domain. For modern hardware that supports the
passthrough translation mode, it implements a static global identity
domain.

The whole series is also available at

https://github.com/LuBaolu/intel-iommu/commits/vtd-static-identity-domain-v1

Your review comments and suggestions are very much appreciated.

Lu Baolu (5):
iommu/vt-d: Setup scalable mode context entry in probe path
iommu/vt-d: Remove scalable mode context entry setup from attach_dev
iommu/vt-d: Refactor domain_context_mapping_one() to be reusable
iommu/vt-d: Add support for static identity domain
iommu/vt-d: Remove si_domain

drivers/iommu/intel/pasid.h | 1 +
drivers/iommu/intel/iommu.c | 565 +++++++++++++++---------------------
drivers/iommu/intel/pasid.c | 180 ++++++++++++
drivers/iommu/intel/svm.c | 2 +-
4 files changed, 414 insertions(+), 334 deletions(-)

--
2.34.1


2023-11-20 11:38:35

by Lu Baolu

Subject: [PATCH 5/5] iommu/vt-d: Remove si_domain

The static identity domain has been introduced, rendering the si_domain
obsolete. Remove si_domain and clean up the code accordingly.

Signed-off-by: Lu Baolu <[email protected]>
---
drivers/iommu/intel/iommu.c | 226 ++++--------------------------------
1 file changed, 23 insertions(+), 203 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index ad2e53821a05..90dd7f24beda 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -97,15 +97,6 @@ static phys_addr_t root_entry_uctp(struct root_entry *re)
return re->hi & VTD_PAGE_MASK;
}

-/*
- * This domain is a statically identity mapping domain.
- * 1. This domain creats a static 1:1 mapping to all usable memory.
- * 2. It maps to each iommu if successful.
- * 3. Each iommu mapps to this domain if successful.
- */
-static struct dmar_domain *si_domain;
-static int hw_pass_through = 1;
-
struct dmar_rmrr_unit {
struct list_head list; /* list of rmrr units */
struct acpi_dmar_header *hdr; /* ACPI header */
@@ -240,11 +231,6 @@ void free_pgtable_page(void *vaddr)
free_page((unsigned long)vaddr);
}

-static int domain_type_is_si(struct dmar_domain *domain)
-{
- return domain->domain.type == IOMMU_DOMAIN_IDENTITY;
-}
-
static int domain_pfn_supported(struct dmar_domain *domain, unsigned long pfn)
{
int addr_width = agaw_to_width(domain->agaw) - VTD_PAGE_SHIFT;
@@ -1796,9 +1782,6 @@ static int domain_context_mapping_one(struct dmar_domain *domain,
struct context_entry *context;
int agaw, ret;

- if (hw_pass_through && domain_type_is_si(domain))
- translation = CONTEXT_TT_PASS_THROUGH;
-
pr_debug("Set context mapping for %02x:%02x.%d\n",
bus, PCI_SLOT(devfn), PCI_FUNC(devfn));

@@ -1817,34 +1800,24 @@ static int domain_context_mapping_one(struct dmar_domain *domain,

context_set_domain_id(context, did);

- if (translation != CONTEXT_TT_PASS_THROUGH) {
- /*
- * Skip top levels of page tables for iommu which has
- * less agaw than default. Unnecessary for PT mode.
- */
- for (agaw = domain->agaw; agaw > iommu->agaw; agaw--) {
- ret = -ENOMEM;
- pgd = phys_to_virt(dma_pte_addr(pgd));
- if (!dma_pte_present(pgd))
- goto out_unlock;
- }
-
- if (info && info->ats_supported)
- translation = CONTEXT_TT_DEV_IOTLB;
- else
- translation = CONTEXT_TT_MULTI_LEVEL;
-
- context_set_address_root(context, virt_to_phys(pgd));
- context_set_address_width(context, agaw);
- } else {
- /*
- * In pass through mode, AW must be programmed to
- * indicate the largest AGAW value supported by
- * hardware. And ASR is ignored by hardware.
- */
- context_set_address_width(context, iommu->msagaw);
+ /*
+ * Skip top levels of page tables for iommu which has
+ * less agaw than default. Unnecessary for PT mode.
+ */
+ for (agaw = domain->agaw; agaw > iommu->agaw; agaw--) {
+ ret = -ENOMEM;
+ pgd = phys_to_virt(dma_pte_addr(pgd));
+ if (!dma_pte_present(pgd))
+ goto out_unlock;
}

+ if (info && info->ats_supported)
+ translation = CONTEXT_TT_DEV_IOTLB;
+ else
+ translation = CONTEXT_TT_MULTI_LEVEL;
+
+ context_set_address_root(context, virt_to_phys(pgd));
+ context_set_address_width(context, agaw);
context_set_translation_type(context, translation);
context_set_fault_enable(context);
context_set_present(context);
@@ -2078,14 +2051,10 @@ static void domain_context_clear_one(struct device_domain_info *info, u8 bus, u8
return;
}

- if (sm_supported(iommu)) {
- if (hw_pass_through && domain_type_is_si(info->domain))
- did_old = FLPT_DEFAULT_DID;
- else
- did_old = domain_id_iommu(info->domain, iommu);
- } else {
- did_old = context_domain_id(context);
- }
+ if (info->domain)
+ did_old = domain_id_iommu(info->domain, iommu);
+ else
+ did_old = FLPT_DEFAULT_DID;

context_clear_entry(context);
__iommu_flush_cache(iommu, context, sizeof(*context));
@@ -2148,80 +2117,6 @@ static bool dev_is_real_dma_subdevice(struct device *dev)
pci_real_dma_dev(to_pci_dev(dev)) != to_pci_dev(dev);
}

-static int iommu_domain_identity_map(struct dmar_domain *domain,
- unsigned long first_vpfn,
- unsigned long last_vpfn)
-{
- /*
- * RMRR range might have overlap with physical memory range,
- * clear it first
- */
- dma_pte_clear_range(domain, first_vpfn, last_vpfn);
-
- return __domain_mapping(domain, first_vpfn,
- first_vpfn, last_vpfn - first_vpfn + 1,
- DMA_PTE_READ|DMA_PTE_WRITE, GFP_KERNEL);
-}
-
-static int md_domain_init(struct dmar_domain *domain, int guest_width);
-
-static int __init si_domain_init(int hw)
-{
- struct dmar_rmrr_unit *rmrr;
- struct device *dev;
- int i, nid, ret;
-
- si_domain = alloc_domain(IOMMU_DOMAIN_IDENTITY);
- if (!si_domain)
- return -EFAULT;
-
- if (md_domain_init(si_domain, DEFAULT_DOMAIN_ADDRESS_WIDTH)) {
- domain_exit(si_domain);
- si_domain = NULL;
- return -EFAULT;
- }
-
- if (hw)
- return 0;
-
- for_each_online_node(nid) {
- unsigned long start_pfn, end_pfn;
- int i;
-
- for_each_mem_pfn_range(i, nid, &start_pfn, &end_pfn, NULL) {
- ret = iommu_domain_identity_map(si_domain,
- mm_to_dma_pfn_start(start_pfn),
- mm_to_dma_pfn_end(end_pfn));
- if (ret)
- return ret;
- }
- }
-
- /*
- * Identity map the RMRRs so that devices with RMRRs could also use
- * the si_domain.
- */
- for_each_rmrr_units(rmrr) {
- for_each_active_dev_scope(rmrr->devices, rmrr->devices_cnt,
- i, dev) {
- unsigned long long start = rmrr->base_address;
- unsigned long long end = rmrr->end_address;
-
- if (WARN_ON(end < start ||
- end >> agaw_to_width(si_domain->agaw)))
- continue;
-
- ret = iommu_domain_identity_map(si_domain,
- mm_to_dma_pfn_start(start >> PAGE_SHIFT),
- mm_to_dma_pfn_end(end >> PAGE_SHIFT));
- if (ret)
- return ret;
- }
- }
-
- return 0;
-}
-
static int dmar_domain_attach_device(struct dmar_domain *domain,
struct device *dev)
{
@@ -2243,8 +2138,6 @@ static int dmar_domain_attach_device(struct dmar_domain *domain,

if (!sm_supported(iommu))
ret = domain_context_mapping(domain, dev);
- else if (hw_pass_through && domain_type_is_si(domain))
- ret = intel_pasid_setup_pass_through(iommu, dev, IOMMU_NO_PASID);
else if (domain->use_first_level)
ret = domain_setup_first_level(iommu, domain, dev, IOMMU_NO_PASID);
else
@@ -2255,8 +2148,7 @@ static int dmar_domain_attach_device(struct dmar_domain *domain,
return ret;
}

- if (sm_supported(info->iommu) || !domain_type_is_si(info->domain))
- iommu_enable_pci_caps(info);
+ iommu_enable_pci_caps(info);

return 0;
}
@@ -2606,8 +2498,6 @@ static int __init init_dmars(void)
}
}

- if (!ecap_pass_through(iommu->ecap))
- hw_pass_through = 0;
intel_svm_check(iommu);
}

@@ -2630,10 +2520,6 @@ static int __init init_dmars(void)

check_tylersburg_isoch();

- ret = si_domain_init(hw_pass_through);
- if (ret)
- goto free_iommu;
-
/*
* for each drhd
* enable fault log
@@ -2679,10 +2565,6 @@ static int __init init_dmars(void)
disable_dmar_iommu(iommu);
free_dmar_iommu(iommu);
}
- if (si_domain) {
- domain_exit(si_domain);
- si_domain = NULL;
- }

return ret;
}
@@ -3057,12 +2939,6 @@ static int intel_iommu_add(struct dmar_drhd_unit *dmaru)
if (ret)
goto out;

- if (hw_pass_through && !ecap_pass_through(iommu->ecap)) {
- pr_warn("%s: Doesn't support hardware pass through.\n",
- iommu->name);
- return -ENXIO;
- }
-
sp = domain_update_iommu_superpage(NULL, iommu) - 1;
if (sp >= 0 && !(cap_super_page_val(iommu->cap) & (1 << sp))) {
pr_warn("%s: Doesn't support large page.\n",
@@ -3313,52 +3189,6 @@ int dmar_iommu_notify_scope_dev(struct dmar_pci_notify_info *info)
return 0;
}

-static int intel_iommu_memory_notifier(struct notifier_block *nb,
- unsigned long val, void *v)
-{
- struct memory_notify *mhp = v;
- unsigned long start_vpfn = mm_to_dma_pfn_start(mhp->start_pfn);
- unsigned long last_vpfn = mm_to_dma_pfn_end(mhp->start_pfn +
- mhp->nr_pages - 1);
-
- switch (val) {
- case MEM_GOING_ONLINE:
- if (iommu_domain_identity_map(si_domain,
- start_vpfn, last_vpfn)) {
- pr_warn("Failed to build identity map for [%lx-%lx]\n",
- start_vpfn, last_vpfn);
- return NOTIFY_BAD;
- }
- break;
-
- case MEM_OFFLINE:
- case MEM_CANCEL_ONLINE:
- {
- struct dmar_drhd_unit *drhd;
- struct intel_iommu *iommu;
- LIST_HEAD(freelist);
-
- domain_unmap(si_domain, start_vpfn, last_vpfn, &freelist);
-
- rcu_read_lock();
- for_each_active_iommu(iommu, drhd)
- iommu_flush_iotlb_psi(iommu, si_domain,
- start_vpfn, mhp->nr_pages,
- list_empty(&freelist), 0);
- rcu_read_unlock();
- put_pages_list(&freelist);
- }
- break;
- }
-
- return NOTIFY_OK;
-}
-
-static struct notifier_block intel_iommu_memory_nb = {
- .notifier_call = intel_iommu_memory_notifier,
- .priority = 0
-};
-
static void intel_disable_iommus(void)
{
struct intel_iommu *iommu = NULL;
@@ -3655,12 +3485,7 @@ int __init intel_iommu_init(void)

iommu_pmu_register(iommu);
}
- up_read(&dmar_global_lock);

- if (si_domain && !hw_pass_through)
- register_memory_notifier(&intel_iommu_memory_nb);
-
- down_read(&dmar_global_lock);
if (probe_acpi_namespace_devices())
pr_warn("ACPI name space devices didn't probe correctly\n");

@@ -3827,8 +3652,6 @@ static struct iommu_domain *intel_iommu_domain_alloc(unsigned type)
domain->geometry.force_aperture = true;

return domain;
- case IOMMU_DOMAIN_IDENTITY:
- return &si_domain->domain;
case IOMMU_DOMAIN_SVA:
return intel_svm_domain_alloc();
default:
@@ -3888,8 +3711,7 @@ intel_iommu_domain_alloc_user(struct device *dev, u32 flags,

static void intel_iommu_domain_free(struct iommu_domain *domain)
{
- if (domain != &si_domain->domain)
- domain_exit(to_dmar_domain(domain));
+ domain_exit(to_dmar_domain(domain));
}

int prepare_domain_attach_device(struct iommu_domain *domain,
@@ -4596,9 +4418,7 @@ static int intel_iommu_set_dev_pasid(struct iommu_domain *domain,
if (ret)
goto out_free;

- if (domain_type_is_si(dmar_domain))
- ret = intel_pasid_setup_pass_through(iommu, dev, pasid);
- else if (dmar_domain->use_first_level)
+ if (dmar_domain->use_first_level)
ret = domain_setup_first_level(iommu, dmar_domain,
dev, pasid);
else
--
2.34.1

2023-11-20 11:38:52

by Lu Baolu

Subject: [PATCH 3/5] iommu/vt-d: Refactor domain_context_mapping_one() to be reusable

Extract common code from domain_context_mapping_one() into new helper
functions, making it reusable by other callers such as the upcoming
identity domain implementation. No intentional functional changes.

Signed-off-by: Lu Baolu <[email protected]>
---
drivers/iommu/intel/iommu.c | 99 ++++++++++++++++++++++---------------
1 file changed, 58 insertions(+), 41 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 77769ad4706e..86339dc30243 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -1727,6 +1727,61 @@ static void domain_exit(struct dmar_domain *domain)
kfree(domain);
}

+/*
+ * For kdump cases, old valid entries may be cached due to the
+ * in-flight DMA and copied pgtable, but there is no unmapping
+ * behaviour for them, thus we need an explicit cache flush for
+ * the newly-mapped device. For kdump, at this point, the device
+ * is supposed to finish reset at its driver probe stage, so no
+ * in-flight DMA will exist, and we don't need to worry anymore
+ * hereafter.
+ */
+static void copied_context_tear_down(struct intel_iommu *iommu,
+ struct context_entry *context,
+ u8 bus, u8 devfn)
+{
+ u16 did_old;
+
+ if (!context_copied(iommu, bus, devfn))
+ return;
+
+ assert_spin_locked(&iommu->lock);
+
+ did_old = context_domain_id(context);
+ context_clear_entry(context);
+
+ if (did_old < cap_ndoms(iommu->cap)) {
+ iommu->flush.flush_context(iommu, did_old,
+ (((u16)bus) << 8) | devfn,
+ DMA_CCMD_MASK_NOBIT,
+ DMA_CCMD_DEVICE_INVL);
+ iommu->flush.flush_iotlb(iommu, did_old, 0, 0,
+ DMA_TLB_DSI_FLUSH);
+ }
+
+ clear_context_copied(iommu, bus, devfn);
+}
+
+/*
+ * It's a non-present to present mapping. If hardware doesn't cache
+ * non-present entry we only need to flush the write-buffer. If the
+ * _does_ cache non-present entries, then it does so in the special
+ * domain #0, which we have to flush:
+ */
+static void context_present_cache_flush(struct intel_iommu *iommu, u16 did,
+ u8 bus, u8 devfn)
+{
+ if (cap_caching_mode(iommu->cap)) {
+ iommu->flush.flush_context(iommu, 0,
+ (((u16)bus) << 8) | devfn,
+ DMA_CCMD_MASK_NOBIT,
+ DMA_CCMD_DEVICE_INVL);
+ iommu->flush.flush_iotlb(iommu, did, 0, 0, DMA_TLB_DSI_FLUSH);
+ } else {
+ iommu_flush_write_buffer(iommu);
+ }
+}
+
static int domain_context_mapping_one(struct dmar_domain *domain,
struct intel_iommu *iommu,
u8 bus, u8 devfn)
@@ -1755,31 +1810,9 @@ static int domain_context_mapping_one(struct dmar_domain *domain,
if (context_present(context) && !context_copied(iommu, bus, devfn))
goto out_unlock;

- /*
- * For kdump cases, old valid entries may be cached due to the
- * in-flight DMA and copied pgtable, but there is no unmapping
- * behaviour for them, thus we need an explicit cache flush for
- * the newly-mapped device. For kdump, at this point, the device
- * is supposed to finish reset at its driver probe stage, so no
- * in-flight DMA will exist, and we don't need to worry anymore
- * hereafter.
- */
- if (context_copied(iommu, bus, devfn)) {
- u16 did_old = context_domain_id(context);
-
- if (did_old < cap_ndoms(iommu->cap)) {
- iommu->flush.flush_context(iommu, did_old,
- (((u16)bus) << 8) | devfn,
- DMA_CCMD_MASK_NOBIT,
- DMA_CCMD_DEVICE_INVL);
- iommu->flush.flush_iotlb(iommu, did_old, 0, 0,
- DMA_TLB_DSI_FLUSH);
- }
-
- clear_context_copied(iommu, bus, devfn);
- }
-
+ copied_context_tear_down(iommu, context, bus, devfn);
context_clear_entry(context);
+
context_set_domain_id(context, did);

if (translation != CONTEXT_TT_PASS_THROUGH) {
@@ -1815,23 +1848,7 @@ static int domain_context_mapping_one(struct dmar_domain *domain,
context_set_present(context);
if (!ecap_coherent(iommu->ecap))
clflush_cache_range(context, sizeof(*context));
-
- /*
- * It's a non-present to present mapping. If hardware doesn't cache
- * non-present entry we only need to flush the write-buffer. If the
- * _does_ cache non-present entries, then it does so in the special
- * domain #0, which we have to flush:
- */
- if (cap_caching_mode(iommu->cap)) {
- iommu->flush.flush_context(iommu, 0,
- (((u16)bus) << 8) | devfn,
- DMA_CCMD_MASK_NOBIT,
- DMA_CCMD_DEVICE_INVL);
- iommu->flush.flush_iotlb(iommu, did, 0, 0, DMA_TLB_DSI_FLUSH);
- } else {
- iommu_flush_write_buffer(iommu);
- }
-
+ context_present_cache_flush(iommu, did, bus, devfn);
ret = 0;

out_unlock:
--
2.34.1

2023-11-20 11:38:52

by Lu Baolu

Subject: [PATCH 4/5] iommu/vt-d: Add support for static identity domain

Add a global static identity domain with guaranteed attach semantics.

Software determines VT-d hardware support for pass-through translation by
inspecting the extended capability register. If pass-through translation
is not supported, the device is instructed to use a DMA domain as its
default domain. Most recent VT-d hardware implementations support pass-
through translation; the capability is lacking only in some early VT-d
implementations.

With the static identity domain in place, the domain field of per-device
iommu driver data can be either a pointer to a DMA translation domain, or
NULL, indicating that the static identity domain is attached. Refine some
code to accommodate this change.

Signed-off-by: Lu Baolu <[email protected]>
---
drivers/iommu/intel/iommu.c | 129 ++++++++++++++++++++++++++++++++++--
drivers/iommu/intel/svm.c | 2 +-
2 files changed, 125 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 86339dc30243..ad2e53821a05 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -1282,7 +1282,8 @@ static void iommu_enable_pci_caps(struct device_domain_info *info)
if (info->ats_supported && pci_ats_page_aligned(pdev) &&
!pci_enable_ats(pdev, VTD_PAGE_SHIFT)) {
info->ats_enabled = 1;
- domain_update_iotlb(info->domain);
+ if (info->domain)
+ domain_update_iotlb(info->domain);
}
}

@@ -1298,7 +1299,8 @@ static void iommu_disable_pci_caps(struct device_domain_info *info)
if (info->ats_enabled) {
pci_disable_ats(pdev);
info->ats_enabled = 0;
- domain_update_iotlb(info->domain);
+ if (info->domain)
+ domain_update_iotlb(info->domain);
}

if (info->pasid_enabled) {
@@ -2301,6 +2303,9 @@ static bool device_rmrr_is_relaxable(struct device *dev)
*/
static int device_def_domain_type(struct device *dev)
{
+ struct device_domain_info *info = dev_iommu_priv_get(dev);
+ struct intel_iommu *iommu = info->iommu;
+
if (dev_is_pci(dev)) {
struct pci_dev *pdev = to_pci_dev(dev);

@@ -2311,6 +2316,13 @@ static int device_def_domain_type(struct device *dev)
return IOMMU_DOMAIN_IDENTITY;
}

+ /*
+ * Hardware does not support the passthrough translation mode.
+ * Always use a dynamaic mapping domain.
+ */
+ if (!ecap_pass_through(iommu->ecap))
+ return IOMMU_DOMAIN_DMA;
+
return 0;
}

@@ -3712,6 +3724,9 @@ static void dmar_remove_one_dev_info(struct device *dev)
domain_context_clear(info);
}

+ if (!domain)
+ return;
+
spin_lock_irqsave(&domain->lock, flags);
list_del(&info->link);
spin_unlock_irqrestore(&domain->lock, flags);
@@ -3920,11 +3935,9 @@ int prepare_domain_attach_device(struct iommu_domain *domain,
static int intel_iommu_attach_device(struct iommu_domain *domain,
struct device *dev)
{
- struct device_domain_info *info = dev_iommu_priv_get(dev);
int ret;

- if (info->domain)
- device_block_translation(dev);
+ device_block_translation(dev);

ret = prepare_domain_attach_device(domain, dev);
if (ret)
@@ -4529,6 +4542,9 @@ static void intel_iommu_remove_dev_pasid(struct device *dev, ioasid_t pasid)
goto out_tear_down;
}

+ if (domain->type == IOMMU_DOMAIN_IDENTITY)
+ goto out_tear_down;
+
dmar_domain = to_dmar_domain(domain);
spin_lock_irqsave(&dmar_domain->lock, flags);
list_for_each_entry(curr, &dmar_domain->dev_pasids, link_domain) {
@@ -4703,8 +4719,111 @@ const struct iommu_dirty_ops intel_dirty_ops = {
.read_and_clear_dirty = intel_iommu_read_and_clear_dirty,
};

+static int context_setup_pass_through(struct device *dev, u8 bus, u8 devfn)
+{
+ struct device_domain_info *info = dev_iommu_priv_get(dev);
+ struct intel_iommu *iommu = info->iommu;
+ struct context_entry *context;
+
+ spin_lock(&iommu->lock);
+ context = iommu_context_addr(iommu, bus, devfn, 1);
+ if (!context) {
+ spin_unlock(&iommu->lock);
+ return -ENOMEM;
+ }
+
+ if (context_present(context) && !context_copied(iommu, bus, devfn)) {
+ spin_unlock(&iommu->lock);
+ return 0;
+ }
+
+ copied_context_tear_down(iommu, context, bus, devfn);
+ context_clear_entry(context);
+ context_set_domain_id(context, FLPT_DEFAULT_DID);
+
+ /*
+ * In pass through mode, AW must be programmed to indicate the largest
+ * AGAW value supported by hardware. And ASR is ignored by hardware.
+ */
+ context_set_address_width(context, iommu->msagaw);
+ context_set_translation_type(context, CONTEXT_TT_PASS_THROUGH);
+ context_set_fault_enable(context);
+ context_set_present(context);
+ if (!ecap_coherent(iommu->ecap))
+ clflush_cache_range(context, sizeof(*context));
+ context_present_cache_flush(iommu, FLPT_DEFAULT_DID, bus, devfn);
+ spin_unlock(&iommu->lock);
+
+ return 0;
+}
+
+static int context_setup_pass_through_cb(struct pci_dev *pdev, u16 alias, void *data)
+{
+ struct device *dev = data;
+
+ if (dev != &pdev->dev)
+ return 0;
+
+ return context_setup_pass_through(dev, PCI_BUS_NUM(alias), alias & 0xff);
+}
+
+static int device_setup_pass_through(struct device *dev)
+{
+ struct device_domain_info *info = dev_iommu_priv_get(dev);
+
+ if (!dev_is_pci(dev))
+ return context_setup_pass_through(dev, info->bus, info->devfn);
+
+ return pci_for_each_dma_alias(to_pci_dev(dev),
+ context_setup_pass_through_cb, dev);
+}
+
+static int identity_domain_attach_dev(struct iommu_domain *domain,
+ struct device *dev)
+{
+ struct device_domain_info *info = dev_iommu_priv_get(dev);
+ struct intel_iommu *iommu = info->iommu;
+ int ret;
+
+ device_block_translation(dev);
+
+ if (dev_is_real_dma_subdevice(dev))
+ return 0;
+
+ if (sm_supported(iommu)) {
+ ret = intel_pasid_setup_pass_through(iommu, dev, IOMMU_NO_PASID);
+ if (!ret)
+ iommu_enable_pci_caps(info);
+ } else {
+ ret = device_setup_pass_through(dev);
+ }
+
+ return ret;
+}
+
+static int identity_domain_set_dev_pasid(struct iommu_domain *domain,
+ struct device *dev, ioasid_t pasid)
+{
+ struct device_domain_info *info = dev_iommu_priv_get(dev);
+ struct intel_iommu *iommu = info->iommu;
+
+ if (!pasid_supported(iommu) || dev_is_real_dma_subdevice(dev))
+ return -EOPNOTSUPP;
+
+ return intel_pasid_setup_pass_through(iommu, dev, pasid);
+}
+
+static struct iommu_domain identity_domain = {
+ .type = IOMMU_DOMAIN_IDENTITY,
+ .ops = &(const struct iommu_domain_ops) {
+ .attach_dev = identity_domain_attach_dev,
+ .set_dev_pasid = identity_domain_set_dev_pasid,
+ },
+};
+
const struct iommu_ops intel_iommu_ops = {
.blocked_domain = &blocking_domain,
+ .identity_domain = &identity_domain,
.capable = intel_iommu_capable,
.hw_info = intel_iommu_hw_info,
.domain_alloc = intel_iommu_domain_alloc,
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 64c02e6ab1b3..e744f7439e97 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -493,7 +493,7 @@ void intel_drain_pasid_prq(struct device *dev, u32 pasid)
domain = info->domain;
pdev = to_pci_dev(dev);
sid = PCI_DEVID(info->bus, info->devfn);
- did = domain_id_iommu(domain, iommu);
+ did = domain ? domain_id_iommu(domain, iommu) : FLPT_DEFAULT_DID;
qdep = pci_ats_queue_depth(pdev);

/*
--
2.34.1

2023-11-29 20:26:48

by Jason Gunthorpe

Subject: Re: [PATCH 4/5] iommu/vt-d: Add support for static identity domain

On Mon, Nov 20, 2023 at 07:29:43PM +0800, Lu Baolu wrote:

> @@ -2311,6 +2316,13 @@ static int device_def_domain_type(struct device *dev)
> return IOMMU_DOMAIN_IDENTITY;
> }
>
> + /*
> + * Hardware does not support the passthrough translation mode.
> + * Always use a dynamaic mapping domain.
> + */
> + if (!ecap_pass_through(iommu->ecap))
> + return IOMMU_DOMAIN_DMA;
> +

Doesn't this return from def_domain_type completely prevent using an
identity domain?

I thought the point of this was to allow the identity domain but have
it be translating?

Jason

2023-11-29 20:29:14

by Jason Gunthorpe

Subject: Re: [PATCH 4/5] iommu/vt-d: Add support for static identity domain

On Wed, Nov 29, 2023 at 04:26:15PM -0400, Jason Gunthorpe wrote:
> On Mon, Nov 20, 2023 at 07:29:43PM +0800, Lu Baolu wrote:
>
> > @@ -2311,6 +2316,13 @@ static int device_def_domain_type(struct device *dev)
> > return IOMMU_DOMAIN_IDENTITY;
> > }
> >
> > + /*
> > + * Hardware does not support the passthrough translation mode.
> > + * Always use a dynamaic mapping domain.
> > + */
> > + if (!ecap_pass_through(iommu->ecap))
> > + return IOMMU_DOMAIN_DMA;
> > +
>
> Doesn't this return from def_domain_type completely prevent using an
> identity domain?
>
> I thought the point of this was to allow the identity domain but have
> it be translating?

I suppose the answer is the next patch deletes that stuff.

I would probably have structured this in the other order, first add
this hunk and say that old HW is being de-supported. Remove all the
now-dead code creating the 1:1 page table, then refactor the remainder
to create the global static.

Jason

2023-11-30 06:10:09

by Lu Baolu

Subject: Re: [PATCH 4/5] iommu/vt-d: Add support for static identity domain

On 2023/11/30 4:28, Jason Gunthorpe wrote:
> On Wed, Nov 29, 2023 at 04:26:15PM -0400, Jason Gunthorpe wrote:
>> On Mon, Nov 20, 2023 at 07:29:43PM +0800, Lu Baolu wrote:
>>
>>> @@ -2311,6 +2316,13 @@ static int device_def_domain_type(struct device *dev)
>>> return IOMMU_DOMAIN_IDENTITY;
>>> }
>>>
>>> + /*
>>> + * Hardware does not support the passthrough translation mode.
>>> + * Always use a dynamaic mapping domain.
>>> + */
>>> + if (!ecap_pass_through(iommu->ecap))
>>> + return IOMMU_DOMAIN_DMA;
>>> +
>> Doesn't this return from def_domain_type completely prevent using an
>> identity domain?
>>
>> I thought the point of this was to allow the identity domain but have
>> it be translating?
> I suppose the answer is the next patch deletes that stuff.
>
> I would probably have structured this in the other order, first add
> this hunk and say that old HW is being de-supported. Remove all the
> now-dead code creating the 1:1 page table, then refactor the remainder
> to create the global static.

Okay, that would be more readable.

Best regards,
baolu