2023-02-16 13:16:46

by Lu Baolu

[permalink] [raw]
Subject: [PATCH 0/4] [PULL REQUEST] iommu/vt-d: Fixes for v6.2-rc8

Hi Joerg,

Below iommu/vt-d fixes are queued for your fixes branch.

- Two performance optimizations
- Fix PASID directory pointer coherency
- Fix missed rollbacks in error path

Please consider it for the iommu/fixes branch.

Best regards,
Lu Baolu

Jacob Pan (2):
iommu/vt-d: Avoid superfluous IOTLB tracking in lazy mode
iommu/vt-d: Fix PASID directory pointer coherency

Lu Baolu (1):
iommu/vt-d: Fix error handling in sva enable/disable paths

Tina Zhang (1):
iommu/vt-d: Allow to use flush-queue when first level is default

drivers/iommu/intel/iommu.c | 26 ++++++++++++++++++++------
drivers/iommu/intel/pasid.c | 7 +++++++
2 files changed, 27 insertions(+), 6 deletions(-)

--
2.34.1



2023-02-16 13:16:48

by Lu Baolu

[permalink] [raw]
Subject: [PATCH 1/4] iommu/vt-d: Fix error handling in sva enable/disable paths

Roll back all previous actions in error paths of intel_iommu_enable_sva()
and intel_iommu_disable_sva().

Fixes: d5b9e4bfe0d8 ("iommu/vt-d: Report prq to io-pgfault framework")
Reviewed-by: Kevin Tian <[email protected]>
Signed-off-by: Lu Baolu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
drivers/iommu/intel/iommu.c | 16 ++++++++++++----
1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 59df7e42fd53..994dffa1db57 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -4642,8 +4642,12 @@ static int intel_iommu_enable_sva(struct device *dev)
return -EINVAL;

ret = iopf_queue_add_device(iommu->iopf_queue, dev);
- if (!ret)
- ret = iommu_register_device_fault_handler(dev, iommu_queue_iopf, dev);
+ if (ret)
+ return ret;
+
+ ret = iommu_register_device_fault_handler(dev, iommu_queue_iopf, dev);
+ if (ret)
+ iopf_queue_remove_device(iommu->iopf_queue, dev);

return ret;
}
@@ -4655,8 +4659,12 @@ static int intel_iommu_disable_sva(struct device *dev)
int ret;

ret = iommu_unregister_device_fault_handler(dev);
- if (!ret)
- ret = iopf_queue_remove_device(iommu->iopf_queue, dev);
+ if (ret)
+ return ret;
+
+ ret = iopf_queue_remove_device(iommu->iopf_queue, dev);
+ if (ret)
+ iommu_register_device_fault_handler(dev, iommu_queue_iopf, dev);

return ret;
}
--
2.34.1


2023-02-16 13:16:54

by Lu Baolu

[permalink] [raw]
Subject: [PATCH 2/4] iommu/vt-d: Avoid superfluous IOTLB tracking in lazy mode

From: Jacob Pan <[email protected]>

Intel IOMMU driver implements IOTLB flush queue with domain selective
or PASID selective invalidations. In this case there's no need to track
IOVA page range and sync IOTLBs, which may cause significant performance
hit.

This patch adds a check to avoid IOVA gather page and IOTLB sync for
the lazy path.

The performance difference on Sapphire Rapids 100Gb NIC is improved by
the following (as measured by iperf send):

w/o this fix~48 Gbits/s. with this fix ~54 Gbits/s

Cc: <[email protected]>
Fixes: 2a2b8eaa5b25 ("iommu: Handle freelists when using deferred flushing in iommu drivers")
Reviewed-by: Robin Murphy <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
Tested-by: Sanjay Kumar <[email protected]>
Signed-off-by: Sanjay Kumar <[email protected]>
Signed-off-by: Jacob Pan <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Lu Baolu <[email protected]>
---
drivers/iommu/intel/iommu.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 994dffa1db57..ce36a16efc97 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -4346,7 +4346,12 @@ static size_t intel_iommu_unmap(struct iommu_domain *domain,
if (dmar_domain->max_addr == iova + size)
dmar_domain->max_addr = iova;

- iommu_iotlb_gather_add_page(domain, gather, iova, size);
+ /*
+ * We do not use page-selective IOTLB invalidation in flush queue,
+ * so there is no need to track page and sync iotlb.
+ */
+ if (!iommu_iotlb_gather_queued(gather))
+ iommu_iotlb_gather_add_page(domain, gather, iova, size);

return size;
}
--
2.34.1


2023-02-16 13:16:57

by Lu Baolu

[permalink] [raw]
Subject: [PATCH 3/4] iommu/vt-d: Fix PASID directory pointer coherency

From: Jacob Pan <[email protected]>

On platforms that do not support IOMMU Extended capability bit 0
Page-walk Coherency, CPU caches are not snooped when IOMMU is accessing
any translation structures. IOMMU access goes only directly to
memory. Intel IOMMU code was missing a flush for the PASID table
directory that resulted in the unrecoverable fault as shown below.

This patch adds clflush calls whenever allocating and updating
a PASID table directory to ensure cache coherency.

On the reverse direction, there's no need to clflush the PASID directory
pointer when we deactivate a context entry in that IOMMU hardware will
not see the old PASID directory pointer after we clear the context entry.
PASID directory entries are also never freed once allocated.

DMAR: DRHD: handling fault status reg 3
DMAR: [DMA Read NO_PASID] Request device [00:0d.2] fault addr 0x1026a4000
[fault reason 0x51] SM: Present bit in Directory Entry is clear
DMAR: Dump dmar1 table entries for IOVA 0x1026a4000
DMAR: scalable mode root entry: hi 0x0000000102448001, low 0x0000000101b3e001
DMAR: context entry: hi 0x0000000000000000, low 0x0000000101b4d401
DMAR: pasid dir entry: 0x0000000101b4e001
DMAR: pasid table entry[0]: 0x0000000000000109
DMAR: pasid table entry[1]: 0x0000000000000001
DMAR: pasid table entry[2]: 0x0000000000000000
DMAR: pasid table entry[3]: 0x0000000000000000
DMAR: pasid table entry[4]: 0x0000000000000000
DMAR: pasid table entry[5]: 0x0000000000000000
DMAR: pasid table entry[6]: 0x0000000000000000
DMAR: pasid table entry[7]: 0x0000000000000000
DMAR: PTE not present at level 4

Cc: <[email protected]>
Fixes: 0bbeb01a4faf ("iommu/vt-d: Manage scalalble mode PASID tables")
Reviewed-by: Kevin Tian <[email protected]>
Reported-by: Sukumar Ghorai <[email protected]>
Signed-off-by: Ashok Raj <[email protected]>
Signed-off-by: Jacob Pan <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Lu Baolu <[email protected]>
---
drivers/iommu/intel/pasid.c | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
index fb3c7020028d..979f796175b1 100644
--- a/drivers/iommu/intel/pasid.c
+++ b/drivers/iommu/intel/pasid.c
@@ -128,6 +128,9 @@ int intel_pasid_alloc_table(struct device *dev)
pasid_table->max_pasid = 1 << (order + PAGE_SHIFT + 3);
info->pasid_table = pasid_table;

+ if (!ecap_coherent(info->iommu->ecap))
+ clflush_cache_range(pasid_table->table, size);
+
return 0;
}

@@ -215,6 +218,10 @@ static struct pasid_entry *intel_pasid_get_entry(struct device *dev, u32 pasid)
free_pgtable_page(entries);
goto retry;
}
+ if (!ecap_coherent(info->iommu->ecap)) {
+ clflush_cache_range(entries, VTD_PAGE_SIZE);
+ clflush_cache_range(&dir[dir_index].val, sizeof(*dir));
+ }
}

return &entries[index];
--
2.34.1


2023-02-16 13:17:00

by Lu Baolu

[permalink] [raw]
Subject: [PATCH 4/4] iommu/vt-d: Allow to use flush-queue when first level is default

From: Tina Zhang <[email protected]>

Commit 29b32839725f ("iommu/vt-d: Do not use flush-queue when caching-mode
is on") forced default domains to be strict mode as long as IOMMU
caching-mode is flagged. The reason for doing this is that when vIOMMU
uses VT-d caching mode to synchronize shadowing page tables, the strict
mode shows better performance.

However, this optimization is orthogonal to the first-level page table
because the Intel VT-d architecture does not define the caching mode of
the first-level page table. Refer to VT-d spec, section 6.1, "When the
CM field is reported as Set, any software updates to remapping
structures other than first-stage mapping (including updates to not-
present entries or present entries whose programming resulted in
translation faults) requires explicit invalidation of the caches."
Exclude the first-level page table from this optimization.

Generally using first-stage translation in vIOMMU implies nested
translation enabled in the physical IOMMU. In this case the first-stage
page table is wholly captured by the guest. The vIOMMU only needs to
transfer the cache invalidations on vIOMMU to the physical IOMMU.
Forcing the default domain to strict mode will cause more frequent
cache invalidations, resulting in performance degradation. In a real
performance benchmark test measured by iperf receive, the performance
result on Sapphire Rapids 100Gb NIC shows:
w/ this fix ~51 Gbits/s, w/o this fix ~39.3 Gbits/s.

Theoretically a first-stage IOMMU page table can still be shadowed
in absence of the caching mode, e.g. with host write-protecting guest
IOMMU page table to synchronize changed PTEs with the physical
IOMMU page table. In this case the shadowing overhead is decoupled
from emulating IOTLB invalidation then the overhead of the latter part
is solely decided by the frequency of IOTLB invalidations. Hence
allowing guest default dma domain to be lazy can also benefit the
overall performance by reducing the total VM-exit numbers.

Fixes: 29b32839725f ("iommu/vt-d: Do not use flush-queue when caching-mode is on")
Reported-by: Sanjay Kumar <[email protected]>
Suggested-by: Sanjay Kumar <[email protected]>
Signed-off-by: Tina Zhang <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Lu Baolu <[email protected]>
---
drivers/iommu/intel/iommu.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index ce36a16efc97..52afcdaf7c7f 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -4005,7 +4005,8 @@ int __init intel_iommu_init(void)
* is likely to be much lower than the overhead of synchronizing
* the virtual and physical IOMMU page-tables.
*/
- if (cap_caching_mode(iommu->cap)) {
+ if (cap_caching_mode(iommu->cap) &&
+ !first_level_by_default(IOMMU_DOMAIN_DMA)) {
pr_info_once("IOMMU batching disallowed due to virtualization\n");
iommu_set_dma_strict();
}
--
2.34.1


2023-02-16 13:42:29

by Joerg Roedel

[permalink] [raw]
Subject: Re: [PATCH 0/4] [PULL REQUEST] iommu/vt-d: Fixes for v6.2-rc8

Hi Baolu,

On Thu, Feb 16, 2023 at 09:08:12PM +0800, Lu Baolu wrote:
> Below iommu/vt-d fixes are queued for your fixes branch.
>
> - Two performance optimizations
> - Fix PASID directory pointer coherency
> - Fix missed rollbacks in error path
>
> Please consider it for the iommu/fixes branch.

So nothing of this seems really critical (e.g. fixes a regression that a
number of people are encountering). Especially the performance
optimizations do not qualify as fixes at this stage of the cycle. I will
queue them in the VT-d branch so that they go upstream in the next merge
window, unless you convince me otherwise.

Regards,

Joerg

2023-02-16 13:46:05

by Lu Baolu

[permalink] [raw]
Subject: Re: [PATCH 0/4] [PULL REQUEST] iommu/vt-d: Fixes for v6.2-rc8

On 2023/2/16 21:42, Joerg Roedel wrote:
> Hi Baolu,
>
> On Thu, Feb 16, 2023 at 09:08:12PM +0800, Lu Baolu wrote:
>> Below iommu/vt-d fixes are queued for your fixes branch.
>>
>> - Two performance optimizations
>> - Fix PASID directory pointer coherency
>> - Fix missed rollbacks in error path
>>
>> Please consider it for the iommu/fixes branch.
> So nothing of this seems really critical (e.g. fixes a regression that a
> number of people are encountering). Especially the performance
> optimizations do not qualify as fixes at this stage of the cycle. I will
> queue them in the VT-d branch so that they go upstream in the next merge
> window, unless you convince me otherwise.

Yes. Nothing really critical. It's fine to put them in the vt-d branch.

Best regards,
baolu