2022-12-01 04:17:37

by Lu Baolu

Subject: [PATCH 0/4] [PULL REQUEST] iommu/vt-d: Fixes for v6.1-rc8

Hi Joerg,

The following fixes are queued for v6.1. They:

- Add a fix to handle a QAT device quirk.
- Fix 3 PCI device refcount leaks.

This series is also available on GitHub:
https://github.com/LuBaolu/intel-iommu/commits/vtd-fix-for-v6.1-rc8

Please consider them for the iommu/fixes branch.

Best regards,
Lu Baolu

Jacob Pan (1):
iommu/vt-d: Add a fix for devices need extra dtlb flush

Xiongfeng Wang (2):
iommu/vt-d: Fix PCI device refcount leak in has_external_pci()
iommu/vt-d: Fix PCI device refcount leak in dmar_dev_scope_init()

Yang Yingliang (1):
iommu/vt-d: Fix PCI device refcount leak in prq_event_thread()

drivers/iommu/intel/iommu.h | 4 ++
drivers/iommu/intel/dmar.c | 1 +
drivers/iommu/intel/iommu.c | 73 +++++++++++++++++++++++++++++++++++--
drivers/iommu/intel/svm.c | 19 +++++++---
4 files changed, 88 insertions(+), 9 deletions(-)

--
2.34.1


2022-12-01 04:34:57

by Lu Baolu

Subject: [PATCH 4/4] iommu/vt-d: Fix PCI device refcount leak in dmar_dev_scope_init()

From: Xiongfeng Wang <[email protected]>

for_each_pci_dev() is implemented by pci_get_device(). The comment of
pci_get_device() says that it will increase the reference count for the
returned pci_dev and also decrease the reference count for the input
pci_dev @from if it is not NULL.

If we break out of the for_each_pci_dev() loop with pdev not NULL, we
need to call pci_dev_put() to decrease the reference count. Add the
missing pci_dev_put() for the error path to avoid a reference count leak.
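
For reference, a rough sketch of why an early exit leaks a reference
(based on the for_each_pci_dev() definition in include/linux/pci.h,
simplified here):

	#define for_each_pci_dev(d) \
		while ((d = pci_get_device(PCI_ANY_ID, PCI_ANY_ID, d)) != NULL)

	/*
	 * pci_get_device() drops the reference on the device passed in via
	 * @from and returns the next device with a reference held.  A loop
	 * that runs to completion ends with d == NULL and every get balanced
	 * by a put; breaking out (or returning) early leaves the last
	 * returned device with an elevated refcount unless the caller adds
	 * an explicit pci_dev_put(), as this patch does.
	 */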

Fixes: 2e4552893038 ("iommu/vt-d: Unify the way to process DMAR device scope array")
Signed-off-by: Xiongfeng Wang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Lu Baolu <[email protected]>
---
drivers/iommu/intel/dmar.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
index 5a8f780e7ffd..bc94059a5b87 100644
--- a/drivers/iommu/intel/dmar.c
+++ b/drivers/iommu/intel/dmar.c
@@ -820,6 +820,7 @@ int __init dmar_dev_scope_init(void)
 			info = dmar_alloc_pci_notify_info(dev,
 					BUS_NOTIFY_ADD_DEVICE);
 			if (!info) {
+				pci_dev_put(dev);
 				return dmar_dev_scope_status;
 			} else {
 				dmar_pci_bus_add_dev(info);
--
2.34.1

2022-12-01 04:38:56

by Lu Baolu

Subject: [PATCH 3/4] iommu/vt-d: Fix PCI device refcount leak in has_external_pci()

From: Xiongfeng Wang <[email protected]>

for_each_pci_dev() is implemented by pci_get_device(). The comment of
pci_get_device() says that it will increase the reference count for the
returned pci_dev and also decrease the reference count for the input
pci_dev @from if it is not NULL.

If we break out of the for_each_pci_dev() loop with pdev not NULL, we
need to call pci_dev_put() to decrease the reference count. Add the
missing pci_dev_put() before 'return true' to avoid a reference count
leak.

Fixes: 89a6079df791 ("iommu/vt-d: Force IOMMU on for platform opt in hint")
Signed-off-by: Xiongfeng Wang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Lu Baolu <[email protected]>
---
drivers/iommu/intel/iommu.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 587eebe39820..5287efe247b1 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -3873,8 +3873,10 @@ static inline bool has_external_pci(void)
 	struct pci_dev *pdev = NULL;

 	for_each_pci_dev(pdev)
-		if (pdev->external_facing)
+		if (pdev->external_facing) {
+			pci_dev_put(pdev);
 			return true;
+		}

 	return false;
 }
--
2.34.1

2022-12-01 04:50:10

by Lu Baolu

Subject: [PATCH 1/4] iommu/vt-d: Add a fix for devices need extra dtlb flush

From: Jacob Pan <[email protected]>

QAT devices on Intel Sapphire Rapids and Emerald Rapids have a defect in
address translation service (ATS). These devices may inadvertently issue
ATS invalidation completion before posted writes initiated with
translated address that utilized translations matching the invalidation
address range, violating the invalidation completion ordering.

This patch adds an extra device TLB invalidation for the affected
devices. It is needed to ensure that no more posted writes with
translated addresses follow the invalidation completion, so the
ordering is preserved and data corruption is prevented.
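
Put differently, the intended effect can be sketched as an informal
ordering diagram (an illustration drawn from the description above, not
from the specification):

	/*
	 * Affected device, without the quirk:
	 *   devTLB invalidation issued --> completion may be signaled early
	 *   --> a posted write using a stale translated address can still
	 *       arrive after the completion, hitting freed/reused memory
	 *
	 * With the quirk:
	 *   devTLB invalidation --> completion
	 *   --> extra devTLB invalidation --> completion
	 *   --> no further posted writes with translated addresses can
	 *       follow, so the required ordering is restored
	 */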

Device TLBs are invalidated under the following six conditions:
1. Device driver does DMA API unmap IOVA
2. Device driver unbind a PASID from a process, sva_unbind_device()
3. PASID is torn down, after PASID cache is flushed. e.g. process
exit_mmap() due to crash
4. Under SVA usage, called by mmu_notifier.invalidate_range() where
VM has to free pages that were unmapped
5. userspace driver unmaps a DMA buffer
6. Cache invalidation in vSVA usage (upcoming)

For #1 and #2, device drivers are responsible for stopping DMA traffic
before unmap/unbind. For #3, the iommu driver is called via the
mmu_notifier and invalidates the TLB the same way as a normal user
unmap, which will do the extra invalidation. The dTLB invalidation
after the PASID cache flush does not need an extra invalidation.

Therefore, we only need to deal with #4 and #5 in this patch. #1 is also
covered by this patch due to the common code path shared with #5. The
affected call paths are sketched below.
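
A rough sketch of the call paths involved (reconstructed from the hunks
below; simplified, not literal code):

	/*
	 * #1/#5: DMA API unmap, userspace driver unmap
	 *   iommu_flush_dev_iotlb()
	 *     __iommu_flush_dev_iotlb()
	 *       qi_flush_dev_iotlb()           existing flush
	 *       quirk_extra_dev_tlb_flush()    extra flush added by this patch
	 *
	 * #4: SVA, mmu_notifier.invalidate_range()
	 *   intel_flush_svm_range_dev()
	 *     __flush_svm_range_dev()
	 *       qi_flush_piotlb()
	 *       qi_flush_dev_iotlb_pasid()     existing flush
	 *       quirk_extra_dev_tlb_flush()    extra flush added by this patch
	 */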

Tested-by: Yuzhang Luo <[email protected]>
Reviewed-by: Ashok Raj <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
Signed-off-by: Jacob Pan <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Lu Baolu <[email protected]>
---
drivers/iommu/intel/iommu.h | 4 +++
drivers/iommu/intel/iommu.c | 69 +++++++++++++++++++++++++++++++++++--
drivers/iommu/intel/svm.c | 5 ++-
3 files changed, 75 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
index 92023dff9513..db9df7c3790c 100644
--- a/drivers/iommu/intel/iommu.h
+++ b/drivers/iommu/intel/iommu.h
@@ -623,6 +623,7 @@ struct device_domain_info {
 	u8 pri_enabled:1;
 	u8 ats_supported:1;
 	u8 ats_enabled:1;
+	u8 dtlb_extra_inval:1;	/* Quirk for devices need extra flush */
 	u8 ats_qdep;
 	struct device *dev; /* it's NULL for PCIe-to-PCI bridge */
 	struct intel_iommu *iommu; /* IOMMU used by this device */
@@ -728,6 +729,9 @@ void qi_flush_piotlb(struct intel_iommu *iommu, u16 did, u32 pasid, u64 addr,
 void qi_flush_dev_iotlb_pasid(struct intel_iommu *iommu, u16 sid, u16 pfsid,
 			      u32 pasid, u16 qdep, u64 addr,
 			      unsigned int size_order);
+void quirk_extra_dev_tlb_flush(struct device_domain_info *info,
+			       unsigned long address, unsigned long pages,
+			       u32 pasid, u16 qdep);
 void qi_flush_pasid_cache(struct intel_iommu *iommu, u16 did, u64 granu,
 			  u32 pasid);

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 996a8b5ee5ee..587eebe39820 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -1396,6 +1396,24 @@ static void domain_update_iotlb(struct dmar_domain *domain)
 	spin_unlock_irqrestore(&domain->lock, flags);
 }

+/*
+ * The extra devTLB flush quirk impacts those QAT devices with PCI device
+ * IDs ranging from 0x4940 to 0x4943. It is exempted from risky_device()
+ * check because it applies only to the built-in QAT devices and it doesn't
+ * grant additional privileges.
+ */
+#define BUGGY_QAT_DEVID_MASK 0x494c
+static bool dev_needs_extra_dtlb_flush(struct pci_dev *pdev)
+{
+	if (pdev->vendor != PCI_VENDOR_ID_INTEL)
+		return false;
+
+	if ((pdev->device & 0xfffc) != BUGGY_QAT_DEVID_MASK)
+		return false;
+
+	return true;
+}
+
 static void iommu_enable_pci_caps(struct device_domain_info *info)
 {
 	struct pci_dev *pdev;
@@ -1478,6 +1496,7 @@ static void __iommu_flush_dev_iotlb(struct device_domain_info *info,
 	qdep = info->ats_qdep;
 	qi_flush_dev_iotlb(info->iommu, sid, info->pfsid,
 			   qdep, addr, mask);
+	quirk_extra_dev_tlb_flush(info, addr, mask, PASID_RID2PASID, qdep);
 }

 static void iommu_flush_dev_iotlb(struct dmar_domain *domain,
@@ -4490,9 +4509,10 @@ static struct iommu_device *intel_iommu_probe_device(struct device *dev)
 	if (dev_is_pci(dev)) {
 		if (ecap_dev_iotlb_support(iommu->ecap) &&
 		    pci_ats_supported(pdev) &&
-		    dmar_ats_supported(pdev, iommu))
+		    dmar_ats_supported(pdev, iommu)) {
 			info->ats_supported = 1;
-
+			info->dtlb_extra_inval = dev_needs_extra_dtlb_flush(pdev);
+		}
 		if (sm_supported(iommu)) {
 			if (pasid_supported(iommu)) {
 				int features = pci_pasid_features(pdev);
@@ -4931,3 +4951,48 @@ static void __init check_tylersburg_isoch(void)
 	pr_warn("Recommended TLB entries for ISOCH unit is 16; your BIOS set %d\n",
 		vtisochctrl);
 }
+
+/*
+ * Here we deal with a device TLB defect where device may inadvertently issue ATS
+ * invalidation completion before posted writes initiated with translated address
+ * that utilized translations matching the invalidation address range, violating
+ * the invalidation completion ordering.
+ * Therefore, any use cases that cannot guarantee DMA is stopped before unmap is
+ * vulnerable to this defect. In other words, any dTLB invalidation initiated not
+ * under the control of the trusted/privileged host device driver must use this
+ * quirk.
+ * Device TLBs are invalidated under the following six conditions:
+ * 1. Device driver does DMA API unmap IOVA
+ * 2. Device driver unbind a PASID from a process, sva_unbind_device()
+ * 3. PASID is torn down, after PASID cache is flushed. e.g. process
+ *    exit_mmap() due to crash
+ * 4. Under SVA usage, called by mmu_notifier.invalidate_range() where
+ *    VM has to free pages that were unmapped
+ * 5. Userspace driver unmaps a DMA buffer
+ * 6. Cache invalidation in vSVA usage (upcoming)
+ *
+ * For #1 and #2, device drivers are responsible for stopping DMA traffic
+ * before unmap/unbind. For #3, iommu driver gets mmu_notifier to
+ * invalidate TLB the same way as normal user unmap which will use this quirk.
+ * The dTLB invalidation after PASID cache flush does not need this quirk.
+ *
+ * As a reminder, #6 will *NEED* this quirk as we enable nested translation.
+ */
+void quirk_extra_dev_tlb_flush(struct device_domain_info *info,
+			       unsigned long address, unsigned long mask,
+			       u32 pasid, u16 qdep)
+{
+	u16 sid;
+
+	if (likely(!info->dtlb_extra_inval))
+		return;
+
+	sid = PCI_DEVID(info->bus, info->devfn);
+	if (pasid == PASID_RID2PASID) {
+		qi_flush_dev_iotlb(info->iommu, sid, info->pfsid,
+				   qdep, address, mask);
+	} else {
+		qi_flush_dev_iotlb_pasid(info->iommu, sid, info->pfsid,
+					 pasid, qdep, address, mask);
+	}
+}
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 7d08eb034f2d..fe615c53479c 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -184,10 +184,13 @@ static void __flush_svm_range_dev(struct intel_svm *svm,
 		return;

 	qi_flush_piotlb(sdev->iommu, sdev->did, svm->pasid, address, pages, ih);
-	if (info->ats_enabled)
+	if (info->ats_enabled) {
 		qi_flush_dev_iotlb_pasid(sdev->iommu, sdev->sid, info->pfsid,
 					 svm->pasid, sdev->qdep, address,
 					 order_base_2(pages));
+		quirk_extra_dev_tlb_flush(info, address, order_base_2(pages),
+					  svm->pasid, sdev->qdep);
+	}
 }

 static void intel_flush_svm_range_dev(struct intel_svm *svm,
--
2.34.1

2022-12-01 08:48:39

by Lu Baolu

Subject: Re: [PATCH 1/4] iommu/vt-d: Add a fix for devices need extra dtlb flush

On 2022/12/1 12:01, Lu Baolu wrote:
> From: Jacob Pan<[email protected]>
>
> QAT devices on Intel Sapphire Rapids and Emerald Rapids have a defect in
> address translation service (ATS). These devices may inadvertently issue
> ATS invalidation completion before posted writes initiated with
> translated address that utilized translations matching the invalidation
> address range, violating the invalidation completion ordering.
>
> This patch adds an extra device TLB invalidation for the affected
> devices. It is needed to ensure that no more posted writes with
> translated addresses follow the invalidation completion, so the
> ordering is preserved and data corruption is prevented.
>
> Device TLBs are invalidated under the following six conditions:
> 1. Device driver does DMA API unmap IOVA
> 2. Device driver unbind a PASID from a process, sva_unbind_device()
> 3. PASID is torn down, after PASID cache is flushed. e.g. process
> exit_mmap() due to crash
> 4. Under SVA usage, called by mmu_notifier.invalidate_range() where
> VM has to free pages that were unmapped
> 5. userspace driver unmaps a DMA buffer
> 6. Cache invalidation in vSVA usage (upcoming)
>
> For #1 and #2, device drivers are responsible for stopping DMA traffic
> before unmap/unbind. For #3, the iommu driver is called via the
> mmu_notifier and invalidates the TLB the same way as a normal user
> unmap, which will do the extra invalidation. The dTLB invalidation
> after the PASID cache flush does not need an extra invalidation.
>
> Therefore, we only need to deal with #4 and #5 in this patch. #1 is also
> covered by this patch due to the common code path shared with #5.
>
> Tested-by: Yuzhang Luo<[email protected]>
> Reviewed-by: Ashok Raj<[email protected]>
> Reviewed-by: Kevin Tian<[email protected]>

This one missed the Cc stable tag:

Cc: [email protected] # v5.15+

Sorry for the inconvenience.

> Signed-off-by: Jacob Pan<[email protected]>
> Link:https://lore.kernel.org/r/[email protected]
> Signed-off-by: Lu Baolu<[email protected]>
> ---
> drivers/iommu/intel/iommu.h | 4 +++
> drivers/iommu/intel/iommu.c | 69 +++++++++++++++++++++++++++++++++++--
> drivers/iommu/intel/svm.c | 5 ++-
> 3 files changed, 75 insertions(+), 3 deletions(-)

Best regards,
baolu

2022-12-02 11:50:01

by Joerg Roedel

Subject: Re: [PATCH 0/4] [PULL REQUEST] iommu/vt-d: Fixes for v6.1-rc8

On Thu, Dec 01, 2022 at 12:01:23PM +0800, Lu Baolu wrote:
> Jacob Pan (1):
> iommu/vt-d: Add a fix for devices need extra dtlb flush
>
> Xiongfeng Wang (2):
> iommu/vt-d: Fix PCI device refcount leak in has_external_pci()
> iommu/vt-d: Fix PCI device refcount leak in dmar_dev_scope_init()
>
> Yang Yingliang (1):
> iommu/vt-d: Fix PCI device refcount leak in prq_event_thread()
>
> drivers/iommu/intel/iommu.h | 4 ++
> drivers/iommu/intel/dmar.c | 1 +
> drivers/iommu/intel/iommu.c | 73 +++++++++++++++++++++++++++++++++++--
> drivers/iommu/intel/svm.c | 19 +++++++---
> 4 files changed, 88 insertions(+), 9 deletions(-)

Applied to iommu/fixes, thanks Baolu.

2022-12-03 01:29:44

by Jacob Pan

Subject: Re: [PATCH 0/4] [PULL REQUEST] iommu/vt-d: Fixes for v6.1-rc8

Hi Joerg,

On Fri, 2 Dec 2022 11:47:23 +0100, Joerg Roedel <[email protected]> wrote:

> On Thu, Dec 01, 2022 at 12:01:23PM +0800, Lu Baolu wrote:
> > Jacob Pan (1):
> > iommu/vt-d: Add a fix for devices need extra dtlb flush
There is a bug in this patch. I will send a fix patch, or could you
squash the fix below?


From: Jacob Pan <[email protected]>
Date: Fri, 2 Dec 2022 16:22:42 -0800
Subject: [PATCH] iommu/vt-d: Fix buggy QAT device mask

The impacted QAT device IDs that need the extra dtlb flush quirk range
from 0x4940 to 0x4943. After bitwise ANDing the device ID with 0xfffc,
the result should be 0x4940 instead of 0x494c to identify these devices.

Reported-by: Raghunathan Srinivasan <[email protected]>
Signed-off-by: Jacob Pan <[email protected]>
---
drivers/iommu/intel/iommu.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index d8759f445aff..0b10104c4b99 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -1401,7 +1401,7 @@ static void domain_update_iotlb(struct dmar_domain *domain)
  * This quirk is exempted from risky_device() check because it applies only
  * to the built-in QAT devices and it doesn't grant additional privileges.
  */
-#define BUGGY_QAT_DEVID_MASK 0x494c
+#define BUGGY_QAT_DEVID_MASK 0x4940
 static bool dev_needs_extra_dtlb_flush(struct pci_dev *pdev)
 {
 	if (pdev->vendor != PCI_VENDOR_ID_INTEL)
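
For illustration, a stand-alone check of the mask arithmetic (a
hypothetical userspace test, not part of the patch):

	#include <assert.h>

	int main(void)
	{
		unsigned int devid;

		/*
		 * The affected IDs 0x4940..0x4943 differ only in their two
		 * low bits, so masking with 0xfffc yields 0x4940 for every
		 * one of them; 0x494c can never match.
		 */
		for (devid = 0x4940; devid <= 0x4943; devid++)
			assert((devid & 0xfffc) == 0x4940);

		return 0;
	}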

sorry about that,


Thanks,

Jacob