2024-02-20 07:06:16

by Baolu Lu

Subject: [PATCH v2 2/2] iommu/vt-d: Use device rbtree in iopf reporting path

The I/O page fault handler currently locates the PCI device by calling
pci_get_domain_bus_and_slot(), which searches the list of all PCI devices
until the desired device is found. To improve lookup efficiency, replace
it with device_rbtree_find(), which searches the rbtree of probed devices
instead.

The I/O page fault is initiated by the device, which does not have any
synchronization mechanism with the software to ensure that the device
stays in the probed device tree. Theoretically, a device could be released
by the IOMMU subsystem after device_rbtree_find() and before
iopf_get_dev_fault_param(), which would cause a use-after-free problem.

Add a mutex to synchronize the I/O page fault reporting path and the IOMMU
release device path. The lock is expected to add negligible overhead, since
contention between I/O page fault reporting and device release is very
rare.

Signed-off-by: Lu Baolu <[email protected]>
---
drivers/iommu/intel/iommu.h | 2 ++
drivers/iommu/intel/dmar.c | 1 +
drivers/iommu/intel/iommu.c | 3 +++
drivers/iommu/intel/svm.c | 17 +++++++++--------
4 files changed, 15 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
index 2b67ad0d6fe9..404d2476a877 100644
--- a/drivers/iommu/intel/iommu.h
+++ b/drivers/iommu/intel/iommu.h
@@ -719,6 +719,8 @@ struct intel_iommu {
#endif
struct iopf_queue *iopf_queue;
unsigned char iopfq_name[16];
+ /* Synchronization between fault report and iommu device release. */
+ struct mutex iopf_lock;
struct q_inval *qi; /* Queued invalidation info */
u32 iommu_state[MAX_SR_DMAR_REGS]; /* Store iommu states between suspend and resume.*/

diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
index f9b63c2875f7..d14797aabb7a 100644
--- a/drivers/iommu/intel/dmar.c
+++ b/drivers/iommu/intel/dmar.c
@@ -1097,6 +1097,7 @@ static int alloc_iommu(struct dmar_drhd_unit *drhd)
iommu->segment = drhd->segment;
iommu->device_rbtree = RB_ROOT;
spin_lock_init(&iommu->device_rbtree_lock);
+ mutex_init(&iommu->iopf_lock);
iommu->node = NUMA_NO_NODE;

ver = readl(iommu->reg + DMAR_VER_REG);
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index acfe27bd3448..6743fe6c7a36 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -4430,8 +4430,11 @@ static struct iommu_device *intel_iommu_probe_device(struct device *dev)
static void intel_iommu_release_device(struct device *dev)
{
struct device_domain_info *info = dev_iommu_priv_get(dev);
+ struct intel_iommu *iommu = info->iommu;

+ mutex_lock(&iommu->iopf_lock);
device_rbtree_remove(info);
+ mutex_unlock(&iommu->iopf_lock);
dmar_remove_one_dev_info(dev);
intel_pasid_free_table(dev);
intel_iommu_debugfs_remove_dev(info);
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index b644d57da841..dda276e28325 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -645,7 +645,7 @@ static irqreturn_t prq_event_thread(int irq, void *d)
struct intel_iommu *iommu = d;
struct page_req_dsc *req;
int head, tail, handled;
- struct pci_dev *pdev;
+ struct device *dev;
u64 address;

/*
@@ -691,21 +691,22 @@ static irqreturn_t prq_event_thread(int irq, void *d)
if (unlikely(req->lpig && !req->rd_req && !req->wr_req))
goto prq_advance;

- pdev = pci_get_domain_bus_and_slot(iommu->segment,
- PCI_BUS_NUM(req->rid),
- req->rid & 0xff);
/*
* If prq is to be handled outside iommu driver via receiver of
* the fault notifiers, we skip the page response here.
*/
- if (!pdev)
+ mutex_lock(&iommu->iopf_lock);
+ dev = device_rbtree_find(iommu, req->rid);
+ if (!dev) {
+ mutex_unlock(&iommu->iopf_lock);
goto bad_req;
+ }

- intel_svm_prq_report(iommu, &pdev->dev, req);
- trace_prq_report(iommu, &pdev->dev, req->qw_0, req->qw_1,
+ intel_svm_prq_report(iommu, dev, req);
+ trace_prq_report(iommu, dev, req->qw_0, req->qw_1,
req->priv_data[0], req->priv_data[1],
iommu->prq_seq_number++);
- pci_dev_put(pdev);
+ mutex_unlock(&iommu->iopf_lock);
prq_advance:
head = (head + sizeof(*req)) & PRQ_RING_MASK;
}
--
2.34.1
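
The locking scheme described in the commit message above can be sketched
in userspace C with pthreads. This is only an illustration under stated
assumptions: every name here (fake_iommu, fake_dev, fake_find() and so on)
is hypothetical, a small array stands in for the device rbtree, and a
counter stands in for the actual fault report. It is not the driver's
code. The point it shows is that the lookup and the use of the result sit
in one critical section, so the release path cannot remove the device in
between.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the driver's structures; not kernel types. */
struct fake_dev {
	uint16_t rid;        /* requester ID used as the lookup key */
	int faults_reported; /* number of reports delivered to this device */
	int present;         /* 1 while the device is still registered */
};

#define NR_DEVS 4

struct fake_iommu {
	pthread_mutex_t iopf_lock;     /* serializes report vs. release */
	struct fake_dev devs[NR_DEVS]; /* stand-in for the device rbtree */
};

static void fake_iommu_init(struct fake_iommu *iommu)
{
	pthread_mutex_init(&iommu->iopf_lock, NULL);
	for (int i = 0; i < NR_DEVS; i++) {
		iommu->devs[i].rid = (uint16_t)(0x100 + i);
		iommu->devs[i].faults_reported = 0;
		iommu->devs[i].present = 1;
	}
}

/* Lookup only; the caller must hold iopf_lock. */
static struct fake_dev *fake_find(struct fake_iommu *iommu, uint16_t rid)
{
	for (int i = 0; i < NR_DEVS; i++)
		if (iommu->devs[i].present && iommu->devs[i].rid == rid)
			return &iommu->devs[i];
	return NULL;
}

/* Fault path: find the device and use it inside one critical section. */
static int fake_report_fault(struct fake_iommu *iommu, uint16_t rid)
{
	struct fake_dev *dev;

	pthread_mutex_lock(&iommu->iopf_lock);
	dev = fake_find(iommu, rid);
	if (!dev) {
		pthread_mutex_unlock(&iommu->iopf_lock);
		return -1; /* analogous to the "goto bad_req" branch */
	}
	dev->faults_reported++;
	pthread_mutex_unlock(&iommu->iopf_lock);
	return 0;
}

/* Release path: removal cannot interleave with a report in progress. */
static void fake_release(struct fake_iommu *iommu, uint16_t rid)
{
	struct fake_dev *dev;

	pthread_mutex_lock(&iommu->iopf_lock);
	dev = fake_find(iommu, rid);
	if (dev)
		dev->present = 0;
	pthread_mutex_unlock(&iommu->iopf_lock);
}
```

After fake_release() returns for a given rid, a later fake_report_fault()
for that rid takes the not-found branch instead of touching a device that
is being torn down.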



2024-02-21 15:34:47

by Jason Gunthorpe

Subject: Re: [PATCH v2 2/2] iommu/vt-d: Use device rbtree in iopf reporting path

On Tue, Feb 20, 2024 at 02:59:39PM +0800, Lu Baolu wrote:
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index acfe27bd3448..6743fe6c7a36 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -4430,8 +4430,11 @@ static struct iommu_device *intel_iommu_probe_device(struct device *dev)
> static void intel_iommu_release_device(struct device *dev)
> {
> struct device_domain_info *info = dev_iommu_priv_get(dev);
> + struct intel_iommu *iommu = info->iommu;
>
> + mutex_lock(&iommu->iopf_lock);
> device_rbtree_remove(info);
> + mutex_unlock(&iommu->iopf_lock);

This seems like a pretty reasonable solution, maybe someday it can
become lockless.. This is a fast path right?

> @@ -691,21 +691,22 @@ static irqreturn_t prq_event_thread(int irq, void *d)
> if (unlikely(req->lpig && !req->rd_req && !req->wr_req))
> goto prq_advance;
>
> - pdev = pci_get_domain_bus_and_slot(iommu->segment,
> - PCI_BUS_NUM(req->rid),
> - req->rid & 0xff);
> /*
> * If prq is to be handled outside iommu driver via receiver of
> * the fault notifiers, we skip the page response here.
> */
> - if (!pdev)
> + mutex_lock(&iommu->iopf_lock);
> + dev = device_rbtree_find(iommu, req->rid);
> + if (!dev) {
> + mutex_unlock(&iommu->iopf_lock);
> goto bad_req;
> + }

Though now we have a mutex and a spinlock covering the same data
structure.. It could be optimized some more.

But maybe we should leave micro optimization aside for now.

Reviewed-by: Jason Gunthorpe <[email protected]>

Jason

2024-03-07 19:44:59

by Bert Karwatzki

Subject: [PATCH] iommu: fix compilation without CONFIG_IOMMU_INTEL

When the kernel is compiled with CONFIG_IRQ_REMAP=y but without
CONFIG_INTEL_IOMMU, compilation fails since commit def054b01a8678 with an
undefined reference to device_rbtree_find(). This patch makes sure that
Intel-specific code is only compiled with CONFIG_INTEL_IOMMU=y.

Fixes: def054b01a8678 ("iommu/vt-d: Use device rbtree in iopf reporting path")

Signed-off-by: Bert Karwatzki <[email protected]>
---
drivers/iommu/Kconfig | 2 +-
drivers/iommu/intel/Makefile | 2 ++
drivers/iommu/irq_remapping.c | 3 ++-
3 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index e0796fa84227..0af39bbbe3a3 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -199,7 +199,7 @@ source "drivers/iommu/iommufd/Kconfig"
config IRQ_REMAP
bool "Support for Interrupt Remapping"
depends on X86_64 && X86_IO_APIC && PCI_MSI && ACPI
- select DMAR_TABLE
+ select DMAR_TABLE if INTEL_IOMMU
help
Supports Interrupt remapping for IO-APIC and MSI devices.
To use x2apic mode in the CPU's which support x2APIC enhancements or
diff --git a/drivers/iommu/intel/Makefile b/drivers/iommu/intel/Makefile
index 5dabf081a779..5402b699a122 100644
--- a/drivers/iommu/intel/Makefile
+++ b/drivers/iommu/intel/Makefile
@@ -5,5 +5,7 @@ obj-$(CONFIG_DMAR_TABLE) += trace.o cap_audit.o
obj-$(CONFIG_DMAR_PERF) += perf.o
obj-$(CONFIG_INTEL_IOMMU_DEBUGFS) += debugfs.o
obj-$(CONFIG_INTEL_IOMMU_SVM) += svm.o
+ifdef CONFIG_INTEL_IOMMU
obj-$(CONFIG_IRQ_REMAP) += irq_remapping.o
+endif
obj-$(CONFIG_INTEL_IOMMU_PERF_EVENTS) += perfmon.o
diff --git a/drivers/iommu/irq_remapping.c b/drivers/iommu/irq_remapping.c
index 83314b9d8f38..ee59647c2050 100644
--- a/drivers/iommu/irq_remapping.c
+++ b/drivers/iommu/irq_remapping.c
@@ -99,7 +99,8 @@ int __init irq_remapping_prepare(void)
if (disable_irq_remap)
return -ENOSYS;

- if (intel_irq_remap_ops.prepare() == 0)
+ if (IS_ENABLED(CONFIG_INTEL_IOMMU) &&
+ intel_irq_remap_ops.prepare() == 0)
remap_ops = &intel_irq_remap_ops;
else if (IS_ENABLED(CONFIG_AMD_IOMMU) &&
amd_iommu_irq_ops.prepare() == 0)
--
2.39.2

Since commit def054b01a8678, compilation fails on x86_64 without
CONFIG_INTEL_IOMMU=y with an undefined reference to device_rbtree_find()
when linking drivers/iommu/intel/dmar.o. Even though this file is Intel
specific, it is compiled because CONFIG_IRQ_REMAP unconditionally selects
CONFIG_DMAR_TABLE. This patch fixes this by compiling Intel-specific
files only when CONFIG_INTEL_IOMMU=y.

Bert Karwatzki
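
The irq_remapping.c hunk above guards the Intel call with a compile-time
constant. A minimal userspace sketch of that pattern follows. It assumes
two hypothetical build switches (HAVE_VENDOR_A, HAVE_VENDOR_B) in place of
the Kconfig symbols, and hypothetical ops tables in place of
intel_irq_remap_ops and amd_iommu_irq_ops. Because the left operand of &&
is a constant 0, the disabled vendor's prepare() is never called. In the
kernel, IS_ENABLED() additionally relies on the compiler discarding the
dead branch so that no reference to the missing symbol reaches the linker,
which this sketch does not demonstrate since both tables are defined here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical build switches standing in for Kconfig symbols. */
#define HAVE_VENDOR_A 0
#define HAVE_VENDOR_B 1

struct remap_ops {
	const char *name;
	int (*prepare)(void); /* returns 0 on success */
};

/* Counts calls so the short-circuit is observable. */
static int vendor_a_prepare_calls;

static int vendor_a_prepare(void)
{
	vendor_a_prepare_calls++;
	return 0;
}

static int vendor_b_prepare(void)
{
	return 0;
}

static struct remap_ops vendor_a_ops = { "vendor-a", vendor_a_prepare };
static struct remap_ops vendor_b_ops = { "vendor-b", vendor_b_prepare };

/* Pick the first enabled vendor whose prepare() succeeds. */
static const struct remap_ops *pick_remap_ops(void)
{
	if (HAVE_VENDOR_A && vendor_a_ops.prepare() == 0)
		return &vendor_a_ops;
	else if (HAVE_VENDOR_B && vendor_b_ops.prepare() == 0)
		return &vendor_b_ops;
	return NULL;
}
```

With HAVE_VENDOR_A set to 0, pick_remap_ops() returns &vendor_b_ops and
vendor_a_prepare() is never invoked.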


2024-03-08 05:47:24

by Baolu Lu

Subject: Re: [PATCH] iommu: fix compilation without CONFIG_IOMMU_INTEL

On 3/8/24 3:44 AM, Bert Karwatzki wrote:
> When the kernel is compiled with CONFIG_IRQ_REMAP=y but without
> CONFIG_INTEL_IOMMU, compilation fails since commit def054b01a8678 with an
> undefined reference to device_rbtree_find(). This patch makes sure that
> Intel-specific code is only compiled with CONFIG_INTEL_IOMMU=y.
>
> Fixes: def054b01a8678 ("iommu/vt-d: Use device rbtree in iopf reporting path")

I think it should fix the commit below:

Fixes: 80a9b50c0b9e ("iommu/vt-d: Improve ITE fault handling if target
device isn't present")

>
> Signed-off-by: Bert Karwatzki <[email protected]>

Users who want interrupt remapping without DMA remapping can achieve
this by selecting IOMMU_DEFAULT_PASSTHROUGH or by passing
"iommu.passthrough=1" on the kernel command line.

Reviewed-by: Lu Baolu <[email protected]>

Best regards,
baolu



2024-03-08 08:04:13

by Joerg Roedel

Subject: Re: [PATCH] iommu: fix compilation without CONFIG_IOMMU_INTEL

On Thu, Mar 07, 2024 at 08:44:19PM +0100, Bert Karwatzki wrote:
> When the kernel is compiled with CONFIG_IRQ_REMAP=y but without
> CONFIG_INTEL_IOMMU, compilation fails since commit def054b01a8678 with an
> undefined reference to device_rbtree_find(). This patch makes sure that
> Intel-specific code is only compiled with CONFIG_INTEL_IOMMU=y.
>
> Fixes: def054b01a8678 ("iommu/vt-d: Use device rbtree in iopf reporting path")
>
> Signed-off-by: Bert Karwatzki <[email protected]>
> ---
> drivers/iommu/Kconfig | 2 +-
> drivers/iommu/intel/Makefile | 2 ++
> drivers/iommu/irq_remapping.c | 3 ++-
> 3 files changed, 5 insertions(+), 2 deletions(-)

Applied, thanks.