2022-10-25 09:06:08

by Lennert Buytenhek

Subject: [PATCH,RFC] iommu/vt-d: Convert dmar_fault IRQ to a threaded IRQ

Under a high enough I/O page fault load, the dmar_fault hardirq handler
can end up starving other tasks that wanted to run on the CPU that the
IRQ is being routed to. On an i7-6700 CPU this seems to happen at
around 2.5 million I/O page faults per second, and at a fraction of
that rate on some of the lower-end CPUs that we use.

An I/O page fault rate of 2.5 million per second may seem like a very
high number, but when we get an I/O page fault for every cache line
touched by a DMA operation, this I/O page fault rate can be the result
of a confused PCIe device DMAing to RAM at 2.5 * 64 = 160 MB/sec, which
is not an unlikely rate to be DMAing things to RAM at. And, in fact,
when we do see PCIe devices getting confused like this, this sort of
I/O page fault rate is not uncommon.
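
As a sanity check on the arithmetic above, here is a minimal sketch. The
helper name is purely illustrative and not part of the patch; it assumes
one I/O page fault per cache line touched and a 64-byte cache line, as in
the figure quoted above.

```c
#include <stdint.h>

/*
 * Illustrative helper, not part of the patch: DMA bandwidth implied by a
 * given I/O page fault rate, assuming one fault per cache line touched.
 */
static uint64_t dma_bytes_per_sec(uint64_t faults_per_sec,
				  uint64_t cache_line_bytes)
{
	return faults_per_sec * cache_line_bytes;
}
```

Calling dma_bytes_per_sec(2500000, 64) yields 160000000 bytes per second,
i.e. the 160 MB/sec mentioned above.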

A peripheral device continuously DMAing to RAM at 160 MB/s is
inarguably a bug, either in the kernel driver for the device or in the
firmware for the device, and should be fixed there, but it's the sort
of bug that iommu/vt-d could be handling better than it currently does,
and there is a fairly simple way to achieve that.

This patch changes the dmar_fault IRQ handler to be a threaded IRQ
handler. This is a pretty minimal code change, and comes with the
advantage that Intel IOMMU I/O page fault handling work is now subject
to RT throttling, which allows it to be kept under control using the
sched_rt_period_us / sched_rt_runtime_us parameters.
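
To illustrate what those two parameters bound, here is a rough sketch. The
helper name is hypothetical and not part of the patch. It relies on
threaded IRQ handlers running as SCHED_FIFO kernel threads, and computes
the share of each period that realtime tasks may consume before being
throttled.

```c
#include <stdint.h>

/*
 * Illustrative helper, not part of the patch: percentage of each
 * sched_rt_period_us window that realtime tasks (which include threaded
 * IRQ handlers) may consume.  A negative runtime (-1) disables RT
 * throttling, as does a runtime that is not smaller than the period.
 */
static int rt_share_percent(int64_t period_us, int64_t runtime_us)
{
	if (runtime_us < 0 || runtime_us >= period_us)
		return 100;
	return (int)(runtime_us * 100 / period_us);
}
```

With the usual defaults of sched_rt_period_us = 1000000 and
sched_rt_runtime_us = 950000, rt_share_percent() returns 95, which would
leave roughly 5% of each second for non-realtime tasks on that CPU even
under a sustained fault storm.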

iommu/amd already uses a threaded IRQ handler for its I/O page fault
reporting, and so it already has this advantage.

When IRQ remapping is enabled, iommu/vt-d will try to set up its
dmar_fault IRQ handler from start_kernel() -> x86_late_time_init()
-> apic_intr_mode_init() -> apic_bsp_setup() ->
irq_remap_enable_fault_handling() -> enable_drhd_fault_handling(),
which happens before kthreadd is started, and trying to set up a
threaded IRQ handler this early on will oops. However, there
doesn't seem to be a reason why iommu/vt-d needs to set up its fault
reporting IRQ handler this early, and if we remove the IRQ setup code
from enable_drhd_fault_handling(), the IRQ will be registered instead
from pci_iommu_init() -> intel_iommu_init() -> init_dmars(), which
seems to work just fine.

Suggested-by: Scarlett Gourley <[email protected]>
Suggested-by: James Sewart <[email protected]>
Suggested-by: Jack O'Sullivan <[email protected]>
Signed-off-by: Lennert Buytenhek <[email protected]>
---
drivers/iommu/intel/dmar.c | 27 ++-------------------------
1 file changed, 2 insertions(+), 25 deletions(-)

diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
index 5a8f780e7ffd..d0871fe9d04d 100644
--- a/drivers/iommu/intel/dmar.c
+++ b/drivers/iommu/intel/dmar.c
@@ -2043,7 +2043,8 @@ int dmar_set_interrupt(struct intel_iommu *iommu)
 		return -EINVAL;
 	}
 
-	ret = request_irq(irq, dmar_fault, IRQF_NO_THREAD, iommu->name, iommu);
+	ret = request_threaded_irq(irq, NULL, dmar_fault, IRQF_ONESHOT,
+				   iommu->name, iommu);
 	if (ret)
 		pr_err("Can't request irq\n");
 	return ret;
@@ -2051,30 +2052,6 @@ int dmar_set_interrupt(struct intel_iommu *iommu)
 
 int __init enable_drhd_fault_handling(void)
 {
-	struct dmar_drhd_unit *drhd;
-	struct intel_iommu *iommu;
-
-	/*
-	 * Enable fault control interrupt.
-	 */
-	for_each_iommu(iommu, drhd) {
-		u32 fault_status;
-		int ret = dmar_set_interrupt(iommu);
-
-		if (ret) {
-			pr_err("DRHD %Lx: failed to enable fault, interrupt, ret %d\n",
-			       (unsigned long long)drhd->reg_base_addr, ret);
-			return -1;
-		}
-
-		/*
-		 * Clear any previous faults.
-		 */
-		dmar_fault(iommu->irq, iommu);
-		fault_status = readl(iommu->reg + DMAR_FSTS_REG);
-		writel(fault_status, iommu->reg + DMAR_FSTS_REG);
-	}
-
 	return 0;
 }

--
2.37.3


2022-10-26 02:42:10

by Baolu Lu

Subject: Re: [PATCH,RFC] iommu/vt-d: Convert dmar_fault IRQ to a threaded IRQ

On 10/25/22 4:08 PM, Lennert Buytenhek wrote:
> Under a high enough I/O page fault load, the dmar_fault hardirq handler
> can end up starving other tasks that wanted to run on the CPU that the
> IRQ is being routed to. On an i7-6700 CPU this seems to happen at
> around 2.5 million I/O page faults per second, and at a fraction of
> that rate on some of the lower-end CPUs that we use.
>
> An I/O page fault rate of 2.5 million per second may seem like a very
> high number, but when we get an I/O page fault for every cache line
> touched by a DMA operation, this I/O page fault rate can be the result
> of a confused PCIe device DMAing to RAM at 2.5 * 64 = 160 MB/sec, which
> is not an unlikely rate to be DMAing things to RAM at. And, in fact,
> when we do see PCIe devices getting confused like this, this sort of
> I/O page fault rate is not uncommon.
>
> A peripheral device continuously DMAing to RAM at 160 MB/s is
> inarguably a bug, either in the kernel driver for the device or in the
> firmware for the device, and should be fixed there, but it's the sort
> of bug that iommu/vt-d could be handling better than it currently does,
> and there is a fairly simple way to achieve that.
>
> This patch changes the dmar_fault IRQ handler to be a threaded IRQ
> handler. This is a pretty minimal code change, and comes with the
> advantage that Intel IOMMU I/O page fault handling work is now subject
> to RT throttling, which allows it to be kept under control using the
> sched_rt_period_us / sched_rt_runtime_us parameters.

Thanks for the patch! I like it, but also have some concerns.

If you look at the commit history, you will find that the opposite
change took place 10+ years ago.

commit 477694e71113fd0694b6bb0bcc2d006b8ac62691
Author: Thomas Gleixner <[email protected]>
Date: Tue Jul 19 16:25:42 2011 +0200

x86, iommu: Mark DMAR IRQ as non-threaded

Mark this lowlevel IRQ handler as non-threaded. This prevents a boot
crash when "threadirqs" is on the kernel commandline. Also the
interrupt handler is handling hardware critical events which should
not be delayed into a thread.

Signed-off-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
Signed-off-by: Ingo Molnar <[email protected]>

I am not sure whether the "boot crash" mentioned above is caused by
trying to set up a threaded IRQ handler before kthreadd is started.

>
> iommu/amd already uses a threaded IRQ handler for its I/O page fault
> reporting, and so it already has this advantage.
>
> When IRQ remapping is enabled, iommu/vt-d will try to set up its
> dmar_fault IRQ handler from start_kernel() -> x86_late_time_init()
> -> apic_intr_mode_init() -> apic_bsp_setup() ->
> irq_remap_enable_fault_handling() -> enable_drhd_fault_handling(),
> which happens before kthreadd is started, and trying to set up a
> threaded IRQ handler this early on will oops. However, there
> doesn't seem to be a reason why iommu/vt-d needs to set up its fault
> reporting IRQ handler this early, and if we remove the IRQ setup code
> from enable_drhd_fault_handling(), the IRQ will be registered instead
> from pci_iommu_init() -> intel_iommu_init() -> init_dmars(), which
> seems to work just fine.

At present, we cannot do so, because VT-d interrupt remapping and DMA
remapping can be enabled independently. In other words, it is possible
for interrupt remapping to be enabled while DMA remapping is not.

>
> Suggested-by: Scarlett Gourley <[email protected]>
> Suggested-by: James Sewart <[email protected]>
> Suggested-by: Jack O'Sullivan <[email protected]>
> Signed-off-by: Lennert Buytenhek <[email protected]>
> ---
> drivers/iommu/intel/dmar.c | 27 ++-------------------------
> 1 file changed, 2 insertions(+), 25 deletions(-)
>
> diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
> index 5a8f780e7ffd..d0871fe9d04d 100644
> --- a/drivers/iommu/intel/dmar.c
> +++ b/drivers/iommu/intel/dmar.c
> @@ -2043,7 +2043,8 @@ int dmar_set_interrupt(struct intel_iommu *iommu)
> return -EINVAL;
> }
>
> - ret = request_irq(irq, dmar_fault, IRQF_NO_THREAD, iommu->name, iommu);
> + ret = request_threaded_irq(irq, NULL, dmar_fault, IRQF_ONESHOT,
> + iommu->name, iommu);
> if (ret)
> pr_err("Can't request irq\n");
> return ret;
> @@ -2051,30 +2052,6 @@ int dmar_set_interrupt(struct intel_iommu *iommu)
>
> int __init enable_drhd_fault_handling(void)
> {
> - struct dmar_drhd_unit *drhd;
> - struct intel_iommu *iommu;
> -
> - /*
> - * Enable fault control interrupt.
> - */
> - for_each_iommu(iommu, drhd) {
> - u32 fault_status;
> - int ret = dmar_set_interrupt(iommu);
> -
> - if (ret) {
> - pr_err("DRHD %Lx: failed to enable fault, interrupt, ret %d\n",
> - (unsigned long long)drhd->reg_base_addr, ret);
> - return -1;
> - }
> -
> - /*
> - * Clear any previous faults.
> - */
> - dmar_fault(iommu->irq, iommu);
> - fault_status = readl(iommu->reg + DMAR_FSTS_REG);
> - writel(fault_status, iommu->reg + DMAR_FSTS_REG);
> - }
> -
> return 0;
> }
>

Best regards,
baolu

2022-10-27 08:29:33

by Lennert Buytenhek

Subject: Re: [PATCH,RFC] iommu/vt-d: Convert dmar_fault IRQ to a threaded IRQ

On Wed, Oct 26, 2022 at 10:10:29AM +0800, Baolu Lu wrote:

> > Under a high enough I/O page fault load, the dmar_fault hardirq handler
> > can end up starving other tasks that wanted to run on the CPU that the
> > IRQ is being routed to. On an i7-6700 CPU this seems to happen at
> > around 2.5 million I/O page faults per second, and at a fraction of
> > that rate on some of the lower-end CPUs that we use.
> >
> > An I/O page fault rate of 2.5 million per second may seem like a very
> > high number, but when we get an I/O page fault for every cache line
> > touched by a DMA operation, this I/O page fault rate can be the result
> > of a confused PCIe device DMAing to RAM at 2.5 * 64 = 160 MB/sec, which
> > is not an unlikely rate to be DMAing things to RAM at. And, in fact,
> > when we do see PCIe devices getting confused like this, this sort of
> > I/O page fault rate is not uncommon.
> >
> > A peripheral device continuously DMAing to RAM at 160 MB/s is
> > inarguably a bug, either in the kernel driver for the device or in the
> > firmware for the device, and should be fixed there, but it's the sort
> > of bug that iommu/vt-d could be handling better than it currently does,
> > and there is a fairly simple way to achieve that.
> >
> > This patch changes the dmar_fault IRQ handler to be a threaded IRQ
> > handler. This is a pretty minimal code change, and comes with the
> > advantage that Intel IOMMU I/O page fault handling work is now subject
> > to RT throttling, which allows it to be kept under control using the
> > sched_rt_period_us / sched_rt_runtime_us parameters.
>
> Thanks for the patch! I like it, but also have some concerns.

Thanks for having a look!


> If you look at the commit history, you will find that the opposite
> change took place 10+ years ago.
>
> commit 477694e71113fd0694b6bb0bcc2d006b8ac62691
> Author: Thomas Gleixner <[email protected]>
> Date: Tue Jul 19 16:25:42 2011 +0200
>
> x86, iommu: Mark DMAR IRQ as non-threaded
>
> Mark this lowlevel IRQ handler as non-threaded. This prevents a boot
> crash when "threadirqs" is on the kernel commandline. Also the
> interrupt handler is handling hardware critical events which should
> not be delayed into a thread.
>
> Signed-off-by: Thomas Gleixner <[email protected]>
> Cc: [email protected]
> Signed-off-by: Ingo Molnar <[email protected]>
>
> I am not sure whether the "boot crash" mentioned above is due to that
> "trying to setup a threaded IRQ handler before kthreadd is started".

On v6.1-rc you also get a boot crash if you force the dmar_fault IRQ
to be a threaded IRQ without moving the IRQ registration out of the
start_kernel() -> x86_late_time_init() -> apic_intr_mode_init() ->
apic_bsp_setup() -> irq_remap_enable_fault_handling() ->
enable_drhd_fault_handling() path. The crash seen on v3.0 when forcing
the dmar_fault IRQ to be a threaded IRQ may have been due to the same
reason, but I'm not sure how this may have worked in 2011. :-)

I'm not sure I agree with the "the interrupt handler is handling
hardware critical events which should not be delayed into a thread"
part of this commit message. All that dmar_fault does is log
translation faults to the console, and I don't think that anything
will break if that gets delayed for a while.


> > iommu/amd already uses a threaded IRQ handler for its I/O page fault
> > reporting, and so it already has this advantage.
> >
> > When IRQ remapping is enabled, iommu/vt-d will try to set up its
> > dmar_fault IRQ handler from start_kernel() -> x86_late_time_init()
> > -> apic_intr_mode_init() -> apic_bsp_setup() ->
> > irq_remap_enable_fault_handling() -> enable_drhd_fault_handling(),
> > which happens before kthreadd is started, and trying to set up a
> > threaded IRQ handler this early on will oops. However, there
> > doesn't seem to be a reason why iommu/vt-d needs to set up its fault
> > reporting IRQ handler this early, and if we remove the IRQ setup code
> > from enable_drhd_fault_handling(), the IRQ will be registered instead
> > from pci_iommu_init() -> intel_iommu_init() -> init_dmars(), which
> > seems to work just fine.
>
> At present, we cannot do so. Because the VT-d interrupt remapping and
> DMA remapping can be independently enabled. In another words, it's a
> possible case where interrupt remapping is enabled while DMA remapping
> is not.

Is there a way I can test this easily?

I think we should be able to handle the "interrupt remapping enabled
but DMA remapping disabled" case in the same way, by registering the
dmar_fault IRQ sometime after kthreadd has been started. I don't think
the dmar_fault handler performs any function that is critical for the
operation of the IOMMU, and I think that we can defer setting it up
until whenever is convenient.

Thank you!


> > Suggested-by: Scarlett Gourley <[email protected]>
> > Suggested-by: James Sewart <[email protected]>
> > Suggested-by: Jack O'Sullivan <[email protected]>
> > Signed-off-by: Lennert Buytenhek <[email protected]>
> > ---
> > drivers/iommu/intel/dmar.c | 27 ++-------------------------
> > 1 file changed, 2 insertions(+), 25 deletions(-)
> >
> > diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
> > index 5a8f780e7ffd..d0871fe9d04d 100644
> > --- a/drivers/iommu/intel/dmar.c
> > +++ b/drivers/iommu/intel/dmar.c
> > @@ -2043,7 +2043,8 @@ int dmar_set_interrupt(struct intel_iommu *iommu)
> > return -EINVAL;
> > }
> > - ret = request_irq(irq, dmar_fault, IRQF_NO_THREAD, iommu->name, iommu);
> > + ret = request_threaded_irq(irq, NULL, dmar_fault, IRQF_ONESHOT,
> > + iommu->name, iommu);
> > if (ret)
> > pr_err("Can't request irq\n");
> > return ret;
> > @@ -2051,30 +2052,6 @@ int dmar_set_interrupt(struct intel_iommu *iommu)
> > int __init enable_drhd_fault_handling(void)
> > {
> > - struct dmar_drhd_unit *drhd;
> > - struct intel_iommu *iommu;
> > -
> > - /*
> > - * Enable fault control interrupt.
> > - */
> > - for_each_iommu(iommu, drhd) {
> > - u32 fault_status;
> > - int ret = dmar_set_interrupt(iommu);
> > -
> > - if (ret) {
> > - pr_err("DRHD %Lx: failed to enable fault, interrupt, ret %d\n",
> > - (unsigned long long)drhd->reg_base_addr, ret);
> > - return -1;
> > - }
> > -
> > - /*
> > - * Clear any previous faults.
> > - */
> > - dmar_fault(iommu->irq, iommu);
> > - fault_status = readl(iommu->reg + DMAR_FSTS_REG);
> > - writel(fault_status, iommu->reg + DMAR_FSTS_REG);
> > - }
> > -
> > return 0;
> > }

2022-10-29 08:44:41

by Baolu Lu

Subject: Re: [PATCH,RFC] iommu/vt-d: Convert dmar_fault IRQ to a threaded IRQ

On 2022/10/27 16:19, Lennert Buytenhek wrote:
>>> iommu/amd already uses a threaded IRQ handler for its I/O page fault
>>> reporting, and so it already has this advantage.
>>>
>>> When IRQ remapping is enabled, iommu/vt-d will try to set up its
>>> dmar_fault IRQ handler from start_kernel() -> x86_late_time_init()
>>> -> apic_intr_mode_init() -> apic_bsp_setup() ->
>>> irq_remap_enable_fault_handling() -> enable_drhd_fault_handling(),
>>> which happens before kthreadd is started, and trying to set up a
>>> threaded IRQ handler this early on will oops. However, there
>>> doesn't seem to be a reason why iommu/vt-d needs to set up its fault
>>> reporting IRQ handler this early, and if we remove the IRQ setup code
>>> from enable_drhd_fault_handling(), the IRQ will be registered instead
>>> from pci_iommu_init() -> intel_iommu_init() -> init_dmars(), which
>>> seems to work just fine.
>> At present, we cannot do so. Because the VT-d interrupt remapping and
>> DMA remapping can be independently enabled. In another words, it's a
>> possible case where interrupt remapping is enabled while DMA remapping
>> is not.
> Is there a way I can test this easily?
>
> I think we should be able to handle the "interrupt remapping enabled
> but DMA remapping disabled" case in the same way, by registering the
> dmar_fault IRQ sometime after kthreadd has been started. I don't think
> the dmar_fault handler performs any function that is critical for the
> operation of the IOMMU, and I think that we can defer setting it up
> until whenever is convenient.

Another possible way is not to split VT-d DMA remapping and interrupt
remapping. The only case of "interrupt remapping enabled but DMA
remapping not" that I can imagine is a guest VM that doesn't want DMA
translation because of its poor efficiency. If so, the overhead incurred
by DMA translation can be eliminated through "iommu=pt" or kernel build
configuration. Of course, there may also be some special needs that I
did not think of.

Best regards,
baolu