2021-07-31 19:27:35

by Luigi Rizzo

[permalink] [raw]
Subject: [PATCH] amd/iommu: fix logic bug in amd_iommu_report_page_fault()

amd_iommu_report_page_fault() has two print paths, depending on whether or
not it can find a pci device. But the code erroneously enters the second
path if the rate limiter in the first path triggers:
if (dev_data && ratelimit(A)) { A; } else if (ratelimit(B)) { B; }
The correct code should be
if (dev_data) { if (ratelimit(A)) { A;} } else if (ratelimit(B)) { B; }

Signed-off-by: Luigi Rizzo <[email protected]>
---
drivers/iommu/amd/iommu.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 811a49a95d04..38b4aff70800 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -480,9 +480,11 @@ static void amd_iommu_report_page_fault(u16 devid, u16 domain_id,
if (pdev)
dev_data = dev_iommu_priv_get(&pdev->dev);

- if (dev_data && __ratelimit(&dev_data->rs)) {
- pci_err(pdev, "Event logged [IO_PAGE_FAULT domain=0x%04x address=0x%llx flags=0x%04x]\n",
- domain_id, address, flags);
+ if (dev_data) {
+ if (__ratelimit(&dev_data->rs)) {
+ pci_err(pdev, "Event logged [IO_PAGE_FAULT domain=0x%04x address=0x%llx flags=0x%04x]\n",
+ domain_id, address, flags);
+ }
} else if (printk_ratelimit()) {
pr_err("Event logged [IO_PAGE_FAULT device=%02x:%02x.%x domain=0x%04x address=0x%llx flags=0x%04x]\n",
PCI_BUS_NUM(devid), PCI_SLOT(devid), PCI_FUNC(devid),
--
2.32.0.554.ge1b32706d8-goog



2021-08-02 03:53:53

by David Rientjes

[permalink] [raw]
Subject: Re: [PATCH] amd/iommu: fix logic bug in amd_iommu_report_page_fault()

On Sat, 31 Jul 2021, Luigi Rizzo wrote:

> amd_iommu_report_page_fault() has two print paths, depending on whether or
> not it can find a pci device. But the code erroneously enters the second
> path if the rate limiter in the first path triggers:
> if (dev_data && ratelimit(A)) { A; } else if (ratelimit(B)) { B; }
> The correct code should be
> if (dev_data) { if (ratelimit(A)) { A;} } else if (ratelimit(B)) { B; }
>
> Signed-off-by: Luigi Rizzo <[email protected]>

Acked-by: David Rientjes <[email protected]>

This would be very helpful so we don't erroneously classify these KERN_ERR
messages as dangerous.

2021-08-02 14:18:26

by Joerg Roedel

[permalink] [raw]
Subject: Re: [PATCH] amd/iommu: fix logic bug in amd_iommu_report_page_fault()

On Sat, Jul 31, 2021 at 12:26:37PM -0700, Luigi Rizzo wrote:
> amd_iommu_report_page_fault() has two print paths, depending on whether or
> not it can find a pci device. But the code erroneously enters the second
> path if the rate limiter in the first path triggers:
> if (dev_data && ratelimit(A)) { A; } else if (ratelimit(B)) { B; }
> The correct code should be
> if (dev_data) { if (ratelimit(A)) { A;} } else if (ratelimit(B)) { B; }
>
> Signed-off-by: Luigi Rizzo <[email protected]>
> ---
> drivers/iommu/amd/iommu.c | 8 +++++---
> 1 file changed, 5 insertions(+), 3 deletions(-)

Thanks, but I queued this patch already:

https://lore.kernel.org/r/[email protected]

Regards,

Joerg


2021-08-02 14:33:13

by Luigi Rizzo

[permalink] [raw]
Subject: Re: [PATCH] amd/iommu: fix logic bug in amd_iommu_report_page_fault()

On Mon, Aug 2, 2021 at 4:13 PM Joerg Roedel <[email protected]> wrote:
>
> On Sat, Jul 31, 2021 at 12:26:37PM -0700, Luigi Rizzo wrote:
> > amd_iommu_report_page_fault() has two print paths, depending on whether or
> > not it can find a pci device. But the code erroneously enters the second
> > path if the rate limiter in the first path triggers:
> > if (dev_data && ratelimit(A)) { A; } else if (ratelimit(B)) { B; }
> > The correct code should be
> > if (dev_data) { if (ratelimit(A)) { A;} } else if (ratelimit(B)) { B; }
> >
> > Signed-off-by: Luigi Rizzo <[email protected]>
> > ---
> > drivers/iommu/amd/iommu.c | 8 +++++---
> > 1 file changed, 5 insertions(+), 3 deletions(-)
>
> Thanks, but I queued this patch already:
>
> https://lore.kernel.org/r/[email protected]


Ah didn't realize that. Thank you!

Two questions on the topic:
1. how comes only the AMD driver is so verbose in reporting io page faults?
Neither intel nor other iommu drivers seem to log anything

2. Would it make sense to have a control to disable such logging,
either per-device or globally? Eg something like this (negative
logic so it must be set explicitly to disable logging).


@ -985,6 +985,7 @@ struct device {
defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL)
bool dma_coherent:1;
#endif
+ bool no_log_fault:1;

...
+ if (!pdev->dev.no_log_fault && __ratelimit(&dev_data->rs))
+ dev_err(&pdev->dev, "Event logged [IO_PAGE_FAULT dom


cheers
luigi

2021-08-02 14:41:38

by Joerg Roedel

[permalink] [raw]
Subject: Re: [PATCH] amd/iommu: fix logic bug in amd_iommu_report_page_fault()

On Mon, Aug 02, 2021 at 04:30:50PM +0200, Luigi Rizzo wrote:
> Ah didn't realize that. Thank you!
>
> Two questions on the topic:
> 1. how comes only the AMD driver is so verbose in reporting io page faults?
> Neither intel nor other iommu drivers seem to log anything

What do you mean by 'verbose'? It is only a line per fault, and at least
the Intel driver also logs DMAR faults with one line per fault :)

> 2. Would it make sense to have a control to disable such logging,
> either per-device or globally? Eg something like this (negative
> logic so it must be set explicitly to disable logging).

Yes, we can talk about that. But only after the trace-event for it
landed in the code. There must be some way to report the faults and if
userspace prefers to catch them via trace-events than we can disable
printing them to the kernel log.

Regards,

Joerg

2021-08-02 15:28:32

by Luigi Rizzo

[permalink] [raw]
Subject: Re: [PATCH] amd/iommu: fix logic bug in amd_iommu_report_page_fault()

On Mon, Aug 2, 2021 at 4:39 PM Joerg Roedel <[email protected]> wrote:
>
> On Mon, Aug 02, 2021 at 04:30:50PM +0200, Luigi Rizzo wrote:
> > Ah didn't realize that. Thank you!
> >
> > Two questions on the topic:
> > 1. how comes only the AMD driver is so verbose in reporting io page faults?
> > Neither intel nor other iommu drivers seem to log anything
>
> What do you mean by 'verbose'? It is only a line per fault, and at least
> the Intel driver also logs DMAR faults with one line per fault :)

Ah never mind, I was grep'ing keywords from the amd message and
there is almost zero overlap!

thanks
luigi