2021-09-10 12:09:04

by Hamza Mahfooz

Subject: [PATCH] dma-debug: prevent an error message from causing runtime problems

For some drivers that call add_dma_entry() from somewhere down the call
stack, triggering this error condition even once causes the error message
to spam the kernel's printk buffer and bring the CPU usage up to 100%.
Also, since there is at least one driver in mainline that suffers from
this error condition, it is more useful to WARN_ONCE() here instead of
just printing the error message, in the hope that this will make other
drivers that suffer from the same issue easier to spot.

Link: https://lkml.kernel.org/r/[email protected]
Reported-by: Jeremy Linton <[email protected]>
Signed-off-by: Hamza Mahfooz <[email protected]>
---
kernel/dma/debug.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/kernel/dma/debug.c b/kernel/dma/debug.c
index 6c90c69e5311..d9806689666e 100644
--- a/kernel/dma/debug.c
+++ b/kernel/dma/debug.c
@@ -567,7 +567,9 @@ static void add_dma_entry(struct dma_debug_entry *entry)
pr_err("cacheline tracking ENOMEM, dma-debug disabled\n");
global_disable = true;
} else if (rc == -EEXIST) {
- pr_err("cacheline tracking EEXIST, overlapping mappings aren't supported\n");
+ WARN_ONCE(1,
+ pr_fmt("cacheline tracking EEXIST, overlapping mappings aren't supported\n"
+ ));
}
}

--
2.33.0


2021-09-10 12:57:43

by Robin Murphy

Subject: Re: [PATCH] dma-debug: prevent an error message from causing runtime problems

On 2021-09-10 13:05, Hamza Mahfooz wrote:
> For some drivers, that call add_dma_entry() from somewhere down the call
> stack.

Nit: strictly, drivers don't call add_dma_entry(). Drivers only call the
DMA API functions, and it is the DMA API internals which take a detour
through dma-debug when desired.

> If this error condition is triggered once, it causes the error
> message to spam the kernel's printk buffer

Is that true? It doesn't look like anything in dma-debug itself can
obviously lead to that; I was assuming that in Jeremy's case it's the
driver which has managed to do something such that every new mapping
call it makes ends up hitting the warning. A busy network interface is
probably more than capable of saturating the kernel log with a print for
every packet (particularly a great big 100GbE-capable multi-queue thing
like that one).

> and bring the CPU usage up to
> 100%. Also, since there is at least one driver that is in the mainline
> and suffers from the error condition, it is more useful to WARN_ON() here
> instead of just printing the error message (in hopes that it will make it
> easier for other drivers that suffer from this issue to be spotted).
>
> Link: https://lkml.kernel.org/r/[email protected]
> Reported-by: Jeremy Linton <[email protected]>
> Signed-off-by: Hamza Mahfooz <[email protected]>
> ---
> kernel/dma/debug.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/dma/debug.c b/kernel/dma/debug.c
> index 6c90c69e5311..d9806689666e 100644
> --- a/kernel/dma/debug.c
> +++ b/kernel/dma/debug.c
> @@ -567,7 +567,9 @@ static void add_dma_entry(struct dma_debug_entry *entry)
> pr_err("cacheline tracking ENOMEM, dma-debug disabled\n");
> global_disable = true;
> } else if (rc == -EEXIST) {
> - pr_err("cacheline tracking EEXIST, overlapping mappings aren't supported\n");
> + WARN_ONCE(1,
> + pr_fmt("cacheline tracking EEXIST, overlapping mappings aren't supported\n"
> + ));

Unless there's some subtlety I'm missing, it would be better to use
err_printk() here - not only for consistency of output, but also to tie
in with dma-debug's existing output-limiting controls.

Robin.

> }
> }
>
>