The commit 701850dc0c31bfadf75a0 ("printk, tracing: fix console
tracepoint") moved the tracepoint from console_unlock() to
vprintk_store(). As a result, it might be called in any
context and triggered the following warning:
WARNING: CPU: 1 PID: 16462 at include/trace/events/printk.h:10 printk_sprint+0x81/0xda
Modules linked in: ppdev parport_pc parport
CPU: 1 PID: 16462 Comm: event_benchmark Not tainted 5.19.0-rc5-test+ #5
Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014
EIP: printk_sprint+0x81/0xda
Code: 89 d8 e8 88 fc 33 00 e9 02 00 00 00 eb 6b 64 a1 a4 b8 91 c1 e8 fd d6 ff ff 84 c0 74 5c 64 a1 14 08 92 c1 a9 00 00 f0 00 74 02 <0f> 0b 64 ff 05 14 08 92 c1 b8 e0 c4 6b c1 e8 a5 dc 00 00 89 c7 e8
EAX: 80110001 EBX: c20a52f8 ECX: 0000000c EDX: 6d203036
ESI: 3df6004c EDI: 00000000 EBP: c61fbd7c ESP: c61fbd70
DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 EFLAGS: 00010006
CR0: 80050033 CR2: b7efc000 CR3: 05b80000 CR4: 001506f0
Call Trace:
vprintk_store+0x24b/0x2ff
vprintk+0x37/0x4d
_printk+0x14/0x16
nmi_handle+0x1ef/0x24e
? find_next_bit.part.0+0x13/0x13
? find_next_bit.part.0+0x13/0x13
? function_trace_call+0xd8/0xd9
default_do_nmi+0x57/0x1af
? trace_hardirqs_off_finish+0x2a/0xd9
? to_kthread+0xf/0xf
exc_nmi+0x9b/0xf4
asm_exc_nmi+0xae/0x29c
It comes from:
#define __DO_TRACE(name, args, cond, rcuidle) \
[...]
/* srcu can't be used from NMI */ \
WARN_ON_ONCE(rcuidle && in_nmi()); \
It might be possible to make srcu working in NMI. But it
would be slower on some architectures. It is not worth
doing it just because of this tracepoint.
It would be possible to disable this tracepoint in NMI
or in rcuidle context. Where the rcuidle context looks
more rare and thus more acceptable to be ignored.
Alternative solution would be to move the tracepoint
back to console code. But the location is less reliable
by definition. Also the synchronization against other
tracing messages is much worse.
Let's ignore the tracepoint in rcuidle context as the least
evil solution.
Link: https://lore.kernel.org/r/20220712151655.GU1790663@paulmck-ThinkPad-P17-Gen-1
Suggested-by: Steven Rostedt <[email protected]>
Signed-off-by: Petr Mladek <[email protected]>
---
include/trace/events/printk.h | 11 ++++++++++-
kernel/printk/printk.c | 2 +-
2 files changed, 11 insertions(+), 2 deletions(-)
diff --git a/include/trace/events/printk.h b/include/trace/events/printk.h
index 13d405b2fd8b..a3ee720f41b5 100644
--- a/include/trace/events/printk.h
+++ b/include/trace/events/printk.h
@@ -7,11 +7,20 @@
#include <linux/tracepoint.h>
-TRACE_EVENT(console,
+TRACE_EVENT_CONDITION(console,
TP_PROTO(const char *text, size_t len),
TP_ARGS(text, len),
+ /*
+ * trace_console_rcuidle() is not working in NMI. printk()
+ * is used more often in NMI than in rcuidle context.
+ * Choose the less evil solution here.
+ *
+ * raw_smp_processor_id() is reliable in rcuidle context.
+ */
+ TP_CONDITION(!rcu_is_idle_cpu(raw_smp_processor_id())),
+
TP_STRUCT__entry(
__dynamic_array(char, msg, len + 1)
),
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index b49c6ff6dca0..bd76a45ecc7f 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2108,7 +2108,7 @@ static u16 printk_sprint(char *text, u16 size, int facility,
}
}
- trace_console_rcuidle(text, text_len);
+ trace_console(text, text_len);
return text_len;
}
--
2.35.3
On Fri, 15 Jul 2022 at 14:02, Petr Mladek <[email protected]> wrote:
>
> The commit 701850dc0c31bfadf75a0 ("printk, tracing: fix console
> tracepoint") moved the tracepoint from console_unlock() to
> vprintk_store(). As a result, it might be called in any
> context and triggered the following warning:
>
> WARNING: CPU: 1 PID: 16462 at include/trace/events/printk.h:10 printk_sprint+0x81/0xda
> Modules linked in: ppdev parport_pc parport
> CPU: 1 PID: 16462 Comm: event_benchmark Not tainted 5.19.0-rc5-test+ #5
> Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014
> EIP: printk_sprint+0x81/0xda
> Code: 89 d8 e8 88 fc 33 00 e9 02 00 00 00 eb 6b 64 a1 a4 b8 91 c1 e8 fd d6 ff ff 84 c0 74 5c 64 a1 14 08 92 c1 a9 00 00 f0 00 74 02 <0f> 0b 64 ff 05 14 08 92 c1 b8 e0 c4 6b c1 e8 a5 dc 00 00 89 c7 e8
> EAX: 80110001 EBX: c20a52f8 ECX: 0000000c EDX: 6d203036
> ESI: 3df6004c EDI: 00000000 EBP: c61fbd7c ESP: c61fbd70
> DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 EFLAGS: 00010006
> CR0: 80050033 CR2: b7efc000 CR3: 05b80000 CR4: 001506f0
> Call Trace:
> vprintk_store+0x24b/0x2ff
> vprintk+0x37/0x4d
> _printk+0x14/0x16
> nmi_handle+0x1ef/0x24e
> ? find_next_bit.part.0+0x13/0x13
> ? find_next_bit.part.0+0x13/0x13
> ? function_trace_call+0xd8/0xd9
> default_do_nmi+0x57/0x1af
> ? trace_hardirqs_off_finish+0x2a/0xd9
> ? to_kthread+0xf/0xf
> exc_nmi+0x9b/0xf4
> asm_exc_nmi+0xae/0x29c
>
> It comes from:
>
> #define __DO_TRACE(name, args, cond, rcuidle) \
> [...]
> /* srcu can't be used from NMI */ \
> WARN_ON_ONCE(rcuidle && in_nmi()); \
>
> It might be possible to make srcu working in NMI. But it
> would be slower on some architectures. It is not worth
> doing it just because of this tracepoint.
>
> It would be possible to disable this tracepoint in NMI
> or in rcuidle context. Where the rcuidle context looks
> more rare and thus more acceptable to be ignored.
>
> Alternative solution would be to move the tracepoint
> back to console code. But the location is less reliable
> by definition. Also the synchronization against other
> tracing messages is much worse.
>
> Let's ignore the tracepoint in rcuidle context as the least
> evil solution.
>
> Link: https://lore.kernel.org/r/20220712151655.GU1790663@paulmck-ThinkPad-P17-Gen-1
>
> Suggested-by: Steven Rostedt <[email protected]>
> Signed-off-by: Petr Mladek <[email protected]>
> ---
> include/trace/events/printk.h | 11 ++++++++++-
> kernel/printk/printk.c | 2 +-
> 2 files changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/include/trace/events/printk.h b/include/trace/events/printk.h
> index 13d405b2fd8b..a3ee720f41b5 100644
> --- a/include/trace/events/printk.h
> +++ b/include/trace/events/printk.h
> @@ -7,11 +7,20 @@
>
> #include <linux/tracepoint.h>
>
> -TRACE_EVENT(console,
> +TRACE_EVENT_CONDITION(console,
> TP_PROTO(const char *text, size_t len),
>
> TP_ARGS(text, len),
>
> + /*
> + * trace_console_rcuidle() is not working in NMI. printk()
> + * is used more often in NMI than in rcuidle context.
> + * Choose the less evil solution here.
> + *
> + * raw_smp_processor_id() is reliable in rcuidle context.
> + */
> + TP_CONDITION(!rcu_is_idle_cpu(raw_smp_processor_id())),
> +
Couldn't this just use rcu_is_watching()?
| * rcu_is_watching - see if RCU thinks that the current CPU is not idle
Thanks,
-- Marco
On Fri, 15 Jul 2022 14:39:52 +0200
Marco Elver <[email protected]> wrote:
> Couldn't this just use rcu_is_watching()?
>
> | * rcu_is_watching - see if RCU thinks that the current CPU is not idle
Maybe, but I was thinking that Petr had a way to hit the issue that we
worry about. But since the non _rcuide() call requires rcu watching,
prehaps that is better to use.
-- Steve
On Fri, Jul 15, 2022 at 09:51:56AM -0400, Steven Rostedt wrote:
> On Fri, 15 Jul 2022 14:39:52 +0200
> Marco Elver <[email protected]> wrote:
>
> > Couldn't this just use rcu_is_watching()?
> >
> > | * rcu_is_watching - see if RCU thinks that the current CPU is not idle
>
> Maybe, but I was thinking that Petr had a way to hit the issue that we
> worry about. But since the non _rcuide() call requires rcu watching,
> prehaps that is better to use.
In case this helps... ;-)
The rcu_is_watching() function is designed to be used from the current
CPU, so it dispenses with memory ordering. However, it explicitly
disables preemption in order to avoid weird preemption patterns.
The formulation that Marco used is designed to be used from a remote
CPU, and so it includes explicit memory ordering that is not needed
in this case. But it does not disable preemption.
So if preemption is enabled at that point in tracing, you really want
to be using rcu_is_watching().
Thanx, Paul
On Fri, 15 Jul 2022 14:01:52 +0200
Petr Mladek <[email protected]> wrote:
> + /*
> + * trace_console_rcuidle() is not working in NMI. printk()
> + * is used more often in NMI than in rcuidle context.
> + * Choose the less evil solution here.
> + *
> + * raw_smp_processor_id() is reliable in rcuidle context.
> + */
> + TP_CONDITION(!rcu_is_idle_cpu(raw_smp_processor_id())),
> +
As Marco mentioned in the other thread, would a check for
'rcu_is_watching()' be better?
-- Steve
> TP_STRUCT__entry(
> __dynamic_array(char, msg, len + 1)
> ),
On Fri 2022-07-15 08:10:00, Paul E. McKenney wrote:
> On Fri, Jul 15, 2022 at 09:51:56AM -0400, Steven Rostedt wrote:
> > On Fri, 15 Jul 2022 14:39:52 +0200
> > Marco Elver <[email protected]> wrote:
> >
> > > Couldn't this just use rcu_is_watching()?
> > >
> > > | * rcu_is_watching - see if RCU thinks that the current CPU is not idle
> >
> > Maybe, but I was thinking that Petr had a way to hit the issue that we
> > worry about. But since the non _rcuide() call requires rcu watching,
> > prehaps that is better to use.
I actually saw the warning even with simple sysrq+l. I wonder why
I have missed it during testing. It was probably well hidden within
the other backtraces.
I was not aware that rcu_is_watching() and rcu_is_idle_cpu() did
basically the same. I used rcu_is_idle_cpu() because of the "idle"
in the name and the function description ;-)
> In case this helps... ;-)
>
> The rcu_is_watching() function is designed to be used from the current
> CPU, so it dispenses with memory ordering. However, it explicitly
> disables preemption in order to avoid weird preemption patterns.
>
> The formulation that Marco used is designed to be used from a remote
> CPU, and so it includes explicit memory ordering that is not needed
> in this case. But it does not disable preemption.
>
> So if preemption is enabled at that point in tracing, you really want
> to be using rcu_is_watching().
rcu_is_watching() is the right variant then. I am going to send v2.
Thanks a lot for the detailed explanation.
Best Regards,
Petr
On Fri, 15 Jul 2022 08:10:00 -0700
"Paul E. McKenney" <[email protected]> wrote:
> So if preemption is enabled at that point in tracing, you really want
> to be using rcu_is_watching().
And yes, at that point in tracing, preemption is still enabled if the
tracepoint was called with preemption enabled.
Thus, we really need to convert that to rcu_is_watching().
-- Steve