2021-01-23 23:39:43

by Qais Yousef

[permalink] [raw]
Subject: [PATCH 0/2] Fix BUG: Invalid wait context in hrtimer_interrupt()

I hit a pr_warn() inside hrtimer_interrupt() which lead to a BUG: Invalid wait
context splat.

The problem wasn't reproducible but I think the cause is obvious, printk can't
be called from interrupt context.

AFAICU printk_deferred() is safe from NMI, so I assumed it is safe to be called
from hrtimer_interrupt() too. Adding a pr_warn_once() inside
hrtimer_interrupt() in a location where it is always hit produces the BUG
splat. Replacing it with pr_warn_deferred_once() generates the printk warning
without any splat.

I added a new pr_*_deferred_once() variants to avoid open coding; but the name
ended not much shorter and I'm not sure if the wrappers are a win overall.
Since I've already done it, I'm sticking to it in this post. But will be happy
to drop it and just open code the printk_deferred_once(KERN_WARN, ...) in
hrtimer_interrupt() instead.

Thanks

Qais Yousef (2):
printk: Add new pr_*_deferred_once() variants
hrtimer: Use pr_warn_deferred_once() in hrtimer_interrupt()

include/linux/printk.h | 24 ++++++++++++++++++++++++
kernel/time/hrtimer.c | 3 ++-
2 files changed, 26 insertions(+), 1 deletion(-)

--
2.25.1


2021-01-23 23:40:51

by Qais Yousef

[permalink] [raw]
Subject: [PATCH 2/2] hrtimer: Use pr_warn_deferred_once() in hrtimer_interrupt()

printk is not allowed in this context and causes a BUG: Invalid wait context.

Signed-off-by: Qais Yousef <[email protected]>
---
kernel/time/hrtimer.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index 743c852e10f2..2d9b7cf1d5e2 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -1700,7 +1700,8 @@ void hrtimer_interrupt(struct clock_event_device *dev)
else
expires_next = ktime_add(now, delta);
tick_program_event(expires_next, 1);
- pr_warn_once("hrtimer: interrupt took %llu ns\n", ktime_to_ns(delta));
+ pr_warn_deferred_once("hrtimer: interrupt took %llu ns\n",
+ ktime_to_ns(delta));
}

/* called with interrupts disabled */
--
2.25.1

2021-01-23 23:41:08

by Qais Yousef

[permalink] [raw]
Subject: [PATCH 1/2] printk: Add new pr_*_deferred_once() variants

To allow users in code where printk is not allowed.

Signed-off-by: Qais Yousef <[email protected]>
---
include/linux/printk.h | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)

diff --git a/include/linux/printk.h b/include/linux/printk.h
index fe7eb2351610..92c0978c7b44 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -480,21 +480,45 @@ extern int kptr_restrict;
printk_once(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__)
/* no pr_cont_once, don't do that... */

+#define pr_emerg_deferred_once(fmt, ...) \
+ printk_deferred_once(KERN_EMERG pr_fmt(fmt), ##__VA_ARGS__)
+#define pr_alert_deferred_once(fmt, ...) \
+ printk_deferred_once(KERN_ALERT pr_fmt(fmt), ##__VA_ARGS__)
+#define pr_crit_deferred_once(fmt, ...) \
+ printk_deferred_once(KERN_CRIT pr_fmt(fmt), ##__VA_ARGS__)
+#define pr_err_deferred_once(fmt, ...) \
+ printk_deferred_once(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
+#define pr_warn_deferred_once(fmt, ...) \
+ printk_deferred_once(KERN_WARNING pr_fmt(fmt), ##__VA_ARGS__)
+#define pr_notice_deferred_once(fmt, ...) \
+ printk_deferred_once(KERN_NOTICE pr_fmt(fmt), ##__VA_ARGS__)
+#define pr_info_deferred_once(fmt, ...) \
+ printk_deferred_once(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__)
+/* no pr_cont_deferred_once, don't do that... */
+
#if defined(DEBUG)
#define pr_devel_once(fmt, ...) \
printk_once(KERN_DEBUG pr_fmt(fmt), ##__VA_ARGS__)
+#define pr_devel_deferred_once(fmt, ...) \
+ printk_deferred_once(KERN_DEBUG pr_fmt(fmt), ##__VA_ARGS__)
#else
#define pr_devel_once(fmt, ...) \
no_printk(KERN_DEBUG pr_fmt(fmt), ##__VA_ARGS__)
+#define pr_devel_deferred_once(fmt, ...) \
+ no_printk(KERN_DEBUG pr_fmt(fmt), ##__VA_ARGS__)
#endif

/* If you are writing a driver, please use dev_dbg instead */
#if defined(DEBUG)
#define pr_debug_once(fmt, ...) \
printk_once(KERN_DEBUG pr_fmt(fmt), ##__VA_ARGS__)
+#define pr_debug_deferred_once(fmt, ...) \
+ printk_deferred_once(KERN_DEBUG pr_fmt(fmt), ##__VA_ARGS__)
#else
#define pr_debug_once(fmt, ...) \
no_printk(KERN_DEBUG pr_fmt(fmt), ##__VA_ARGS__)
+#define pr_debug_deferred_once(fmt, ...) \
+ no_printk(KERN_DEBUG pr_fmt(fmt), ##__VA_ARGS__)
#endif

/*
--
2.25.1

2021-01-25 10:48:02

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 1/2] printk: Add new pr_*_deferred_once() variants

On Sat, Jan 23, 2021 at 11:37:40PM +0000, Qais Yousef wrote:
> To allow users in code where printk is not allowed.
>
> Signed-off-by: Qais Yousef <[email protected]>
> ---
> include/linux/printk.h | 24 ++++++++++++++++++++++++
> 1 file changed, 24 insertions(+)
>
> diff --git a/include/linux/printk.h b/include/linux/printk.h
> index fe7eb2351610..92c0978c7b44 100644
> --- a/include/linux/printk.h
> +++ b/include/linux/printk.h
> @@ -480,21 +480,45 @@ extern int kptr_restrict;
> printk_once(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__)
> /* no pr_cont_once, don't do that... */
>
> +#define pr_emerg_deferred_once(fmt, ...) \
> + printk_deferred_once(KERN_EMERG pr_fmt(fmt), ##__VA_ARGS__)
> +#define pr_alert_deferred_once(fmt, ...) \
> + printk_deferred_once(KERN_ALERT pr_fmt(fmt), ##__VA_ARGS__)
> +#define pr_crit_deferred_once(fmt, ...) \
> + printk_deferred_once(KERN_CRIT pr_fmt(fmt), ##__VA_ARGS__)
> +#define pr_err_deferred_once(fmt, ...) \
> + printk_deferred_once(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
> +#define pr_warn_deferred_once(fmt, ...) \
> + printk_deferred_once(KERN_WARNING pr_fmt(fmt), ##__VA_ARGS__)
> +#define pr_notice_deferred_once(fmt, ...) \
> + printk_deferred_once(KERN_NOTICE pr_fmt(fmt), ##__VA_ARGS__)
> +#define pr_info_deferred_once(fmt, ...) \
> + printk_deferred_once(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__)
> +/* no pr_cont_deferred_once, don't do that... */

I absolutely detest this.. the whole _deferred thing is an abomination.

We should be very close to printk not needing this anymore, printk
people?

2021-01-25 21:05:23

by Qais Yousef

[permalink] [raw]
Subject: Re: [PATCH 1/2] printk: Add new pr_*_deferred_once() variants

On 01/25/21 12:04, John Ogness wrote:
> On 2021-01-25, Peter Zijlstra <[email protected]> wrote:
> > On Sat, Jan 23, 2021 at 11:37:40PM +0000, Qais Yousef wrote:
> >> To allow users in code where printk is not allowed.
> >>
> >> Signed-off-by: Qais Yousef <[email protected]>
> >> ---
> >> include/linux/printk.h | 24 ++++++++++++++++++++++++
> >> 1 file changed, 24 insertions(+)
> >>
> >> diff --git a/include/linux/printk.h b/include/linux/printk.h
> >> index fe7eb2351610..92c0978c7b44 100644
> >> --- a/include/linux/printk.h
> >> +++ b/include/linux/printk.h
> >> @@ -480,21 +480,45 @@ extern int kptr_restrict;
> >> printk_once(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__)
> >> /* no pr_cont_once, don't do that... */
> >>
> >> +#define pr_emerg_deferred_once(fmt, ...) \
> >> + printk_deferred_once(KERN_EMERG pr_fmt(fmt), ##__VA_ARGS__)
> >> +#define pr_alert_deferred_once(fmt, ...) \
> >> + printk_deferred_once(KERN_ALERT pr_fmt(fmt), ##__VA_ARGS__)
> >> +#define pr_crit_deferred_once(fmt, ...) \
> >> + printk_deferred_once(KERN_CRIT pr_fmt(fmt), ##__VA_ARGS__)
> >> +#define pr_err_deferred_once(fmt, ...) \
> >> + printk_deferred_once(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
> >> +#define pr_warn_deferred_once(fmt, ...) \
> >> + printk_deferred_once(KERN_WARNING pr_fmt(fmt), ##__VA_ARGS__)
> >> +#define pr_notice_deferred_once(fmt, ...) \
> >> + printk_deferred_once(KERN_NOTICE pr_fmt(fmt), ##__VA_ARGS__)
> >> +#define pr_info_deferred_once(fmt, ...) \
> >> + printk_deferred_once(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__)
> >> +/* no pr_cont_deferred_once, don't do that... */
> >
> > I absolutely detest this.. the whole _deferred thing is an
> > abomination.
>
> And it will disappear at some point.
>
> > We should be very close to printk not needing this anymore, printk
> > people?
>
> It will disappear once console printing threads are introduced. We
> probably still have a few kernel releases until we see that. First we
> need to finish merging full lockless access, remove the safe buffers,
> and merge the atomic consoles.

Okay. As I said in the cover letter, I didn't think the addition of these new
macros looked like a win overall.

I will drop this patch and just open code the use of printk_deferred_once() in
hrtimer_interrupt(). Which should be easier to fix later when it should
disappear.

Thanks

--
Qais Yousef

2021-01-26 16:24:10

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 0/2] Fix BUG: Invalid wait context in hrtimer_interrupt()

On Tue, Jan 26, 2021 at 04:28:42PM +0106, John Ogness wrote:
> On 2021-01-26, Qais Yousef <[email protected]> wrote:
> > [67628.393550] =============================
> > [67628.393554] [ BUG: Invalid wait context ]
> > [67628.393557] 5.11.0-rc3-00019-g86be331946f7 #37 Not tainted
> > [67628.393560] -----------------------------
> > [67628.393563] sugov:0/192 is trying to lock:
> > [67628.393566] ffff000800b1d898 (&port_lock_key){-.-.}-{3:3}, at: pl011_console_write+0x138/0x218
> > [67628.393581] other info that might help us debug this:
> > [67628.393584] context-{2:2}
> > [67628.393586] 4 locks held by sugov:0/192:
> > [67628.393589] #0: ffff0008059cb720 (&sg_policy->work_lock){+.+.}-{4:4}, at: sugov_work+0x58/0x88
> > [67628.393603] #1: ffff800015446f20 (prepare_lock){+.+.}-{4:4}, at: clk_prepare_lock+0x34/0xb0


> pl011_console_write() does:
>
> local_irq_save(flags);
> spin_lock(&uap->port.lock);
>
> This needs to be:
>
> spin_lock_irqsave(&uap->port.lock, flags);
>
> Of course, it will be a trickier than that because of the
> @oops_in_progress madness. This bug has been around for quite a while.

Well yes, but that's not going to be fixing anything.

The problem is that sugov_work() is holding a raw_spin_lock, and you're
trying to acquire a spin_lock.

That's an invalid lock nesting.

2021-01-26 16:29:22

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 0/2] Fix BUG: Invalid wait context in hrtimer_interrupt()

On Wed, Jan 27, 2021 at 12:58:33AM +0900, Sergey Senozhatsky wrote:
> On (21/01/26 14:59), Qais Yousef wrote:

> > # [67628.388606] hrtimer: interrupt took 304720 ns
> > [67628.393546]
> > [67628.393550] =============================
> > [67628.393554] [ BUG: Invalid wait context ]
> > [67628.393557] 5.11.0-rc3-00019-g86be331946f7 #37 Not tainted
> > [67628.393560] -----------------------------
> > [67628.393563] sugov:0/192 is trying to lock:
> > [67628.393566] ffff000800b1d898 (&port_lock_key){-.-.}-{3:3}, at: pl011_console_write+0x138/0x218
> > [67628.393581] other info that might help us debug this:
> > [67628.393584] context-{2:2}
> > [67628.393586] 4 locks held by sugov:0/192:
> > [67628.393589] #0: ffff0008059cb720 (&sg_policy->work_lock){+.+.}-{4:4}, at: sugov_work+0x58/0x88
> > [67628.393603] #1: ffff800015446f20 (prepare_lock){+.+.}-{4:4}, at: clk_prepare_lock+0x34/0xb0
> > [67628.393618] #2: ffff8000152aaa60 (console_lock){+.+.}-{0:0}, at: vprintk_emit+0x12c/0x310
> > [67628.393632] #3: ffff8000152aab88 (console_owner){-.-.}-{0:0}, at: console_unlock+0x190/0x6d8

> > Did I miss something?
>
> printk() is not permitted to sleep/schedule/etc and it never does.
> Generally it should be OK to call it from IRQ (module recursion paths).

The report is that it is trying to acquire spin_lock() while holding
raw_spin_lock(), which is an invalid lock nesting.

Note that this is CONFIG_PROVE_RAW_LOCK_NESTING=y which specifically
checks for this.

On current (mainline) kernel configs this is not yet a problem, but the
moment we do land PREEMPT_RT this order will be problematic.

2021-01-26 19:18:44

by Qais Yousef

[permalink] [raw]
Subject: Re: [PATCH 0/2] Fix BUG: Invalid wait context in hrtimer_interrupt()

On 01/26/21 13:46, Sergey Senozhatsky wrote:
> On (21/01/23 23:37), Qais Yousef wrote:
> >
> > I hit a pr_warn() inside hrtimer_interrupt() which lead to a BUG: Invalid wait
> > context splat.
> >
> > The problem wasn't reproducible but I think the cause is obvious, printk can't
> > be called from interrupt context.
> >
> > AFAICU printk_deferred() is safe from NMI, so I assumed it is safe to be called
> > from hrtimer_interrupt() too. Adding a pr_warn_once() inside
> > hrtimer_interrupt() in a location where it is always hit produces the BUG
> > splat. Replacing it with pr_warn_deferred_once() generates the printk warning
> > without any splat.
>
> Could you please post the lockdep splat?

Sorry I should have done that. I thought printk() in interrupt is not allowed
and obvious.

I reported it here: https://lore.kernel.org/lkml/20210114124512.mg3vexudfmoadef5@e107158-lin/

# [67628.388606] hrtimer: interrupt took 304720 ns
[67628.393546]
[67628.393550] =============================
[67628.393554] [ BUG: Invalid wait context ]
[67628.393557] 5.11.0-rc3-00019-g86be331946f7 #37 Not tainted
[67628.393560] -----------------------------
[67628.393563] sugov:0/192 is trying to lock:
[67628.393566] ffff000800b1d898 (&port_lock_key){-.-.}-{3:3}, at: pl011_console_write+0x138/0x218
[67628.393581] other info that might help us debug this:
[67628.393584] context-{2:2}
[67628.393586] 4 locks held by sugov:0/192:
[67628.393589] #0: ffff0008059cb720 (&sg_policy->work_lock){+.+.}-{4:4}, at: sugov_work+0x58/0x88
[67628.393603] #1: ffff800015446f20 (prepare_lock){+.+.}-{4:4}, at: clk_prepare_lock+0x34/0xb0
[67628.393618] #2: ffff8000152aaa60 (console_lock){+.+.}-{0:0}, at: vprintk_emit+0x12c/0x310
[67628.393632] #3: ffff8000152aab88 (console_owner){-.-.}-{0:0}, at: console_unlock+0x190/0x6d8
[67628.393646] stack backtrace:
[67628.393649] CPU: 0 PID: 192 Comm: sugov:0 Not tainted 5.11.0-rc3-00019-g86be331946f7 #37
[67628.393653] Hardware name: ARM Juno development board (r2) (DT)
[67628.393656] Call trace:
[67628.393659] dump_backtrace+0x0/0x1b0
[67628.393661] show_stack+0x20/0x70
[67628.393664] dump_stack+0xf8/0x168
[67628.393666] __lock_acquire+0x964/0x1c88
[67628.393669] lock_acquire+0x26c/0x500
[67628.393671] _raw_spin_lock+0x64/0x88
[67628.393674] pl011_console_write+0x138/0x218
[67628.393677] console_unlock+0x2c4/0x6d8
[67628.393680] vprintk_emit+0x134/0x310
[67628.393682] vprintk_default+0x40/0x50
[67628.393685] vprintk_func+0xfc/0x2b0
[67628.393687] printk+0x68/0x90
[67628.393690] hrtimer_interrupt+0x24c/0x250
[67628.393693] arch_timer_handler_phys+0x3c/0x50
[67628.393695] handle_percpu_devid_irq+0xd8/0x460
[67628.393698] generic_handle_irq+0x38/0x50
[67628.393701] __handle_domain_irq+0x6c/0xc8
[67628.393704] gic_handle_irq+0xb0/0xf0
[67628.393706] el1_irq+0xb4/0x180
[67628.393709] _raw_spin_unlock_irqrestore+0x58/0xa8
[67628.393712] hrtimer_start_range_ns+0x1b0/0x420
[67628.393715] msg_submit+0x100/0x108
[67628.393717] mbox_send_message+0x84/0x128
[67628.393720] scpi_send_message+0x11c/0x2a8
[67628.393723] scpi_dvfs_set_idx+0x48/0x70
[67628.393726] scpi_dvfs_set_rate+0x60/0x78
[67628.393728] clk_change_rate+0x184/0x8a8
[67628.393731] clk_core_set_rate_nolock+0x1c0/0x280
[67628.393734] clk_set_rate+0x40/0x98
[67628.393736] scpi_cpufreq_set_target+0x40/0x68
[67628.393739] __cpufreq_driver_target+0x1a0/0x5d0

> Why is it invalid? Is this... -rt kernel, perhaps?

No I was running on Linus master at the time.

AFAIU printk will hold the console lock which could sleep in interrupt context.

Did I miss something?

Thanks

--
Qais Yousef

2021-01-26 19:23:06

by John Ogness

[permalink] [raw]
Subject: Re: [PATCH 0/2] Fix BUG: Invalid wait context in hrtimer_interrupt()

On 2021-01-26, Qais Yousef <[email protected]> wrote:
> [67628.393550] =============================
> [67628.393554] [ BUG: Invalid wait context ]
> [67628.393557] 5.11.0-rc3-00019-g86be331946f7 #37 Not tainted
> [67628.393560] -----------------------------
> [67628.393563] sugov:0/192 is trying to lock:
> [67628.393566] ffff000800b1d898 (&port_lock_key){-.-.}-{3:3}, at: pl011_console_write+0x138/0x218
> [67628.393581] other info that might help us debug this:
> [67628.393584] context-{2:2}
> [67628.393586] 4 locks held by sugov:0/192:
> [67628.393589] #0: ffff0008059cb720 (&sg_policy->work_lock){+.+.}-{4:4}, at: sugov_work+0x58/0x88
> [67628.393603] #1: ffff800015446f20 (prepare_lock){+.+.}-{4:4}, at: clk_prepare_lock+0x34/0xb0
> [67628.393618] #2: ffff8000152aaa60 (console_lock){+.+.}-{0:0}, at: vprintk_emit+0x12c/0x310
> [67628.393632] #3: ffff8000152aab88 (console_owner){-.-.}-{0:0}, at: console_unlock+0x190/0x6d8
> [67628.393646] stack backtrace:
> [67628.393649] CPU: 0 PID: 192 Comm: sugov:0 Not tainted 5.11.0-rc3-00019-g86be331946f7 #37
> [67628.393653] Hardware name: ARM Juno development board (r2) (DT)
> [67628.393656] Call trace:
> [67628.393659] dump_backtrace+0x0/0x1b0
> [67628.393661] show_stack+0x20/0x70
> [67628.393664] dump_stack+0xf8/0x168
> [67628.393666] __lock_acquire+0x964/0x1c88
> [67628.393669] lock_acquire+0x26c/0x500
> [67628.393671] _raw_spin_lock+0x64/0x88
> [67628.393674] pl011_console_write+0x138/0x218
> [67628.393677] console_unlock+0x2c4/0x6d8
> [67628.393680] vprintk_emit+0x134/0x310
> [67628.393682] vprintk_default+0x40/0x50
> [67628.393685] vprintk_func+0xfc/0x2b0
> [67628.393687] printk+0x68/0x90
> [67628.393690] hrtimer_interrupt+0x24c/0x250
> [67628.393693] arch_timer_handler_phys+0x3c/0x50
> [67628.393695] handle_percpu_devid_irq+0xd8/0x460
> [67628.393698] generic_handle_irq+0x38/0x50
> [67628.393701] __handle_domain_irq+0x6c/0xc8
> [67628.393704] gic_handle_irq+0xb0/0xf0
> [67628.393706] el1_irq+0xb4/0x180
> [67628.393709] _raw_spin_unlock_irqrestore+0x58/0xa8
> [67628.393712] hrtimer_start_range_ns+0x1b0/0x420
> [67628.393715] msg_submit+0x100/0x108
> [67628.393717] mbox_send_message+0x84/0x128
> [67628.393720] scpi_send_message+0x11c/0x2a8
> [67628.393723] scpi_dvfs_set_idx+0x48/0x70
> [67628.393726] scpi_dvfs_set_rate+0x60/0x78
> [67628.393728] clk_change_rate+0x184/0x8a8
> [67628.393731] clk_core_set_rate_nolock+0x1c0/0x280
> [67628.393734] clk_set_rate+0x40/0x98
> [67628.393736] scpi_cpufreq_set_target+0x40/0x68
> [67628.393739] __cpufreq_driver_target+0x1a0/0x5d0

pl011_console_write() does:

local_irq_save(flags);
spin_lock(&uap->port.lock);

This needs to be:

spin_lock_irqsave(&uap->port.lock, flags);

Of course, it will be a trickier than that because of the
@oops_in_progress madness. This bug has been around for quite a while.

John Ogness

2021-01-26 19:32:42

by Sergey Senozhatsky

[permalink] [raw]
Subject: Re: [PATCH 0/2] Fix BUG: Invalid wait context in hrtimer_interrupt()

On (21/01/26 14:59), Qais Yousef wrote:
> On 01/26/21 13:46, Sergey Senozhatsky wrote:
> > On (21/01/23 23:37), Qais Yousef wrote:

[..]

> I reported it here: https://lore.kernel.org/lkml/20210114124512.mg3vexudfmoadef5@e107158-lin/
>
> # [67628.388606] hrtimer: interrupt took 304720 ns
> [67628.393546]
> [67628.393550] =============================
> [67628.393554] [ BUG: Invalid wait context ]
> [67628.393557] 5.11.0-rc3-00019-g86be331946f7 #37 Not tainted
> [67628.393560] -----------------------------
> [67628.393563] sugov:0/192 is trying to lock:
> [67628.393566] ffff000800b1d898 (&port_lock_key){-.-.}-{3:3}, at: pl011_console_write+0x138/0x218
> [67628.393581] other info that might help us debug this:
> [67628.393584] context-{2:2}
> [67628.393586] 4 locks held by sugov:0/192:
> [67628.393589] #0: ffff0008059cb720 (&sg_policy->work_lock){+.+.}-{4:4}, at: sugov_work+0x58/0x88
> [67628.393603] #1: ffff800015446f20 (prepare_lock){+.+.}-{4:4}, at: clk_prepare_lock+0x34/0xb0
> [67628.393618] #2: ffff8000152aaa60 (console_lock){+.+.}-{0:0}, at: vprintk_emit+0x12c/0x310
> [67628.393632] #3: ffff8000152aab88 (console_owner){-.-.}-{0:0}, at: console_unlock+0x190/0x6d8
> [67628.393646] stack backtrace:
> [67628.393649] CPU: 0 PID: 192 Comm: sugov:0 Not tainted 5.11.0-rc3-00019-g86be331946f7 #37
> [67628.393653] Hardware name: ARM Juno development board (r2) (DT)
> [67628.393656] Call trace:
> [67628.393659] dump_backtrace+0x0/0x1b0
> [67628.393661] show_stack+0x20/0x70
> [67628.393664] dump_stack+0xf8/0x168
> [67628.393666] __lock_acquire+0x964/0x1c88
> [67628.393669] lock_acquire+0x26c/0x500
> [67628.393671] _raw_spin_lock+0x64/0x88
> [67628.393674] pl011_console_write+0x138/0x218
> [67628.393677] console_unlock+0x2c4/0x6d8
> [67628.393680] vprintk_emit+0x134/0x310
> [67628.393682] vprintk_default+0x40/0x50
> [67628.393685] vprintk_func+0xfc/0x2b0
> [67628.393687] printk+0x68/0x90
> [67628.393690] hrtimer_interrupt+0x24c/0x250
> [67628.393693] arch_timer_handler_phys+0x3c/0x50
> [67628.393695] handle_percpu_devid_irq+0xd8/0x460
> [67628.393698] generic_handle_irq+0x38/0x50
> [67628.393701] __handle_domain_irq+0x6c/0xc8
> [67628.393704] gic_handle_irq+0xb0/0xf0
> [67628.393706] el1_irq+0xb4/0x180
> [67628.393709] _raw_spin_unlock_irqrestore+0x58/0xa8
> [67628.393712] hrtimer_start_range_ns+0x1b0/0x420
> [67628.393715] msg_submit+0x100/0x108
> [67628.393717] mbox_send_message+0x84/0x128
> [67628.393720] scpi_send_message+0x11c/0x2a8
> [67628.393723] scpi_dvfs_set_idx+0x48/0x70
> [67628.393726] scpi_dvfs_set_rate+0x60/0x78
> [67628.393728] clk_change_rate+0x184/0x8a8
> [67628.393731] clk_core_set_rate_nolock+0x1c0/0x280
> [67628.393734] clk_set_rate+0x40/0x98
> [67628.393736] scpi_cpufreq_set_target+0x40/0x68
> [67628.393739] __cpufreq_driver_target+0x1a0/0x5d0

Thanks.

> > Why is it invalid? Is this... -rt kernel, perhaps?
>
> No I was running on Linus master at the time.
>
> AFAIU printk will hold the console lock which could sleep in interrupt context.
>
> Did I miss something?

printk() is not permitted to sleep/schedule/etc and it never does.
Generally it should be OK to call it from IRQ (module recursion paths).

-ss

2021-01-26 21:43:05

by John Ogness

[permalink] [raw]
Subject: Re: [PATCH 1/2] printk: Add new pr_*_deferred_once() variants

On 2021-01-25, Peter Zijlstra <[email protected]> wrote:
> On Sat, Jan 23, 2021 at 11:37:40PM +0000, Qais Yousef wrote:
>> To allow users in code where printk is not allowed.
>>
>> Signed-off-by: Qais Yousef <[email protected]>
>> ---
>> include/linux/printk.h | 24 ++++++++++++++++++++++++
>> 1 file changed, 24 insertions(+)
>>
>> diff --git a/include/linux/printk.h b/include/linux/printk.h
>> index fe7eb2351610..92c0978c7b44 100644
>> --- a/include/linux/printk.h
>> +++ b/include/linux/printk.h
>> @@ -480,21 +480,45 @@ extern int kptr_restrict;
>> printk_once(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__)
>> /* no pr_cont_once, don't do that... */
>>
>> +#define pr_emerg_deferred_once(fmt, ...) \
>> + printk_deferred_once(KERN_EMERG pr_fmt(fmt), ##__VA_ARGS__)
>> +#define pr_alert_deferred_once(fmt, ...) \
>> + printk_deferred_once(KERN_ALERT pr_fmt(fmt), ##__VA_ARGS__)
>> +#define pr_crit_deferred_once(fmt, ...) \
>> + printk_deferred_once(KERN_CRIT pr_fmt(fmt), ##__VA_ARGS__)
>> +#define pr_err_deferred_once(fmt, ...) \
>> + printk_deferred_once(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
>> +#define pr_warn_deferred_once(fmt, ...) \
>> + printk_deferred_once(KERN_WARNING pr_fmt(fmt), ##__VA_ARGS__)
>> +#define pr_notice_deferred_once(fmt, ...) \
>> + printk_deferred_once(KERN_NOTICE pr_fmt(fmt), ##__VA_ARGS__)
>> +#define pr_info_deferred_once(fmt, ...) \
>> + printk_deferred_once(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__)
>> +/* no pr_cont_deferred_once, don't do that... */
>
> I absolutely detest this.. the whole _deferred thing is an
> abomination.

And it will disappear at some point.

> We should be very close to printk not needing this anymore, printk
> people?

It will disappear once console printing threads are introduced. We
probably still have a few kernel releases until we see that. First we
need to finish merging full lockless access, remove the safe buffers,
and merge the atomic consoles.

John Ogness

2021-01-27 00:37:47

by Petr Mladek

[permalink] [raw]
Subject: Re: [PATCH 0/2] Fix BUG: Invalid wait context in hrtimer_interrupt()

On Wed 2021-01-27 00:58:33, Sergey Senozhatsky wrote:
> On (21/01/26 14:59), Qais Yousef wrote:
> > On 01/26/21 13:46, Sergey Senozhatsky wrote:
> > > On (21/01/23 23:37), Qais Yousef wrote:
>
> [..]
>
> > I reported it here: https://lore.kernel.org/lkml/20210114124512.mg3vexudfmoadef5@e107158-lin/
> >
> > # [67628.388606] hrtimer: interrupt took 304720 ns
> > [67628.393546]
> > [67628.393550] =============================
> > [67628.393554] [ BUG: Invalid wait context ]
> > [67628.393557] 5.11.0-rc3-00019-g86be331946f7 #37 Not tainted
> > [67628.393560] -----------------------------
> > [67628.393563] sugov:0/192 is trying to lock:
> > [67628.393566] ffff000800b1d898 (&port_lock_key){-.-.}-{3:3}, at: pl011_console_write+0x138/0x218
> > [67628.393581] other info that might help us debug this:
> > [67628.393584] context-{2:2}
> > [67628.393586] 4 locks held by sugov:0/192:
> > [67628.393589] #0: ffff0008059cb720 (&sg_policy->work_lock){+.+.}-{4:4}, at: sugov_work+0x58/0x88
> > [67628.393603] #1: ffff800015446f20 (prepare_lock){+.+.}-{4:4}, at: clk_prepare_lock+0x34/0xb0
> > [67628.393618] #2: ffff8000152aaa60 (console_lock){+.+.}-{0:0}, at: vprintk_emit+0x12c/0x310
> > [67628.393632] #3: ffff8000152aab88 (console_owner){-.-.}-{0:0}, at: console_unlock+0x190/0x6d8
> > [67628.393646] stack backtrace:
> > [67628.393649] CPU: 0 PID: 192 Comm: sugov:0 Not tainted 5.11.0-rc3-00019-g86be331946f7 #37
> > [67628.393653] Hardware name: ARM Juno development board (r2) (DT)
> > [67628.393656] Call trace:
> > [67628.393659] dump_backtrace+0x0/0x1b0
> > [67628.393661] show_stack+0x20/0x70
> > [67628.393664] dump_stack+0xf8/0x168
> > [67628.393666] __lock_acquire+0x964/0x1c88
> > [67628.393669] lock_acquire+0x26c/0x500
> > [67628.393671] _raw_spin_lock+0x64/0x88
> > [67628.393674] pl011_console_write+0x138/0x218
> > [67628.393677] console_unlock+0x2c4/0x6d8
> > [67628.393680] vprintk_emit+0x134/0x310
> > [67628.393682] vprintk_default+0x40/0x50
> > [67628.393685] vprintk_func+0xfc/0x2b0
> > [67628.393687] printk+0x68/0x90
> > [67628.393690] hrtimer_interrupt+0x24c/0x250
> > [67628.393693] arch_timer_handler_phys+0x3c/0x50
> > [67628.393695] handle_percpu_devid_irq+0xd8/0x460
> > [67628.393698] generic_handle_irq+0x38/0x50
> > [67628.393701] __handle_domain_irq+0x6c/0xc8
> > [67628.393704] gic_handle_irq+0xb0/0xf0
> > [67628.393706] el1_irq+0xb4/0x180
> > [67628.393709] _raw_spin_unlock_irqrestore+0x58/0xa8
> > [67628.393712] hrtimer_start_range_ns+0x1b0/0x420
> > [67628.393715] msg_submit+0x100/0x108
> > [67628.393717] mbox_send_message+0x84/0x128
> > [67628.393720] scpi_send_message+0x11c/0x2a8
> > [67628.393723] scpi_dvfs_set_idx+0x48/0x70
> > [67628.393726] scpi_dvfs_set_rate+0x60/0x78
> > [67628.393728] clk_change_rate+0x184/0x8a8
> > [67628.393731] clk_core_set_rate_nolock+0x1c0/0x280
> > [67628.393734] clk_set_rate+0x40/0x98
> > [67628.393736] scpi_cpufreq_set_target+0x40/0x68
> > [67628.393739] __cpufreq_driver_target+0x1a0/0x5d0
>
> Thanks.
>
> > > Why is it invalid? Is this... -rt kernel, perhaps?
> >
> > No I was running on Linus master at the time.
> >
> > AFAIU printk will hold the console lock which could sleep in interrupt context.
> >
> > Did I miss something?
>
> printk() is not permitted to sleep/schedule/etc and it never does.
> Generally it should be OK to call it from IRQ (module recursion paths).

To make it clear. printk() takes the mutex via console_trylock().
It does not sleep when the lock is not available. But lockdep should
be aware of this.

Honestly, I am not sure what the lockdep complains about here.
IMHO, the key is that we try to take lock

(&port_lock_key){-.-.}-{3:3}

in context

context-{2:2}

where {-.-.} looks fine. Both dots and dashes looks safe, see
get_usage_char()

So, it is likely the {3:3} vs. {2:2}. I have never seen this.
It has been added by the commit de8f5e4f2dc1f032 ("lockdep: Introduce
wait-type checks").

Might it be that we take normal mutex under a raw one?

It would be safe in normal kernel. But it would be bad
in RT one. But lockep should complain about it even in
the normal kernel.

Best Regards,
Petr

2021-01-27 06:21:35

by Qais Yousef

[permalink] [raw]
Subject: Re: [PATCH 0/2] Fix BUG: Invalid wait context in hrtimer_interrupt()

On 01/26/21 17:23, Peter Zijlstra wrote:
> On Wed, Jan 27, 2021 at 12:58:33AM +0900, Sergey Senozhatsky wrote:
> > On (21/01/26 14:59), Qais Yousef wrote:
>
> > > # [67628.388606] hrtimer: interrupt took 304720 ns
> > > [67628.393546]
> > > [67628.393550] =============================
> > > [67628.393554] [ BUG: Invalid wait context ]
> > > [67628.393557] 5.11.0-rc3-00019-g86be331946f7 #37 Not tainted
> > > [67628.393560] -----------------------------
> > > [67628.393563] sugov:0/192 is trying to lock:
> > > [67628.393566] ffff000800b1d898 (&port_lock_key){-.-.}-{3:3}, at: pl011_console_write+0x138/0x218
> > > [67628.393581] other info that might help us debug this:
> > > [67628.393584] context-{2:2}
> > > [67628.393586] 4 locks held by sugov:0/192:
> > > [67628.393589] #0: ffff0008059cb720 (&sg_policy->work_lock){+.+.}-{4:4}, at: sugov_work+0x58/0x88
> > > [67628.393603] #1: ffff800015446f20 (prepare_lock){+.+.}-{4:4}, at: clk_prepare_lock+0x34/0xb0
> > > [67628.393618] #2: ffff8000152aaa60 (console_lock){+.+.}-{0:0}, at: vprintk_emit+0x12c/0x310
> > > [67628.393632] #3: ffff8000152aab88 (console_owner){-.-.}-{0:0}, at: console_unlock+0x190/0x6d8
>
> > > Did I miss something?
> >
> > printk() is not permitted to sleep/schedule/etc and it never does.
> > Generally it should be OK to call it from IRQ (module recursion paths).
>
> The report is that it is trying to acquire spin_lock() while holding
> raw_spin_lock(), which is an invalid lock nesting.
>
> Note that this is CONFIG_PROVE_RAW_LOCK_NESTING=y which specifically
> checks for this.
>
> On current (mainline) kernel configs this is not yet a problem, but the
> moment we do land PREEMPT_RT this order will be problematic.

I should have dug more into the history of printk() and the meaning of the
splat. Sorry for the noise.

Looking at v5.10.8-rt24 the following fix is applied in RT

https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git/commit/drivers/tty/serial/amba-pl011.c?h=linux-5.10.y-rt&id=008cc77aff249e830e5eb90b7ae3a6784597b8cf

which is what John suggested.

Looking at the locks held

> > > [67628.393589] #0: ffff0008059cb720 (&sg_policy->work_lock){+.+.}-{4:4}, at: sugov_work+0x58/0x88
> > > [67628.393603] #1: ffff800015446f20 (prepare_lock){+.+.}-{4:4}, at: clk_prepare_lock+0x34/0xb0

These two are mutexes.

> > > [67628.393618] #2: ffff8000152aaa60 (console_lock){+.+.}-{0:0}, at: vprintk_emit+0x12c/0x310

This is a semaphore.

> > > [67628.393632] #3: ffff8000152aab88 (console_owner){-.-.}-{0:0}, at: console_unlock+0x190/0x6d8

I think this is acquired by console_lock_spinning_enable() which has acquiring
syntax I'm not familiar with. console_owner_lock is defined as RAW_SPINLOCK, so
regardless of how it is acquired, it must be the problem.

Looks like John has reworked this code in RT too. So maybe this is just a red
herring after all..

https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git/commit/kernel/printk/printk.c?h=linux-5.10.y-rt&id=0097798fd99948d3ffea535005eee7eb3b14fd06

Thanks

--
Qais Yousef

2021-01-27 19:43:40

by Sergey Senozhatsky

[permalink] [raw]
Subject: Re: [PATCH 0/2] Fix BUG: Invalid wait context in hrtimer_interrupt()

On (21/01/23 23:37), Qais Yousef wrote:
>
> I hit a pr_warn() inside hrtimer_interrupt() which lead to a BUG: Invalid wait
> context splat.
>
> The problem wasn't reproducible but I think the cause is obvious, printk can't
> be called from interrupt context.
>
> AFAICU printk_deferred() is safe from NMI, so I assumed it is safe to be called
> from hrtimer_interrupt() too. Adding a pr_warn_once() inside
> hrtimer_interrupt() in a location where it is always hit produces the BUG
> splat. Replacing it with pr_warn_deferred_once() generates the printk warning
> without any splat.

Could you please post the lockdep splat?
Why is it invalid? Is this... -rt kernel, perhaps?

-ss

2022-01-11 13:42:18

by Andy Shevchenko

[permalink] [raw]
Subject: Re: [PATCH 1/2] printk: Add new pr_*_deferred_once() variants

On Mon, Jan 25, 2021 at 12:04:01PM +0106, John Ogness wrote:
> On 2021-01-25, Peter Zijlstra <[email protected]> wrote:
> > On Sat, Jan 23, 2021 at 11:37:40PM +0000, Qais Yousef wrote:

...

> > We should be very close to printk not needing this anymore, printk
> > people?
>
> It will disappear once console printing threads are introduced. We
> probably still have a few kernel releases until we see that. First we
> need to finish merging full lockless access, remove the safe buffers,
> and merge the atomic consoles.

Where are we now at this?

I almost easily can reproduce the below at boot time for last several releases

cblist_init_generic: Setting adjustable number of callback queues.

=============================
[ BUG: Invalid wait context ]
5.16.0-next-20220111-00101-gb3e9f395c6f1 #28 Not tainted
-----------------------------
swapper/0/1 is trying to lock:
ffffffffb3dce758 (&port_lock_key){....}-{3:3}, at: serial8250_console_write+0x3a8/0x480
other info that might help us debug this:
context-{5:5}
3 locks held by swapper/0/1:
#0: ffffffffb2955e08 (rcu_tasks.cbs_gbl_lock){....}-{2:2}, at: cblist_init_generic.constprop.0+0x14/0x1e0
#1: ffffffffb2953ec0 (console_lock){+.+.}-{0:0}, at: _printk+0x53/0x6a
#2: ffffffffb2873ca0 (console_owner){....}-{0:0}, at: console_unlock+0x157/0x580
stack backtrace:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.16.0-next-20220111-00101-gb3e9f395c6f1 #28
Hardware name: Intel Corporation CHERRYVIEW D0 PLATFORM/Braswell CRB, BIOS BRAS.X64.B082.R00.1507270557 07/27/2015
Call Trace:
<TASK>
dump_stack_lvl+0x57/0x7d
__lock_acquire.cold+0xca/0x2e8
? stack_trace_save+0x46/0x70
lock_acquire+0xd2/0x300
? serial8250_console_write+0x3a8/0x480
_raw_spin_lock_irqsave+0x39/0x50
? serial8250_console_write+0x3a8/0x480
serial8250_console_write+0x3a8/0x480
console_unlock+0x397/0x580
vprintk_emit+0x99/0x2e0
_printk+0x53/0x6a
cblist_init_generic.constprop.0.cold+0x24/0x31
rcu_init_tasks_generic+0x5/0xd9
kernel_init_freeable+0x14d/0x294
? rest_init+0x250/0x250
kernel_init+0x11/0x110
ret_from_fork+0x22/0x30
</TASK>
cblist_init_generic: Setting shift to 2 and lim to 1.

--
With Best Regards,
Andy Shevchenko



2022-01-13 11:20:30

by Qais Yousef

[permalink] [raw]
Subject: Re: [PATCH 1/2] printk: Add new pr_*_deferred_once() variants

On 01/11/22 15:40, Andy Shevchenko wrote:
> On Mon, Jan 25, 2021 at 12:04:01PM +0106, John Ogness wrote:
> > On 2021-01-25, Peter Zijlstra <[email protected]> wrote:
> > > On Sat, Jan 23, 2021 at 11:37:40PM +0000, Qais Yousef wrote:
>
> ...
>
> > > We should be very close to printk not needing this anymore, printk
> > > people?
> >
> > It will disappear once console printing threads are introduced. We
> > probably still have a few kernel releases until we see that. First we
> > need to finish merging full lockless access, remove the safe buffers,
> > and merge the atomic consoles.
>
> Where are we now at this?

I think this answer from Peter then the summarizes the problem

https://lore.kernel.org/lkml/[email protected]/

"""
Note that this is CONFIG_PROVE_RAW_LOCK_NESTING=y which specifically
checks for this.

On current (mainline) kernel configs this is not yet a problem, but the
moment we do land PREEMPT_RT this order will be problematic.
"""

PREEMPT_RT patchset has fixes that should land on mainline eventually and AFAIU
this is a red herring on mainline. Although this might be a new call site that
breaks the rule for PREEMPT_RT and they'd want to know about it.

I wonder if CONFIG_PROVE_RAW_LOCK_NESTING should depend on PREEMPT_RT until the
fixes land - assuming they haven't, I'm not following its state.

HTH.

Cheers

--
Qais Yousef

>
> I almost easily can reproduce the below at boot time for last several releases
>
> cblist_init_generic: Setting adjustable number of callback queues.
>
> =============================
> [ BUG: Invalid wait context ]
> 5.16.0-next-20220111-00101-gb3e9f395c6f1 #28 Not tainted
> -----------------------------
> swapper/0/1 is trying to lock:
> ffffffffb3dce758 (&port_lock_key){....}-{3:3}, at: serial8250_console_write+0x3a8/0x480
> other info that might help us debug this:
> context-{5:5}
> 3 locks held by swapper/0/1:
> #0: ffffffffb2955e08 (rcu_tasks.cbs_gbl_lock){....}-{2:2}, at: cblist_init_generic.constprop.0+0x14/0x1e0
> #1: ffffffffb2953ec0 (console_lock){+.+.}-{0:0}, at: _printk+0x53/0x6a
> #2: ffffffffb2873ca0 (console_owner){....}-{0:0}, at: console_unlock+0x157/0x580
> stack backtrace:
> CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.16.0-next-20220111-00101-gb3e9f395c6f1 #28
> Hardware name: Intel Corporation CHERRYVIEW D0 PLATFORM/Braswell CRB, BIOS BRAS.X64.B082.R00.1507270557 07/27/2015
> Call Trace:
> <TASK>
> dump_stack_lvl+0x57/0x7d
> __lock_acquire.cold+0xca/0x2e8
> ? stack_trace_save+0x46/0x70
> lock_acquire+0xd2/0x300
> ? serial8250_console_write+0x3a8/0x480
> _raw_spin_lock_irqsave+0x39/0x50
> ? serial8250_console_write+0x3a8/0x480
> serial8250_console_write+0x3a8/0x480
> console_unlock+0x397/0x580
> vprintk_emit+0x99/0x2e0
> _printk+0x53/0x6a
> cblist_init_generic.constprop.0.cold+0x24/0x31
> rcu_init_tasks_generic+0x5/0xd9
> kernel_init_freeable+0x14d/0x294
> ? rest_init+0x250/0x250
> kernel_init+0x11/0x110
> ret_from_fork+0x22/0x30
> </TASK>
> cblist_init_generic: Setting shift to 2 and lim to 1.
>
> --
> With Best Regards,
> Andy Shevchenko
>
>