2023-09-13 02:07:55

by Liam R. Howlett

[permalink] [raw]
Subject: [PATCH] init/main: Clear boot task idle flag

Initial booting is setting the task flag to idle (PF_IDLE) by the call
path sched_init() -> init_idle(). Having the task idle and calling
call_rcu() in kernel/rcu/tiny.c means that TIF_NEED_RESCHED will be
set. Subsequent calls to any cond_resched() will enable IRQs,
potentially earlier than the IRQ setup has completed. Recent changes
have caused just this scenario and IRQs have been enabled early.

This causes a warning later in start_kernel() as interrupts are enabled
before they are fully set up.

Fix this issue by clearing the PF_IDLE flag on return from sched_init()
and restore the flag in rest_init(). Although the boot task was marked
as idle since (at least) d80e4fda576d, I am not sure that it is wrong to
do so. The forced context-switch on idle task was introduced in the
tiny_rcu update, so I'm going to claim this fixes 5f6130fa52ee.

Link: https://lore.kernel.org/linux-mm/[email protected]/
Link: https://lore.kernel.org/linux-mm/CAMuHMdWpvpWoDa=Ox-do92czYRvkok6_x6pYUH+ZouMcJbXy+Q@mail.gmail.com/
Fixes: 5f6130fa52ee ("tiny_rcu: Directly force QS when call_rcu_[bh|sched]() on idle_task")
Cc: [email protected]
Cc: Geert Uytterhoeven <[email protected]>
Cc: "Paul E. McKenney" <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Andreas Schwab <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Peng Zhang <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Juri Lelli <[email protected]>
Cc: Vincent Guittot <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: "Mike Rapoport (IBM)" <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Liam R. Howlett <[email protected]>
---
init/main.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/init/main.c b/init/main.c
index ad920fac325c..f74772acf612 100644
--- a/init/main.c
+++ b/init/main.c
@@ -696,7 +696,7 @@ noinline void __ref __noreturn rest_init(void)
*/
rcu_read_lock();
tsk = find_task_by_pid_ns(pid, &init_pid_ns);
- tsk->flags |= PF_NO_SETAFFINITY;
+ tsk->flags |= PF_NO_SETAFFINITY | PF_IDLE;
set_cpus_allowed_ptr(tsk, cpumask_of(smp_processor_id()));
rcu_read_unlock();

@@ -938,6 +938,8 @@ void start_kernel(void)
* time - but meanwhile we still have a functioning scheduler.
*/
sched_init();
+ /* Avoid early context switch, rest_init() restores PF_IDLE */
+ current->flags &= ~PF_IDLE;

if (WARN(!irqs_disabled(),
"Interrupts were enabled *very* early, fixing it\n"))
--
2.39.2


2023-09-13 11:30:15

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH] init/main: Clear boot task idle flag

On Wed, Sep 13, 2023 at 01:01:39PM +0200, Peter Zijlstra wrote:
> On Tue, Sep 12, 2023 at 08:56:47PM -0400, Liam R. Howlett wrote:
> > Initial booting is setting the task flag to idle (PF_IDLE) by the call
> > path sched_init() -> init_idle(). Having the task idle and calling
> > call_rcu() in kernel/rcu/tiny.c means that TIF_NEED_RESCHED will be
> > set. Subsequent calls to any cond_resched() will enable IRQs,
> > potentially earlier than the IRQ setup has completed. Recent changes
> > have caused just this scenario and IRQs have been enabled early.
> >
> > This causes a warning later in start_kernel() as interrupts are enabled
> > before they are fully set up.
> >
> > Fix this issue by clearing the PF_IDLE flag on return from sched_init()
> > and restore the flag in rest_init(). Although the boot task was marked
> > as idle since (at least) d80e4fda576d, I am not sure that it is wrong to
> > do so. The forced context-switch on idle task was introduced in the
> > tiny_rcu update, so I'm going to claim this fixes 5f6130fa52ee.
> >
> > Link: https://lore.kernel.org/linux-mm/[email protected]/
> > Link: https://lore.kernel.org/linux-mm/CAMuHMdWpvpWoDa=Ox-do92czYRvkok6_x6pYUH+ZouMcJbXy+Q@mail.gmail.com/
> > Fixes: 5f6130fa52ee ("tiny_rcu: Directly force QS when call_rcu_[bh|sched]() on idle_task")
> > Cc: [email protected]
> > Cc: Geert Uytterhoeven <[email protected]>
> > Cc: "Paul E. McKenney" <[email protected]>
> > Cc: Christophe Leroy <[email protected]>
> > Cc: Andreas Schwab <[email protected]>
> > Cc: Matthew Wilcox <[email protected]>
> > Cc: Peng Zhang <[email protected]>
> > Cc: Peter Zijlstra <[email protected]>
> > Cc: Ingo Molnar <[email protected]>
> > Cc: Juri Lelli <[email protected]>
> > Cc: Vincent Guittot <[email protected]>
> > Cc: Andrew Morton <[email protected]>
> > Cc: "Mike Rapoport (IBM)" <[email protected]>
> > Cc: Vlastimil Babka <[email protected]>
> > Signed-off-by: Liam R. Howlett <[email protected]>
> > ---
> > init/main.c | 4 +++-
> > 1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/init/main.c b/init/main.c
> > index ad920fac325c..f74772acf612 100644
> > --- a/init/main.c
> > +++ b/init/main.c
> > @@ -696,7 +696,7 @@ noinline void __ref __noreturn rest_init(void)
> > */
> > rcu_read_lock();
> > tsk = find_task_by_pid_ns(pid, &init_pid_ns);
> > - tsk->flags |= PF_NO_SETAFFINITY;
> > + tsk->flags |= PF_NO_SETAFFINITY | PF_IDLE;
> > set_cpus_allowed_ptr(tsk, cpumask_of(smp_processor_id()));
> > rcu_read_unlock();
> >
> > @@ -938,6 +938,8 @@ void start_kernel(void)
> > * time - but meanwhile we still have a functioning scheduler.
> > */
> > sched_init();
> > + /* Avoid early context switch, rest_init() restores PF_IDLE */
> > + current->flags &= ~PF_IDLE;
> >
> > if (WARN(!irqs_disabled(),
> > "Interrupts were enabled *very* early, fixing it\n"))
>
> Hurmph... so since this is about IRQs, would it not make sense to have
> the | PF_IDLE near 'early_boot_irqs_disabled = false' ?
>
> Or, alternatively, make the tinyrcu thing check that variable?

We could do that, but do we really the decidedly non-idle early boot
sequence designated as idle?

What surprises me is that this is just now showing up. The ingredients
for this one have been in place for almost 10 years.

Thanx, Paul

2023-09-13 13:06:06

by Geert Uytterhoeven

[permalink] [raw]
Subject: Re: [PATCH] init/main: Clear boot task idle flag

Hi Liam,

On Wed, Sep 13, 2023 at 2:57 AM Liam R. Howlett <[email protected]> wrote:
> Initial booting is setting the task flag to idle (PF_IDLE) by the call
> path sched_init() -> init_idle(). Having the task idle and calling
> call_rcu() in kernel/rcu/tiny.c means that TIF_NEED_RESCHED will be
> set. Subsequent calls to any cond_resched() will enable IRQs,
> potentially earlier than the IRQ setup has completed. Recent changes
> have caused just this scenario and IRQs have been enabled early.
>
> This causes a warning later in start_kernel() as interrupts are enabled
> before they are fully set up.
>
> Fix this issue by clearing the PF_IDLE flag on return from sched_init()
> and restore the flag in rest_init(). Although the boot task was marked
> as idle since (at least) d80e4fda576d, I am not sure that it is wrong to
> do so. The forced context-switch on idle task was introduced in the
> tiny_rcu update, so I'm going to claim this fixes 5f6130fa52ee.
>
> Link: https://lore.kernel.org/linux-mm/[email protected]/
> Link: https://lore.kernel.org/linux-mm/CAMuHMdWpvpWoDa=Ox-do92czYRvkok6_x6pYUH+ZouMcJbXy+Q@mail.gmail.com/
> Fixes: 5f6130fa52ee ("tiny_rcu: Directly force QS when call_rcu_[bh|sched]() on idle_task")
> Cc: [email protected]
> Cc: Geert Uytterhoeven <[email protected]>
> Cc: "Paul E. McKenney" <[email protected]>
> Cc: Christophe Leroy <[email protected]>
> Cc: Andreas Schwab <[email protected]>
> Cc: Matthew Wilcox <[email protected]>
> Cc: Peng Zhang <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Juri Lelli <[email protected]>
> Cc: Vincent Guittot <[email protected]>
> Cc: Andrew Morton <[email protected]>
> Cc: "Mike Rapoport (IBM)" <[email protected]>
> Cc: Vlastimil Babka <[email protected]>
> Signed-off-by: Liam R. Howlett <[email protected]>

Thanks for your patch!

This fixes the

WARNING: CPU: 0 PID: 0 at init/main.c:992 start_kernel+0x2f0/0x480

I was seeing during boot on Renesas RZ/A1 and RZ/A2 since commit
cfeb6ae8bcb96ccf ("maple_tree: disable mas_wr_append() when other
readers are possible") in v6.5.

However, this causes lots of new warnings on e.g. R-Car M2-W, which
wasn't affected before, like:

+BUG: sleeping function called from invalid context at
kernel/sched/completion.c:111
+in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 1, name: swapper/0
+preempt_count: 0, expected: 0
+no locks held by swapper/0/1.
+CPU: 0 PID: 1 Comm: swapper/0 Tainted: G W
6.6.0-rc1-koelsch-02359-g9a5bed6a8bd8 #1864
+Hardware name: Generic R-Car Gen2 (Flattened Device Tree)
+ unwind_backtrace from show_stack+0x10/0x14
+ show_stack from dump_stack_lvl+0x68/0x90
+ dump_stack_lvl from __might_resched+0x1b0/0x22c
+ __might_resched from __wait_for_common+0x38/0x170
+ __wait_for_common from kernel_init+0x10/0x130
+ kernel_init from ret_from_fork+0x14/0x30
+Exception stack(0xf0815fb0 to 0xf0815ff8)
+5fa0: 00000000 00000000 00000000 00000000
+5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
+5fe0: 00000000 00000000 00000000 00000000 00000013 00000000

and

+BUG: sleeping function called from invalid context at
kernel/locking/mutex.c:580
+in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 1, name: systemd
+preempt_count: 0, expected: 0
+no locks held by systemd/1.
+CPU: 1 PID: 1 Comm: systemd Tainted: G W N
6.6.0-rc1-koelsch-02359-g9a5bed6a8bd8 #1864
+Hardware name: Generic R-Car Gen2 (Flattened Device Tree)
+ unwind_backtrace from show_stack+0x10/0x14
+ show_stack from dump_stack_lvl+0x68/0x90
+ dump_stack_lvl from __might_resched+0x1b0/0x22c
+ __might_resched from __mutex_lock+0x34/0x3a8
+ __mutex_lock from mutex_lock_nested+0x18/0x20
+ mutex_lock_nested from do_epoll_wait+0x220/0x5ac
+ do_epoll_wait from sys_epoll_wait+0x44/0x6c
+ sys_epoll_wait from ret_fast_syscall+0x0/0x1c
+Exception stack(0xf0815fa8 to 0xf0815ff0)
+5fa0: beff75e0 00000004 00000004 beff75e0 0000003b ffffffff
+5fc0: beff75e0 00000004 ffffffff 000000fc 00000000 ffffffff 0000003b 00000004
+5fe0: 000000fc beff75b8 b6ef8511 b6e75786

Gr{oetje,eeting}s,

Geert


--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

2023-09-13 13:27:00

by Liam R. Howlett

[permalink] [raw]
Subject: Re: [PATCH] init/main: Clear boot task idle flag

* Paul E. McKenney <[email protected]> [230913 07:28]:
> On Wed, Sep 13, 2023 at 01:01:39PM +0200, Peter Zijlstra wrote:
> > On Tue, Sep 12, 2023 at 08:56:47PM -0400, Liam R. Howlett wrote:
> > > Initial booting is setting the task flag to idle (PF_IDLE) by the call
> > > path sched_init() -> init_idle(). Having the task idle and calling
> > > call_rcu() in kernel/rcu/tiny.c means that TIF_NEED_RESCHED will be
> > > set. Subsequent calls to any cond_resched() will enable IRQs,
> > > potentially earlier than the IRQ setup has completed. Recent changes
> > > have caused just this scenario and IRQs have been enabled early.
> > >
> > > This causes a warning later in start_kernel() as interrupts are enabled
> > > before they are fully set up.
> > >
> > > Fix this issue by clearing the PF_IDLE flag on return from sched_init()
> > > and restore the flag in rest_init(). Although the boot task was marked
> > > as idle since (at least) d80e4fda576d, I am not sure that it is wrong to
> > > do so. The forced context-switch on idle task was introduced in the
> > > tiny_rcu update, so I'm going to claim this fixes 5f6130fa52ee.
> > >
> > > Link: https://lore.kernel.org/linux-mm/[email protected]/
> > > Link: https://lore.kernel.org/linux-mm/CAMuHMdWpvpWoDa=Ox-do92czYRvkok6_x6pYUH+ZouMcJbXy+Q@mail.gmail.com/
> > > Fixes: 5f6130fa52ee ("tiny_rcu: Directly force QS when call_rcu_[bh|sched]() on idle_task")
> > > Cc: [email protected]
> > > Cc: Geert Uytterhoeven <[email protected]>
> > > Cc: "Paul E. McKenney" <[email protected]>
> > > Cc: Christophe Leroy <[email protected]>
> > > Cc: Andreas Schwab <[email protected]>
> > > Cc: Matthew Wilcox <[email protected]>
> > > Cc: Peng Zhang <[email protected]>
> > > Cc: Peter Zijlstra <[email protected]>
> > > Cc: Ingo Molnar <[email protected]>
> > > Cc: Juri Lelli <[email protected]>
> > > Cc: Vincent Guittot <[email protected]>
> > > Cc: Andrew Morton <[email protected]>
> > > Cc: "Mike Rapoport (IBM)" <[email protected]>
> > > Cc: Vlastimil Babka <[email protected]>
> > > Signed-off-by: Liam R. Howlett <[email protected]>
> > > ---
> > > init/main.c | 4 +++-
> > > 1 file changed, 3 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/init/main.c b/init/main.c
> > > index ad920fac325c..f74772acf612 100644
> > > --- a/init/main.c
> > > +++ b/init/main.c
> > > @@ -696,7 +696,7 @@ noinline void __ref __noreturn rest_init(void)
> > > */
> > > rcu_read_lock();
> > > tsk = find_task_by_pid_ns(pid, &init_pid_ns);
> > > - tsk->flags |= PF_NO_SETAFFINITY;
> > > + tsk->flags |= PF_NO_SETAFFINITY | PF_IDLE;
> > > set_cpus_allowed_ptr(tsk, cpumask_of(smp_processor_id()));
> > > rcu_read_unlock();
> > >
> > > @@ -938,6 +938,8 @@ void start_kernel(void)
> > > * time - but meanwhile we still have a functioning scheduler.
> > > */
> > > sched_init();
> > > + /* Avoid early context switch, rest_init() restores PF_IDLE */
> > > + current->flags &= ~PF_IDLE;
> > >
> > > if (WARN(!irqs_disabled(),
> > > "Interrupts were enabled *very* early, fixing it\n"))
> >
> > Hurmph... so since this is about IRQs, would it not make sense to have
> > the | PF_IDLE near 'early_boot_irqs_disabled = false' ?

I was thinking that this isn't an idle thread until the end of boot, so
leave it set as not idle until the end.

> >
> > Or, alternatively, make the tinyrcu thing check that variable?
>
> We could do that, but do we really the decidedly non-idle early boot
> sequence designated as idle?

call_rcu() tiny is called more than this code, so unless there is
another reason we want to check the IRQ status it seemed best to change
the boot task flag. I mean, I think the is_idle_task() check is valid
in every other context, right?

>
> What surprises me is that this is just now showing up. The ingredients
> for this one have been in place for almost 10 years.
>

Am I just that lucky?

Cheers,
Liam

2023-09-13 17:44:48

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] init/main: Clear boot task idle flag

On Tue, Sep 12, 2023 at 08:56:47PM -0400, Liam R. Howlett wrote:
> Initial booting is setting the task flag to idle (PF_IDLE) by the call
> path sched_init() -> init_idle(). Having the task idle and calling
> call_rcu() in kernel/rcu/tiny.c means that TIF_NEED_RESCHED will be
> set. Subsequent calls to any cond_resched() will enable IRQs,
> potentially earlier than the IRQ setup has completed. Recent changes
> have caused just this scenario and IRQs have been enabled early.
>
> This causes a warning later in start_kernel() as interrupts are enabled
> before they are fully set up.
>
> Fix this issue by clearing the PF_IDLE flag on return from sched_init()
> and restore the flag in rest_init(). Although the boot task was marked
> as idle since (at least) d80e4fda576d, I am not sure that it is wrong to
> do so. The forced context-switch on idle task was introduced in the
> tiny_rcu update, so I'm going to claim this fixes 5f6130fa52ee.
>
> Link: https://lore.kernel.org/linux-mm/[email protected]/
> Link: https://lore.kernel.org/linux-mm/CAMuHMdWpvpWoDa=Ox-do92czYRvkok6_x6pYUH+ZouMcJbXy+Q@mail.gmail.com/
> Fixes: 5f6130fa52ee ("tiny_rcu: Directly force QS when call_rcu_[bh|sched]() on idle_task")
> Cc: [email protected]
> Cc: Geert Uytterhoeven <[email protected]>
> Cc: "Paul E. McKenney" <[email protected]>
> Cc: Christophe Leroy <[email protected]>
> Cc: Andreas Schwab <[email protected]>
> Cc: Matthew Wilcox <[email protected]>
> Cc: Peng Zhang <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Juri Lelli <[email protected]>
> Cc: Vincent Guittot <[email protected]>
> Cc: Andrew Morton <[email protected]>
> Cc: "Mike Rapoport (IBM)" <[email protected]>
> Cc: Vlastimil Babka <[email protected]>
> Signed-off-by: Liam R. Howlett <[email protected]>
> ---
> init/main.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/init/main.c b/init/main.c
> index ad920fac325c..f74772acf612 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -696,7 +696,7 @@ noinline void __ref __noreturn rest_init(void)
> */
> rcu_read_lock();
> tsk = find_task_by_pid_ns(pid, &init_pid_ns);
> - tsk->flags |= PF_NO_SETAFFINITY;
> + tsk->flags |= PF_NO_SETAFFINITY | PF_IDLE;
> set_cpus_allowed_ptr(tsk, cpumask_of(smp_processor_id()));
> rcu_read_unlock();
>
> @@ -938,6 +938,8 @@ void start_kernel(void)
> * time - but meanwhile we still have a functioning scheduler.
> */
> sched_init();
> + /* Avoid early context switch, rest_init() restores PF_IDLE */
> + current->flags &= ~PF_IDLE;
>
> if (WARN(!irqs_disabled(),
> "Interrupts were enabled *very* early, fixing it\n"))


Hurmph... so since this is about IRQs, would it not make sense to have
the | PF_IDLE near 'early_boot_irqs_disabled = false' ?

Or, alternatively, make the tinyrcu thing check that variable?

2023-09-13 18:29:00

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] init/main: Clear boot task idle flag

On Tue, Sep 12, 2023 at 08:56:47PM -0400, Liam R. Howlett wrote:

> diff --git a/init/main.c b/init/main.c
> index ad920fac325c..f74772acf612 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -696,7 +696,7 @@ noinline void __ref __noreturn rest_init(void)
> */
> rcu_read_lock();
> tsk = find_task_by_pid_ns(pid, &init_pid_ns);
> - tsk->flags |= PF_NO_SETAFFINITY;
> + tsk->flags |= PF_NO_SETAFFINITY | PF_IDLE;
> set_cpus_allowed_ptr(tsk, cpumask_of(smp_processor_id()));
> rcu_read_unlock();
>

Hmm, isn't that pid-1 you're setting PF_IDLE on?

The task becoming idle is 'current' at this point, see the
cpu_startup_entry() call below.

Would not something like so be the right thing?


diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 2299a5cfbfb9..802551e0009b 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -9269,7 +9269,7 @@ void __init init_idle(struct task_struct *idle, int cpu)
* PF_KTHREAD should already be set at this point; regardless, make it
* look like a proper per-CPU kthread.
*/
- idle->flags |= PF_IDLE | PF_KTHREAD | PF_NO_SETAFFINITY;
+ idle->flags |= PF_KTHREAD | PF_NO_SETAFFINITY;
kthread_set_per_cpu(idle, cpu);

#ifdef CONFIG_SMP
diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index 342f58a329f5..5007b25c5bc6 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -373,6 +373,7 @@ EXPORT_SYMBOL_GPL(play_idle_precise);

void cpu_startup_entry(enum cpuhp_state state)
{
+ current->flags |= PF_IDLE;
arch_cpu_idle_prepare();
cpuhp_online_idle(state);
while (1)

2023-09-13 19:33:48

by Liam R. Howlett

[permalink] [raw]
Subject: Re: [PATCH] init/main: Clear boot task idle flag

* Peter Zijlstra <[email protected]> [230913 12:13]:
> On Wed, Sep 13, 2023 at 10:51:25AM -0400, Liam R. Howlett wrote:
> > * Peter Zijlstra <[email protected]> [230913 09:53]:
> > > On Tue, Sep 12, 2023 at 08:56:47PM -0400, Liam R. Howlett wrote:
> > >
> > > > diff --git a/init/main.c b/init/main.c
> > > > index ad920fac325c..f74772acf612 100644
> > > > --- a/init/main.c
> > > > +++ b/init/main.c
> > > > @@ -696,7 +696,7 @@ noinline void __ref __noreturn rest_init(void)
> > > > */
> > > > rcu_read_lock();
> > > > tsk = find_task_by_pid_ns(pid, &init_pid_ns);
> > > > - tsk->flags |= PF_NO_SETAFFINITY;
> > > > + tsk->flags |= PF_NO_SETAFFINITY | PF_IDLE;
> > > > set_cpus_allowed_ptr(tsk, cpumask_of(smp_processor_id()));
> > > > rcu_read_unlock();
> > > >
> > >
> > > Hmm, isn't that pid-1 you're setting PF_IDLE on?
> >
> > Yes, thanks. I think that is what Geert is hitting with my patch.
> >
> > debug __might_resched() in kernel/sched/core.c is failing to return in
> > that first (complex) if statement. His report says pid 1 so this is
> > likely the issue.
> >
> > >
> > > The task becoming idle is 'current' at this point, see the
> > > cpu_startup_entry() call below.
> > >
> > > Would not something like so be the right thing?
> > >
> > >
> > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > > index 2299a5cfbfb9..802551e0009b 100644
> > > --- a/kernel/sched/core.c
> > > +++ b/kernel/sched/core.c
> > > @@ -9269,7 +9269,7 @@ void __init init_idle(struct task_struct *idle, int cpu)
> > > * PF_KTHREAD should already be set at this point; regardless, make it
> > > * look like a proper per-CPU kthread.
> > > */
> > > - idle->flags |= PF_IDLE | PF_KTHREAD | PF_NO_SETAFFINITY;
> > > + idle->flags |= PF_KTHREAD | PF_NO_SETAFFINITY;
> >
> > I am concerned this will alter more than just the current task, which
> > would mean more modifications later. There is a comment about it being
> > called 'more than once' and 'per cpu' so I am hesitant to change the
> > function itself.
> >
> > Although I am unsure of the call path.. fork_idle() -> init_idle() I
> > guess?
>
> There's only 2 ways to get into do_idle(), through cpu_startup_entry()
> and play_idle_precise(). The latter already frobs PF_IDLE since it is
> the forced idle path, this then leaves cpu_startup_entry() which is the
> regular idle path.
>
> All idle threads will end up calling into it, the boot CPU through the
> rest_init() and the SMP cpus through arch SMP bringup.
>
> IOW, this ensures all idle loops will have PF_IDLE set but not the
> pre-idle loop setup code these threads run.

Thanks for the information. This does leave the init_idle() function in
the odd state of not setting PF_IDLE, but I guess that's okay?



2023-09-14 01:07:07

by Liam R. Howlett

[permalink] [raw]
Subject: Re: [PATCH] init/main: Clear boot task idle flag

* Peter Zijlstra <[email protected]> [230913 09:53]:
> On Tue, Sep 12, 2023 at 08:56:47PM -0400, Liam R. Howlett wrote:
>
> > diff --git a/init/main.c b/init/main.c
> > index ad920fac325c..f74772acf612 100644
> > --- a/init/main.c
> > +++ b/init/main.c
> > @@ -696,7 +696,7 @@ noinline void __ref __noreturn rest_init(void)
> > */
> > rcu_read_lock();
> > tsk = find_task_by_pid_ns(pid, &init_pid_ns);
> > - tsk->flags |= PF_NO_SETAFFINITY;
> > + tsk->flags |= PF_NO_SETAFFINITY | PF_IDLE;
> > set_cpus_allowed_ptr(tsk, cpumask_of(smp_processor_id()));
> > rcu_read_unlock();
> >
>
> Hmm, isn't that pid-1 you're setting PF_IDLE on?

Yes, thanks. I think that is what Geert is hitting with my patch.

debug __might_resched() in kernel/sched/core.c is failing to return in
that first (complex) if statement. His report says pid 1 so this is
likely the issue.

>
> The task becoming idle is 'current' at this point, see the
> cpu_startup_entry() call below.
>
> Would not something like so be the right thing?
>
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 2299a5cfbfb9..802551e0009b 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -9269,7 +9269,7 @@ void __init init_idle(struct task_struct *idle, int cpu)
> * PF_KTHREAD should already be set at this point; regardless, make it
> * look like a proper per-CPU kthread.
> */
> - idle->flags |= PF_IDLE | PF_KTHREAD | PF_NO_SETAFFINITY;
> + idle->flags |= PF_KTHREAD | PF_NO_SETAFFINITY;

I am concerned this will alter more than just the current task, which
would mean more modifications later. There is a comment about it being
called 'more than once' and 'per cpu' so I am hesitant to change the
function itself.

Although I am unsure of the call path.. fork_idle() -> init_idle() I
guess?

> kthread_set_per_cpu(idle, cpu);
>
> #ifdef CONFIG_SMP
> diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
> index 342f58a329f5..5007b25c5bc6 100644
> --- a/kernel/sched/idle.c
> +++ b/kernel/sched/idle.c
> @@ -373,6 +373,7 @@ EXPORT_SYMBOL_GPL(play_idle_precise);
>
> void cpu_startup_entry(enum cpuhp_state state)
> {
> + current->flags |= PF_IDLE;
> arch_cpu_idle_prepare();
> cpuhp_online_idle(state);
> while (1)

2023-09-14 02:54:04

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] init/main: Clear boot task idle flag

On Wed, Sep 13, 2023 at 10:51:25AM -0400, Liam R. Howlett wrote:
> * Peter Zijlstra <[email protected]> [230913 09:53]:
> > On Tue, Sep 12, 2023 at 08:56:47PM -0400, Liam R. Howlett wrote:
> >
> > > diff --git a/init/main.c b/init/main.c
> > > index ad920fac325c..f74772acf612 100644
> > > --- a/init/main.c
> > > +++ b/init/main.c
> > > @@ -696,7 +696,7 @@ noinline void __ref __noreturn rest_init(void)
> > > */
> > > rcu_read_lock();
> > > tsk = find_task_by_pid_ns(pid, &init_pid_ns);
> > > - tsk->flags |= PF_NO_SETAFFINITY;
> > > + tsk->flags |= PF_NO_SETAFFINITY | PF_IDLE;
> > > set_cpus_allowed_ptr(tsk, cpumask_of(smp_processor_id()));
> > > rcu_read_unlock();
> > >
> >
> > Hmm, isn't that pid-1 you're setting PF_IDLE on?
>
> Yes, thanks. I think that is what Geert is hitting with my patch.
>
> debug __might_resched() in kernel/sched/core.c is failing to return in
> that first (complex) if statement. His report says pid 1 so this is
> likely the issue.
>
> >
> > The task becoming idle is 'current' at this point, see the
> > cpu_startup_entry() call below.
> >
> > Would not something like so be the right thing?
> >
> >
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index 2299a5cfbfb9..802551e0009b 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -9269,7 +9269,7 @@ void __init init_idle(struct task_struct *idle, int cpu)
> > * PF_KTHREAD should already be set at this point; regardless, make it
> > * look like a proper per-CPU kthread.
> > */
> > - idle->flags |= PF_IDLE | PF_KTHREAD | PF_NO_SETAFFINITY;
> > + idle->flags |= PF_KTHREAD | PF_NO_SETAFFINITY;
>
> I am concerned this will alter more than just the current task, which
> would mean more modifications later. There is a comment about it being
> called 'more than once' and 'per cpu' so I am hesitant to change the
> function itself.
>
> Although I am unsure of the call path.. fork_idle() -> init_idle() I
> guess?

There's only 2 ways to get into do_idle(), through cpu_startup_entry()
and play_idle_precise(). The latter already frobs PF_IDLE since it is
the forced idle path, this then leaves cpu_startup_entry() which is the
regular idle path.

All idle threads will end up calling into it, the boot CPU through the
rest_init() and the SMP cpus through arch SMP bringup.

IOW, this ensures all idle loops will have PF_IDLE set but not the
pre-idle loop setup code these threads run.

2023-09-14 11:45:16

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] init/main: Clear boot task idle flag

On Wed, Sep 13, 2023 at 01:32:38PM -0400, Liam R. Howlett wrote:
> * Peter Zijlstra <[email protected]> [230913 12:13]:
> > On Wed, Sep 13, 2023 at 10:51:25AM -0400, Liam R. Howlett wrote:
> > > * Peter Zijlstra <[email protected]> [230913 09:53]:
> > > > On Tue, Sep 12, 2023 at 08:56:47PM -0400, Liam R. Howlett wrote:
> > > >
> > > > > diff --git a/init/main.c b/init/main.c
> > > > > index ad920fac325c..f74772acf612 100644
> > > > > --- a/init/main.c
> > > > > +++ b/init/main.c
> > > > > @@ -696,7 +696,7 @@ noinline void __ref __noreturn rest_init(void)
> > > > > */
> > > > > rcu_read_lock();
> > > > > tsk = find_task_by_pid_ns(pid, &init_pid_ns);
> > > > > - tsk->flags |= PF_NO_SETAFFINITY;
> > > > > + tsk->flags |= PF_NO_SETAFFINITY | PF_IDLE;
> > > > > set_cpus_allowed_ptr(tsk, cpumask_of(smp_processor_id()));
> > > > > rcu_read_unlock();
> > > > >
> > > >
> > > > Hmm, isn't that pid-1 you're setting PF_IDLE on?
> > >
> > > Yes, thanks. I think that is what Geert is hitting with my patch.
> > >
> > > debug __might_resched() in kernel/sched/core.c is failing to return in
> > > that first (complex) if statement. His report says pid 1 so this is
> > > likely the issue.
> > >
> > > >
> > > > The task becoming idle is 'current' at this point, see the
> > > > cpu_startup_entry() call below.
> > > >
> > > > Would not something like so be the right thing?
> > > >
> > > >
> > > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > > > index 2299a5cfbfb9..802551e0009b 100644
> > > > --- a/kernel/sched/core.c
> > > > +++ b/kernel/sched/core.c
> > > > @@ -9269,7 +9269,7 @@ void __init init_idle(struct task_struct *idle, int cpu)
> > > > * PF_KTHREAD should already be set at this point; regardless, make it
> > > > * look like a proper per-CPU kthread.
> > > > */
> > > > - idle->flags |= PF_IDLE | PF_KTHREAD | PF_NO_SETAFFINITY;
> > > > + idle->flags |= PF_KTHREAD | PF_NO_SETAFFINITY;
> > >
> > > I am concerned this will alter more than just the current task, which
> > > would mean more modifications later. There is a comment about it being
> > > called 'more than once' and 'per cpu' so I am hesitant to change the
> > > function itself.
> > >
> > > Although I am unsure of the call path.. fork_idle() -> init_idle() I
> > > guess?
> >
> > There's only 2 ways to get into do_idle(), through cpu_startup_entry()
> > and play_idle_precise(). The latter already frobs PF_IDLE since it is
> > the forced idle path, this then leaves cpu_startup_entry() which is the
> > regular idle path.
> >
> > All idle threads will end up calling into it, the boot CPU through the
> > rest_init() and the SMP cpus through arch SMP bringup.
> >
> > IOW, this ensures all idle loops will have PF_IDLE set but not the
> > pre-idle loop setup code these threads run.
>
> Thanks for the information. This does leave the init_idle() function in
> the odd state of not setting PF_IDLE, but I guess that's okay?

Yep, the few things that care about PF_IDLE seem to really only care
about do_idle() and very much not (per the rcutiny thing) any code that
comes before it.

2023-09-15 00:58:46

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH] init/main: Clear boot task idle flag



Hello,

kernel test robot noticed "BUG:sleeping_function_called_from_invalid_context_at_kernel/sched/completion.c" on:

commit: afa19b70494c22ba27224f90c0805e88e5db41e1 ("[PATCH] init/main: Clear boot task idle flag")
url: https://github.com/intel-lab-lkp/linux/commits/Liam-R-Howlett/init-main-Clear-boot-task-idle-flag/20230913-085921
base: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git 3669558bdf354cd352be955ef2764cde6a9bf5ec
patch link: https://lore.kernel.org/all/[email protected]/
patch subject: [PATCH] init/main: Clear boot task idle flag

in testcase: boot

compiler: gcc-12
test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G

(please refer to attached dmesg/kmsg for entire log/backtrace)


besides BUG:sleeping_function_called_from_invalid_context_at_kernel/sched/completion.c,
we also observed other issues on this commit but clean on parent:


3669558bdf354cd3 afa19b70494c22ba27224f90c08
---------------- ---------------------------
fail:runs %reproduction fail:runs
| | |
:6 17% 1:6 dmesg.BUG:sleeping_function_called_from_invalid_context_at_drivers/dma-buf/selftest.c
:6 17% 1:6 dmesg.BUG:sleeping_function_called_from_invalid_context_at_drivers/net/ethernet/intel/e1000_hw.c
:6 100% 6:6 dmesg.BUG:sleeping_function_called_from_invalid_context_at_include/linux/sched/mm.h
:6 83% 5:6 dmesg.BUG:sleeping_function_called_from_invalid_context_at_kernel/locking/mutex.c
:6 50% 3:6 dmesg.BUG:sleeping_function_called_from_invalid_context_at_kernel/locking/rwsem.c
:6 17% 1:6 dmesg.BUG:sleeping_function_called_from_invalid_context_at_kernel/rcu/tree.c
:6 100% 6:6 dmesg.BUG:sleeping_function_called_from_invalid_context_at_kernel/sched/completion.c
:6 100% 6:6 dmesg.RIP:__do_softirq
:6 100% 6:6 dmesg.RIP:__flush_smp_call_function_queue
:6 33% 2:6 dmesg.RIP:__handle_irq_event_percpu
:6 100% 6:6 dmesg.RIP:__hrtimer_run_queues
:6 100% 6:6 dmesg.RIP:__note_gp_changes
:6 100% 6:6 dmesg.RIP:__queue_work
:6 100% 6:6 dmesg.RIP:__raise_softirq_irqoff
:6 100% 6:6 dmesg.RIP:__run_timers
:6 100% 6:6 dmesg.RIP:__smp_call_single_queue
:6 100% 6:6 dmesg.RIP:__sysvec_apic_timer_interrupt
:6 100% 6:6 dmesg.RIP:__sysvec_call_function_single
:6 50% 3:6 dmesg.RIP:__update_load_avg_blocked_se
:6 100% 6:6 dmesg.RIP:__update_load_avg_cfs_rq
:6 100% 6:6 dmesg.RIP:__update_load_avg_se
:6 100% 6:6 dmesg.RIP:__wake_nocb_gp
:6 67% 4:6 dmesg.RIP:_raw_spin_unlock_irqrestore
:6 100% 6:6 dmesg.RIP:arch_ftrace_ops_list_func
:6 100% 6:6 dmesg.RIP:attach_entity_load_avg
:6 100% 6:6 dmesg.RIP:call_timer_fn
:6 100% 6:6 dmesg.RIP:default_idle
:6 100% 6:6 dmesg.RIP:dequeue_task_fair
:6 100% 6:6 dmesg.RIP:detach_entity_load_avg
:6 33% 2:6 dmesg.RIP:detach_if_pending
:6 67% 4:6 dmesg.RIP:dma_fence_signal_timestamp_locked
:6 33% 2:6 dmesg.RIP:do_nocb_deferred_wakeup
:6 100% 6:6 dmesg.RIP:do_nocb_deferred_wakeup_timer
:6 100% 6:6 dmesg.RIP:enqueue_hrtimer
:6 100% 6:6 dmesg.RIP:enqueue_task_fair
:6 100% 6:6 dmesg.RIP:enqueue_timer
:6 17% 1:6 dmesg.RIP:hrtimer_init
:6 67% 4:6 dmesg.RIP:load_balance
:6 100% 6:6 dmesg.RIP:lock_acquire
:6 100% 6:6 dmesg.RIP:lock_acquired
:6 100% 6:6 dmesg.RIP:lock_contended
:6 100% 6:6 dmesg.RIP:lock_release
:6 33% 2:6 dmesg.RIP:printk_sprint
:6 17% 1:6 dmesg.RIP:rcu_barrier_trace
:6 100% 6:6 dmesg.RIP:rcu_core
:6 100% 6:6 dmesg.RIP:rcu_is_cpu_rrupt_from_idle
:6 83% 5:6 dmesg.RIP:rcu_qs
:6 100% 6:6 dmesg.RIP:rcu_report_qs_rnp
:6 100% 6:6 dmesg.RIP:rcu_sched_clock_irq
:6 50% 3:6 dmesg.RIP:resched_curr
:6 100% 6:6 dmesg.RIP:set_task_cpu
:6 100% 6:6 dmesg.RIP:trace_hardirqs_off
:6 100% 6:6 dmesg.RIP:trace_hardirqs_off_finish
:6 100% 6:6 dmesg.RIP:trace_hardirqs_on
:6 100% 6:6 dmesg.RIP:trace_reschedule_entry
:6 100% 6:6 dmesg.RIP:trace_reschedule_exit
:6 100% 6:6 dmesg.RIP:try_to_wake_up
:6 100% 6:6 dmesg.RIP:ttwu_do_activate
:6 100% 6:6 dmesg.RIP:update_cpu_capacity
:6 100% 6:6 dmesg.RIP:update_curr
:6 100% 6:6 dmesg.RIP:update_dl_rq_load_avg
:6 100% 6:6 dmesg.RIP:update_irq_load_avg
:6 100% 6:6 dmesg.RIP:update_rt_rq_load_avg
:6 100% 6:6 dmesg.RIP:update_sd_lb_stats
:6 17% 1:6 dmesg.RIP:wake_up_nohz_cpu
:6 100% 6:6 dmesg.WARNING:at_arch/x86/include/asm/trace/irq_vectors.h:#__sysvec_apic_timer_interrupt
:6 100% 6:6 dmesg.WARNING:at_arch/x86/include/asm/trace/irq_vectors.h:#__sysvec_call_function_single
:6 100% 6:6 dmesg.WARNING:at_arch/x86/include/asm/trace/irq_vectors.h:#trace_reschedule_entry
:6 100% 6:6 dmesg.WARNING:at_arch/x86/include/asm/trace/irq_vectors.h:#trace_reschedule_exit
:6 100% 6:6 dmesg.WARNING:at_include/linux/trace_recursion.h:#arch_ftrace_ops_list_func
:6 100% 6:6 dmesg.WARNING:at_include/trace/events/csd.h:#__flush_smp_call_function_queue
:6 67% 4:6 dmesg.WARNING:at_include/trace/events/dma_fence.h:#dma_fence_signal_timestamp_locked
:6 100% 6:6 dmesg.WARNING:at_include/trace/events/ipi.h:#__smp_call_single_queue
:6 50% 3:6 dmesg.WARNING:at_include/trace/events/ipi.h:#resched_curr
:6 17% 1:6 dmesg.WARNING:at_include/trace/events/ipi.h:#wake_up_nohz_cpu
:6 100% 6:6 dmesg.WARNING:at_include/trace/events/irq.h:#__do_softirq
:6 33% 2:6 dmesg.WARNING:at_include/trace/events/irq.h:#__handle_irq_event_percpu
:6 100% 6:6 dmesg.WARNING:at_include/trace/events/irq.h:#__raise_softirq_irqoff
:6 100% 6:6 dmesg.WARNING:at_include/trace/events/lock.h:#lock_acquire
:6 100% 6:6 dmesg.WARNING:at_include/trace/events/lock.h:#lock_acquired
:6 100% 6:6 dmesg.WARNING:at_include/trace/events/lock.h:#lock_contended
:6 100% 6:6 dmesg.WARNING:at_include/trace/events/lock.h:#lock_release
:6 100% 6:6 dmesg.WARNING:at_include/trace/events/preemptirq.h:#trace_hardirqs_off
:6 100% 6:6 dmesg.WARNING:at_include/trace/events/preemptirq.h:#trace_hardirqs_off_finish
:6 100% 6:6 dmesg.WARNING:at_include/trace/events/preemptirq.h:#trace_hardirqs_on
:6 33% 2:6 dmesg.WARNING:at_include/trace/events/printk.h:#printk_sprint
:6 100% 6:6 dmesg.WARNING:at_include/trace/events/rcu.h:#__note_gp_changes
:6 100% 6:6 dmesg.WARNING:at_include/trace/events/rcu.h:#__wake_nocb_gp
:6 33% 2:6 dmesg.WARNING:at_include/trace/events/rcu.h:#do_nocb_deferred_wakeup
:6 100% 6:6 dmesg.WARNING:at_include/trace/events/rcu.h:#do_nocb_deferred_wakeup_timer
:6 17% 1:6 dmesg.WARNING:at_include/trace/events/rcu.h:#rcu_barrier_trace
:6 100% 6:6 dmesg.WARNING:at_include/trace/events/rcu.h:#rcu_core
:6 83% 5:6 dmesg.WARNING:at_include/trace/events/rcu.h:#rcu_qs
:6 100% 6:6 dmesg.WARNING:at_include/trace/events/rcu.h:#rcu_report_qs_rnp
:6 100% 6:6 dmesg.WARNING:at_include/trace/events/rcu.h:#rcu_sched_clock_irq
:6 50% 3:6 dmesg.WARNING:at_include/trace/events/sched.h:#__update_load_avg_blocked_se
:6 100% 6:6 dmesg.WARNING:at_include/trace/events/sched.h:#__update_load_avg_cfs_rq
:6 100% 6:6 dmesg.WARNING:at_include/trace/events/sched.h:#__update_load_avg_se
:6 100% 6:6 dmesg.WARNING:at_include/trace/events/sched.h:#attach_entity_load_avg
:6 100% 6:6 dmesg.WARNING:at_include/trace/events/sched.h:#dequeue_task_fair
:6 100% 6:6 dmesg.WARNING:at_include/trace/events/sched.h:#detach_entity_load_avg
:6 100% 6:6 dmesg.WARNING:at_include/trace/events/sched.h:#enqueue_task_fair
:6 100% 6:6 dmesg.WARNING:at_include/trace/events/sched.h:#set_task_cpu
:6 100% 6:6 dmesg.WARNING:at_include/trace/events/sched.h:#try_to_wake_up
:6 100% 6:6 dmesg.WARNING:at_include/trace/events/sched.h:#ttwu_do_activate
:6 100% 6:6 dmesg.WARNING:at_include/trace/events/sched.h:#update_cpu_capacity
:6 100% 6:6 dmesg.WARNING:at_include/trace/events/sched.h:#update_curr
:6 100% 6:6 dmesg.WARNING:at_include/trace/events/sched.h:#update_dl_rq_load_avg
:6 100% 6:6 dmesg.WARNING:at_include/trace/events/sched.h:#update_irq_load_avg
:6 100% 6:6 dmesg.WARNING:at_include/trace/events/sched.h:#update_rt_rq_load_avg
:6 100% 6:6 dmesg.WARNING:at_include/trace/events/sched.h:#update_sd_lb_stats
:6 100% 6:6 dmesg.WARNING:at_include/trace/events/timer.h:#__hrtimer_run_queues
:6 100% 6:6 dmesg.WARNING:at_include/trace/events/timer.h:#__run_timers
:6 100% 6:6 dmesg.WARNING:at_include/trace/events/timer.h:#call_timer_fn
:6 33% 2:6 dmesg.WARNING:at_include/trace/events/timer.h:#detach_if_pending
:6 100% 6:6 dmesg.WARNING:at_include/trace/events/timer.h:#enqueue_hrtimer
:6 100% 6:6 dmesg.WARNING:at_include/trace/events/timer.h:#enqueue_timer
:6 17% 1:6 dmesg.WARNING:at_include/trace/events/timer.h:#hrtimer_init
:6 100% 6:6 dmesg.WARNING:at_include/trace/events/workqueue.h:#__queue_work
:6 100% 6:6 dmesg.WARNING:at_kernel/rcu/tree.c:#rcu_is_cpu_rrupt_from_idle
:6 17% 1:6 dmesg.WARNING:inconsistent_lock_state
:6 100% 6:6 dmesg.WARNING:suspicious_RCU_usage
:6 33% 2:6 dmesg.arch/x86/kernel/irq.c:#IRQ_failed_to_wake_up_RCU
:6 100% 6:6 dmesg.include/linux/cgroup.h:#suspicious_rcu_dereference_check()usage
:6 100% 6:6 dmesg.include/linux/rcupdate.h:#rcu_read_lock()used_illegally_while_idle
:6 100% 6:6 dmesg.include/linux/rcupdate.h:#rcu_read_unlock()used_illegally_while_idle
:6 17% 1:6 dmesg.inconsistent{IN-HARDIRQ-W}->{HARDIRQ-ON-W}usage
:6 100% 6:6 dmesg.kernel/events/core.c:#suspicious_rcu_dereference_check()usage
:6 83% 5:6 dmesg.kernel/exit.c:#suspicious_rcu_dereference_check()usage
:6 100% 6:6 dmesg.kernel/rcu/tree.c:#RCU_dynticks_nmi_nesting_counter_underflow/zero
:6 67% 4:6 dmesg.kernel/sched/core.c:#suspicious_rcu_dereference_check()usage
:6 100% 6:6 dmesg.kernel/sched/fair.c:#suspicious_rcu_dereference_check()usage
:6 83% 5:6 dmesg.kernel/workqueue.c:#RCU_or_wq_pool_mutex_should_be_held
:6 83% 5:6 dmesg.kernel/workqueue.c:#suspicious_rcu_dereference_check()usage



If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <[email protected]>
| Closes: https://lore.kernel.org/oe-lkp/[email protected]


[ 2.318233][ T1] BUG: sleeping function called from invalid context at kernel/sched/completion.c:111
[ 2.318647][ T1] in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 1, name: swapper/0
[ 2.319725][ T1] preempt_count: 0, expected: 0
[ 2.320293][ T1] no locks held by swapper/0/1.
[ 2.320882][ T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.6.0-rc1-00034-gafa19b70494c #1
[ 2.321957][ T1] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[ 2.321957][ T1] Call Trace:
[ 2.321957][ T1] <TASK>
[ 2.321957][ T1] dump_stack_lvl (lib/dump_stack.c:107)
[ 2.321957][ T1] __might_resched (kernel/sched/core.c:10188)
[ 2.321957][ T1] ? sched_core_find (kernel/sched/core.c:10142)
[ 2.321957][ T1] __wait_for_common (kernel/sched/completion.c:115)
[ 2.321957][ T1] ? usleep_range_state (kernel/time/timer.c:2129)
[ 2.321957][ T1] ? reacquire_held_locks (kernel/locking/lockdep.c:5404)
[ 2.321957][ T1] ? wait_for_completion_io_timeout (kernel/sched/completion.c:110)
[ 2.321957][ T1] ? do_raw_spin_lock (kernel/locking/spinlock_debug.c:121)
[ 2.321957][ T1] ? rest_init (init/main.c:1431)
[ 2.321957][ T1] kernel_init (init/main.c:1439)
[ 2.321957][ T1] ? _raw_spin_unlock_irq (arch/x86/include/asm/irqflags.h:42 arch/x86/include/asm/irqflags.h:77 include/linux/spinlock_api_smp.h:159 kernel/locking/spinlock.c:202)
[ 2.321957][ T1] ret_from_fork (arch/x86/kernel/process.c:153)
[ 2.321957][ T1] ? rest_init (init/main.c:1431)
[ 2.321957][ T1] ret_from_fork_asm (arch/x86/entry/entry_64.S:312)
[ 2.321957][ T1] </TASK>
[ 2.322276][ T1] smpboot: CPU0: Intel Xeon E312xx (Sandy Bridge) (family: 0x6, model: 0x2a, stepping: 0x1)
[ 2.327212][ T1] RCU Tasks: Setting shift to 1 and lim to 1 rcu_task_cb_adjust=1.
[ 2.328461][ T1] RCU Tasks Rude: Setting shift to 1 and lim to 1 rcu_task_cb_adjust=1.
[ 2.330045][ T1] RCU Tasks Trace: Setting shift to 1 and lim to 1 rcu_task_cb_adjust=1.
[ 2.331385][ T1] Running RCU-tasks wait API self tests
[ 2.333117][ T1] Performance Events: unsupported p6 CPU model 42 no PMU driver, software events only.
[ 2.334654][ T1] signal: max sigframe size: 1776
[ 2.335915][ T1] rcu: Hierarchical SRCU implementation.
[ 2.336653][ T1] rcu: Max phase no-delay instances is 1000.
[ 2.342550][ T14] Callback from call_rcu_tasks_trace() invoked.
[ 2.344299][ T1] smp: Bringing up secondary CPUs ...
[ 2.347251][ T1] smpboot: x86: Booting SMP configuration:
[ 2.347999][ T1] .... node #0, CPUs: #1
[ 0.487313][ T0] masked ExtINT on CPU#1
[ 2.348624][ T0] ------------[ cut here ]------------
[ 2.348624][ T0] WARNING: CPU: 0 PID: 0 at include/trace/events/preemptirq.h:36 trace_hardirqs_off_finish (include/trace/events/preemptirq.h:36 include/trace/events/preemptirq.h:36 kernel/trace/trace_preemptirq.c:78 kernel/trace/trace_preemptirq.c:73)
[ 2.348624][ T0] Modules linked in:
[ 2.348624][ T0] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W 6.6.0-rc1-00034-gafa19b70494c #1
[ 2.348624][ T0] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[ 2.348624][ T0] RIP: 0010:trace_hardirqs_off_finish (include/trace/events/preemptirq.h:36 include/trace/events/preemptirq.h:36 kernel/trace/trace_preemptirq.c:78 kernel/trace/trace_preemptirq.c:73)
[ 2.348624][ T0] Code: 03 80 3c 02 00 75 2a 48 8b 7b 08 31 d2 48 89 ee e8 99 f4 ff ff 65 ff 0d 0a 69 ce 7e e9 5d ff ff ff e8 38 18 e9 ff 84 c0 75 80 <0f> 0b e9 79 ff ff ff e8 c8 c0 3a 00 eb cf 66 0f 1f 44 00 00 65 8b
All code
========
0: 03 80 3c 02 00 75 add 0x7500023c(%rax),%eax
6: 2a 48 8b sub -0x75(%rax),%cl
9: 7b 08 jnp 0x13
b: 31 d2 xor %edx,%edx
d: 48 89 ee mov %rbp,%rsi
10: e8 99 f4 ff ff call 0xfffffffffffff4ae
15: 65 ff 0d 0a 69 ce 7e decl %gs:0x7ece690a(%rip) # 0x7ece6926
1c: e9 5d ff ff ff jmp 0xffffffffffffff7e
21: e8 38 18 e9 ff call 0xffffffffffe9185e
26: 84 c0 test %al,%al
28: 75 80 jne 0xffffffffffffffaa
2a:* 0f 0b ud2 <-- trapping instruction
2c: e9 79 ff ff ff jmp 0xffffffffffffffaa
31: e8 c8 c0 3a 00 call 0x3ac0fe
36: eb cf jmp 0x7
38: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)
3e: 65 gs
3f: 8b .byte 0x8b

Code starting with the faulting instruction
===========================================
0: 0f 0b ud2
2: e9 79 ff ff ff jmp 0xffffffffffffff80
7: e8 c8 c0 3a 00 call 0x3ac0d4
c: eb cf jmp 0xffffffffffffffdd
e: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)
14: 65 gs
15: 8b .byte 0x8b
[ 2.348624][ T0] RSP: 0000:ffffffff84a07d60 EFLAGS: 00010046
[ 2.348624][ T0] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff81513ac2
[ 2.348624][ T0] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffffffff85a79540
[ 2.348624][ T0] RBP: ffffffff8352c418 R08: 0000000000000000 R09: fffffbfff0b4f2a8
[ 2.348624][ T0] R10: ffffffff85a79547 R11: 0000000000000000 R12: 0000000000000000
[ 2.348624][ T0] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 2.348624][ T0] FS: 0000000000000000(0000) GS:ffff8883aea00000(0000) knlGS:0000000000000000
[ 2.348624][ T0] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2.348624][ T0] CR2: ffff88843ffff000 CR3: 0000000004a89000 CR4: 00000000000406f0
[ 2.348624][ T0] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2.348624][ T0] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 2.348624][ T0] Call Trace:
[ 2.348624][ T0] <TASK>
[ 2.348624][ T0] ? __warn (kernel/panic.c:673)
[ 2.348624][ T0] ? trace_hardirqs_off_finish (include/trace/events/preemptirq.h:36 include/trace/events/preemptirq.h:36 kernel/trace/trace_preemptirq.c:78 kernel/trace/trace_preemptirq.c:73)
[ 2.348624][ T0] ? report_bug (lib/bug.c:180 lib/bug.c:219)
[ 2.348624][ T0] ? handle_bug (arch/x86/kernel/traps.c:237)
[ 2.348624][ T0] ? exc_invalid_op (arch/x86/kernel/traps.c:258 (discriminator 1))
[ 2.348624][ T0] ? asm_exc_invalid_op (arch/x86/include/asm/idtentry.h:568)
[ 2.348624][ T0] ? irqentry_enter (kernel/entry/common.c:315)
[ 2.348624][ T0] ? trace_hardirqs_off_finish (arch/x86/include/asm/bitops.h:228 arch/x86/include/asm/bitops.h:240 include/asm-generic/bitops/instrumented-non-atomic.h:142 include/linux/cpumask.h:504 include/linux/cpumask.h:1082 include/trace/events/preemptirq.h:36 kernel/trace/trace_preemptirq.c:78 kernel/trace/trace_preemptirq.c:73)
[ 2.348624][ T0] ? trace_hardirqs_off_finish (include/trace/events/preemptirq.h:36 include/trace/events/preemptirq.h:36 kernel/trace/trace_preemptirq.c:78 kernel/trace/trace_preemptirq.c:73)
[ 2.348624][ T0] irqentry_enter (kernel/entry/common.c:315)
[ 2.348624][ T0] sysvec_call_function_single (arch/x86/kernel/smp.c:287)
[ 2.348624][ T0] asm_sysvec_call_function_single (arch/x86/include/asm/idtentry.h:652)
[ 2.348624][ T0] RIP: 0010:default_idle (arch/x86/include/asm/irqflags.h:37 arch/x86/include/asm/irqflags.h:72 arch/x86/kernel/process.c:750)
[ 2.348624][ T0] Code: 07 76 e7 48 89 07 49 c7 c0 08 00 00 00 4d 29 c8 4c 01 c7 4c 29 c2 e9 72 ff ff ff cc cc cc cc eb 07 0f 00 2d 37 79 30 00 fb f4 <fa> c3 0f 1f 00 65 48 8b 04 25 40 a4 1f 00 f0 80 48 02 20 48 8b 10
All code
========
0: 07 (bad)
1: 76 e7 jbe 0xffffffffffffffea
3: 48 89 07 mov %rax,(%rdi)
6: 49 c7 c0 08 00 00 00 mov $0x8,%r8
d: 4d 29 c8 sub %r9,%r8
10: 4c 01 c7 add %r8,%rdi
13: 4c 29 c2 sub %r8,%rdx
16: e9 72 ff ff ff jmp 0xffffffffffffff8d
1b: cc int3
1c: cc int3
1d: cc int3
1e: cc int3
1f: eb 07 jmp 0x28
21: 0f 00 2d 37 79 30 00 verw 0x307937(%rip) # 0x30795f
28: fb sti
29: f4 hlt
2a:* fa cli <-- trapping instruction
2b: c3 ret
2c: 0f 1f 00 nopl (%rax)
2f: 65 48 8b 04 25 40 a4 mov %gs:0x1fa440,%rax
36: 1f 00
38: f0 80 48 02 20 lock orb $0x20,0x2(%rax)
3d: 48 8b 10 mov (%rax),%rdx

Code starting with the faulting instruction
===========================================
0: fa cli
1: c3 ret
2: 0f 1f 00 nopl (%rax)
5: 65 48 8b 04 25 40 a4 mov %gs:0x1fa440,%rax
c: 1f 00
e: f0 80 48 02 20 lock orb $0x20,0x2(%rax)
13: 48 8b 10 mov (%rax),%rdx
[ 2.348624][ T0] RSP: 0000:ffffffff84a07e40 EFLAGS: 00000206
[ 2.348624][ T0] RAX: 0000000000003056 RBX: 0000000000000000 RCX: ffffffff8352c7ba
[ 2.348624][ T0] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff812e3bd3
[ 2.348624][ T0] RBP: 0000000000000000 R08: 0000000000000001 R09: ffffed1075d7ed9d
[ 2.348624][ T0] R10: ffff8883aebf6ceb R11: 0000000000000000 R12: 1ffffffff0940fca
[ 2.348624][ T0] R13: ffffffff84a1e940 R14: dffffc0000000000 R15: 0000000000013f10
[ 2.348624][ T0] ? ct_kernel_exit+0xca/0x100
[ 2.348624][ T0] ? cpuidle_idle_call (kernel/sched/idle.c:171)
[ 2.348624][ T0] default_idle_call (include/linux/cpuidle.h:143 kernel/sched/idle.c:98)
[ 2.348624][ T0] cpuidle_idle_call (kernel/sched/idle.c:171)
[ 2.348624][ T0] ? arch_cpu_idle_exit+0x30/0x30
[ 2.348624][ T0] ? lockdep_hardirqs_on_prepare (kernel/locking/lockdep.c:4565)
[ 2.348624][ T0] do_idle (kernel/sched/idle.c:282)
[ 2.348624][ T0] cpu_startup_entry (kernel/sched/idle.c:378 (discriminator 1))
[ 2.348624][ T0] rest_init (include/linux/rcupdate.h:750 init/main.c:697)
[ 2.348624][ T0] arch_call_rest_init+0xf/0x20
[ 2.348624][ T0] start_kernel (init/main.c:1021 (discriminator 1))
[ 2.348624][ T0] x86_64_start_reservations (arch/x86/kernel/head64.c:544)
[ 2.348624][ T0] x86_64_start_kernel (??:?)
[ 2.348624][ T0] secondary_startup_64_no_verify (arch/x86/kernel/head_64.S:433)
[ 2.348624][ T0] </TASK>
[ 2.348624][ T0] irq event stamp: 12375
[ 2.348624][ T0] hardirqs last enabled at (12374): cpuidle_idle_call (kernel/sched/idle.c:171)
[ 2.348624][ T0] hardirqs last disabled at (12375): sysvec_call_function_single (arch/x86/kernel/smp.c:287)
[ 2.348624][ T0] softirqs last enabled at (8563): __do_softirq (arch/x86/include/asm/preempt.h:27 kernel/softirq.c:400 kernel/softirq.c:582)
[ 2.348624][ T0] softirqs last disabled at (8558): __irq_exit_rcu (kernel/softirq.c:427 kernel/softirq.c:632)
[ 2.348624][ T0] ---[ end trace 0000000000000000 ]---


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20230914/[email protected]



--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki