2017-07-13 07:07:56

by Tony Lindgren

Subject: Regression with suspicious RCU usage splats with cpu_pm change

Hi,

Looks like next-20170713 gives me a bunch of "suspicious RCU usage"
splats with cpuidle_coupled on duovero, see below. I bisected it down
to commit 2f027e003d05 ("cpu_pm: replace raw_notifier with
atomic_notifier").

Regards,

Tony

8< -----------------------
=============================
WARNING: suspicious RCU usage
4.12.0-next-20170713+ #118 Tainted: G W
-----------------------------
./include/linux/rcupdate.h:611 rcu_read_lock() used illegally while idle!
[ 2.928802]
other info that might help us debug this:
[ 2.928802]
[ 2.946777]
RCU used illegally from idle CPU!
rcu_scheduler_active = 2, debug_locks = 1
RCU used illegally from extended quiescent state!
1 lock held by swapper/1/0:
#0: (rcu_read_lock){......}, at: [<c01624fc>] __atomic_notifier_call_chain+0x0/0x150
[ 2.975433]
stack backtrace:
CPU: 1 PID: 0 Comm: swapper/1 Tainted: G W 4.12.0-next-20170713+ #118
Hardware name: Generic OMAP4 (Flattened Device Tree)
[<c0111524>] (unwind_backtrace) from [<c010d20c>] (show_stack+0x10/0x14)
[<c010d20c>] (show_stack) from [<c086259c>] (dump_stack+0xac/0xe0)
[<c086259c>] (dump_stack) from [<c0162624>] (__atomic_notifier_call_chain+0x128/0x150)
[<c0162624>] (__atomic_notifier_call_chain) from [<c023d9e8>] (cpu_pm_enter+0x24/0x64)
[<c023d9e8>] (cpu_pm_enter) from [<c0128e68>] (omap_enter_idle_coupled+0x188/0x208)
[<c0128e68>] (omap_enter_idle_coupled) from [<c069e640>] (cpuidle_enter_state+0x118/0x514)
[<c069e640>] (cpuidle_enter_state) from [<c06a07c0>] (cpuidle_enter_state_coupled+0x3a8/0x40c)
[<c06a07c0>] (cpuidle_enter_state_coupled) from [<c018d844>] (do_idle+0x1a8/0x21c)
[<c018d844>] (do_idle) from [<c018dc3c>] (cpu_startup_entry+0x18/0x1c)
[<c018dc3c>] (cpu_startup_entry) from [<8010164c>] (0x8010164c)
[ 3.088043]
=============================
WARNING: suspicious RCU usage
4.12.0-next-20170713+ #118 Tainted: G W
-----------------------------
./include/linux/rcupdate.h:611 rcu_read_lock() used illegally while idle!
[ 3.115661]
other info that might help us debug this:
[ 3.115661]
[ 3.123718]
RCU used illegally from idle CPU!
rcu_scheduler_active = 2, debug_locks = 1
RCU used illegally from extended quiescent state!
2 locks held by swapper/1/0:
#0: (rcu_read_lock){......}, at: [<c01624fc>] __atomic_notifier_call_chain+0x0/0x150
#1: (rcu_read_lock){......}, at: [<c0216434>] __is_insn_slot_addr+0x0/0x16c
[ 3.161773]
stack backtrace:
CPU: 1 PID: 0 Comm: swapper/1 Tainted: G W 4.12.0-next-20170713+ #118
Hardware name: Generic OMAP4 (Flattened Device Tree)
[<c0111524>] (unwind_backtrace) from [<c010d20c>] (show_stack+0x10/0x14)
[<c010d20c>] (show_stack) from [<c086259c>] (dump_stack+0xac/0xe0)
[<c086259c>] (dump_stack) from [<c0216578>] (__is_insn_slot_addr+0x144/0x16c)
[<c0216578>] (__is_insn_slot_addr) from [<c015eecc>] (kernel_text_address+0x70/0x9c)
[<c015eecc>] (kernel_text_address) from [<c0110f0c>] (unwind_frame+0x3c/0x654)
[<c0110f0c>] (unwind_frame) from [<c01115c0>] (unwind_backtrace+0x9c/0x120)
[<c01115c0>] (unwind_backtrace) from [<c010d20c>] (show_stack+0x10/0x14)
[<c010d20c>] (show_stack) from [<c086259c>] (dump_stack+0xac/0xe0)
[<c086259c>] (dump_stack) from [<c0162624>] (__atomic_notifier_call_chain+0x128/0x150)
[<c0162624>] (__atomic_notifier_call_chain) from [<c023d9e8>] (cpu_pm_enter+0x24/0x64)
[<c023d9e8>] (cpu_pm_enter) from [<c0128e68>] (omap_enter_idle_coupled+0x188/0x208)
[<c0128e68>] (omap_enter_idle_coupled) from [<c069e640>] (cpuidle_enter_state+0x118/0x514)
[<c069e640>] (cpuidle_enter_state) from [<c06a07c0>] (cpuidle_enter_state_coupled+0x3a8/0x40c)
[<c06a07c0>] (cpuidle_enter_state_coupled) from [<c018d844>] (do_idle+0x1a8/0x21c)
[<c018d844>] (do_idle) from [<c018dc3c>] (cpu_startup_entry+0x18/0x1c)
[<c018dc3c>] (cpu_startup_entry) from [<8010164c>] (0x8010164c)
[ 3.313781]
=============================
WARNING: suspicious RCU usage
4.12.0-next-20170713+ #118 Tainted: G W
-----------------------------
./include/linux/rcupdate.h:662 rcu_read_unlock() used illegally while idle!
[ 3.341583]
other info that might help us debug this:
[ 3.341583]
[ 3.349639]
RCU used illegally from idle CPU!
rcu_scheduler_active = 2, debug_locks = 1
RCU used illegally from extended quiescent state!
3 locks held by swapper/1/0:
#0: (rcu_read_lock){......}, at: [<c01624fc>] __atomic_notifier_call_chain+0x0/0x150
#1: (rcu_read_lock){......}, at: [<c0216434>] __is_insn_slot_addr+0x0/0x16c
#2: (rcu_read_lock){......}, at: [<c0216434>] __is_insn_slot_addr+0x0/0x16c
[ 3.395996]
stack backtrace:
CPU: 1 PID: 0 Comm: swapper/1 Tainted: G W 4.12.0-next-20170713+ #118
Hardware name: Generic OMAP4 (Flattened Device Tree)
[<c0111524>] (unwind_backtrace) from [<c010d20c>] (show_stack+0x10/0x14)
[<c010d20c>] (show_stack) from [<c086259c>] (dump_stack+0xac/0xe0)
[<c086259c>] (dump_stack) from [<c0216550>] (__is_insn_slot_addr+0x11c/0x16c)
[<c0216550>] (__is_insn_slot_addr) from [<c015eecc>] (kernel_text_address+0x70/0x9c)
[<c015eecc>] (kernel_text_address) from [<c0110f0c>] (unwind_frame+0x3c/0x654)
[<c0110f0c>] (unwind_frame) from [<c01115c0>] (unwind_backtrace+0x9c/0x120)
[<c01115c0>] (unwind_backtrace) from [<c010d20c>] (show_stack+0x10/0x14)
[<c010d20c>] (show_stack) from [<c086259c>] (dump_stack+0xac/0xe0)
[<c086259c>] (dump_stack) from [<c0216578>] (__is_insn_slot_addr+0x144/0x16c)
[<c0216578>] (__is_insn_slot_addr) from [<c015eecc>] (kernel_text_address+0x70/0x9c)
[<c015eecc>] (kernel_text_address) from [<c0110f0c>] (unwind_frame+0x3c/0x654)
[<c0110f0c>] (unwind_frame) from [<c01115c0>] (unwind_backtrace+0x9c/0x120)
[<c01115c0>] (unwind_backtrace) from [<c010d20c>] (show_stack+0x10/0x14)
[<c010d20c>] (show_stack) from [<c086259c>] (dump_stack+0xac/0xe0)
[<c086259c>] (dump_stack) from [<c0162624>] (__atomic_notifier_call_chain+0x128/0x150)
[<c0162624>] (__atomic_notifier_call_chain) from [<c023d9e8>] (cpu_pm_enter+0x24/0x64)
[<c023d9e8>] (cpu_pm_enter) from [<c0128e68>] (omap_enter_idle_coupled+0x188/0x208)
[<c0128e68>] (omap_enter_idle_coupled) from [<c069e640>] (cpuidle_enter_state+0x118/0x514)
[<c069e640>] (cpuidle_enter_state) from [<c06a07c0>] (cpuidle_enter_state_coupled+0x3a8/0x40c)
[<c06a07c0>] (cpuidle_enter_state_coupled) from [<c018d844>] (do_idle+0x1a8/0x21c)
[<c018d844>] (do_idle) from [<c018dc3c>] (cpu_startup_entry+0x18/0x1c)
[<c018dc3c>] (cpu_startup_entry) from [<8010164c>] (0x8010164c)
[ 3.596588]
=============================
WARNING: suspicious RCU usage
4.12.0-next-20170713+ #118 Tainted: G W
-----------------------------
./include/linux/rcupdate.h:662 rcu_read_unlock() used illegally while idle!
[ 3.624328]
other info that might help us debug this:
[ 3.624328]
[ 3.632385]
RCU used illegally from idle CPU!
rcu_scheduler_active = 2, debug_locks = 1
RCU used illegally from extended quiescent state!
1 lock held by swapper/1/0:
#0: (rcu_read_lock){......}, at: [<c01624fc>] __atomic_notifier_call_chain+0x0/0x150
[ 3.662048]
stack backtrace:
CPU: 1 PID: 0 Comm: swapper/1 Tainted: G W 4.12.0-next-20170713+ #118
Hardware name: Generic OMAP4 (Flattened Device Tree)
[<c0111524>] (unwind_backtrace) from [<c010d20c>] (show_stack+0x10/0x14)
[<c010d20c>] (show_stack) from [<c086259c>] (dump_stack+0xac/0xe0)
[<c086259c>] (dump_stack) from [<c01625fc>] (__atomic_notifier_call_chain+0x100/0x150)
[<c01625fc>] (__atomic_notifier_call_chain) from [<c023d9e8>] (cpu_pm_enter+0x24/0x64)
[<c023d9e8>] (cpu_pm_enter) from [<c0128e68>] (omap_enter_idle_coupled+0x188/0x208)
[<c0128e68>] (omap_enter_idle_coupled) from [<c069e640>] (cpuidle_enter_state+0x118/0x514)
[<c069e640>] (cpuidle_enter_state) from [<c06a07c0>] (cpuidle_enter_state_coupled+0x3a8/0x40c)
[<c06a07c0>] (cpuidle_enter_state_coupled) from [<c018d844>] (do_idle+0x1a8/0x21c)
[<c018d844>] (do_idle) from [<c018dc3c>] (cpu_startup_entry+0x18/0x1c)
[<c018dc3c>] (cpu_startup_entry) from [<8010164c>] (0x8010164c)


2017-07-13 09:43:13

by Alex Shi

Subject: Re: Regression with suspicious RCU usage splats with cpu_pm change


On 07/13/2017 03:07 PM, Tony Lindgren wrote:
> Hi,
>
> Looks like next-20170713 gives me a bunch of "suspicious RCU usage"
> splats with cpuidle_coupled on duovero, see below. I bisected it down
> to commit 2f027e003d05 ("cpu_pm: replace raw_notifier with
> atomic_notifier").
>
> Regards,
>
> Tony
>
> 8< -----------------------
> =============================
> WARNING: suspicious RCU usage
> 4.12.0-next-20170713+ #118 Tainted: G W
> -----------------------------
> ./include/linux/rcupdate.h:611 rcu_read_lock() used illegally while idle!
> [ 2.928802]
> other info that might help us debug this:
> [ 2.928802]
> [ 2.946777]
> RCU used illegally from idle CPU!
> rcu_scheduler_active = 2, debug_locks = 1
> RCU used illegally from extended quiescent state!

CC Catalin & Will,

Shame on me. :(

As lockdep_rcu_suspicious() explains, rcu_read_lock() cannot be used after
rcu_idle_enter():

* If a CPU is in the RCU-free window in idle (ie: in the section
* between rcu_idle_enter() and rcu_idle_exit(), then RCU
* considers that CPU to be in an "extended quiescent state",
* which means that RCU will be completely ignoring that CPU.
* Therefore, rcu_read_lock() and friends have absolutely no
* effect on a CPU running in that state. In other words, even if
* such an RCU-idle CPU has called rcu_read_lock(), RCU might well
* delete data structures out from under it. RCU really has no
* choice here: we need to keep an RCU-free window in idle where
* the CPU may possibly enter into low power mode. This way we can
* notice an extended quiescent state to other CPUs that started a grace
* period. Otherwise we would delay any grace period as long as we run
* in the idle task.

Although the CPU is still alive and has not actually gone idle yet, RCU is
already counting it as being in a quiescent state. I guess it's not so
good to put off rcu_idle_enter() for all architectures. We need another
solution for this problem: maybe a non-sleeping raw rwlock, or perhaps the
best way is to split the notification from the real idle trigger in
arm_enter_idle_state()?



Regards
Alex

2017-07-13 11:50:29

by Rafael J. Wysocki

Subject: Re: Regression with suspicious RCU usage splats with cpu_pm change

On Thu, Jul 13, 2017 at 11:43 AM, Alex Shi <[email protected]> wrote:
>
> On 07/13/2017 03:07 PM, Tony Lindgren wrote:
>> Hi,
>>
>> Looks like next-20170713 gives me a bunch of "suspicious RCU usage"
>> splats with cpuidle_coupled on duovero, see below. I bisected it down
>> to commit 2f027e003d05 ("cpu_pm: replace raw_notifier with
>> atomic_notifier").

OK, so I'm dropping this commit.

Thanks,
Rafael

2017-07-13 12:43:26

by Paul E. McKenney

Subject: Re: Regression with suspicious RCU usage splats with cpu_pm change

On Thu, Jul 13, 2017 at 01:50:26PM +0200, Rafael J. Wysocki wrote:
> On Thu, Jul 13, 2017 at 11:43 AM, Alex Shi <[email protected]> wrote:
> >
> > On 07/13/2017 03:07 PM, Tony Lindgren wrote:
> >> Hi,
> >>
> >> Looks like next-20170713 gives me a bunch of "suspicious RCU usage"
> >> splats with cpuidle_coupled on duovero, see below. I bisected it down
> >> to commit 2f027e003d05 ("cpu_pm: replace raw_notifier with
> >> atomic_notifier").
>
> OK, so I'm dropping this commit.

You can surround idle-loop RCU-reading code with RCU_NONIDLE().
This will tell RCU to pay attention even though the CPU is otherwise
idle.
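
For example, a minimal sketch of the pattern (a hypothetical helper, not
the final patch; it assumes cpu_pm_notifier_chain has already been turned
into an ATOMIC_NOTIFIER_HEAD as in the patch under discussion):

	/*
	 * Sketch only: notify the cpu_pm chain from the idle path while
	 * telling RCU to watch this CPU.  RCU_NONIDLE() brackets the
	 * statement with rcu_irq_enter_irqson()/rcu_irq_exit_irqson().
	 */
	static int cpu_pm_notify_nonidle(enum cpu_pm_event event)
	{
		int ret;

		RCU_NONIDLE(
			ret = atomic_notifier_call_chain(&cpu_pm_notifier_chain,
							 event, NULL)
		);

		return notifier_to_errno(ret);
	}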

Thanx, Paul

2017-07-16 23:24:23

by Alex Shi

Subject: Re: Regression with suspicious RCU usage splats with cpu_pm change



On 07/13/2017 08:43 PM, Paul E. McKenney wrote:
> On Thu, Jul 13, 2017 at 01:50:26PM +0200, Rafael J. Wysocki wrote:
>> On Thu, Jul 13, 2017 at 11:43 AM, Alex Shi <[email protected]> wrote:
>>>
>>> On 07/13/2017 03:07 PM, Tony Lindgren wrote:
>>>> Hi,
>>>>
>>>> Looks like next-20170713 gives me a bunch of "suspicious RCU usage"
>>>> splats with cpuidle_coupled on duovero, see below. I bisected it down
>>>> to commit 2f027e003d05 ("cpu_pm: replace raw_notifier with
>>>> atomic_notifier").
>>
>> OK, so I'm dropping this commit.
>
> You can surround idle-loop RCU-reading code with RCU_NONIDLE().
> This will tell RCU to pay attention even though the CPU is otherwise
> idle.
>
> Thanx, Paul
>


Thanks a lot, Paul! :)
I reused the rcu_irq_enter_irqson()/rcu_irq_exit_irqson() pair from
RCU_NONIDLE() to avoid this issue. It works fine.

Tony, could you give a Tested-by if this patch works for you?

Sebastian,
may I keep your Acked-by on the fixed patch, since the main change remains the same? :)


Thanks everyone!

======

From c8ec81808d46a78e58267f6a23f2b58b48ed5725 Mon Sep 17 00:00:00 2001
From: Alex Shi <[email protected]>
Date: Tue, 4 Jul 2017 21:49:23 +0800
Subject: [PATCH] cpu_pm: replace raw_notifier with atomic_notifier

This patch replaces the rwlock and raw notifier with an atomic notifier
protected by a spinlock and RCU.

The first reason for this replacement is a 'scheduling while atomic' bug
on RT kernels on arm/arm64 platforms. On arm/arm64, the rwlock
cpu_pm_notifier_lock in cpu_pm can lead to a schedule after interrupts
have been disabled in the idle call chain:

cpu_startup_entry
  cpu_idle_loop
    local_irq_disable()
    cpuidle_idle_call
      call_cpuidle
        cpuidle_enter
          cpuidle_enter_state
            ->enter :arm_enter_idle_state
              cpu_pm_enter/exit
                CPU_PM_CPU_IDLE_ENTER
                  read_lock(&cpu_pm_notifier_lock); <-- sleep in idle
                    __rt_spin_lock();
                      schedule();

The kernel panic is here:
[ 4.609601] BUG: scheduling while atomic: swapper/1/0/0x00000002
[ 4.609608] [<ffff0000086fae70>] arm_enter_idle_state+0x18/0x70
[ 4.609614] Modules linked in:
[ 4.609615] [<ffff0000086f9298>] cpuidle_enter_state+0xf0/0x218
[ 4.609620] [<ffff0000086f93f8>] cpuidle_enter+0x18/0x20
[ 4.609626] Preemption disabled at:
[ 4.609627] [<ffff0000080fa234>] call_cpuidle+0x24/0x40
[ 4.609635] [<ffff000008882fa4>] schedule_preempt_disabled+0x1c/0x28
[ 4.609639] [<ffff0000080fa49c>] cpu_startup_entry+0x154/0x1f8
[ 4.609645] [<ffff00000808e004>] secondary_start_kernel+0x15c/0x1a0

Daniel Lezcano said this notification is needed on arm/arm64 platforms.
Sebastian suggested using an atomic_notifier instead of the rwlock, which
not only removes the sleeping in idle but also improves latency.

Tony Lindgren found a misuse: rcu_read_lock() was used after rcu_idle_enter().
Paul E. McKenney suggested trying RCU_NONIDLE().

Thanks everyone! :)

This patch passed Fengguang's 0day testing.

Signed-off-by: Alex Shi <[email protected]>
To: [email protected]
Cc: Sebastian Andrzej Siewior <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Anders Roxell <[email protected]>
Cc: Daniel Lezcano <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Cc: Tony Lindgren <[email protected]>
Cc: [email protected]
Cc: linux-rt-users <[email protected]>
Acked-by: Sebastian Andrzej Siewior <[email protected]>
---
kernel/cpu_pm.c | 50 +++++++++++++-------------------------------------
1 file changed, 13 insertions(+), 37 deletions(-)

diff --git a/kernel/cpu_pm.c b/kernel/cpu_pm.c
index 009cc9a..67b02e1 100644
--- a/kernel/cpu_pm.c
+++ b/kernel/cpu_pm.c
@@ -22,15 +22,21 @@
#include <linux/spinlock.h>
#include <linux/syscore_ops.h>

-static DEFINE_RWLOCK(cpu_pm_notifier_lock);
-static RAW_NOTIFIER_HEAD(cpu_pm_notifier_chain);
+static ATOMIC_NOTIFIER_HEAD(cpu_pm_notifier_chain);

static int cpu_pm_notify(enum cpu_pm_event event, int nr_to_call, int *nr_calls)
{
int ret;

- ret = __raw_notifier_call_chain(&cpu_pm_notifier_chain, event, NULL,
+ /*
+ * __atomic_notifier_call_chain() has an RCU read-side critical section,
+ * which could be dysfunctional in CPU idle. Copy the RCU_NONIDLE() code
+ * to let RCU know about it.
+ */
+ rcu_irq_enter_irqson();
+ ret = __atomic_notifier_call_chain(&cpu_pm_notifier_chain, event, NULL,
nr_to_call, nr_calls);
+ rcu_irq_exit_irqson();

return notifier_to_errno(ret);
}
@@ -47,14 +53,7 @@ static int cpu_pm_notify(enum cpu_pm_event event, int nr_to_call, int *nr_calls)
*/
int cpu_pm_register_notifier(struct notifier_block *nb)
{
- unsigned long flags;
- int ret;
-
- write_lock_irqsave(&cpu_pm_notifier_lock, flags);
- ret = raw_notifier_chain_register(&cpu_pm_notifier_chain, nb);
- write_unlock_irqrestore(&cpu_pm_notifier_lock, flags);
-
- return ret;
+ return atomic_notifier_chain_register(&cpu_pm_notifier_chain, nb);
}
EXPORT_SYMBOL_GPL(cpu_pm_register_notifier);

@@ -69,14 +68,7 @@ EXPORT_SYMBOL_GPL(cpu_pm_register_notifier);
*/
int cpu_pm_unregister_notifier(struct notifier_block *nb)
{
- unsigned long flags;
- int ret;
-
- write_lock_irqsave(&cpu_pm_notifier_lock, flags);
- ret = raw_notifier_chain_unregister(&cpu_pm_notifier_chain, nb);
- write_unlock_irqrestore(&cpu_pm_notifier_lock, flags);
-
- return ret;
+ return atomic_notifier_chain_unregister(&cpu_pm_notifier_chain, nb);
}
EXPORT_SYMBOL_GPL(cpu_pm_unregister_notifier);

@@ -100,7 +92,6 @@ int cpu_pm_enter(void)
int nr_calls;
int ret = 0;

- read_lock(&cpu_pm_notifier_lock);
ret = cpu_pm_notify(CPU_PM_ENTER, -1, &nr_calls);
if (ret)
/*
@@ -108,7 +99,6 @@ int cpu_pm_enter(void)
* PM entry who are notified earlier to prepare for it.
*/
cpu_pm_notify(CPU_PM_ENTER_FAILED, nr_calls - 1, NULL);
- read_unlock(&cpu_pm_notifier_lock);

return ret;
}
@@ -128,13 +118,7 @@ EXPORT_SYMBOL_GPL(cpu_pm_enter);
*/
int cpu_pm_exit(void)
{
- int ret;
-
- read_lock(&cpu_pm_notifier_lock);
- ret = cpu_pm_notify(CPU_PM_EXIT, -1, NULL);
- read_unlock(&cpu_pm_notifier_lock);
-
- return ret;
+ return cpu_pm_notify(CPU_PM_EXIT, -1, NULL);
}
EXPORT_SYMBOL_GPL(cpu_pm_exit);

@@ -159,7 +143,6 @@ int cpu_cluster_pm_enter(void)
int nr_calls;
int ret = 0;

- read_lock(&cpu_pm_notifier_lock);
ret = cpu_pm_notify(CPU_CLUSTER_PM_ENTER, -1, &nr_calls);
if (ret)
/*
@@ -167,7 +150,6 @@ int cpu_cluster_pm_enter(void)
* PM entry who are notified earlier to prepare for it.
*/
cpu_pm_notify(CPU_CLUSTER_PM_ENTER_FAILED, nr_calls - 1, NULL);
- read_unlock(&cpu_pm_notifier_lock);

return ret;
}
@@ -190,13 +172,7 @@ EXPORT_SYMBOL_GPL(cpu_cluster_pm_enter);
*/
int cpu_cluster_pm_exit(void)
{
- int ret;
-
- read_lock(&cpu_pm_notifier_lock);
- ret = cpu_pm_notify(CPU_CLUSTER_PM_EXIT, -1, NULL);
- read_unlock(&cpu_pm_notifier_lock);
-
- return ret;
+ return cpu_pm_notify(CPU_CLUSTER_PM_EXIT, -1, NULL);
}
EXPORT_SYMBOL_GPL(cpu_cluster_pm_exit);

--
2.7.4

2017-07-17 06:08:12

by Tony Lindgren

Subject: Re: Regression with suspicious RCU usage splats with cpu_pm change

* Alex Shi <[email protected]> [170716 16:25]:
> I reused the rcu_irq_enter_irqson() from RCU_NONIDLE to avoid this issue.
> It works fine.
>
> Tony, Could you like to give a tested-by if this patch works for you.

Yeah that keeps things booting for me with no splats so:

Tested-by: Tony Lindgren <[email protected]>

In general, it seems we're missing knowledge in the Linux kernel of when
the entire system is idle. Right now it seems that only cpuidle_coupled
knows that?

We could probably simplify things by adding some PM state for
entire-system idle. Then the cpuidle code and timer code could use it to
test when it's safe to do whatever the SoC needs to do to enter deeper
power states.

If we already have something like that, please do let me know :)

Regards,

Tony

2017-07-17 12:39:46

by Paul E. McKenney

Subject: Re: Regression with suspicious RCU usage splats with cpu_pm change

On Sun, Jul 16, 2017 at 11:08:07PM -0700, Tony Lindgren wrote:
> * Alex Shi <[email protected]> [170716 16:25]:
> > I reused the rcu_irq_enter_irqson() from RCU_NONIDLE to avoid this issue.
> > It works fine.
> >
> > Tony, Could you like to give a tested-by if this patch works for you.
>
> Yeah that keeps things booting for me with no splats so:
>
> Tested-by: Tony Lindgren <[email protected]>
>
> In general, it seems we're missing the knowledge in Linux kernel
> of when the entire system is idle. Right now it seems that only
> cpuidle_coupled knows that?
>
> We could probably simplify things by adding some PM state for
> entire system idle. Then cpuidle code and timer code could use
> that to test when it's safe to do whatever the SoC needs to do
> to enter deeper power states.
>
> If we already have something like that, please do let me know :)

Well, we used to have CONFIG_NO_HZ_FULL_SYSIDLE, which detected
full-system idle lazily so as to avoid scalability bottlenecks.
https://lwn.net/Articles/558284/

No one was using it, so I removed it last merge window. The
patch that removed it is at sysidle.2017.05.11a, which can
probably still be reverted cleanly. Or just use v4.11 or earlier.

Thanx, Paul

2017-07-18 05:41:43

by Tony Lindgren

Subject: Re: Regression with suspicious RCU usage splats with cpu_pm change

* Paul E. McKenney <[email protected]> [170717 05:40]:
> On Sun, Jul 16, 2017 at 11:08:07PM -0700, Tony Lindgren wrote:
> > * Alex Shi <[email protected]> [170716 16:25]:
> > > I reused the rcu_irq_enter_irqson() from RCU_NONIDLE to avoid this issue.
> > > It works fine.
> > >
> > > Tony, Could you like to give a tested-by if this patch works for you.
> >
> > Yeah that keeps things booting for me with no splats so:
> >
> > Tested-by: Tony Lindgren <[email protected]>
> >
> > In general, it seems we're missing the knowledge in Linux kernel
> > of when the entire system is idle. Right now it seems that only
> > cpuidle_coupled knows that?
> >
> > We could probably simplify things by adding some PM state for
> > entire system idle. Then cpuidle code and timer code could use
> > that to test when it's safe to do whatever the SoC needs to do
> > to enter deeper power states.
> >
> > If we already have something like that, please do let me know :)
>
> Well, we used to have CONFIG_NO_HZ_FULL_SYSIDLE, which detected
> full-system idle lazily so as to avoid scalability bottlenecks.
> https://lwn.net/Articles/558284/
>
> No one was using it, so I removed it last merge window. The
> patch that removed it is at sysidle.2017.05.11a, which can
> probably still be reverted cleanly. Or just use v4.11 or earlier.

OK, thanks for the pointer. For reference, that commit is
fe5ac724d81a ("rcu: Remove nohz_full full-system-idle state
machine").

For a potential user, I think we could use it for example in
cpuidle_enter_state_coupled() + omap_enter_idle_coupled() where
we try to figure out if the system is fully idle before calling
tick_broadcast_enter().

Regards,

Tony

2017-07-18 17:00:19

by Paul E. McKenney

Subject: Re: Regression with suspicious RCU usage splats with cpu_pm change

On Mon, Jul 17, 2017 at 10:41:38PM -0700, Tony Lindgren wrote:
> * Paul E. McKenney <[email protected]> [170717 05:40]:
> > On Sun, Jul 16, 2017 at 11:08:07PM -0700, Tony Lindgren wrote:
> > > * Alex Shi <[email protected]> [170716 16:25]:
> > > > I reused the rcu_irq_enter_irqson() from RCU_NONIDLE to avoid this issue.
> > > > It works fine.
> > > >
> > > > Tony, Could you like to give a tested-by if this patch works for you.
> > >
> > > Yeah that keeps things booting for me with no splats so:
> > >
> > > Tested-by: Tony Lindgren <[email protected]>
> > >
> > > In general, it seems we're missing the knowledge in Linux kernel
> > > of when the entire system is idle. Right now it seems that only
> > > cpuidle_coupled knows that?
> > >
> > > We could probably simplify things by adding some PM state for
> > > entire system idle. Then cpuidle code and timer code could use
> > > that to test when it's safe to do whatever the SoC needs to do
> > > to enter deeper power states.
> > >
> > > If we already have something like that, please do let me know :)
> >
> > Well, we used to have CONFIG_NO_HZ_FULL_SYSIDLE, which detected
> > full-system idle lazily so as to avoid scalability bottlenecks.
> > https://lwn.net/Articles/558284/
> >
> > No one was using it, so I removed it last merge window. The
> > patch that removed it is at sysidle.2017.05.11a, which can
> > probably still be reverted cleanly. Or just use v4.11 or earlier.
>
> OK thanks for the pointer, for reference that commit is
> fe5ac724d81a ("rcu: Remove nohz_full full-system-idle state
> machine").
>
> For a potential user, I think we could use it for example in
> cpuidle_enter_state_coupled() + omap_enter_idle_coupled() where
> we try to figure out if the system is fully idle before calling
> tick_broadcast_enter().

Would you be willing to prototype your usage on v4.12? It still has
NO_HZ_FULL_SYSIDLE. You have to enable NO_HZ_FULL in order to enable
NO_HZ_FULL_SYSIDLE at the moment.
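
In .config terms that means something like (assuming the usual NO_HZ_FULL
dependencies are already satisfied):

	CONFIG_NO_HZ_FULL=y
	CONFIG_NO_HZ_FULL_SYSIDLE=y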

Either way, here is the important bit for usage:

bool rcu_sys_is_idle(void);
void rcu_sysidle_force_exit(void);

The rcu_sys_is_idle() function returns true if all CPUs other than the
time-keeping CPU (that is, tick_do_timer_cpu, which is usually CPU 0)
are in their idle loop. Of course, if you invoke rcu_sys_is_idle()
from any CPU other than the time-keeping CPU, you will automatically
get a return value of false.

RCU's idle-exit code already sets state appropriately, but if there
is some other circumstance where you need to force the state machine
out of all-CPUs-idle state, you can call rcu_sysidle_force_exit().
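
For the omap_enter_idle_coupled()/tick_broadcast_enter() case mentioned
above, a minimal sketch might look like this (hypothetical code, not the
real OMAP implementation; it assumes a v4.12 kernel with
CONFIG_NO_HZ_FULL_SYSIDLE=y and that the two declarations above are
visible, e.g. via <linux/rcupdate.h>):

	/*
	 * Sketch only: gate the deep, tick-stopping state on full-system
	 * idle as seen by the NO_HZ_FULL_SYSIDLE state machine.
	 */
	static int enter_idle_coupled_sketch(struct cpuidle_device *dev,
					     struct cpuidle_driver *drv,
					     int index)
	{
		/* Meaningful only on the timekeeping CPU; others see false. */
		bool deep = rcu_sys_is_idle();

		if (!deep)
			index = drv->safe_state_index;	/* fall back to a shallow state */
		else
			tick_broadcast_enter();

		/* ... program the SoC and enter the state selected by 'index' ... */

		if (deep)
			tick_broadcast_exit();

		return index;
	}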

Thanx, Paul

2017-07-19 08:01:34

by Tony Lindgren

Subject: Re: Regression with suspicious RCU usage splats with cpu_pm change

* Paul E. McKenney <[email protected]> [170718 10:00]:
> On Mon, Jul 17, 2017 at 10:41:38PM -0700, Tony Lindgren wrote:
> > * Paul E. McKenney <[email protected]> [170717 05:40]:
> > > On Sun, Jul 16, 2017 at 11:08:07PM -0700, Tony Lindgren wrote:
> > > > * Alex Shi <[email protected]> [170716 16:25]:
> > > > > I reused the rcu_irq_enter_irqson() from RCU_NONIDLE to avoid this issue.
> > > > > It works fine.
> > > > >
> > > > > Tony, Could you like to give a tested-by if this patch works for you.
> > > >
> > > > Yeah that keeps things booting for me with no splats so:
> > > >
> > > > Tested-by: Tony Lindgren <[email protected]>
> > > >
> > > > In general, it seems we're missing the knowledge in Linux kernel
> > > > of when the entire system is idle. Right now it seems that only
> > > > cpuidle_coupled knows that?
> > > >
> > > > We could probably simplify things by adding some PM state for
> > > > entire system idle. Then cpuidle code and timer code could use
> > > > that to test when it's safe to do whatever the SoC needs to do
> > > > to enter deeper power states.
> > > >
> > > > If we already have something like that, please do let me know :)
> > >
> > > Well, we used to have CONFIG_NO_HZ_FULL_SYSIDLE, which detected
> > > full-system idle lazily so as to avoid scalability bottlenecks.
> > > https://lwn.net/Articles/558284/
> > >
> > > No one was using it, so I removed it last merge window. The
> > > patch that removed it is at sysidle.2017.05.11a, which can
> > > probably still be reverted cleanly. Or just use v4.11 or earlier.
> >
> > OK thanks for the pointer, for reference that commit is
> > fe5ac724d81a ("rcu: Remove nohz_full full-system-idle state
> > machine").
> >
> > For a potential user, I think we could use it for example in
> > cpuidle_enter_state_coupled() + omap_enter_idle_coupled() where
> > we try to figure out if the system is fully idle before calling
> > tick_broadcast_enter().
>
> Would you be willing to prototype your usage on v4.12? It still has
> NO_HZ_FULL_SYSIDLE. You have to enable NO_HZ_FULL in order to enable
> NO_HZ_FULL_SYSIDLE at the moment.
>
> Either way, here is the important bit for usage:
>
> bool rcu_sys_is_idle(void);
> void rcu_sysidle_force_exit(void);
>
> The rcu_sys_is_idle() function returns true if all CPUs other than the
> time-keeping CPU (that is, tick_do_timer_cpu, which is usually CPU 0)
> are in their idle loop. Of course, if you invoke rcu_sys_is_idle()
> from any CPU other than the time-keeping CPU, you will automatically
> get a return value of false.
>
> RCU's idle-exit code already sets state appropriately, but if there
> is some other circumstance where you need to force the state machine
> out of all-CPUs-idle state, you can call rcu_sysidle_force_exit().

OK sure I'll take a look at some point.

Thanks,

Tony

2017-07-27 09:55:39

by Alex Shi

Subject: Re: Regression with suspicious RCU usage splats with cpu_pm change

Hi Rafael,

I guess you overlooked my new patch in this thread. :)
Do I need to resend this patch as a new one, or could you pick it up from here?

BTW, it already has:
Tested-by: Tony Lindgren <[email protected]>

Thanks!
Alex

On 07/17/2017 07:24 AM, Alex Shi wrote:
> ...
> Tony, could you give a Tested-by if this patch works for you?
>
> Sebastian,
> may I keep your Acked-by on the fixed patch, since the main change remains the same? :)
> ...

2017-07-27 21:40:35

by Rafael J. Wysocki

Subject: Re: Regression with suspicious RCU usage splats with cpu_pm change

On Thursday, July 27, 2017 05:55:19 PM Alex Shi wrote:
> Hi Rafael,
>
> I guess you overlooked my new patch in this thread. :)
> Do I need to resend this patch as a new one, or could you pick it up from here?

Please resend it, add all of the applicable tags etc.

Thanks,
Rafael