2020-08-31 06:33:50

by Naresh Kamboju

[permalink] [raw]
Subject: WARNING: suspicious RCU usage - sdhci-pltfm: SDHCI platform and OF driver helper

While booting linux mainline kernel on arm64 db410c this kernel warning
noticed.

metadata:
git branch: master
git repo: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
git commit: f75aef392f869018f78cfedf3c320a6b3fcfda6b
git describe: v5.9-rc3
make_kernelversion: 5.9.0-rc3
kernel-config:
http://snapshots.linaro.org/openembedded/lkft/lkft/sumo/dragonboard-410c/lkft/linux-mainline/2965/config

Boot log,

[ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd030]
[ 0.000000] Linux version 5.9.0-rc3 (oe-user@oe-host)
(aarch64-linaro-linux-gcc (GCC) 7.3.0, GNU ld (GNU Binutils)
2.30.0.20180208) #1 SMP PREEMPT Mon Aug 31 00:23:15 UTC 2020
[ 0.000000] Machine model: Qualcomm Technologies, Inc. APQ 8016 SBC
<>
[ 5.299090] sdhci: Secure Digital Host Controller Interface driver
[ 5.299140] sdhci: Copyright(c) Pierre Ossman
[ 5.304313]
[ 5.307771] Synopsys Designware Multimedia Card Interface Driver
[ 5.308588] =============================
[ 5.308593] WARNING: suspicious RCU usage
[ 5.316628] sdhci-pltfm: SDHCI platform and OF driver helper
[ 5.320052] 5.9.0-rc3 #1 Not tainted
[ 5.320057] -----------------------------
[ 5.320063] /usr/src/kernel/include/trace/events/lock.h:37
suspicious rcu_dereference_check() usage!
[ 5.320068]
[ 5.320068] other info that might help us debug this:
[ 5.320068]
[ 5.320074]
[ 5.320074] rcu_scheduler_active = 2, debug_locks = 1
[ 5.320078] RCU used illegally from extended quiescent state!
[ 5.320084] no locks held by swapper/0/0.
[ 5.320089]
[ 5.320089] stack backtrace:
[ 5.320098] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.9.0-rc3 #1
[ 5.346354] sdhci_msm 7864900.sdhci: Got CD GPIO
[ 5.346446] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
[ 5.346452] Call trace:
[ 5.346463] dump_backtrace+0x0/0x1f8
[ 5.346471] show_stack+0x2c/0x38
[ 5.346480] dump_stack+0xec/0x15c
[ 5.346490] lockdep_rcu_suspicious+0xd4/0xf8
[ 5.346499] lock_acquire+0x3d0/0x440
[ 5.346510] _raw_spin_lock_irqsave+0x80/0xb0
[ 5.413118] __pm_runtime_suspend+0x34/0x1d0
[ 5.417457] psci_enter_domain_idle_state+0x4c/0xb0
[ 5.421795] cpuidle_enter_state+0xc8/0x610
[ 5.426392] cpuidle_enter+0x3c/0x50
[ 5.430561] call_cpuidle+0x44/0x80
[ 5.434378] do_idle+0x240/0x2a0
[ 5.437589] cpu_startup_entry+0x2c/0x78
[ 5.441063] rest_init+0x1ac/0x280
[ 5.444970] arch_call_rest_init+0x14/0x1c
[ 5.448180] start_kernel+0x50c/0x544
[ 5.452395]
[ 5.452399]
[ 5.452403] =============================
[ 5.452406] WARNING: suspicious RCU usage
[ 5.452409] 5.9.0-rc3 #1 Not tainted
[ 5.452412] -----------------------------
[ 5.452417] /usr/src/kernel/include/trace/events/ipi.h:36
suspicious rcu_dereference_check() usage!
[ 5.452420]
[ 5.452424] other info that might help us debug this:
[ 5.452426]
[ 5.452429]
[ 5.452432] rcu_scheduler_active = 2, debug_locks = 1
[ 5.452436] RCU used illegally from extended quiescent state!
[ 5.452440] 1 lock held by swapper/0/0:
[ 5.452443] #0: ffff8000127408f8 (logbuf_lock){-...}-{2:2}, at:
vprintk_emit+0xb0/0x358
[ 5.452458]
[ 5.452461] stack backtrace:
[ 5.452465] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.9.0-rc3 #1
[ 5.452469] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
[ 5.452472] Call trace:
[ 5.452476] dump_backtrace+0x0/0x1f8
[ 5.452479] show_stack+0x2c/0x38
[ 5.452481] dump_stack+0xec/0x15c
[ 5.452485] lockdep_rcu_suspicious+0xd4/0xf8
[ 5.452489] arch_irq_work_raise+0x208/0x210
[ 5.452493] __irq_work_queue_local+0x64/0x88
[ 5.452495] irq_work_queue+0x3c/0x88
[ 5.452499] printk_safe_log_store+0x148/0x178
[ 5.452502] vprintk_func+0x1cc/0x2b8
[ 5.452506] printk+0x74/0x94
[ 5.452509] lockdep_rcu_suspicious+0x28/0xf8
[ 5.452512] lock_release+0x338/0x360
[ 5.452516] _raw_spin_unlock+0x3c/0xa0
[ 5.452519] vprintk_emit+0xf8/0x358
[ 5.452522] vprintk_default+0x48/0x58
[ 5.452526] vprintk_func+0xec/0x2b8
[ 5.452528] printk+0x74/0x94
[ 5.452532] lockdep_rcu_suspicious+0x28/0xf8
[ 5.452535] lock_acquire+0x3d0/0x440
[ 5.452538] _raw_spin_lock_irqsave+0x80/0xb0
[ 5.452542] __pm_runtime_suspend+0x34/0x1d0
[ 5.452545] psci_enter_domain_idle_state+0x4c/0xb0
[ 5.452549] cpuidle_enter_state+0xc8/0x610
[ 5.452552] cpuidle_enter+0x3c/0x50
[ 5.452555] call_cpuidle+0x44/0x80
[ 5.452559] do_idle+0x240/0x2a0
[ 5.452562] cpu_startup_entry+0x2c/0x78
[ 5.452564] rest_init+0x1ac/0x280
[ 5.452568] arch_call_rest_init+0x14/0x1c
[ 5.452571] start_kernel+0x50c/0x544
[ 5.452575] =============================
[ 5.452578] WARNING: suspicious RCU usage
[ 5.452582] 5.9.0-rc3 #1 Not tainted
[ 5.452585] -----------------------------
[ 5.452590] /usr/src/kernel/include/trace/events/lock.h:63
suspicious rcu_dereference_check() usage!
[ 5.452593]
[ 5.452596] other info that might help us debug this:
[ 5.452599]
[ 5.452601]
[ 5.452605] rcu_scheduler_active = 2, debug_locks = 1
[ 5.452609] RCU used illegally from extended quiescent state!
[ 5.452612] 1 lock held by swapper/0/0:
[ 5.452615] #0: ffff8000127408f8 (logbuf_lock){-...}-{2:2}, at:
vprintk_emit+0xb0/0x358
[ 5.452630]
[ 5.452633] stack backtrace:
[ 5.452636] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.9.0-rc3 #1
[ 5.452640] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
[ 5.452643] Call trace:
[ 5.452646] dump_backtrace+0x0/0x1f8
[ 5.452649] show_stack+0x2c/0x38
[ 5.452652] dump_stack+0xec/0x15c
[ 5.452656] lockdep_rcu_suspicious+0xd4/0xf8
[ 5.452659] lock_release+0x338/0x360
[ 5.452662] _raw_spin_unlock+0x3c/0xa0
[ 5.452665] vprintk_emit+0xf8/0x358
[ 5.452669] vprintk_default+0x48/0x58
[ 5.452671] vprintk_func+0xec/0x2b8
[ 5.452674] printk+0x74/0x94
[ 5.452677] lockdep_rcu_suspicious+0x28/0xf8
[ 5.452680] lock_acquire+0x3d0/0x440
[ 5.452683] _raw_spin_lock_irqsave+0x80/0xb0
[ 5.452686] __pm_runtime_suspend+0x34/0x1d0
[ 5.452690] psci_enter_domain_idle_state+0x4c/0xb0
[ 5.452693] cpuidle_enter_state+0xc8/0x610
[ 5.452696] cpuidle_enter+0x3c/0x50
[ 5.452698] call_cpuidle+0x44/0x80
[ 5.452701] do_idle+0x240/0x2a0
[ 5.452704] cpu_startup_entry+0x2c/0x78
[ 5.452708] rest_init+0x1ac/0x280
[ 5.452711] arch_call_rest_init+0x14/0x1c
[ 5.452714] start_kernel+0x50c/0x544

full test log link,
https://qa-reports.linaro.org/lkft/linux-mainline-oe/build/v5.9-rc3/testrun/3137660/suite/linux-log-parser/test/check-kernel-warning-1722813/log

--
Linaro LKFT
https://lkft.linaro.org


2020-08-31 22:46:54

by Paul E. McKenney

[permalink] [raw]
Subject: Re: WARNING: suspicious RCU usage - sdhci-pltfm: SDHCI platform and OF driver helper

On Mon, Aug 31, 2020 at 12:02:31PM +0530, Naresh Kamboju wrote:
> While booting linux mainline kernel on arm64 db410c this kernel warning
> noticed.
>
> metadata:
> git branch: master
> git repo: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
> git commit: f75aef392f869018f78cfedf3c320a6b3fcfda6b
> git describe: v5.9-rc3
> make_kernelversion: 5.9.0-rc3
> kernel-config:
> http://snapshots.linaro.org/openembedded/lkft/lkft/sumo/dragonboard-410c/lkft/linux-mainline/2965/config
>
> Boot log,
>
> [ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd030]
> [ 0.000000] Linux version 5.9.0-rc3 (oe-user@oe-host)
> (aarch64-linaro-linux-gcc (GCC) 7.3.0, GNU ld (GNU Binutils)
> 2.30.0.20180208) #1 SMP PREEMPT Mon Aug 31 00:23:15 UTC 2020
> [ 0.000000] Machine model: Qualcomm Technologies, Inc. APQ 8016 SBC
> <>
> [ 5.299090] sdhci: Secure Digital Host Controller Interface driver
> [ 5.299140] sdhci: Copyright(c) Pierre Ossman
> [ 5.304313]
> [ 5.307771] Synopsys Designware Multimedia Card Interface Driver
> [ 5.308588] =============================
> [ 5.308593] WARNING: suspicious RCU usage
> [ 5.316628] sdhci-pltfm: SDHCI platform and OF driver helper
> [ 5.320052] 5.9.0-rc3 #1 Not tainted
> [ 5.320057] -----------------------------
> [ 5.320063] /usr/src/kernel/include/trace/events/lock.h:37
> suspicious rcu_dereference_check() usage!
> [ 5.320068]
> [ 5.320068] other info that might help us debug this:
> [ 5.320068]
> [ 5.320074]
> [ 5.320074] rcu_scheduler_active = 2, debug_locks = 1
> [ 5.320078] RCU used illegally from extended quiescent state!
> [ 5.320084] no locks held by swapper/0/0.
> [ 5.320089]
> [ 5.320089] stack backtrace:
> [ 5.320098] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.9.0-rc3 #1
> [ 5.346354] sdhci_msm 7864900.sdhci: Got CD GPIO
> [ 5.346446] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
> [ 5.346452] Call trace:
> [ 5.346463] dump_backtrace+0x0/0x1f8
> [ 5.346471] show_stack+0x2c/0x38
> [ 5.346480] dump_stack+0xec/0x15c
> [ 5.346490] lockdep_rcu_suspicious+0xd4/0xf8
> [ 5.346499] lock_acquire+0x3d0/0x440
> [ 5.346510] _raw_spin_lock_irqsave+0x80/0xb0
> [ 5.413118] __pm_runtime_suspend+0x34/0x1d0
> [ 5.417457] psci_enter_domain_idle_state+0x4c/0xb0
> [ 5.421795] cpuidle_enter_state+0xc8/0x610
> [ 5.426392] cpuidle_enter+0x3c/0x50
> [ 5.430561] call_cpuidle+0x44/0x80
> [ 5.434378] do_idle+0x240/0x2a0

RCU ignores CPUs in the idle loop, which means that you cannot use
rcu_read_lock() from the idle loop without use of something like
RCU_NONIDLE(). If this is due to event tracing, you should use the
_rcuidle() variant of the event trace statement.

Note also that Peter Zijlstra (CCed) is working to shrink the portion
of the idle loop that RCU ignores. Not sure that it covers your
case, but it is worth checking.

Thanx, Paul

> [ 5.437589] cpu_startup_entry+0x2c/0x78
> [ 5.441063] rest_init+0x1ac/0x280
> [ 5.444970] arch_call_rest_init+0x14/0x1c
> [ 5.448180] start_kernel+0x50c/0x544
> [ 5.452395]
> [ 5.452399]
> [ 5.452403] =============================
> [ 5.452406] WARNING: suspicious RCU usage
> [ 5.452409] 5.9.0-rc3 #1 Not tainted
> [ 5.452412] -----------------------------
> [ 5.452417] /usr/src/kernel/include/trace/events/ipi.h:36
> suspicious rcu_dereference_check() usage!
> [ 5.452420]
> [ 5.452424] other info that might help us debug this:
> [ 5.452426]
> [ 5.452429]
> [ 5.452432] rcu_scheduler_active = 2, debug_locks = 1
> [ 5.452436] RCU used illegally from extended quiescent state!
> [ 5.452440] 1 lock held by swapper/0/0:
> [ 5.452443] #0: ffff8000127408f8 (logbuf_lock){-...}-{2:2}, at:
> vprintk_emit+0xb0/0x358
> [ 5.452458]
> [ 5.452461] stack backtrace:
> [ 5.452465] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.9.0-rc3 #1
> [ 5.452469] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
> [ 5.452472] Call trace:
> [ 5.452476] dump_backtrace+0x0/0x1f8
> [ 5.452479] show_stack+0x2c/0x38
> [ 5.452481] dump_stack+0xec/0x15c
> [ 5.452485] lockdep_rcu_suspicious+0xd4/0xf8
> [ 5.452489] arch_irq_work_raise+0x208/0x210
> [ 5.452493] __irq_work_queue_local+0x64/0x88
> [ 5.452495] irq_work_queue+0x3c/0x88
> [ 5.452499] printk_safe_log_store+0x148/0x178
> [ 5.452502] vprintk_func+0x1cc/0x2b8
> [ 5.452506] printk+0x74/0x94
> [ 5.452509] lockdep_rcu_suspicious+0x28/0xf8
> [ 5.452512] lock_release+0x338/0x360
> [ 5.452516] _raw_spin_unlock+0x3c/0xa0
> [ 5.452519] vprintk_emit+0xf8/0x358
> [ 5.452522] vprintk_default+0x48/0x58
> [ 5.452526] vprintk_func+0xec/0x2b8
> [ 5.452528] printk+0x74/0x94
> [ 5.452532] lockdep_rcu_suspicious+0x28/0xf8
> [ 5.452535] lock_acquire+0x3d0/0x440
> [ 5.452538] _raw_spin_lock_irqsave+0x80/0xb0
> [ 5.452542] __pm_runtime_suspend+0x34/0x1d0
> [ 5.452545] psci_enter_domain_idle_state+0x4c/0xb0
> [ 5.452549] cpuidle_enter_state+0xc8/0x610
> [ 5.452552] cpuidle_enter+0x3c/0x50
> [ 5.452555] call_cpuidle+0x44/0x80
> [ 5.452559] do_idle+0x240/0x2a0
> [ 5.452562] cpu_startup_entry+0x2c/0x78
> [ 5.452564] rest_init+0x1ac/0x280
> [ 5.452568] arch_call_rest_init+0x14/0x1c
> [ 5.452571] start_kernel+0x50c/0x544
> [ 5.452575] =============================
> [ 5.452578] WARNING: suspicious RCU usage
> [ 5.452582] 5.9.0-rc3 #1 Not tainted
> [ 5.452585] -----------------------------
> [ 5.452590] /usr/src/kernel/include/trace/events/lock.h:63
> suspicious rcu_dereference_check() usage!
> [ 5.452593]
> [ 5.452596] other info that might help us debug this:
> [ 5.452599]
> [ 5.452601]
> [ 5.452605] rcu_scheduler_active = 2, debug_locks = 1
> [ 5.452609] RCU used illegally from extended quiescent state!
> [ 5.452612] 1 lock held by swapper/0/0:
> [ 5.452615] #0: ffff8000127408f8 (logbuf_lock){-...}-{2:2}, at:
> vprintk_emit+0xb0/0x358
> [ 5.452630]
> [ 5.452633] stack backtrace:
> [ 5.452636] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.9.0-rc3 #1
> [ 5.452640] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
> [ 5.452643] Call trace:
> [ 5.452646] dump_backtrace+0x0/0x1f8
> [ 5.452649] show_stack+0x2c/0x38
> [ 5.452652] dump_stack+0xec/0x15c
> [ 5.452656] lockdep_rcu_suspicious+0xd4/0xf8
> [ 5.452659] lock_release+0x338/0x360
> [ 5.452662] _raw_spin_unlock+0x3c/0xa0
> [ 5.452665] vprintk_emit+0xf8/0x358
> [ 5.452669] vprintk_default+0x48/0x58
> [ 5.452671] vprintk_func+0xec/0x2b8
> [ 5.452674] printk+0x74/0x94
> [ 5.452677] lockdep_rcu_suspicious+0x28/0xf8
> [ 5.452680] lock_acquire+0x3d0/0x440
> [ 5.452683] _raw_spin_lock_irqsave+0x80/0xb0
> [ 5.452686] __pm_runtime_suspend+0x34/0x1d0
> [ 5.452690] psci_enter_domain_idle_state+0x4c/0xb0
> [ 5.452693] cpuidle_enter_state+0xc8/0x610
> [ 5.452696] cpuidle_enter+0x3c/0x50
> [ 5.452698] call_cpuidle+0x44/0x80
> [ 5.452701] do_idle+0x240/0x2a0
> [ 5.452704] cpu_startup_entry+0x2c/0x78
> [ 5.452708] rest_init+0x1ac/0x280
> [ 5.452711] arch_call_rest_init+0x14/0x1c
> [ 5.452714] start_kernel+0x50c/0x544
>
> full test log link,
> https://qa-reports.linaro.org/lkft/linux-mainline-oe/build/v5.9-rc3/testrun/3137660/suite/linux-log-parser/test/check-kernel-warning-1722813/log
>
> --
> Linaro LKFT
> https://lkft.linaro.org

2020-09-01 06:48:56

by Ulf Hansson

[permalink] [raw]
Subject: Re: WARNING: suspicious RCU usage - sdhci-pltfm: SDHCI platform and OF driver helper

+ Saravanna, Rafael, Lina

On Mon, 31 Aug 2020 at 21:44, Paul E. McKenney <[email protected]> wrote:
>
> On Mon, Aug 31, 2020 at 12:02:31PM +0530, Naresh Kamboju wrote:
> > While booting linux mainline kernel on arm64 db410c this kernel warning
> > noticed.
> >
> > metadata:
> > git branch: master
> > git repo: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
> > git commit: f75aef392f869018f78cfedf3c320a6b3fcfda6b
> > git describe: v5.9-rc3
> > make_kernelversion: 5.9.0-rc3
> > kernel-config:
> > http://snapshots.linaro.org/openembedded/lkft/lkft/sumo/dragonboard-410c/lkft/linux-mainline/2965/config
> >
> > Boot log,
> >
> > [ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd030]
> > [ 0.000000] Linux version 5.9.0-rc3 (oe-user@oe-host)
> > (aarch64-linaro-linux-gcc (GCC) 7.3.0, GNU ld (GNU Binutils)
> > 2.30.0.20180208) #1 SMP PREEMPT Mon Aug 31 00:23:15 UTC 2020
> > [ 0.000000] Machine model: Qualcomm Technologies, Inc. APQ 8016 SBC
> > <>
> > [ 5.299090] sdhci: Secure Digital Host Controller Interface driver
> > [ 5.299140] sdhci: Copyright(c) Pierre Ossman
> > [ 5.304313]
> > [ 5.307771] Synopsys Designware Multimedia Card Interface Driver
> > [ 5.308588] =============================
> > [ 5.308593] WARNING: suspicious RCU usage
> > [ 5.316628] sdhci-pltfm: SDHCI platform and OF driver helper
> > [ 5.320052] 5.9.0-rc3 #1 Not tainted
> > [ 5.320057] -----------------------------
> > [ 5.320063] /usr/src/kernel/include/trace/events/lock.h:37
> > suspicious rcu_dereference_check() usage!
> > [ 5.320068]
> > [ 5.320068] other info that might help us debug this:
> > [ 5.320068]
> > [ 5.320074]
> > [ 5.320074] rcu_scheduler_active = 2, debug_locks = 1
> > [ 5.320078] RCU used illegally from extended quiescent state!
> > [ 5.320084] no locks held by swapper/0/0.
> > [ 5.320089]
> > [ 5.320089] stack backtrace:
> > [ 5.320098] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.9.0-rc3 #1
> > [ 5.346354] sdhci_msm 7864900.sdhci: Got CD GPIO
> > [ 5.346446] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
> > [ 5.346452] Call trace:
> > [ 5.346463] dump_backtrace+0x0/0x1f8
> > [ 5.346471] show_stack+0x2c/0x38
> > [ 5.346480] dump_stack+0xec/0x15c
> > [ 5.346490] lockdep_rcu_suspicious+0xd4/0xf8
> > [ 5.346499] lock_acquire+0x3d0/0x440
> > [ 5.346510] _raw_spin_lock_irqsave+0x80/0xb0
> > [ 5.413118] __pm_runtime_suspend+0x34/0x1d0
> > [ 5.417457] psci_enter_domain_idle_state+0x4c/0xb0
> > [ 5.421795] cpuidle_enter_state+0xc8/0x610
> > [ 5.426392] cpuidle_enter+0x3c/0x50
> > [ 5.430561] call_cpuidle+0x44/0x80
> > [ 5.434378] do_idle+0x240/0x2a0
>
> RCU ignores CPUs in the idle loop, which means that you cannot use
> rcu_read_lock() from the idle loop without use of something like
> RCU_NONIDLE(). If this is due to event tracing, you should use the
> _rcuidle() variant of the event trace statement.

In the runtime suspend path, the runtime PM core calls
device_links_read_lock() - if the device in question has any links to
suppliers (to allow them to be suspended too).

device_links_read_lock() calls srcu_read_lock().

It turns out that the device in question (the CPU device that is
attached to genpd) didn't have any links before - but that seems to
have changed, due to the work done by Saravana (links become created
on a per resource basis, parsed from DT during boot).

>
> Note also that Peter Zijlstra (CCed) is working to shrink the portion
> of the idle loop that RCU ignores. Not sure that it covers your
> case, but it is worth checking.

Thanks for letting me know. Let's see what Peter thinks about this then.

Apologize for my ignorance, but from a cpuidle point of view, what
does it mean using RCU_NONIDLE()? I guess we should avoid RCU_NONIDLE
on bigger code paths?

I could add RCU_NONIDLE for the calls to pm_runtime_put_sync_suspend()
and pm_runtime_get_sync() in psci_enter_domain_idle_state(). Perhaps
that's the easiest approach, at least to start with.

Or do you have any other ideas?

[...]

Kind regards
Uffe

2020-09-01 06:54:37

by Ulf Hansson

[permalink] [raw]
Subject: Re: WARNING: suspicious RCU usage - sdhci-pltfm: SDHCI platform and OF driver helper

+ Re-adding Peter (seems like the original address was wrong)

On Tue, 1 Sep 2020 at 08:46, Ulf Hansson <[email protected]> wrote:
>
> + Saravanna, Rafael, Lina
>
> On Mon, 31 Aug 2020 at 21:44, Paul E. McKenney <[email protected]> wrote:
> >
> > On Mon, Aug 31, 2020 at 12:02:31PM +0530, Naresh Kamboju wrote:
> > > While booting linux mainline kernel on arm64 db410c this kernel warning
> > > noticed.
> > >
> > > metadata:
> > > git branch: master
> > > git repo: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
> > > git commit: f75aef392f869018f78cfedf3c320a6b3fcfda6b
> > > git describe: v5.9-rc3
> > > make_kernelversion: 5.9.0-rc3
> > > kernel-config:
> > > http://snapshots.linaro.org/openembedded/lkft/lkft/sumo/dragonboard-410c/lkft/linux-mainline/2965/config
> > >
> > > Boot log,
> > >
> > > [ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd030]
> > > [ 0.000000] Linux version 5.9.0-rc3 (oe-user@oe-host)
> > > (aarch64-linaro-linux-gcc (GCC) 7.3.0, GNU ld (GNU Binutils)
> > > 2.30.0.20180208) #1 SMP PREEMPT Mon Aug 31 00:23:15 UTC 2020
> > > [ 0.000000] Machine model: Qualcomm Technologies, Inc. APQ 8016 SBC
> > > <>
> > > [ 5.299090] sdhci: Secure Digital Host Controller Interface driver
> > > [ 5.299140] sdhci: Copyright(c) Pierre Ossman
> > > [ 5.304313]
> > > [ 5.307771] Synopsys Designware Multimedia Card Interface Driver
> > > [ 5.308588] =============================
> > > [ 5.308593] WARNING: suspicious RCU usage
> > > [ 5.316628] sdhci-pltfm: SDHCI platform and OF driver helper
> > > [ 5.320052] 5.9.0-rc3 #1 Not tainted
> > > [ 5.320057] -----------------------------
> > > [ 5.320063] /usr/src/kernel/include/trace/events/lock.h:37
> > > suspicious rcu_dereference_check() usage!
> > > [ 5.320068]
> > > [ 5.320068] other info that might help us debug this:
> > > [ 5.320068]
> > > [ 5.320074]
> > > [ 5.320074] rcu_scheduler_active = 2, debug_locks = 1
> > > [ 5.320078] RCU used illegally from extended quiescent state!
> > > [ 5.320084] no locks held by swapper/0/0.
> > > [ 5.320089]
> > > [ 5.320089] stack backtrace:
> > > [ 5.320098] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.9.0-rc3 #1
> > > [ 5.346354] sdhci_msm 7864900.sdhci: Got CD GPIO
> > > [ 5.346446] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
> > > [ 5.346452] Call trace:
> > > [ 5.346463] dump_backtrace+0x0/0x1f8
> > > [ 5.346471] show_stack+0x2c/0x38
> > > [ 5.346480] dump_stack+0xec/0x15c
> > > [ 5.346490] lockdep_rcu_suspicious+0xd4/0xf8
> > > [ 5.346499] lock_acquire+0x3d0/0x440
> > > [ 5.346510] _raw_spin_lock_irqsave+0x80/0xb0
> > > [ 5.413118] __pm_runtime_suspend+0x34/0x1d0
> > > [ 5.417457] psci_enter_domain_idle_state+0x4c/0xb0
> > > [ 5.421795] cpuidle_enter_state+0xc8/0x610
> > > [ 5.426392] cpuidle_enter+0x3c/0x50
> > > [ 5.430561] call_cpuidle+0x44/0x80
> > > [ 5.434378] do_idle+0x240/0x2a0
> >
> > RCU ignores CPUs in the idle loop, which means that you cannot use
> > rcu_read_lock() from the idle loop without use of something like
> > RCU_NONIDLE(). If this is due to event tracing, you should use the
> > _rcuidle() variant of the event trace statement.
>
> In the runtime suspend path, the runtime PM core calls
> device_links_read_lock() - if the device in question has any links to
> suppliers (to allow them to be suspended too).
>
> device_links_read_lock() calls srcu_read_lock().
>
> It turns out that the device in question (the CPU device that is
> attached to genpd) didn't have any links before - but that seems to
> have changed, due to the work done by Saravana (links become created
> on a per resource basis, parsed from DT during boot).
>
> >
> > Note also that Peter Zijlstra (CCed) is working to shrink the portion
> > of the idle loop that RCU ignores. Not sure that it covers your
> > case, but it is worth checking.
>
> Thanks for letting me know. Let's see what Peter thinks about this then.
>
> Apologize for my ignorance, but from a cpuidle point of view, what
> does it mean using RCU_NONIDLE()? I guess we should avoid RCU_NONIDLE
> on bigger code paths?
>
> I could add RCU_NONIDLE for the calls to pm_runtime_put_sync_suspend()
> and pm_runtime_get_sync() in psci_enter_domain_idle_state(). Perhaps
> that's the easiest approach, at least to start with.
>
> Or do you have any other ideas?
>
> [...]
>
> Kind regards
> Uffe

2020-09-01 10:44:47

by Peter Zijlstra

[permalink] [raw]
Subject: Re: WARNING: suspicious RCU usage - sdhci-pltfm: SDHCI platform and OF driver helper

On Tue, Sep 01, 2020 at 08:50:57AM +0200, Ulf Hansson wrote:
> On Tue, 1 Sep 2020 at 08:46, Ulf Hansson <[email protected]> wrote:
> > On Mon, 31 Aug 2020 at 21:44, Paul E. McKenney <[email protected]> wrote:

> > > > [ 5.308588] =============================
> > > > [ 5.308593] WARNING: suspicious RCU usage
> > > > [ 5.316628] sdhci-pltfm: SDHCI platform and OF driver helper
> > > > [ 5.320052] 5.9.0-rc3 #1 Not tainted
> > > > [ 5.320057] -----------------------------
> > > > [ 5.320063] /usr/src/kernel/include/trace/events/lock.h:37 suspicious rcu_dereference_check() usage!
> > > > [ 5.320068]
> > > > [ 5.320068] other info that might help us debug this:
> > > > [ 5.320068]
> > > > [ 5.320074]
> > > > [ 5.320074] rcu_scheduler_active = 2, debug_locks = 1
> > > > [ 5.320078] RCU used illegally from extended quiescent state!
> > > > [ 5.320084] no locks held by swapper/0/0.
> > > > [ 5.320089]
> > > > [ 5.320089] stack backtrace:
> > > > [ 5.320098] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.9.0-rc3 #1
> > > > [ 5.346354] sdhci_msm 7864900.sdhci: Got CD GPIO
> > > > [ 5.346446] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
> > > > [ 5.346452] Call trace:
> > > > [ 5.346463] dump_backtrace+0x0/0x1f8
> > > > [ 5.346471] show_stack+0x2c/0x38
> > > > [ 5.346480] dump_stack+0xec/0x15c
> > > > [ 5.346490] lockdep_rcu_suspicious+0xd4/0xf8
> > > > [ 5.346499] lock_acquire+0x3d0/0x440
> > > > [ 5.346510] _raw_spin_lock_irqsave+0x80/0xb0
> > > > [ 5.413118] __pm_runtime_suspend+0x34/0x1d0
> > > > [ 5.417457] psci_enter_domain_idle_state+0x4c/0xb0
> > > > [ 5.421795] cpuidle_enter_state+0xc8/0x610
> > > > [ 5.426392] cpuidle_enter+0x3c/0x50
> > > > [ 5.430561] call_cpuidle+0x44/0x80
> > > > [ 5.434378] do_idle+0x240/0x2a0

> > > Note also that Peter Zijlstra (CCed) is working to shrink the portion
> > > of the idle loop that RCU ignores. Not sure that it covers your
> > > case, but it is worth checking.

Right, so I think I 'caused' this by making the lock tracepoints
visible. That is, the error always existed, now we actually warn about
it.

> > Thanks for letting me know. Let's see what Peter thinks about this then.
> >
> > Apologize for my ignorance, but from a cpuidle point of view, what
> > does it mean using RCU_NONIDLE()? I guess we should avoid RCU_NONIDLE
> > on bigger code paths?
> >
> > I could add RCU_NONIDLE for the calls to pm_runtime_put_sync_suspend()
> > and pm_runtime_get_sync() in psci_enter_domain_idle_state(). Perhaps
> > that's the easiest approach, at least to start with.
> >
> > Or do you have any other ideas?

So IMO trace_*_rcuidle() and RCU_NONIDLE() are bugs, they just mean we
got the ordering wrong and are papering over it. That said, that's been
the modus operandi for a while now, just make it shut up and don't think
about it :-/

That said; I pushed the rcu_idle_enter() about as deep as it goes into
generic code in commit 1098582a0f6c ("sched,idle,rcu: Push rcu_idle
deeper into the idle path")

I suppose the next step is pushing it into individual driver when
needed, something like the below perhaps. I realize the coupled idle
state stuff is more complicated that most, but it's also not an area
I've looked at in detail, so perhaps I've just made a bigger mess, but
it ought to give you enough to get going I think.

Rafael?

---
diff --git a/drivers/cpuidle/cpuidle-psci.c b/drivers/cpuidle/cpuidle-psci.c
index 74463841805f..617bbef316e6 100644
--- a/drivers/cpuidle/cpuidle-psci.c
+++ b/drivers/cpuidle/cpuidle-psci.c
@@ -49,6 +49,9 @@ static inline u32 psci_get_domain_state(void)

static inline int psci_enter_state(int idx, u32 state)
{
+ /*
+ * XXX push rcu_idle_enter into the coupled code
+ */
return CPU_PM_CPU_IDLE_ENTER_PARAM(psci_cpu_suspend_enter, idx, state);
}

@@ -72,7 +75,9 @@ static int psci_enter_domain_idle_state(struct cpuidle_device *dev,
if (!state)
state = states[idx];

+ rcu_idle_enter();
ret = psci_cpu_suspend_enter(state) ? -1 : idx;
+ rcu_idle_exit();

pm_runtime_get_sync(pd_dev);

@@ -125,8 +130,13 @@ static int psci_enter_idle_state(struct cpuidle_device *dev,
struct cpuidle_driver *drv, int idx)
{
u32 *state = __this_cpu_read(psci_cpuidle_data.psci_states);
+ int ret;

- return psci_enter_state(idx, state[idx]);
+ rcu_idle_enter();
+ ret = psci_enter_state(idx, state[idx]);
+ rcu_idle_exit();
+
+ return ret;
}

static const struct of_device_id psci_idle_state_match[] = {
@@ -170,6 +180,7 @@ static int psci_dt_cpu_init_topology(struct cpuidle_driver *drv,
* deeper states.
*/
drv->states[state_count - 1].enter = psci_enter_domain_idle_state;
+ drv->states[state_count - 1].flags = CPUIDLE_FLAG_RCU_IDLE;
psci_cpuidle_use_cpuhp = true;

return 0;
@@ -285,6 +296,7 @@ static int psci_idle_init_cpu(struct device *dev, int cpu)
* state index 0.
*/
drv->states[0].enter = psci_enter_idle_state;
+ drv->states[0].flags = CPUIDLE_FLAG_RCU_IDLE;
drv->states[0].exit_latency = 1;
drv->states[0].target_residency = 1;
drv->states[0].power_usage = UINT_MAX;
diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index 04becd70cc41..3dbac3bb761b 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -239,9 +239,11 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv,
time_start = ns_to_ktime(local_clock());

stop_critical_timings();
- rcu_idle_enter();
+ if (!(target_state->flags & CPUIDLE_FLAG_RCU_IDLE))
+ rcu_idle_enter();
entered_state = target_state->enter(dev, drv, index);
- rcu_idle_exit();
+ if (!(target_state->flags & CPUIDLE_FLAG_RCU_IDLE))
+ rcu_idle_exit();
start_critical_timings();

sched_clock_idle_wakeup_event();
diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h
index 75895e6363b8..47f686131a54 100644
--- a/include/linux/cpuidle.h
+++ b/include/linux/cpuidle.h
@@ -82,6 +82,7 @@ struct cpuidle_state {
#define CPUIDLE_FLAG_UNUSABLE BIT(3) /* avoid using this state */
#define CPUIDLE_FLAG_OFF BIT(4) /* disable this state by default */
#define CPUIDLE_FLAG_TLB_FLUSHED BIT(5) /* idle-state flushes TLBs */
+#define CPUIDLE_FLAG_RCU_IDLE BIT(6) /* driver will do RCU-idle */

struct cpuidle_device_kobj;
struct cpuidle_state_kobj;

2020-09-01 12:39:09

by Ulf Hansson

[permalink] [raw]
Subject: Re: WARNING: suspicious RCU usage - sdhci-pltfm: SDHCI platform and OF driver helper

On Tue, 1 Sep 2020 at 12:42, <[email protected]> wrote:
>
> On Tue, Sep 01, 2020 at 08:50:57AM +0200, Ulf Hansson wrote:
> > On Tue, 1 Sep 2020 at 08:46, Ulf Hansson <[email protected]> wrote:
> > > On Mon, 31 Aug 2020 at 21:44, Paul E. McKenney <[email protected]> wrote:
>
> > > > > [ 5.308588] =============================
> > > > > [ 5.308593] WARNING: suspicious RCU usage
> > > > > [ 5.316628] sdhci-pltfm: SDHCI platform and OF driver helper
> > > > > [ 5.320052] 5.9.0-rc3 #1 Not tainted
> > > > > [ 5.320057] -----------------------------
> > > > > [ 5.320063] /usr/src/kernel/include/trace/events/lock.h:37 suspicious rcu_dereference_check() usage!
> > > > > [ 5.320068]
> > > > > [ 5.320068] other info that might help us debug this:
> > > > > [ 5.320068]
> > > > > [ 5.320074]
> > > > > [ 5.320074] rcu_scheduler_active = 2, debug_locks = 1
> > > > > [ 5.320078] RCU used illegally from extended quiescent state!
> > > > > [ 5.320084] no locks held by swapper/0/0.
> > > > > [ 5.320089]
> > > > > [ 5.320089] stack backtrace:
> > > > > [ 5.320098] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.9.0-rc3 #1
> > > > > [ 5.346354] sdhci_msm 7864900.sdhci: Got CD GPIO
> > > > > [ 5.346446] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
> > > > > [ 5.346452] Call trace:
> > > > > [ 5.346463] dump_backtrace+0x0/0x1f8
> > > > > [ 5.346471] show_stack+0x2c/0x38
> > > > > [ 5.346480] dump_stack+0xec/0x15c
> > > > > [ 5.346490] lockdep_rcu_suspicious+0xd4/0xf8
> > > > > [ 5.346499] lock_acquire+0x3d0/0x440
> > > > > [ 5.346510] _raw_spin_lock_irqsave+0x80/0xb0
> > > > > [ 5.413118] __pm_runtime_suspend+0x34/0x1d0
> > > > > [ 5.417457] psci_enter_domain_idle_state+0x4c/0xb0
> > > > > [ 5.421795] cpuidle_enter_state+0xc8/0x610
> > > > > [ 5.426392] cpuidle_enter+0x3c/0x50
> > > > > [ 5.430561] call_cpuidle+0x44/0x80
> > > > > [ 5.434378] do_idle+0x240/0x2a0
>
> > > > Note also that Peter Zijlstra (CCed) is working to shrink the portion
> > > > of the idle loop that RCU ignores. Not sure that it covers your
> > > > case, but it is worth checking.
>
> Right, so I think I 'caused' this by making the lock tracepoints
> visible. That is, the error always existed, now we actually warn about
> it.
>
> > > Thanks for letting me know. Let's see what Peter thinks about this then.
> > >
> > > Apologize for my ignorance, but from a cpuidle point of view, what
> > > does it mean using RCU_NONIDLE()? I guess we should avoid RCU_NONIDLE
> > > on bigger code paths?
> > >
> > > I could add RCU_NONIDLE for the calls to pm_runtime_put_sync_suspend()
> > > and pm_runtime_get_sync() in psci_enter_domain_idle_state(). Perhaps
> > > that's the easiest approach, at least to start with.
> > >
> > > Or do you have any other ideas?
>
> So IMO trace_*_rcuidle() and RCU_NONIDLE() are bugs, they just mean we
> got the ordering wrong and are papering over it. That said, that's been
> the modus operandi for a while now, just make it shut up and don't think
> about it :-/
>
> That said; I pushed the rcu_idle_enter() about as deep as it goes into
> generic code in commit 1098582a0f6c ("sched,idle,rcu: Push rcu_idle
> deeper into the idle path")

Aha, that commit should fix this problem, I think. Looks like that
commit was sent as a fix and included in the recent v5.9-rc3.

Naresh, can you try with the above commit?

>
> I suppose the next step is pushing it into individual driver when
> needed, something like the below perhaps. I realize the coupled idle
> state stuff is more complicated that most, but it's also not an area
> I've looked at in detail, so perhaps I've just made a bigger mess, but
> it ought to give you enough to get going I think.

These aren't coupled states. Instead, in cpuidle-psci, we are using PM
domains through genpd and runtime PM to manage "shared idle states"
between CPUs.

Kind regards
Uffe

>
> Rafael?
>
> ---
> diff --git a/drivers/cpuidle/cpuidle-psci.c b/drivers/cpuidle/cpuidle-psci.c
> index 74463841805f..617bbef316e6 100644
> --- a/drivers/cpuidle/cpuidle-psci.c
> +++ b/drivers/cpuidle/cpuidle-psci.c
> @@ -49,6 +49,9 @@ static inline u32 psci_get_domain_state(void)
>
> static inline int psci_enter_state(int idx, u32 state)
> {
> + /*
> + * XXX push rcu_idle_enter into the coupled code
> + */
> return CPU_PM_CPU_IDLE_ENTER_PARAM(psci_cpu_suspend_enter, idx, state);
> }
>
> @@ -72,7 +75,9 @@ static int psci_enter_domain_idle_state(struct cpuidle_device *dev,
> if (!state)
> state = states[idx];
>
> + rcu_idle_enter();
> ret = psci_cpu_suspend_enter(state) ? -1 : idx;
> + rcu_idle_exit();
>
> pm_runtime_get_sync(pd_dev);
>
> @@ -125,8 +130,13 @@ static int psci_enter_idle_state(struct cpuidle_device *dev,
> struct cpuidle_driver *drv, int idx)
> {
> u32 *state = __this_cpu_read(psci_cpuidle_data.psci_states);
> + int ret;
>
> - return psci_enter_state(idx, state[idx]);
> + rcu_idle_enter();
> + ret = psci_enter_state(idx, state[idx]);
> + rcu_idle_exit();
> +
> + return ret;
> }
>
> static const struct of_device_id psci_idle_state_match[] = {
> @@ -170,6 +180,7 @@ static int psci_dt_cpu_init_topology(struct cpuidle_driver *drv,
> * deeper states.
> */
> drv->states[state_count - 1].enter = psci_enter_domain_idle_state;
> + drv->states[state_count - 1].flags = CPUIDLE_FLAG_RCU_IDLE;
> psci_cpuidle_use_cpuhp = true;
>
> return 0;
> @@ -285,6 +296,7 @@ static int psci_idle_init_cpu(struct device *dev, int cpu)
> * state index 0.
> */
> drv->states[0].enter = psci_enter_idle_state;
> + drv->states[0].flags = CPUIDLE_FLAG_RCU_IDLE;
> drv->states[0].exit_latency = 1;
> drv->states[0].target_residency = 1;
> drv->states[0].power_usage = UINT_MAX;
> diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
> index 04becd70cc41..3dbac3bb761b 100644
> --- a/drivers/cpuidle/cpuidle.c
> +++ b/drivers/cpuidle/cpuidle.c
> @@ -239,9 +239,11 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv,
> time_start = ns_to_ktime(local_clock());
>
> stop_critical_timings();
> - rcu_idle_enter();
> + if (!(target_state->flags & CPUIDLE_FLAG_RCU_IDLE))
> + rcu_idle_enter();
> entered_state = target_state->enter(dev, drv, index);
> - rcu_idle_exit();
> + if (!(target_state->flags & CPUIDLE_FLAG_RCU_IDLE))
> + rcu_idle_exit();
> start_critical_timings();
>
> sched_clock_idle_wakeup_event();
> diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h
> index 75895e6363b8..47f686131a54 100644
> --- a/include/linux/cpuidle.h
> +++ b/include/linux/cpuidle.h
> @@ -82,6 +82,7 @@ struct cpuidle_state {
> #define CPUIDLE_FLAG_UNUSABLE BIT(3) /* avoid using this state */
> #define CPUIDLE_FLAG_OFF BIT(4) /* disable this state by default */
> #define CPUIDLE_FLAG_TLB_FLUSHED BIT(5) /* idle-state flushes TLBs */
> +#define CPUIDLE_FLAG_RCU_IDLE BIT(6) /* driver will do RCU-idle */
>
> struct cpuidle_device_kobj;
> struct cpuidle_state_kobj;

2020-09-01 12:43:17

by Ulf Hansson

[permalink] [raw]
Subject: Re: WARNING: suspicious RCU usage - sdhci-pltfm: SDHCI platform and OF driver helper

On Tue, 1 Sep 2020 at 14:35, Ulf Hansson <[email protected]> wrote:
>
> On Tue, 1 Sep 2020 at 12:42, <[email protected]> wrote:
> >
> > On Tue, Sep 01, 2020 at 08:50:57AM +0200, Ulf Hansson wrote:
> > > On Tue, 1 Sep 2020 at 08:46, Ulf Hansson <[email protected]> wrote:
> > > > On Mon, 31 Aug 2020 at 21:44, Paul E. McKenney <[email protected]> wrote:
> >
> > > > > > [ 5.308588] =============================
> > > > > > [ 5.308593] WARNING: suspicious RCU usage
> > > > > > [ 5.316628] sdhci-pltfm: SDHCI platform and OF driver helper
> > > > > > [ 5.320052] 5.9.0-rc3 #1 Not tainted
> > > > > > [ 5.320057] -----------------------------
> > > > > > [ 5.320063] /usr/src/kernel/include/trace/events/lock.h:37 suspicious rcu_dereference_check() usage!
> > > > > > [ 5.320068]
> > > > > > [ 5.320068] other info that might help us debug this:
> > > > > > [ 5.320068]
> > > > > > [ 5.320074]
> > > > > > [ 5.320074] rcu_scheduler_active = 2, debug_locks = 1
> > > > > > [ 5.320078] RCU used illegally from extended quiescent state!
> > > > > > [ 5.320084] no locks held by swapper/0/0.
> > > > > > [ 5.320089]
> > > > > > [ 5.320089] stack backtrace:
> > > > > > [ 5.320098] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.9.0-rc3 #1
> > > > > > [ 5.346354] sdhci_msm 7864900.sdhci: Got CD GPIO
> > > > > > [ 5.346446] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
> > > > > > [ 5.346452] Call trace:
> > > > > > [ 5.346463] dump_backtrace+0x0/0x1f8
> > > > > > [ 5.346471] show_stack+0x2c/0x38
> > > > > > [ 5.346480] dump_stack+0xec/0x15c
> > > > > > [ 5.346490] lockdep_rcu_suspicious+0xd4/0xf8
> > > > > > [ 5.346499] lock_acquire+0x3d0/0x440
> > > > > > [ 5.346510] _raw_spin_lock_irqsave+0x80/0xb0
> > > > > > [ 5.413118] __pm_runtime_suspend+0x34/0x1d0
> > > > > > [ 5.417457] psci_enter_domain_idle_state+0x4c/0xb0
> > > > > > [ 5.421795] cpuidle_enter_state+0xc8/0x610
> > > > > > [ 5.426392] cpuidle_enter+0x3c/0x50
> > > > > > [ 5.430561] call_cpuidle+0x44/0x80
> > > > > > [ 5.434378] do_idle+0x240/0x2a0
> >
> > > > > Note also that Peter Zijlstra (CCed) is working to shrink the portion
> > > > > of the idle loop that RCU ignores. Not sure that it covers your
> > > > > case, but it is worth checking.
> >
> > Right, so I think I 'caused' this by making the lock tracepoints
> > visible. That is, the error always existed, now we actually warn about
> > it.
> >
> > > > Thanks for letting me know. Let's see what Peter thinks about this then.
> > > >
> > > > Apologize for my ignorance, but from a cpuidle point of view, what
> > > > does it mean using RCU_NONIDLE()? I guess we should avoid RCU_NONIDLE
> > > > on bigger code paths?
> > > >
> > > > I could add RCU_NONIDLE for the calls to pm_runtime_put_sync_suspend()
> > > > and pm_runtime_get_sync() in psci_enter_domain_idle_state(). Perhaps
> > > > that's the easiest approach, at least to start with.
> > > >
> > > > Or do you have any other ideas?
> >
> > So IMO trace_*_rcuidle() and RCU_NONIDLE() are bugs, they just mean we
> > got the ordering wrong and are papering over it. That said, that's been
> > the modus operandi for a while now, just make it shut up and don't think
> > about it :-/
> >
> > That said; I pushed the rcu_idle_enter() about as deep as it goes into
> > generic code in commit 1098582a0f6c ("sched,idle,rcu: Push rcu_idle
> > deeper into the idle path")
>
> Aha, that commit should fix this problem, I think. Looks like that
> commit was sent as a fix and included in the recent v5.9-rc3.
>
> Naresh, can you try with the above commit?

Ah, just realized that I misread the patch. It doesn't fix it.

We would still need a RCU_NONIDLE() in psci_enter_domain_idle_state()
- or something along the lines of what you suggest below.

Apologies for the noise.

Kind regards
Uffe

>
> >
> > I suppose the next step is pushing it into individual driver when
> > needed, something like the below perhaps. I realize the coupled idle
> > state stuff is more complicated that most, but it's also not an area
> > I've looked at in detail, so perhaps I've just made a bigger mess, but
> > it ought to give you enough to get going I think.
>
> These aren't coupled states. Instead, in cpuidle-psci, we are using PM
> domains through genpd and runtime PM to manage "shared idle states"
> between CPUs.
>
> Kind regards
> Uffe
>
> >
> > Rafael?
> >
> > ---
> > diff --git a/drivers/cpuidle/cpuidle-psci.c b/drivers/cpuidle/cpuidle-psci.c
> > index 74463841805f..617bbef316e6 100644
> > --- a/drivers/cpuidle/cpuidle-psci.c
> > +++ b/drivers/cpuidle/cpuidle-psci.c
> > @@ -49,6 +49,9 @@ static inline u32 psci_get_domain_state(void)
> >
> > static inline int psci_enter_state(int idx, u32 state)
> > {
> > + /*
> > + * XXX push rcu_idle_enter into the coupled code
> > + */
> > return CPU_PM_CPU_IDLE_ENTER_PARAM(psci_cpu_suspend_enter, idx, state);
> > }
> >
> > @@ -72,7 +75,9 @@ static int psci_enter_domain_idle_state(struct cpuidle_device *dev,
> > if (!state)
> > state = states[idx];
> >
> > + rcu_idle_enter();
> > ret = psci_cpu_suspend_enter(state) ? -1 : idx;
> > + rcu_idle_exit();
> >
> > pm_runtime_get_sync(pd_dev);
> >
> > @@ -125,8 +130,13 @@ static int psci_enter_idle_state(struct cpuidle_device *dev,
> > struct cpuidle_driver *drv, int idx)
> > {
> > u32 *state = __this_cpu_read(psci_cpuidle_data.psci_states);
> > + int ret;
> >
> > - return psci_enter_state(idx, state[idx]);
> > + rcu_idle_enter();
> > + ret = psci_enter_state(idx, state[idx]);
> > + rcu_idle_exit();
> > +
> > + return ret;
> > }
> >
> > static const struct of_device_id psci_idle_state_match[] = {
> > @@ -170,6 +180,7 @@ static int psci_dt_cpu_init_topology(struct cpuidle_driver *drv,
> > * deeper states.
> > */
> > drv->states[state_count - 1].enter = psci_enter_domain_idle_state;
> > + drv->states[state_count - 1].flags = CPUIDLE_FLAG_RCU_IDLE;
> > psci_cpuidle_use_cpuhp = true;
> >
> > return 0;
> > @@ -285,6 +296,7 @@ static int psci_idle_init_cpu(struct device *dev, int cpu)
> > * state index 0.
> > */
> > drv->states[0].enter = psci_enter_idle_state;
> > + drv->states[0].flags = CPUIDLE_FLAG_RCU_IDLE;
> > drv->states[0].exit_latency = 1;
> > drv->states[0].target_residency = 1;
> > drv->states[0].power_usage = UINT_MAX;
> > diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
> > index 04becd70cc41..3dbac3bb761b 100644
> > --- a/drivers/cpuidle/cpuidle.c
> > +++ b/drivers/cpuidle/cpuidle.c
> > @@ -239,9 +239,11 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv,
> > time_start = ns_to_ktime(local_clock());
> >
> > stop_critical_timings();
> > - rcu_idle_enter();
> > + if (!(target_state->flags & CPUIDLE_FLAG_RCU_IDLE))
> > + rcu_idle_enter();
> > entered_state = target_state->enter(dev, drv, index);
> > - rcu_idle_exit();
> > + if (!(target_state->flags & CPUIDLE_FLAG_RCU_IDLE))
> > + rcu_idle_exit();
> > start_critical_timings();
> >
> > sched_clock_idle_wakeup_event();
> > diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h
> > index 75895e6363b8..47f686131a54 100644
> > --- a/include/linux/cpuidle.h
> > +++ b/include/linux/cpuidle.h
> > @@ -82,6 +82,7 @@ struct cpuidle_state {
> > #define CPUIDLE_FLAG_UNUSABLE BIT(3) /* avoid using this state */
> > #define CPUIDLE_FLAG_OFF BIT(4) /* disable this state by default */
> > #define CPUIDLE_FLAG_TLB_FLUSHED BIT(5) /* idle-state flushes TLBs */
> > +#define CPUIDLE_FLAG_RCU_IDLE BIT(6) /* driver will do RCU-idle */
> >
> > struct cpuidle_device_kobj;
> > struct cpuidle_state_kobj;

2020-09-01 15:02:05

by Peter Zijlstra

[permalink] [raw]
Subject: Re: WARNING: suspicious RCU usage - sdhci-pltfm: SDHCI platform and OF driver helper

On Tue, Sep 01, 2020 at 02:35:52PM +0200, Ulf Hansson wrote:
> On Tue, 1 Sep 2020 at 12:42, <[email protected]> wrote:

> > That said; I pushed the rcu_idle_enter() about as deep as it goes into
> > generic code in commit 1098582a0f6c ("sched,idle,rcu: Push rcu_idle
> > deeper into the idle path")
>
> Aha, that commit should fix this problem, I think. Looks like that
> commit was sent as a fix and included in the recent v5.9-rc3.

AFAICT psci_enter_domain_idle_state() is still buggered. All that
pm_runtime_*() stuff is using locks.

Look at this:

psci_enter_domain_idle_state()
pm_runtime_put_sync_suspend()
__pm_runtime_suspend()
spin_lock_irqsave(&dev->power.lock, flags);

That's a definite fail after we've done rcu_idle_enter().

> > I suppose the next step is pushing it into individual driver when
> > needed, something like the below perhaps. I realize the coupled idle
> > state stuff is more complicated that most, but it's also not an area
> > I've looked at in detail, so perhaps I've just made a bigger mess, but
> > it ought to give you enough to get going I think.
>
> These aren't coupled states. Instead, in cpuidle-psci, we are using PM
> domains through genpd and runtime PM to manage "shared idle states"
> between CPUs.

Similar problem I'm thinking, 'complicated' stuff.

2020-09-01 15:02:48

by Paul E. McKenney

[permalink] [raw]
Subject: Re: WARNING: suspicious RCU usage - sdhci-pltfm: SDHCI platform and OF driver helper

On Tue, Sep 01, 2020 at 08:46:54AM +0200, Ulf Hansson wrote:
> + Saravanna, Rafael, Lina
>
> On Mon, 31 Aug 2020 at 21:44, Paul E. McKenney <[email protected]> wrote:
> >
> > On Mon, Aug 31, 2020 at 12:02:31PM +0530, Naresh Kamboju wrote:
> > > While booting linux mainline kernel on arm64 db410c this kernel warning
> > > noticed.
> > >
> > > metadata:
> > > git branch: master
> > > git repo: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
> > > git commit: f75aef392f869018f78cfedf3c320a6b3fcfda6b
> > > git describe: v5.9-rc3
> > > make_kernelversion: 5.9.0-rc3
> > > kernel-config:
> > > http://snapshots.linaro.org/openembedded/lkft/lkft/sumo/dragonboard-410c/lkft/linux-mainline/2965/config
> > >
> > > Boot log,
> > >
> > > [ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd030]
> > > [ 0.000000] Linux version 5.9.0-rc3 (oe-user@oe-host)
> > > (aarch64-linaro-linux-gcc (GCC) 7.3.0, GNU ld (GNU Binutils)
> > > 2.30.0.20180208) #1 SMP PREEMPT Mon Aug 31 00:23:15 UTC 2020
> > > [ 0.000000] Machine model: Qualcomm Technologies, Inc. APQ 8016 SBC
> > > <>
> > > [ 5.299090] sdhci: Secure Digital Host Controller Interface driver
> > > [ 5.299140] sdhci: Copyright(c) Pierre Ossman
> > > [ 5.304313]
> > > [ 5.307771] Synopsys Designware Multimedia Card Interface Driver
> > > [ 5.308588] =============================
> > > [ 5.308593] WARNING: suspicious RCU usage
> > > [ 5.316628] sdhci-pltfm: SDHCI platform and OF driver helper
> > > [ 5.320052] 5.9.0-rc3 #1 Not tainted
> > > [ 5.320057] -----------------------------
> > > [ 5.320063] /usr/src/kernel/include/trace/events/lock.h:37
> > > suspicious rcu_dereference_check() usage!
> > > [ 5.320068]
> > > [ 5.320068] other info that might help us debug this:
> > > [ 5.320068]
> > > [ 5.320074]
> > > [ 5.320074] rcu_scheduler_active = 2, debug_locks = 1
> > > [ 5.320078] RCU used illegally from extended quiescent state!
> > > [ 5.320084] no locks held by swapper/0/0.
> > > [ 5.320089]
> > > [ 5.320089] stack backtrace:
> > > [ 5.320098] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.9.0-rc3 #1
> > > [ 5.346354] sdhci_msm 7864900.sdhci: Got CD GPIO
> > > [ 5.346446] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
> > > [ 5.346452] Call trace:
> > > [ 5.346463] dump_backtrace+0x0/0x1f8
> > > [ 5.346471] show_stack+0x2c/0x38
> > > [ 5.346480] dump_stack+0xec/0x15c
> > > [ 5.346490] lockdep_rcu_suspicious+0xd4/0xf8
> > > [ 5.346499] lock_acquire+0x3d0/0x440
> > > [ 5.346510] _raw_spin_lock_irqsave+0x80/0xb0
> > > [ 5.413118] __pm_runtime_suspend+0x34/0x1d0
> > > [ 5.417457] psci_enter_domain_idle_state+0x4c/0xb0
> > > [ 5.421795] cpuidle_enter_state+0xc8/0x610
> > > [ 5.426392] cpuidle_enter+0x3c/0x50
> > > [ 5.430561] call_cpuidle+0x44/0x80
> > > [ 5.434378] do_idle+0x240/0x2a0
> >
> > RCU ignores CPUs in the idle loop, which means that you cannot use
> > rcu_read_lock() from the idle loop without use of something like
> > RCU_NONIDLE(). If this is due to event tracing, you should use the
> > _rcuidle() variant of the event trace statement.
>
> In the runtime suspend path, the runtime PM core calls
> device_links_read_lock() - if the device in question has any links to
> suppliers (to allow them to be suspended too).
>
> device_links_read_lock() calls srcu_read_lock().

Except that it is perfectly legal to invoke srcu_read_lock() from the
idle loop. The problem is instead rcu_read_lock() and similar.

> It turns out that the device in question (the CPU device that is
> attached to genpd) didn't have any links before - but that seems to
> have changed, due to the work done by Saravana (links become created
> on a per resource basis, parsed from DT during boot).
>
> > Note also that Peter Zijlstra (CCed) is working to shrink the portion
> > of the idle loop that RCU ignores. Not sure that it covers your
> > case, but it is worth checking.
>
> Thanks for letting me know. Let's see what Peter thinks about this then.
>
> Apologize for my ignorance, but from a cpuidle point of view, what
> does it mean using RCU_NONIDLE()? I guess we should avoid RCU_NONIDLE
> on bigger code paths?

It means that as far as RCU (and only RCU) is concerned there is an
exit from idle state for just long enough to execute RCU_NONIDLE()'s
argument. This involves an atomic operation on both entry to and exit
from RCU_NONIDLE(), which in most cases won't be noticeable. But in some
cases you might (for example) want to enclose a loop in RCU_NONIDLE()
rather than doing RCU_NONIDLE() on each pass through the loop.

> I could add RCU_NONIDLE for the calls to pm_runtime_put_sync_suspend()
> and pm_runtime_get_sync() in psci_enter_domain_idle_state(). Perhaps
> that's the easiest approach, at least to start with.
>
> Or do you have any other ideas?

Here is the list, though it is early in the morning here:

1. RCU_NONIDLE().

2. Peter's patch, if it turns out to hoist your code out of what
RCU considers to be the idle loop.

3. If the problem is trace events, use the _rcuidle() variant of the
trace event. Instead of trace_blah(), use trace_blah_rcuidle().

4. Switch from RCU (as in rcu_read_lock()) to SRCU (as in
srcu_read_lock()).

5. Take Peter's patch a step further, moving the rcu_idle_enter()
and rcu_idle_exit() calls as needed. But please keep in mind
that these two functions require that irqs be disabled by their
callers.

6. If RCU_NONIDLE() in inconvenient due to early exits and such,
you could use the rcu_irq_enter_irqson() and rcu_irq_exit_irqson()
functions that it calls.

Do any of those help?

Thanx, Paul

2020-09-01 15:52:52

by Peter Zijlstra

[permalink] [raw]
Subject: Re: WARNING: suspicious RCU usage - sdhci-pltfm: SDHCI platform and OF driver helper

On Tue, Sep 01, 2020 at 09:44:17AM -0600, Lina Iyer wrote:
> > > > > > I could add RCU_NONIDLE for the calls to pm_runtime_put_sync_suspend()
> > > > > > and pm_runtime_get_sync() in psci_enter_domain_idle_state(). Perhaps
> > > > > > that's the easiest approach, at least to start with.

> I think this would be nice. This should also cover the case, where PM domain
> power off notification callbacks call trace function internally. Right?

That's just more crap for me to clean up later :-(

trace_*_rcuidle() and RCU_NONIDLE() need to die, not proliferate.

2020-09-01 15:56:03

by Lina Iyer

[permalink] [raw]
Subject: Re: WARNING: suspicious RCU usage - sdhci-pltfm: SDHCI platform and OF driver helper

Hi Ulf,

On Tue, Sep 01 2020 at 06:41 -0600, Ulf Hansson wrote:
>On Tue, 1 Sep 2020 at 14:35, Ulf Hansson <[email protected]> wrote:
>>
>> On Tue, 1 Sep 2020 at 12:42, <[email protected]> wrote:
>> >
>> > On Tue, Sep 01, 2020 at 08:50:57AM +0200, Ulf Hansson wrote:
>> > > On Tue, 1 Sep 2020 at 08:46, Ulf Hansson <[email protected]> wrote:
>> > > > On Mon, 31 Aug 2020 at 21:44, Paul E. McKenney <[email protected]> wrote:
>> >
>> > > > > > [ 5.308588] =============================
>> > > > > > [ 5.308593] WARNING: suspicious RCU usage
>> > > > > > [ 5.316628] sdhci-pltfm: SDHCI platform and OF driver helper
>> > > > > > [ 5.320052] 5.9.0-rc3 #1 Not tainted
>> > > > > > [ 5.320057] -----------------------------
>> > > > > > [ 5.320063] /usr/src/kernel/include/trace/events/lock.h:37 suspicious rcu_dereference_check() usage!
>> > > > > > [ 5.320068]
>> > > > > > [ 5.320068] other info that might help us debug this:
>> > > > > > [ 5.320068]
>> > > > > > [ 5.320074]
>> > > > > > [ 5.320074] rcu_scheduler_active = 2, debug_locks = 1
>> > > > > > [ 5.320078] RCU used illegally from extended quiescent state!
>> > > > > > [ 5.320084] no locks held by swapper/0/0.
>> > > > > > [ 5.320089]
>> > > > > > [ 5.320089] stack backtrace:
>> > > > > > [ 5.320098] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.9.0-rc3 #1
>> > > > > > [ 5.346354] sdhci_msm 7864900.sdhci: Got CD GPIO
>> > > > > > [ 5.346446] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
>> > > > > > [ 5.346452] Call trace:
>> > > > > > [ 5.346463] dump_backtrace+0x0/0x1f8
>> > > > > > [ 5.346471] show_stack+0x2c/0x38
>> > > > > > [ 5.346480] dump_stack+0xec/0x15c
>> > > > > > [ 5.346490] lockdep_rcu_suspicious+0xd4/0xf8
>> > > > > > [ 5.346499] lock_acquire+0x3d0/0x440
>> > > > > > [ 5.346510] _raw_spin_lock_irqsave+0x80/0xb0
>> > > > > > [ 5.413118] __pm_runtime_suspend+0x34/0x1d0
>> > > > > > [ 5.417457] psci_enter_domain_idle_state+0x4c/0xb0
>> > > > > > [ 5.421795] cpuidle_enter_state+0xc8/0x610
>> > > > > > [ 5.426392] cpuidle_enter+0x3c/0x50
>> > > > > > [ 5.430561] call_cpuidle+0x44/0x80
>> > > > > > [ 5.434378] do_idle+0x240/0x2a0
>> >
>> > > > > Note also that Peter Zijlstra (CCed) is working to shrink the portion
>> > > > > of the idle loop that RCU ignores. Not sure that it covers your
>> > > > > case, but it is worth checking.
>> >
>> > Right, so I think I 'caused' this by making the lock tracepoints
>> > visible. That is, the error always existed, now we actually warn about
>> > it.
>> >
>> > > > Thanks for letting me know. Let's see what Peter thinks about this then.
>> > > >
>> > > > Apologize for my ignorance, but from a cpuidle point of view, what
>> > > > does it mean using RCU_NONIDLE()? I guess we should avoid RCU_NONIDLE
>> > > > on bigger code paths?
>> > > >
>> > > > I could add RCU_NONIDLE for the calls to pm_runtime_put_sync_suspend()
>> > > > and pm_runtime_get_sync() in psci_enter_domain_idle_state(). Perhaps
>> > > > that's the easiest approach, at least to start with.
>> > > >
I think this would be nice. This should also cover the case, where PM domain
power off notification callbacks call trace function internally. Right?

--Lina

>> > > > Or do you have any other ideas?
>> >
>> > So IMO trace_*_rcuidle() and RCU_NONIDLE() are bugs, they just mean we
>> > got the ordering wrong and are papering over it. That said, that's been
>> > the modus operandi for a while now, just make it shut up and don't think
>> > about it :-/
>> >
>> > That said; I pushed the rcu_idle_enter() about as deep as it goes into
>> > generic code in commit 1098582a0f6c ("sched,idle,rcu: Push rcu_idle
>> > deeper into the idle path")
>>
>> Aha, that commit should fix this problem, I think. Looks like that
>> commit was sent as a fix and included in the recent v5.9-rc3.
>>
>> Naresh, can you try with the above commit?
>
>Ah, just realized that I misread the patch. It doesn't fix it.
>
>We would still need a RCU_NONIDLE() in psci_enter_domain_idle_state()
>- or something along the lines of what you suggest below.
>
>Apologies for the noise.
>
>Kind regards
>Uffe
>
[1]. https://lkml.org/lkml/2020/8/26/81
>>
>> >
>> > I suppose the next step is pushing it into individual driver when
>> > needed, something like the below perhaps. I realize the coupled idle
>> > state stuff is more complicated that most, but it's also not an area
>> > I've looked at in detail, so perhaps I've just made a bigger mess, but
>> > it ought to give you enough to get going I think.
>>
>> These aren't coupled states. Instead, in cpuidle-psci, we are using PM
>> domains through genpd and runtime PM to manage "shared idle states"
>> between CPUs.
>>
>> Kind regards
>> Uffe
>>
>> >
>> > Rafael?
>> >
>> > ---
>> > diff --git a/drivers/cpuidle/cpuidle-psci.c b/drivers/cpuidle/cpuidle-psci.c
>> > index 74463841805f..617bbef316e6 100644
>> > --- a/drivers/cpuidle/cpuidle-psci.c
>> > +++ b/drivers/cpuidle/cpuidle-psci.c
>> > @@ -49,6 +49,9 @@ static inline u32 psci_get_domain_state(void)
>> >
>> > static inline int psci_enter_state(int idx, u32 state)
>> > {
>> > + /*
>> > + * XXX push rcu_idle_enter into the coupled code
>> > + */
>> > return CPU_PM_CPU_IDLE_ENTER_PARAM(psci_cpu_suspend_enter, idx, state);
>> > }
>> >
>> > @@ -72,7 +75,9 @@ static int psci_enter_domain_idle_state(struct cpuidle_device *dev,
>> > if (!state)
>> > state = states[idx];
>> >
>> > + rcu_idle_enter();
>> > ret = psci_cpu_suspend_enter(state) ? -1 : idx;
>> > + rcu_idle_exit();
>> >
>> > pm_runtime_get_sync(pd_dev);
>> >
>> > @@ -125,8 +130,13 @@ static int psci_enter_idle_state(struct cpuidle_device *dev,
>> > struct cpuidle_driver *drv, int idx)
>> > {
>> > u32 *state = __this_cpu_read(psci_cpuidle_data.psci_states);
>> > + int ret;
>> >
>> > - return psci_enter_state(idx, state[idx]);
>> > + rcu_idle_enter();
>> > + ret = psci_enter_state(idx, state[idx]);
>> > + rcu_idle_exit();
>> > +
>> > + return ret;
>> > }
>> >
>> > static const struct of_device_id psci_idle_state_match[] = {
>> > @@ -170,6 +180,7 @@ static int psci_dt_cpu_init_topology(struct cpuidle_driver *drv,
>> > * deeper states.
>> > */
>> > drv->states[state_count - 1].enter = psci_enter_domain_idle_state;
>> > + drv->states[state_count - 1].flags = CPUIDLE_FLAG_RCU_IDLE;
>> > psci_cpuidle_use_cpuhp = true;
>> >
>> > return 0;
>> > @@ -285,6 +296,7 @@ static int psci_idle_init_cpu(struct device *dev, int cpu)
>> > * state index 0.
>> > */
>> > drv->states[0].enter = psci_enter_idle_state;
>> > + drv->states[0].flags = CPUIDLE_FLAG_RCU_IDLE;
>> > drv->states[0].exit_latency = 1;
>> > drv->states[0].target_residency = 1;
>> > drv->states[0].power_usage = UINT_MAX;
>> > diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
>> > index 04becd70cc41..3dbac3bb761b 100644
>> > --- a/drivers/cpuidle/cpuidle.c
>> > +++ b/drivers/cpuidle/cpuidle.c
>> > @@ -239,9 +239,11 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv,
>> > time_start = ns_to_ktime(local_clock());
>> >
>> > stop_critical_timings();
>> > - rcu_idle_enter();
>> > + if (!(target_state->flags & CPUIDLE_FLAG_RCU_IDLE))
>> > + rcu_idle_enter();
>> > entered_state = target_state->enter(dev, drv, index);
>> > - rcu_idle_exit();
>> > + if (!(target_state->flags & CPUIDLE_FLAG_RCU_IDLE))
>> > + rcu_idle_exit();
>> > start_critical_timings();
>> >
>> > sched_clock_idle_wakeup_event();
>> > diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h
>> > index 75895e6363b8..47f686131a54 100644
>> > --- a/include/linux/cpuidle.h
>> > +++ b/include/linux/cpuidle.h
>> > @@ -82,6 +82,7 @@ struct cpuidle_state {
>> > #define CPUIDLE_FLAG_UNUSABLE BIT(3) /* avoid using this state */
>> > #define CPUIDLE_FLAG_OFF BIT(4) /* disable this state by default */
>> > #define CPUIDLE_FLAG_TLB_FLUSHED BIT(5) /* idle-state flushes TLBs */
>> > +#define CPUIDLE_FLAG_RCU_IDLE BIT(6) /* driver will do RCU-idle */
>> >
>> > struct cpuidle_device_kobj;
>> > struct cpuidle_state_kobj;

2020-09-01 16:14:43

by Paul E. McKenney

[permalink] [raw]
Subject: Re: WARNING: suspicious RCU usage - sdhci-pltfm: SDHCI platform and OF driver helper

On Tue, Sep 01, 2020 at 05:50:14PM +0200, [email protected] wrote:
> On Tue, Sep 01, 2020 at 09:44:17AM -0600, Lina Iyer wrote:
> > > > > > > I could add RCU_NONIDLE for the calls to pm_runtime_put_sync_suspend()
> > > > > > > and pm_runtime_get_sync() in psci_enter_domain_idle_state(). Perhaps
> > > > > > > that's the easiest approach, at least to start with.
>
> > I think this would be nice. This should also cover the case, where PM domain
> > power off notification callbacks call trace function internally. Right?
>
> That's just more crap for me to clean up later :-(
>
> trace_*_rcuidle() and RCU_NONIDLE() need to die, not proliferate.

Moving the idle-entry boundary further in is good in any number of ways.
But experience indicates that no matter how far you move it, there will
be something complex further in. Unless you are pushing it all the way
into all the arch-specific code down as far as it can possibly go?

Thanx, Paul

2020-09-01 17:45:30

by Peter Zijlstra

[permalink] [raw]
Subject: Re: WARNING: suspicious RCU usage - sdhci-pltfm: SDHCI platform and OF driver helper

On Tue, Sep 01, 2020 at 09:13:40AM -0700, Paul E. McKenney wrote:
> On Tue, Sep 01, 2020 at 05:50:14PM +0200, [email protected] wrote:
> > On Tue, Sep 01, 2020 at 09:44:17AM -0600, Lina Iyer wrote:
> > > > > > > > I could add RCU_NONIDLE for the calls to pm_runtime_put_sync_suspend()
> > > > > > > > and pm_runtime_get_sync() in psci_enter_domain_idle_state(). Perhaps
> > > > > > > > that's the easiest approach, at least to start with.
> >
> > > I think this would be nice. This should also cover the case, where PM domain
> > > power off notification callbacks call trace function internally. Right?
> >
> > That's just more crap for me to clean up later :-(
> >
> > trace_*_rcuidle() and RCU_NONIDLE() need to die, not proliferate.
>
> Moving the idle-entry boundary further in is good in any number of ways.
> But experience indicates that no matter how far you move it, there will
> be something complex further in. Unless you are pushing it all the way
> into all the arch-specific code down as far as it can possibly go?

Not all; the simple cpuidle drivers should be good already. The more
complicated ones need some help.

The patch provided earlier:

https://lkml.kernel.org/r/[email protected]

should allow the complicated drivers to take over and DTRT.

2020-09-02 06:51:08

by Ulf Hansson

[permalink] [raw]
Subject: Re: WARNING: suspicious RCU usage - sdhci-pltfm: SDHCI platform and OF driver helper

On Tue, 1 Sep 2020 at 17:00, Paul E. McKenney <[email protected]> wrote:
>
> On Tue, Sep 01, 2020 at 08:46:54AM +0200, Ulf Hansson wrote:
> > + Saravanna, Rafael, Lina
> >
> > On Mon, 31 Aug 2020 at 21:44, Paul E. McKenney <[email protected]> wrote:
> > >
> > > On Mon, Aug 31, 2020 at 12:02:31PM +0530, Naresh Kamboju wrote:
> > > > While booting linux mainline kernel on arm64 db410c this kernel warning
> > > > noticed.
> > > >
> > > > metadata:
> > > > git branch: master
> > > > git repo: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
> > > > git commit: f75aef392f869018f78cfedf3c320a6b3fcfda6b
> > > > git describe: v5.9-rc3
> > > > make_kernelversion: 5.9.0-rc3
> > > > kernel-config:
> > > > http://snapshots.linaro.org/openembedded/lkft/lkft/sumo/dragonboard-410c/lkft/linux-mainline/2965/config
> > > >
> > > > Boot log,
> > > >
> > > > [ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd030]
> > > > [ 0.000000] Linux version 5.9.0-rc3 (oe-user@oe-host)
> > > > (aarch64-linaro-linux-gcc (GCC) 7.3.0, GNU ld (GNU Binutils)
> > > > 2.30.0.20180208) #1 SMP PREEMPT Mon Aug 31 00:23:15 UTC 2020
> > > > [ 0.000000] Machine model: Qualcomm Technologies, Inc. APQ 8016 SBC
> > > > <>
> > > > [ 5.299090] sdhci: Secure Digital Host Controller Interface driver
> > > > [ 5.299140] sdhci: Copyright(c) Pierre Ossman
> > > > [ 5.304313]
> > > > [ 5.307771] Synopsys Designware Multimedia Card Interface Driver
> > > > [ 5.308588] =============================
> > > > [ 5.308593] WARNING: suspicious RCU usage
> > > > [ 5.316628] sdhci-pltfm: SDHCI platform and OF driver helper
> > > > [ 5.320052] 5.9.0-rc3 #1 Not tainted
> > > > [ 5.320057] -----------------------------
> > > > [ 5.320063] /usr/src/kernel/include/trace/events/lock.h:37
> > > > suspicious rcu_dereference_check() usage!
> > > > [ 5.320068]
> > > > [ 5.320068] other info that might help us debug this:
> > > > [ 5.320068]
> > > > [ 5.320074]
> > > > [ 5.320074] rcu_scheduler_active = 2, debug_locks = 1
> > > > [ 5.320078] RCU used illegally from extended quiescent state!
> > > > [ 5.320084] no locks held by swapper/0/0.
> > > > [ 5.320089]
> > > > [ 5.320089] stack backtrace:
> > > > [ 5.320098] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.9.0-rc3 #1
> > > > [ 5.346354] sdhci_msm 7864900.sdhci: Got CD GPIO
> > > > [ 5.346446] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
> > > > [ 5.346452] Call trace:
> > > > [ 5.346463] dump_backtrace+0x0/0x1f8
> > > > [ 5.346471] show_stack+0x2c/0x38
> > > > [ 5.346480] dump_stack+0xec/0x15c
> > > > [ 5.346490] lockdep_rcu_suspicious+0xd4/0xf8
> > > > [ 5.346499] lock_acquire+0x3d0/0x440
> > > > [ 5.346510] _raw_spin_lock_irqsave+0x80/0xb0
> > > > [ 5.413118] __pm_runtime_suspend+0x34/0x1d0
> > > > [ 5.417457] psci_enter_domain_idle_state+0x4c/0xb0
> > > > [ 5.421795] cpuidle_enter_state+0xc8/0x610
> > > > [ 5.426392] cpuidle_enter+0x3c/0x50
> > > > [ 5.430561] call_cpuidle+0x44/0x80
> > > > [ 5.434378] do_idle+0x240/0x2a0
> > >
> > > RCU ignores CPUs in the idle loop, which means that you cannot use
> > > rcu_read_lock() from the idle loop without use of something like
> > > RCU_NONIDLE(). If this is due to event tracing, you should use the
> > > _rcuidle() variant of the event trace statement.
> >
> > In the runtime suspend path, the runtime PM core calls
> > device_links_read_lock() - if the device in question has any links to
> > suppliers (to allow them to be suspended too).
> >
> > device_links_read_lock() calls srcu_read_lock().
>
> Except that it is perfectly legal to invoke srcu_read_lock() from the
> idle loop. The problem is instead rcu_read_lock() and similar.

Hmm. Sounds like more debugging is needed then, to narrow down the problem.

>
> > It turns out that the device in question (the CPU device that is
> > attached to genpd) didn't have any links before - but that seems to
> > have changed, due to the work done by Saravana (links become created
> > on a per resource basis, parsed from DT during boot).
> >
> > > Note also that Peter Zijlstra (CCed) is working to shrink the portion
> > > of the idle loop that RCU ignores. Not sure that it covers your
> > > case, but it is worth checking.
> >
> > Thanks for letting me know. Let's see what Peter thinks about this then.
> >
> > Apologize for my ignorance, but from a cpuidle point of view, what
> > does it mean using RCU_NONIDLE()? I guess we should avoid RCU_NONIDLE
> > on bigger code paths?
>
> It means that as far as RCU (and only RCU) is concerned there is an
> exit from idle state for just long enough to execute RCU_NONIDLE()'s
> argument. This involves an atomic operation on both entry to and exit
> from RCU_NONIDLE(), which in most cases won't be noticeable. But in some
> cases you might (for example) want to enclose a loop in RCU_NONIDLE()
> rather than doing RCU_NONIDLE() on each pass through the loop.
>
> > I could add RCU_NONIDLE for the calls to pm_runtime_put_sync_suspend()
> > and pm_runtime_get_sync() in psci_enter_domain_idle_state(). Perhaps
> > that's the easiest approach, at least to start with.
> >
> > Or do you have any other ideas?
>
> Here is the list, though it is early in the morning here:
>
> 1. RCU_NONIDLE().
>
> 2. Peter's patch, if it turns out to hoist your code out of what
> RCU considers to be the idle loop.
>
> 3. If the problem is trace events, use the _rcuidle() variant of the
> trace event. Instead of trace_blah(), use trace_blah_rcuidle().
>
> 4. Switch from RCU (as in rcu_read_lock()) to SRCU (as in
> srcu_read_lock()).
>
> 5. Take Peter's patch a step further, moving the rcu_idle_enter()
> and rcu_idle_exit() calls as needed. But please keep in mind
> that these two functions require that irqs be disabled by their
> callers.
>
> 6. If RCU_NONIDLE() in inconvenient due to early exits and such,
> you could use the rcu_irq_enter_irqson() and rcu_irq_exit_irqson()
> functions that it calls.
>
> Do any of those help?

Yes, they will, in one way or the other. Thanks for providing me with
all the available options.

BTW, I still don't get what good rcu_idle_enter|exit() does, but I am
assuming those need to be called at some point before the CPU goes to
sleep.

Kind regards
Uffe

2020-09-02 07:05:22

by Ulf Hansson

[permalink] [raw]
Subject: Re: WARNING: suspicious RCU usage - sdhci-pltfm: SDHCI platform and OF driver helper

On Tue, 1 Sep 2020 at 19:42, Peter Zijlstra <[email protected]> wrote:
>
> On Tue, Sep 01, 2020 at 09:13:40AM -0700, Paul E. McKenney wrote:
> > On Tue, Sep 01, 2020 at 05:50:14PM +0200, [email protected] wrote:
> > > On Tue, Sep 01, 2020 at 09:44:17AM -0600, Lina Iyer wrote:
> > > > > > > > > I could add RCU_NONIDLE for the calls to pm_runtime_put_sync_suspend()
> > > > > > > > > and pm_runtime_get_sync() in psci_enter_domain_idle_state(). Perhaps
> > > > > > > > > that's the easiest approach, at least to start with.
> > >
> > > > I think this would be nice. This should also cover the case, where PM domain
> > > > power off notification callbacks call trace function internally. Right?
> > >
> > > That's just more crap for me to clean up later :-(
> > >
> > > trace_*_rcuidle() and RCU_NONIDLE() need to die, not proliferate.
> >
> > Moving the idle-entry boundary further in is good in any number of ways.
> > But experience indicates that no matter how far you move it, there will
> > be something complex further in. Unless you are pushing it all the way
> > into all the arch-specific code down as far as it can possibly go?
>
> Not all; the simple cpuidle drivers should be good already. The more
> complicated ones need some help.
>
> The patch provided earlier:
>
> https://lkml.kernel.org/r/[email protected]
>
> should allow the complicated drivers to take over and DTRT.

Don't get me wrong, I fully support your approach by moving the
rcu_idle_enter() down as far as possible, but it seems to require more
work than just adding a simple flag for the idle states.

Lots of cpuidle drivers are using CPU_PM notifiers (grep for
cpu_pm_enter and you will see) from their idlestates ->enter()
callbacks. And for those we are already calling
rcu_irq_enter_irqson|off() in cpu_pm_notify() when firing them.

Kind regards
Uffe

2020-09-02 12:15:23

by Peter Zijlstra

[permalink] [raw]
Subject: Re: WARNING: suspicious RCU usage - sdhci-pltfm: SDHCI platform and OF driver helper

On Wed, Sep 02, 2020 at 09:03:37AM +0200, Ulf Hansson wrote:
> Lots of cpuidle drivers are using CPU_PM notifiers (grep for
> cpu_pm_enter and you will see) from their idlestates ->enter()
> callbacks. And for those we are already calling
> rcu_irq_enter_irqson|off() in cpu_pm_notify() when firing them.

Yeah, that particular trainwreck is on my todo list already ... then
again, that list is forever overflowing.

I'm thinking cpu_pm_unregister_notifier() is not a common thing? The few
I looked at seem to suggest 'never' is a good approximation.

It would be fairly trivial to replace the atomic_notifier usage with a
raw_notifier a lock and either stop-machine or IPIs. Better still would
be if we can get rid of it entirely, but I can't tell in a hurry if that
is possible.

2020-09-02 15:01:05

by Paul E. McKenney

[permalink] [raw]
Subject: Re: WARNING: suspicious RCU usage - sdhci-pltfm: SDHCI platform and OF driver helper

On Wed, Sep 02, 2020 at 08:49:11AM +0200, Ulf Hansson wrote:
> On Tue, 1 Sep 2020 at 17:00, Paul E. McKenney <[email protected]> wrote:

[ . . . ]

> > Here is the list, though it is early in the morning here:
> >
> > 1. RCU_NONIDLE().
> >
> > 2. Peter's patch, if it turns out to hoist your code out of what
> > RCU considers to be the idle loop.
> >
> > 3. If the problem is trace events, use the _rcuidle() variant of the
> > trace event. Instead of trace_blah(), use trace_blah_rcuidle().
> >
> > 4. Switch from RCU (as in rcu_read_lock()) to SRCU (as in
> > srcu_read_lock()).
> >
> > 5. Take Peter's patch a step further, moving the rcu_idle_enter()
> > and rcu_idle_exit() calls as needed. But please keep in mind
> > that these two functions require that irqs be disabled by their
> > callers.
> >
> > 6. If RCU_NONIDLE() in inconvenient due to early exits and such,
> > you could use the rcu_irq_enter_irqson() and rcu_irq_exit_irqson()
> > functions that it calls.
> >
> > Do any of those help?
>
> Yes, they will, in one way or the other. Thanks for providing me with
> all the available options.
>
> BTW, I still don't get what good rcu_idle_enter|exit() does, but I am
> assuming those need to be called at some point before the CPU goes to
> sleep.

These functions allow RCU to leave idle CPUs undisturbed. If they
were not invoked, RCU would periodically IPI idle CPUs to verify that
there were no RCU readers running on them. This would be quite bad for
battery lifetime, among other things. So the call to rcu_idle_enter()
tells RCU that it may safely completely ignore this CPU until its next
call to rcu_idle_exit().

Thanx, Paul

2020-09-02 16:02:37

by Ulf Hansson

[permalink] [raw]
Subject: Re: WARNING: suspicious RCU usage - sdhci-pltfm: SDHCI platform and OF driver helper

On Wed, 2 Sep 2020 at 14:14, <[email protected]> wrote:
>
> On Wed, Sep 02, 2020 at 09:03:37AM +0200, Ulf Hansson wrote:
> > Lots of cpuidle drivers are using CPU_PM notifiers (grep for
> > cpu_pm_enter and you will see) from their idlestates ->enter()
> > callbacks. And for those we are already calling
> > rcu_irq_enter_irqson|off() in cpu_pm_notify() when firing them.
>
> Yeah, that particular trainwreck is on my todo list already ... then
> again, that list is forever overflowing.
>
> I'm thinking cpu_pm_unregister_notifier() is not a common thing? The few
> I looked at seem to suggest 'never' is a good approximation.

The trend is that drivers are turning into regular modules that may
also need to manage "->remove()", which may mean unregistering the
notifier. Of course, I don't know for sure whether that becomes a
problem, but it seems quite limiting.

>
> It would be fairly trivial to replace the atomic_notifier usage with a
> raw_notifier a lock and either stop-machine or IPIs. Better still would
> be if we can get rid of it entirely, but I can't tell in a hurry if that
> is possible.

Okay, let's see.

In any case, I was thinking that the patch with CPU idle flag, for
letting CPU idle drivers deal with RCU, that you proposed, seems like
a good first step.

At least it should enable us to solve the problem for runtime PM in
psci_enter_domain_idle_state(). Let me update the patch and send it
out, then we can continue the discussion over there.

Kind regards
Uffe

2020-09-02 16:11:13

by Ulf Hansson

[permalink] [raw]
Subject: Re: WARNING: suspicious RCU usage - sdhci-pltfm: SDHCI platform and OF driver helper

On Wed, 2 Sep 2020 at 15:52, Paul E. McKenney <[email protected]> wrote:
>
> On Wed, Sep 02, 2020 at 08:49:11AM +0200, Ulf Hansson wrote:
> > On Tue, 1 Sep 2020 at 17:00, Paul E. McKenney <[email protected]> wrote:
>
> [ . . . ]
>
> > > Here is the list, though it is early in the morning here:
> > >
> > > 1. RCU_NONIDLE().
> > >
> > > 2. Peter's patch, if it turns out to hoist your code out of what
> > > RCU considers to be the idle loop.
> > >
> > > 3. If the problem is trace events, use the _rcuidle() variant of the
> > > trace event. Instead of trace_blah(), use trace_blah_rcuidle().
> > >
> > > 4. Switch from RCU (as in rcu_read_lock()) to SRCU (as in
> > > srcu_read_lock()).
> > >
> > > 5. Take Peter's patch a step further, moving the rcu_idle_enter()
> > > and rcu_idle_exit() calls as needed. But please keep in mind
> > > that these two functions require that irqs be disabled by their
> > > callers.
> > >
> > > 6. If RCU_NONIDLE() in inconvenient due to early exits and such,
> > > you could use the rcu_irq_enter_irqson() and rcu_irq_exit_irqson()
> > > functions that it calls.
> > >
> > > Do any of those help?
> >
> > Yes, they will, in one way or the other. Thanks for providing me with
> > all the available options.
> >
> > BTW, I still don't get what good rcu_idle_enter|exit() does, but I am
> > assuming those need to be called at some point before the CPU goes to
> > sleep.
>
> These functions allow RCU to leave idle CPUs undisturbed. If they
> were not invoked, RCU would periodically IPI idle CPUs to verify that
> there were no RCU readers running on them. This would be quite bad for
> battery lifetime, among other things. So the call to rcu_idle_enter()
> tells RCU that it may safely completely ignore this CPU until its next
> call to rcu_idle_exit().

Alright, thanks for explaining this, much appreciated.

So in one way, we would also like to call rcu_idle_enter(), as soon as
we know there is no need for the RCU to be active. To prevent
unnecessary IPIs I mean. :-)

Kind regards
Uffe

2020-09-02 17:03:07

by Paul E. McKenney

[permalink] [raw]
Subject: Re: WARNING: suspicious RCU usage - sdhci-pltfm: SDHCI platform and OF driver helper

On Wed, Sep 02, 2020 at 06:07:05PM +0200, Ulf Hansson wrote:
> On Wed, 2 Sep 2020 at 15:52, Paul E. McKenney <[email protected]> wrote:
> >
> > On Wed, Sep 02, 2020 at 08:49:11AM +0200, Ulf Hansson wrote:
> > > On Tue, 1 Sep 2020 at 17:00, Paul E. McKenney <[email protected]> wrote:
> >
> > [ . . . ]
> >
> > > > Here is the list, though it is early in the morning here:
> > > >
> > > > 1. RCU_NONIDLE().
> > > >
> > > > 2. Peter's patch, if it turns out to hoist your code out of what
> > > > RCU considers to be the idle loop.
> > > >
> > > > 3. If the problem is trace events, use the _rcuidle() variant of the
> > > > trace event. Instead of trace_blah(), use trace_blah_rcuidle().
> > > >
> > > > 4. Switch from RCU (as in rcu_read_lock()) to SRCU (as in
> > > > srcu_read_lock()).
> > > >
> > > > 5. Take Peter's patch a step further, moving the rcu_idle_enter()
> > > > and rcu_idle_exit() calls as needed. But please keep in mind
> > > > that these two functions require that irqs be disabled by their
> > > > callers.
> > > >
> > > > 6. If RCU_NONIDLE() in inconvenient due to early exits and such,
> > > > you could use the rcu_irq_enter_irqson() and rcu_irq_exit_irqson()
> > > > functions that it calls.
> > > >
> > > > Do any of those help?
> > >
> > > Yes, they will, in one way or the other. Thanks for providing me with
> > > all the available options.
> > >
> > > BTW, I still don't get what good rcu_idle_enter|exit() does, but I am
> > > assuming those need to be called at some point before the CPU goes to
> > > sleep.
> >
> > These functions allow RCU to leave idle CPUs undisturbed. If they
> > were not invoked, RCU would periodically IPI idle CPUs to verify that
> > there were no RCU readers running on them. This would be quite bad for
> > battery lifetime, among other things. So the call to rcu_idle_enter()
> > tells RCU that it may safely completely ignore this CPU until its next
> > call to rcu_idle_exit().
>
> Alright, thanks for explaining this, much appreciated.
>
> So in one way, we would also like to call rcu_idle_enter(), as soon as
> we know there is no need for the RCU to be active. To prevent
> unnecessary IPIs I mean. :-)

Well, the IPIs don't happen until the better part of a second into
the grace period. So delaying an rcu_idle_enter() a few microseconds,
as Peter Zijlstra is proposing, is absolutely no problem whatsoever.
And once the rcu_idle_enter() happens, the RCU grace-period kthread's next
scan of the CPUs will see that this CPU needs to be ignored, so no more
IPIs for it until it does the next rcu_idle_exit(), rcu_irq_enter(),
or any of a number of other things that cause RCU to once again pay
attention to that CPU.

Thanx, Paul

2020-09-03 15:03:00

by Peter Zijlstra

[permalink] [raw]
Subject: [RFC][PATCH] cpu_pm: Remove RCU abuse

On Wed, Sep 02, 2020 at 05:58:55PM +0200, Ulf Hansson wrote:
> On Wed, 2 Sep 2020 at 14:14, <[email protected]> wrote:
> >
> > On Wed, Sep 02, 2020 at 09:03:37AM +0200, Ulf Hansson wrote:
> > > Lots of cpuidle drivers are using CPU_PM notifiers (grep for
> > > cpu_pm_enter and you will see) from their idlestates ->enter()
> > > callbacks. And for those we are already calling
> > > rcu_irq_enter_irqson|off() in cpu_pm_notify() when firing them.
> >
> > Yeah, that particular trainwreck is on my todo list already ... then
> > again, that list is forever overflowing.
> >
> > I'm thinking cpu_pm_unregister_notifier() is not a common thing? The few
> > I looked at seem to suggest 'never' is a good approximation.
>
> The trend is that drivers are turning into regular modules that may
> also need to manage "->remove()", which may mean unregistering the
> notifier. Of course, I don't know for sure whether that becomes a
> problem, but it seems quite limiting.

You can pin modules, once they're loaded they can never be removed
again.

Anyway, the below should 'work', I think.

---
diff --git a/kernel/cpu_pm.c b/kernel/cpu_pm.c
index f7e1d0eccdbc..72804e0883d5 100644
--- a/kernel/cpu_pm.c
+++ b/kernel/cpu_pm.c
@@ -12,21 +12,18 @@
#include <linux/notifier.h>
#include <linux/spinlock.h>
#include <linux/syscore_ops.h>
+#include <linux/cpu.h>
+#include <linux/smp.h>

-static ATOMIC_NOTIFIER_HEAD(cpu_pm_notifier_chain);
+static RAW_NOTIFIER_HEAD(cpu_pm_notifier_chain);
+static DEFINE_SPINLOCK(cpu_pm_lock);

static int cpu_pm_notify(enum cpu_pm_event event)
{
int ret;

- /*
- * atomic_notifier_call_chain has a RCU read critical section, which
- * could be disfunctional in cpu idle. Copy RCU_NONIDLE code to let
- * RCU know this.
- */
- rcu_irq_enter_irqson();
- ret = atomic_notifier_call_chain(&cpu_pm_notifier_chain, event, NULL);
- rcu_irq_exit_irqson();
+ lockdep_assert_irqs_disabled();
+ ret = raw_notifier_call_chain(&cpu_pm_notifier_chain, event, NULL);

return notifier_to_errno(ret);
}
@@ -35,9 +32,8 @@ static int cpu_pm_notify_robust(enum cpu_pm_event event_up, enum cpu_pm_event ev
{
int ret;

- rcu_irq_enter_irqson();
- ret = atomic_notifier_call_chain_robust(&cpu_pm_notifier_chain, event_up, event_down, NULL);
- rcu_irq_exit_irqson();
+ lockdep_assert_irqs_disabled();
+ ret = raw_notifier_call_chain_robust(&cpu_pm_notifier_chain, event_up, event_down, NULL);

return notifier_to_errno(ret);
}
@@ -54,10 +50,28 @@ static int cpu_pm_notify_robust(enum cpu_pm_event event_up, enum cpu_pm_event ev
*/
int cpu_pm_register_notifier(struct notifier_block *nb)
{
- return atomic_notifier_chain_register(&cpu_pm_notifier_chain, nb);
+ unsigned long flags;
+ int ret;
+
+ spin_lock_irqsave(&cpu_pm_lock, flags);
+ ret = raw_notifier_chain_register(&cpu_pm_notifier_chain, nb);
+ spin_unlock_irqrestore(&cpu_pm_lock, flags);
+
+ return ret;
}
EXPORT_SYMBOL_GPL(cpu_pm_register_notifier);

+static bool __is_idle_cpu(int cpu, void *info)
+{
+ /*
+ * Racy as heck, however if we fail to see an idle task, it must be
+ * after we removed our element, so all is fine.
+ */
+ return is_idle_task(curr_task(cpu));
+}
+
+static void __nop_func(void *arg) { }
+
/**
* cpu_pm_unregister_notifier - unregister a driver with cpu_pm
* @nb: notifier block to be unregistered
@@ -69,7 +83,30 @@ EXPORT_SYMBOL_GPL(cpu_pm_register_notifier);
*/
int cpu_pm_unregister_notifier(struct notifier_block *nb)
{
- return atomic_notifier_chain_unregister(&cpu_pm_notifier_chain, nb);
+ unsigned long flags;
+ int ret, cpu;
+
+ spin_lock_irqsave(&cpu_pm_lock, flags);
+ ret = raw_notifier_chain_unregister(&cpu_pm_notifier_chain, nb);
+ spin_unlock_irqrestore(&cpu_pm_lock, flags);
+
+ /*
+ * Orders the removal above vs the __is_idle_cpu() test below. Matches
+ * schedule() switching to the idle task.
+ *
+ * Ensures that if we miss an idle task, it must be after the removal.
+ */
+ smp_mb();
+
+ /*
+ * IPI all idle CPUs, this guarantees that no CPU is currently
+ * iterating the notifier list.
+ */
+ cpus_read_lock();
+ on_each_cpu_cond(__is_idle_cpu, __nop_func, NULL, 1);
+ cpus_read_unlock();
+
+ return ret;
}
EXPORT_SYMBOL_GPL(cpu_pm_unregister_notifier);