2023-01-12 21:30:57

by Peter Zijlstra

[permalink] [raw]
Subject: [PATCH v3 00/51] cpuidle,rcu: Clean up the mess

Hi All!

The (hopefully) final respin of cpuidle vs rcu cleanup patches. Barring any
objections I'll be queueing these patches in tip/sched/core in the next few
days.

v2: https://lkml.kernel.org/r/[email protected]

These here patches clean up the mess that is cpuidle vs rcuidle.

At the end of the ride there's only on RCU_NONIDLE user left:

arch/arm64/kernel/suspend.c: RCU_NONIDLE(__cpu_suspend_exit());

And I know Mark has been prodding that with something sharp.

The last version was tested by a number of people and I'm hoping to not have
broken anything in the meantime ;-)


Changes since v2:

- rebased to v6.2-rc3; as available at:
git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git sched/idle

- folded: https://lkml.kernel.org/r/[email protected]
which makes the ARM cpuidle index 0 consistently not use
CPUIDLE_FLAG_RCU_IDLE, as requested by Ulf.

- added a few more __always_inline to empty stub functions as found by the
robot.

- Used _RET_IP_ instead of _THIS_IP_ in a few placed because of:
https://github.com/ClangBuiltLinux/linux/issues/263

- Added new patches to address various robot reports:

#35: trace,hardirq: No moar _rcuidle() tracing
#47: cpuidle: Ensure ct_cpuidle_enter() is always called from noinstr/__cpuidle
#48: cpuidle,arch: Mark all ct_cpuidle_enter() callers __cpuidle
#49: cpuidle,arch: Mark all regular cpuidle_state::enter methods __cpuidle
#50: cpuidle: Comments about noinstr/__cpuidle
#51: context_tracking: Fix noinstr vs KASAN


---
arch/alpha/kernel/process.c | 1 -
arch/alpha/kernel/vmlinux.lds.S | 1 -
arch/arc/kernel/process.c | 3 ++
arch/arc/kernel/vmlinux.lds.S | 1 -
arch/arm/include/asm/vmlinux.lds.h | 1 -
arch/arm/kernel/cpuidle.c | 4 +-
arch/arm/kernel/process.c | 1 -
arch/arm/kernel/smp.c | 6 +--
arch/arm/mach-davinci/cpuidle.c | 4 +-
arch/arm/mach-gemini/board-dt.c | 3 +-
arch/arm/mach-imx/cpuidle-imx5.c | 4 +-
arch/arm/mach-imx/cpuidle-imx6q.c | 8 ++--
arch/arm/mach-imx/cpuidle-imx6sl.c | 4 +-
arch/arm/mach-imx/cpuidle-imx6sx.c | 9 ++--
arch/arm/mach-imx/cpuidle-imx7ulp.c | 4 +-
arch/arm/mach-omap2/common.h | 6 ++-
arch/arm/mach-omap2/cpuidle34xx.c | 16 ++++++-
arch/arm/mach-omap2/cpuidle44xx.c | 29 +++++++------
arch/arm/mach-omap2/omap-mpuss-lowpower.c | 12 +++++-
arch/arm/mach-omap2/pm.h | 2 +-
arch/arm/mach-omap2/pm24xx.c | 51 +---------------------
arch/arm/mach-omap2/pm34xx.c | 14 +++++--
arch/arm/mach-omap2/pm44xx.c | 2 +-
arch/arm/mach-omap2/powerdomain.c | 10 ++---
arch/arm/mach-s3c/cpuidle-s3c64xx.c | 5 +--
arch/arm64/kernel/cpuidle.c | 2 +-
arch/arm64/kernel/idle.c | 1 -
arch/arm64/kernel/smp.c | 4 +-
arch/arm64/kernel/vmlinux.lds.S | 1 -
arch/csky/kernel/process.c | 1 -
arch/csky/kernel/smp.c | 2 +-
arch/csky/kernel/vmlinux.lds.S | 1 -
arch/hexagon/kernel/process.c | 1 -
arch/hexagon/kernel/vmlinux.lds.S | 1 -
arch/ia64/kernel/process.c | 1 +
arch/ia64/kernel/vmlinux.lds.S | 1 -
arch/loongarch/kernel/idle.c | 1 +
arch/loongarch/kernel/vmlinux.lds.S | 1 -
arch/m68k/kernel/vmlinux-nommu.lds | 1 -
arch/m68k/kernel/vmlinux-std.lds | 1 -
arch/m68k/kernel/vmlinux-sun3.lds | 1 -
arch/microblaze/kernel/process.c | 1 -
arch/microblaze/kernel/vmlinux.lds.S | 1 -
arch/mips/kernel/idle.c | 14 +++----
arch/mips/kernel/vmlinux.lds.S | 1 -
arch/nios2/kernel/process.c | 1 -
arch/nios2/kernel/vmlinux.lds.S | 1 -
arch/openrisc/kernel/process.c | 1 +
arch/openrisc/kernel/vmlinux.lds.S | 1 -
arch/parisc/kernel/process.c | 2 -
arch/parisc/kernel/vmlinux.lds.S | 1 -
arch/powerpc/kernel/idle.c | 5 +--
arch/powerpc/kernel/vmlinux.lds.S | 1 -
arch/riscv/kernel/process.c | 1 -
arch/riscv/kernel/vmlinux-xip.lds.S | 1 -
arch/riscv/kernel/vmlinux.lds.S | 1 -
arch/s390/kernel/idle.c | 1 -
arch/s390/kernel/vmlinux.lds.S | 1 -
arch/sh/kernel/idle.c | 1 +
arch/sh/kernel/vmlinux.lds.S | 1 -
arch/sparc/kernel/leon_pmc.c | 4 ++
arch/sparc/kernel/process_32.c | 1 -
arch/sparc/kernel/process_64.c | 3 +-
arch/sparc/kernel/vmlinux.lds.S | 1 -
arch/um/kernel/dyn.lds.S | 1 -
arch/um/kernel/process.c | 1 -
arch/um/kernel/uml.lds.S | 1 -
arch/x86/boot/compressed/vmlinux.lds.S | 1 +
arch/x86/coco/tdx/tdcall.S | 15 +------
arch/x86/coco/tdx/tdx.c | 25 ++++-------
arch/x86/events/amd/brs.c | 13 +++---
arch/x86/include/asm/fpu/xcr.h | 4 +-
arch/x86/include/asm/irqflags.h | 11 ++---
arch/x86/include/asm/mwait.h | 14 +++----
arch/x86/include/asm/nospec-branch.h | 2 +-
arch/x86/include/asm/paravirt.h | 6 ++-
arch/x86/include/asm/perf_event.h | 2 +-
arch/x86/include/asm/shared/io.h | 4 +-
arch/x86/include/asm/shared/tdx.h | 1 -
arch/x86/include/asm/special_insns.h | 8 ++--
arch/x86/include/asm/xen/hypercall.h | 2 +-
arch/x86/kernel/cpu/bugs.c | 2 +-
arch/x86/kernel/fpu/core.c | 4 +-
arch/x86/kernel/paravirt.c | 14 ++++++-
arch/x86/kernel/process.c | 65 ++++++++++++++--------------
arch/x86/kernel/vmlinux.lds.S | 1 -
arch/x86/lib/memcpy_64.S | 5 +--
arch/x86/lib/memmove_64.S | 4 +-
arch/x86/lib/memset_64.S | 4 +-
arch/x86/xen/enlighten_pv.c | 2 +-
arch/x86/xen/irq.c | 2 +-
arch/xtensa/kernel/process.c | 1 +
arch/xtensa/kernel/vmlinux.lds.S | 1 -
drivers/acpi/processor_idle.c | 28 ++++++++-----
drivers/base/power/runtime.c | 24 +++++------
drivers/clk/clk.c | 8 ++--
drivers/cpuidle/cpuidle-arm.c | 4 +-
drivers/cpuidle/cpuidle-big_little.c | 12 ++++--
drivers/cpuidle/cpuidle-mvebu-v7.c | 13 ++++--
drivers/cpuidle/cpuidle-psci.c | 26 +++++-------
drivers/cpuidle/cpuidle-qcom-spm.c | 4 +-
drivers/cpuidle/cpuidle-riscv-sbi.c | 19 +++++----
drivers/cpuidle/cpuidle-tegra.c | 31 +++++++++-----
drivers/cpuidle/cpuidle.c | 70 ++++++++++++++++++++++---------
drivers/cpuidle/dt_idle_states.c | 2 +-
drivers/cpuidle/poll_state.c | 10 ++++-
drivers/idle/intel_idle.c | 19 ++++-----
drivers/perf/arm_pmu.c | 11 +----
drivers/perf/riscv_pmu_sbi.c | 8 +---
include/asm-generic/vmlinux.lds.h | 9 ++--
include/linux/clockchips.h | 4 +-
include/linux/compiler_types.h | 18 +++++++-
include/linux/cpu.h | 3 --
include/linux/cpuidle.h | 32 ++++++++++++++
include/linux/cpumask.h | 4 +-
include/linux/percpu-defs.h | 2 +-
include/linux/sched/idle.h | 40 +++++++++++++-----
include/linux/thread_info.h | 18 +++++++-
include/linux/tracepoint.h | 15 ++++++-
kernel/context_tracking.c | 12 +++---
kernel/cpu_pm.c | 9 ----
kernel/printk/printk.c | 2 +-
kernel/sched/idle.c | 47 ++++++---------------
kernel/time/tick-broadcast-hrtimer.c | 29 ++++++-------
kernel/time/tick-broadcast.c | 6 ++-
kernel/trace/trace.c | 3 ++
kernel/trace/trace_preemptirq.c | 50 ++++++----------------
lib/ubsan.c | 5 ++-
mm/kasan/kasan.h | 4 ++
mm/kasan/shadow.c | 38 +++++++++++++++++
tools/objtool/check.c | 17 ++++++++
131 files changed, 617 insertions(+), 523 deletions(-)


2023-01-13 18:44:45

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH v3 00/51] cpuidle,rcu: Clean up the mess

On Thu, Jan 12, 2023 at 08:43:14PM +0100, Peter Zijlstra wrote:
> Hi All!
>
> The (hopefully) final respin of cpuidle vs rcu cleanup patches. Barring any
> objections I'll be queueing these patches in tip/sched/core in the next few
> days.
>
> v2: https://lkml.kernel.org/r/[email protected]
>
> These here patches clean up the mess that is cpuidle vs rcuidle.
>
> At the end of the ride there's only on RCU_NONIDLE user left:
>
> arch/arm64/kernel/suspend.c: RCU_NONIDLE(__cpu_suspend_exit());
>
> And I know Mark has been prodding that with something sharp.
>
> The last version was tested by a number of people and I'm hoping to not have
> broken anything in the meantime ;-)
>
>
> Changes since v2:

150 rcutorture hours on each of the default scenarios passed. This
is qemu/KVM on x86:

Tested-by: Paul E. McKenney <[email protected]>

> - rebased to v6.2-rc3; as available at:
> git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git sched/idle
>
> - folded: https://lkml.kernel.org/r/[email protected]
> which makes the ARM cpuidle index 0 consistently not use
> CPUIDLE_FLAG_RCU_IDLE, as requested by Ulf.
>
> - added a few more __always_inline to empty stub functions as found by the
> robot.
>
> - Used _RET_IP_ instead of _THIS_IP_ in a few placed because of:
> https://github.com/ClangBuiltLinux/linux/issues/263
>
> - Added new patches to address various robot reports:
>
> #35: trace,hardirq: No moar _rcuidle() tracing
> #47: cpuidle: Ensure ct_cpuidle_enter() is always called from noinstr/__cpuidle
> #48: cpuidle,arch: Mark all ct_cpuidle_enter() callers __cpuidle
> #49: cpuidle,arch: Mark all regular cpuidle_state::enter methods __cpuidle
> #50: cpuidle: Comments about noinstr/__cpuidle
> #51: context_tracking: Fix noinstr vs KASAN
>
>
> ---
> arch/alpha/kernel/process.c | 1 -
> arch/alpha/kernel/vmlinux.lds.S | 1 -
> arch/arc/kernel/process.c | 3 ++
> arch/arc/kernel/vmlinux.lds.S | 1 -
> arch/arm/include/asm/vmlinux.lds.h | 1 -
> arch/arm/kernel/cpuidle.c | 4 +-
> arch/arm/kernel/process.c | 1 -
> arch/arm/kernel/smp.c | 6 +--
> arch/arm/mach-davinci/cpuidle.c | 4 +-
> arch/arm/mach-gemini/board-dt.c | 3 +-
> arch/arm/mach-imx/cpuidle-imx5.c | 4 +-
> arch/arm/mach-imx/cpuidle-imx6q.c | 8 ++--
> arch/arm/mach-imx/cpuidle-imx6sl.c | 4 +-
> arch/arm/mach-imx/cpuidle-imx6sx.c | 9 ++--
> arch/arm/mach-imx/cpuidle-imx7ulp.c | 4 +-
> arch/arm/mach-omap2/common.h | 6 ++-
> arch/arm/mach-omap2/cpuidle34xx.c | 16 ++++++-
> arch/arm/mach-omap2/cpuidle44xx.c | 29 +++++++------
> arch/arm/mach-omap2/omap-mpuss-lowpower.c | 12 +++++-
> arch/arm/mach-omap2/pm.h | 2 +-
> arch/arm/mach-omap2/pm24xx.c | 51 +---------------------
> arch/arm/mach-omap2/pm34xx.c | 14 +++++--
> arch/arm/mach-omap2/pm44xx.c | 2 +-
> arch/arm/mach-omap2/powerdomain.c | 10 ++---
> arch/arm/mach-s3c/cpuidle-s3c64xx.c | 5 +--
> arch/arm64/kernel/cpuidle.c | 2 +-
> arch/arm64/kernel/idle.c | 1 -
> arch/arm64/kernel/smp.c | 4 +-
> arch/arm64/kernel/vmlinux.lds.S | 1 -
> arch/csky/kernel/process.c | 1 -
> arch/csky/kernel/smp.c | 2 +-
> arch/csky/kernel/vmlinux.lds.S | 1 -
> arch/hexagon/kernel/process.c | 1 -
> arch/hexagon/kernel/vmlinux.lds.S | 1 -
> arch/ia64/kernel/process.c | 1 +
> arch/ia64/kernel/vmlinux.lds.S | 1 -
> arch/loongarch/kernel/idle.c | 1 +
> arch/loongarch/kernel/vmlinux.lds.S | 1 -
> arch/m68k/kernel/vmlinux-nommu.lds | 1 -
> arch/m68k/kernel/vmlinux-std.lds | 1 -
> arch/m68k/kernel/vmlinux-sun3.lds | 1 -
> arch/microblaze/kernel/process.c | 1 -
> arch/microblaze/kernel/vmlinux.lds.S | 1 -
> arch/mips/kernel/idle.c | 14 +++----
> arch/mips/kernel/vmlinux.lds.S | 1 -
> arch/nios2/kernel/process.c | 1 -
> arch/nios2/kernel/vmlinux.lds.S | 1 -
> arch/openrisc/kernel/process.c | 1 +
> arch/openrisc/kernel/vmlinux.lds.S | 1 -
> arch/parisc/kernel/process.c | 2 -
> arch/parisc/kernel/vmlinux.lds.S | 1 -
> arch/powerpc/kernel/idle.c | 5 +--
> arch/powerpc/kernel/vmlinux.lds.S | 1 -
> arch/riscv/kernel/process.c | 1 -
> arch/riscv/kernel/vmlinux-xip.lds.S | 1 -
> arch/riscv/kernel/vmlinux.lds.S | 1 -
> arch/s390/kernel/idle.c | 1 -
> arch/s390/kernel/vmlinux.lds.S | 1 -
> arch/sh/kernel/idle.c | 1 +
> arch/sh/kernel/vmlinux.lds.S | 1 -
> arch/sparc/kernel/leon_pmc.c | 4 ++
> arch/sparc/kernel/process_32.c | 1 -
> arch/sparc/kernel/process_64.c | 3 +-
> arch/sparc/kernel/vmlinux.lds.S | 1 -
> arch/um/kernel/dyn.lds.S | 1 -
> arch/um/kernel/process.c | 1 -
> arch/um/kernel/uml.lds.S | 1 -
> arch/x86/boot/compressed/vmlinux.lds.S | 1 +
> arch/x86/coco/tdx/tdcall.S | 15 +------
> arch/x86/coco/tdx/tdx.c | 25 ++++-------
> arch/x86/events/amd/brs.c | 13 +++---
> arch/x86/include/asm/fpu/xcr.h | 4 +-
> arch/x86/include/asm/irqflags.h | 11 ++---
> arch/x86/include/asm/mwait.h | 14 +++----
> arch/x86/include/asm/nospec-branch.h | 2 +-
> arch/x86/include/asm/paravirt.h | 6 ++-
> arch/x86/include/asm/perf_event.h | 2 +-
> arch/x86/include/asm/shared/io.h | 4 +-
> arch/x86/include/asm/shared/tdx.h | 1 -
> arch/x86/include/asm/special_insns.h | 8 ++--
> arch/x86/include/asm/xen/hypercall.h | 2 +-
> arch/x86/kernel/cpu/bugs.c | 2 +-
> arch/x86/kernel/fpu/core.c | 4 +-
> arch/x86/kernel/paravirt.c | 14 ++++++-
> arch/x86/kernel/process.c | 65 ++++++++++++++--------------
> arch/x86/kernel/vmlinux.lds.S | 1 -
> arch/x86/lib/memcpy_64.S | 5 +--
> arch/x86/lib/memmove_64.S | 4 +-
> arch/x86/lib/memset_64.S | 4 +-
> arch/x86/xen/enlighten_pv.c | 2 +-
> arch/x86/xen/irq.c | 2 +-
> arch/xtensa/kernel/process.c | 1 +
> arch/xtensa/kernel/vmlinux.lds.S | 1 -
> drivers/acpi/processor_idle.c | 28 ++++++++-----
> drivers/base/power/runtime.c | 24 +++++------
> drivers/clk/clk.c | 8 ++--
> drivers/cpuidle/cpuidle-arm.c | 4 +-
> drivers/cpuidle/cpuidle-big_little.c | 12 ++++--
> drivers/cpuidle/cpuidle-mvebu-v7.c | 13 ++++--
> drivers/cpuidle/cpuidle-psci.c | 26 +++++-------
> drivers/cpuidle/cpuidle-qcom-spm.c | 4 +-
> drivers/cpuidle/cpuidle-riscv-sbi.c | 19 +++++----
> drivers/cpuidle/cpuidle-tegra.c | 31 +++++++++-----
> drivers/cpuidle/cpuidle.c | 70 ++++++++++++++++++++++---------
> drivers/cpuidle/dt_idle_states.c | 2 +-
> drivers/cpuidle/poll_state.c | 10 ++++-
> drivers/idle/intel_idle.c | 19 ++++-----
> drivers/perf/arm_pmu.c | 11 +----
> drivers/perf/riscv_pmu_sbi.c | 8 +---
> include/asm-generic/vmlinux.lds.h | 9 ++--
> include/linux/clockchips.h | 4 +-
> include/linux/compiler_types.h | 18 +++++++-
> include/linux/cpu.h | 3 --
> include/linux/cpuidle.h | 32 ++++++++++++++
> include/linux/cpumask.h | 4 +-
> include/linux/percpu-defs.h | 2 +-
> include/linux/sched/idle.h | 40 +++++++++++++-----
> include/linux/thread_info.h | 18 +++++++-
> include/linux/tracepoint.h | 15 ++++++-
> kernel/context_tracking.c | 12 +++---
> kernel/cpu_pm.c | 9 ----
> kernel/printk/printk.c | 2 +-
> kernel/sched/idle.c | 47 ++++++---------------
> kernel/time/tick-broadcast-hrtimer.c | 29 ++++++-------
> kernel/time/tick-broadcast.c | 6 ++-
> kernel/trace/trace.c | 3 ++
> kernel/trace/trace_preemptirq.c | 50 ++++++----------------
> lib/ubsan.c | 5 ++-
> mm/kasan/kasan.h | 4 ++
> mm/kasan/shadow.c | 38 +++++++++++++++++
> tools/objtool/check.c | 17 ++++++++
> 131 files changed, 617 insertions(+), 523 deletions(-)
>

2023-01-16 17:57:55

by Mark Rutland

[permalink] [raw]
Subject: Re: [PATCH v3 00/51] cpuidle,rcu: Clean up the mess

On Thu, Jan 12, 2023 at 08:43:14PM +0100, Peter Zijlstra wrote:
> Hi All!

Hi Peter,

> The (hopefully) final respin of cpuidle vs rcu cleanup patches. Barring any
> objections I'll be queueing these patches in tip/sched/core in the next few
> days.

I'm sorry to have to bear some bad news on that front. :(

I just had a go at testing this on a Juno dev board, using your queue.git
sched/idle branch and defconfig + CONFIG_PROVE_LOCKING=y +
CONFIG_DEBUG_LOCKDEP=y + CONFIG_DEBUG_ATOMIC_SLEEP=y.

With that I consistently see RCU at boot time (log below).

| =============================
| WARNING: suspicious RCU usage
| 6.2.0-rc3-00051-gced9b6eecb31 #1 Not tainted
| -----------------------------
| include/trace/events/ipi.h:19 suspicious rcu_dereference_check() usage!
|
| other info that might help us debug this:
|
|
| rcu_scheduler_active = 2, debug_locks = 1
| RCU used illegally from extended quiescent state!
| no locks held by swapper/0/0.
|
| stack backtrace:
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.2.0-rc3-00051-gced9b6eecb31 #1
| Hardware name: ARM LTD ARM Juno Development Platform/ARM Juno Development Platform, BIOS EDK II May 16 2021
| Call trace:
| dump_backtrace.part.0+0xe4/0xf0
| show_stack+0x18/0x30
| dump_stack_lvl+0x98/0xd4
| dump_stack+0x18/0x34
| lockdep_rcu_suspicious+0xf8/0x10c
| trace_ipi_raise+0x1a8/0x1b0
| arch_irq_work_raise+0x4c/0x70
| __irq_work_queue_local+0x48/0x80
| irq_work_queue+0x50/0x80
| __wake_up_klogd.part.0+0x98/0xe0
| defer_console_output+0x20/0x30
| vprintk+0x98/0xf0
| _printk+0x5c/0x84
| lockdep_rcu_suspicious+0x34/0x10c
| trace_lock_acquire+0x174/0x180
| lock_acquire+0x3c/0x8c
| _raw_spin_lock_irqsave+0x70/0x150
| down_trylock+0x18/0x50
| __down_trylock_console_sem+0x3c/0xd0
| console_trylock+0x28/0x90
| vprintk_emit+0x11c/0x354
| vprintk_default+0x38/0x4c
| vprintk+0xd4/0xf0
| _printk+0x5c/0x84
| lockdep_rcu_suspicious+0x34/0x10c
| printk_sprint+0x238/0x240
| vprintk_store+0x32c/0x4b0
| vprintk_emit+0x104/0x354
| vprintk_default+0x38/0x4c
| vprintk+0xd4/0xf0
| _printk+0x5c/0x84
| lockdep_rcu_suspicious+0x34/0x10c
| trace_irq_disable+0x1ac/0x1b0
| trace_hardirqs_off+0xe8/0x110
| cpu_suspend+0x4c/0xfc
| psci_cpu_suspend_enter+0x58/0x6c
| psci_enter_idle_state+0x70/0x170
| cpuidle_enter_state+0xc4/0x464
| cpuidle_enter+0x38/0x50
| do_idle+0x230/0x2c0
| cpu_startup_entry+0x24/0x30
| rest_init+0x110/0x190
| arch_post_acpi_subsys_init+0x0/0x18
| start_kernel+0x6f8/0x738
| __primary_switched+0xbc/0xc4

IIUC what's happenign here is the PSCI cpuidle driver has entered idle and RCU
is no longer watching when arm64's cpu_suspend() manipulates DAIF. Our
local_daif_*() helpers poke lockdep and tracing, hence the call to
trace_hardirqs_off() and the RCU usage.

I think we need RCU to be watching all the way down to cpu_suspend(), and it's
cpu_suspend() that should actually enter/exit idle context. That and we need to
make cpu_suspend() and the low-level PSCI invocation noinstr.

I'm not sure whether 32-bit will have a similar issue or not.

I'm surprised no-one else who has tested has seen this; I suspect people
haven't enabled lockdep and friends. :/

Thanks,
Mark.

2023-01-17 10:33:21

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v3 00/51] cpuidle,rcu: Clean up the mess

On Mon, Jan 16, 2023 at 04:59:04PM +0000, Mark Rutland wrote:

> I'm sorry to have to bear some bad news on that front. :(

Moo, something had to give..


> IIUC what's happenign here is the PSCI cpuidle driver has entered idle and RCU
> is no longer watching when arm64's cpu_suspend() manipulates DAIF. Our
> local_daif_*() helpers poke lockdep and tracing, hence the call to
> trace_hardirqs_off() and the RCU usage.

Right, strictly speaking not needed at this point, IRQs should have been
traced off a long time ago.

> I think we need RCU to be watching all the way down to cpu_suspend(), and it's
> cpu_suspend() that should actually enter/exit idle context. That and we need to
> make cpu_suspend() and the low-level PSCI invocation noinstr.
>
> I'm not sure whether 32-bit will have a similar issue or not.

I'm not seeing 32bit or Risc-V have similar issues here, but who knows,
maybe I missed somsething.

In any case, the below ought to cure the ARM64 case and remove that last
known RCU_NONIDLE() user as a bonus.

---
diff --git a/arch/arm64/kernel/cpuidle.c b/arch/arm64/kernel/cpuidle.c
index 41974a1a229a..42e19fff40ee 100644
--- a/arch/arm64/kernel/cpuidle.c
+++ b/arch/arm64/kernel/cpuidle.c
@@ -67,10 +67,10 @@ __cpuidle int acpi_processor_ffh_lpi_enter(struct acpi_lpi_state *lpi)
u32 state = lpi->address;

if (ARM64_LPI_IS_RETENTION_STATE(lpi->arch_flags))
- return CPU_PM_CPU_IDLE_ENTER_RETENTION_PARAM(psci_cpu_suspend_enter,
+ return CPU_PM_CPU_IDLE_ENTER_RETENTION_PARAM_RCU(psci_cpu_suspend_enter,
lpi->index, state);
else
- return CPU_PM_CPU_IDLE_ENTER_PARAM(psci_cpu_suspend_enter,
+ return CPU_PM_CPU_IDLE_ENTER_PARAM_RCU(psci_cpu_suspend_enter,
lpi->index, state);
}
#endif
diff --git a/arch/arm64/kernel/suspend.c b/arch/arm64/kernel/suspend.c
index e7163f31f716..0fbdf5fe64d8 100644
--- a/arch/arm64/kernel/suspend.c
+++ b/arch/arm64/kernel/suspend.c
@@ -4,6 +4,7 @@
#include <linux/slab.h>
#include <linux/uaccess.h>
#include <linux/pgtable.h>
+#include <linux/cpuidle.h>
#include <asm/alternative.h>
#include <asm/cacheflush.h>
#include <asm/cpufeature.h>
@@ -104,6 +105,10 @@ int cpu_suspend(unsigned long arg, int (*fn)(unsigned long))
* From this point debug exceptions are disabled to prevent
* updates to mdscr register (saved and restored along with
* general purpose registers) from kernel debuggers.
+ *
+ * Strictly speaking the trace_hardirqs_off() here is superfluous,
+ * hardirqs should be firmly off by now. This really ought to use
+ * something like raw_local_daif_save().
*/
flags = local_daif_save();

@@ -120,6 +125,8 @@ int cpu_suspend(unsigned long arg, int (*fn)(unsigned long))
*/
arm_cpuidle_save_irq_context(&context);

+ ct_cpuidle_enter();
+
if (__cpu_suspend_enter(&state)) {
/* Call the suspend finisher */
ret = fn(arg);
@@ -133,8 +140,11 @@ int cpu_suspend(unsigned long arg, int (*fn)(unsigned long))
*/
if (!ret)
ret = -EOPNOTSUPP;
+
+ ct_cpuidle_exit();
} else {
- RCU_NONIDLE(__cpu_suspend_exit());
+ ct_cpuidle_exit();
+ __cpu_suspend_exit();
}

arm_cpuidle_restore_irq_context(&context);
diff --git a/drivers/cpuidle/cpuidle-psci.c b/drivers/cpuidle/cpuidle-psci.c
index 4fc4e0381944..312a34ef28dc 100644
--- a/drivers/cpuidle/cpuidle-psci.c
+++ b/drivers/cpuidle/cpuidle-psci.c
@@ -69,16 +69,12 @@ static __cpuidle int __psci_enter_domain_idle_state(struct cpuidle_device *dev,
else
pm_runtime_put_sync_suspend(pd_dev);

- ct_cpuidle_enter();
-
state = psci_get_domain_state();
if (!state)
state = states[idx];

ret = psci_cpu_suspend_enter(state) ? -1 : idx;

- ct_cpuidle_exit();
-
if (s2idle)
dev_pm_genpd_resume(pd_dev);
else
@@ -192,7 +188,7 @@ static __cpuidle int psci_enter_idle_state(struct cpuidle_device *dev,
{
u32 *state = __this_cpu_read(psci_cpuidle_data.psci_states);

- return CPU_PM_CPU_IDLE_ENTER_PARAM(psci_cpu_suspend_enter, idx, state[idx]);
+ return CPU_PM_CPU_IDLE_ENTER_PARAM_RCU(psci_cpu_suspend_enter, idx, state[idx]);
}

static const struct of_device_id psci_idle_state_match[] = {
diff --git a/drivers/firmware/psci/psci.c b/drivers/firmware/psci/psci.c
index e7bcfca4159f..f3a044fa4652 100644
--- a/drivers/firmware/psci/psci.c
+++ b/drivers/firmware/psci/psci.c
@@ -462,11 +462,22 @@ int psci_cpu_suspend_enter(u32 state)
if (!psci_power_state_loses_context(state)) {
struct arm_cpuidle_irq_context context;

+ ct_cpuidle_enter();
arm_cpuidle_save_irq_context(&context);
ret = psci_ops.cpu_suspend(state, 0);
arm_cpuidle_restore_irq_context(&context);
+ ct_cpuidle_exit();
} else {
+ /*
+ * ARM64 cpu_suspend() wants to do ct_cpuidle_*() itself.
+ */
+ if (!IS_ENABLED(CONFIG_ARM64))
+ ct_cpuidle_enter();
+
ret = cpu_suspend(state, psci_suspend_finisher);
+
+ if (!IS_ENABLED(CONFIG_ARM64))
+ ct_cpuidle_exit();
}

return ret;
diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h
index 630c879143c7..3183aeb7f5b4 100644
--- a/include/linux/cpuidle.h
+++ b/include/linux/cpuidle.h
@@ -307,7 +307,7 @@ extern s64 cpuidle_governor_latency_req(unsigned int cpu);
#define __CPU_PM_CPU_IDLE_ENTER(low_level_idle_enter, \
idx, \
state, \
- is_retention) \
+ is_retention, is_rcu) \
({ \
int __ret = 0; \
\
@@ -319,9 +319,11 @@ extern s64 cpuidle_governor_latency_req(unsigned int cpu);
if (!is_retention) \
__ret = cpu_pm_enter(); \
if (!__ret) { \
- ct_cpuidle_enter(); \
+ if (!is_rcu) \
+ ct_cpuidle_enter(); \
__ret = low_level_idle_enter(state); \
- ct_cpuidle_exit(); \
+ if (!is_rcu) \
+ ct_cpuidle_exit(); \
if (!is_retention) \
cpu_pm_exit(); \
} \
@@ -330,15 +332,21 @@ extern s64 cpuidle_governor_latency_req(unsigned int cpu);
})

#define CPU_PM_CPU_IDLE_ENTER(low_level_idle_enter, idx) \
- __CPU_PM_CPU_IDLE_ENTER(low_level_idle_enter, idx, idx, 0)
+ __CPU_PM_CPU_IDLE_ENTER(low_level_idle_enter, idx, idx, 0, 0)

#define CPU_PM_CPU_IDLE_ENTER_RETENTION(low_level_idle_enter, idx) \
- __CPU_PM_CPU_IDLE_ENTER(low_level_idle_enter, idx, idx, 1)
+ __CPU_PM_CPU_IDLE_ENTER(low_level_idle_enter, idx, idx, 1, 0)

#define CPU_PM_CPU_IDLE_ENTER_PARAM(low_level_idle_enter, idx, state) \
- __CPU_PM_CPU_IDLE_ENTER(low_level_idle_enter, idx, state, 0)
+ __CPU_PM_CPU_IDLE_ENTER(low_level_idle_enter, idx, state, 0, 0)
+
+#define CPU_PM_CPU_IDLE_ENTER_PARAM_RCU(low_level_idle_enter, idx, state) \
+ __CPU_PM_CPU_IDLE_ENTER(low_level_idle_enter, idx, state, 0, 1)

#define CPU_PM_CPU_IDLE_ENTER_RETENTION_PARAM(low_level_idle_enter, idx, state) \
- __CPU_PM_CPU_IDLE_ENTER(low_level_idle_enter, idx, state, 1)
+ __CPU_PM_CPU_IDLE_ENTER(low_level_idle_enter, idx, state, 1, 0)
+
+#define CPU_PM_CPU_IDLE_ENTER_RETENTION_PARAM_RCU(low_level_idle_enter, idx, state) \
+ __CPU_PM_CPU_IDLE_ENTER(low_level_idle_enter, idx, state, 1, 1)

#endif /* _LINUX_CPUIDLE_H */

2023-01-17 13:15:08

by Sudeep Holla

[permalink] [raw]
Subject: Re: [PATCH v3 00/51] cpuidle,rcu: Clean up the mess

On Tue, Jan 17, 2023 at 11:26:29AM +0100, Peter Zijlstra wrote:
> On Mon, Jan 16, 2023 at 04:59:04PM +0000, Mark Rutland wrote:
>
> > I'm sorry to have to bear some bad news on that front. :(
>
> Moo, something had to give..
>
>
> > IIUC what's happenign here is the PSCI cpuidle driver has entered idle and RCU
> > is no longer watching when arm64's cpu_suspend() manipulates DAIF. Our
> > local_daif_*() helpers poke lockdep and tracing, hence the call to
> > trace_hardirqs_off() and the RCU usage.
>
> Right, strictly speaking not needed at this point, IRQs should have been
> traced off a long time ago.
>
> > I think we need RCU to be watching all the way down to cpu_suspend(), and it's
> > cpu_suspend() that should actually enter/exit idle context. That and we need to
> > make cpu_suspend() and the low-level PSCI invocation noinstr.
> >
> > I'm not sure whether 32-bit will have a similar issue or not.
>
> I'm not seeing 32bit or Risc-V have similar issues here, but who knows,
> maybe I missed somsething.
>
> In any case, the below ought to cure the ARM64 case and remove that last
> known RCU_NONIDLE() user as a bonus.
>

Thanks for the fix. I tested the series and did observe the same splat
with both DT and ACPI boot(they enter idle in different code paths). Thanks
to Mark for reminding me about ACPI. With this fix, I see the splat is
gone in both DT(cpuidle-psci.c) and ACPI(acpi_processor_idle.c).

You can add:

Tested-by: Sudeep Holla <[email protected]>

--
Regards,
Sudeep

2023-01-17 13:28:10

by Mark Rutland

[permalink] [raw]
Subject: Re: [PATCH v3 00/51] cpuidle,rcu: Clean up the mess

On Tue, Jan 17, 2023 at 11:26:29AM +0100, Peter Zijlstra wrote:
> On Mon, Jan 16, 2023 at 04:59:04PM +0000, Mark Rutland wrote:
>
> > I'm sorry to have to bear some bad news on that front. :(
>
> Moo, something had to give..
>
>
> > IIUC what's happenign here is the PSCI cpuidle driver has entered idle and RCU
> > is no longer watching when arm64's cpu_suspend() manipulates DAIF. Our
> > local_daif_*() helpers poke lockdep and tracing, hence the call to
> > trace_hardirqs_off() and the RCU usage.
>
> Right, strictly speaking not needed at this point, IRQs should have been
> traced off a long time ago.

True, but there are some other calls around here that *might* end up invoking
RCU stuff (e.g. the MTE code).

That all needs a noinstr cleanup too, which I'll sort out as a follow-up.

> > I think we need RCU to be watching all the way down to cpu_suspend(), and it's
> > cpu_suspend() that should actually enter/exit idle context. That and we need to
> > make cpu_suspend() and the low-level PSCI invocation noinstr.
> >
> > I'm not sure whether 32-bit will have a similar issue or not.
>
> I'm not seeing 32bit or Risc-V have similar issues here, but who knows,
> maybe I missed somsething.

I reckon if they do, the core changes here give us the infrastructure to fix
them if/when we get reports.

> In any case, the below ought to cure the ARM64 case and remove that last
> known RCU_NONIDLE() user as a bonus.

The below works for me testing on a Juno R1 board with PSCI, using defconfig +
CONFIG_PROVE_LOCKING=y + CONFIG_DEBUG_LOCKDEP=y + CONFIG_DEBUG_ATOMIC_SLEEP=y.
I'm not sure how to test the LPI / FFH part, but it looks good to me.

FWIW:

Reviewed-by: Mark Rutland <[email protected]>
Tested-by: Mark Rutland <[email protected]>

Sudeep, would you be able to give the LPI/FFH side a spin with the kconfig
options above?

Thanks,
Mark.

>
> ---
> diff --git a/arch/arm64/kernel/cpuidle.c b/arch/arm64/kernel/cpuidle.c
> index 41974a1a229a..42e19fff40ee 100644
> --- a/arch/arm64/kernel/cpuidle.c
> +++ b/arch/arm64/kernel/cpuidle.c
> @@ -67,10 +67,10 @@ __cpuidle int acpi_processor_ffh_lpi_enter(struct acpi_lpi_state *lpi)
> u32 state = lpi->address;
>
> if (ARM64_LPI_IS_RETENTION_STATE(lpi->arch_flags))
> - return CPU_PM_CPU_IDLE_ENTER_RETENTION_PARAM(psci_cpu_suspend_enter,
> + return CPU_PM_CPU_IDLE_ENTER_RETENTION_PARAM_RCU(psci_cpu_suspend_enter,
> lpi->index, state);
> else
> - return CPU_PM_CPU_IDLE_ENTER_PARAM(psci_cpu_suspend_enter,
> + return CPU_PM_CPU_IDLE_ENTER_PARAM_RCU(psci_cpu_suspend_enter,
> lpi->index, state);
> }
> #endif
> diff --git a/arch/arm64/kernel/suspend.c b/arch/arm64/kernel/suspend.c
> index e7163f31f716..0fbdf5fe64d8 100644
> --- a/arch/arm64/kernel/suspend.c
> +++ b/arch/arm64/kernel/suspend.c
> @@ -4,6 +4,7 @@
> #include <linux/slab.h>
> #include <linux/uaccess.h>
> #include <linux/pgtable.h>
> +#include <linux/cpuidle.h>
> #include <asm/alternative.h>
> #include <asm/cacheflush.h>
> #include <asm/cpufeature.h>
> @@ -104,6 +105,10 @@ int cpu_suspend(unsigned long arg, int (*fn)(unsigned long))
> * From this point debug exceptions are disabled to prevent
> * updates to mdscr register (saved and restored along with
> * general purpose registers) from kernel debuggers.
> + *
> + * Strictly speaking the trace_hardirqs_off() here is superfluous,
> + * hardirqs should be firmly off by now. This really ought to use
> + * something like raw_local_daif_save().
> */
> flags = local_daif_save();
>
> @@ -120,6 +125,8 @@ int cpu_suspend(unsigned long arg, int (*fn)(unsigned long))
> */
> arm_cpuidle_save_irq_context(&context);
>
> + ct_cpuidle_enter();
> +
> if (__cpu_suspend_enter(&state)) {
> /* Call the suspend finisher */
> ret = fn(arg);
> @@ -133,8 +140,11 @@ int cpu_suspend(unsigned long arg, int (*fn)(unsigned long))
> */
> if (!ret)
> ret = -EOPNOTSUPP;
> +
> + ct_cpuidle_exit();
> } else {
> - RCU_NONIDLE(__cpu_suspend_exit());
> + ct_cpuidle_exit();
> + __cpu_suspend_exit();
> }
>
> arm_cpuidle_restore_irq_context(&context);
> diff --git a/drivers/cpuidle/cpuidle-psci.c b/drivers/cpuidle/cpuidle-psci.c
> index 4fc4e0381944..312a34ef28dc 100644
> --- a/drivers/cpuidle/cpuidle-psci.c
> +++ b/drivers/cpuidle/cpuidle-psci.c
> @@ -69,16 +69,12 @@ static __cpuidle int __psci_enter_domain_idle_state(struct cpuidle_device *dev,
> else
> pm_runtime_put_sync_suspend(pd_dev);
>
> - ct_cpuidle_enter();
> -
> state = psci_get_domain_state();
> if (!state)
> state = states[idx];
>
> ret = psci_cpu_suspend_enter(state) ? -1 : idx;
>
> - ct_cpuidle_exit();
> -
> if (s2idle)
> dev_pm_genpd_resume(pd_dev);
> else
> @@ -192,7 +188,7 @@ static __cpuidle int psci_enter_idle_state(struct cpuidle_device *dev,
> {
> u32 *state = __this_cpu_read(psci_cpuidle_data.psci_states);
>
> - return CPU_PM_CPU_IDLE_ENTER_PARAM(psci_cpu_suspend_enter, idx, state[idx]);
> + return CPU_PM_CPU_IDLE_ENTER_PARAM_RCU(psci_cpu_suspend_enter, idx, state[idx]);
> }
>
> static const struct of_device_id psci_idle_state_match[] = {
> diff --git a/drivers/firmware/psci/psci.c b/drivers/firmware/psci/psci.c
> index e7bcfca4159f..f3a044fa4652 100644
> --- a/drivers/firmware/psci/psci.c
> +++ b/drivers/firmware/psci/psci.c
> @@ -462,11 +462,22 @@ int psci_cpu_suspend_enter(u32 state)
> if (!psci_power_state_loses_context(state)) {
> struct arm_cpuidle_irq_context context;
>
> + ct_cpuidle_enter();
> arm_cpuidle_save_irq_context(&context);
> ret = psci_ops.cpu_suspend(state, 0);
> arm_cpuidle_restore_irq_context(&context);
> + ct_cpuidle_exit();
> } else {
> + /*
> + * ARM64 cpu_suspend() wants to do ct_cpuidle_*() itself.
> + */
> + if (!IS_ENABLED(CONFIG_ARM64))
> + ct_cpuidle_enter();
> +
> ret = cpu_suspend(state, psci_suspend_finisher);
> +
> + if (!IS_ENABLED(CONFIG_ARM64))
> + ct_cpuidle_exit();
> }
>
> return ret;
> diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h
> index 630c879143c7..3183aeb7f5b4 100644
> --- a/include/linux/cpuidle.h
> +++ b/include/linux/cpuidle.h
> @@ -307,7 +307,7 @@ extern s64 cpuidle_governor_latency_req(unsigned int cpu);
> #define __CPU_PM_CPU_IDLE_ENTER(low_level_idle_enter, \
> idx, \
> state, \
> - is_retention) \
> + is_retention, is_rcu) \
> ({ \
> int __ret = 0; \
> \
> @@ -319,9 +319,11 @@ extern s64 cpuidle_governor_latency_req(unsigned int cpu);
> if (!is_retention) \
> __ret = cpu_pm_enter(); \
> if (!__ret) { \
> - ct_cpuidle_enter(); \
> + if (!is_rcu) \
> + ct_cpuidle_enter(); \
> __ret = low_level_idle_enter(state); \
> - ct_cpuidle_exit(); \
> + if (!is_rcu) \
> + ct_cpuidle_exit(); \
> if (!is_retention) \
> cpu_pm_exit(); \
> } \
> @@ -330,15 +332,21 @@ extern s64 cpuidle_governor_latency_req(unsigned int cpu);
> })
>
> #define CPU_PM_CPU_IDLE_ENTER(low_level_idle_enter, idx) \
> - __CPU_PM_CPU_IDLE_ENTER(low_level_idle_enter, idx, idx, 0)
> + __CPU_PM_CPU_IDLE_ENTER(low_level_idle_enter, idx, idx, 0, 0)
>
> #define CPU_PM_CPU_IDLE_ENTER_RETENTION(low_level_idle_enter, idx) \
> - __CPU_PM_CPU_IDLE_ENTER(low_level_idle_enter, idx, idx, 1)
> + __CPU_PM_CPU_IDLE_ENTER(low_level_idle_enter, idx, idx, 1, 0)
>
> #define CPU_PM_CPU_IDLE_ENTER_PARAM(low_level_idle_enter, idx, state) \
> - __CPU_PM_CPU_IDLE_ENTER(low_level_idle_enter, idx, state, 0)
> + __CPU_PM_CPU_IDLE_ENTER(low_level_idle_enter, idx, state, 0, 0)
> +
> +#define CPU_PM_CPU_IDLE_ENTER_PARAM_RCU(low_level_idle_enter, idx, state) \
> + __CPU_PM_CPU_IDLE_ENTER(low_level_idle_enter, idx, state, 0, 1)
>
> #define CPU_PM_CPU_IDLE_ENTER_RETENTION_PARAM(low_level_idle_enter, idx, state) \
> - __CPU_PM_CPU_IDLE_ENTER(low_level_idle_enter, idx, state, 1)
> + __CPU_PM_CPU_IDLE_ENTER(low_level_idle_enter, idx, state, 1, 0)
> +
> +#define CPU_PM_CPU_IDLE_ENTER_RETENTION_PARAM_RCU(low_level_idle_enter, idx, state) \
> + __CPU_PM_CPU_IDLE_ENTER(low_level_idle_enter, idx, state, 1, 1)
>
> #endif /* _LINUX_CPUIDLE_H */

2023-01-17 15:00:26

by Sudeep Holla

[permalink] [raw]
Subject: Re: [PATCH v3 00/51] cpuidle,rcu: Clean up the mess

On Tue, Jan 17, 2023 at 01:16:21PM +0000, Mark Rutland wrote:
> On Tue, Jan 17, 2023 at 11:26:29AM +0100, Peter Zijlstra wrote:
> > On Mon, Jan 16, 2023 at 04:59:04PM +0000, Mark Rutland wrote:
> >
> > > I'm sorry to have to bear some bad news on that front. :(
> >
> > Moo, something had to give..
> >
> >
> > > IIUC what's happenign here is the PSCI cpuidle driver has entered idle and RCU
> > > is no longer watching when arm64's cpu_suspend() manipulates DAIF. Our
> > > local_daif_*() helpers poke lockdep and tracing, hence the call to
> > > trace_hardirqs_off() and the RCU usage.
> >
> > Right, strictly speaking not needed at this point, IRQs should have been
> > traced off a long time ago.
>
> True, but there are some other calls around here that *might* end up invoking
> RCU stuff (e.g. the MTE code).
>
> That all needs a noinstr cleanup too, which I'll sort out as a follow-up.
>
> > > I think we need RCU to be watching all the way down to cpu_suspend(), and it's
> > > cpu_suspend() that should actually enter/exit idle context. That and we need to
> > > make cpu_suspend() and the low-level PSCI invocation noinstr.
> > >
> > > I'm not sure whether 32-bit will have a similar issue or not.
> >
> > I'm not seeing 32bit or Risc-V have similar issues here, but who knows,
> > maybe I missed somsething.
>
> I reckon if they do, the core changes here give us the infrastructure to fix
> them if/when we get reports.
>
> > In any case, the below ought to cure the ARM64 case and remove that last
> > known RCU_NONIDLE() user as a bonus.
>
> The below works for me testing on a Juno R1 board with PSCI, using defconfig +
> CONFIG_PROVE_LOCKING=y + CONFIG_DEBUG_LOCKDEP=y + CONFIG_DEBUG_ATOMIC_SLEEP=y.
> I'm not sure how to test the LPI / FFH part, but it looks good to me.
>
> FWIW:
>
> Reviewed-by: Mark Rutland <[email protected]>
> Tested-by: Mark Rutland <[email protected]>
>
> Sudeep, would you be able to give the LPI/FFH side a spin with the kconfig
> options above?
>

Not sure if I have messed up something in my mail setup, but I did reply
earlier. I did test both DT/cpuidle-psci driver and ACPI/LPI+FFH driver
with the fix Peter sent. I was seeing same splat as you in both DT and
ACPI boot which the patch fixed it. I used the same config as described by
you above.

--
Regards,
Sudeep

2023-01-17 15:42:42

by Mark Rutland

[permalink] [raw]
Subject: Re: [PATCH v3 00/51] cpuidle,rcu: Clean up the mess

On Tue, Jan 17, 2023 at 02:21:40PM +0000, Sudeep Holla wrote:
> On Tue, Jan 17, 2023 at 01:16:21PM +0000, Mark Rutland wrote:
> > On Tue, Jan 17, 2023 at 11:26:29AM +0100, Peter Zijlstra wrote:
> > > On Mon, Jan 16, 2023 at 04:59:04PM +0000, Mark Rutland wrote:
> > >
> > > > I'm sorry to have to bear some bad news on that front. :(
> > >
> > > Moo, something had to give..
> > >
> > >
> > > > IIUC what's happenign here is the PSCI cpuidle driver has entered idle and RCU
> > > > is no longer watching when arm64's cpu_suspend() manipulates DAIF. Our
> > > > local_daif_*() helpers poke lockdep and tracing, hence the call to
> > > > trace_hardirqs_off() and the RCU usage.
> > >
> > > Right, strictly speaking not needed at this point, IRQs should have been
> > > traced off a long time ago.
> >
> > True, but there are some other calls around here that *might* end up invoking
> > RCU stuff (e.g. the MTE code).
> >
> > That all needs a noinstr cleanup too, which I'll sort out as a follow-up.
> >
> > > > I think we need RCU to be watching all the way down to cpu_suspend(), and it's
> > > > cpu_suspend() that should actually enter/exit idle context. That and we need to
> > > > make cpu_suspend() and the low-level PSCI invocation noinstr.
> > > >
> > > > I'm not sure whether 32-bit will have a similar issue or not.
> > >
> > > I'm not seeing 32bit or Risc-V have similar issues here, but who knows,
> > > maybe I missed somsething.
> >
> > I reckon if they do, the core changes here give us the infrastructure to fix
> > them if/when we get reports.
> >
> > > In any case, the below ought to cure the ARM64 case and remove that last
> > > known RCU_NONIDLE() user as a bonus.
> >
> > The below works for me testing on a Juno R1 board with PSCI, using defconfig +
> > CONFIG_PROVE_LOCKING=y + CONFIG_DEBUG_LOCKDEP=y + CONFIG_DEBUG_ATOMIC_SLEEP=y.
> > I'm not sure how to test the LPI / FFH part, but it looks good to me.
> >
> > FWIW:
> >
> > Reviewed-by: Mark Rutland <[email protected]>
> > Tested-by: Mark Rutland <[email protected]>
> >
> > Sudeep, would you be able to give the LPI/FFH side a spin with the kconfig
> > options above?
> >
>
> Not sure if I have messed up something in my mail setup, but I did reply
> earlier.

Sorry, that was my bad; I had been drafting my reply for a while and forgot to
re-check prior to sending.

> I did test both DT/cpuidle-psci driver and ACPI/LPI+FFH driver
> with the fix Peter sent. I was seeing same splat as you in both DT and
> ACPI boot which the patch fixed it. I used the same config as described by
> you above.

Perfect; thanks!

Mark.

Subject: [tip: sched/core] cpuidle, arm64: Fix the ARM64 cpuidle logic

The following commit has been merged into the sched/core branch of tip:

Commit-ID: 19235e47279894b033a3ec5cf2732de634862b3a
Gitweb: https://git.kernel.org/tip/19235e47279894b033a3ec5cf2732de634862b3a
Author: Peter Zijlstra <[email protected]>
AuthorDate: Tue, 17 Jan 2023 11:26:29 +01:00
Committer: Ingo Molnar <[email protected]>
CommitterDate: Wed, 18 Jan 2023 12:27:17 +01:00

cpuidle, arm64: Fix the ARM64 cpuidle logic

The recent cpuidle changes started triggering RCU splats on
Juno development boards:

| =============================
| WARNING: suspicious RCU usage
| -----------------------------
| include/trace/events/ipi.h:19 suspicious rcu_dereference_check() usage!

Fix cpuidle on ARM64:

- ... by introducing a new 'is_rcu' flag to the cpuidle helpers & make
ARM64 use it, as ARM64 wants to keep RCU active longer and wants to
do the ct_cpuidle_enter()/exit() dance itself.

- Also update the PSCI driver accordingly.

- This also removes the last known RCU_NONIDLE() user as a bonus.

Reported-by: Mark Rutland <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Tested-by: Sudeep Holla <[email protected]>
Tested-by: Mark Rutland <[email protected]>
Reviewed-by: Mark Rutland <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

--
---
arch/arm64/kernel/cpuidle.c | 4 ++--
arch/arm64/kernel/suspend.c | 12 +++++++++++-
drivers/cpuidle/cpuidle-psci.c | 6 +-----
drivers/firmware/psci/psci.c | 11 +++++++++++
include/linux/cpuidle.h | 22 +++++++++++++++-------
5 files changed, 40 insertions(+), 15 deletions(-)

diff --git a/arch/arm64/kernel/cpuidle.c b/arch/arm64/kernel/cpuidle.c
index 41974a1..42e19ff 100644
--- a/arch/arm64/kernel/cpuidle.c
+++ b/arch/arm64/kernel/cpuidle.c
@@ -67,10 +67,10 @@ __cpuidle int acpi_processor_ffh_lpi_enter(struct acpi_lpi_state *lpi)
u32 state = lpi->address;

if (ARM64_LPI_IS_RETENTION_STATE(lpi->arch_flags))
- return CPU_PM_CPU_IDLE_ENTER_RETENTION_PARAM(psci_cpu_suspend_enter,
+ return CPU_PM_CPU_IDLE_ENTER_RETENTION_PARAM_RCU(psci_cpu_suspend_enter,
lpi->index, state);
else
- return CPU_PM_CPU_IDLE_ENTER_PARAM(psci_cpu_suspend_enter,
+ return CPU_PM_CPU_IDLE_ENTER_PARAM_RCU(psci_cpu_suspend_enter,
lpi->index, state);
}
#endif
diff --git a/arch/arm64/kernel/suspend.c b/arch/arm64/kernel/suspend.c
index e7163f3..0fbdf5f 100644
--- a/arch/arm64/kernel/suspend.c
+++ b/arch/arm64/kernel/suspend.c
@@ -4,6 +4,7 @@
#include <linux/slab.h>
#include <linux/uaccess.h>
#include <linux/pgtable.h>
+#include <linux/cpuidle.h>
#include <asm/alternative.h>
#include <asm/cacheflush.h>
#include <asm/cpufeature.h>
@@ -104,6 +105,10 @@ int cpu_suspend(unsigned long arg, int (*fn)(unsigned long))
* From this point debug exceptions are disabled to prevent
* updates to mdscr register (saved and restored along with
* general purpose registers) from kernel debuggers.
+ *
+ * Strictly speaking the trace_hardirqs_off() here is superfluous,
+ * hardirqs should be firmly off by now. This really ought to use
+ * something like raw_local_daif_save().
*/
flags = local_daif_save();

@@ -120,6 +125,8 @@ int cpu_suspend(unsigned long arg, int (*fn)(unsigned long))
*/
arm_cpuidle_save_irq_context(&context);

+ ct_cpuidle_enter();
+
if (__cpu_suspend_enter(&state)) {
/* Call the suspend finisher */
ret = fn(arg);
@@ -133,8 +140,11 @@ int cpu_suspend(unsigned long arg, int (*fn)(unsigned long))
*/
if (!ret)
ret = -EOPNOTSUPP;
+
+ ct_cpuidle_exit();
} else {
- RCU_NONIDLE(__cpu_suspend_exit());
+ ct_cpuidle_exit();
+ __cpu_suspend_exit();
}

arm_cpuidle_restore_irq_context(&context);
diff --git a/drivers/cpuidle/cpuidle-psci.c b/drivers/cpuidle/cpuidle-psci.c
index 4fc4e03..312a34e 100644
--- a/drivers/cpuidle/cpuidle-psci.c
+++ b/drivers/cpuidle/cpuidle-psci.c
@@ -69,16 +69,12 @@ static __cpuidle int __psci_enter_domain_idle_state(struct cpuidle_device *dev,
else
pm_runtime_put_sync_suspend(pd_dev);

- ct_cpuidle_enter();
-
state = psci_get_domain_state();
if (!state)
state = states[idx];

ret = psci_cpu_suspend_enter(state) ? -1 : idx;

- ct_cpuidle_exit();
-
if (s2idle)
dev_pm_genpd_resume(pd_dev);
else
@@ -192,7 +188,7 @@ static __cpuidle int psci_enter_idle_state(struct cpuidle_device *dev,
{
u32 *state = __this_cpu_read(psci_cpuidle_data.psci_states);

- return CPU_PM_CPU_IDLE_ENTER_PARAM(psci_cpu_suspend_enter, idx, state[idx]);
+ return CPU_PM_CPU_IDLE_ENTER_PARAM_RCU(psci_cpu_suspend_enter, idx, state[idx]);
}

static const struct of_device_id psci_idle_state_match[] = {
diff --git a/drivers/firmware/psci/psci.c b/drivers/firmware/psci/psci.c
index e7bcfca..f3a044f 100644
--- a/drivers/firmware/psci/psci.c
+++ b/drivers/firmware/psci/psci.c
@@ -462,11 +462,22 @@ int psci_cpu_suspend_enter(u32 state)
if (!psci_power_state_loses_context(state)) {
struct arm_cpuidle_irq_context context;

+ ct_cpuidle_enter();
arm_cpuidle_save_irq_context(&context);
ret = psci_ops.cpu_suspend(state, 0);
arm_cpuidle_restore_irq_context(&context);
+ ct_cpuidle_exit();
} else {
+ /*
+ * ARM64 cpu_suspend() wants to do ct_cpuidle_*() itself.
+ */
+ if (!IS_ENABLED(CONFIG_ARM64))
+ ct_cpuidle_enter();
+
ret = cpu_suspend(state, psci_suspend_finisher);
+
+ if (!IS_ENABLED(CONFIG_ARM64))
+ ct_cpuidle_exit();
}

return ret;
diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h
index 630c879..3183aeb 100644
--- a/include/linux/cpuidle.h
+++ b/include/linux/cpuidle.h
@@ -307,7 +307,7 @@ extern s64 cpuidle_governor_latency_req(unsigned int cpu);
#define __CPU_PM_CPU_IDLE_ENTER(low_level_idle_enter, \
idx, \
state, \
- is_retention) \
+ is_retention, is_rcu) \
({ \
int __ret = 0; \
\
@@ -319,9 +319,11 @@ extern s64 cpuidle_governor_latency_req(unsigned int cpu);
if (!is_retention) \
__ret = cpu_pm_enter(); \
if (!__ret) { \
- ct_cpuidle_enter(); \
+ if (!is_rcu) \
+ ct_cpuidle_enter(); \
__ret = low_level_idle_enter(state); \
- ct_cpuidle_exit(); \
+ if (!is_rcu) \
+ ct_cpuidle_exit(); \
if (!is_retention) \
cpu_pm_exit(); \
} \
@@ -330,15 +332,21 @@ extern s64 cpuidle_governor_latency_req(unsigned int cpu);
})

#define CPU_PM_CPU_IDLE_ENTER(low_level_idle_enter, idx) \
- __CPU_PM_CPU_IDLE_ENTER(low_level_idle_enter, idx, idx, 0)
+ __CPU_PM_CPU_IDLE_ENTER(low_level_idle_enter, idx, idx, 0, 0)

#define CPU_PM_CPU_IDLE_ENTER_RETENTION(low_level_idle_enter, idx) \
- __CPU_PM_CPU_IDLE_ENTER(low_level_idle_enter, idx, idx, 1)
+ __CPU_PM_CPU_IDLE_ENTER(low_level_idle_enter, idx, idx, 1, 0)

#define CPU_PM_CPU_IDLE_ENTER_PARAM(low_level_idle_enter, idx, state) \
- __CPU_PM_CPU_IDLE_ENTER(low_level_idle_enter, idx, state, 0)
+ __CPU_PM_CPU_IDLE_ENTER(low_level_idle_enter, idx, state, 0, 0)
+
+#define CPU_PM_CPU_IDLE_ENTER_PARAM_RCU(low_level_idle_enter, idx, state) \
+ __CPU_PM_CPU_IDLE_ENTER(low_level_idle_enter, idx, state, 0, 1)

#define CPU_PM_CPU_IDLE_ENTER_RETENTION_PARAM(low_level_idle_enter, idx, state) \
- __CPU_PM_CPU_IDLE_ENTER(low_level_idle_enter, idx, state, 1)
+ __CPU_PM_CPU_IDLE_ENTER(low_level_idle_enter, idx, state, 1, 0)
+
+#define CPU_PM_CPU_IDLE_ENTER_RETENTION_PARAM_RCU(low_level_idle_enter, idx, state) \
+ __CPU_PM_CPU_IDLE_ENTER(low_level_idle_enter, idx, state, 1, 1)

#endif /* _LINUX_CPUIDLE_H */