2015-02-03 10:01:52

by Krzysztof Kozlowski

Subject: Re: [rcu] [ INFO: suspicious RCU usage. ]

On Sat, 2015-01-31 at 18:59 -0800, Fengguang Wu wrote:
> Greetings,
>
> 0day kernel testing robot got the below dmesg and the first bad commit is
>
> git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git revert-c418b8035fac0cc7d242e5de126cec1006a34bed-dd2b39be8eee9d175c7842c30e405a5cbe50095a

On next-20150203 I hit a similar error on ARM/Exynos4412 (Trats2 board)
while suspending to RAM:

[ 30.986262] PM: Syncing filesystems ... done.
[ 30.994661] PM: Preparing system for mem sleep
[ 31.002064] Freezing user space processes ... (elapsed 0.002 seconds) done.
[ 31.008629] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
[ 31.016325] PM: Entering mem sleep
[ 31.016338] Suspending console(s) (use no_console_suspend to debug)
[ 31.051009] random: nonblocking pool is initialized
[ 31.085811] wake enabled for irq 102
[ 31.086964] wake enabled for irq 123
[ 31.086972] wake enabled for irq 124
[ 31.090409] PM: suspend of devices complete after 59.684 msecs
[ 31.090524] CAM_ISP_CORE_1.2V: No configuration
[ 31.090534] VMEM_VDDF_3.0V: No configuration
[ 31.090543] VCC_SUB_2.0V: No configuration
[ 31.090552] VCC_SUB_1.35V: No configuration
[ 31.090562] VMEM_1.2V_AP: No configuration
[ 31.090587] MOTOR_VCC_3.0V: No configuration
[ 31.090596] LCD_VCC_3.3V: No configuration
[ 31.090605] TSP_VDD_1.8V: No configuration
[ 31.090614] TSP_AVDD_3.3V: No configuration
[ 31.090623] VMEM_VDD_2.8V: No configuration
[ 31.090631] VTF_2.8V: No configuration
[ 31.090640] VDDQ_PRE_1.8V: No configuration
[ 31.090649] VT_CAM_1.8V: No configuration
[ 31.090658] CAM_ISP_SEN_IO_1.8V: No configuration
[ 31.090667] CAM_SENSOR_CORE_1.2V: No configuration
[ 31.090677] VHSIC_1.8V: No configuration
[ 31.090685] VHSIC_1.0V: No configuration
[ 31.090694] VABB2_1.95V: No configuration
[ 31.090703] NFC_AVDD_1.8V: No configuration
[ 31.090712] VUOTG_3.0V: No configuration
[ 31.090721] VABB1_1.95V: No configuration
[ 31.090730] VMIPI_1.8V: No configuration
[ 31.090739] CAM_ISP_MIPI_1.2V: No configuration
[ 31.090747] VMIPI_1.0V: No configuration
[ 31.090756] VPLL_1.0V_AP: No configuration
[ 31.090765] VMPLL_1.0V_AP: No configuration
[ 31.090773] VCC_1.8V_IO: No configuration
[ 31.090782] VCC_2.8V_AP: No configuration
[ 31.090791] VCC_1.8V_AP: No configuration
[ 31.090800] VM1M2_1.2V_AP: No configuration
[ 31.090809] VALIVE_1.0V_AP: No configuration
[ 31.100297] PM: late suspend of devices complete after 9.445 msecs
[ 31.108891] PM: noirq suspend of devices complete after 8.577 msecs
[ 31.109052] Disabling non-boot CPUs ...
[ 31.113921]
[ 31.113925] ===============================
[ 31.113928] [ INFO: suspicious RCU usage. ]
[ 31.113935] 3.19.0-rc7-next-20150203 #1914 Not tainted
[ 31.113938] -------------------------------
[ 31.113943] kernel/sched/fair.c:4740 suspicious rcu_dereference_check() usage!
[ 31.113946]
[ 31.113946] other info that might help us debug this:
[ 31.113946]
[ 31.113952]
[ 31.113952] RCU used illegally from offline CPU!
[ 31.113952] rcu_scheduler_active = 1, debug_locks = 0
[ 31.113957] 3 locks held by swapper/1/0:
[ 31.113988] #0: ((cpu_died).wait.lock){......}, at: [<c005a114>] complete+0x14/0x44
[ 31.114012] #1: (&p->pi_lock){-.-.-.}, at: [<c004a790>] try_to_wake_up+0x28/0x300
[ 31.114035] #2: (rcu_read_lock){......}, at: [<c004f1b8>] select_task_rq_fair+0x5c/0xa04
[ 31.114038]
[ 31.114038] stack backtrace:
[ 31.114046] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.19.0-rc7-next-20150203 #1914
[ 31.114050] Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
[ 31.114076] [<c0014ce4>] (unwind_backtrace) from [<c0011c30>] (show_stack+0x10/0x14)
[ 31.114091] [<c0011c30>] (show_stack) from [<c04dc048>] (dump_stack+0x70/0xbc)
[ 31.114105] [<c04dc048>] (dump_stack) from [<c004f83c>] (select_task_rq_fair+0x6e0/0xa04)
[ 31.114118] [<c004f83c>] (select_task_rq_fair) from [<c004a83c>] (try_to_wake_up+0xd4/0x300)
[ 31.114129] [<c004a83c>] (try_to_wake_up) from [<c00598a0>] (__wake_up_common+0x4c/0x80)
[ 31.114140] [<c00598a0>] (__wake_up_common) from [<c00598e8>] (__wake_up_locked+0x14/0x1c)
[ 31.114150] [<c00598e8>] (__wake_up_locked) from [<c005a134>] (complete+0x34/0x44)
[ 31.114167] [<c005a134>] (complete) from [<c04d6ca4>] (cpu_die+0x24/0x84)
[ 31.114179] [<c04d6ca4>] (cpu_die) from [<c005a508>] (cpu_startup_entry+0x328/0x358)
[ 31.114189] [<c005a508>] (cpu_startup_entry) from [<40008784>] (0x40008784)
[ 31.114226] CPU1: shutdown
[ 31.132479] CPU2: shutdown
[ 31.146815] CPU3: shutdown
[ 31.160767] Enabling non-boot CPUs ...
[ 31.175645] CPU1 is up
[ 31.191120] CPU2 is up
[ 31.206650] CPU3 is up
[ 31.206922] s3c-i2c 13860000.i2c: slave address 0x10
[ 31.206935] s3c-i2c 13860000.i2c: bus frequency set to 390 KHz
[ 31.206952] s3c-i2c 13890000.i2c: slave address 0x10
[ 31.206962] s3c-i2c 13890000.i2c: bus frequency set to 390 KHz
[ 31.206978] s3c-i2c 138d0000.i2c: slave address 0x10
[ 31.206987] s3c-i2c 138d0000.i2c: bus frequency set to 97 KHz
[ 31.209201] PM: noirq resume of devices complete after 2.539 msecs
[ 31.212202] PM: early resume of devices complete after 2.812 msecs
[ 31.229844] Failed to resume regulators from suspend (-22)
[ 31.230915] wake disabled for irq 123
[ 31.230923] wake disabled for irq 124
[ 31.232003] wake disabled for irq 102
[ 31.259950] max77686_rtc_tm_to_data: MAX77686 RTC cannot handle the year 1970.Assume it's 2000.
[ 31.298929] mmc_host mmc1: Bus speed (slot 0) = 50000000Hz (slot req 400000Hz, actual 396825HZ div = 63)
[ 31.526729] mmc_host mmc1: Bus speed (slot 0) = 50000000Hz (slot req 52000000Hz, actual 50000000HZ div = 0)
[ 31.526976] mmc_host mmc1: Bus speed (slot 0) = 100000000Hz (slot req 52000000Hz, actual 50000000HZ div = 1)
[ 31.527207] PM: resume of devices complete after 297.352 msecs
[ 31.985665] PM: Finishing wakeup.
[ 31.988959] Restarting tasks ... done.
root@target:~#

Best regards,
Krzysztof

>
> commit dd2b39be8eee9d175c7842c30e405a5cbe50095a
> Author: Paul E. McKenney <[email protected]>
> AuthorDate: Wed Jan 28 14:42:09 2015 -0800
> Commit: Paul E. McKenney <[email protected]>
> CommitDate: Fri Jan 30 12:59:22 2015 -0800
>
> rcu: Handle outgoing CPUs on exit from idle loop
>
> This commit informs RCU of an outgoing CPU just before that CPU invokes
> arch_cpu_idle_dead() during its last pass through the idle loop (via a
> new CPU_DYING_IDLE notifier value). This change means that RCU need not
> deal with outgoing CPUs passing through the scheduler after informing
> RCU that they are no longer online. Note that removing the CPU from
> the rcu_node ->qsmaskinit bit masks is done at CPU_DYING_IDLE time,
> and orphaning callbacks is still done at CPU_DEAD time, the reason being
> that at CPU_DEAD time we have another CPU that can adopt them.
>
> Signed-off-by: Paul E. McKenney <[email protected]>
>
> +-------------------------------------+------------+------------+------------+
> | | 8c50d7da91 | dd2b39be8e | d586642522 |
> +-------------------------------------+------------+------------+------------+
> | boot_successes | 198 | 11 | 51 |
> | boot_failures | 0 | 55 | 15 |
> | INFO:suspicious_RCU_usage | 0 | 55 | 15 |
> | RCU_used_illegally_from_offline_CPU | 0 | 55 | 15 |
> | backtrace:cpu_startup_entry | 0 | 55 | 15 |
> | BUG:kernel_test_hang | 0 | 0 | 4 |
> +-------------------------------------+------------+------------+------------+
>
> [ 15.244825] numa_remove_cpu cpu 0 node 0: mask now 1
> [ 15.246713]
> [ 15.246917] ===============================
> [ 15.247424] [ INFO: suspicious RCU usage. ]
> [ 15.247964] 3.19.0-rc1-gdd2b39b #10 Not tainted
> [ 15.248531] -------------------------------
> [ 15.248586] include/trace/events/rcu.h:35 suspicious rcu_dereference_check() usage!
> [ 15.248586]
> [ 15.248586] other info that might help us debug this:
> [ 15.248586]
> [ 15.248586]
> [ 15.248586] RCU used illegally from offline CPU!
> [ 15.248586] rcu_scheduler_active = 1, debug_locks = 0
> [ 15.248586] no locks held by swapper/0/0.
> [ 15.248586]
> [ 15.248586] stack backtrace:
> [ 15.248586] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.19.0-rc1-gdd2b39b #10
> [ 15.248586] 0000000000000001 ffffffff81e03e08 ffffffff8171b89b 0000000000000000
> [ 15.248586] ffffffff81e0e580 ffffffff81e03e38 ffffffff810efec2 ffffffff81e4b140
> [ 15.248586] ffffffff81c77ba0 0000000000000002 ffffffff81e11e98 ffffffff81e03e58
> [ 15.248586] Call Trace:
> [ 15.248586] [<ffffffff8171b89b>] dump_stack+0x7f/0xa7
> [ 15.248586] [<ffffffff810efec2>] lockdep_rcu_suspicious+0x107/0x110
> [ 15.248586] [<ffffffff81111363>] trace_rcu_utilization+0x127/0x133
> [ 15.248586] [<ffffffff8111291e>] rcu_cpu_notify+0x527/0x53b
> [ 15.248586] [<ffffffff810e9722>] cpu_startup_entry+0x1dc/0x4ea
> [ 15.248586] [<ffffffff8170eb5d>] rest_init+0x159/0x15f
> [ 15.248586] [<ffffffff8237b2da>] start_kernel+0x565/0x572
> [ 15.248586] [<ffffffff8237a120>] ? early_idt_handlers+0x120/0x120
> [ 15.248586] [<ffffffff8237a4e4>] x86_64_start_reservations+0x41/0x43
> [ 15.248586] [<ffffffff8237a623>] x86_64_start_kernel+0x13d/0x14c
> [ 15.265151] CPU 0 is now offline
> [ 15.265941] debug: unmapping init [mem 0xffffffff82365000-0xffffffff82539fff]
> [ 15.266726] Write protecting the kernel read-only data: 14336k
>
> git bisect start d58664252218cfefb19709d597ff0c5d93688203 26bc420b59a38e4e6685a73345a0def461136dce --
> git bisect bad 19f7d9c2f948a4c5c7742adb16fe00920f35f302 # 13:29 23- 6 Merge 'jtkirshe-net-next/i40e-queue' into devel-roam-smoke-201501311226
> git bisect bad 2c86978183cc365003e2d6949052a30865ef8b89 # 13:33 34- 32 Merge 'wsa/i2c/for-next' into devel-roam-smoke-201501311226
> git bisect good 1ffdd3662d27b1d4d59d51bbcc104b200be63d6a # 13:37 66+ 0 Merge 'pci/pci/virtualization' into devel-roam-smoke-201501311226
> git bisect bad 0ce6ea6707a3d5ae5bfdbdc4d16ebc86cff77f5f # 13:43 32- 22 Merge 'rcu/rcu/next' into devel-roam-smoke-201501311226
> git bisect good 53805a9f2fa76294af534fb7e9f96d43f1d820eb # 13:52 66+ 0 Merge 'iio/testing' into devel-roam-smoke-201501311226
> git bisect good 78e691f4ae2d5edea0199ca802bb505b9cdced88 # 14:01 66+ 0 Merge branches 'doc.2015.01.07a', 'fixes.2015.01.15a', 'preempt.2015.01.06a', 'srcu.2015.01.06a', 'stall.2015.01.16a' and 'torture.2015.01.11a' into HEAD
> git bisect good 17366dc8dc49858ba931c4120d8de494e388d93e # 14:05 66+ 0 documentation: Update rcutree.kthread_prio for grace-period kthread use
> git bisect good 569c1500e44189136c8a9f4b5e39f0055e422b0d # 14:14 66+ 0 documentation: Update based on on-demand vmstat workers
> git bisect good 14fefdb410cf48327c972ce91deb5e98edc8671f # 14:18 66+ 0 rcu: Eliminate ->onoff_mutex from rcu_node structure
> git bisect bad dd2b39be8eee9d175c7842c30e405a5cbe50095a # 14:29 11- 55 rcu: Handle outgoing CPUs on exit from idle loop
> git bisect good 8c50d7da9124a9f1e92e13996a0a148b2431390d # 14:34 66+ 0 cpu: Make CPU-offline idle-loop transition point more precise
> # first bad commit: [dd2b39be8eee9d175c7842c30e405a5cbe50095a] rcu: Handle outgoing CPUs on exit from idle loop
> git bisect good 8c50d7da9124a9f1e92e13996a0a148b2431390d # 14:37 198+ 0 cpu: Make CPU-offline idle-loop transition point more precise
> # extra tests with DEBUG_INFO
> git bisect good dd2b39be8eee9d175c7842c30e405a5cbe50095a # 15:35 198+ 198 rcu: Handle outgoing CPUs on exit from idle loop
> # extra tests on HEAD of linux-devel/devel-roam-smoke-201501311226
> git bisect bad d58664252218cfefb19709d597ff0c5d93688203 # 15:35 0- 15 0day head guard for 'devel-roam-smoke-201501311226'
> # extra tests on tree/branch rcu/rcu/next
> git bisect bad c418b8035fac0cc7d242e5de126cec1006a34bed # 15:52 47- 21 cpu: Stop newly offlined CPU from using RCU readers
> # extra tests with first bad commit reverted
> # extra tests on tree/branch linus/master
> git bisect good 2141fd018156db0f29efb384f4d99ead23b48f18 # 16:04 198+ 0 Merge tag 'char-misc-3.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
> # extra tests on tree/branch next/master
> git bisect good 827e3bdf1bb2401c1a1e5586eb3977d228d298b2 # 16:12 198+ 0 Add linux-next specific files for 20150130
>
>
> This script may reproduce the error.
>
> ----------------------------------------------------------------------------
> #!/bin/bash
>
> kernel=$1
> initrd=quantal-core-x86_64.cgz
>
> wget --no-clobber https://github.com/fengguang/reproduce-kernel-bug/raw/master/initrd/$initrd
>
> kvm=(
> 	qemu-system-x86_64
> 	-cpu kvm64
> 	-enable-kvm
> 	-kernel $kernel
> 	-initrd $initrd
> 	-m 320
> 	-smp 2
> 	-net nic,vlan=1,model=e1000
> 	-net user,vlan=1
> 	-boot order=nc
> 	-no-reboot
> 	-watchdog i6300esb
> 	-rtc base=localtime
> 	-serial stdio
> 	-display none
> 	-monitor null
> )
>
> append=(
> 	hung_task_panic=1
> 	earlyprintk=ttyS0,115200
> 	debug
> 	apic=debug
> 	sysrq_always_enabled
> 	rcupdate.rcu_cpu_stall_timeout=100
> 	panic=-1
> 	softlockup_panic=1
> 	nmi_watchdog=panic
> 	oops=panic
> 	load_ramdisk=2
> 	prompt_ramdisk=0
> 	console=ttyS0,115200
> 	console=tty0
> 	vga=normal
> 	root=/dev/ram0
> 	rw
> 	drbd.minor_count=8
> )
>
> "${kvm[@]}" --append "${append[*]}"
> ----------------------------------------------------------------------------
>
> Thanks,
> Fengguang
> _______________________________________________
> LKP mailing list
> [email protected]


2015-02-03 16:27:24

by Paul E. McKenney

Subject: Re: [rcu] [ INFO: suspicious RCU usage. ]

On Tue, Feb 03, 2015 at 11:01:42AM +0100, Krzysztof Kozlowski wrote:
> On Sat, 2015-01-31 at 18:59 -0800, Fengguang Wu wrote:
> > Greetings,
> >
> > 0day kernel testing robot got the below dmesg and the first bad commit is
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git revert-c418b8035fac0cc7d242e5de126cec1006a34bed-dd2b39be8eee9d175c7842c30e405a5cbe50095a
>
> On next-20150203 I hit a similar error on ARM/Exynos4412 (Trats2 board)
> while suspending to RAM:

Yep, you are not supposed to be using RCU on offline CPUs, and RCU recently
got more picky about that. This could cause failures in any environment
where CPUs could get delayed by more than one jiffy, which includes pretty
much all virtualized environments.
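
To make the check concrete: lockdep-RCU's rcu_read_lock_held() also
consults rcu_lockdep_current_cpu_online(), so even a textbook read-side
section splats when run on a CPU that has already told RCU it is offline.
An illustrative sketch with hypothetical names (gp, do_something_with()),
not the kernel/sched/fair.c code in question:

	rcu_read_lock();
	p = rcu_dereference(gp);	/* "suspicious rcu_dereference_check() usage!" */
	if (p)
		do_something_with(p);	/* hypothetical consumer */
	rcu_read_unlock();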

> [ ... suspend/resume dmesg snipped; see the full log above ... ]
> [ 31.113921]
> [ 31.113925] ===============================
> [ 31.113928] [ INFO: suspicious RCU usage. ]
> [ 31.113935] 3.19.0-rc7-next-20150203 #1914 Not tainted
> [ 31.113938] -------------------------------
> [ 31.113943] kernel/sched/fair.c:4740 suspicious rcu_dereference_check() usage!
> [ 31.113946]
> [ 31.113946] other info that might help us debug this:
> [ 31.113946]
> [ 31.113952]
> [ 31.113952] RCU used illegally from offline CPU!
> [ 31.113952] rcu_scheduler_active = 1, debug_locks = 0
> [ 31.113957] 3 locks held by swapper/1/0:
> [ 31.113988] #0: ((cpu_died).wait.lock){......}, at: [<c005a114>] complete+0x14/0x44
> [ 31.114012] #1: (&p->pi_lock){-.-.-.}, at: [<c004a790>] try_to_wake_up+0x28/0x300
> [ 31.114035] #2: (rcu_read_lock){......}, at: [<c004f1b8>] select_task_rq_fair+0x5c/0xa04
> [ 31.114038]
> [ 31.114038] stack backtrace:
> [ 31.114046] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.19.0-rc7-next-20150203 #1914
> [ 31.114050] Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
> [ 31.114076] [<c0014ce4>] (unwind_backtrace) from [<c0011c30>] (show_stack+0x10/0x14)
> [ 31.114091] [<c0011c30>] (show_stack) from [<c04dc048>] (dump_stack+0x70/0xbc)
> [ 31.114105] [<c04dc048>] (dump_stack) from [<c004f83c>] (select_task_rq_fair+0x6e0/0xa04)
> [ 31.114118] [<c004f83c>] (select_task_rq_fair) from [<c004a83c>] (try_to_wake_up+0xd4/0x300)
> [ 31.114129] [<c004a83c>] (try_to_wake_up) from [<c00598a0>] (__wake_up_common+0x4c/0x80)
> [ 31.114140] [<c00598a0>] (__wake_up_common) from [<c00598e8>] (__wake_up_locked+0x14/0x1c)
> [ 31.114150] [<c00598e8>] (__wake_up_locked) from [<c005a134>] (complete+0x34/0x44)
> [ 31.114167] [<c005a134>] (complete) from [<c04d6ca4>] (cpu_die+0x24/0x84)
> [ 31.114179] [<c04d6ca4>] (cpu_die) from [<c005a508>] (cpu_startup_entry+0x328/0x358)

And so you no longer get to invoke complete() from the CPU going offline
out of the idle loop.

How would you like to handle this? One approach would be to make __cpu_die()
poll with an appropriate duty cycle. Or is there some ARM-specific approach
that could work here?

Another thing I could do would be to have an arch-specific Kconfig
variable that made ARM responsible for informing RCU that the CPU
was departing, which would allow a call such as the following to be placed
immediately after the complete():

rcu_cpu_notify(NULL, CPU_DYING_IDLE, (void *)(long)smp_processor_id());

Note: This absolutely requires that the rcu_cpu_notify() -always-
be allowed to execute!!! This will not work if there is -any- possibility
of __cpu_die() powering off the outgoing CPU before the call to
rcu_cpu_notify() returns.
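
As a sketch of where that call would sit, abbreviated from the 3.19-era
arch/arm/kernel/smp.c cpu_die() (the cache flushing and the hypothetical
Kconfig plumbing are omitted):

	void __ref cpu_die(void)
	{
		unsigned int cpu = smp_processor_id();

		idle_task_exit();
		local_irq_disable();

		/* Tell __cpu_die() that the expensive teardown is done. */
		complete(&cpu_died);

		/*
		 * Proposed addition: inform RCU that this CPU is gone.
		 * Only safe if power cannot be cut before this returns.
		 */
		rcu_cpu_notify(NULL, CPU_DYING_IDLE, (void *)(long)cpu);

		smp_ops.cpu_die(cpu);
	}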

Thoughts?

Thanx, Paul

2015-02-04 11:39:14

by Krzysztof Kozlowski

Subject: Re: [rcu] [ INFO: suspicious RCU usage. ]

+Cc some ARM people


On Tue, 2015-02-03 at 08:27 -0800, Paul E. McKenney wrote:
> On Tue, Feb 03, 2015 at 11:01:42AM +0100, Krzysztof Kozlowski wrote:
> > On Sat, 2015-01-31 at 18:59 -0800, Fengguang Wu wrote:
> > > Greetings,
> > >
> > > 0day kernel testing robot got the below dmesg and the first bad commit is
> > >
> > > git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git revert-c418b8035fac0cc7d242e5de126cec1006a34bed-dd2b39be8eee9d175c7842c30e405a5cbe50095a
> >
> > On next-20150203 I hit a similar error on ARM/Exynos4412 (Trats2 board)
> > while suspending to RAM:
>
> Yep, you are not supposed to be using RCU on offline CPUs, and RCU recently
> got more picky about that. This could cause failures in any environment
> where CPUs could get delayed by more than one jiffy, which includes pretty
> much all virtualized environments.
>
> > [ ... dmesg and RCU splat snipped; quoted in full above ... ]
>
> And so you no longer get to invoke complete() from the CPU going offline
> out of the idle loop.
>
> How would you like to handle this? One approach would be to make __cpu_die()
> poll with an appropriate duty cycle.

The polling could work but that would be somewhat reinventing the
wait/complete.

> Or is there some ARM-specific approach
> that could work here?

I am not aware of such. Anyone?

>
> Another thing I could do would be to have an arch-specific Kconfig
> variable that made ARM responsible for informing RCU that the CPU
> was departing, which would allow a call such as the following to be placed
> immediately after the complete():
>
> rcu_cpu_notify(NULL, CPU_DYING_IDLE, (void *)(long)smp_processor_id());
>
> Note: This absolutely requires that the rcu_cpu_notify() -always-
> be allowed to execute!!! This will not work if there is -any- possibility
> of __cpu_die() powering off the outgoing CPU before the call to
> rcu_cpu_notify() returns.

The problem is that __cpu_die() (waiting for the completion signal) may
cut the power of the dying CPU.

It could, however, wait for all RCU callbacks before powering down.
Would rcu_barrier() do the trick?

	rcu_barrier();
	if (!platform_cpu_kill(cpu))
		pr_err("CPU%u: unable to kill\n", cpu);

Best regards,
Krzysztof

2015-02-04 13:00:43

by Russell King - ARM Linux

Subject: Re: [rcu] [ INFO: suspicious RCU usage. ]

On Wed, Feb 04, 2015 at 12:39:07PM +0100, Krzysztof Kozlowski wrote:
> +Cc some ARM people

I wish that people would CC this list with problems seen on ARM. I'm
minded to just ignore this message because of this in the hope that by
doing so, people will learn something...

> > Another thing I could do would be to have an arch-specific Kconfig
> > variable that made ARM responsible for informing RCU that the CPU
> > was departing, which would allow a call such as the following to be placed
> > immediately after the complete():
> >
> > rcu_cpu_notify(NULL, CPU_DYING_IDLE, (void *)(long)smp_processor_id());
> >
> > Note: This absolutely requires that the rcu_cpu_notify() -always-
> > be allowed to execute!!! This will not work if there is -any- possibility
> > of __cpu_die() powering off the outgoing CPU before the call to
> > rcu_cpu_notify() returns.

Exactly, so that's not going to be possible. The completion at that
point marks the point at which power _could_ be removed from the CPU
going down.

--
FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up
according to speedtest.net.

2015-02-04 13:13:40

by Paul E. McKenney

Subject: Re: [rcu] [ INFO: suspicious RCU usage. ]

On Wed, Feb 04, 2015 at 12:39:07PM +0100, Krzysztof Kozlowski wrote:
> +Cc some ARM people
>
>
> On Tue, 2015-02-03 at 08:27 -0800, Paul E. McKenney wrote:
> > On Tue, Feb 03, 2015 at 11:01:42AM +0100, Krzysztof Kozlowski wrote:
> > > On Sat, 2015-01-31 at 18:59 -0800, Fengguang Wu wrote:
> > > > Greetings,
> > > >
> > > > 0day kernel testing robot got the below dmesg and the first bad commit is
> > > >
> > > > git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git revert-c418b8035fac0cc7d242e5de126cec1006a34bed-dd2b39be8eee9d175c7842c30e405a5cbe50095a
> > >
> > > On next-20150203 I hit a similar error on ARM/Exynos4412 (Trats2 board)
> > > while suspending to RAM:
> >
> > Yep, you are not supposed to be using RCU on offline CPUs, and RCU recently
> > got more picky about that. This could cause failures in any environment
> > where CPUs could get delayed by more than one jiffy, which includes pretty
> > much all virtualized environments.
> >
> > > [ ... dmesg and RCU splat snipped; quoted in full above ... ]
> >
> > And so you no longer get to invoke complete() from the CPU going offline
> > out of the idle loop.
> >
> > How would you like to handle this? One approach would be to make __cpu_die()
> > poll with an appropriate duty cycle.
>
> The polling could work but that would be somewhat reinventing the
> wait/complete.

Yeah, well, the CPU has reached a point in the offline process where it
cannot use the scheduler, so...

> > Or is there some ARM-specific approach
> > that could work here?
>
> I am not aware of such. Anyone?
>
> > Another thing I could do would be to have an arch-specific Kconfig
> > variable that made ARM responsible for informing RCU that the CPU
> > was departing, which would allow a call such as the following to be placed
> > immediately after the complete():
> >
> > rcu_cpu_notify(NULL, CPU_DYING_IDLE, (void *)(long)smp_processor_id());
> >
> > Note: This absolutely requires that the rcu_cpu_notify() -always-
> > be allowed to execute!!! This will not work if there is -any- possibility
> > of __cpu_die() powering off the outgoing CPU before the call to
> > rcu_cpu_notify() returns.
>
> The problem is that __cpu_die() (waiting for the completion signal) may
> cut the power of the dying CPU.

I was afraid of that...

> It could, however, wait for all RCU callbacks before powering down.
> Would rcu_barrier() do the trick?
>
> 	rcu_barrier();
> 	if (!platform_cpu_kill(cpu))
> 		pr_err("CPU%u: unable to kill\n", cpu);

Unfortunately, no. The rcu_barrier() function can block, which is
not permitted when preemption is disabled, as it is at this point
in the idle loop.

So a polling loop, then?

Thanx, Paul

2015-02-04 13:14:28

by Paul E. McKenney

Subject: Re: [rcu] [ INFO: suspicious RCU usage. ]

On Wed, Feb 04, 2015 at 01:00:18PM +0000, Russell King - ARM Linux wrote:
> On Wed, Feb 04, 2015 at 12:39:07PM +0100, Krzysztof Kozlowski wrote:
> > +Cc some ARM people
>
> I wish that people would CC this list with problems seen on ARM. I'm
> minded to just ignore this message because of this in the hope that by
> doing so, people will learn something...
>
> > > Another thing I could do would be to have an arch-specific Kconfig
> > > variable that made ARM responsible for informing RCU that the CPU
> > > was departing, which would allow a call such as the following to be placed
> > > immediately after the complete():
> > >
> > > rcu_cpu_notify(NULL, CPU_DYING_IDLE, (void *)(long)smp_processor_id());
> > >
> > > Note: This absolutely requires that the rcu_cpu_notify() -always-
> > > be allowed to execute!!! This will not work if there is -any- possibility
> > > of __cpu_die() powering off the outgoing CPU before the call to
> > > rcu_cpu_notify() returns.
>
> Exactly, so that's not going to be possible. The completion at that
> point marks the point at which power _could_ be removed from the CPU
> going down.

OK, sounds like a polling loop is required.

Thanx, Paul

2015-02-04 14:16:33

by Krzysztof Kozlowski

Subject: Re: [rcu] [ INFO: suspicious RCU usage. ]

On Wed, 2015-02-04 at 05:14 -0800, Paul E. McKenney wrote:
> On Wed, Feb 04, 2015 at 01:00:18PM +0000, Russell King - ARM Linux wrote:
> > On Wed, Feb 04, 2015 at 12:39:07PM +0100, Krzysztof Kozlowski wrote:
> > > +Cc some ARM people
> >
> > I wish that people would CC this list with problems seen on ARM. I'm
> > minded to just ignore this message because of this in the hope that by
> > doing so, people will learn something...
> >
> > > > Another thing I could do would be to have an arch-specific Kconfig
> > > > variable that made ARM responsible for informing RCU that the CPU
> > > > was departing, which would allow a call such as the following to be placed
> > > > immediately after the complete():
> > > >
> > > > rcu_cpu_notify(NULL, CPU_DYING_IDLE, (void *)(long)smp_processor_id());
> > > >
> > > > Note: This absolutely requires that the rcu_cpu_notify() -always-
> > > > be allowed to execute!!! This will not work if there is -any- possibility
> > > > of __cpu_die() powering off the outgoing CPU before the call to
> > > > rcu_cpu_notify() returns.
> >
> > Exactly, so that's not going to be possible. The completion at that
> > point marks the point at which power _could_ be removed from the CPU
> > going down.
>
> OK, sounds like a polling loop is required.

I thought about using wait_on_bit() in __cpu_die() (the waiting thread)
and clearing the bit on the CPU being powered down. What do you think about
such an idea?

Best regards,
Krzysztof

2015-02-04 15:10:54

by Paul E. McKenney

Subject: Re: [rcu] [ INFO: suspicious RCU usage. ]

On Wed, Feb 04, 2015 at 03:16:27PM +0100, Krzysztof Kozlowski wrote:
> On Wed, 2015-02-04 at 05:14 -0800, Paul E. McKenney wrote:
> > On Wed, Feb 04, 2015 at 01:00:18PM +0000, Russell King - ARM Linux wrote:
> > > On Wed, Feb 04, 2015 at 12:39:07PM +0100, Krzysztof Kozlowski wrote:
> > > > +Cc some ARM people
> > >
> > > I wish that people would CC this list with problems seen on ARM. I'm
> > > minded to just ignore this message because of this in the hope that by
> > > doing so, people will learn something...
> > >
> > > > > Another thing I could do would be to have an arch-specific Kconfig
> > > > > variable that made ARM responsible for informing RCU that the CPU
> > > > > was departing, which would allow a call such as the following to be placed
> > > > > immediately after the complete():
> > > > >
> > > > > rcu_cpu_notify(NULL, CPU_DYING_IDLE, (void *)(long)smp_processor_id());
> > > > >
> > > > > Note: This absolutely requires that the rcu_cpu_notify() -always-
> > > > > be allowed to execute!!! This will not work if there is -any- possibility
> > > > > of __cpu_die() powering off the outgoing CPU before the call to
> > > > > rcu_cpu_notify() returns.
> > >
> > > Exactly, so that's not going to be possible. The completion at that
> > > point marks the point at which power _could_ be removed from the CPU
> > > going down.
> >
> > OK, sounds like a polling loop is required.
>
> I thought about using wait_on_bit() in __cpu_die() (the waiting thread)
> and clearing the bit on CPU being powered down. What do you think about
> such idea?

Hmmm... It looks to me that wait_on_bit() calls out_of_line_wait_on_bit(),
which in turn calls __wait_on_bit(), which calls prepare_to_wait() and
finish_wait(). These are in the scheduler, but this is being called from
the CPU that remains online, so that should be OK.

But what do you invoke on the outgoing CPU? Can you get away with
simply clearing the bit, or do you also have to do a wakeup? It looks
to me like a wakeup is required, which would be illegal on the outgoing
CPU, which is at a point where it cannot legally invoke the scheduler.
Or am I missing something?
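
To make the concern concrete, here is the usual wait_on_bit() handshake
(3.19-era API, hypothetical names); the wakeup half is exactly what the
outgoing CPU may no longer run:

	static unsigned long cpu_died_word;

	/* Surviving CPU: sleeps until the bit clears and a wakeup arrives. */
	static void surviving_cpu_waits(void)
	{
		wait_on_bit(&cpu_died_word, 0, TASK_UNINTERRUPTIBLE);
	}

	/* Dying CPU: signal and wake the waiter. */
	static void dying_cpu_signals(void)
	{
		clear_bit(0, &cpu_died_word);
		smp_mb__after_atomic();		/* order the clear before the wakeup */
		wake_up_bit(&cpu_died_word, 0);	/* <-- enters the scheduler */
	}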

You know, this situation is giving me a bad case of nostalgia for the
old Sequent Symmetry and NUMA-Q hardware. On those platforms, the
outgoing CPU could turn itself off, and thus didn't need to tell some
other CPU when it was ready to be turned off. Seems to me that this
self-turn-off capability would be a great feature for future systems!

Thanx, Paul

2015-02-04 15:16:39

by Russell King - ARM Linux

Subject: Re: [rcu] [ INFO: suspicious RCU usage. ]

On Wed, Feb 04, 2015 at 07:10:28AM -0800, Paul E. McKenney wrote:
> You know, this situation is giving me a bad case of nostalgia for the
> old Sequent Symmetry and NUMA-Q hardware. On those platforms, the
> outgoing CPU could turn itself off, and thus didn't need to tell some
> other CPU when it was ready to be turned off. Seems to me that this
> self-turn-off capability would be a great feature for future systems!

Unfortunately, some brilliant people decided that secure firmware on
their platforms (which is sometimes needed to turn the secondary CPUs
off) can only be called by CPU0...

Other people decided that they can power down the secondary CPU when it
hits a WFI (wait for interrupt) instruction after arming that state
change, which is far saner - but we still need to know on the requesting
CPU when the dying CPU has completed the time-expensive parts of the
offlining process.

--
FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up
according to speedtest.net.

2015-02-04 15:34:58

by Krzysztof Kozlowski

Subject: Re: [rcu] [ INFO: suspicious RCU usage. ]

On Wed, 2015-02-04 at 07:10 -0800, Paul E. McKenney wrote:
> On Wed, Feb 04, 2015 at 03:16:27PM +0100, Krzysztof Kozlowski wrote:
> > On Wed, 2015-02-04 at 05:14 -0800, Paul E. McKenney wrote:
> > > On Wed, Feb 04, 2015 at 01:00:18PM +0000, Russell King - ARM Linux wrote:
> > > > On Wed, Feb 04, 2015 at 12:39:07PM +0100, Krzysztof Kozlowski wrote:
> > > > > +Cc some ARM people
> > > >
> > > > I wish that people would CC this list with problems seen on ARM. I'm
> > > > minded to just ignore this message because of this in the hope that by
> > > > doing so, people will learn something...
> > > >
> > > > > > Another thing I could do would be to have an arch-specific Kconfig
> > > > > > variable that made ARM responsible for informing RCU that the CPU
> > > > > > was departing, which would allow a call such as the following to be placed
> > > > > > immediately after the complete():
> > > > > >
> > > > > > rcu_cpu_notify(NULL, CPU_DYING_IDLE, (void *)(long)smp_processor_id());
> > > > > >
> > > > > > Note: This absolutely requires that the rcu_cpu_notify() -always-
> > > > > > be allowed to execute!!! This will not work if there is -any- possibility
> > > > > > of __cpu_die() powering off the outgoing CPU before the call to
> > > > > > rcu_cpu_notify() returns.
> > > >
> > > > Exactly, so that's not going to be possible. The completion at that
> > > > point marks the point at which power _could_ be removed from the CPU
> > > > going down.
> > >
> > > OK, sounds like a polling loop is required.
> >
> > I thought about using wait_on_bit() in __cpu_die() (the waiting thread)
> > and clearing the bit on the CPU being powered down. What do you think about
> > such an idea?
>
> Hmmm... It looks to me that wait_on_bit() calls out_of_line_wait_on_bit(),
> which in turn calls __wait_on_bit(), which calls prepare_to_wait() and
> finish_wait(). These are in the scheduler, but this is being called from
> the CPU that remains online, so that should be OK.
>
> But what do you invoke on the outgoing CPU? Can you get away with
> simply clearing the bit, or do you also have to do a wakeup? It looks
> to me like a wakeup is required, which would be illegal on the outgoing
> CPU, which is at a point where it cannot legally invoke the scheduler.
> Or am I missing something?

Actually the timeout version, but I think that doesn't matter.
The wait_on_bit() will busy-loop testing for the bit. Inside the loop
it calls the 'action', which in my case will be bit_wait_io_timeout().
This calls schedule_timeout().
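
Roughly, the 3.19-era __wait_on_bit() behind those helpers has this shape
(simplified from kernel/sched/wait.c), which is why a timed action notices
the cleared bit even if no wake_up_bit() ever arrives:

	int __wait_on_bit(wait_queue_head_t *wq, struct wait_bit_queue *q,
			  wait_bit_action_f *action, unsigned mode)
	{
		int ret = 0;

		do {
			prepare_to_wait(wq, &q->wait, mode);
			if (test_bit(q->key.bit_nr, q->key.flags))
				ret = (*action)(&q->key);	/* e.g. schedule_timeout() */
		} while (test_bit(q->key.bit_nr, q->key.flags) && !ret);
		finish_wait(wq, &q->wait);

		return ret;
	}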

See the proof of concept in the attachment. One observed issue: hot unplug
from the command line takes a lot more time, about 7 seconds instead of
~0.5. Probably I did something wrong.

>
> You know, this situation is giving me a bad case of nostalgia for the
> old Sequent Symmetry and NUMA-Q hardware. On those platforms, the
> outgoing CPU could turn itself off, and thus didn't need to tell some
> other CPU when it was ready to be turned off. Seems to me that this
> self-turn-off capability would be a great feature for future systems!

There are a lot more issues with hotplug on ARM...

Patch/RFC attached.


Attachments:
0001-ARM-Don-t-use-complete-during-__cpu_die.patch (2.26 kB)

2015-02-04 15:52:41

by Paul E. McKenney

Subject: Re: [rcu] [ INFO: suspicious RCU usage. ]

On Wed, Feb 04, 2015 at 03:16:24PM +0000, Russell King - ARM Linux wrote:
> On Wed, Feb 04, 2015 at 07:10:28AM -0800, Paul E. McKenney wrote:
> > You know, this situation is giving me a bad case of nostalgia for the
> > old Sequent Symmetry and NUMA-Q hardware. On those platforms, the
> > outgoing CPU could turn itself off, and thus didn't need to tell some
> > other CPU when it was ready to be turned off. Seems to me that this
> > self-turn-off capability would be a great feature for future systems!
>
> Unfortunately, some brilliant people decided that secure firmware on
> their platforms (which is sometimes needed to turn the secondary CPUs
> off) can only be called by CPU0...
>
> Other people decided that they can power down the secondary CPU when it
> hits a WFI (wait for interrupt) instruction after arming that state
> change, which is far saner - but we still need to know on the requesting
> CPU when the dying CPU has completed the time-expensive parts of the
> offlining process.

I suppose that you could grant the outgoing CPU the ability to arm
that state, but easy for me to say...

Anyway, still looks like a pure polling loop is required, with short
timed waits running on the surviving CPU.

Thanx, Paul

2015-02-04 15:56:40

by Paul E. McKenney

Subject: Re: [rcu] [ INFO: suspicious RCU usage. ]

On Wed, Feb 04, 2015 at 04:22:28PM +0100, Krzysztof Kozlowski wrote:
> On Wed, 2015-02-04 at 07:10 -0800, Paul E. McKenney wrote:
> > On Wed, Feb 04, 2015 at 03:16:27PM +0100, Krzysztof Kozlowski wrote:
> > > On Wed, 2015-02-04 at 05:14 -0800, Paul E. McKenney wrote:
> > > > On Wed, Feb 04, 2015 at 01:00:18PM +0000, Russell King - ARM Linux wrote:
> > > > > On Wed, Feb 04, 2015 at 12:39:07PM +0100, Krzysztof Kozlowski wrote:
> > > > > > +Cc some ARM people
> > > > >
> > > > > I wish that people would CC this list with problems seen on ARM. I'm
> > > > > minded to just ignore this message because of this in the hope that by
> > > > > doing so, people will learn something...
> > > > >
> > > > > > > Another thing I could do would be to have an arch-specific Kconfig
> > > > > > > variable that made ARM responsible for informing RCU that the CPU
> > > > > > > was departing, which would allow a call such as the following to be placed
> > > > > > > immediately after the complete():
> > > > > > >
> > > > > > > rcu_cpu_notify(NULL, CPU_DYING_IDLE, (void *)(long)smp_processor_id());
> > > > > > >
> > > > > > > Note: This absolutely requires that the rcu_cpu_notify() -always-
> > > > > > > be allowed to execute!!! This will not work if there is -any- possibility
> > > > > > > of __cpu_die() powering off the outgoing CPU before the call to
> > > > > > > rcu_cpu_notify() returns.
> > > > >
> > > > > Exactly, so that's not going to be possible. The completion at that
> > > > > point marks the point at which power _could_ be removed from the CPU
> > > > > going down.
> > > >
> > > > OK, sounds like a polling loop is required.
> > >
> > > I thought about using wait_on_bit() in __cpu_die() (the waiting thread)
> > > and clearing the bit on the CPU being powered down. What do you think about
> > > such an idea?
> >
> > Hmmm... It looks to me that wait_on_bit() calls out_of_line_wait_on_bit(),
> > which in turn calls __wait_on_bit(), which calls prepare_to_wait() and
> > finish_wait(). These are in the scheduler, but this is being called from
> > the CPU that remains online, so that should be OK.
> >
> > But what do you invoke on the outgoing CPU? Can you get away with
> > simply clearing the bit, or do you also have to do a wakeup? It looks
> > to me like a wakeup is required, which would be illegal on the outgoing
> > CPU, which is at a point where it cannot legally invoke the scheduler.
> > Or am I missing something?
>
> Actually the timeout version, but I think that doesn't matter.
> The wait_on_bit() will busy-loop testing for the bit. Inside the loop
> it calls the 'action', which in my case will be bit_wait_io_timeout().
> This calls schedule_timeout().

Ah, good point.

> See the proof of concept in the attachment. One observed issue: hot unplug
> from the command line takes a lot more time, about 7 seconds instead of
> ~0.5. Probably I did something wrong.

Well, you do set the timeout to five seconds, and so if the condition
does not get set before the surviving CPU finds its way to the
out_of_line_wait_on_bit_timeout(), you are guaranteed to wait for at
least five seconds.

One alternative approach would be to have a loop around a series of
shorter waits. Other thoughts?

> > You know, this situation is giving me a bad case of nostalgia for the
> > old Sequent Symmetry and NUMA-Q hardware. On those platforms, the
> > outgoing CPU could turn itself off, and thus didn't need to tell some
> > other CPU when it was ready to be turned off. Seems to me that this
> > self-turn-off capability would be a great feature for future systems!
>
> There are a lot more issues with hotplug on ARM...

Just trying to clean up this particular corner at the moment. ;-)

> Patch/RFC attached.

Again, I believe that you will need to loop over a shorter timeout
in order to get reasonable latencies. If waiting a millisecond at
a time is an energy-efficiency concern (I don't know why it would be
in this rare case, but...), then one approach would be to start
with very short waits and then increase the wait time. For example,
doubling the wait time on each pass through the loop would result
in a smallish number of wakeups, but would mean that you waited
no more than twice as long as necessary.
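
A sketch of the shape I mean (untested, reusing the wait_cpu_die bit
from your patch below):

	static int wait_for_cpu_die(void)
	{
		unsigned long deadline = jiffies + msecs_to_jiffies(5000);
		unsigned int wait_ms = 1;

		while (test_bit(CPU_DIE_WAIT_BIT, &wait_cpu_die)) {
			if (time_after(jiffies, deadline))
				return -ETIMEDOUT;
			msleep(wait_ms);
			/*
			 * Double the wait on each pass, capped at 100 ms:
			 * few wakeups, yet we overshoot by at most 2x.
			 */
			wait_ms = min(wait_ms * 2, 100U);
		}
		return 0;
	}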

Thoughts?

Thanx, Paul

> From feaad18a483871747170fa797f80b49592489ad1 Mon Sep 17 00:00:00 2001
> From: Krzysztof Kozlowski <[email protected]>
> Date: Wed, 4 Feb 2015 16:14:41 +0100
> Subject: [RFC] ARM: Don't use complete() during __cpu_die
>
> The complete() should not be used on offlined CPU.
>
> Signed-off-by: Krzysztof Kozlowski <[email protected]>
> ---
>  arch/arm/kernel/smp.c | 22 ++++++++++++++++++----
>  1 file changed, 18 insertions(+), 4 deletions(-)
>
> diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
> index 86ef244c5a24..f3a5ad80a253 100644
> --- a/arch/arm/kernel/smp.c
> +++ b/arch/arm/kernel/smp.c
> @@ -26,6 +26,7 @@
>  #include <linux/completion.h>
>  #include <linux/cpufreq.h>
>  #include <linux/irq_work.h>
> +#include <linux/wait.h>
>
>  #include <linux/atomic.h>
>  #include <asm/smp.h>
> @@ -76,6 +77,9 @@ enum ipi_msg_type {
>
>  static DECLARE_COMPLETION(cpu_running);
>
> +#define CPU_DIE_WAIT_BIT 0
> +static unsigned long wait_cpu_die;
> +
>  static struct smp_operations smp_ops;
>
>  void __init smp_set_ops(struct smp_operations *ops)
> @@ -133,7 +137,7 @@ int __cpu_up(unsigned int cpu, struct task_struct *idle)
>  		pr_err("CPU%u: failed to boot: %d\n", cpu, ret);
>  	}
>
> -
> +	set_bit(CPU_DIE_WAIT_BIT, &wait_cpu_die);
>  	memset(&secondary_data, 0, sizeof(secondary_data));
>  	return ret;
>  }
> @@ -213,7 +217,17 @@ int __cpu_disable(void)
>  	return 0;
>  }
>
> -static DECLARE_COMPLETION(cpu_died);
> +static int wait_for_cpu_die(void)
> +{
> +	might_sleep();
> +
> +	if (!test_bit(CPU_DIE_WAIT_BIT, &wait_cpu_die))
> +		return 0;
> +
> +	return out_of_line_wait_on_bit_timeout(&wait_cpu_die, CPU_DIE_WAIT_BIT,
> +			bit_wait_timeout, TASK_UNINTERRUPTIBLE,
> +			msecs_to_jiffies(5000));
> +}
>
>  /*
>   * called on the thread which is asking for a CPU to be shutdown -
> @@ -221,7 +235,7 @@ static DECLARE_COMPLETION(cpu_died);
>   */
>  void __cpu_die(unsigned int cpu)
>  {
> -	if (!wait_for_completion_timeout(&cpu_died, msecs_to_jiffies(5000))) {
> +	if (wait_for_cpu_die()) {
>  		pr_err("CPU%u: cpu didn't die\n", cpu);
>  		return;
>  	}
> @@ -267,7 +281,7 @@ void __ref cpu_die(void)
>  	 * this returns, power and/or clocks can be removed at any point
>  	 * from this CPU and its cache by platform_cpu_kill().
>  	 */
> -	complete(&cpu_died);
> +	clear_bit(CPU_DIE_WAIT_BIT, &wait_cpu_die);
>
>  	/*
>  	 * Ensure that the cache lines associated with that completion are
> --
> 1.9.1
>

2015-02-04 16:11:01

by Krzysztof Kozlowski

Subject: Re: [rcu] [ INFO: suspicious RCU usage. ]

On Wed, 2015-02-04 at 07:56 -0800, Paul E. McKenney wrote:
> On Wed, Feb 04, 2015 at 04:22:28PM +0100, Krzysztof Kozlowski wrote:
> >
> > Actually the timeout version, but I think that doesn't matter.
> > The wait_on_bit() will busy-loop testing for the bit. Inside the loop
> > it calls the 'action', which in my case will be bit_wait_io_timeout().
> > This calls schedule_timeout().
>
> Ah, good point.
>
> > See the proof of concept in the attachment. One observed issue: hot unplug
> > from the command line takes a lot more time, about 7 seconds instead of
> > ~0.5. Probably I did something wrong.
>
> Well, you do set the timeout to five seconds, and so if the condition
> does not get set before the surviving CPU finds its way to the
> out_of_line_wait_on_bit_timeout(), you are guaranteed to wait for at
> least five seconds.
>
> One alternative approach would be to have a loop around a series of
> shorter waits. Other thoughts?

Right! That was the issue. It seems to work. I'll also think about a
self-adapting interval, as you said below. I'll test it more and send a
patch.

Best regards,
Krzysztof

>
> > > You know, this situation is giving me a bad case of nostalgia for the
> > > old Sequent Symmetry and NUMA-Q hardware. On those platforms, the
> > > outgoing CPU could turn itself off, and thus didn't need to tell some
> > > other CPU when it was ready to be turned off. Seems to me that this
> > > self-turn-off capability would be a great feature for future systems!
> >
> > There are a lot more issues with hotplug on ARM...
>
> Just trying to clean up this particular corner at the moment. ;-)
>
> > Patch/RFC attached.
>
> Again, I believe that you will need to loop over a shorter timeout
> in order to get reasonable latencies. If waiting a millisecond at
> a time is an energy-efficiency concern (I don't know why it would be
> in this rare case, but...), then one approach would be to start
> with very short waits and then increase the wait time. For example,
> doubling the wait time on each pass through the loop would result
> in a smallish number of wakeups, but would mean that you waited
> no more than twice as long as necessary.
>
> Thoughts?

2015-02-04 16:33:11

by Paul E. McKenney

Subject: Re: [rcu] [ INFO: suspicious RCU usage. ]

On Wed, Feb 04, 2015 at 05:10:56PM +0100, Krzysztof Kozlowski wrote:
> On Wed, 2015-02-04 at 07:56 -0800, Paul E. McKenney wrote:
> > On Wed, Feb 04, 2015 at 04:22:28PM +0100, Krzysztof Kozlowski wrote:
> > >
> > > Actually the timeout version, but I think that doesn't matter.
> > > The wait_on_bit() will busy-loop testing for the bit. Inside the loop
> > > it calls the 'action', which in my case will be bit_wait_io_timeout().
> > > This calls schedule_timeout().
> >
> > Ah, good point.
> >
> > > See the proof of concept in the attachment. One observed issue: hot unplug
> > > from the command line takes a lot more time, about 7 seconds instead of
> > > ~0.5. Probably I did something wrong.
> >
> > Well, you do set the timeout to five seconds, and so if the condition
> > does not get set before the surviving CPU finds its way to the
> > out_of_line_wait_on_bit_timeout(), you are guaranteed to wait for at
> > least five seconds.
> >
> > One alternative approach would be to have a loop around a series of
> > shorter waits. Other thoughts?
>
> Right! That was the issue. It seems to work. I'll also think about a
> self-adapting interval, as you said below. I'll test it more and send a
> patch.

Sounds good!

Are you doing ARM, ARM64, or both? I of course vote for both. ;-)

Thanx, Paul

> Best regards,
> Krzysztof
>
> >
> > > > You know, this situation is giving me a bad case of nostalgia for the
> > > > old Sequent Symmetry and NUMA-Q hardware. On those platforms, the
> > > > outgoing CPU could turn itself off, and thus didn't need to tell some
> > > > other CPU when it was ready to be turned off. Seems to me that this
> > > > self-turn-off capability would be a great feature for future systems!
> > >
> > > There are a lot more issues with hotplug on ARM...
> >
> > Just trying to clean up this particular corner at the moment. ;-)
> >
> > > Patch/RFC attached.
> >
> > Again, I believe that you will need to loop over a shorter timeout
> > in order to get reasonable latencies. If waiting a millisecond at
> > a time is an energy-efficiency concern (I don't know why it would be
> > in this rare case, but...), then one approach would be to start
> > with very short waits and then increase the wait time. For example,
> > doubling the wait time on each pass through the loop would result
> > in a smallish number of wakeups, but would mean that you waited
> > no more than twice as long as necessary.
> >
> > Thoughts?
>
>

2015-02-04 16:43:53

by Krzysztof Kozlowski

Subject: Re: [rcu] [ INFO: suspicious RCU usage. ]

On Wed, 2015-02-04 at 08:28 -0800, Paul E. McKenney wrote:
> On Wed, Feb 04, 2015 at 05:10:56PM +0100, Krzysztof Kozlowski wrote:
> > On Wed, 2015-02-04 at 07:56 -0800, Paul E. McKenney wrote:
> > > On Wed, Feb 04, 2015 at 04:22:28PM +0100, Krzysztof Kozlowski wrote:
> > > >
> > > > Actually the timeout version, but I think that doesn't matter.
> > > > The wait_on_bit() will busy-loop testing for the bit. Inside the loop
> > > > it calls the 'action', which in my case will be bit_wait_io_timeout().
> > > > This calls schedule_timeout().
> > >
> > > Ah, good point.
> > >
> > > > See the proof of concept in the attachment. One observed issue: hot unplug
> > > > from the command line takes a lot more time, about 7 seconds instead of
> > > > ~0.5. Probably I did something wrong.
> > >
> > > Well, you do set the timeout to five seconds, and so if the condition
> > > does not get set before the surviving CPU finds its way to the
> > > out_of_line_wait_on_bit_timeout(), you are guaranteed to wait for at
> > > least five seconds.
> > >
> > > One alternative approach would be to have a loop around a series of
> > > shorter waits. Other thoughts?
> >
> > Right! That was the issue. It seems to work. I'll also think about a
> > self-adapting interval, as you said below. I'll test it more and send a
> > patch.
>
> Sounds good!
>
> Are you doing ARM, ARM64, or both? I of course vote for both. ;-)

I'll do both, but first I need to find out who on my team has an ARM64 board.

Best regards,
Krzysztof