Hi,
I was testing stress-ng on an ARM64 box and I found that it can be killed instantaneously with a --hrtimers 1 test:
https://github.com/ColinIanKing/stress-ng/blob/master/stress-hrtimers.c
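For context, my reading of that source is that each worker boils down to re-arming a POSIX timer (which the kernel backs with an hrtimer) from its own signal handler, with very short expirations. A hypothetical minimal sketch of that idea, single process and fixed interval (the 50 us value and all identifiers here are mine, not stress-ng's; the real code forks several workers and randomizes the interval):

/* Build with: gcc -O2 -o hrtimer-sketch hrtimer-sketch.c -lrt */
#include <signal.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

static timer_t timerid;

/* Arm a one-shot expiry in the tens-of-microseconds range. */
static void rearm(void)
{
	struct itimerspec its;

	memset(&its, 0, sizeof(its));
	its.it_value.tv_nsec = 50000;
	timer_settime(timerid, 0, &its, NULL);
}

/* Every expiry is delivered as a signal; re-arming straight from the
 * handler keeps the CPU saturated with timer interrupts and signal
 * delivery, which would also explain the __arm64_sys_rt_sigreturn
 * frames in the lockdep splat below. */
static void handler(int sig)
{
	(void)sig;
	rearm();
}

int main(void)
{
	struct sigaction sa;
	struct sigevent sev;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = handler;
	sigaction(SIGRTMIN, &sa, NULL);

	memset(&sev, 0, sizeof(sev));
	sev.sigev_notify = SIGEV_SIGNAL;
	sev.sigev_signo = SIGRTMIN;
	timer_create(CLOCK_REALTIME, &sev, &timerid);

	rearm();
	for (;;)
		pause();	/* all the work happens in the handler */
}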
The console shell locks up immediately after starting the process, and I get this rcu_preempt splat after 21 seconds,
letting me know that the grace-period kthread could not run:
# stress-ng --hrtimers 1
stress-ng: info: [320] defaulting to a 86400 second (1 day, 0.00 secs) run per stressor
stress-ng: info: [320] dispatching hogs: 1 hrtimers
stress-ng: info: [320] cache allocate: using defaults, can't determine cache details from sysfs
[ 85.827528] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 85.833656] (detected by 1, t=6502 jiffies, g=1789, q=12)
[ 85.839163] rcu: All QSes seen, last rcu_preempt kthread activity 6502 (4294913720-4294907218), jiffies_till_next_fqs=1, root ->qsmask 0x0
[ 85.851647] rcu: rcu_preempt kthread starved for 6502 jiffies! g1789 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
[ 85.861859] rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
[ 85.871025] rcu: RCU grace-period kthread stack dump:
[ 85.876094] rcu_preempt R running task 0 10 2 0x00000028
[ 85.883173] Call trace:
[ 85.885636] __switch_to+0xf8/0x148
[ 85.889137] __schedule+0x2d8/0x808
[ 85.892636] schedule+0x48/0x100
[ 85.895875] schedule_timeout+0x1c8/0x420
[ 85.899900] rcu_gp_kthread+0x738/0x1b78
[ 85.903836] kthread+0x158/0x168
[ 85.907075] ret_from_fork+0x10/0x18
[ 93.283548] BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 33s!
[ 93.291569] BUG: workqueue lockup - pool cpus=0-1 flags=0x4 nice=0 stuck for 33s!
[ 93.299105] Showing busy workqueues and worker pools:
[ 93.304189] workqueue events: flags=0x0
[ 93.308116] pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
[ 93.315031] pending: vmstat_shepherd
[ 93.318990] workqueue events_unbound: flags=0x2
[ 93.323577] pwq 4: cpus=0-1 flags=0x4 nice=0 active=1/512 refcnt=3
[ 93.330309] pending: flush_to_ldisc
[ 93.334161]
[ 93.334164] ======================================================
[ 93.334165] WARNING: possible circular locking dependency detected
[ 93.334167] 5.7.0-08604-g7dcedf8c04c0 #118 Not tainted
[ 93.334168] ------------------------------------------------------
[ 93.334169] stress-ng-hrtim/326 is trying to acquire lock:
[ 93.334171] ffffb236f6aee1a0 (console_owner){-.-.}-{0:0}, at: console_unlock+0x21c/0x678
[ 93.334176]
[ 93.334177] but task is already holding lock:
[ 93.334178] ffff00207ac0d018 (&pool->lock/1){-.-.}-{2:2}, at: show_workqueue_state+0x288/0x3a8
[ 93.334184]
[ 93.334186] which lock already depends on the new lock.
[ 93.334187]
[ 93.334188]
[ 93.334189] the existing dependency chain (in reverse order) is:
[ 93.334190]
[ 93.334191] -> #2 (&pool->lock/1){-.-.}-{2:2}:
[ 93.334197] _raw_spin_lock+0x5c/0x78
[ 93.334198] __queue_work+0x124/0x7c8
[ 93.334199] queue_work_on+0xd0/0xf0
[ 93.334200] tty_flip_buffer_push+0x3c/0x48
[ 93.334202] serial8250_rx_chars+0x74/0x88
[ 93.334203] fsl8250_handle_irq+0x15c/0x1a0
[ 93.334204] serial8250_interrupt+0x70/0xb8
[ 93.334206] __handle_irq_event_percpu+0xe0/0x478
[ 93.334207] handle_irq_event_percpu+0x40/0x98
[ 93.334208] handle_irq_event+0x4c/0xd0
[ 93.334209] handle_fasteoi_irq+0xb4/0x158
[ 93.334211] generic_handle_irq+0x3c/0x58
[ 93.334212] __handle_domain_irq+0x68/0xc0
[ 93.334213] gic_handle_irq+0x6c/0x160
[ 93.334214] el1_irq+0xbc/0x180
[ 93.334216] cpuidle_enter_state+0xb4/0x4f8
[ 93.334217] cpuidle_enter+0x3c/0x50
[ 93.334218] call_cpuidle+0x44/0x78
[ 93.334219] do_idle+0x228/0x2c8
[ 93.334220] cpu_startup_entry+0x2c/0x48
[ 93.334222] rest_init+0x1ac/0x280
[ 93.334223] arch_call_rest_init+0x14/0x1c
[ 93.334224] start_kernel+0x4ec/0x524
[ 93.334225]
[ 93.334226] -> #1 (&port->lock#2){-.-.}-{2:2}:
[ 93.334232] _raw_spin_lock_irqsave+0x78/0xa0
[ 93.334233] serial8250_console_write+0x1f4/0x348
[ 93.334234] univ8250_console_write+0x44/0x58
[ 93.334235] console_unlock+0x480/0x678
[ 93.334237] vprintk_emit+0x188/0x370
[ 93.334238] vprintk_default+0x48/0x58
[ 93.334239] vprintk_func+0xf0/0x238
[ 93.334240] printk+0x74/0x94
[ 93.334241] register_console+0x1a0/0x300
[ 93.334243] uart_add_one_port+0x4a0/0x4e0
[ 93.334244] serial8250_register_8250_port+0x2d0/0x4b0
[ 93.334245] of_platform_serial_probe+0x318/0x650
[ 93.334246] platform_drv_probe+0x58/0xa8
[ 93.334248] really_probe+0x10c/0x358
[ 93.334249] driver_probe_device+0x5c/0x100
[ 93.334250] device_driver_attach+0x74/0x98
[ 93.334252] __driver_attach+0x98/0xe0
[ 93.334253] bus_for_each_dev+0x84/0xd8
[ 93.334254] driver_attach+0x30/0x40
[ 93.334255] bus_add_driver+0x160/0x208
[ 93.334256] driver_register+0x64/0x110
[ 93.334258] __platform_driver_register+0x58/0x68
[ 93.334259] of_platform_serial_driver_init+0x20/0x28
[ 93.334260] do_one_initcall+0x94/0x438
[ 93.334261] kernel_init_freeable+0x2cc/0x338
[ 93.334263] kernel_init+0x18/0x110
[ 93.334264] ret_from_fork+0x10/0x18
[ 93.334265]
[ 93.334266] -> #0 (console_owner){-.-.}-{0:0}:
[ 93.334271] __lock_acquire+0x1170/0x1510
[ 93.334272] lock_acquire+0xf8/0x458
[ 93.334273] console_unlock+0x274/0x678
[ 93.334274] vprintk_emit+0x188/0x370
[ 93.334275] vprintk_default+0x48/0x58
[ 93.334277] vprintk_func+0xf0/0x238
[ 93.334278] printk+0x74/0x94
[ 93.334279] show_pwq+0x80/0x2c0
[ 93.334280] show_workqueue_state+0x2ac/0x3a8
[ 93.334281] wq_watchdog_timer_fn+0x194/0x2d8
[ 93.334283] call_timer_fn+0xcc/0x400
[ 93.334284] run_timer_softirq+0x250/0x6e0
[ 93.334285] __do_softirq+0x12c/0x5b4
[ 93.334286] irq_exit+0x13c/0x148
[ 93.334287] __handle_domain_irq+0x6c/0xc0
[ 93.334289] gic_handle_irq+0x6c/0x160
[ 93.334290] el1_irq+0xbc/0x180
[ 93.334291] lock_acquire+0x110/0x458
[ 93.334292] __might_fault+0x84/0xa8
[ 93.334293] __arm64_sys_rt_sigreturn+0x154/0x980
[ 93.334295] el0_svc_common.constprop.2+0x7c/0x178
[ 93.334296] do_el0_svc+0x34/0xa0
[ 93.334297] el0_sync_handler+0x160/0x1f0
[ 93.334298] el0_sync+0x140/0x180
[ 93.334299]
[ 93.334301] other info that might help us debug this:
[ 93.334301]
[ 93.334303] Chain exists of:
[ 93.334303] console_owner --> &port->lock#2 --> &pool->lock/1
[ 93.334311]
[ 93.334313] Possible unsafe locking scenario:
[ 93.334314]
[ 93.334315] CPU0 CPU1
[ 93.334316] ---- ----
[ 93.334317] lock(&pool->lock/1);
[ 93.334321] lock(&port->lock#2);
[ 93.334325] lock(&pool->lock/1);
[ 93.334328] lock(console_owner);
[ 93.334331]
[ 93.334332] *** DEADLOCK ***
[ 93.334333]
[ 93.334334] 5 locks held by stress-ng-hrtim/326:
[ 93.334335] #0: ffff002077cf9a18 (&mm->mmap_sem){++++}-{3:3}, at: __might_fault+0x60/0xa8
[ 93.334341] #1: ffff800010003de0 ((&wq_watchdog_timer)){+.-.}-{0:0}, at: call_timer_fn+0x0/0x400
[ 93.334347] #2: ffffb236f6aefd60 (rcu_read_lock){....}-{1:2}, at: show_workqueue_state+0x0/0x3a8
[ 93.334353] #3: ffff00207ac0d018 (&pool->lock/1){-.-.}-{2:2}, at: show_workqueue_state+0x288/0x3a8
[ 93.334360] #4: ffffb236f6aee080 (console_lock){+.+.}-{0:0}, at: vprintk_emit+0x180/0x370
[ 93.334366]
[ 93.334367] stack backtrace:
[ 93.334369] CPU: 0 PID: 326 Comm: stress-ng-hrtim Not tainted 5.7.0-08604-g7dcedf8c04c0 #118
[ 93.334370] Hardware name: LS1028A RDB Board (DT)
[ 93.334371] Call trace:
[ 93.334372] dump_backtrace+0x0/0x1a8
[ 93.334373] show_stack+0x24/0x30
[ 93.334374] dump_stack+0xe8/0x150
[ 93.334376] print_circular_bug.isra.41+0x228/0x280
[ 93.334377] check_noncircular+0x14c/0x1b0
[ 93.334378] __lock_acquire+0x1170/0x1510
[ 93.334379] lock_acquire+0xf8/0x458
[ 93.334381] console_unlock+0x274/0x678
[ 93.334382] vprintk_emit+0x188/0x370
[ 93.334383] vprintk_default+0x48/0x58
[ 93.334384] vprintk_func+0xf0/0x238
[ 93.334385] printk+0x74/0x94
[ 93.334386] show_pwq+0x80/0x2c0
[ 93.334387] show_workqueue_state+0x2ac/0x3a8
[ 93.334389] wq_watchdog_timer_fn+0x194/0x2d8
[ 93.334390] call_timer_fn+0xcc/0x400
[ 93.334391] run_timer_softirq+0x250/0x6e0
[ 93.334392] __do_softirq+0x12c/0x5b4
[ 93.334393] irq_exit+0x13c/0x148
[ 93.334394] __handle_domain_irq+0x6c/0xc0
[ 93.334396] gic_handle_irq+0x6c/0x160
[ 93.334397] el1_irq+0xbc/0x180
[ 93.334398] lock_acquire+0x110/0x458
[ 93.334399] __might_fault+0x84/0xa8
[ 93.334400] __arm64_sys_rt_sigreturn+0x154/0x980
[ 93.334402] el0_svc_common.constprop.2+0x7c/0x178
[ 93.334403] do_el0_svc+0x34/0xa0
[ 93.334404] el0_sync_handler+0x160/0x1f0
[ 93.334405] el0_sync+0x140/0x180
[ 93.984979] workqueue events_freezable: flags=0x4
[ 93.989705] pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
[ 93.996522] pending: mmc_rescan
[ 94.000032] workqueue events_power_efficient: flags=0x82
[ 94.005369] pwq 4: cpus=0-1 flags=0x4 nice=0 active=4/256 refcnt=6
[ 94.011748] pending: phy_state_machine, phy_state_machine, neigh_periodic_work, do_cache_clean
[ 94.020761] workqueue pm: flags=0x4
[ 94.024266] pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
[ 94.031081] pending: pm_runtime_work
[ 94.035025] pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
[ 94.041840] pending: pm_runtime_work
[ 94.045782] workqueue writeback: flags=0x4a
[ 94.049984] pwq 4: cpus=0-1 flags=0x4 nice=0 active=1/256 refcnt=3
[ 94.056362] pending: wb_workfn
[ 94.059784] workqueue kblockd: flags=0x18
[ 94.063812] pwq 3: cpus=1 node=0 flags=0x0 nice=-20 active=1/256 refcnt=2
[ 94.070802] pending: blk_mq_timeout_work
Looks like there's another issue in fsl8250_handle_irq, but I think that's unrelated, so I'll ignore it.
So CPU 1 detects that the rcu_preempt kthread isn't getting scheduled, even though all quiescent
states were seen. As far as I can see, that means there is no RCU usage bug here, but a legitimate
CPU starvation issue.
I have some noob questions:
- Is my system (hardware, drivers, kernel config) misconfigured? This can be reproduced at least on the
NXP LS1028A running today's master branch with just the arm64 defconfig. On other hardware, however
(NXP T1040 PowerPC), RT throttling kicks in faster and the system doesn't hang (more below).
- What is the expected behavior of a system under this hrtimer test? Intuitively I would say the kernel
shouldn't hang, but I don't know exactly what the mechanisms to avoid that are. Below are two things I tried.
- I tried to enable CONFIG_RCU_BOOST and had limited success. I could get rid of the RCU stall messages,
but the system would still hang after a longer while. When it hung, it didn't even respond to ping
from the outside, and there was no shell; only magic sysrq could restart the kernel (config fragment
below, after the logs).
- I read a little bit of the "Limiting the CPU usage of real-time and deadline processes" chapter from
man 7 sched. Is this what I should be doing? My /proc/sys/kernel/sched_rt_period_us and
/proc/sys/kernel/sched_rt_runtime_us are set to their default values (RT period 1 second, RT runtime
0.95 seconds), so why isn't that enough? If I reduce the RT runtime to 0.8 seconds while keeping the
period at 1 second, RT throttling (finally!) kicks in and the system does not die (exact commands
below, after the logs).
The RCU stall warnings do keep getting printed continuously, however:
[ 385.732074] sched: RT throttling activated
[ 385.883960] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 385.890077] (detected by 1, t=6503 jiffies, g=1801, q=68)
[ 385.895582] rcu: All QSes seen, last rcu_preempt kthread activity 6504 (4294988738-4294982234), jiffies_till_next_fqs=1, root ->qsmask 0x0
[ 385.908062] rcu: rcu_preempt kthread starved for 6508 jiffies! g1801 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=1
[ 385.918273] rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
[ 385.927437] rcu: RCU grace-period kthread stack dump:
[ 385.932503] rcu_preempt R running task 0 11 2 0x00000028
[ 385.939579] Call trace:
[ 385.942040] __switch_to+0xf8/0x148
[ 385.945539] __schedule+0x2d8/0x808
[ 385.949036] schedule+0x48/0x100
[ 385.952273] schedule_timeout+0x1c8/0x420
[ 385.956295] rcu_gp_kthread+0x704/0x1870
[ 385.960228] kthread+0x158/0x168
[ 385.963464] ret_from_fork+0x10/0x18
(...)
[ 463.903959] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 463.910077] (detected by 1, t=26008 jiffies, g=1801, q=148)
[ 463.915756] rcu: All QSes seen, last rcu_preempt kthread activity 26009 (4295008243-4294982234), jiffies_till_next_fqs=1, root ->qsmask 0x0
[ 463.928324] rcu: rcu_preempt kthread starved for 26013 jiffies! g1801 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=1
[ 463.938622] rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
[ 463.947786] rcu: RCU grace-period kthread stack dump:
[ 463.952852] rcu_preempt R running task 0 11 2 0x00000028
[ 463.959928] Call trace:
[ 463.962389] __switch_to+0xf8/0x148
[ 463.965888] __schedule+0x2d8/0x808
[ 463.969385] schedule+0x48/0x100
[ 463.972622] schedule_timeout+0x1c8/0x420
[ 463.976644] rcu_gp_kthread+0x704/0x1870
[ 463.980578] kthread+0x158/0x168
[ 463.983814] ret_from_fork+0x10/0x18
(...)
[ 541.923959] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 541.930076] (detected by 1, t=45513 jiffies, g=1801, q=230)
[ 541.935754] rcu: All QSes seen, last rcu_preempt kthread activity 45514 (4295027748-4294982234), jiffies_till_next_fqs=1, root ->qsmask 0x0
[ 541.948322] rcu: rcu_preempt kthread starved for 45518 jiffies! g1801 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=1
[ 541.958620] rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
[ 541.967784] rcu: RCU grace-period kthread stack dump:
[ 541.972850] rcu_preempt R running task 0 11 2 0x00000028
[ 541.979926] Call trace:
[ 541.982387] __switch_to+0xf8/0x148
[ 541.985887] __schedule+0x2d8/0x808
[ 541.989384] schedule+0x48/0x100
[ 541.992621] schedule_timeout+0x1c8/0x420
[ 541.996643] rcu_gp_kthread+0x704/0x1870
[ 542.000576] kthread+0x158/0x168
[ 542.003813] ret_from_fork+0x10/0x18
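Re the CONFIG_RCU_BOOST bullet above: for reference, enabling it on top of the arm64 defconfig needs something like the fragment below (CONFIG_RCU_BOOST_DELAY shown at its default; I believe CONFIG_RCU_EXPERT is needed just to expose the option on a non-RT kernel). There is also the rcutree.kthread_prio= boot parameter for running the RCU kthreads at an RT priority, which looks relevant to the GP kthread starvation here.

CONFIG_PREEMPT=y
CONFIG_RCU_EXPERT=y
CONFIG_RCU_BOOST=y
CONFIG_RCU_BOOST_DELAY=500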
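And re the RT throttling bullet, the change amounts to the following (values in microseconds; the two reads show the defaults):

# cat /proc/sys/kernel/sched_rt_period_us
1000000
# cat /proc/sys/kernel/sched_rt_runtime_us
950000
# echo 800000 > /proc/sys/kernel/sched_rt_runtime_us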
I would really appreciate some pointers.
Thanks,
-Vladimir
On Fri, Jun 05, 2020 at 04:47:51AM +0300, Vladimir Oltean wrote:
> Hi,
>
> I was testing stress-ng on an ARM64 box and I found that it can be killed instantaneously with a --hrtimers 1 test:
> https://github.com/ColinIanKing/stress-ng/blob/master/stress-hrtimers.c
> The console shell locks up immediately after starting the process, and I get this rcu_preempt splat after 21 seconds,
> letting me know that the grace-period kthread could not run:
>
> [ console output and kernel log snipped; identical to the above ]
>
> Looks like there's another issue in fsl8250_handle_irq, but I think that's unrelated, so I'll ignore it.
> So CPU 1 detects that the rcu_preempt kthread isn't getting scheduled, even though all quiescent
> states were seen. As far as I can see, that means there is no RCU usage bug here, but a legitimate
> CPU starvation issue.
That agrees with my interpretation of the stall warning message.
On the rest, I must defer to others.
Thanx, Paul
> I have some noob questions:
>
> - Is my system (hardware, drivers, kernel config) misconfigured? This can be reproduced at least on the
> NXP LS1028A running today's master branch with just the arm64 defconfig. On other hardware, however
> (NXP T1040 PowerPC), RT throttling kicks in faster and the system doesn't hang (more below).
>
> - What is the expected behavior of a system under this hrtimer test? Intuitively I would say the kernel
> shouldn't hang, but I don't know exactly what the mechanisms to avoid that are. Below are two things I tried.
>
> - I tried to enable CONFIG_RCU_BOOST and had limited success. I could get rid of the RCU stall messages,
> but the system would still hang after a longer while. When it hung, it didn't even respond to ping
> from the outside, and there was no shell; only magic sysrq could restart the kernel.
>
> - I read a little bit of the "Limiting the CPU usage of real-time and deadline processes" chapter from
> man 7 sched. Is this what I should be doing? My /proc/sys/kernel/sched_rt_period_us and
> /proc/sys/kernel/sched_rt_runtime_us are set to their default values (RT period 1 second, RT runtime
> 0.95 seconds), so why isn't that enough? If I reduce the RT runtime to 0.8 seconds while keeping the
> period at 1 second, RT throttling (finally!) kicks in and the system does not die.
> The RCU stall warnings do keep getting printed continuously, however:
>
> [ RT throttling and recurring RCU stall logs snipped; identical to the above ]
>
> I would really appreciate some pointers.
>
> Thanks,
> -Vladimir
Hi Paul,
On Fri, 5 Jun 2020 at 06:20, Paul E. McKenney <[email protected]> wrote:
>
>
> That agrees with my interpretation of the stall warning message.
>
> On the rest, I must defer to others.
>
> Thanx, Paul
>
Thanks, always good to have a confirmation from you!
FWIW, the system has survived an overnight run with
sched_rt_runtime_us set to 80% of the period; I don't know if there's more to it.
Regards,
-Vladimir
Hi Vladimir,
On Fri Jun 05 2020, Vladimir Oltean wrote:
> Hi,
>
> I was testing stress-ng on an ARM64 box and I found that it can be killed instantaneously with a --hrtimers 1 test:
> https://github.com/ColinIanKing/stress-ng/blob/master/stress-hrtimers.c
> The console shell locks up immediately after starting the process, and I get this rcu_preempt splat after 21 seconds,
> letting me know that the grace-period kthread could not run:
Interesting. I just tested this on an ARM64 box with v5.6-rt and the
stress-ng hrtimer test works fine. No lockups, and cyclictest results are
looking good. So maybe this is v5.7 related.
Thanks,
Kurt
Hi Kurt,
On Fri, 5 Jun 2020 at 15:57, Kurt Kanzenbach
<[email protected]> wrote:
>
> Hi Vladimir,
>
> On Fri Jun 05 2020, Vladimir Oltean wrote:
> > Hi,
> >
> > I was testing stress-ng on an ARM64 box and I found that it can be killed instantaneously with a --hrtimers 1 test:
> > https://github.com/ColinIanKing/stress-ng/blob/master/stress-hrtimers.c
> > The console shell locks up immediately after starting the process, and I get this rcu_preempt splat after 21 seconds,
> > letting me know that the grace-period kthread could not run:
>
> Interesting. I just tested this on an ARM64 box with v5.6-rt and the
> stress-ng hrtimer test works fine. No lockups, and cyclictest results are
> looking good. So maybe this is v5.7 related.
>
> Thanks,
> Kurt
This is not actually v5.7 related. I could also reproduce it on a
downstream 5.4 kernel, which is how I originally saw it.
Thanks,
-Vladimir
Hi Kurt,
On Fri, 5 Jun 2020 at 16:19, Vladimir Oltean <[email protected]> wrote:
>
> Hi Kurt,
>
> On Fri, 5 Jun 2020 at 15:57, Kurt Kanzenbach
> <[email protected]> wrote:
> >
> > Hi Vladimir,
> >
> > On Fri Jun 05 2020, Vladimir Oltean wrote:
> > > Hi,
> > >
> > > I was testing stress-ng on an ARM64 box and I found that it can be killed instantaneously with a --hrtimers 1 test:
> > > https://github.com/ColinIanKing/stress-ng/blob/master/stress-hrtimers.c
> > > The console shell locks up immediately after starting the process, and I get this rcu_preempt splat after 21 seconds,
> > > letting me know that the grace-period kthread could not run:
> >
> > Interesting. I just tested this on an ARM64 box with v5.6-rt and the
> > stress-ng hrtimer test works fine. No lockups, and cyclictest results are
> > looking good. So maybe this is v5.7 related.
> >
> > Thanks,
> > Kurt
>
> This is not actually v5.7 related. I could also reproduce it on a
> downstream 5.4 kernel, which is how I originally saw it.
>
> Thanks,
> -Vladimir
Just out of curiosity, what and how many CPU cores does your ARM64 box
have, and what frequency are you running them at?
Mine is a dual-core A72 machine running at 1500 MHz.
Thanks,
-Vladimir
Hi Vladimir,
On Tue Jun 09 2020, Vladimir Oltean wrote:
> Just out of curiosity, what and how many CPU cores does your ARM64 box
> have, and what frequency are you running them at?
> Mine is a dual-core A72 machine running at 1500 MHz.
That particular machine has a dual core Cortex A53 running at 1GHz.
Thanks,
Kurt
Hi Kurt,
On Wed, 10 Jun 2020 at 08:58, Kurt Kanzenbach
<[email protected]> wrote:
>
> Hi Vladimir,
>
> On Tue Jun 09 2020, Vladimir Oltean wrote:
> > Just out of curiosity, what and how many CPU cores does your ARM64 box
> > have, and what frequency are you running them at?
> > Mine is a dual-core A72 machine running at 1500 MHz.
>
> That particular machine has a dual core Cortex A53 running at 1GHz.
>
> Thanks,
> Kurt
Ok, I just noticed that you said "v5.6*-rt*".
So I re-tested the downstream 5.4 kernel that hung before, this time
with the -rt1 patchset applied on top. It's actually working fine now:
no hangs, no RCU stalls, no need to reduce sched_rt_runtime_us.
So maybe it's something related to the config?
Thanks,
-Vladimir
Hi, Kurt,
May I know whether you used the "root" user to run stress-ng? Using "root" will change the scheduler to "SCHED_RR", so would you please share the test results with both root and non-root users? Thanks.
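One way to verify what the workers actually end up with (assuming the worker comm is "stress-ng-hrtim", as in the splat earlier in the thread):

# chrt -p $(pidof stress-ng-hrtim)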
Best Regards,
Jiafei.
-----Original Message-----
From: [email protected] <[email protected]> On Behalf Of Kurt Kanzenbach
Sent: Wednesday, June 10, 2020 1:58 PM
To: Vladimir Oltean <[email protected]>
Cc: [email protected]; lkml <[email protected]>; [email protected]; Colin King <[email protected]>
Subject: [EXT] Re: stress-ng --hrtimers hangs system
Hi Vladimir,
On Tue Jun 09 2020, Vladimir Oltean wrote:
> Just out of curiosity, what and how many CPU cores does your ARM64 box
> have, and what frequency are you running them at?
> Mine is a dual-core A72 machine running at 1500 MHz.
That particular machine has a dual core Cortex A53 running at 1GHz.
Thanks,
Kurt
Hi Jiafei,
On Friday, June 12, 2020 2:49:17 AM CEST Jiafei Pan wrote:
> Hi, Kurt,
>
> May I know whether you used the "root" user to run stress-ng? Using
> "root" will change the scheduler to "SCHED_RR", so would you please
> share the test results with both root and non-root users? Thanks.
I performed the test now as both root and a non-root user. Same result: no
problem.
Thanks,
Kurt
Thanks, Kurt, for your confirmation.
Best Regards,
Jiafei.