2022-05-09 01:37:05

by Jian-Hong Pan

[permalink] [raw]
Subject: [BUG] The UI hangs when system changes CPU frequency on Raspberry Pi 4B

Hi,

We found the UI hangs totally when system changes CPU frequency on
Raspberry Pi 4B. The test kernel is mainline kernel 5.18-rc3.

According to the oops messages and the stack:

[ 1330.166158] INFO: task kworker/0:0:6 blocked for more than 120 seconds.
[ 1330.172963] Tainted: G W E 5.18.0-rc3 #61
[ 1330.178767] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 1330.186742] task:kworker/0:0 state:D stack: 0 pid: 6
ppid: 2 flags:0x00000008
[ 1330.186762] Workqueue: events dbs_work_handler
[ 1330.186779] Call trace:
[ 1330.186782] __switch_to+0xf8/0x150
[ 1330.186793] __schedule+0x37c/0x141c
[ 1330.186800] schedule+0x54/0xc4
[ 1330.186807] schedule_timeout+0xa4/0x190
[ 1330.186813] __wait_for_common+0xc4/0x224
[ 1330.186818] wait_for_completion_timeout+0x2c/0x40
[ 1330.186822] mbox_send_message+0xf0/0x1f4
[ 1330.186828] rpi_firmware_property_list+0xf8/0x280
[ 1330.186834] rpi_firmware_property+0x7c/0xc0
[ 1330.186839] raspberrypi_fw_set_rate+0x64/0xe4 [clk_raspberrypi]
[ 1330.186851] clk_change_rate+0xe4/0x470
[ 1330.186859] clk_core_set_rate_nolock+0x1d0/0x2ec
[ 1330.186864] clk_set_rate+0x44/0x190
[ 1330.186870] _set_opp+0x3b4/0x70c
[ 1330.186876] dev_pm_opp_set_rate+0x11c/0x2d0
[ 1330.186881] set_target+0x3c/0x4c
[ 1330.186887] __cpufreq_driver_target+0x400/0x7ac
[ 1330.186893] od_dbs_update+0xc8/0x1c0
[ 1330.186898] dbs_work_handler+0x4c/0x8c
[ 1330.186903] process_one_work+0x210/0x480
[ 1330.186910] worker_thread+0x180/0x520
[ 1330.186915] kthread+0xe4/0xf0
[ 1330.186920] ret_from_fork+0x10/0x20
[ 1330.186928] INFO: task kworker/u8:0:8 blocked for more than 120 seconds.
[ 1330.193744] Tainted: G W E 5.18.0-rc3 #61
[ 1330.199509] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 1330.207467] task:kworker/u8:0 state:D stack: 0 pid: 8
ppid: 2 flags:0x00000008
[ 1330.207485] Workqueue: events_unbound commit_work [drm_kms_helper]
[ 1330.207555] Call trace:
[ 1330.207558] __switch_to+0xf8/0x150
[ 1330.207572] __schedule+0x37c/0x141c
[ 1330.207579] schedule+0x54/0xc4
[ 1330.207586] schedule_preempt_disabled+0x1c/0x2c
[ 1330.207593] __mutex_lock.constprop.0+0x160/0x59c
[ 1330.207598] __mutex_lock_slowpath+0x40/0x8c
[ 1330.207602] mutex_lock+0x68/0x80
[ 1330.207607] clk_prepare_lock+0x4c/0xac
[ 1330.207616] clk_set_rate_range.part.0+0x3c/0x250
[ 1330.207622] clk_set_min_rate+0x3c/0xdc
[ 1330.207627] vc4_atomic_commit_tail+0x68c/0x77c [vc4]
[ 1330.207664] commit_tail+0xd8/0x210 [drm_kms_helper]
[ 1330.207707] commit_work+0x20/0x30 [drm_kms_helper]
[ 1330.207748] process_one_work+0x210/0x480
[ 1330.207755] worker_thread+0x180/0x520
[ 1330.207760] kthread+0xe4/0xf0
[ 1330.207765] ret_from_fork+0x10/0x20

Looks like clk_raspberrypi tries to set/change rate by invoking
raspberrypi_fw_set_rate(), but the transaction with firmware is
timeout. There might be a lock shared with others, but held by it. For
example, the second oops message from drm_kms_helper -> vc4 shows
clk_prepare_lock() is waiting to be scheduled.

The UI hangs until raspberrypi-firmware gets the failed returned error
from mbox_send_message().

[ 4887.601985] raspberrypi-firmware soc:firmware: mbox_send_message returned -62
[ 4887.609269] raspberrypi-clk soc:firmware:clocks: Failed to change
fw-clk-arm frequency: -62

The serial console is alive when the UI hangs.

I put all of the findings on https://bugzilla.kernel.org/show_bug.cgi?id=215875

Jian-Hong Pan