2022-01-06 04:19:58

by Chris Murphy

[permalink] [raw]
Subject: 5.16.0-rc8 deadlock at rfcomm_sk_state_change

I just experienced this soon after wake from suspend-to-ram; I haven't
noticed it previously, but a bot might have seen it before (below the
dmesg snippet).

[14646.797396] kernel: PM: resume devices took 2.975 seconds
[14646.797418] kernel: OOM killer enabled.
[14646.797426] kernel: Restarting tasks ...
[14646.822164] kernel: done.
[14646.941351] kernel: PM: suspend exit
[14646.967395] kernel: rfkill: input handler disabled
[14647.074621] kernel: e1000e 0000:00:1f.6 enp0s31f6: NIC Link is Down
[14651.697550] kernel: wlp0s20f3: authenticate with f8:a0:97:6e:c7:e8
[14651.699884] kernel: wlp0s20f3: send auth to f8:a0:97:6e:c7:e8 (try 1/3)
[14651.828236] kernel: wlp0s20f3: authenticated
[14651.830313] kernel: wlp0s20f3: associate with f8:a0:97:6e:c7:e8 (try 1/3)
[14651.842846] kernel: wlp0s20f3: RX AssocResp from f8:a0:97:6e:c7:e8
(capab=0x411 status=0 aid=4)
[14651.845635] kernel: wlp0s20f3: associated
[14651.869218] kernel: wlp0s20f3: Limiting TX power to 27 (30 - 3) dBm
as advertised by f8:a0:97:6e:c7:e8
[14651.894962] kernel: IPv6: ADDRCONF(NETDEV_CHANGE): wlp0s20f3: link
becomes ready
[14655.200461] kernel:
[14655.200468] kernel: ======================================================
[14655.200470] kernel: WARNING: possible circular locking dependency detected
[14655.200472] kernel: 5.16.0-0.rc8.55.fc36.x86_64+debug #1 Not tainted
[14655.200475] kernel: ------------------------------------------------------
[14655.200476] kernel: krfcommd/1358 is trying to acquire lock:
[14655.200479] kernel: ffff9e662103d130
(sk_lock-AF_BLUETOOTH-BTPROTO_RFCOMM){+.+.}-{0:0}, at:
rfcomm_sk_state_change+0x4d/0x110 [rfcomm]
[14655.200494] kernel:
but task is already holding lock:
[14655.200495] kernel: ffff9e672a54bb40 (&d->lock){+.+.}-{3:3}, at:
__rfcomm_dlc_close+0x88/0x230 [rfcomm]
[14655.200506] kernel:
which lock already depends on the new lock.
[14655.200508] kernel:
the existing dependency chain (in reverse order) is:
[14655.200509] kernel:
-> #2 (&d->lock){+.+.}-{3:3}:
[14655.200515] kernel: __mutex_lock+0x93/0x910
[14655.200521] kernel: __rfcomm_dlc_close+0x88/0x230 [rfcomm]
[14655.200527] kernel: rfcomm_run+0x765/0x1820 [rfcomm]
[14655.200533] kernel: kthread+0x17a/0x1a0
[14655.200538] kernel: ret_from_fork+0x1f/0x30
[14655.200543] kernel:
-> #1 (rfcomm_mutex){+.+.}-{3:3}:
[14655.200548] kernel: __mutex_lock+0x93/0x910
[14655.200551] kernel: rfcomm_dlc_open+0x31/0x350 [rfcomm]
[14655.200557] kernel: rfcomm_sock_connect+0xd3/0x130 [rfcomm]
[14655.200564] kernel: __sys_connect+0x9f/0xd0
[14655.200569] kernel: __x64_sys_connect+0x14/0x20
[14655.200573] kernel: do_syscall_64+0x38/0x90
[14655.200578] kernel: entry_SYSCALL_64_after_hwframe+0x44/0xae
[14655.200581] kernel:
-> #0 (sk_lock-AF_BLUETOOTH-BTPROTO_RFCOMM){+.+.}-{0:0}:
[14655.200586] kernel: __lock_acquire+0x11c7/0x1de0
[14655.200590] kernel: lock_acquire+0xd3/0x2f0
[14655.200593] kernel: lock_sock_nested+0x2e/0x80
[14655.200597] kernel: rfcomm_sk_state_change+0x4d/0x110 [rfcomm]
[14655.200603] kernel: __rfcomm_dlc_close+0xa2/0x230 [rfcomm]
[14655.200609] kernel: rfcomm_run+0x765/0x1820 [rfcomm]
[14655.200615] kernel: kthread+0x17a/0x1a0
[14655.200618] kernel: ret_from_fork+0x1f/0x30
[14655.200622] kernel:
other info that might help us debug this:
[14655.200624] kernel: Chain exists of:
sk_lock-AF_BLUETOOTH-BTPROTO_RFCOMM -->
rfcomm_mutex --> &d->lock
[14655.200630] kernel: Possible unsafe locking scenario:
[14655.200632] kernel: CPU0 CPU1
[14655.200633] kernel: ---- ----
[14655.200635] kernel: lock(&d->lock);
[14655.200638] kernel: lock(rfcomm_mutex);
[14655.200640] kernel: lock(&d->lock);
[14655.200643] kernel: lock(sk_lock-AF_BLUETOOTH-BTPROTO_RFCOMM);
[14655.200646] kernel:
*** DEADLOCK ***
[14655.200648] kernel: 2 locks held by krfcommd/1358:
[14655.200650] kernel: #0: ffffffffc141d130
(rfcomm_mutex){+.+.}-{3:3}, at: rfcomm_run+0x135/0x1820 [rfcomm]
[14655.200661] kernel: #1: ffff9e672a54bb40 (&d->lock){+.+.}-{3:3},
at: __rfcomm_dlc_close+0x88/0x230 [rfcomm]
[14655.200671] kernel:
stack backtrace:
[14655.200673] kernel: CPU: 3 PID: 1358 Comm: krfcommd Not tainted
5.16.0-0.rc8.55.fc36.x86_64+debug #1
[14655.200677] kernel: Hardware name: LENOVO 20QDS3E200/20QDS3E200,
BIOS N2HET64W (1.47 ) 07/19/2021
[14655.200680] kernel: Call Trace:
[14655.200682] kernel: <TASK>
[14655.200686] kernel: dump_stack_lvl+0x59/0x73
[14655.200692] kernel: check_noncircular+0xdf/0x100
[14655.200699] kernel: __lock_acquire+0x11c7/0x1de0
[14655.200706] kernel: lock_acquire+0xd3/0x2f0
[14655.200710] kernel: ? rfcomm_sk_state_change+0x4d/0x110 [rfcomm]
[14655.200719] kernel: ? __rfcomm_dlc_close+0x88/0x230 [rfcomm]
[14655.200728] kernel: lock_sock_nested+0x2e/0x80
[14655.200731] kernel: ? rfcomm_sk_state_change+0x4d/0x110 [rfcomm]
[14655.200737] kernel: rfcomm_sk_state_change+0x4d/0x110 [rfcomm]
[14655.200744] kernel: __rfcomm_dlc_close+0xa2/0x230 [rfcomm]
[14655.200752] kernel: rfcomm_run+0x765/0x1820 [rfcomm]
[14655.200762] kernel: ? __init_waitqueue_head+0x60/0x60
[14655.200768] kernel: ? _raw_spin_unlock_irqrestore+0x3e/0x60
[14655.200772] kernel: ? rfcomm_check_accept+0xa0/0xa0 [rfcomm]
[14655.200780] kernel: kthread+0x17a/0x1a0
[14655.200783] kernel: ? set_kthread_struct+0x40/0x40
[14655.200788] kernel: ret_from_fork+0x1f/0x30
[14655.200796] kernel: </TASK>

Looks similar to one a bot reported against 5.14.0[syzbot] possible
deadlock in rfcomm_sk_state_change
https://www.spinics.net/lists/linux-bluetooth/msg94142.html

Downstream bug has complete dmesg attached.
https://bugzilla.redhat.com/show_bug.cgi?id=2037601

--
Chris Murphy