Hello,
syzbot found the following issue on:
HEAD commit: 7ed2632ec7d7 drm/ttm: fix ttm pool initialization for no-d..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=10768b3be80000
kernel config: https://syzkaller.appspot.com/x/.config?x=bc36d99546fe9035
dashboard link: https://syzkaller.appspot.com/bug?extid=a984066a63e9c1e62662
compiler: gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=11b8c217e80000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=116e5bbfe80000
Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/86560603f270/disk-7ed2632e.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/364bc921d8bf/vmlinux-7ed2632e.xz
kernel image: https://storage.googleapis.com/syzbot-assets/1d78f4f3ec9b/bzImage-7ed2632e.xz
The issue was bisected to:
commit 5af1f84ed13a416297ab9ced7537f4d5ae7f329a
Author: Luiz Augusto von Dentz <[email protected]>
Date: Thu Aug 3 18:04:51 2023 +0000
Bluetooth: hci_sync: Fix UAF on hci_abort_conn_sync
bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=11218f6be80000
final oops: https://syzkaller.appspot.com/x/report.txt?x=13218f6be80000
console output: https://syzkaller.appspot.com/x/log.txt?x=15218f6be80000
IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: [email protected]
Fixes: 5af1f84ed13a ("Bluetooth: hci_sync: Fix UAF on hci_abort_conn_sync")
INFO: task kworker/u5:3:5119 blocked for more than 143 seconds.
Not tainted 6.8.0-rc1-syzkaller-00026-g7ed2632ec7d7 #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:kworker/u5:3 state:D stack:27264 pid:5119 tgid:5119 ppid:2 flags:0x00004000
Workqueue: hci5 hci_cmd_sync_work
Call Trace:
<TASK>
context_switch kernel/sched/core.c:5400 [inline]
__schedule+0xf12/0x5c00 kernel/sched/core.c:6727
__schedule_loop kernel/sched/core.c:6802 [inline]
schedule+0xe9/0x270 kernel/sched/core.c:6817
schedule_preempt_disabled+0x13/0x20 kernel/sched/core.c:6874
__mutex_lock_common kernel/locking/mutex.c:684 [inline]
__mutex_lock+0x5b9/0x9d0 kernel/locking/mutex.c:752
hci_connect_cfm include/net/bluetooth/hci_core.h:1983 [inline]
hci_conn_failed+0x158/0x370 net/bluetooth/hci_conn.c:1289
hci_abort_conn_sync+0x758/0xb50 net/bluetooth/hci_sync.c:5356
abort_conn_sync+0x187/0x390 net/bluetooth/hci_conn.c:2988
hci_cmd_sync_work+0x1a4/0x410 net/bluetooth/hci_sync.c:306
process_one_work+0x886/0x15d0 kernel/workqueue.c:2633
process_scheduled_works kernel/workqueue.c:2706 [inline]
worker_thread+0x8b9/0x1290 kernel/workqueue.c:2787
kthread+0x2c6/0x3a0 kernel/kthread.c:388
ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:242
</TASK>
INFO: task kworker/u5:5:5122 blocked for more than 146 seconds.
Not tainted 6.8.0-rc1-syzkaller-00026-g7ed2632ec7d7 #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:kworker/u5:5 state:D stack:27536 pid:5122 tgid:5122 ppid:2 flags:0x00004000
Workqueue: hci1 hci_cmd_sync_work
Call Trace:
<TASK>
context_switch kernel/sched/core.c:5400 [inline]
__schedule+0xf12/0x5c00 kernel/sched/core.c:6727
__schedule_loop kernel/sched/core.c:6802 [inline]
schedule+0xe9/0x270 kernel/sched/core.c:6817
schedule_preempt_disabled+0x13/0x20 kernel/sched/core.c:6874
__mutex_lock_common kernel/locking/mutex.c:684 [inline]
__mutex_lock+0x5b9/0x9d0 kernel/locking/mutex.c:752
hci_connect_cfm include/net/bluetooth/hci_core.h:1983 [inline]
hci_conn_failed+0x158/0x370 net/bluetooth/hci_conn.c:1289
hci_abort_conn_sync+0x758/0xb50 net/bluetooth/hci_sync.c:5356
abort_conn_sync+0x187/0x390 net/bluetooth/hci_conn.c:2988
hci_cmd_sync_work+0x1a4/0x410 net/bluetooth/hci_sync.c:306
process_one_work+0x886/0x15d0 kernel/workqueue.c:2633
process_scheduled_works kernel/workqueue.c:2706 [inline]
worker_thread+0x8b9/0x1290 kernel/workqueue.c:2787
kthread+0x2c6/0x3a0 kernel/kthread.c:388
ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:242
</TASK>
INFO: task kworker/u5:8:5131 blocked for more than 148 seconds.
Not tainted 6.8.0-rc1-syzkaller-00026-g7ed2632ec7d7 #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:kworker/u5:8 state:D stack:27808 pid:5131 tgid:5131 ppid:2 flags:0x00004000
Workqueue: hci4 hci_cmd_sync_work
Call Trace:
<TASK>
context_switch kernel/sched/core.c:5400 [inline]
__schedule+0xf12/0x5c00 kernel/sched/core.c:6727
__schedule_loop kernel/sched/core.c:6802 [inline]
schedule+0xe9/0x270 kernel/sched/core.c:6817
schedule_preempt_disabled+0x13/0x20 kernel/sched/core.c:6874
__mutex_lock_common kernel/locking/mutex.c:684 [inline]
__mutex_lock+0x5b9/0x9d0 kernel/locking/mutex.c:752
hci_connect_cfm include/net/bluetooth/hci_core.h:1983 [inline]
hci_conn_failed+0x158/0x370 net/bluetooth/hci_conn.c:1289
hci_abort_conn_sync+0x758/0xb50 net/bluetooth/hci_sync.c:5356
abort_conn_sync+0x187/0x390 net/bluetooth/hci_conn.c:2988
hci_cmd_sync_work+0x1a4/0x410 net/bluetooth/hci_sync.c:306
process_one_work+0x886/0x15d0 kernel/workqueue.c:2633
process_scheduled_works kernel/workqueue.c:2706 [inline]
worker_thread+0x8b9/0x1290 kernel/workqueue.c:2787
kthread+0x2c6/0x3a0 kernel/kthread.c:388
ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:242
</TASK>
Showing all locks held in the system:
2 locks held by kworker/0:1/9:
3 locks held by kworker/1:0/23:
1 lock held by khungtaskd/29:
#0: ffffffff8d1acba0 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:298 [inline]
#0: ffffffff8d1acba0 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:750 [inline]
#0: ffffffff8d1acba0 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x75/0x340 kernel/locking/lockdep.c:6614
2 locks held by kswapd0/87:
2 locks held by kswapd1/88:
3 locks held by kworker/1:2/925:
2 locks held by syslogd/4496:
#0: ffff8880b993ccd8 (&rq->__lock){-.-.}-{2:2}, at: raw_spin_rq_lock_nested+0x29/0x130 kernel/sched/core.c:559
#1: ffff8880b9928a08 (&per_cpu_ptr(group->pcpu, cpu)->seq){-.-.}-{0:0}, at: psi_task_switch+0x2d9/0x900 kernel/sched/psi.c:988
2 locks held by getty/4812:
#0: ffff88802908b0a0 (&tty->ldisc_sem){++++}-{0:0}, at: tty_ldisc_ref_wait+0x24/0x80 drivers/tty/tty_ldisc.c:243
#1: ffffc900031232f0 (&ldata->atomic_read_lock){+.+.}-{3:3}, at: n_tty_read+0xfc6/0x1490 drivers/tty/n_tty.c:2201
2 locks held by kworker/0:0/5077:
1 lock held by sshd/5100:
2 locks held by kworker/1:1/5116:
5 locks held by kworker/u5:3/5119:
#0: ffff88801d8b3938 ((wq_completion)hci5){+.+.}-{0:0}, at: process_one_work+0x789/0x15d0 kernel/workqueue.c:2608
#1: ffffc900046efd80 ((work_completion)(&hdev->cmd_sync_work)){+.+.}-{0:0}, at: process_one_work+0x7eb/0x15d0 kernel/workqueue.c:2609
#2: ffff888021019060 (&hdev->req_lock){+.+.}-{3:3}, at: hci_cmd_sync_work+0x170/0x410 net/bluetooth/hci_sync.c:305
#3: ffff888021018078 (&hdev->lock){+.+.}-{3:3}, at: hci_abort_conn_sync+0x150/0xb50 net/bluetooth/hci_sync.c:5337
#4: ffffffff8ef22548 (hci_cb_list_lock){+.+.}-{3:3}, at: hci_connect_cfm include/net/bluetooth/hci_core.h:1983 [inline]
#4: ffffffff8ef22548 (hci_cb_list_lock){+.+.}-{3:3}, at: hci_conn_failed+0x158/0x370 net/bluetooth/hci_conn.c:1289
---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at [email protected].
syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
For information about bisection process see: https://goo.gl/tpsmEJ#bisection
If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title
If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing.
If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)
If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report
If you want to undo deduplication, reply with:
#syz undup
please test task hung in hci_conn_failed
#syz test https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
diff --git a/net/bluetooth/hci_sync.c b/net/bluetooth/hci_sync.c
index a6fc8a2a5c67..2cc194ad37ba 100644
--- a/net/bluetooth/hci_sync.c
+++ b/net/bluetooth/hci_sync.c
@@ -5338,7 +5338,7 @@ int hci_abort_conn_sync(struct hci_dev *hdev, struct hci_conn *conn, u8 reason)
/* Check if the connection has been cleaned up concurrently */
c = hci_conn_hash_lookup_handle(hdev, handle);
- if (!c || c != conn) {
+ if (!c || c != conn || conn->state == BT_CLOSED) {
err = 0;
goto unlock;
}
Hello,
syzbot has tested the proposed patch but the reproducer is still triggering an issue:
INFO: task hung in hci_conn_failed
INFO: task kworker/u5:2:5062 blocked for more than 143 seconds.
Not tainted 6.8.0-rc1-syzkaller-g8a696a29c690-dirty #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:kworker/u5:2 state:D stack:27840 pid:5062 tgid:5062 ppid:2 flags:0x00004000
Workqueue: hci5 hci_cmd_sync_work
Call Trace:
<TASK>
context_switch kernel/sched/core.c:5400 [inline]
__schedule+0xf12/0x5c00 kernel/sched/core.c:6727
__schedule_loop kernel/sched/core.c:6802 [inline]
schedule+0xe9/0x270 kernel/sched/core.c:6817
schedule_preempt_disabled+0x13/0x20 kernel/sched/core.c:6874
__mutex_lock_common kernel/locking/mutex.c:684 [inline]
__mutex_lock+0x5b9/0x9d0 kernel/locking/mutex.c:752
hci_connect_cfm include/net/bluetooth/hci_core.h:1983 [inline]
hci_conn_failed+0x158/0x370 net/bluetooth/hci_conn.c:1289
hci_abort_conn_sync+0x7f8/0xb70 net/bluetooth/hci_sync.c:5356
abort_conn_sync+0x187/0x390 net/bluetooth/hci_conn.c:2988
hci_cmd_sync_work+0x1a4/0x410 net/bluetooth/hci_sync.c:306
process_one_work+0x886/0x15d0 kernel/workqueue.c:2633
process_scheduled_works kernel/workqueue.c:2706 [inline]
worker_thread+0x8b9/0x1290 kernel/workqueue.c:2787
kthread+0x2c6/0x3a0 kernel/kthread.c:388
ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:242
</TASK>
Showing all locks held in the system:
2 locks held by kworker/0:1/9:
2 locks held by kworker/u4:0/11:
#0: ffff888013089938 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x789/0x15d0 kernel/workqueue.c:2608
#1: ffffc90000107d80 (connector_reaper_work){+.+.}-{0:0}, at: process_one_work+0x7eb/0x15d0 kernel/workqueue.c:2609
3 locks held by kworker/1:1/27:
1 lock held by khungtaskd/29:
#0: ffffffff8d1acba0 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:298 [inline]
#0: ffffffff8d1acba0 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:750 [inline]
#0: ffffffff8d1acba0 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x75/0x340 kernel/locking/lockdep.c:6614
6 locks held by kworker/u5:0/51:
#0: ffff88802a730d38 ((wq_completion)hci2){+.+.}-{0:0}, at: process_one_work+0x789/0x15d0 kernel/workqueue.c:2608
#1: ffffc90000bc7d80 ((work_completion)(&hdev->cmd_sync_work)){+.+.}-{0:0}, at: process_one_work+0x7eb/0x15d0 kernel/workqueue.c:2609
#2: ffff888020989060 (&hdev->req_lock){+.+.}-{3:3}, at: hci_cmd_sync_work+0x170/0x410 net/bluetooth/hci_sync.c:305
#3: ffff888020988078 (&hdev->lock){+.+.}-{3:3}, at: hci_abort_conn_sync+0x150/0xb70 net/bluetooth/hci_sync.c:5337
#4: ffffffff8ef23348 (hci_cb_list_lock){+.+.}-{3:3}, at: hci_connect_cfm include/net/bluetooth/hci_core.h:1983 [inline]
#4: ffffffff8ef23348 (hci_cb_list_lock){+.+.}-{3:3}, at: hci_conn_failed+0x158/0x370 net/bluetooth/hci_conn.c:1289
#5: ffffffff8d1b83f8 (rcu_state.exp_mutex){+.+.}-{3:3}, at: exp_funnel_lock kernel/rcu/tree_exp.h:324 [inline]
#5: ffffffff8d1b83f8 (rcu_state.exp_mutex){+.+.}-{3:3}, at: synchronize_rcu_expedited+0x3ff/0x800 kernel/rcu/tree_exp.h:995
3 locks held by kworker/1:2/779:
#0: ffff8880b993ccd8 (&rq->__lock){-.-.}-{2:2}, at: raw_spin_rq_lock_nested+0x29/0x130 kernel/sched/core.c:559
#1: ffffc900038c7d80 ((work_completion)(&aux->work)#2){+.+.}-{0:0}, at: process_one_work+0x7eb/0x15d0 kernel/workqueue.c:2609
#2: ffff888052984c80 (&aux->poke_mutex){+.+.}-{3:3}, at: __fd_array_map_delete_elem+0x125/0x2f0 kernel/bpf/arraymap.c:884
2 locks held by kworker/u4:7/2797:
2 locks held by kworker/u4:9/2812:
#0: ffff888013089938 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x789/0x15d0 kernel/workqueue.c:2608
#1: ffffc9000a05fd80 ((reaper_work).work){+.+.}-{0:0}, at: process_one_work+0x7eb/0x15d0 kernel/workqueue.c:2609
5 locks held by kworker/u5:1/4456:
#0: ffff888021b4a138 ((wq_completion)hci0){+.+.}-{0:0}, at: process_one_work+0x789/0x15d0 kernel/workqueue.c:2608
#1: ffffc9000d7dfd80 ((work_completion)(&hdev->cmd_sync_work)){+.+.}-{0:0}, at: process_one_work+0x7eb/0x15d0 kernel/workqueue.c:2609
#2: ffff88806a001060 (&hdev->req_lock){+.+.}-{3:3}, at: hci_cmd_sync_work+0x170/0x410 net/bluetooth/hci_sync.c:305
#3: ffff88806a000078 (&hdev->lock){+.+.}-{3:3}, at: hci_abort_conn_sync+0x150/0xb70 net/bluetooth/hci_sync.c:5337
#4: ffffffff8d1b83f8 (rcu_state.exp_mutex){+.+.}-{3:3}, at: exp_funnel_lock kernel/rcu/tree_exp.h:324 [inline]
#4: ffffffff8d1b83f8 (rcu_state.exp_mutex){+.+.}-{3:3}, at: synchronize_rcu_expedited+0x3ff/0x800 kernel/rcu/tree_exp.h:995
3 locks held by udevd/4512:
2 locks held by getty/4813:
#0: ffff88802e7980a0 (&tty->ldisc_sem){++++}-{0:0}, at: tty_ldisc_ref_wait+0x24/0x80 drivers/tty/tty_ldisc.c:243
#1: ffffc9000311b2f0 (&ldata->atomic_read_lock){+.+.}-{3:3}, at: n_tty_read+0xfc6/0x1490 drivers/tty/n_tty.c:2201
5 locks held by kworker/u5:2/5062:
#0: ffff88802271a938 ((wq_completion)hci5){+.+.}-{0:0}, at: process_one_work+0x789/0x15d0 kernel/workqueue.c:2608
#1: ffffc900041cfd80 ((work_completion)(&hdev->cmd_sync_work)){+.+.}-{0:0}, at: process_one_work+0x7eb/0x15d0 kernel/workqueue.c:2609
#2: ffff888019ff1060 (&hdev->req_lock){+.+.}-{3:3}, at: hci_cmd_sync_work+0x170/0x410 net/bluetooth/hci_sync.c:305
#3: ffff888019ff0078 (&hdev->lock){+.+.}-{3:3}, at: hci_abort_conn_sync+0x150/0xb70 net/bluetooth/hci_sync.c:5337
#4: ffffffff8ef23348 (hci_cb_list_lock){+.+.}-{3:3}, at: hci_connect_cfm include/net/bluetooth/hci_core.h:1983 [inline]
#4: ffffffff8ef23348 (hci_cb_list_lock){+.+.}-{3:3}, at: hci_conn_failed+0x158/0x370 net/bluetooth/hci_conn.c:1289
2 locks held by kworker/0:3/5070:
2 locks held by kworker/1:4/5071:
#0: ffff8880b993ccd8 (&rq->__lock){-.-.}-{2:2}, at: raw_spin_rq_lock_nested+0x29/0x130 kernel/sched/core.c:559
#1: ffff8880b9928a08 (&per_cpu_ptr(group->pcpu, cpu)->seq){-.-.}-{0:0}, at: psi_task_switch+0x2d9/0x900 kernel/sched/psi.c:988
3 locks held by kworker/1:5/5493:
3 locks held by kworker/1:6/5497:
3 locks held by kworker/0:6/5520:
2 locks held by kworker/0:8/5525:
#0: ffff8880b983ccd8 (&rq->__lock){-.-.}-{2:2}, at: raw_spin_rq_lock_nested+0x29/0x130 kernel/sched/core.c:559
#1: ffff8880b9828a08 (&per_cpu_ptr(group->pcpu, cpu)->seq){-.-.}-{0:0}, at: psi_task_switch+0x2d9/0x900 kernel/sched/psi.c:988
1 lock held by dhcpcd/20169:
#0: ffff8880581ba130 (sk_lock-AF_PACKET){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1691 [inline]
#0: ffff8880581ba130 (sk_lock-AF_PACKET){+.+.}-{0:0}, at: packet_do_bind+0x2c/0xf50 net/packet/af_packet.c:3202
1 lock held by syz-executor.0/21114:
=============================================
NMI backtrace for cpu 1
CPU: 1 PID: 29 Comm: khungtaskd Not tainted 6.8.0-rc1-syzkaller-g8a696a29c690-dirty #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/17/2023
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xd9/0x1b0 lib/dump_stack.c:106
nmi_cpu_backtrace+0x277/0x390 lib/nmi_backtrace.c:113
nmi_trigger_cpumask_backtrace+0x299/0x300 lib/nmi_backtrace.c:62
trigger_all_cpu_backtrace include/linux/nmi.h:160 [inline]
check_hung_uninterruptible_tasks kernel/hung_task.c:222 [inline]
watchdog+0xf87/0x1210 kernel/hung_task.c:379
kthread+0x2c6/0x3a0 kernel/kthread.c:388
ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:242
</TASK>
Sending NMI from CPU 1 to CPUs 0:
NMI backtrace for cpu 0
CPU: 0 PID: 21128 Comm: syz-executor.4 Not tainted 6.8.0-rc1-syzkaller-g8a696a29c690-dirty #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/17/2023
RIP: 0010:get_current arch/x86/include/asm/current.h:42 [inline]
RIP: 0010:__sanitizer_cov_trace_pc+0x4/0x60 kernel/kcov.c:206
Code: 66 0f 1f 44 00 00 f3 0f 1e fa 48 8b be b0 01 00 00 e8 b0 ff ff ff 31 c0 c3 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 f3 0f 1e fa <65> 48 8b 14 25 80 c2 03 00 65 8b 05 b4 9e 7b 7e a9 00 01 ff 00 48
RSP: 0018:ffffc9000433fa10 EFLAGS: 00000202
RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffffffff824e81f8
RDX: ffff88807efe5940 RSI: 0000000000000000 RDI: 0000000000000001
RBP: ffff88801ea65000 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000001 R11: ffff888010bb03f0 R12: ffff8880625f1f68
R13: ffff88807e228b40 R14: ffff88814960b000 R15: ffff8880625f1f8c
FS: 0000555556c83480(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f925bd01404 CR3: 000000004b197000 CR4: 0000000000350ef0
Call Trace:
<NMI>
</NMI>
<TASK>
bh2jh include/linux/jbd2.h:338 [inline]
jbd2_journal_dirty_metadata+0x85/0xd60 fs/jbd2/transaction.c:1514
__ext4_handle_dirty_metadata+0x16e/0x8e0 fs/ext4/ext4_jbd2.c:354
__ext4_new_inode+0x1bed/0x5200 fs/ext4/ialloc.c:1234
ext4_symlink+0x5d7/0xa20 fs/ext4/namei.c:3396
vfs_symlink fs/namei.c:4480 [inline]
vfs_symlink+0x3e4/0x620 fs/namei.c:4464
do_symlinkat+0x25f/0x310 fs/namei.c:4506
__do_sys_symlinkat fs/namei.c:4522 [inline]
__se_sys_symlinkat fs/namei.c:4519 [inline]
__x64_sys_symlinkat+0x97/0xc0 fs/namei.c:4519
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xd3/0x250 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x63/0x6b
RIP: 0033:0x7f925bc7c527
Code: 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 b8 0a 01 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fffb3e6e518 EFLAGS: 00000202 ORIG_RAX: 000000000000010a
RAX: ffffffffffffffda RBX: 00007fffb3e6e5e0 RCX: 00007f925bc7c527
RDX: 00007f925bcc951b RSI: 00000000ffffff9c RDI: 00007f925bcc850e
RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000000
</TASK>
Tested on:
commit: 8a696a29 Merge tag 'platform-drivers-x86-v6.8-2' of gi..
git tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
console output: https://syzkaller.appspot.com/x/log.txt?x=10300d80180000
kernel config: https://syzkaller.appspot.com/x/.config?x=bc36d99546fe9035
dashboard link: https://syzkaller.appspot.com/bug?extid=a984066a63e9c1e62662
compiler: gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
patch: https://syzkaller.appspot.com/x/patch.diff?x=1609e64be80000
On Sat, 27 Jan 2024 15:34:31 -0800
> HEAD commit: 7ed2632ec7d7 drm/ttm: fix ttm pool initialization for no-d..
> git tree: upstream
> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=116e5bbfe80000
#syz test https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
--- x/include/net/bluetooth/hci_core.h
+++ y/include/net/bluetooth/hci_core.h
@@ -758,7 +758,10 @@ struct hci_conn {
unsigned int sent;
- struct sk_buff_head data_q;
+ union {
+ struct sk_buff_head data_q;
+ struct rcu_head rcu;
+ };
struct list_head chan_list;
struct delayed_work disc_work;
--- x/net/bluetooth/hci_conn.c
+++ y/net/bluetooth/hci_conn.c
@@ -2819,16 +2819,13 @@ void hci_chan_del(struct hci_chan *chan)
BT_DBG("%s hcon %p chan %p", hdev->name, conn, chan);
list_del_rcu(&chan->list);
-
- synchronize_rcu();
-
/* Prevent new hci_chan's to be created for this hci_conn */
set_bit(HCI_CONN_DROP, &conn->flags);
hci_conn_put(conn);
skb_queue_purge(&chan->data_q);
- kfree(chan);
+ kfree_rcu(chan, rcu);
}
void hci_chan_list_flush(struct hci_conn *conn)
--
Hello,
syzbot tried to test the proposed patch but the build/boot failed:
/include/linux/stddef.h:16:33: error: 'struct hci_chan' has no member named 'rcu'
/include/linux/rcupdate.h:992:41: error: 'struct hci_chan' has no member named 'rcu'
Tested on:
commit: 8a696a29 Merge tag 'platform-drivers-x86-v6.8-2' of gi..
git tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
kernel config: https://syzkaller.appspot.com/x/.config?x=bc36d99546fe9035
dashboard link: https://syzkaller.appspot.com/bug?extid=a984066a63e9c1e62662
compiler: gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
patch: https://syzkaller.appspot.com/x/patch.diff?x=11d9682fe80000
On Sat, 27 Jan 2024 15:34:31 -0800
> HEAD commit: 7ed2632ec7d7 drm/ttm: fix ttm pool initialization for no-d..
> git tree: upstream
> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=116e5bbfe80000
#syz test https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
--- x/net/bluetooth/hci_conn.c
+++ y/net/bluetooth/hci_conn.c
@@ -2819,16 +2819,13 @@ void hci_chan_del(struct hci_chan *chan)
BT_DBG("%s hcon %p chan %p", hdev->name, conn, chan);
list_del_rcu(&chan->list);
-
- synchronize_rcu();
-
/* Prevent new hci_chan's to be created for this hci_conn */
set_bit(HCI_CONN_DROP, &conn->flags);
hci_conn_put(conn);
skb_queue_purge(&chan->data_q);
- kfree(chan);
+ kfree_rcu(chan, rcu);
}
void hci_chan_list_flush(struct hci_conn *conn)
--- x/include/net/bluetooth/hci_core.h
+++ y/include/net/bluetooth/hci_core.h
@@ -797,7 +797,10 @@ struct hci_chan {
struct list_head list;
__u16 handle;
struct hci_conn *conn;
- struct sk_buff_head data_q;
+ union {
+ struct sk_buff_head data_q;
+ struct rcu_head rcu;
+ };
unsigned int sent;
__u8 state;
bool amp;
--
Hello,
syzbot has tested the proposed patch and the reproducer did not trigger any issue:
Reported-and-tested-by: [email protected]
Tested on:
commit: 8a696a29 Merge tag 'platform-drivers-x86-v6.8-2' of gi..
git tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
console output: https://syzkaller.appspot.com/x/log.txt?x=1185e5a0180000
kernel config: https://syzkaller.appspot.com/x/.config?x=bc36d99546fe9035
dashboard link: https://syzkaller.appspot.com/bug?extid=a984066a63e9c1e62662
compiler: gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
patch: https://syzkaller.appspot.com/x/patch.diff?x=14504eefe80000
Note: testing is done by a robot and is best-effort only.
On Sat, 27 Jan 2024 18:31:03 -0800
> Hello,
>
> syzbot has tested the proposed patch but the reproducer is still triggering an issue:
> INFO: task hung in hci_conn_failed
>
> INFO: task kworker/u5:2:5062 blocked for more than 143 seconds.
> Not tainted 6.8.0-rc1-syzkaller-g8a696a29c690-dirty #0
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:kworker/u5:2 state:D stack:27840 pid:5062 tgid:5062 ppid:2 flags:0x00004000
> Workqueue: hci5 hci_cmd_sync_work
> Call Trace:
> <TASK>
> context_switch kernel/sched/core.c:5400 [inline]
> __schedule+0xf12/0x5c00 kernel/sched/core.c:6727
> __schedule_loop kernel/sched/core.c:6802 [inline]
> schedule+0xe9/0x270 kernel/sched/core.c:6817
> schedule_preempt_disabled+0x13/0x20 kernel/sched/core.c:6874
> __mutex_lock_common kernel/locking/mutex.c:684 [inline]
> __mutex_lock+0x5b9/0x9d0 kernel/locking/mutex.c:752
> hci_connect_cfm include/net/bluetooth/hci_core.h:1983 [inline]
> hci_conn_failed+0x158/0x370 net/bluetooth/hci_conn.c:1289
> hci_abort_conn_sync+0x7f8/0xb70 net/bluetooth/hci_sync.c:5356
> abort_conn_sync+0x187/0x390 net/bluetooth/hci_conn.c:2988
> hci_cmd_sync_work+0x1a4/0x410 net/bluetooth/hci_sync.c:306
> process_one_work+0x886/0x15d0 kernel/workqueue.c:2633
> process_scheduled_works kernel/workqueue.c:2706 [inline]
> worker_thread+0x8b9/0x1290 kernel/workqueue.c:2787
> kthread+0x2c6/0x3a0 kernel/kthread.c:388
> ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147
> ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:242
> </TASK>
>
> Showing all locks held in the system:
> 2 locks held by kworker/0:1/9:
> 2 locks held by kworker/u4:0/11:
> #0: ffff888013089938 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x789/0x15d0 kernel/workqueue.c:2608
> #1: ffffc90000107d80 (connector_reaper_work){+.+.}-{0:0}, at: process_one_work+0x7eb/0x15d0 kernel/workqueue.c:2609
> 3 locks held by kworker/1:1/27:
> 1 lock held by khungtaskd/29:
> #0: ffffffff8d1acba0 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:298 [inline]
> #0: ffffffff8d1acba0 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:750 [inline]
> #0: ffffffff8d1acba0 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x75/0x340 kernel/locking/lockdep.c:6614
> 6 locks held by kworker/u5:0/51:
> #0: ffff88802a730d38 ((wq_completion)hci2){+.+.}-{0:0}, at: process_one_work+0x789/0x15d0 kernel/workqueue.c:2608
> #1: ffffc90000bc7d80 ((work_completion)(&hdev->cmd_sync_work)){+.+.}-{0:0}, at: process_one_work+0x7eb/0x15d0 kernel/workqueue.c:2609
> #2: ffff888020989060 (&hdev->req_lock){+.+.}-{3:3}, at: hci_cmd_sync_work+0x170/0x410 net/bluetooth/hci_sync.c:305
> #3: ffff888020988078 (&hdev->lock){+.+.}-{3:3}, at: hci_abort_conn_sync+0x150/0xb70 net/bluetooth/hci_sync.c:5337
> #4: ffffffff8ef23348 (hci_cb_list_lock){+.+.}-{3:3}, at: hci_connect_cfm include/net/bluetooth/hci_core.h:1983 [inline]
> #4: ffffffff8ef23348 (hci_cb_list_lock){+.+.}-{3:3}, at: hci_conn_failed+0x158/0x370 net/bluetooth/hci_conn.c:1289
> #5: ffffffff8d1b83f8 (rcu_state.exp_mutex){+.+.}-{3:3}, at: exp_funnel_lock kernel/rcu/tree_exp.h:324 [inline]
> #5: ffffffff8d1b83f8 (rcu_state.exp_mutex){+.+.}-{3:3}, at: synchronize_rcu_expedited+0x3ff/0x800 kernel/rcu/tree_exp.h:995
> 3 locks held by kworker/1:2/779:
> #0: ffff8880b993ccd8 (&rq->__lock){-.-.}-{2:2}, at: raw_spin_rq_lock_nested+0x29/0x130 kernel/sched/core.c:559
> #1: ffffc900038c7d80 ((work_completion)(&aux->work)#2){+.+.}-{0:0}, at: process_one_work+0x7eb/0x15d0 kernel/workqueue.c:2609
> #2: ffff888052984c80 (&aux->poke_mutex){+.+.}-{3:3}, at: __fd_array_map_delete_elem+0x125/0x2f0 kernel/bpf/arraymap.c:884
Could locking people shed any light on the failure of detecting the
poke_mutex with rq lock held?
> 2 locks held by kworker/u4:7/2797:
> 2 locks held by kworker/u4:9/2812:
> #0: ffff888013089938 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x789/0x15d0 kernel/workqueue.c:2608
> #1: ffffc9000a05fd80 ((reaper_work).work){+.+.}-{0:0}, at: process_one_work+0x7eb/0x15d0 kernel/workqueue.c:2609
> 5 locks held by kworker/u5:1/4456:
> #0: ffff888021b4a138 ((wq_completion)hci0){+.+.}-{0:0}, at: process_one_work+0x789/0x15d0 kernel/workqueue.c:2608
> #1: ffffc9000d7dfd80 ((work_completion)(&hdev->cmd_sync_work)){+.+.}-{0:0}, at: process_one_work+0x7eb/0x15d0 kernel/workqueue.c:2609
> #2: ffff88806a001060 (&hdev->req_lock){+.+.}-{3:3}, at: hci_cmd_sync_work+0x170/0x410 net/bluetooth/hci_sync.c:305
> #3: ffff88806a000078 (&hdev->lock){+.+.}-{3:3}, at: hci_abort_conn_sync+0x150/0xb70 net/bluetooth/hci_sync.c:5337
> #4: ffffffff8d1b83f8 (rcu_state.exp_mutex){+.+.}-{3:3}, at: exp_funnel_lock kernel/rcu/tree_exp.h:324 [inline]
> #4: ffffffff8d1b83f8 (rcu_state.exp_mutex){+.+.}-{3:3}, at: synchronize_rcu_expedited+0x3ff/0x800 kernel/rcu/tree_exp.h:995
> 3 locks held by udevd/4512:
> 2 locks held by getty/4813:
> #0: ffff88802e7980a0 (&tty->ldisc_sem){++++}-{0:0}, at: tty_ldisc_ref_wait+0x24/0x80 drivers/tty/tty_ldisc.c:243
> #1: ffffc9000311b2f0 (&ldata->atomic_read_lock){+.+.}-{3:3}, at: n_tty_read+0xfc6/0x1490 drivers/tty/n_tty.c:2201
> 5 locks held by kworker/u5:2/5062:
> #0: ffff88802271a938 ((wq_completion)hci5){+.+.}-{0:0}, at: process_one_work+0x789/0x15d0 kernel/workqueue.c:2608
> #1: ffffc900041cfd80 ((work_completion)(&hdev->cmd_sync_work)){+.+.}-{0:0}, at: process_one_work+0x7eb/0x15d0 kernel/workqueue.c:2609
> #2: ffff888019ff1060 (&hdev->req_lock){+.+.}-{3:3}, at: hci_cmd_sync_work+0x170/0x410 net/bluetooth/hci_sync.c:305
> #3: ffff888019ff0078 (&hdev->lock){+.+.}-{3:3}, at: hci_abort_conn_sync+0x150/0xb70 net/bluetooth/hci_sync.c:5337
> #4: ffffffff8ef23348 (hci_cb_list_lock){+.+.}-{3:3}, at: hci_connect_cfm include/net/bluetooth/hci_core.h:1983 [inline]
> #4: ffffffff8ef23348 (hci_cb_list_lock){+.+.}-{3:3}, at: hci_conn_failed+0x158/0x370 net/bluetooth/hci_conn.c:1289
> 2 locks held by kworker/0:3/5070:
> 2 locks held by kworker/1:4/5071:
> #0: ffff8880b993ccd8 (&rq->__lock){-.-.}-{2:2}, at: raw_spin_rq_lock_nested+0x29/0x130 kernel/sched/core.c:559
> #1: ffff8880b9928a08 (&per_cpu_ptr(group->pcpu, cpu)->seq){-.-.}-{0:0}, at: psi_task_switch+0x2d9/0x900 kernel/sched/psi.c:988
> 3 locks held by kworker/1:5/5493:
> 3 locks held by kworker/1:6/5497:
> 3 locks held by kworker/0:6/5520:
> 2 locks held by kworker/0:8/5525:
> #0: ffff8880b983ccd8 (&rq->__lock){-.-.}-{2:2}, at: raw_spin_rq_lock_nested+0x29/0x130 kernel/sched/core.c:559
> #1: ffff8880b9828a08 (&per_cpu_ptr(group->pcpu, cpu)->seq){-.-.}-{0:0}, at: psi_task_switch+0x2d9/0x900 kernel/sched/psi.c:988
> 1 lock held by dhcpcd/20169:
> #0: ffff8880581ba130 (sk_lock-AF_PACKET){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1691 [inline]
> #0: ffff8880581ba130 (sk_lock-AF_PACKET){+.+.}-{0:0}, at: packet_do_bind+0x2c/0xf50 net/packet/af_packet.c:3202
> 1 lock held by syz-executor.0/21114:
>
> =============================================
>
>
On 2024/01/29 13:48, Hillf Danton wrote:
>> 3 locks held by kworker/1:2/779:
>> #0: ffff8880b993ccd8 (&rq->__lock){-.-.}-{2:2}, at: raw_spin_rq_lock_nested+0x29/0x130 kernel/sched/core.c:559
>> #1: ffffc900038c7d80 ((work_completion)(&aux->work)#2){+.+.}-{0:0}, at: process_one_work+0x7eb/0x15d0 kernel/workqueue.c:2609
>> #2: ffff888052984c80 (&aux->poke_mutex){+.+.}-{3:3}, at: __fd_array_map_delete_elem+0x125/0x2f0 kernel/bpf/arraymap.c:884
>
> Could locking people shed any light on the failure of detecting the
> poke_mutex with rq lock held?
>
Showing held locks (lockdep_print_held_locks()) is not a snapshot.
Synchronous printk() can make #0 already released by the moment #2 is taken.
Please consult printk() people for possibility of making printk() from
lockdep reports (and/or hung task reports) asynchronous.
On Mon, 29 Jan 2024 14:53:20 +0900 Tetsuo Handa wrote:
> On 2024/01/29 13:48, Hillf Danton wrote:
> >> 3 locks held by kworker/1:2/779:
> >> #0: ffff8880b993ccd8 (&rq->__lock){-.-.}-{2:2}, at: raw_spin_rq_lock_nested+0x29/0x130 kernel/sched/core.c:559
> >> #1: ffffc900038c7d80 ((work_completion)(&aux->work)#2){+.+.}-{0:0}, at: process_one_work+0x7eb/0x15d0 kernel/workqueue.c:2609
> >> #2: ffff888052984c80 (&aux->poke_mutex){+.+.}-{3:3}, at: __fd_array_map_delete_elem+0x125/0x2f0 kernel/bpf/arraymap.c:884
> >
> > Could locking people shed any light on the failure of detecting the
> > poke_mutex with rq lock held?
> >
>
> Showing held locks (lockdep_print_held_locks()) is not a snapshot.
> Synchronous printk() can make #0 already released by the moment #2 is taken.
>
> Please consult printk() people for possibility of making printk() from
> lockdep reports (and/or hung task reports) asynchronous.
If printk did its work well then we have difficulty explaining why RCU
stalled more than two minutes.
6 locks held by kworker/u5:0/51:
#0: ffff88802a730d38 ((wq_completion)hci2){+.+.}-{0:0}, at: process_one_work+0x789/0x15d0 kernel/workqueue.c:2608
#1: ffffc90000bc7d80 ((work_completion)(&hdev->cmd_sync_work)){+.+.}-{0:0}, at: process_one_work+0x7eb/0x15d0 kernel/workqueue.c:2609
#2: ffff888020989060 (&hdev->req_lock){+.+.}-{3:3}, at: hci_cmd_sync_work+0x170/0x410 net/bluetooth/hci_sync.c:305
#3: ffff888020988078 (&hdev->lock){+.+.}-{3:3}, at: hci_abort_conn_sync+0x150/0xb70 net/bluetooth/hci_sync.c:5337
#4: ffffffff8ef23348 (hci_cb_list_lock){+.+.}-{3:3}, at: hci_connect_cfm include/net/bluetooth/hci_core.h:1983 [inline]
#4: ffffffff8ef23348 (hci_cb_list_lock){+.+.}-{3:3}, at: hci_conn_failed+0x158/0x370 net/bluetooth/hci_conn.c:1289
#5: ffffffff8d1b83f8 (rcu_state.exp_mutex){+.+.}-{3:3}, at: exp_funnel_lock kernel/rcu/tree_exp.h:324 [inline]
#5: ffffffff8d1b83f8 (rcu_state.exp_mutex){+.+.}-{3:3}, at: synchronize_rcu_expedited+0x3ff/0x800 kernel/rcu/tree_exp.h:995
On 2024/01/29 19:47, Hillf Danton wrote:
> On Mon, 29 Jan 2024 14:53:20 +0900 Tetsuo Handa wrote:
>> On 2024/01/29 13:48, Hillf Danton wrote:
>>>> 3 locks held by kworker/1:2/779:
>>>> #0: ffff8880b993ccd8 (&rq->__lock){-.-.}-{2:2}, at: raw_spin_rq_lock_nested+0x29/0x130 kernel/sched/core.c:559
>>>> #1: ffffc900038c7d80 ((work_completion)(&aux->work)#2){+.+.}-{0:0}, at: process_one_work+0x7eb/0x15d0 kernel/workqueue.c:2609
>>>> #2: ffff888052984c80 (&aux->poke_mutex){+.+.}-{3:3}, at: __fd_array_map_delete_elem+0x125/0x2f0 kernel/bpf/arraymap.c:884
>>>
>>> Could locking people shed any light on the failure of detecting the
>>> poke_mutex with rq lock held?
>>>
>>
>> Showing held locks (lockdep_print_held_locks()) is not a snapshot.
>> Synchronous printk() can make #0 already released by the moment #2 is taken.
>>
>> Please consult printk() people for possibility of making printk() from
>> lockdep reports (and/or hung task reports) asynchronous.
>
> If printk did its work well then we have difficulty explaining why RCU
> stalled more than two minutes.
>
The report you are talking about is https://syzkaller.appspot.com/x/report.txt?x=14300d80180000 .
It says
INFO: task kworker/u5:2:5062 blocked for more than 143 seconds.
Not tainted 6.8.0-rc1-syzkaller-g8a696a29c690-dirty #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:kworker/u5:2 state:D stack:27840 pid:5062 tgid:5062 ppid:2 flags:0x00004000
Workqueue: hci5 hci_cmd_sync_work
Call Trace:
<TASK>
context_switch kernel/sched/core.c:5400 [inline]
__schedule+0xf12/0x5c00 kernel/sched/core.c:6727
__schedule_loop kernel/sched/core.c:6802 [inline]
schedule+0xe9/0x270 kernel/sched/core.c:6817
schedule_preempt_disabled+0x13/0x20 kernel/sched/core.c:6874
__mutex_lock_common kernel/locking/mutex.c:684 [inline]
__mutex_lock+0x5b9/0x9d0 kernel/locking/mutex.c:752
hci_connect_cfm include/net/bluetooth/hci_core.h:1983 [inline]
hci_conn_failed+0x158/0x370 net/bluetooth/hci_conn.c:1289
hci_abort_conn_sync+0x7f8/0xb70 net/bluetooth/hci_sync.c:5356
abort_conn_sync+0x187/0x390 net/bluetooth/hci_conn.c:2988
hci_cmd_sync_work+0x1a4/0x410 net/bluetooth/hci_sync.c:306
process_one_work+0x886/0x15d0 kernel/workqueue.c:2633
process_scheduled_works kernel/workqueue.c:2706 [inline]
worker_thread+0x8b9/0x1290 kernel/workqueue.c:2787
kthread+0x2c6/0x3a0 kernel/kthread.c:388
ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:242
</TASK>
and lockdep says kworker/u5:2/5062 was blocked at
5 locks held by kworker/u5:2/5062:
#0: ffff88802271a938 ((wq_completion)hci5){+.+.}-{0:0}, at: process_one_work+0x789/0x15d0 kernel/workqueue.c:2608
#1: ffffc900041cfd80 ((work_completion)(&hdev->cmd_sync_work)){+.+.}-{0:0}, at: process_one_work+0x7eb/0x15d0 kernel/workqueue.c:2609
#2: ffff888019ff1060 (&hdev->req_lock){+.+.}-{3:3}, at: hci_cmd_sync_work+0x170/0x410 net/bluetooth/hci_sync.c:305
#3: ffff888019ff0078 (&hdev->lock){+.+.}-{3:3}, at: hci_abort_conn_sync+0x150/0xb70 net/bluetooth/hci_sync.c:5337
#4: ffffffff8ef23348 (hci_cb_list_lock){+.+.}-{3:3}, at: hci_connect_cfm include/net/bluetooth/hci_core.h:1983 [inline]
#4: ffffffff8ef23348 (hci_cb_list_lock){+.+.}-{3:3}, at: hci_conn_failed+0x158/0x370 net/bluetooth/hci_conn.c:1289
and ffffffff8ef23348 was held by kworker/u5:0/51.
6 locks held by kworker/u5:0/51:
#0: ffff88802a730d38 ((wq_completion)hci2){+.+.}-{0:0}, at: process_one_work+0x789/0x15d0 kernel/workqueue.c:2608
#1: ffffc90000bc7d80 ((work_completion)(&hdev->cmd_sync_work)){+.+.}-{0:0}, at: process_one_work+0x7eb/0x15d0 kernel/workqueue.c:2609
#2: ffff888020989060 (&hdev->req_lock){+.+.}-{3:3}, at: hci_cmd_sync_work+0x170/0x410 net/bluetooth/hci_sync.c:305
#3: ffff888020988078 (&hdev->lock){+.+.}-{3:3}, at: hci_abort_conn_sync+0x150/0xb70 net/bluetooth/hci_sync.c:5337
#4: ffffffff8ef23348 (hci_cb_list_lock){+.+.}-{3:3}, at: hci_connect_cfm include/net/bluetooth/hci_core.h:1983 [inline]
#4: ffffffff8ef23348 (hci_cb_list_lock){+.+.}-{3:3}, at: hci_conn_failed+0x158/0x370 net/bluetooth/hci_conn.c:1289
#5: ffffffff8d1b83f8 (rcu_state.exp_mutex){+.+.}-{3:3}, at: exp_funnel_lock kernel/rcu/tree_exp.h:324 [inline]
#5: ffffffff8d1b83f8 (rcu_state.exp_mutex){+.+.}-{3:3}, at: synchronize_rcu_expedited+0x3ff/0x800 kernel/rcu/tree_exp.h:995
Since RCU reader side is not permitted to sleep, kworker/u5:0/51 should have released ffffffff8ef23348 shortly.
But the thread who was blocking ffffffff8d1b83f8 is not reported (presumably one of threads that are reported as
"X lock[s] held by Y" without stack trace), we can't know what the thread blocking ffffffff8d1b83f8 was doing.
Showing all backtraces of threads (including TASK_RUNNING state) could explain.
So, your question will be "Lockdep developers, can we please make it possible to report all backtraces of
threads which are holding at least one lock, tolerating increasing possibility of UAF when trying to report
locks held?" Although Linus does not like, maybe we need to tune hung task report using a kernel config option.
On Mon, Jan 29, 2024 at 08:28:50PM +0900, Tetsuo Handa wrote:
> On 2024/01/29 19:47, Hillf Danton wrote:
> > On Mon, 29 Jan 2024 14:53:20 +0900 Tetsuo Handa wrote:
> >> On 2024/01/29 13:48, Hillf Danton wrote:
> >>>> 3 locks held by kworker/1:2/779:
> >>>> #0: ffff8880b993ccd8 (&rq->__lock){-.-.}-{2:2}, at: raw_spin_rq_lock_nested+0x29/0x130 kernel/sched/core.c:559
> >>>> #1: ffffc900038c7d80 ((work_completion)(&aux->work)#2){+.+.}-{0:0}, at: process_one_work+0x7eb/0x15d0 kernel/workqueue.c:2609
> >>>> #2: ffff888052984c80 (&aux->poke_mutex){+.+.}-{3:3}, at: __fd_array_map_delete_elem+0x125/0x2f0 kernel/bpf/arraymap.c:884
> >>>
> >>> Could locking people shed any light on the failure of detecting the
> >>> poke_mutex with rq lock held?
> >>>
> >>
> >> Showing held locks (lockdep_print_held_locks()) is not a snapshot.
> >> Synchronous printk() can make #0 already released by the moment #2 is taken.
> >>
> >> Please consult printk() people for possibility of making printk() from
> >> lockdep reports (and/or hung task reports) asynchronous.
> >
> > If printk did its work well then we have difficulty explaining why RCU
> > stalled more than two minutes.
> >
>
> The report you are talking about is https://syzkaller.appspot.com/x/report.txt?x=14300d80180000 .
>
> It says
>
> INFO: task kworker/u5:2:5062 blocked for more than 143 seconds.
> Not tainted 6.8.0-rc1-syzkaller-g8a696a29c690-dirty #0
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:kworker/u5:2 state:D stack:27840 pid:5062 tgid:5062 ppid:2 flags:0x00004000
> Workqueue: hci5 hci_cmd_sync_work
> Call Trace:
> <TASK>
> context_switch kernel/sched/core.c:5400 [inline]
> __schedule+0xf12/0x5c00 kernel/sched/core.c:6727
> __schedule_loop kernel/sched/core.c:6802 [inline]
> schedule+0xe9/0x270 kernel/sched/core.c:6817
> schedule_preempt_disabled+0x13/0x20 kernel/sched/core.c:6874
> __mutex_lock_common kernel/locking/mutex.c:684 [inline]
> __mutex_lock+0x5b9/0x9d0 kernel/locking/mutex.c:752
> hci_connect_cfm include/net/bluetooth/hci_core.h:1983 [inline]
> hci_conn_failed+0x158/0x370 net/bluetooth/hci_conn.c:1289
> hci_abort_conn_sync+0x7f8/0xb70 net/bluetooth/hci_sync.c:5356
> abort_conn_sync+0x187/0x390 net/bluetooth/hci_conn.c:2988
> hci_cmd_sync_work+0x1a4/0x410 net/bluetooth/hci_sync.c:306
> process_one_work+0x886/0x15d0 kernel/workqueue.c:2633
> process_scheduled_works kernel/workqueue.c:2706 [inline]
> worker_thread+0x8b9/0x1290 kernel/workqueue.c:2787
> kthread+0x2c6/0x3a0 kernel/kthread.c:388
> ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147
> ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:242
> </TASK>
>
> and lockdep says kworker/u5:2/5062 was blocked at
>
> 5 locks held by kworker/u5:2/5062:
> #0: ffff88802271a938 ((wq_completion)hci5){+.+.}-{0:0}, at: process_one_work+0x789/0x15d0 kernel/workqueue.c:2608
> #1: ffffc900041cfd80 ((work_completion)(&hdev->cmd_sync_work)){+.+.}-{0:0}, at: process_one_work+0x7eb/0x15d0 kernel/workqueue.c:2609
> #2: ffff888019ff1060 (&hdev->req_lock){+.+.}-{3:3}, at: hci_cmd_sync_work+0x170/0x410 net/bluetooth/hci_sync.c:305
> #3: ffff888019ff0078 (&hdev->lock){+.+.}-{3:3}, at: hci_abort_conn_sync+0x150/0xb70 net/bluetooth/hci_sync.c:5337
> #4: ffffffff8ef23348 (hci_cb_list_lock){+.+.}-{3:3}, at: hci_connect_cfm include/net/bluetooth/hci_core.h:1983 [inline]
> #4: ffffffff8ef23348 (hci_cb_list_lock){+.+.}-{3:3}, at: hci_conn_failed+0x158/0x370 net/bluetooth/hci_conn.c:1289
>
> and ffffffff8ef23348 was held by kworker/u5:0/51.
>
> 6 locks held by kworker/u5:0/51:
> #0: ffff88802a730d38 ((wq_completion)hci2){+.+.}-{0:0}, at: process_one_work+0x789/0x15d0 kernel/workqueue.c:2608
> #1: ffffc90000bc7d80 ((work_completion)(&hdev->cmd_sync_work)){+.+.}-{0:0}, at: process_one_work+0x7eb/0x15d0 kernel/workqueue.c:2609
> #2: ffff888020989060 (&hdev->req_lock){+.+.}-{3:3}, at: hci_cmd_sync_work+0x170/0x410 net/bluetooth/hci_sync.c:305
> #3: ffff888020988078 (&hdev->lock){+.+.}-{3:3}, at: hci_abort_conn_sync+0x150/0xb70 net/bluetooth/hci_sync.c:5337
> #4: ffffffff8ef23348 (hci_cb_list_lock){+.+.}-{3:3}, at: hci_connect_cfm include/net/bluetooth/hci_core.h:1983 [inline]
> #4: ffffffff8ef23348 (hci_cb_list_lock){+.+.}-{3:3}, at: hci_conn_failed+0x158/0x370 net/bluetooth/hci_conn.c:1289
> #5: ffffffff8d1b83f8 (rcu_state.exp_mutex){+.+.}-{3:3}, at: exp_funnel_lock kernel/rcu/tree_exp.h:324 [inline]
> #5: ffffffff8d1b83f8 (rcu_state.exp_mutex){+.+.}-{3:3}, at: synchronize_rcu_expedited+0x3ff/0x800 kernel/rcu/tree_exp.h:995
>
> Since RCU reader side is not permitted to sleep, kworker/u5:0/51 should have released ffffffff8ef23348 shortly.
>
> But the thread who was blocking ffffffff8d1b83f8 is not reported (presumably one of threads that are reported as
> "X lock[s] held by Y" without stack trace), we can't know what the thread blocking ffffffff8d1b83f8 was doing.
>
> Showing all backtraces of threads (including TASK_RUNNING state) could explain.
>
> So, your question will be "Lockdep developers, can we please make it possible to report all backtraces of
> threads which are holding at least one lock, tolerating increasing possibility of UAF when trying to report
> locks held?" Although Linus does not like, maybe we need to tune hung task report using a kernel config option.
>
Looks to me that debug_show_all_locks() doesn't hold the lockdep lock,
so it's really a best effort race read of all tasks on lock hold
information. Maybe Hillf wants the following? (Completely untested, it
will stop your whole system and print lock holding information).
Regards,
Boqun
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 151bd3de5936..17a9392e6bd8 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -6611,6 +6611,8 @@ void debug_show_all_locks(void)
}
pr_warn("\nShowing all locks held in the system:\n");
+ local_irq_disable();
+ lockdep_lock();
rcu_read_lock();
for_each_process_thread(g, p) {
if (!p->lockdep_depth)
@@ -6620,6 +6622,8 @@ void debug_show_all_locks(void)
touch_all_softlockup_watchdogs();
}
rcu_read_unlock();
+ lockdep_unlock();
+ local_irq_enable();
pr_warn("\n");
pr_warn("=============================================\n\n");
On Mon, 29 Jan 2024 12:25:13 -0800 Boqun Feng <[email protected]>
>
> Looks to me that debug_show_all_locks() doesn't hold the lockdep lock,
> so it's really a best effort race read of all tasks on lock hold
> information. Maybe Hillf wants the following? (Completely untested, it
> will stop your whole system and print lock holding information).
#syz test https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -6611,6 +6611,8 @@ void debug_show_all_locks(void)
}
pr_warn("\nShowing all locks held in the system:\n");
+ local_irq_disable();
+ lockdep_lock();
rcu_read_lock();
for_each_process_thread(g, p) {
if (!p->lockdep_depth)
@@ -6620,6 +6622,8 @@ void debug_show_all_locks(void)
touch_all_softlockup_watchdogs();
}
rcu_read_unlock();
+ lockdep_unlock();
+ local_irq_enable();
pr_warn("\n");
pr_warn("=============================================\n\n");
--
Hello,
syzbot has tested the proposed patch but the reproducer is still triggering an issue:
INFO: task hung in hci_conn_failed
INFO: task kworker/u5:4:5429 blocked for more than 143 seconds.
Not tainted 6.8.0-rc2-syzkaller-g861c0981648f-dirty #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:kworker/u5:4 state:D stack:27616 pid:5429 tgid:5429 ppid:2 flags:0x00004000
Workqueue: hci0 hci_cmd_sync_work
Call Trace:
<TASK>
context_switch kernel/sched/core.c:5400 [inline]
__schedule+0xf12/0x5c00 kernel/sched/core.c:6727
__schedule_loop kernel/sched/core.c:6802 [inline]
schedule+0xe9/0x270 kernel/sched/core.c:6817
schedule_preempt_disabled+0x13/0x20 kernel/sched/core.c:6874
__mutex_lock_common kernel/locking/mutex.c:684 [inline]
__mutex_lock+0x5b9/0x9d0 kernel/locking/mutex.c:752
hci_connect_cfm include/net/bluetooth/hci_core.h:1983 [inline]
hci_conn_failed+0x158/0x370 net/bluetooth/hci_conn.c:1289
hci_abort_conn_sync+0x758/0xb50 net/bluetooth/hci_sync.c:5356
abort_conn_sync+0x187/0x390 net/bluetooth/hci_conn.c:2988
hci_cmd_sync_work+0x1a4/0x410 net/bluetooth/hci_sync.c:306
process_one_work+0x886/0x15d0 kernel/workqueue.c:2633
process_scheduled_works kernel/workqueue.c:2706 [inline]
worker_thread+0x8b9/0x1290 kernel/workqueue.c:2787
kthread+0x2c6/0x3a0 kernel/kthread.c:388
ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:242
</TASK>
Showing all locks held in the system:
3 locks held by kworker/0:0/8:
3 locks held by kworker/0:1/9:
2 locks held by kworker/u4:0/11:
#0: ffff888013089938 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x789/0x15d0 kernel/workqueue.c:2608
#1: ffffc90000107d80 (connector_reaper_work){+.+.}-{0:0}, at: process_one_work+0x7eb/0x15d0 kernel/workqueue.c:2609
2 locks held by kworker/u4:2/42:
#0: ffff888013089938 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x789/0x15d0 kernel/workqueue.c:2608
#1: ffffc90000b2fd80 ((reaper_work).work){+.+.}-{0:0}, at: process_one_work+0x7eb/0x15d0 kernel/workqueue.c:2609
1 lock held by udevd/4510:
2 locks held by getty/4809:
#0: ffff8880298450a0 (&tty->ldisc_sem){++++}-{0:0}, at: tty_ldisc_ref_wait+0x24/0x80 drivers/tty/tty_ldisc.c:243
#1: ffffc90002f062f0 (&ldata->atomic_read_lock){+.+.}-{3:3}, at: n_tty_read+0xfc6/0x1490 drivers/tty/n_tty.c:2201
1 lock held by syz-executor.3/5419:
5 locks held by kworker/u5:4/5429:
#0: ffff88802d96d538 ((wq_completion)hci0){+.+.}-{0:0}, at: process_one_work+0x789/0x15d0 kernel/workqueue.c:2608
#1: ffffc90009befd80 ((work_completion)(&hdev->cmd_sync_work)){+.+.}-{0:0}, at: process_one_work+0x7eb/0x15d0 kernel/workqueue.c:2609
#2: ffff888075635060 (&hdev->req_lock){+.+.}-{3:3}, at: hci_cmd_sync_work+0x170/0x410 net/bluetooth/hci_sync.c:305
#3: ffff888075634078 (&hdev->lock){+.+.}-{3:3}, at: hci_abort_conn_sync+0x150/0xb50 net/bluetooth/hci_sync.c:5337
#4: ffffffff8ef238c8 (hci_cb_list_lock){+.+.}-{3:3}, at: hci_connect_cfm include/net/bluetooth/hci_core.h:1983 [inline]
#4: ffffffff8ef238c8 (hci_cb_list_lock){+.+.}-{3:3}, at: hci_conn_failed+0x158/0x370 net/bluetooth/hci_conn.c:1289
6 locks held by kworker/u5:5/5431:
#0: ffff888027f29538 ((wq_completion)hci5){+.+.}-{0:0}, at: process_one_work+0x789/0x15d0 kernel/workqueue.c:2608
#1: ffffc90009bffd80 ((work_completion)(&hdev->cmd_sync_work)){+.+.}-{0:0}, at: process_one_work+0x7eb/0x15d0 kernel/workqueue.c:2609
#2: ffff888079359060 (&hdev->req_lock){+.+.}-{3:3}, at: hci_cmd_sync_work+0x170/0x410 net/bluetooth/hci_sync.c:305
#3: ffff888079358078 (&hdev->lock){+.+.}-{3:3}, at: hci_abort_conn_sync+0x150/0xb50 net/bluetooth/hci_sync.c:5337
#4: ffffffff8ef238c8 (hci_cb_list_lock){+.+.}-{3:3}, at: hci_connect_cfm include/net/bluetooth/hci_core.h:1983 [inline]
#4: ffffffff8ef238c8 (hci_cb_list_lock){+.+.}-{3:3}, at: hci_conn_failed+0x158/0x370 net/bluetooth/hci_conn.c:1289
#5: ffffffff8d1b8438 (rcu_state.exp_mutex){+.+.}-{3:3}, at: exp_funnel_lock kernel/rcu/tree_exp.h:324 [inline]
#5: ffffffff8d1b8438 (rcu_state.exp_mutex){+.+.}-{3:3}, at: synchronize_rcu_expedited+0x3ff/0x800 kernel/rcu/tree_exp.h:995
3 locks held by kworker/1:5/5500:
3 locks held by kworker/0:8/5556:
3 locks held by kworker/1:6/5586:
3 locks held by kworker/1:7/5659:
2 locks held by dhcpcd/20271:
#0: ffff88808e304130 (sk_lock-AF_PACKET){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1691 [inline]
#0: ffff88808e304130 (sk_lock-AF_PACKET){+.+.}-{0:0}, at: packet_do_bind+0x2c/0xf50 net/packet/af_packet.c:3202
#1: ffffffff8d1b8438 (rcu_state.exp_mutex){+.+.}-{3:3}, at: exp_funnel_lock kernel/rcu/tree_exp.h:324 [inline]
#1: ffffffff8d1b8438 (rcu_state.exp_mutex){+.+.}-{3:3}, at: synchronize_rcu_expedited+0x3ff/0x800 kernel/rcu/tree_exp.h:995
1 lock held by dhcpcd/20354:
#0: ffff88803a7ba130 (sk_lock-AF_PACKET){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1691 [inline]
#0: ffff88803a7ba130 (sk_lock-AF_PACKET){+.+.}-{0:0}, at: packet_do_bind+0x2c/0xf50 net/packet/af_packet.c:3202
1 lock held by dhcpcd/20466:
#0: ffff88801db68130 (sk_lock-AF_PACKET){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1691 [inline]
#0: ffff88801db68130 (sk_lock-AF_PACKET){+.+.}-{0:0}, at: packet_do_bind+0x2c/0xf50 net/packet/af_packet.c:3202
1 lock held by dhcpcd/20585:
#0: ffff888094ba4130 (sk_lock-AF_PACKET){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1691 [inline]
#0: ffff888094ba4130 (sk_lock-AF_PACKET){+.+.}-{0:0}, at: packet_do_bind+0x2c/0xf50 net/packet/af_packet.c:3202
1 lock held by dhcpcd/20736:
#0: ffff88809175e130 (sk_lock-AF_PACKET){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1691 [inline]
#0: ffff88809175e130 (sk_lock-AF_PACKET){+.+.}-{0:0}, at: packet_do_bind+0x2c/0xf50 net/packet/af_packet.c:3202
1 lock held by syz-executor.1/21229:
3 locks held by syz-executor.4/21232:
1 lock held by syz-executor.5/21233:
=============================================
NMI backtrace for cpu 1
CPU: 1 PID: 29 Comm: khungtaskd Not tainted 6.8.0-rc2-syzkaller-g861c0981648f-dirty #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/17/2023
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xd9/0x1b0 lib/dump_stack.c:106
nmi_cpu_backtrace+0x277/0x390 lib/nmi_backtrace.c:113
nmi_trigger_cpumask_backtrace+0x299/0x300 lib/nmi_backtrace.c:62
trigger_all_cpu_backtrace include/linux/nmi.h:160 [inline]
check_hung_uninterruptible_tasks kernel/hung_task.c:222 [inline]
watchdog+0xf87/0x1210 kernel/hung_task.c:379
kthread+0x2c6/0x3a0 kernel/kthread.c:388
ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:242
</TASK>
Sending NMI from CPU 1 to CPUs 0:
NMI backtrace for cpu 0
CPU: 0 PID: 21255 Comm: syz-executor.0 Not tainted 6.8.0-rc2-syzkaller-g861c0981648f-dirty #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/17/2023
RIP: 0010:__debug_check_no_obj_freed lib/debugobjects.c:981 [inline]
RIP: 0010:debug_check_no_obj_freed+0x21d/0x600 lib/debugobjects.c:1019
Code: 89 64 24 38 49 bd 22 01 00 00 00 00 ad de 48 c1 ea 03 4d 89 fc 80 3c 1a 00 0f 85 dc 00 00 00 48 8d 78 18 41 83 c6 01 4c 8b 38 <48> 89 fa 48 c1 ea 03 80 3c 1a 00 0f 85 f7 02 00 00 48 8b 50 18 4c
RSP: 0018:ffffc9007e76fc80 EFLAGS: 00000006
RAX: ffff88809414c528 RBX: dffffc0000000000 RCX: ffffffff8168ea5e
RDX: 1ffff110128298a5 RSI: ffff888066342100 RDI: ffff88809414c540
RBP: ffffc9007e76fdc0 R08: 0000000000000001 R09: fffff5200fcedf7e
R10: 0000000000000003 R11: 0000000000000000 R12: ffff888066341100
R13: dead000000000122 R14: 0000000000000005 R15: ffff88805dc630f0
FS: 0000555555f24480(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007faacb501400 CR3: 0000000048e71000 CR4: 0000000000350ef0
Call Trace:
<NMI>
</NMI>
<TASK>
slab_free_hook mm/slub.c:2093 [inline]
slab_free mm/slub.c:4299 [inline]
kmem_cache_free+0x255/0x350 mm/slub.c:4363
putname+0x12e/0x170 fs/namei.c:274
do_symlinkat+0x1ac/0x310 fs/namei.c:4515
__do_sys_symlinkat fs/namei.c:4522 [inline]
__se_sys_symlinkat fs/namei.c:4519 [inline]
__x64_sys_symlinkat+0x97/0xc0 fs/namei.c:4519
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xd3/0x250 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x63/0x6b
RIP: 0033:0x7faacb47c527
Code: 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 b8 0a 01 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffd9c121f28 EFLAGS: 00000206 ORIG_RAX: 000000000000010a
RAX: ffffffffffffffda RBX: 00007ffd9c121ff0 RCX: 00007faacb47c527
RDX: 00007faacb4c9526 RSI: 00000000ffffff9c RDI: 00007ffd9c121ff0
RBP: 0000000000000001 R08: 0000000000000017 R09: 00007ffd9c121c77
R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000000
</TASK>
Tested on:
commit: 861c0981 Merge tag 'jfs-6.8-rc3' of github.com:kleikam..
git tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
console output: https://syzkaller.appspot.com/x/log.txt?x=1207db37e80000
kernel config: https://syzkaller.appspot.com/x/.config?x=b168fa511db3ca08
dashboard link: https://syzkaller.appspot.com/bug?extid=a984066a63e9c1e62662
compiler: gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
patch: https://syzkaller.appspot.com/x/patch.diff?x=14a550d3e80000
On Tue, 30 Jan 2024 07:45:06 -0800
> 6 locks held by kworker/u5:5/5431:
> #0: ffff888027f29538 ((wq_completion)hci5){+.+.}-{0:0}, at: process_one_work+0x789/0x15d0 kernel/workqueue.c:2608
> #1: ffffc90009bffd80 ((work_completion)(&hdev->cmd_sync_work)){+.+.}-{0:0}, at: process_one_work+0x7eb/0x15d0 kernel/workqueue.c:2609
> #2: ffff888079359060 (&hdev->req_lock){+.+.}-{3:3}, at: hci_cmd_sync_work+0x170/0x410 net/bluetooth/hci_sync.c:305
> #3: ffff888079358078 (&hdev->lock){+.+.}-{3:3}, at: hci_abort_conn_sync+0x150/0xb50 net/bluetooth/hci_sync.c:5337
> #4: ffffffff8ef238c8 (hci_cb_list_lock){+.+.}-{3:3}, at: hci_connect_cfm include/net/bluetooth/hci_core.h:1983 [inline]
> #4: ffffffff8ef238c8 (hci_cb_list_lock){+.+.}-{3:3}, at: hci_conn_failed+0x158/0x370 net/bluetooth/hci_conn.c:1289
> #5: ffffffff8d1b8438 (rcu_state.exp_mutex){+.+.}-{3:3}, at: exp_funnel_lock kernel/rcu/tree_exp.h:324 [inline]
> #5: ffffffff8d1b8438 (rcu_state.exp_mutex){+.+.}-{3:3}, at: synchronize_rcu_expedited+0x3ff/0x800 kernel/rcu/tree_exp.h:995
> 3 locks held by kworker/1:5/5500:
> 3 locks held by kworker/0:8/5556:
> 3 locks held by kworker/1:6/5586:
> 3 locks held by kworker/1:7/5659:
> 2 locks held by dhcpcd/20271:
> #0: ffff88808e304130 (sk_lock-AF_PACKET){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1691 [inline]
> #0: ffff88808e304130 (sk_lock-AF_PACKET){+.+.}-{0:0}, at: packet_do_bind+0x2c/0xf50 net/packet/af_packet.c:3202
> #1: ffffffff8d1b8438 (rcu_state.exp_mutex){+.+.}-{3:3}, at: exp_funnel_lock kernel/rcu/tree_exp.h:324 [inline]
> #1: ffffffff8d1b8438 (rcu_state.exp_mutex){+.+.}-{3:3}, at: synchronize_rcu_expedited+0x3ff/0x800 kernel/rcu/tree_exp.h:995
With Boqun's patch, the failure of detecting mutex with runqueue lock
held disappeared but RCU stalled more than two minutes as before.
[Cc other RCU folks]
On Tue, Jan 30, 2024 at 07:45:06AM -0800, syzbot wrote:
> Hello,
>
This looks like the case which Frederic recently fixed:
https://lore.kernel.org/lkml/[email protected]/
, which is currently queued in my tree for v6.9 PR:
https://github.com/fbq/linux.git rcu-exp.2024.01.29b
can we give it a spin?
Regards,
Boqun
> syzbot has tested the proposed patch but the reproducer is still triggering an issue:
> INFO: task hung in hci_conn_failed
>
> INFO: task kworker/u5:4:5429 blocked for more than 143 seconds.
> Not tainted 6.8.0-rc2-syzkaller-g861c0981648f-dirty #0
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:kworker/u5:4 state:D stack:27616 pid:5429 tgid:5429 ppid:2 flags:0x00004000
> Workqueue: hci0 hci_cmd_sync_work
> Call Trace:
> <TASK>
> context_switch kernel/sched/core.c:5400 [inline]
> __schedule+0xf12/0x5c00 kernel/sched/core.c:6727
> __schedule_loop kernel/sched/core.c:6802 [inline]
> schedule+0xe9/0x270 kernel/sched/core.c:6817
> schedule_preempt_disabled+0x13/0x20 kernel/sched/core.c:6874
> __mutex_lock_common kernel/locking/mutex.c:684 [inline]
> __mutex_lock+0x5b9/0x9d0 kernel/locking/mutex.c:752
> hci_connect_cfm include/net/bluetooth/hci_core.h:1983 [inline]
> hci_conn_failed+0x158/0x370 net/bluetooth/hci_conn.c:1289
> hci_abort_conn_sync+0x758/0xb50 net/bluetooth/hci_sync.c:5356
> abort_conn_sync+0x187/0x390 net/bluetooth/hci_conn.c:2988
> hci_cmd_sync_work+0x1a4/0x410 net/bluetooth/hci_sync.c:306
> process_one_work+0x886/0x15d0 kernel/workqueue.c:2633
> process_scheduled_works kernel/workqueue.c:2706 [inline]
> worker_thread+0x8b9/0x1290 kernel/workqueue.c:2787
> kthread+0x2c6/0x3a0 kernel/kthread.c:388
> ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147
> ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:242
> </TASK>
>
> Showing all locks held in the system:
> 3 locks held by kworker/0:0/8:
> 3 locks held by kworker/0:1/9:
> 2 locks held by kworker/u4:0/11:
> #0: ffff888013089938 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x789/0x15d0 kernel/workqueue.c:2608
> #1: ffffc90000107d80 (connector_reaper_work){+.+.}-{0:0}, at: process_one_work+0x7eb/0x15d0 kernel/workqueue.c:2609
> 2 locks held by kworker/u4:2/42:
> #0: ffff888013089938 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x789/0x15d0 kernel/workqueue.c:2608
> #1: ffffc90000b2fd80 ((reaper_work).work){+.+.}-{0:0}, at: process_one_work+0x7eb/0x15d0 kernel/workqueue.c:2609
> 1 lock held by udevd/4510:
> 2 locks held by getty/4809:
> #0: ffff8880298450a0 (&tty->ldisc_sem){++++}-{0:0}, at: tty_ldisc_ref_wait+0x24/0x80 drivers/tty/tty_ldisc.c:243
> #1: ffffc90002f062f0 (&ldata->atomic_read_lock){+.+.}-{3:3}, at: n_tty_read+0xfc6/0x1490 drivers/tty/n_tty.c:2201
> 1 lock held by syz-executor.3/5419:
> 5 locks held by kworker/u5:4/5429:
> #0: ffff88802d96d538 ((wq_completion)hci0){+.+.}-{0:0}, at: process_one_work+0x789/0x15d0 kernel/workqueue.c:2608
> #1: ffffc90009befd80 ((work_completion)(&hdev->cmd_sync_work)){+.+.}-{0:0}, at: process_one_work+0x7eb/0x15d0 kernel/workqueue.c:2609
> #2: ffff888075635060 (&hdev->req_lock){+.+.}-{3:3}, at: hci_cmd_sync_work+0x170/0x410 net/bluetooth/hci_sync.c:305
> #3: ffff888075634078 (&hdev->lock){+.+.}-{3:3}, at: hci_abort_conn_sync+0x150/0xb50 net/bluetooth/hci_sync.c:5337
> #4: ffffffff8ef238c8 (hci_cb_list_lock){+.+.}-{3:3}, at: hci_connect_cfm include/net/bluetooth/hci_core.h:1983 [inline]
> #4: ffffffff8ef238c8 (hci_cb_list_lock){+.+.}-{3:3}, at: hci_conn_failed+0x158/0x370 net/bluetooth/hci_conn.c:1289
> 6 locks held by kworker/u5:5/5431:
> #0: ffff888027f29538 ((wq_completion)hci5){+.+.}-{0:0}, at: process_one_work+0x789/0x15d0 kernel/workqueue.c:2608
> #1: ffffc90009bffd80 ((work_completion)(&hdev->cmd_sync_work)){+.+.}-{0:0}, at: process_one_work+0x7eb/0x15d0 kernel/workqueue.c:2609
> #2: ffff888079359060 (&hdev->req_lock){+.+.}-{3:3}, at: hci_cmd_sync_work+0x170/0x410 net/bluetooth/hci_sync.c:305
> #3: ffff888079358078 (&hdev->lock){+.+.}-{3:3}, at: hci_abort_conn_sync+0x150/0xb50 net/bluetooth/hci_sync.c:5337
> #4: ffffffff8ef238c8 (hci_cb_list_lock){+.+.}-{3:3}, at: hci_connect_cfm include/net/bluetooth/hci_core.h:1983 [inline]
> #4: ffffffff8ef238c8 (hci_cb_list_lock){+.+.}-{3:3}, at: hci_conn_failed+0x158/0x370 net/bluetooth/hci_conn.c:1289
> #5: ffffffff8d1b8438 (rcu_state.exp_mutex){+.+.}-{3:3}, at: exp_funnel_lock kernel/rcu/tree_exp.h:324 [inline]
> #5: ffffffff8d1b8438 (rcu_state.exp_mutex){+.+.}-{3:3}, at: synchronize_rcu_expedited+0x3ff/0x800 kernel/rcu/tree_exp.h:995
> 3 locks held by kworker/1:5/5500:
> 3 locks held by kworker/0:8/5556:
> 3 locks held by kworker/1:6/5586:
> 3 locks held by kworker/1:7/5659:
> 2 locks held by dhcpcd/20271:
> #0: ffff88808e304130 (sk_lock-AF_PACKET){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1691 [inline]
> #0: ffff88808e304130 (sk_lock-AF_PACKET){+.+.}-{0:0}, at: packet_do_bind+0x2c/0xf50 net/packet/af_packet.c:3202
> #1: ffffffff8d1b8438 (rcu_state.exp_mutex){+.+.}-{3:3}, at: exp_funnel_lock kernel/rcu/tree_exp.h:324 [inline]
> #1: ffffffff8d1b8438 (rcu_state.exp_mutex){+.+.}-{3:3}, at: synchronize_rcu_expedited+0x3ff/0x800 kernel/rcu/tree_exp.h:995
> 1 lock held by dhcpcd/20354:
> #0: ffff88803a7ba130 (sk_lock-AF_PACKET){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1691 [inline]
> #0: ffff88803a7ba130 (sk_lock-AF_PACKET){+.+.}-{0:0}, at: packet_do_bind+0x2c/0xf50 net/packet/af_packet.c:3202
> 1 lock held by dhcpcd/20466:
> #0: ffff88801db68130 (sk_lock-AF_PACKET){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1691 [inline]
> #0: ffff88801db68130 (sk_lock-AF_PACKET){+.+.}-{0:0}, at: packet_do_bind+0x2c/0xf50 net/packet/af_packet.c:3202
> 1 lock held by dhcpcd/20585:
> #0: ffff888094ba4130 (sk_lock-AF_PACKET){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1691 [inline]
> #0: ffff888094ba4130 (sk_lock-AF_PACKET){+.+.}-{0:0}, at: packet_do_bind+0x2c/0xf50 net/packet/af_packet.c:3202
> 1 lock held by dhcpcd/20736:
> #0: ffff88809175e130 (sk_lock-AF_PACKET){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1691 [inline]
> #0: ffff88809175e130 (sk_lock-AF_PACKET){+.+.}-{0:0}, at: packet_do_bind+0x2c/0xf50 net/packet/af_packet.c:3202
> 1 lock held by syz-executor.1/21229:
> 3 locks held by syz-executor.4/21232:
> 1 lock held by syz-executor.5/21233:
>
> =============================================
>
> NMI backtrace for cpu 1
> CPU: 1 PID: 29 Comm: khungtaskd Not tainted 6.8.0-rc2-syzkaller-g861c0981648f-dirty #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/17/2023
> Call Trace:
> <TASK>
> __dump_stack lib/dump_stack.c:88 [inline]
> dump_stack_lvl+0xd9/0x1b0 lib/dump_stack.c:106
> nmi_cpu_backtrace+0x277/0x390 lib/nmi_backtrace.c:113
> nmi_trigger_cpumask_backtrace+0x299/0x300 lib/nmi_backtrace.c:62
> trigger_all_cpu_backtrace include/linux/nmi.h:160 [inline]
> check_hung_uninterruptible_tasks kernel/hung_task.c:222 [inline]
> watchdog+0xf87/0x1210 kernel/hung_task.c:379
> kthread+0x2c6/0x3a0 kernel/kthread.c:388
> ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147
> ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:242
> </TASK>
> Sending NMI from CPU 1 to CPUs 0:
> NMI backtrace for cpu 0
> CPU: 0 PID: 21255 Comm: syz-executor.0 Not tainted 6.8.0-rc2-syzkaller-g861c0981648f-dirty #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/17/2023
> RIP: 0010:__debug_check_no_obj_freed lib/debugobjects.c:981 [inline]
> RIP: 0010:debug_check_no_obj_freed+0x21d/0x600 lib/debugobjects.c:1019
> Code: 89 64 24 38 49 bd 22 01 00 00 00 00 ad de 48 c1 ea 03 4d 89 fc 80 3c 1a 00 0f 85 dc 00 00 00 48 8d 78 18 41 83 c6 01 4c 8b 38 <48> 89 fa 48 c1 ea 03 80 3c 1a 00 0f 85 f7 02 00 00 48 8b 50 18 4c
> RSP: 0018:ffffc9007e76fc80 EFLAGS: 00000006
> RAX: ffff88809414c528 RBX: dffffc0000000000 RCX: ffffffff8168ea5e
> RDX: 1ffff110128298a5 RSI: ffff888066342100 RDI: ffff88809414c540
> RBP: ffffc9007e76fdc0 R08: 0000000000000001 R09: fffff5200fcedf7e
> R10: 0000000000000003 R11: 0000000000000000 R12: ffff888066341100
> R13: dead000000000122 R14: 0000000000000005 R15: ffff88805dc630f0
> FS: 0000555555f24480(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007faacb501400 CR3: 0000000048e71000 CR4: 0000000000350ef0
> Call Trace:
> <NMI>
> </NMI>
> <TASK>
> slab_free_hook mm/slub.c:2093 [inline]
> slab_free mm/slub.c:4299 [inline]
> kmem_cache_free+0x255/0x350 mm/slub.c:4363
> putname+0x12e/0x170 fs/namei.c:274
> do_symlinkat+0x1ac/0x310 fs/namei.c:4515
> __do_sys_symlinkat fs/namei.c:4522 [inline]
> __se_sys_symlinkat fs/namei.c:4519 [inline]
> __x64_sys_symlinkat+0x97/0xc0 fs/namei.c:4519
> do_syscall_x64 arch/x86/entry/common.c:52 [inline]
> do_syscall_64+0xd3/0x250 arch/x86/entry/common.c:83
> entry_SYSCALL_64_after_hwframe+0x63/0x6b
> RIP: 0033:0x7faacb47c527
> Code: 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 b8 0a 01 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48
> RSP: 002b:00007ffd9c121f28 EFLAGS: 00000206 ORIG_RAX: 000000000000010a
> RAX: ffffffffffffffda RBX: 00007ffd9c121ff0 RCX: 00007faacb47c527
> RDX: 00007faacb4c9526 RSI: 00000000ffffff9c RDI: 00007ffd9c121ff0
> RBP: 0000000000000001 R08: 0000000000000017 R09: 00007ffd9c121c77
> R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000000
> R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000000
> </TASK>
>
>
> Tested on:
>
> commit: 861c0981 Merge tag 'jfs-6.8-rc3' of github.com:kleikam..
> git tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
> console output: https://syzkaller.appspot.com/x/log.txt?x=1207db37e80000
> kernel config: https://syzkaller.appspot.com/x/.config?x=b168fa511db3ca08
> dashboard link: https://syzkaller.appspot.com/bug?extid=a984066a63e9c1e62662
> compiler: gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
> patch: https://syzkaller.appspot.com/x/patch.diff?x=14a550d3e80000
>
[cut cc list short to walk around spam detector]
On Tue, 30 Jan 2024 22:38:30 -0800 Boqun Feng <[email protected]>
>
> This looks like the case which Frederic recently fixed:
>
> https://lore.kernel.org/lkml/[email protected]/
>
> , which is currently queued in my tree for v6.9 PR:
>
> https://github.com/fbq/linux.git rcu-exp.2024.01.29b
>
> can we give it a spin?
#syz test https://github.com/fbq/linux.git rcu-exp.2024.01.29b
Hello,
syzbot has tested the proposed patch and the reproducer did not trigger any issue:
Reported-and-tested-by: [email protected]
Tested on:
commit: dd85149d rcu/exp: Remove rcu_par_gp_wq
git tree: https://github.com/fbq/linux.git rcu-exp.2024.01.29b
console output: https://syzkaller.appspot.com/x/log.txt?x=1132b337e80000
kernel config: https://syzkaller.appspot.com/x/.config?x=f9804db253bdfc61
dashboard link: https://syzkaller.appspot.com/bug?extid=a984066a63e9c1e62662
compiler: gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
Note: no patches were applied.
Note: testing is done by a robot and is best-effort only.