Hello,
syzbot found the following issue on:
HEAD commit: e47eb90a0a9a Add linux-next specific files for 20220901
git tree: linux-next
console output: https://syzkaller.appspot.com/x/log.txt?x=13fad8f5080000
kernel config: https://syzkaller.appspot.com/x/.config?x=7933882276523081
dashboard link: https://syzkaller.appspot.com/bug?extid=844c7bf1b1aa4119c5de
compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
Unfortunately, I don't have any reproducer for this issue yet.
IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: [email protected]
------------[ cut here ]------------
ODEBUG: assert_init not available (active state 0) object type: timer_list hint: 0x0
WARNING: CPU: 1 PID: 4347 at lib/debugobjects.c:509 debug_print_object+0x16e/0x250 lib/debugobjects.c:509
Modules linked in:
CPU: 1 PID: 4347 Comm: syz-executor.1 Not tainted 6.0.0-rc3-next-20220901-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/26/2022
RIP: 0010:debug_print_object+0x16e/0x250 lib/debugobjects.c:509
Code: ff df 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 af 00 00 00 48 8b 14 dd 00 46 49 8a 4c 89 ee 48 c7 c7 a0 39 49 8a e8 36 a5 3a 05 <0f> 0b 83 05 b5 56 dc 09 01 48 83 c4 18 5b 5d 41 5c 41 5d 41 5e c3
RSP: 0018:ffffc900148279c0 EFLAGS: 00010086
RAX: 0000000000000000 RBX: 0000000000000005 RCX: 0000000000000000
RDX: 0000000000040000 RSI: ffffffff81620448 RDI: fffff52002904f2a
RBP: 0000000000000001 R08: 0000000000000005 R09: 0000000000000000
R10: 0000000080000000 R11: 203a47554245444f R12: ffffffff89ef0860
R13: ffffffff8a494060 R14: ffffffff816b41b0 R15: 1ffff92002904f43
FS: 00007f431bb46700(0000) GS:ffff8880b9b00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020000300 CR3: 0000000027d81000 CR4: 00000000003506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
debug_object_assert_init lib/debugobjects.c:899 [inline]
debug_object_assert_init+0x1f4/0x2e0 lib/debugobjects.c:870
debug_timer_assert_init kernel/time/timer.c:792 [inline]
debug_assert_init kernel/time/timer.c:837 [inline]
del_timer+0x6d/0x110 kernel/time/timer.c:1257
try_to_grab_pending+0x6d/0xd0 kernel/workqueue.c:1275
__cancel_work_timer+0xa6/0x570 kernel/workqueue.c:3119
mgmt_index_removed+0x218/0x340 net/bluetooth/mgmt.c:8964
hci_sock_bind+0x472/0x1760 net/bluetooth/hci_sock.c:1218
__sys_bind+0x1e9/0x250 net/socket.c:1776
__do_sys_bind net/socket.c:1787 [inline]
__se_sys_bind net/socket.c:1785 [inline]
__x64_sys_bind+0x6f/0xb0 net/socket.c:1785
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f431aa89279
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f431bb46168 EFLAGS: 00000246 ORIG_RAX: 0000000000000031
RAX: ffffffffffffffda RBX: 00007f431ab9bf80 RCX: 00007f431aa89279
RDX: 0000000000000006 RSI: 0000000020000300 RDI: 0000000000000004
RBP: 00007f431aae32e9 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffce33d94af R14: 00007f431bb46300 R15: 0000000000022000
</TASK>
---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at [email protected].
syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
I guess that, since hci_pi(sk)->hdev = hdev in hci_sock_bind() does not check whether a hdev
is already associated with some sk, it is possible that multiple sk are bound to the same hdev.
As a result, lock_sock(sk) is not sufficient for serializing access to hdev, and
hci_dev_lock(hdev) is needed when calling mgmt_index_*(hdev) functions.
If my guess above is correct, I think that what syzbot is telling us is that, due to lack of
serialization via hci_dev_lock(hdev), setting of HCI_MGMT flag from mgmt_init_hdev() from
hci_mgmt_cmd() from hci_sock_sendmsg() is racing with testing of HCI_MGMT flag from
mgmt_index_removed() from hci_sock_bind().
I suggest you to explicitly use lockdep_assert_held() in Bluethooth code for clarifying what
locks need to be held, instead of continue using racy operations like hci_dev_test_and_set_flag()
without holding appropriate locks.
hci_unregister_dev() {
if (!test_bit(HCI_INIT, &hdev->flags) &&
!hci_dev_test_flag(hdev, HCI_SETUP) &&
!hci_dev_test_flag(hdev, HCI_CONFIG)) {
hci_dev_lock(hdev);
mgmt_index_removed(hdev) {
if (!hci_dev_test_flag(hdev, HCI_MGMT))
return;
cancel_delayed_work_sync(&hdev->discov_off);
}
hci_dev_unlock(hdev);
}
}
hci_sock_sendmsg() {
lock_sock(sk);
mutex_lock(&mgmt_chan_list_lock);
chan = __hci_mgmt_chan_find(hci_pi(sk)->channel);
if (chan)
err = hci_mgmt_cmd(chan, sk, skb) {
if (hdev && chan->hdev_init) // chan->hdev_init == mgmt_init_hdev
chan->hdev_init(sk, hdev) {
if (hci_dev_test_and_set_flag(hdev, HCI_MGMT)) // Missing hci_dev_lock(hdev)
return;
INIT_DELAYED_WORK(&hdev->discov_off, discov_off);
}
err = handler->func(sk, hdev, cp, len) { // handler->func() == set_external_config or set_public_address
hci_dev_lock(hdev);
mgmt_index_removed(hdev) {
if (!hci_dev_test_flag(hdev, HCI_MGMT))
return;
cancel_delayed_work_sync(&hdev->discov_off);
}
hci_dev_unlock(hdev);
}
}
else
err = -EINVAL;
mutex_unlock(&mgmt_chan_list_lock);
release_sock(sk);
}
hci_sock_bind() {
lock_sock(sk);
mgmt_index_removed(hdev) {
if (!hci_dev_test_flag(hdev, HCI_MGMT)) // Missing hci_dev_lock(hdev)
return;
cancel_delayed_work_sync(&hdev->discov_off); // ODEBUG complains missing INIT_DELAYED_WORK()
}
release_sock(sk);
}
syzbot is again reporting attempt to cancel uninitialized work
at mgmt_index_removed() [1], for setting of HCI_MGMT flag from
mgmt_init_hdev() from hci_mgmt_cmd() from hci_sock_sendmsg() can
race with testing of HCI_MGMT flag from mgmt_index_removed() from
hci_sock_bind() due to lack of serialization via hci_dev_lock().
Since mgmt_init_hdev() is called with mgmt_chan_list_lock held, we can
safely split hci_dev_test_and_set_flag() into hci_dev_test_flag() and
hci_dev_set_flag(). Thus, in order to close this race, set HCI_MGMT flag
after INIT_DELAYED_WORK() completed.
This is a local fix based on mgmt_chan_list_lock. Lack of serialization
via hci_dev_lock() might be causing different race conditions somewhere
else. But a global fix based on hci_dev_lock() should deserve a future
patch.
Link: https://syzkaller.appspot.com/bug?extid=844c7bf1b1aa4119c5de
Reported-by: [email protected]
Signed-off-by: Tetsuo Handa <[email protected]>
Fixes: 3f2893d3c142986a ("Bluetooth: don't try to cancel uninitialized works at mgmt_index_removed()")
---
net/bluetooth/mgmt.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/net/bluetooth/mgmt.c b/net/bluetooth/mgmt.c
index 72e6595a71cc..3d1cd0666968 100644
--- a/net/bluetooth/mgmt.c
+++ b/net/bluetooth/mgmt.c
@@ -1050,7 +1050,7 @@ static void discov_off(struct work_struct *work)
static void mgmt_init_hdev(struct sock *sk, struct hci_dev *hdev)
{
- if (hci_dev_test_and_set_flag(hdev, HCI_MGMT))
+ if (hci_dev_test_flag(hdev, HCI_MGMT))
return;
BT_INFO("MGMT ver %d.%d", MGMT_VERSION, MGMT_REVISION);
@@ -1065,6 +1065,8 @@ static void mgmt_init_hdev(struct sock *sk, struct hci_dev *hdev)
* it
*/
hci_dev_clear_flag(hdev, HCI_BONDABLE);
+
+ hci_dev_set_flag(hdev, HCI_MGMT);
}
static int read_controller_info(struct sock *sk, struct hci_dev *hdev,
--
2.18.4
This is automated email and please do not reply to this email!
Dear submitter,
Thank you for submitting the patches to the linux bluetooth mailing list.
This is a CI test results with your patch series:
PW Link:https://patchwork.kernel.org/project/bluetooth/list/?series=676039
---Test result---
Test Summary:
CheckPatch PASS 1.76 seconds
GitLint PASS 0.95 seconds
SubjectPrefix PASS 0.82 seconds
BuildKernel PASS 34.17 seconds
BuildKernel32 PASS 30.91 seconds
Incremental Build with patchesPASS 43.79 seconds
TestRunner: Setup PASS 513.11 seconds
TestRunner: l2cap-tester PASS 17.32 seconds
TestRunner: iso-tester PASS 16.55 seconds
TestRunner: bnep-tester PASS 6.54 seconds
TestRunner: mgmt-tester PASS 103.20 seconds
TestRunner: rfcomm-tester PASS 10.20 seconds
TestRunner: sco-tester PASS 9.78 seconds
TestRunner: smp-tester PASS 9.87 seconds
TestRunner: userchan-tester PASS 6.86 seconds
---
Regards,
Linux Bluetooth
Hello:
This patch was applied to bluetooth/bluetooth-next.git (master)
by Luiz Augusto von Dentz <[email protected]>:
On Mon, 12 Sep 2022 01:21:42 +0900 you wrote:
> syzbot is again reporting attempt to cancel uninitialized work
> at mgmt_index_removed() [1], for setting of HCI_MGMT flag from
> mgmt_init_hdev() from hci_mgmt_cmd() from hci_sock_sendmsg() can
> race with testing of HCI_MGMT flag from mgmt_index_removed() from
> hci_sock_bind() due to lack of serialization via hci_dev_lock().
>
> Since mgmt_init_hdev() is called with mgmt_chan_list_lock held, we can
> safely split hci_dev_test_and_set_flag() into hci_dev_test_flag() and
> hci_dev_set_flag(). Thus, in order to close this race, set HCI_MGMT flag
> after INIT_DELAYED_WORK() completed.
>
> [...]
Here is the summary with links:
- Bluetooth: avoid hci_dev_test_and_set_flag() in mgmt_init_hdev()
https://git.kernel.org/bluetooth/bluetooth-next/c/f74ca25d6d66
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html