2024-02-08 09:31:01

by syzbot

[permalink] [raw]
Subject: [syzbot] [bluetooth?] KASAN: slab-use-after-free Write in __hci_acl_create_connection_sync

Hello,

syzbot found the following issue on:

HEAD commit: b1d3a0e70c38 Add linux-next specific files for 20240208
git tree: linux-next
console output: https://syzkaller.appspot.com/x/log.txt?x=15530be4180000
kernel config: https://syzkaller.appspot.com/x/.config?x=bb693ba195662a06
dashboard link: https://syzkaller.appspot.com/bug?extid=3f0a39be7a2035700868
compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=11d95147e80000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=107c2d8fe80000

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/176a6b395bbe/disk-b1d3a0e7.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/02d7d46f81bd/vmlinux-b1d3a0e7.xz
kernel image: https://storage.googleapis.com/syzbot-assets/18a5a5030e19/bzImage-b1d3a0e7.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: [email protected]

==================================================================
BUG: KASAN: slab-use-after-free in __hci_acl_create_connection_sync+0x6ce/0x990 net/bluetooth/hci_sync.c:6518
Write of size 2 at addr ffff88806b80c036 by task kworker/u9:4/5087

CPU: 0 PID: 5087 Comm: kworker/u9:4 Not tainted 6.8.0-rc3-next-20240208-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/25/2024
Workqueue: hci4 hci_cmd_sync_work
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114
print_address_description mm/kasan/report.c:377 [inline]
print_report+0x169/0x550 mm/kasan/report.c:488
kasan_report+0x143/0x180 mm/kasan/report.c:601
__hci_acl_create_connection_sync+0x6ce/0x990 net/bluetooth/hci_sync.c:6518
hci_cmd_sync_work+0x22b/0x400 net/bluetooth/hci_sync.c:306
process_one_work kernel/workqueue.c:3102 [inline]
process_scheduled_works+0x9d7/0x1730 kernel/workqueue.c:3182
worker_thread+0x86d/0xd70 kernel/workqueue.c:3263
kthread+0x2f0/0x390 kernel/kthread.c:388
ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:242
</TASK>

Allocated by task 6357:
kasan_save_stack mm/kasan/common.c:47 [inline]
kasan_save_track+0x3f/0x80 mm/kasan/common.c:68
poison_kmalloc_redzone mm/kasan/common.c:370 [inline]
__kasan_kmalloc+0x98/0xb0 mm/kasan/common.c:387
kasan_kmalloc include/linux/kasan.h:211 [inline]
kmalloc_trace+0x1d9/0x360 mm/slub.c:4013
kmalloc include/linux/slab.h:590 [inline]
kzalloc include/linux/slab.h:711 [inline]
hci_conn_add+0xc7/0x13a0 net/bluetooth/hci_conn.c:914
hci_conn_add_unset net/bluetooth/hci_conn.c:1016 [inline]
hci_connect_acl+0x15d/0x470 net/bluetooth/hci_conn.c:1632
hci_connect_sco+0x3f/0x350 net/bluetooth/hci_conn.c:1692
sco_connect net/bluetooth/sco.c:266 [inline]
sco_sock_connect+0x2ce/0x950 net/bluetooth/sco.c:591
__sys_connect_file net/socket.c:2048 [inline]
__sys_connect+0x2df/0x310 net/socket.c:2065
__do_sys_connect net/socket.c:2075 [inline]
__se_sys_connect net/socket.c:2072 [inline]
__x64_sys_connect+0x7a/0x90 net/socket.c:2072
do_syscall_64+0xfb/0x240
entry_SYSCALL_64_after_hwframe+0x6d/0x75

Freed by task 5087:
kasan_save_stack mm/kasan/common.c:47 [inline]
kasan_save_track+0x3f/0x80 mm/kasan/common.c:68
kasan_save_free_info+0x40/0x50 mm/kasan/generic.c:586
poison_slab_object+0xa6/0xe0 mm/kasan/common.c:240
__kasan_slab_free+0x37/0x60 mm/kasan/common.c:256
kasan_slab_free include/linux/kasan.h:184 [inline]
slab_free_hook mm/slub.c:2122 [inline]
slab_free mm/slub.c:4296 [inline]
kfree+0x14a/0x380 mm/slub.c:4406
device_release+0x99/0x1c0
kobject_cleanup lib/kobject.c:682 [inline]
kobject_release lib/kobject.c:716 [inline]
kref_put include/linux/kref.h:65 [inline]
kobject_put+0x1f5/0x430 lib/kobject.c:733
hci_conn_cleanup net/bluetooth/hci_conn.c:176 [inline]
hci_conn_del+0x8f0/0xc70 net/bluetooth/hci_conn.c:1126
hci_abort_conn_sync+0x583/0xde0 net/bluetooth/hci_sync.c:5361
__hci_acl_create_connection_sync+0x60c/0x990 net/bluetooth/hci_sync.c:6554
hci_cmd_sync_work+0x22b/0x400 net/bluetooth/hci_sync.c:306
process_one_work kernel/workqueue.c:3102 [inline]
process_scheduled_works+0x9d7/0x1730 kernel/workqueue.c:3182
worker_thread+0x86d/0xd70 kernel/workqueue.c:3263
kthread+0x2f0/0x390 kernel/kthread.c:388
ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:242

The buggy address belongs to the object at ffff88806b80c000
which belongs to the cache kmalloc-4k of size 4096
The buggy address is located 54 bytes inside of
freed 4096-byte region [ffff88806b80c000, ffff88806b80d000)

The buggy address belongs to the physical page:
page:ffffea0001ae0200 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x6b808
head:ffffea0001ae0200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0xfff80000000840(slab|head|node=0|zone=1|lastcpupid=0xfff)
page_type: 0xffffffff()
raw: 00fff80000000840 ffff888014c42140 ffffea000073c600 dead000000000002
raw: 0000000000000000 0000000000040004 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 3, migratetype Unmovable, gfp_mask 0x1d2040(__GFP_IO|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC|__GFP_HARDWALL), pid 5397, tgid 5397 (syz-executor270), ts 117997642770, free_ts 25758092955
set_page_owner include/linux/page_owner.h:31 [inline]
post_alloc_hook+0x1ea/0x210 mm/page_alloc.c:1539
prep_new_page mm/page_alloc.c:1546 [inline]
get_page_from_freelist+0x34eb/0x3680 mm/page_alloc.c:3353
__alloc_pages+0x256/0x6a0 mm/page_alloc.c:4609
__alloc_pages_node include/linux/gfp.h:238 [inline]
alloc_pages_node include/linux/gfp.h:261 [inline]
alloc_slab_page+0x5f/0x160 mm/slub.c:2191
allocate_slab mm/slub.c:2354 [inline]
new_slab+0x84/0x2f0 mm/slub.c:2407
___slab_alloc+0xc73/0x1260 mm/slub.c:3541
__slab_alloc mm/slub.c:3626 [inline]
__slab_alloc_node mm/slub.c:3679 [inline]
slab_alloc_node mm/slub.c:3851 [inline]
__do_kmalloc_node mm/slub.c:3981 [inline]
__kmalloc+0x2e3/0x4a0 mm/slub.c:3995
kmalloc include/linux/slab.h:594 [inline]
tomoyo_realpath_from_path+0xcf/0x5e0 security/tomoyo/realpath.c:251
tomoyo_get_realpath security/tomoyo/file.c:151 [inline]
tomoyo_path_perm+0x2b7/0x740 security/tomoyo/file.c:822
tomoyo_path_symlink+0xde/0x120 security/tomoyo/tomoyo.c:212
security_path_symlink+0xe3/0x140 security/security.c:1870
do_symlinkat+0x136/0x3a0 fs/namei.c:4502
__do_sys_symlink fs/namei.c:4525 [inline]
__se_sys_symlink fs/namei.c:4523 [inline]
__x64_sys_symlink+0x7e/0x90 fs/namei.c:4523
do_syscall_64+0xfb/0x240
entry_SYSCALL_64_after_hwframe+0x6d/0x75
page last free pid 1 tgid 1 stack trace:
reset_page_owner include/linux/page_owner.h:24 [inline]
free_pages_prepare mm/page_alloc.c:1140 [inline]
free_unref_page_prepare+0x968/0xa90 mm/page_alloc.c:2388
free_unref_page+0x37/0x3f0 mm/page_alloc.c:2528
free_contig_range+0x9e/0x160 mm/page_alloc.c:6574
destroy_args+0x8a/0x890 mm/debug_vm_pgtable.c:1028
debug_vm_pgtable+0x4be/0x550 mm/debug_vm_pgtable.c:1408
do_one_initcall+0x238/0x830 init/main.c:1233
do_initcall_level+0x157/0x210 init/main.c:1295
do_initcalls+0x3f/0x80 init/main.c:1311
kernel_init_freeable+0x430/0x5d0 init/main.c:1542
kernel_init+0x1d/0x2b0 init/main.c:1432
ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:242

Memory state around the buggy address:
ffff88806b80bf00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff88806b80bf80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff88806b80c000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88806b80c080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88806b80c100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at [email protected].

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing.

If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup


2024-02-09 07:40:14

by Hillf Danton

[permalink] [raw]
Subject: Re: [syzbot] [bluetooth?] KASAN: slab-use-after-free Write in __hci_acl_create_connection_sync

On Thu, 08 Feb 2024 01:27:31 -0800
> HEAD commit: b1d3a0e70c38 Add linux-next specific files for 20240208
> git tree: linux-next
> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=107c2d8fe80000

#syz test https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master

--- x/net/bluetooth/hci_sync.c
+++ y/net/bluetooth/hci_sync.c
@@ -6553,12 +6553,14 @@ static int __hci_acl_create_connection_s
if (err == -ETIMEDOUT)
hci_abort_conn_sync(hdev, conn, HCI_ERROR_LOCAL_HOST_TERM);

+ hci_conn_put(conn);
return err;
}

int hci_acl_create_connection_sync(struct hci_dev *hdev,
struct hci_conn *conn)
{
+ hci_conn_get(conn);
return hci_cmd_sync_queue(hdev, __hci_acl_create_connection_sync,
conn, NULL);
}
--

2024-02-09 08:20:59

by syzbot

[permalink] [raw]
Subject: Re: [syzbot] [bluetooth?] KASAN: slab-use-after-free Write in __hci_acl_create_connection_sync

Hello,

syzbot has tested the proposed patch and the reproducer did not trigger any issue:

Reported-and-tested-by: [email protected]

Tested on:

commit: 445a555e Add linux-next specific files for 20240209
git tree: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
console output: https://syzkaller.appspot.com/x/log.txt?x=16e15042180000
kernel config: https://syzkaller.appspot.com/x/.config?x=85aa3388229f9ea9
dashboard link: https://syzkaller.appspot.com/bug?extid=3f0a39be7a2035700868
compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
patch: https://syzkaller.appspot.com/x/patch.diff?x=15b1e3a2180000

Note: testing is done by a robot and is best-effort only.

2024-02-13 01:13:15

by Jonas Dreßler

[permalink] [raw]
Subject: Re: [syzbot] [bluetooth?] KASAN: slab-use-after-free Write in __hci_acl_create_connection_sync

Hi Luiz,

On 09.02.24 14:36, Luiz Augusto von Dentz wrote:
> Hi Jonas,
>
> On Fri, Feb 9, 2024 at 7:37 AM Jonas Dreßler <[email protected]> wrote:
>>
>> Hi everyone!
>>
>> On 08.02.24 16:32, syzbot wrote:
>>> syzbot has bisected this issue to:
>>>
>>> commit 456561ba8e495e9320c1f304bf1cd3d1043cbe7b
>>> Author: Jonas Dreßler <[email protected]>
>>> Date: Tue Feb 6 11:08:13 2024 +0000
>>>
>>> Bluetooth: hci_conn: Only do ACL connections sequentially
>>>
>>> bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=154f8550180000
>>> start commit: b1d3a0e70c38 Add linux-next specific files for 20240208
>>> git tree: linux-next
>>> final oops: https://syzkaller.appspot.com/x/report.txt?x=174f8550180000
>>> console output: https://syzkaller.appspot.com/x/log.txt?x=134f8550180000
>>> kernel config: https://syzkaller.appspot.com/x/.config?x=bb693ba195662a06
>>> dashboard link: https://syzkaller.appspot.com/bug?extid=3f0a39be7a2035700868
>>> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=11d95147e80000
>>> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=107c2d8fe80000
>>>
>>> Reported-by: [email protected]
>>> Fixes: 456561ba8e49 ("Bluetooth: hci_conn: Only do ACL connections sequentially")
>>>
>>> For information about bisection process see: https://goo.gl/tpsmEJ#bisection
>>
>> Hmm, looking at the backtraces, I think the issue that the introduction of the
>> sequential connect has introduced another async case: In hci_connect_acl(), when
>> we call hci_acl_create_connection_sync(), the conn state no longer immediately
>> gets set to BT_CONNECT, but remains in BT_OPEN or BT_CLOSED until the hci_sync
>> queue actually executes __hci_acl_create_connection_sync().
>
> Need to double check but I think we do set BT_CONNECT in case of LE
> when it is queued so which shall prevent it to be queued multiple
> times.

Nope, we set it only from within the hci_sync callback, see
hci_connect_le_sync(). IMO that makes sense, because in
hci_abort_conn_sync(), we send a HCI_OP_CREATE_CONN_CANCEL in case
of conn->state == BT_CONNECT, and this OP we wouldn't want to send
while the command is still waiting in the queue.

>
>> This means that now hci_connect_acl() is happy to do multiple
>> hci_acl_create_connection_sync calls, and the hci_sync machinery will happily
>> execute them right after each other. Then the newly introduced hci_abort_conn_sync()
>> in __hci_acl_create_connection_sync() calls hci_conn_del() and frees the conn
>> object, so the second time we enter __hci_acl_create_connection_sync(),
>> things blow up.
>>
>> It looks to me like in theory the hci_connect_le_sync() logic is prone to a
>> similar issue, but in practice that's prohibited because in hci_connect_le_sync()
>> we lookup whether the conn object still exists and bail out if it doesn't.
>>
>> Even for LE though I think we can queue multiple hci_connect_le_sync() calls
>> and those will happily send HCI_OP_LE_CREATE_CONN no matter what the connection
>> state actually is?
>>
>> So assuming this analysis is correct, what do we do to fix this? It seems to me
>> that
>>
>> 1) we want a BT_CONNECT_QUEUED state for connections, so that the state
>> machine covers this additional stage that we have for ACL and LE connections now.
>
> BT_CONNECT already indicates that connection procedure is in progress.

I still think an additional state is necessary. Alternatively (and maybe
a lot nicer than the extra state) would be to add some functions to hci_sync
to check for and remove queued/ongoing commands, I'm thinking of a new

bool hci_cmd_sync_has(struct hci_dev *hdev, hci_cmd_sync_work_func_t func, void *data);
void hci_cmd_sync_cancel(struct hci_dev *hdev, hci_cmd_sync_work_func_t func, void *data);

I think if we had those, in addition to not needing the additional state,
we could also simplify the hci_abort_conn() code quite a bit and possibly
get rid of the passing of connection handles to hci_connect_le_sync().

I'll give that a try and propose a small patch tomorrow.

Cheers,
Jonas

>
>> 2) the conn object can still disappear while the __hci_acl_create_connection_sync()
>> is queued, so we need something like the "if conn doesn't exist anymore, bail out"
>> check from hci_connect_le_sync() in __hci_acl_create_connection_sync(), too.
>
> Btw, I'd probably clean up the connect function and create something
> like hci_connect/hci_connect_sync which takes care of the details
> internally like it was done to abort.
>
>> That said, the current check in hci_connect_le_sync() that's using the connection
>> handle to lookup the conn does not seem great, aren't these handles re-used
>> after connections are torn down?
>
> Well we could perhaps do a lookup by pointer to see if the connection
> hasn't been removed in the meantime, that said to force a clash on the
> handles it need to happen in between abort, which frees the handle,
> and connect, anyway the real culprit here is that we should be able to
> abort the cmd_sync callback like we do in LE:
>
> https://github.com/bluez/bluetooth-next/blob/master/net/bluetooth/hci_conn.c#L2943
>
> That way we stop the connect callback to run and don't have to worry
> about handle re-use.
>
>> Cheers,
>> Jonas
>
>
>