Hi Luiz,
On Tue, 2023-08-15 at 15:41 -0700, Luiz Augusto von Dentz wrote:
> Hi Pauli,
>
> On Mon, Aug 14, 2023 at 12:01 PM Luiz Augusto von Dentz
> <[email protected]> wrote:
> >
> > From: Luiz Augusto von Dentz <[email protected]>
> >
> > Use-after-free can occur in hci_disconnect_all_sync if a connection is
> > deleted by concurrent processing of a controller event.
> >
> > To prevent this, the code now tries to iterate over the list backwards
> > to ensure the links are cleaned up before their parents. It also no
> > longer relies on a cursor; instead it always uses the last element, since
> > hci_abort_conn_sync is guaranteed to call hci_conn_del.
> >
> > UAF crash log:
> > ==================================================================
> > BUG: KASAN: slab-use-after-free in hci_set_powered_sync
> > (net/bluetooth/hci_sync.c:5424) [bluetooth]
> > Read of size 8 at addr ffff888009d9c000 by task kworker/u9:0/124
> >
> > CPU: 0 PID: 124 Comm: kworker/u9:0 Tainted: G W
> > 6.5.0-rc1+ #10
> > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
> > 1.16.2-1.fc38 04/01/2014
> > Workqueue: hci0 hci_cmd_sync_work [bluetooth]
> > Call Trace:
> > <TASK>
> > dump_stack_lvl+0x5b/0x90
> > print_report+0xcf/0x670
> > ? __virt_addr_valid+0xdd/0x160
> > ? hci_set_powered_sync+0x2c9/0x4a0 [bluetooth]
> > kasan_report+0xa6/0xe0
> > ? hci_set_powered_sync+0x2c9/0x4a0 [bluetooth]
> > ? __pfx_set_powered_sync+0x10/0x10 [bluetooth]
> > hci_set_powered_sync+0x2c9/0x4a0 [bluetooth]
> > ? __pfx_hci_set_powered_sync+0x10/0x10 [bluetooth]
> > ? __pfx_lock_release+0x10/0x10
> > ? __pfx_set_powered_sync+0x10/0x10 [bluetooth]
> > hci_cmd_sync_work+0x137/0x220 [bluetooth]
> > process_one_work+0x526/0x9d0
> > ? __pfx_process_one_work+0x10/0x10
> > ? __pfx_do_raw_spin_lock+0x10/0x10
> > ? mark_held_locks+0x1a/0x90
> > worker_thread+0x92/0x630
> > ? __pfx_worker_thread+0x10/0x10
> > kthread+0x196/0x1e0
> > ? __pfx_kthread+0x10/0x10
> > ret_from_fork+0x2c/0x50
> > </TASK>
> >
> > Allocated by task 1782:
> > kasan_save_stack+0x33/0x60
> > kasan_set_track+0x25/0x30
> > __kasan_kmalloc+0x8f/0xa0
> > hci_conn_add+0xa5/0xa80 [bluetooth]
> > hci_bind_cis+0x881/0x9b0 [bluetooth]
> > iso_connect_cis+0x121/0x520 [bluetooth]
> > iso_sock_connect+0x3f6/0x790 [bluetooth]
> > __sys_connect+0x109/0x130
> > __x64_sys_connect+0x40/0x50
> > do_syscall_64+0x60/0x90
> > entry_SYSCALL_64_after_hwframe+0x6e/0xd8
> >
> > Freed by task 695:
> > kasan_save_stack+0x33/0x60
> > kasan_set_track+0x25/0x30
> > kasan_save_free_info+0x2b/0x50
> > __kasan_slab_free+0x10a/0x180
> > __kmem_cache_free+0x14d/0x2e0
> > device_release+0x5d/0xf0
> > kobject_put+0xdf/0x270
> > hci_disconn_complete_evt+0x274/0x3a0 [bluetooth]
> > hci_event_packet+0x579/0x7e0 [bluetooth]
> > hci_rx_work+0x287/0xaa0 [bluetooth]
> > process_one_work+0x526/0x9d0
> > worker_thread+0x92/0x630
> > kthread+0x196/0x1e0
> > ret_from_fork+0x2c/0x50
> > ==================================================================
> >
> > Fixes: 182ee45da083 ("Bluetooth: hci_sync: Rework hci_suspend_notifier")
> > Signed-off-by: Pauli Virtanen <[email protected]>
> > Signed-off-by: Luiz Augusto von Dentz <[email protected]>
> > ---
> > net/bluetooth/hci_sync.c | 55 +++++++++++++++++++++++++---------------
> > 1 file changed, 35 insertions(+), 20 deletions(-)
> >
> > diff --git a/net/bluetooth/hci_sync.c b/net/bluetooth/hci_sync.c
> > index 5eb30ba21370..d10a0f36b947 100644
> > --- a/net/bluetooth/hci_sync.c
> > +++ b/net/bluetooth/hci_sync.c
> > @@ -5370,6 +5370,7 @@ int hci_abort_conn_sync(struct hci_dev *hdev, struct hci_conn *conn, u8 reason)
> > {
> > int err = 0;
> > u16 handle = conn->handle;
> > + struct hci_conn *c;
> >
> > switch (conn->state) {
> > case BT_CONNECTED:
> > @@ -5389,43 +5390,57 @@ int hci_abort_conn_sync(struct hci_dev *hdev, struct hci_conn *conn, u8 reason)
> > hci_dev_unlock(hdev);
> > return 0;
> > default:
> > + hci_dev_lock(hdev);
> > conn->state = BT_CLOSED;
> > + hci_disconn_cfm(conn, reason);
> > + hci_conn_del(conn);
> > + hci_dev_unlock(hdev);
> > return 0;
> > }
> >
> > + hci_dev_lock(hdev);
> > +
> > + /* Check if the connection hasn't been cleanup while waiting
> > + * commands to complete.
> > + */
> > + c = hci_conn_hash_lookup_handle(hdev, handle);
> > + if (!c || c != conn) {
> > + err = 0;
> > + goto unlock;
> > + }
> > +
> > /* Cleanup hci_conn object if it cannot be cancelled as it
> > * likelly means the controller and host stack are out of sync
> > * or in case of LE it was still scanning so it can be cleanup
> > * safely.
> > */
> > - if (err) {
> > - struct hci_conn *c;
> > -
> > - /* Check if the connection hasn't been cleanup while waiting
> > - * commands to complete.
> > - */
> > - c = hci_conn_hash_lookup_handle(hdev, handle);
> > - if (!c || c != conn)
> > - return 0;
> > -
> > - hci_dev_lock(hdev);
> > - hci_conn_failed(conn, err);
> > - hci_dev_unlock(hdev);
> > - }
> > + hci_conn_failed(conn, reason);
> >
> > +unlock:
> > + hci_dev_unlock(hdev);
> > return err;
> > }
> >
> > static int hci_disconnect_all_sync(struct hci_dev *hdev, u8 reason)
> > {
> > - struct hci_conn *conn, *tmp;
> > - int err;
> > + struct list_head *head = &hdev->conn_hash.list;
> > + struct hci_conn *conn;
> >
> > - list_for_each_entry_safe(conn, tmp, &hdev->conn_hash.list, list) {
> > - err = hci_abort_conn_sync(hdev, conn, reason);
> > - if (err)
> > - return err;
> > + rcu_read_lock();
> > + while ((conn = list_first_or_null_rcu(head, struct hci_conn, list))) {
> > + /* Make sure the connection is not freed while unlocking */
> > + conn = hci_conn_get(conn);
> > + rcu_read_unlock();
> > + /* Disregard possible errors since hci_conn_del shall have been
> > + * called even in case of errors had occurred since it would
> > + * then cause hci_conn_failed to be called which calls
> > + * hci_conn_del internally.
> > + */
> > + hci_abort_conn_sync(hdev, conn, reason);
> > + hci_conn_put(conn);
> > + rcu_read_lock();
> > }
> > + rcu_read_unlock();
> >
> > return 0;
> > }
> > --
> > 2.41.0
>
> Any comments on this one? I actually took the time to add some tests
> to iso-tester to attempt to cover scenarios where
> hci_disconnect_all_sync is called while connecting/connected, which
> seems to be working with these changes.
>
I don't have further comments. I tested it on the load that previously
generated KASAN crashes, and haven't seen any so far.

The only open question I had was whether deleting conns in
hdev->req_workqueue could trigger a crash in hdev->workqueue processing
that is not protected by locks or a refcount, but I don't know of a
scenario where this would occur right now.
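
For reference, here is a minimal userspace sketch of the loop shape the
patch uses in hci_disconnect_all_sync: re-read the list head on every
pass, pin the entry with a reference, and tolerate other entries
disappearing in the meantime. It is not kernel code. The names struct
conn, conn_get, conn_put, conn_del, abort_conn and disconnect_all are
hypothetical stand-ins, there is no RCU or locking, and the "concurrent"
deletion is simulated inline.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct hci_conn: a refcounted list node. */
struct conn {
	struct conn *next;
	int refcnt;
	int handle;
};

static struct conn *head;	/* stand-in for hdev->conn_hash.list */
static int freed;		/* number of objects actually released */

static struct conn *conn_get(struct conn *c)
{
	c->refcnt++;
	return c;
}

static void conn_put(struct conn *c)
{
	assert(c->refcnt > 0);
	if (--c->refcnt == 0) {
		free(c);
		freed++;
	}
}

static void conn_add(int handle)
{
	struct conn *c = calloc(1, sizeof(*c));

	if (!c)
		abort();
	c->handle = handle;
	c->refcnt = 1;		/* reference owned by the list */
	c->next = head;
	head = c;
}

/* Unlink c if it is still on the list and drop the list's reference.
 * Calling it again for an already unlinked entry is a no-op, because
 * the walk only compares pointers and never dereferences c.
 */
static void conn_del(struct conn *c)
{
	struct conn **pp;

	for (pp = &head; *pp; pp = &(*pp)->next) {
		if (*pp == c) {
			*pp = c->next;
			c->next = NULL;
			conn_put(c);
			return;
		}
	}
}

/* Stand-in for hci_abort_conn_sync(). While it "waits", a simulated
 * event handler deletes both this entry and its neighbour.
 */
static void abort_conn(struct conn *c)
{
	struct conn *neighbour = c->next;

	conn_del(c);		/* simulated disconnect-complete event */
	if (neighbour)
		conn_del(neighbour);
	conn_del(c);		/* our own cleanup: already gone, no-op */
}

/* No cursor is kept across iterations, so a stale "next" pointer can
 * never be followed. Returns the number of passes made.
 */
static int disconnect_all(void)
{
	struct conn *c;
	int rounds = 0;

	while ((c = head)) {
		c = conn_get(c);	/* keep c alive across abort_conn() */
		abort_conn(c);
		conn_put(c);		/* last reference: frees c here */
		rounds++;
	}
	return rounds;
}
```

With a list_for_each_entry_safe style cursor, the saved next pointer
would have referred to the neighbour that abort_conn() just freed. The
sketch does not model the cross-workqueue question above.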
--
Pauli Virtanen