2023-06-23 17:35:54

by Pauli Virtanen

[permalink] [raw]
Subject: [PATCH RFC 0/5] hci_conn and ISO concurrency fixes

This series addresses additional KASAN errors seen in the same test case
as in the previous one. These are mostly due to hci_conn being deleted
at a bad time, or manipulated without necessary locks.

With this series, the test setup doesn't seem to be producing KASAN
crashes any more.

The test setup is still

while true; do bluetoothctl power on; sleep 12; bluetoothctl power off; sleep 1.5; bluetoothctl power off; sleep 2.5; done;
while true; do sudo systemctl restart bluetooth; sleep 110; done
while true; do systemctl --user restart pipewire wireplumber pipewire-pulse; sleep 91; done
while true; do paplay sample.flac & sleep 2; kill %1; sleep 0.7; done

and equivalent operations manually, on VM + connect to TWS earbuds,
and let it run until it hits a crash.

There's an RFC question here: it would seem useful to be able to keep
references to hci_conn around without RCU or other locks, and be able
to safely continue later if the conn is still around. I.e.

hci_conn_get(conn);
hci_dev_unlock(hdev);
...
hci_dev_lock(hdev);
if (!hci_conn_is_alive(hdev, conn)) {
hci_conn_put(conn);
goto bail_out;
}
hci_conn_put(conn);
...

The first commit here adds this function. It should also be RCU-correct
too, but I'll need to think that through a second time.

It seems in several parts in hci_sync.c it is assumed the conn is not
deleted while the code is blocking waiting for controller events. At
first sight it's not so clear that it's really guaranteed there can't be
UAF here, so I'm wondering if there would be a need to start polluting
hci_sync.c with locks and aliveness checks after waits.

Or if it's guaranteed by something not apparent and nothing needs to be
done, or if some other thing should be better (such as serializing
operations that delete hci_conn through hci_sync).

Pauli Virtanen (5):
Bluetooth: hci_conn: add hci_conn_is_alive
Bluetooth: hci_sync: iterate conn_hash safely in
hci_disconnect_all_sync
Bluetooth: hci_conn: hold ref while hci_connect_le_sync is pending
Bluetooth: ISO: fix use-after-free in __iso_sock_close
Bluetooth: ISO: fix locking in iso_conn_ready

include/net/bluetooth/hci_core.h | 18 ++++
net/bluetooth/hci_conn.c | 24 ++++--
net/bluetooth/hci_sync.c | 140 ++++++++++++++++++++++++++++---
net/bluetooth/iso.c | 69 +++++++++++----
4 files changed, 214 insertions(+), 37 deletions(-)

--
2.41.0



2023-06-23 17:36:05

by Pauli Virtanen

[permalink] [raw]
Subject: [PATCH RFC 5/5] Bluetooth: ISO: fix locking in iso_conn_ready

Getting conn->sk must sock_hold, otherwise the socket may be freed
concurrently. Access to conn->hcon is safe when holding hdev->lock.

Fix the locking in iso_conn_ready to obey this.

Signed-off-by: Pauli Virtanen <[email protected]>
---
net/bluetooth/iso.c | 19 ++++++++++++-------
1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/net/bluetooth/iso.c b/net/bluetooth/iso.c
index ea0209fb9872..c2045adbd7b6 100644
--- a/net/bluetooth/iso.c
+++ b/net/bluetooth/iso.c
@@ -179,6 +179,7 @@ static void iso_chan_del(struct sock *sk, int err)
sock_set_flag(sk, SOCK_ZAPPED);
}

+/* Must hold hdev->lock */
static void iso_conn_del(struct hci_conn *hcon, int err)
{
struct iso_conn *conn = hcon->iso_data;
@@ -1547,19 +1548,23 @@ static bool iso_match_big(struct sock *sk, void *data)
static void iso_conn_ready(struct iso_conn *conn)
{
struct sock *parent;
- struct sock *sk = conn->sk;
+ struct sock *sk;
struct hci_ev_le_big_sync_estabilished *ev;
struct hci_conn *hcon;

BT_DBG("conn %p", conn);

- if (sk) {
- iso_sock_ready(conn->sk);
- } else {
- hcon = conn->hcon;
- if (!hcon)
- return;
+ iso_conn_lock(conn);
+ hcon = conn->hcon;
+ sk = conn->sk;
+ if (sk)
+ sock_hold(sk);
+ iso_conn_unlock(conn);

+ if (sk) {
+ iso_sock_ready(sk);
+ sock_put(sk);
+ } else if (hcon) {
ev = hci_recv_event_data(hcon->hdev,
HCI_EVT_LE_BIG_SYNC_ESTABILISHED);
if (ev)
--
2.41.0


2023-06-23 17:37:04

by Pauli Virtanen

[permalink] [raw]
Subject: [PATCH RFC 1/5] Bluetooth: hci_conn: add hci_conn_is_alive

A delayed operation such as hci_sync on a given hci_conn needs to take
hci_conn_get, so that the hci_conn doesn't get freed in the meantime.
This does not guarantee the conn is still alive in a valid state, as it
may be cleaned up in the meantime, so one needs to check if it is still
in conn_hash to know if it's still alive.

Simplify this alive check, using HCI_CONN_DELETED flag. This is also
meaningful with RCU lock only, but with slightly different semantics.

If hci_conn_is_alive(conn) returns true inside rcu_read_lock, conn was
in conn_hash from the point of view of the current task when the flag
was read. Then its deletion cannot complete before rcu_read_unlock.

Signed-off-by: Pauli Virtanen <[email protected]>
---

Notes:
This probably can be done with RCU primitives setting list.prev, but
that's maybe more magical...

include/net/bluetooth/hci_core.h | 18 ++++++++++++++++++
net/bluetooth/hci_conn.c | 10 +---------
2 files changed, 19 insertions(+), 9 deletions(-)

diff --git a/include/net/bluetooth/hci_core.h b/include/net/bluetooth/hci_core.h
index 05a9b3ab3f56..cab39bdd0592 100644
--- a/include/net/bluetooth/hci_core.h
+++ b/include/net/bluetooth/hci_core.h
@@ -978,6 +978,7 @@ enum {
HCI_CONN_PER_ADV,
HCI_CONN_BIG_CREATED,
HCI_CONN_CREATE_CIS,
+ HCI_CONN_DELETED,
};

static inline bool hci_conn_ssp_enabled(struct hci_conn *conn)
@@ -997,6 +998,7 @@ static inline bool hci_conn_sc_enabled(struct hci_conn *conn)
static inline void hci_conn_hash_add(struct hci_dev *hdev, struct hci_conn *c)
{
struct hci_conn_hash *h = &hdev->conn_hash;
+ WARN_ON(test_bit(HCI_CONN_DELETED, &c->flags));
list_add_tail_rcu(&c->list, &h->list);
switch (c->type) {
case ACL_LINK:
@@ -1023,6 +1025,10 @@ static inline void hci_conn_hash_add(struct hci_dev *hdev, struct hci_conn *c)
static inline void hci_conn_hash_del(struct hci_dev *hdev, struct hci_conn *c)
{
struct hci_conn_hash *h = &hdev->conn_hash;
+ bool deleted;
+
+ deleted = test_and_set_bit(HCI_CONN_DELETED, &c->flags);
+ WARN_ON(deleted);

list_del_rcu(&c->list);
synchronize_rcu();
@@ -1049,6 +1055,18 @@ static inline void hci_conn_hash_del(struct hci_dev *hdev, struct hci_conn *c)
}
}

+/* With hdev->lock: whether hci_conn is in conn_hash.
+ * With RCU: if true, the hci_conn is valid conn_hash iteration cursor and
+ * hci_conn_hash_del has not completed. (Note that if hci_conn was obtained in
+ * this critical section it is always valid, but this may return false!)
+ */
+static inline bool hci_conn_is_alive(struct hci_dev *hdev, struct hci_conn *c)
+{
+ RCU_LOCKDEP_WARN(lockdep_is_held(&hdev->lock) || rcu_read_lock_held(),
+ "suspicious locking");
+ return !test_bit(HCI_CONN_DELETED, &c->flags);
+}
+
static inline unsigned int hci_conn_num(struct hci_dev *hdev, __u8 type)
{
struct hci_conn_hash *h = &hdev->conn_hash;
diff --git a/net/bluetooth/hci_conn.c b/net/bluetooth/hci_conn.c
index 62a7ccfdfe63..d489a4829be7 100644
--- a/net/bluetooth/hci_conn.c
+++ b/net/bluetooth/hci_conn.c
@@ -183,21 +183,13 @@ static void le_scan_cleanup(struct work_struct *work)
struct hci_conn *conn = container_of(work, struct hci_conn,
le_scan_cleanup);
struct hci_dev *hdev = conn->hdev;
- struct hci_conn *c = NULL;

BT_DBG("%s hcon %p", hdev->name, conn);

hci_dev_lock(hdev);

/* Check that the hci_conn is still around */
- rcu_read_lock();
- list_for_each_entry_rcu(c, &hdev->conn_hash.list, list) {
- if (c == conn)
- break;
- }
- rcu_read_unlock();
-
- if (c == conn) {
+ if (hci_conn_is_alive(hdev, conn)) {
hci_connect_le_scan_cleanup(conn, 0x00);
hci_conn_cleanup(conn);
}
--
2.41.0


2023-06-23 17:48:13

by Pauli Virtanen

[permalink] [raw]
Subject: [PATCH RFC 4/5] Bluetooth: ISO: fix use-after-free in __iso_sock_close

__iso_sock_close calls hci_conn_del without hci_dev_lock, which is
invalid and results to use-after-free under suitable conditions. But
the lock cannot be taken here because of lock ordering hci_dev_lock >
lock_sock > iso_conn_lock.

Instead, take hci_conn_get temporarily, and delete the connection
after lock_sock is released later on, if it is still alive.

Also do iso_conn_del, so that we don't leak the iso_conn.

Log:
==================================================================
hci_chan_list_flush:2769: hcon ffff8880356b9000
hci_le_remove_cig:952: hci0: handle 0x00
hci_dev_put:1487: hci0 orig refcnt 9
iso_chan_del:156: sk ffff888010918800, conn ffff888004daee00, err 104
iso_chan_del:156: sk ffff888010918800, conn 0000000000000000, err 103
BUG: KASAN: slab-use-after-free in iso_conn_del (net/bluetooth/iso.c:211) bluetooth
Write of size 8 at addr ffff8880356b99f0 by task kworker/u5:1/1073

Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-1.fc38 04/01/2014
Workqueue: hci0 hci_cmd_sync_work [bluetooth]
Call Trace:
<TASK>
dump_stack_lvl (lib/dump_stack.c:108)
print_report (mm/kasan/report.c:352 mm/kasan/report.c:462)
? __virt_addr_valid (./include/linux/mmzone.h:1901 ./include/linux/mmzone.h:1997 arch/x86/mm/physaddr.c:65)
? iso_conn_del (net/bluetooth/iso.c:211) bluetooth
kasan_report (mm/kasan/report.c:574)
? iso_conn_del (net/bluetooth/iso.c:211) bluetooth
iso_conn_del (net/bluetooth/iso.c:211) bluetooth
hci_conn_hash_flush (./include/net/bluetooth/hci_core.h:1872 net/bluetooth/hci_conn.c:2576) bluetooth
hci_dev_close_sync (net/bluetooth/hci_sync.c:5043) bluetooth
? __pfx_hci_dev_close_sync (net/bluetooth/hci_sync.c:4974) bluetooth
hci_set_powered_sync (net/bluetooth/hci_sync.c:5433 net/bluetooth/hci_sync.c:5441) bluetooth
? __pfx_hci_set_powered_sync (net/bluetooth/hci_sync.c:5437) bluetooth
? set_powered_sync (net/bluetooth/mgmt.c:1369) bluetooth
? __pfx_set_powered_sync (net/bluetooth/mgmt.c:1367) bluetooth
hci_cmd_sync_work (net/bluetooth/hci_sync.c:306) bluetooth
process_one_work (kernel/workqueue.c:2410)
? __pfx_process_one_work (kernel/workqueue.c:2300)
? __pfx_do_raw_spin_lock (kernel/locking/spinlock_debug.c:113)
? mark_held_locks (kernel/locking/lockdep.c:4240)
worker_thread (./include/linux/list.h:292 kernel/workqueue.c:2553)
? __pfx_worker_thread (kernel/workqueue.c:2495)
kthread (kernel/kthread.c:379)
? __pfx_kthread (kernel/kthread.c:332)
ret_from_fork (arch/x86/entry/entry_64.S:314)
</TASK>

Allocated by task 2372:
kasan_save_stack (mm/kasan/common.c:46)
kasan_set_track (mm/kasan/common.c:52)
__kasan_kmalloc (mm/kasan/common.c:374 mm/kasan/common.c:383)
hci_conn_add (./include/linux/slab.h:559 ./include/linux/slab.h:680 net/bluetooth/hci_conn.c:1002) bluetooth
hci_bind_cis (net/bluetooth/hci_conn.c:1920) bluetooth
iso_connect_cis (net/bluetooth/iso.c:383) bluetooth
iso_sock_connect (net/bluetooth/iso.c:890) bluetooth
__sys_connect (./include/linux/file.h:44 net/socket.c:2021)
__x64_sys_connect (net/socket.c:2027)
do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)

Freed by task 2372:
kasan_save_stack (mm/kasan/common.c:46)
kasan_set_track (mm/kasan/common.c:52)
kasan_save_free_info (mm/kasan/generic.c:523)
__kasan_slab_free (mm/kasan/common.c:238 mm/kasan/common.c:200 mm/kasan/common.c:244)
__kmem_cache_free (mm/slub.c:1807 mm/slub.c:3786 mm/slub.c:3799)
device_release (drivers/base/core.c:2489)
kobject_put (lib/kobject.c:683 lib/kobject.c:714 ./include/linux/kref.h:65 lib/kobject.c:731)
__iso_sock_close (net/bluetooth/iso.c:665) bluetooth
iso_sock_release (net/bluetooth/iso.c:686 net/bluetooth/iso.c:1473) bluetooth
__sock_release (net/socket.c:654)
sock_close (net/socket.c:1399)
__fput (fs/file_table.c:322)
task_work_run (kernel/task_work.c:180)
exit_to_user_mode_prepare (./include/linux/resume_user_mode.h:49 kernel/entry/common.c:171 kernel/entry/common.c:204)
syscall_exit_to_user_mode (kernel/entry/common.c:130 kernel/entry/common.c:299)
do_syscall_64 (arch/x86/entry/common.c:87)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)

The buggy address belongs to the object at ffff8880356b9000
which belongs to the cache kmalloc-4k of size 4096
The buggy address is located 2544 bytes inside of
freed 4096-byte region [ffff8880356b9000, ffff8880356ba000)
==================================================================

Fixes: ccf74f2390d6 ("Bluetooth: Add BTPROTO_ISO socket type")
Signed-off-by: Pauli Virtanen <[email protected]>
---
net/bluetooth/iso.c | 50 +++++++++++++++++++++++++++++++++++++--------
1 file changed, 42 insertions(+), 8 deletions(-)

diff --git a/net/bluetooth/iso.c b/net/bluetooth/iso.c
index 84d238d0639a..ea0209fb9872 100644
--- a/net/bluetooth/iso.c
+++ b/net/bluetooth/iso.c
@@ -626,7 +626,38 @@ static void iso_conn_defer_reject(struct hci_conn *conn)
hci_send_cmd(conn->hdev, HCI_OP_LE_REJECT_CIS, sizeof(cp), &cp);
}

-static void __iso_sock_close(struct sock *sk)
+static void iso_conn_del_hci_conn(struct iso_conn *conn, struct hci_conn **del)
+{
+ /* Lock ordering forbids taking hdev->lock, postpone hci_conn_del */
+ iso_conn_lock(conn);
+ if (conn->hcon) {
+ hci_conn_get(conn->hcon);
+ hci_dev_hold(conn->hcon->hdev);
+ *del = conn->hcon;
+ conn->hcon = NULL;
+ }
+ iso_conn_unlock(conn);
+}
+
+static void iso_conn_del_hci_conn_finish(struct hci_conn *hcon)
+{
+ struct hci_dev *hdev;
+
+ if (!hcon)
+ return;
+
+ hdev = hcon->hdev;
+ hci_dev_lock(hdev);
+ if (hci_conn_is_alive(hdev, hcon)) {
+ iso_conn_del(hcon, ECONNRESET);
+ hci_conn_del(hcon);
+ }
+ hci_dev_unlock(hdev);
+ hci_dev_put(hdev);
+ hci_conn_put(hcon);
+}
+
+static void __iso_sock_close(struct sock *sk, struct hci_conn **del_conn)
{
BT_DBG("sk %p state %d socket %p", sk, sk->sk_state, sk->sk_socket);

@@ -659,11 +690,8 @@ static void __iso_sock_close(struct sock *sk)
* needs to be removed so just call hci_conn_del so the cleanup
* callback do what is needed.
*/
- if (test_bit(BT_SK_DEFER_SETUP, &bt_sk(sk)->flags) &&
- iso_pi(sk)->conn->hcon) {
- hci_conn_del(iso_pi(sk)->conn->hcon);
- iso_pi(sk)->conn->hcon = NULL;
- }
+ if (test_bit(BT_SK_DEFER_SETUP, &bt_sk(sk)->flags))
+ iso_conn_del_hci_conn(iso_pi(sk)->conn, del_conn);

iso_chan_del(sk, ECONNRESET);
break;
@@ -680,11 +708,14 @@ static void __iso_sock_close(struct sock *sk)
/* Must be called on unlocked socket. */
static void iso_sock_close(struct sock *sk)
{
+ struct hci_conn *del_conn = NULL;
+
iso_sock_clear_timer(sk);
lock_sock(sk);
- __iso_sock_close(sk);
+ __iso_sock_close(sk, &del_conn);
release_sock(sk);
iso_sock_kill(sk);
+ iso_conn_del_hci_conn_finish(del_conn);
}

static void iso_sock_init(struct sock *sk, struct sock *parent)
@@ -1418,6 +1449,7 @@ static int iso_sock_getsockopt(struct socket *sock, int level, int optname,
static int iso_sock_shutdown(struct socket *sock, int how)
{
struct sock *sk = sock->sk;
+ struct hci_conn *del_conn = NULL;
int err = 0;

BT_DBG("sock %p, sk %p, how %d", sock, sk, how);
@@ -1447,7 +1479,7 @@ static int iso_sock_shutdown(struct socket *sock, int how)
}

iso_sock_clear_timer(sk);
- __iso_sock_close(sk);
+ __iso_sock_close(sk, &del_conn);

if (sock_flag(sk, SOCK_LINGER) && sk->sk_lingertime &&
!(current->flags & PF_EXITING))
@@ -1457,6 +1489,8 @@ static int iso_sock_shutdown(struct socket *sock, int how)
release_sock(sk);
sock_put(sk);

+ iso_conn_del_hci_conn_finish(del_conn);
+
return err;
}

--
2.41.0


2023-06-23 18:02:14

by bluez.test.bot

[permalink] [raw]
Subject: RE: hci_conn and ISO concurrency fixes

This is automated email and please do not reply to this email!

Dear submitter,

Thank you for submitting the patches to the linux bluetooth mailing list.
This is a CI test results with your patch series:
PW Link:https://patchwork.kernel.org/project/bluetooth/list/?series=759866

---Test result---

Test Summary:
CheckPatch FAIL 4.21 seconds
GitLint FAIL 2.10 seconds
SubjectPrefix PASS 0.68 seconds
BuildKernel PASS 33.21 seconds
CheckAllWarning PASS 35.89 seconds
CheckSparse PASS 40.89 seconds
CheckSmatch PASS 113.98 seconds
BuildKernel32 PASS 31.83 seconds
TestRunnerSetup PASS 450.74 seconds
TestRunner_l2cap-tester PASS 16.89 seconds
TestRunner_iso-tester FAIL 22.89 seconds
TestRunner_bnep-tester PASS 5.40 seconds
TestRunner_mgmt-tester PASS 128.09 seconds
TestRunner_rfcomm-tester PASS 8.59 seconds
TestRunner_sco-tester PASS 7.93 seconds
TestRunner_ioctl-tester PASS 9.29 seconds
TestRunner_mesh-tester PASS 6.77 seconds
TestRunner_smp-tester PASS 7.89 seconds
TestRunner_userchan-tester PASS 5.66 seconds
IncrementalBuild PASS 51.76 seconds

Details
##############################
Test: CheckPatch - FAIL
Desc: Run checkpatch.pl script
Output:
[RFC,2/5] Bluetooth: hci_sync: iterate conn_hash safely in hci_disconnect_all_sync
WARNING: Possible unwrapped commit description (prefer a maximum 75 chars per line)
#102:
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-1.fc38 04/01/2014

total: 0 errors, 1 warnings, 0 checks, 180 lines checked

NOTE: For some of the reported defects, checkpatch may be able to
mechanically convert to the typical style using --fix or --fix-inplace.

/github/workspace/src/src/13290840.patch has style problems, please review.

NOTE: Ignored message types: UNKNOWN_COMMIT_ID

NOTE: If any of the errors are false positives, please report
them to the maintainer, see CHECKPATCH in MAINTAINERS.


[RFC,3/5] Bluetooth: hci_conn: hold ref while hci_connect_le_sync is pending
WARNING: Possible unwrapped commit description (prefer a maximum 75 chars per line)
#101:
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-1.fc38 04/01/2014

total: 0 errors, 1 warnings, 0 checks, 41 lines checked

NOTE: For some of the reported defects, checkpatch may be able to
mechanically convert to the typical style using --fix or --fix-inplace.

/github/workspace/src/src/13290841.patch has style problems, please review.

NOTE: Ignored message types: UNKNOWN_COMMIT_ID

NOTE: If any of the errors are false positives, please report
them to the maintainer, see CHECKPATCH in MAINTAINERS.


[RFC,4/5] Bluetooth: ISO: fix use-after-free in __iso_sock_close
WARNING: Possible unwrapped commit description (prefer a maximum 75 chars per line)
#106:
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-1.fc38 04/01/2014

total: 0 errors, 1 warnings, 0 checks, 90 lines checked

NOTE: For some of the reported defects, checkpatch may be able to
mechanically convert to the typical style using --fix or --fix-inplace.

/github/workspace/src/src/13290842.patch has style problems, please review.

NOTE: Ignored message types: UNKNOWN_COMMIT_ID

NOTE: If any of the errors are false positives, please report
them to the maintainer, see CHECKPATCH in MAINTAINERS.


##############################
Test: GitLint - FAIL
Desc: Run gitlint
Output:
[RFC,2/5] Bluetooth: hci_sync: iterate conn_hash safely in hci_disconnect_all_sync

WARNING: I3 - ignore-body-lines: gitlint will be switching from using Python regex 'match' (match beginning) to 'search' (match anywhere) semantics. Please review your ignore-body-lines.regex option accordingly. To remove this warning, set general.regex-style-search=True. More details: https://jorisroovers.github.io/gitlint/configuration/#regex-style-search
1: T1 Title exceeds max length (82>80): "[RFC,2/5] Bluetooth: hci_sync: iterate conn_hash safely in hci_disconnect_all_sync"
16: B1 Line exceeds max length (157>80): "BUG: KASAN: slab-use-after-free in hci_set_powered_sync (net/bluetooth/hci_sync.c:5345 net/bluetooth/hci_sync.c:5385 net/bluetooth/hci_sync.c:5397) bluetooth"
19: B1 Line exceeds max length (81>80): "Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-1.fc38 04/01/2014"
25: B1 Line exceeds max length (107>80): "? __virt_addr_valid (./include/linux/mmzone.h:1901 ./include/linux/mmzone.h:1997 arch/x86/mm/physaddr.c:65)"
26: B1 Line exceeds max length (124>80): "? hci_set_powered_sync (net/bluetooth/hci_sync.c:5345 net/bluetooth/hci_sync.c:5385 net/bluetooth/hci_sync.c:5397) bluetooth"
28: B1 Line exceeds max length (124>80): "? hci_set_powered_sync (net/bluetooth/hci_sync.c:5345 net/bluetooth/hci_sync.c:5385 net/bluetooth/hci_sync.c:5397) bluetooth"
29: B1 Line exceeds max length (122>80): "hci_set_powered_sync (net/bluetooth/hci_sync.c:5345 net/bluetooth/hci_sync.c:5385 net/bluetooth/hci_sync.c:5397) bluetooth"
49: B1 Line exceeds max length (108>80): "hci_conn_add (./include/linux/slab.h:559 ./include/linux/slab.h:680 net/bluetooth/hci_conn.c:1002) bluetooth"
62: B1 Line exceeds max length (85>80): "__kasan_slab_free (mm/kasan/common.c:238 mm/kasan/common.c:200 mm/kasan/common.c:244)"
65: B1 Line exceeds max length (93>80): "kobject_put (lib/kobject.c:683 lib/kobject.c:714 ./include/linux/kref.h:65 lib/kobject.c:731)"
72: B1 Line exceeds max length (117>80): "exit_to_user_mode_prepare (./include/linux/resume_user_mode.h:49 kernel/entry/common.c:171 kernel/entry/common.c:204)"
[RFC,3/5] Bluetooth: hci_conn: hold ref while hci_connect_le_sync is pending

WARNING: I3 - ignore-body-lines: gitlint will be switching from using Python regex 'match' (match beginning) to 'search' (match anywhere) semantics. Please review your ignore-body-lines.regex option accordingly. To remove this warning, set general.regex-style-search=True. More details: https://jorisroovers.github.io/gitlint/configuration/#regex-style-search
15: B1 Line exceeds max length (100>80): "BUG: KASAN: slab-use-after-free in hci_le_create_conn_sync (net/bluetooth/hci_sync.c:6167) bluetooth"
18: B1 Line exceeds max length (81>80): "Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-1.fc38 04/01/2014"
24: B1 Line exceeds max length (107>80): "? __virt_addr_valid (./include/linux/mmzone.h:1901 ./include/linux/mmzone.h:1997 arch/x86/mm/physaddr.c:65)"
52: B1 Line exceeds max length (108>80): "hci_conn_add (./include/linux/slab.h:559 ./include/linux/slab.h:680 net/bluetooth/hci_conn.c:1002) bluetooth"
54: B1 Line exceeds max length (104>80): "process_adv_report.constprop.0 (net/bluetooth/hci_event.c:6172 net/bluetooth/hci_event.c:6293) bluetooth"
56: B1 Line exceeds max length (90>80): "hci_event_packet (net/bluetooth/hci_event.c:7515 net/bluetooth/hci_event.c:7570) bluetooth"
67: B1 Line exceeds max length (85>80): "__kasan_slab_free (mm/kasan/common.c:238 mm/kasan/common.c:200 mm/kasan/common.c:244)"
70: B1 Line exceeds max length (93>80): "kobject_put (lib/kobject.c:683 lib/kobject.c:714 ./include/linux/kref.h:65 lib/kobject.c:731)"
[RFC,4/5] Bluetooth: ISO: fix use-after-free in __iso_sock_close

WARNING: I3 - ignore-body-lines: gitlint will be switching from using Python regex 'match' (match beginning) to 'search' (match anywhere) semantics. Please review your ignore-body-lines.regex option accordingly. To remove this warning, set general.regex-style-search=True. More details: https://jorisroovers.github.io/gitlint/configuration/#regex-style-search
20: B1 Line exceeds max length (83>80): "BUG: KASAN: slab-use-after-free in iso_conn_del (net/bluetooth/iso.c:211) bluetooth"
23: B1 Line exceeds max length (81>80): "Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-1.fc38 04/01/2014"
29: B1 Line exceeds max length (107>80): "? __virt_addr_valid (./include/linux/mmzone.h:1901 ./include/linux/mmzone.h:1997 arch/x86/mm/physaddr.c:65)"
34: B1 Line exceeds max length (101>80): "hci_conn_hash_flush (./include/net/bluetooth/hci_core.h:1872 net/bluetooth/hci_conn.c:2576) bluetooth"
37: B1 Line exceeds max length (92>80): "hci_set_powered_sync (net/bluetooth/hci_sync.c:5433 net/bluetooth/hci_sync.c:5441) bluetooth"
57: B1 Line exceeds max length (108>80): "hci_conn_add (./include/linux/slab.h:559 ./include/linux/slab.h:680 net/bluetooth/hci_conn.c:1002) bluetooth"
70: B1 Line exceeds max length (85>80): "__kasan_slab_free (mm/kasan/common.c:238 mm/kasan/common.c:200 mm/kasan/common.c:244)"
73: B1 Line exceeds max length (93>80): "kobject_put (lib/kobject.c:683 lib/kobject.c:714 ./include/linux/kref.h:65 lib/kobject.c:731)"
80: B1 Line exceeds max length (117>80): "exit_to_user_mode_prepare (./include/linux/resume_user_mode.h:49 kernel/entry/common.c:171 kernel/entry/common.c:204)"
##############################
Test: TestRunner_iso-tester - FAIL
Desc: Run iso-tester with test-runner
Output:
Total: 80, Passed: 76 (95.0%), Failed: 4, Not Run: 0

Failed Test Cases
ISO Receive - Success Failed 0.192 seconds
ISO Receive Timestamped - Success Failed 0.181 seconds
ISO Defer Receive - Success Failed 0.183 seconds
ISO 48_2_1 Defer Receive - Success Failed 0.181 seconds


---
Regards,
Linux Bluetooth

2023-06-23 19:58:07

by Luiz Augusto von Dentz

[permalink] [raw]
Subject: Re: [PATCH RFC 1/5] Bluetooth: hci_conn: add hci_conn_is_alive

Hi Pauli,

On Fri, Jun 23, 2023 at 10:37 AM Pauli Virtanen <[email protected]> wrote:
>
> A delayed operation such as hci_sync on a given hci_conn needs to take
> hci_conn_get, so that the hci_conn doesn't get freed in the meantime.
> This does not guarantee the conn is still alive in a valid state, as it
> may be cleaned up in the meantime, so one needs to check if it is still
> in conn_hash to know if it's still alive.
>
> Simplify this alive check, using HCI_CONN_DELETED flag. This is also
> meaningful with RCU lock only, but with slightly different semantics.
>
> If hci_conn_is_alive(conn) returns true inside rcu_read_lock, conn was
> in conn_hash from the point of view of the current task when the flag
> was read. Then its deletion cannot complete before rcu_read_unlock.
>
> Signed-off-by: Pauli Virtanen <[email protected]>
> ---
>
> Notes:
> This probably can be done with RCU primitives setting list.prev, but
> that's maybe more magical...
>
> include/net/bluetooth/hci_core.h | 18 ++++++++++++++++++
> net/bluetooth/hci_conn.c | 10 +---------
> 2 files changed, 19 insertions(+), 9 deletions(-)
>
> diff --git a/include/net/bluetooth/hci_core.h b/include/net/bluetooth/hci_core.h
> index 05a9b3ab3f56..cab39bdd0592 100644
> --- a/include/net/bluetooth/hci_core.h
> +++ b/include/net/bluetooth/hci_core.h
> @@ -978,6 +978,7 @@ enum {
> HCI_CONN_PER_ADV,
> HCI_CONN_BIG_CREATED,
> HCI_CONN_CREATE_CIS,
> + HCI_CONN_DELETED,
> };
>
> static inline bool hci_conn_ssp_enabled(struct hci_conn *conn)
> @@ -997,6 +998,7 @@ static inline bool hci_conn_sc_enabled(struct hci_conn *conn)
> static inline void hci_conn_hash_add(struct hci_dev *hdev, struct hci_conn *c)
> {
> struct hci_conn_hash *h = &hdev->conn_hash;
> + WARN_ON(test_bit(HCI_CONN_DELETED, &c->flags));
> list_add_tail_rcu(&c->list, &h->list);
> switch (c->type) {
> case ACL_LINK:
> @@ -1023,6 +1025,10 @@ static inline void hci_conn_hash_add(struct hci_dev *hdev, struct hci_conn *c)
> static inline void hci_conn_hash_del(struct hci_dev *hdev, struct hci_conn *c)
> {
> struct hci_conn_hash *h = &hdev->conn_hash;
> + bool deleted;
> +
> + deleted = test_and_set_bit(HCI_CONN_DELETED, &c->flags);
> + WARN_ON(deleted);
>
> list_del_rcu(&c->list);
> synchronize_rcu();
> @@ -1049,6 +1055,18 @@ static inline void hci_conn_hash_del(struct hci_dev *hdev, struct hci_conn *c)
> }
> }
>
> +/* With hdev->lock: whether hci_conn is in conn_hash.
> + * With RCU: if true, the hci_conn is valid conn_hash iteration cursor and
> + * hci_conn_hash_del has not completed. (Note that if hci_conn was obtained in
> + * this critical section it is always valid, but this may return false!)
> + */
> +static inline bool hci_conn_is_alive(struct hci_dev *hdev, struct hci_conn *c)
> +{
> + RCU_LOCKDEP_WARN(lockdep_is_held(&hdev->lock) || rcu_read_lock_held(),
> + "suspicious locking");
> + return !test_bit(HCI_CONN_DELETED, &c->flags);
> +}

I think we are better off doing something like
hci_conn_hold_unless_zero like we do in l2cap_chan_hold_unless_zero,
that said we need to check if the hci_conn_drop can still set the ref
below zero, anyway that is probably a bug in itself and we should
probably WARN_ON if that happens.

> static inline unsigned int hci_conn_num(struct hci_dev *hdev, __u8 type)
> {
> struct hci_conn_hash *h = &hdev->conn_hash;
> diff --git a/net/bluetooth/hci_conn.c b/net/bluetooth/hci_conn.c
> index 62a7ccfdfe63..d489a4829be7 100644
> --- a/net/bluetooth/hci_conn.c
> +++ b/net/bluetooth/hci_conn.c
> @@ -183,21 +183,13 @@ static void le_scan_cleanup(struct work_struct *work)
> struct hci_conn *conn = container_of(work, struct hci_conn,
> le_scan_cleanup);
> struct hci_dev *hdev = conn->hdev;
> - struct hci_conn *c = NULL;
>
> BT_DBG("%s hcon %p", hdev->name, conn);
>
> hci_dev_lock(hdev);
>
> /* Check that the hci_conn is still around */
> - rcu_read_lock();
> - list_for_each_entry_rcu(c, &hdev->conn_hash.list, list) {
> - if (c == conn)
> - break;
> - }
> - rcu_read_unlock();
> -
> - if (c == conn) {
> + if (hci_conn_is_alive(hdev, conn)) {

Hmm, I don't think this is safe, except if we are doing hci_conn_get
we can't really access the conn pointer since it may be freed already,
anyway this is sort of broken already given that we do access
conn->hdev already.

> hci_connect_le_scan_cleanup(conn, 0x00);
> hci_conn_cleanup(conn);
> }
> --
> 2.41.0
>


--
Luiz Augusto von Dentz

2023-06-23 22:29:23

by Pauli Virtanen

[permalink] [raw]
Subject: Re: [PATCH RFC 1/5] Bluetooth: hci_conn: add hci_conn_is_alive

Hi Luiz,

pe, 2023-06-23 kello 12:39 -0700, Luiz Augusto von Dentz kirjoitti:
> On Fri, Jun 23, 2023 at 10:37 AM Pauli Virtanen <[email protected]> wrote:
> >
> > A delayed operation such as hci_sync on a given hci_conn needs to take
> > hci_conn_get, so that the hci_conn doesn't get freed in the meantime.
> > This does not guarantee the conn is still alive in a valid state, as it
> > may be cleaned up in the meantime, so one needs to check if it is still
> > in conn_hash to know if it's still alive.
> >
> > Simplify this alive check, using HCI_CONN_DELETED flag. This is also
> > meaningful with RCU lock only, but with slightly different semantics.
> >
> > If hci_conn_is_alive(conn) returns true inside rcu_read_lock, conn was
> > in conn_hash from the point of view of the current task when the flag
> > was read. Then its deletion cannot complete before rcu_read_unlock.
> >
> > Signed-off-by: Pauli Virtanen <[email protected]>
> > ---
> >
> > Notes:
> > This probably can be done with RCU primitives setting list.prev, but
> > that's maybe more magical...
> >
> > include/net/bluetooth/hci_core.h | 18 ++++++++++++++++++
> > net/bluetooth/hci_conn.c | 10 +---------
> > 2 files changed, 19 insertions(+), 9 deletions(-)
> >
> > diff --git a/include/net/bluetooth/hci_core.h b/include/net/bluetooth/hci_core.h
> > index 05a9b3ab3f56..cab39bdd0592 100644
> > --- a/include/net/bluetooth/hci_core.h
> > +++ b/include/net/bluetooth/hci_core.h
> > @@ -978,6 +978,7 @@ enum {
> > HCI_CONN_PER_ADV,
> > HCI_CONN_BIG_CREATED,
> > HCI_CONN_CREATE_CIS,
> > + HCI_CONN_DELETED,
> > };
> >
> > static inline bool hci_conn_ssp_enabled(struct hci_conn *conn)
> > @@ -997,6 +998,7 @@ static inline bool hci_conn_sc_enabled(struct hci_conn *conn)
> > static inline void hci_conn_hash_add(struct hci_dev *hdev, struct hci_conn *c)
> > {
> > struct hci_conn_hash *h = &hdev->conn_hash;
> > + WARN_ON(test_bit(HCI_CONN_DELETED, &c->flags));
> > list_add_tail_rcu(&c->list, &h->list);
> > switch (c->type) {
> > case ACL_LINK:
> > @@ -1023,6 +1025,10 @@ static inline void hci_conn_hash_add(struct hci_dev *hdev, struct hci_conn *c)
> > static inline void hci_conn_hash_del(struct hci_dev *hdev, struct hci_conn *c)
> > {
> > struct hci_conn_hash *h = &hdev->conn_hash;
> > + bool deleted;
> > +
> > + deleted = test_and_set_bit(HCI_CONN_DELETED, &c->flags);
> > + WARN_ON(deleted);
> >
> > list_del_rcu(&c->list);
> > synchronize_rcu();
> > @@ -1049,6 +1055,18 @@ static inline void hci_conn_hash_del(struct hci_dev *hdev, struct hci_conn *c)
> > }
> > }
> >
> > +/* With hdev->lock: whether hci_conn is in conn_hash.
> > + * With RCU: if true, the hci_conn is valid conn_hash iteration cursor and
> > + * hci_conn_hash_del has not completed. (Note that if hci_conn was obtained in
> > + * this critical section it is always valid, but this may return false!)
> > + */
> > +static inline bool hci_conn_is_alive(struct hci_dev *hdev, struct hci_conn *c)
> > +{
> > + RCU_LOCKDEP_WARN(lockdep_is_held(&hdev->lock) || rcu_read_lock_held(),
> > + "suspicious locking");
> > + return !test_bit(HCI_CONN_DELETED, &c->flags);
> > +}
>
> I think we are better off doing something like
> hci_conn_hold_unless_zero like we do in l2cap_chan_hold_unless_zero,
> that said we need to check if the hci_conn_drop can still set the ref
> below zero, anyway that is probably a bug in itself and we should
> probably WARN_ON if that happens.

The problem here is that we'd like to have both

(1) to have hci_conn_del/cleanup delete the item from conn_hash
immediately

(2) be able to continue iteration from the conn we held, after
releasing and reacquiring RCU or hdev->lock

If conn is removed from the list, conn->list.next won't be updated any
more, so it is not safe to access after we have left the critical
section. So it seems we'd need some marker on whether it is still in
the list.

Maybe (1) could be given up instead, something like: hci_conn_cleanup
sets HCI_CONN_DELETED instead of deleting from the list if refcount is
positive, and lookup functions skip items with this flag. 

Something along these lines could work, need to think a bit.

> > static inline unsigned int hci_conn_num(struct hci_dev *hdev, __u8 type)
> > {
> > struct hci_conn_hash *h = &hdev->conn_hash;
> > diff --git a/net/bluetooth/hci_conn.c b/net/bluetooth/hci_conn.c
> > index 62a7ccfdfe63..d489a4829be7 100644
> > --- a/net/bluetooth/hci_conn.c
> > +++ b/net/bluetooth/hci_conn.c
> > @@ -183,21 +183,13 @@ static void le_scan_cleanup(struct work_struct *work)
> > struct hci_conn *conn = container_of(work, struct hci_conn,
> > le_scan_cleanup);
> > struct hci_dev *hdev = conn->hdev;
> > - struct hci_conn *c = NULL;
> >
> > BT_DBG("%s hcon %p", hdev->name, conn);
> >
> > hci_dev_lock(hdev);
> >
> > /* Check that the hci_conn is still around */
> > - rcu_read_lock();
> > - list_for_each_entry_rcu(c, &hdev->conn_hash.list, list) {
> > - if (c == conn)
> > - break;
> > - }
> > - rcu_read_unlock();
> > -
> > - if (c == conn) {
> > + if (hci_conn_is_alive(hdev, conn)) {
>
> Hmm, I don't think this is safe, except if we are doing hci_conn_get
> we can't really access the conn pointer since it may be freed already,
> anyway this is sort of broken already given that we do access
> conn->hdev already.

hci_conn_get is held here, there's a hci_conn_put at the end of this
function.

>
> > hci_connect_le_scan_cleanup(conn, 0x00);
> > hci_conn_cleanup(conn);
> > }
> > --
> > 2.41.0
> >
>
>


2023-06-27 23:12:01

by Luiz Augusto von Dentz

[permalink] [raw]
Subject: Re: [PATCH RFC 1/5] Bluetooth: hci_conn: add hci_conn_is_alive

Hi Pauli,

On Fri, Jun 23, 2023 at 3:21 PM Pauli Virtanen <[email protected]> wrote:
>
> Hi Luiz,
>
> pe, 2023-06-23 kello 12:39 -0700, Luiz Augusto von Dentz kirjoitti:
> > On Fri, Jun 23, 2023 at 10:37 AM Pauli Virtanen <[email protected]> wrote:
> > >
> > > A delayed operation such as hci_sync on a given hci_conn needs to take
> > > hci_conn_get, so that the hci_conn doesn't get freed in the meantime.
> > > This does not guarantee the conn is still alive in a valid state, as it
> > > may be cleaned up in the meantime, so one needs to check if it is still
> > > in conn_hash to know if it's still alive.
> > >
> > > Simplify this alive check, using HCI_CONN_DELETED flag. This is also
> > > meaningful with RCU lock only, but with slightly different semantics.
> > >
> > > If hci_conn_is_alive(conn) returns true inside rcu_read_lock, conn was
> > > in conn_hash from the point of view of the current task when the flag
> > > was read. Then its deletion cannot complete before rcu_read_unlock.
> > >
> > > Signed-off-by: Pauli Virtanen <[email protected]>
> > > ---
> > >
> > > Notes:
> > > This probably can be done with RCU primitives setting list.prev, but
> > > that's maybe more magical...
> > >
> > > include/net/bluetooth/hci_core.h | 18 ++++++++++++++++++
> > > net/bluetooth/hci_conn.c | 10 +---------
> > > 2 files changed, 19 insertions(+), 9 deletions(-)
> > >
> > > diff --git a/include/net/bluetooth/hci_core.h b/include/net/bluetooth/hci_core.h
> > > index 05a9b3ab3f56..cab39bdd0592 100644
> > > --- a/include/net/bluetooth/hci_core.h
> > > +++ b/include/net/bluetooth/hci_core.h
> > > @@ -978,6 +978,7 @@ enum {
> > > HCI_CONN_PER_ADV,
> > > HCI_CONN_BIG_CREATED,
> > > HCI_CONN_CREATE_CIS,
> > > + HCI_CONN_DELETED,
> > > };
> > >
> > > static inline bool hci_conn_ssp_enabled(struct hci_conn *conn)
> > > @@ -997,6 +998,7 @@ static inline bool hci_conn_sc_enabled(struct hci_conn *conn)
> > > static inline void hci_conn_hash_add(struct hci_dev *hdev, struct hci_conn *c)
> > > {
> > > struct hci_conn_hash *h = &hdev->conn_hash;
> > > + WARN_ON(test_bit(HCI_CONN_DELETED, &c->flags));
> > > list_add_tail_rcu(&c->list, &h->list);
> > > switch (c->type) {
> > > case ACL_LINK:
> > > @@ -1023,6 +1025,10 @@ static inline void hci_conn_hash_add(struct hci_dev *hdev, struct hci_conn *c)
> > > static inline void hci_conn_hash_del(struct hci_dev *hdev, struct hci_conn *c)
> > > {
> > > struct hci_conn_hash *h = &hdev->conn_hash;
> > > + bool deleted;
> > > +
> > > + deleted = test_and_set_bit(HCI_CONN_DELETED, &c->flags);
> > > + WARN_ON(deleted);
> > >
> > > list_del_rcu(&c->list);
> > > synchronize_rcu();
> > > @@ -1049,6 +1055,18 @@ static inline void hci_conn_hash_del(struct hci_dev *hdev, struct hci_conn *c)
> > > }
> > > }
> > >
> > > +/* With hdev->lock: whether hci_conn is in conn_hash.
> > > + * With RCU: if true, the hci_conn is valid conn_hash iteration cursor and
> > > + * hci_conn_hash_del has not completed. (Note that if hci_conn was obtained in
> > > + * this critical section it is always valid, but this may return false!)
> > > + */
> > > +static inline bool hci_conn_is_alive(struct hci_dev *hdev, struct hci_conn *c)
> > > +{
> > > + RCU_LOCKDEP_WARN(lockdep_is_held(&hdev->lock) || rcu_read_lock_held(),
> > > + "suspicious locking");
> > > + return !test_bit(HCI_CONN_DELETED, &c->flags);
> > > +}
> >
> > I think we are better off doing something like
> > hci_conn_hold_unless_zero like we do in l2cap_chan_hold_unless_zero,
> > that said we need to check if the hci_conn_drop can still set the ref
> > below zero, anyway that is probably a bug in itself and we should
> > probably WARN_ON if that happens.
>
> The problem here is that we'd like to have both
>
> (1) to have hci_conn_del/cleanup delete the item from conn_hash
> immediately
>
> (2) be able to continue iteration from the conn we held, after
> releasing and reacquiring RCU or hdev->lock
>
> If conn is removed from the list, conn->list.next won't be updated any
> more, so it is not safe to access after we have left the critical
> section. So it seems we'd need some marker on whether it is still in
> the list.
>
> Maybe (1) could be given up instead, something like: hci_conn_cleanup
> sets HCI_CONN_DELETED instead of deleting from the list if refcount is
> positive, and lookup functions skip items with this flag.
>
> Something along these lines could work, need to think a bit.

Ive end up reworking this logic to use something similar to what
mgmt.c was doing:

https://patchwork.kernel.org/project/bluetooth/patch/[email protected]/

That way we just cancel by handle and don't have to make reference
left and right, we just lookup by handle if the connection is still
there when the work is scheduled we abort otherwise we don't have to
do anything.

> > > static inline unsigned int hci_conn_num(struct hci_dev *hdev, __u8 type)
> > > {
> > > struct hci_conn_hash *h = &hdev->conn_hash;
> > > diff --git a/net/bluetooth/hci_conn.c b/net/bluetooth/hci_conn.c
> > > index 62a7ccfdfe63..d489a4829be7 100644
> > > --- a/net/bluetooth/hci_conn.c
> > > +++ b/net/bluetooth/hci_conn.c
> > > @@ -183,21 +183,13 @@ static void le_scan_cleanup(struct work_struct *work)
> > > struct hci_conn *conn = container_of(work, struct hci_conn,
> > > le_scan_cleanup);
> > > struct hci_dev *hdev = conn->hdev;
> > > - struct hci_conn *c = NULL;
> > >
> > > BT_DBG("%s hcon %p", hdev->name, conn);
> > >
> > > hci_dev_lock(hdev);
> > >
> > > /* Check that the hci_conn is still around */
> > > - rcu_read_lock();
> > > - list_for_each_entry_rcu(c, &hdev->conn_hash.list, list) {
> > > - if (c == conn)
> > > - break;
> > > - }
> > > - rcu_read_unlock();
> > > -
> > > - if (c == conn) {
> > > + if (hci_conn_is_alive(hdev, conn)) {
> >
> > Hmm, I don't think this is safe, except if we are doing hci_conn_get
> > we can't really access the conn pointer since it may be freed already,
> > anyway this is sort of broken already given that we do access
> > conn->hdev already.
>
> hci_conn_get is held here, there's a hci_conn_put at the end of this
> function.
>
> >
> > > hci_connect_le_scan_cleanup(conn, 0x00);
> > > hci_conn_cleanup(conn);
> > > }
> > > --
> > > 2.41.0
> > >
> >
> >
>


--
Luiz Augusto von Dentz

2023-06-28 16:32:40

by Pauli Virtanen

[permalink] [raw]
Subject: Re: [PATCH RFC 1/5] Bluetooth: hci_conn: add hci_conn_is_alive

Hi Luiz,

ti, 2023-06-27 kello 16:05 -0700, Luiz Augusto von Dentz kirjoitti:
> Hi Pauli,
>
> On Fri, Jun 23, 2023 at 3:21 PM Pauli Virtanen <[email protected]> wrote:
> >
> > Hi Luiz,
> >
> > pe, 2023-06-23 kello 12:39 -0700, Luiz Augusto von Dentz kirjoitti:
> > > On Fri, Jun 23, 2023 at 10:37 AM Pauli Virtanen <[email protected]> wrote:
> > > >
> > > > A delayed operation such as hci_sync on a given hci_conn needs to take
> > > > hci_conn_get, so that the hci_conn doesn't get freed in the meantime.
> > > > This does not guarantee the conn is still alive in a valid state, as it
> > > > may be cleaned up in the meantime, so one needs to check if it is still
> > > > in conn_hash to know if it's still alive.
> > > >
> > > > Simplify this alive check, using HCI_CONN_DELETED flag. This is also
> > > > meaningful with RCU lock only, but with slightly different semantics.
> > > >
> > > > If hci_conn_is_alive(conn) returns true inside rcu_read_lock, conn was
> > > > in conn_hash from the point of view of the current task when the flag
> > > > was read. Then its deletion cannot complete before rcu_read_unlock.
> > > >
> > > > Signed-off-by: Pauli Virtanen <[email protected]>
> > > > ---
> > > >
> > > > Notes:
> > > > This probably can be done with RCU primitives setting list.prev, but
> > > > that's maybe more magical...
> > > >
> > > > include/net/bluetooth/hci_core.h | 18 ++++++++++++++++++
> > > > net/bluetooth/hci_conn.c | 10 +---------
> > > > 2 files changed, 19 insertions(+), 9 deletions(-)
> > > >
> > > > diff --git a/include/net/bluetooth/hci_core.h b/include/net/bluetooth/hci_core.h
> > > > index 05a9b3ab3f56..cab39bdd0592 100644
> > > > --- a/include/net/bluetooth/hci_core.h
> > > > +++ b/include/net/bluetooth/hci_core.h
> > > > @@ -978,6 +978,7 @@ enum {
> > > > HCI_CONN_PER_ADV,
> > > > HCI_CONN_BIG_CREATED,
> > > > HCI_CONN_CREATE_CIS,
> > > > + HCI_CONN_DELETED,
> > > > };
> > > >
> > > > static inline bool hci_conn_ssp_enabled(struct hci_conn *conn)
> > > > @@ -997,6 +998,7 @@ static inline bool hci_conn_sc_enabled(struct hci_conn *conn)
> > > > static inline void hci_conn_hash_add(struct hci_dev *hdev, struct hci_conn *c)
> > > > {
> > > > struct hci_conn_hash *h = &hdev->conn_hash;
> > > > + WARN_ON(test_bit(HCI_CONN_DELETED, &c->flags));
> > > > list_add_tail_rcu(&c->list, &h->list);
> > > > switch (c->type) {
> > > > case ACL_LINK:
> > > > @@ -1023,6 +1025,10 @@ static inline void hci_conn_hash_add(struct hci_dev *hdev, struct hci_conn *c)
> > > > static inline void hci_conn_hash_del(struct hci_dev *hdev, struct hci_conn *c)
> > > > {
> > > > struct hci_conn_hash *h = &hdev->conn_hash;
> > > > + bool deleted;
> > > > +
> > > > + deleted = test_and_set_bit(HCI_CONN_DELETED, &c->flags);
> > > > + WARN_ON(deleted);
> > > >
> > > > list_del_rcu(&c->list);
> > > > synchronize_rcu();
> > > > @@ -1049,6 +1055,18 @@ static inline void hci_conn_hash_del(struct hci_dev *hdev, struct hci_conn *c)
> > > > }
> > > > }
> > > >
> > > > +/* With hdev->lock: whether hci_conn is in conn_hash.
> > > > + * With RCU: if true, the hci_conn is valid conn_hash iteration cursor and
> > > > + * hci_conn_hash_del has not completed. (Note that if hci_conn was obtained in
> > > > + * this critical section it is always valid, but this may return false!)
> > > > + */
> > > > +static inline bool hci_conn_is_alive(struct hci_dev *hdev, struct hci_conn *c)
> > > > +{
> > > > + RCU_LOCKDEP_WARN(lockdep_is_held(&hdev->lock) || rcu_read_lock_held(),
> > > > + "suspicious locking");
> > > > + return !test_bit(HCI_CONN_DELETED, &c->flags);
> > > > +}
> > >
> > > I think we are better off doing something like
> > > hci_conn_hold_unless_zero like we do in l2cap_chan_hold_unless_zero,
> > > that said we need to check if the hci_conn_drop can still set the ref
> > > below zero, anyway that is probably a bug in itself and we should
> > > probably WARN_ON if that happens.
> >
> > The problem here is that we'd like to have both
> >
> > (1) to have hci_conn_del/cleanup delete the item from conn_hash
> > immediately
> >
> > (2) be able to continue iteration from the conn we held, after
> > releasing and reacquiring RCU or hdev->lock
> >
> > If conn is removed from the list, conn->list.next won't be updated any
> > more, so it is not safe to access after we have left the critical
> > section. So it seems we'd need some marker on whether it is still in
> > the list.
> >
> > Maybe (1) could be given up instead, something like: hci_conn_cleanup
> > sets HCI_CONN_DELETED instead of deleting from the list if refcount is
> > positive, and lookup functions skip items with this flag.
> >
> > Something along these lines could work, need to think a bit.
>
> Ive end up reworking this logic to use something similar to what
> mgmt.c was doing:
>
> https://patchwork.kernel.org/project/bluetooth/patch/[email protected]/
>
> That way we just cancel by handle and don't have to make reference
> left and right, we just lookup by handle if the connection is still
> there when the work is scheduled we abort otherwise we don't have to
> do anything.

Does this still rely on the conn not being freed concurrently, maybe to
be totally sure holding rcu_read_lock/hdev->lock or having refcount
would be needed around the lookup and *conn access?

Unless there is something else guaranteeing the call sites of
hci_conn_del cannot occur at the same time?

IIUC, hci_conn_del is called also from several other places that may
run concurrently if you don't lock, eg. in hci_event.c (seems to run in
different workqueue than hci_sync), and I guess controller could
trigger eg. HCI_Disconnection_Complete spontaneously.

I'm not sure if these can be serialized behind hci_sync, if a handle is
disconnected it probably needs to do some teardown immediately like
unregistering it from sysfs, so that the same handle value can be
reused.

This is also problem for using conn->refcnt for keeping it alive: it
seems we want to do partial cleanup anyway even if someone is holding
the conn, so it would in the end to boil down to same as hci_conn_get.
(Having hci_conn_get keep items in list would simplify iteration, but
other parts seem to become more complex so what was in this RFC is
maybe simplest in that direction.)

>
> > > > static inline unsigned int hci_conn_num(struct hci_dev *hdev, __u8 type)
> > > > {
> > > > struct hci_conn_hash *h = &hdev->conn_hash;
> > > > diff --git a/net/bluetooth/hci_conn.c b/net/bluetooth/hci_conn.c
> > > > index 62a7ccfdfe63..d489a4829be7 100644
> > > > --- a/net/bluetooth/hci_conn.c
> > > > +++ b/net/bluetooth/hci_conn.c
> > > > @@ -183,21 +183,13 @@ static void le_scan_cleanup(struct work_struct *work)
> > > > struct hci_conn *conn = container_of(work, struct hci_conn,
> > > > le_scan_cleanup);
> > > > struct hci_dev *hdev = conn->hdev;
> > > > - struct hci_conn *c = NULL;
> > > >
> > > > BT_DBG("%s hcon %p", hdev->name, conn);
> > > >
> > > > hci_dev_lock(hdev);
> > > >
> > > > /* Check that the hci_conn is still around */
> > > > - rcu_read_lock();
> > > > - list_for_each_entry_rcu(c, &hdev->conn_hash.list, list) {
> > > > - if (c == conn)
> > > > - break;
> > > > - }
> > > > - rcu_read_unlock();
> > > > -
> > > > - if (c == conn) {
> > > > + if (hci_conn_is_alive(hdev, conn)) {
> > >
> > > Hmm, I don't think this is safe, except if we are doing hci_conn_get
> > > we can't really access the conn pointer since it may be freed already,
> > > anyway this is sort of broken already given that we do access
> > > conn->hdev already.
> >
> > hci_conn_get is held here, there's a hci_conn_put at the end of this
> > function.
> >
> > >
> > > > hci_connect_le_scan_cleanup(conn, 0x00);
> > > > hci_conn_cleanup(conn);
> > > > }
> > > > --
> > > > 2.41.0
> > > >
> > >
> > >
> >
>
>


2023-06-28 19:32:51

by Luiz Augusto von Dentz

[permalink] [raw]
Subject: Re: [PATCH RFC 1/5] Bluetooth: hci_conn: add hci_conn_is_alive

Hi Pauli,

On Wed, Jun 28, 2023 at 9:24 AM Pauli Virtanen <[email protected]> wrote:
>
> Hi Luiz,
>
> ti, 2023-06-27 kello 16:05 -0700, Luiz Augusto von Dentz kirjoitti:
> > Hi Pauli,
> >
> > On Fri, Jun 23, 2023 at 3:21 PM Pauli Virtanen <[email protected]> wrote:
> > >
> > > Hi Luiz,
> > >
> > > pe, 2023-06-23 kello 12:39 -0700, Luiz Augusto von Dentz kirjoitti:
> > > > On Fri, Jun 23, 2023 at 10:37 AM Pauli Virtanen <[email protected]> wrote:
> > > > >
> > > > > A delayed operation such as hci_sync on a given hci_conn needs to take
> > > > > hci_conn_get, so that the hci_conn doesn't get freed in the meantime.
> > > > > This does not guarantee the conn is still alive in a valid state, as it
> > > > > may be cleaned up in the meantime, so one needs to check if it is still
> > > > > in conn_hash to know if it's still alive.
> > > > >
> > > > > Simplify this alive check, using HCI_CONN_DELETED flag. This is also
> > > > > meaningful with RCU lock only, but with slightly different semantics.
> > > > >
> > > > > If hci_conn_is_alive(conn) returns true inside rcu_read_lock, conn was
> > > > > in conn_hash from the point of view of the current task when the flag
> > > > > was read. Then its deletion cannot complete before rcu_read_unlock.
> > > > >
> > > > > Signed-off-by: Pauli Virtanen <[email protected]>
> > > > > ---
> > > > >
> > > > > Notes:
> > > > > This probably can be done with RCU primitives setting list.prev, but
> > > > > that's maybe more magical...
> > > > >
> > > > > include/net/bluetooth/hci_core.h | 18 ++++++++++++++++++
> > > > > net/bluetooth/hci_conn.c | 10 +---------
> > > > > 2 files changed, 19 insertions(+), 9 deletions(-)
> > > > >
> > > > > diff --git a/include/net/bluetooth/hci_core.h b/include/net/bluetooth/hci_core.h
> > > > > index 05a9b3ab3f56..cab39bdd0592 100644
> > > > > --- a/include/net/bluetooth/hci_core.h
> > > > > +++ b/include/net/bluetooth/hci_core.h
> > > > > @@ -978,6 +978,7 @@ enum {
> > > > > HCI_CONN_PER_ADV,
> > > > > HCI_CONN_BIG_CREATED,
> > > > > HCI_CONN_CREATE_CIS,
> > > > > + HCI_CONN_DELETED,
> > > > > };
> > > > >
> > > > > static inline bool hci_conn_ssp_enabled(struct hci_conn *conn)
> > > > > @@ -997,6 +998,7 @@ static inline bool hci_conn_sc_enabled(struct hci_conn *conn)
> > > > > static inline void hci_conn_hash_add(struct hci_dev *hdev, struct hci_conn *c)
> > > > > {
> > > > > struct hci_conn_hash *h = &hdev->conn_hash;
> > > > > + WARN_ON(test_bit(HCI_CONN_DELETED, &c->flags));
> > > > > list_add_tail_rcu(&c->list, &h->list);
> > > > > switch (c->type) {
> > > > > case ACL_LINK:
> > > > > @@ -1023,6 +1025,10 @@ static inline void hci_conn_hash_add(struct hci_dev *hdev, struct hci_conn *c)
> > > > > static inline void hci_conn_hash_del(struct hci_dev *hdev, struct hci_conn *c)
> > > > > {
> > > > > struct hci_conn_hash *h = &hdev->conn_hash;
> > > > > + bool deleted;
> > > > > +
> > > > > + deleted = test_and_set_bit(HCI_CONN_DELETED, &c->flags);
> > > > > + WARN_ON(deleted);
> > > > >
> > > > > list_del_rcu(&c->list);
> > > > > synchronize_rcu();
> > > > > @@ -1049,6 +1055,18 @@ static inline void hci_conn_hash_del(struct hci_dev *hdev, struct hci_conn *c)
> > > > > }
> > > > > }
> > > > >
> > > > > +/* With hdev->lock: whether hci_conn is in conn_hash.
> > > > > + * With RCU: if true, the hci_conn is valid conn_hash iteration cursor and
> > > > > + * hci_conn_hash_del has not completed. (Note that if hci_conn was obtained in
> > > > > + * this critical section it is always valid, but this may return false!)
> > > > > + */
> > > > > +static inline bool hci_conn_is_alive(struct hci_dev *hdev, struct hci_conn *c)
> > > > > +{
> > > > > + RCU_LOCKDEP_WARN(lockdep_is_held(&hdev->lock) || rcu_read_lock_held(),
> > > > > + "suspicious locking");
> > > > > + return !test_bit(HCI_CONN_DELETED, &c->flags);
> > > > > +}
> > > >
> > > > I think we are better off doing something like
> > > > hci_conn_hold_unless_zero like we do in l2cap_chan_hold_unless_zero,
> > > > that said we need to check if the hci_conn_drop can still set the ref
> > > > below zero, anyway that is probably a bug in itself and we should
> > > > probably WARN_ON if that happens.
> > >
> > > The problem here is that we'd like to have both
> > >
> > > (1) to have hci_conn_del/cleanup delete the item from conn_hash
> > > immediately
> > >
> > > (2) be able to continue iteration from the conn we held, after
> > > releasing and reacquiring RCU or hdev->lock
> > >
> > > If conn is removed from the list, conn->list.next won't be updated any
> > > more, so it is not safe to access after we have left the critical
> > > section. So it seems we'd need some marker on whether it is still in
> > > the list.
> > >
> > > Maybe (1) could be given up instead, something like: hci_conn_cleanup
> > > sets HCI_CONN_DELETED instead of deleting from the list if refcount is
> > > positive, and lookup functions skip items with this flag.
> > >
> > > Something along these lines could work, need to think a bit.
> >
> > Ive end up reworking this logic to use something similar to what
> > mgmt.c was doing:
> >
> > https://patchwork.kernel.org/project/bluetooth/patch/[email protected]/
> >
> > That way we just cancel by handle and don't have to make reference
> > left and right, we just lookup by handle if the connection is still
> > there when the work is scheduled we abort otherwise we don't have to
> > do anything.
>
> Does this still rely on the conn not being freed concurrently, maybe to
> be totally sure holding rcu_read_lock/hdev->lock or having refcount
> would be needed around the lookup and *conn access?
>
> Unless there is something else guaranteeing the call sites of
> hci_conn_del cannot occur at the same time?
>
> IIUC, hci_conn_del is called also from several other places that may
> run concurrently if you don't lock, eg. in hci_event.c (seems to run in
> different workqueue than hci_sync), and I guess controller could
> trigger eg. HCI_Disconnection_Complete spontaneously.
>
> I'm not sure if these can be serialized behind hci_sync, if a handle is
> disconnected it probably needs to do some teardown immediately like
> unregistering it from sysfs, so that the same handle value can be
> reused.
>
> This is also problem for using conn->refcnt for keeping it alive: it
> seems we want to do partial cleanup anyway even if someone is holding
> the conn, so it would in the end to boil down to same as hci_conn_get.
> (Having hci_conn_get keep items in list would simplify iteration, but
> other parts seem to become more complex so what was in this RFC is
> maybe simplest in that direction.)

The idea here is that lookup by handle shall always be safe since if
hci_conn_del has already been called the conn shall no longer be in
the hash, now perhaps what you are afraid is that hci_conn_del is not
safe to be called concurrently, which is probably correct, so perhaps
we need a flag or something to check that HCI_CONN_DEL is in progress.

> >
> > > > > static inline unsigned int hci_conn_num(struct hci_dev *hdev, __u8 type)
> > > > > {
> > > > > struct hci_conn_hash *h = &hdev->conn_hash;
> > > > > diff --git a/net/bluetooth/hci_conn.c b/net/bluetooth/hci_conn.c
> > > > > index 62a7ccfdfe63..d489a4829be7 100644
> > > > > --- a/net/bluetooth/hci_conn.c
> > > > > +++ b/net/bluetooth/hci_conn.c
> > > > > @@ -183,21 +183,13 @@ static void le_scan_cleanup(struct work_struct *work)
> > > > > struct hci_conn *conn = container_of(work, struct hci_conn,
> > > > > le_scan_cleanup);
> > > > > struct hci_dev *hdev = conn->hdev;
> > > > > - struct hci_conn *c = NULL;
> > > > >
> > > > > BT_DBG("%s hcon %p", hdev->name, conn);
> > > > >
> > > > > hci_dev_lock(hdev);
> > > > >
> > > > > /* Check that the hci_conn is still around */
> > > > > - rcu_read_lock();
> > > > > - list_for_each_entry_rcu(c, &hdev->conn_hash.list, list) {
> > > > > - if (c == conn)
> > > > > - break;
> > > > > - }
> > > > > - rcu_read_unlock();
> > > > > -
> > > > > - if (c == conn) {
> > > > > + if (hci_conn_is_alive(hdev, conn)) {
> > > >
> > > > Hmm, I don't think this is safe, except if we are doing hci_conn_get
> > > > we can't really access the conn pointer since it may be freed already,
> > > > anyway this is sort of broken already given that we do access
> > > > conn->hdev already.
> > >
> > > hci_conn_get is held here, there's a hci_conn_put at the end of this
> > > function.
> > >
> > > >
> > > > > hci_connect_le_scan_cleanup(conn, 0x00);
> > > > > hci_conn_cleanup(conn);
> > > > > }
> > > > > --
> > > > > 2.41.0
> > > > >
> > > >
> > > >
> > >
> >
> >
>


--
Luiz Augusto von Dentz

2023-06-28 20:11:00

by Pauli Virtanen

[permalink] [raw]
Subject: Re: [PATCH RFC 1/5] Bluetooth: hci_conn: add hci_conn_is_alive

ke, 2023-06-28 kello 12:29 -0700, Luiz Augusto von Dentz kirjoitti:
> Hi Pauli,
>
> On Wed, Jun 28, 2023 at 9:24 AM Pauli Virtanen <[email protected]> wrote:
> >
> > Hi Luiz,
> >
> > ti, 2023-06-27 kello 16:05 -0700, Luiz Augusto von Dentz kirjoitti:
> > > Hi Pauli,
> > >
> > > On Fri, Jun 23, 2023 at 3:21 PM Pauli Virtanen <[email protected]> wrote:
> > > >
> > > > Hi Luiz,
> > > >
> > > > pe, 2023-06-23 kello 12:39 -0700, Luiz Augusto von Dentz kirjoitti:
> > > > > On Fri, Jun 23, 2023 at 10:37 AM Pauli Virtanen <[email protected]> wrote:
> > > > > >
> > > > > > A delayed operation such as hci_sync on a given hci_conn needs to take
> > > > > > hci_conn_get, so that the hci_conn doesn't get freed in the meantime.
> > > > > > This does not guarantee the conn is still alive in a valid state, as it
> > > > > > may be cleaned up in the meantime, so one needs to check if it is still
> > > > > > in conn_hash to know if it's still alive.
> > > > > >
> > > > > > Simplify this alive check, using HCI_CONN_DELETED flag. This is also
> > > > > > meaningful with RCU lock only, but with slightly different semantics.
> > > > > >
> > > > > > If hci_conn_is_alive(conn) returns true inside rcu_read_lock, conn was
> > > > > > in conn_hash from the point of view of the current task when the flag
> > > > > > was read. Then its deletion cannot complete before rcu_read_unlock.
> > > > > >
> > > > > > Signed-off-by: Pauli Virtanen <[email protected]>
> > > > > > ---
> > > > > >
> > > > > > Notes:
> > > > > > This probably can be done with RCU primitives setting list.prev, but
> > > > > > that's maybe more magical...
> > > > > >
> > > > > > include/net/bluetooth/hci_core.h | 18 ++++++++++++++++++
> > > > > > net/bluetooth/hci_conn.c | 10 +---------
> > > > > > 2 files changed, 19 insertions(+), 9 deletions(-)
> > > > > >
> > > > > > diff --git a/include/net/bluetooth/hci_core.h b/include/net/bluetooth/hci_core.h
> > > > > > index 05a9b3ab3f56..cab39bdd0592 100644
> > > > > > --- a/include/net/bluetooth/hci_core.h
> > > > > > +++ b/include/net/bluetooth/hci_core.h
> > > > > > @@ -978,6 +978,7 @@ enum {
> > > > > > HCI_CONN_PER_ADV,
> > > > > > HCI_CONN_BIG_CREATED,
> > > > > > HCI_CONN_CREATE_CIS,
> > > > > > + HCI_CONN_DELETED,
> > > > > > };
> > > > > >
> > > > > > static inline bool hci_conn_ssp_enabled(struct hci_conn *conn)
> > > > > > @@ -997,6 +998,7 @@ static inline bool hci_conn_sc_enabled(struct hci_conn *conn)
> > > > > > static inline void hci_conn_hash_add(struct hci_dev *hdev, struct hci_conn *c)
> > > > > > {
> > > > > > struct hci_conn_hash *h = &hdev->conn_hash;
> > > > > > + WARN_ON(test_bit(HCI_CONN_DELETED, &c->flags));
> > > > > > list_add_tail_rcu(&c->list, &h->list);
> > > > > > switch (c->type) {
> > > > > > case ACL_LINK:
> > > > > > @@ -1023,6 +1025,10 @@ static inline void hci_conn_hash_add(struct hci_dev *hdev, struct hci_conn *c)
> > > > > > static inline void hci_conn_hash_del(struct hci_dev *hdev, struct hci_conn *c)
> > > > > > {
> > > > > > struct hci_conn_hash *h = &hdev->conn_hash;
> > > > > > + bool deleted;
> > > > > > +
> > > > > > + deleted = test_and_set_bit(HCI_CONN_DELETED, &c->flags);
> > > > > > + WARN_ON(deleted);
> > > > > >
> > > > > > list_del_rcu(&c->list);
> > > > > > synchronize_rcu();
> > > > > > @@ -1049,6 +1055,18 @@ static inline void hci_conn_hash_del(struct hci_dev *hdev, struct hci_conn *c)
> > > > > > }
> > > > > > }
> > > > > >
> > > > > > +/* With hdev->lock: whether hci_conn is in conn_hash.
> > > > > > + * With RCU: if true, the hci_conn is valid conn_hash iteration cursor and
> > > > > > + * hci_conn_hash_del has not completed. (Note that if hci_conn was obtained in
> > > > > > + * this critical section it is always valid, but this may return false!)
> > > > > > + */
> > > > > > +static inline bool hci_conn_is_alive(struct hci_dev *hdev, struct hci_conn *c)
> > > > > > +{
> > > > > > + RCU_LOCKDEP_WARN(lockdep_is_held(&hdev->lock) || rcu_read_lock_held(),
> > > > > > + "suspicious locking");
> > > > > > + return !test_bit(HCI_CONN_DELETED, &c->flags);
> > > > > > +}
> > > > >
> > > > > I think we are better off doing something like
> > > > > hci_conn_hold_unless_zero like we do in l2cap_chan_hold_unless_zero,
> > > > > that said we need to check if the hci_conn_drop can still set the ref
> > > > > below zero, anyway that is probably a bug in itself and we should
> > > > > probably WARN_ON if that happens.
> > > >
> > > > The problem here is that we'd like to have both
> > > >
> > > > (1) to have hci_conn_del/cleanup delete the item from conn_hash
> > > > immediately
> > > >
> > > > (2) be able to continue iteration from the conn we held, after
> > > > releasing and reacquiring RCU or hdev->lock
> > > >
> > > > If conn is removed from the list, conn->list.next won't be updated any
> > > > more, so it is not safe to access after we have left the critical
> > > > section. So it seems we'd need some marker on whether it is still in
> > > > the list.
> > > >
> > > > Maybe (1) could be given up instead, something like: hci_conn_cleanup
> > > > sets HCI_CONN_DELETED instead of deleting from the list if refcount is
> > > > positive, and lookup functions skip items with this flag.
> > > >
> > > > Something along these lines could work, need to think a bit.
> > >
> > > Ive end up reworking this logic to use something similar to what
> > > mgmt.c was doing:
> > >
> > > https://patchwork.kernel.org/project/bluetooth/patch/[email protected]/
> > >
> > > That way we just cancel by handle and don't have to make reference
> > > left and right, we just lookup by handle if the connection is still
> > > there when the work is scheduled we abort otherwise we don't have to
> > > do anything.
> >
> > Does this still rely on the conn not being freed concurrently, maybe to
> > be totally sure holding rcu_read_lock/hdev->lock or having refcount
> > would be needed around the lookup and *conn access?
> >
> > Unless there is something else guaranteeing the call sites of
> > hci_conn_del cannot occur at the same time?
> >
> > IIUC, hci_conn_del is called also from several other places that may
> > run concurrently if you don't lock, eg. in hci_event.c (seems to run in
> > different workqueue than hci_sync), and I guess controller could
> > trigger eg. HCI_Disconnection_Complete spontaneously.
> >
> > I'm not sure if these can be serialized behind hci_sync, if a handle is
> > disconnected it probably needs to do some teardown immediately like
> > unregistering it from sysfs, so that the same handle value can be
> > reused.
> >
> > This is also problem for using conn->refcnt for keeping it alive: it
> > seems we want to do partial cleanup anyway even if someone is holding
> > the conn, so it would in the end to boil down to same as hci_conn_get.
> > (Having hci_conn_get keep items in list would simplify iteration, but
> > other parts seem to become more complex so what was in this RFC is
> > maybe simplest in that direction.)
>
> The idea here is that lookup by handle shall always be safe since if
> hci_conn_del has already been called the conn shall no longer be in
> the hash, now perhaps what you are afraid is that hci_conn_del is not
> safe to be called concurrently, which is probably correct, so perhaps
> we need a flag or something to check that HCI_CONN_DEL is in progress.

For clarification, I mean here that constructs like

conn = hci_conn_hash_lookup_handle(hdev, handle);
if (!conn)
return 0;
do_something(conn);

might not be safe if e.g. hci_event.c is doing hci_conn_del on another
CPU at the same time (it appears to be in hdev->workqueue, cmd_sync in
req_workqueue so I'm not sure they are serialized), and frees conn
after hci_conn_hash_lookup_handle exits the critical section,
before/during do_something runs. Something like

rcu_read_lock();
conn = hci_conn_hash_lookup_handle(hdev, handle);
if (conn)
handle = conn->handle;
rcu_read_unlock();
if (!conn)
return 0;
do_something(handle)

could avoid that, or holding hdev->lock (IIUC in hci_sync callbacks it
is not automatically held).

Or, maybe there is some other guarantee that can be assumed here, and
I'm missing something about how the locking/scheduling works here
(don't know this well enough).

The other comment was about if

hci_cmd_sync_cancel(hdev, -ECANCELED);

could cancel the wrong command. Note conn->state is set to BT_CONNECT
in hci_connect_le, before hci_connect_le_sync sends its command, so if
there was already another command in the hci_sync queue belonging to a
different connection, it seems hci_cmd_sync_cancel might here cancel
the wrong one.

>
> > >
> > > > > > static inline unsigned int hci_conn_num(struct hci_dev *hdev, __u8 type)
> > > > > > {
> > > > > > struct hci_conn_hash *h = &hdev->conn_hash;
> > > > > > diff --git a/net/bluetooth/hci_conn.c b/net/bluetooth/hci_conn.c
> > > > > > index 62a7ccfdfe63..d489a4829be7 100644
> > > > > > --- a/net/bluetooth/hci_conn.c
> > > > > > +++ b/net/bluetooth/hci_conn.c
> > > > > > @@ -183,21 +183,13 @@ static void le_scan_cleanup(struct work_struct *work)
> > > > > > struct hci_conn *conn = container_of(work, struct hci_conn,
> > > > > > le_scan_cleanup);
> > > > > > struct hci_dev *hdev = conn->hdev;
> > > > > > - struct hci_conn *c = NULL;
> > > > > >
> > > > > > BT_DBG("%s hcon %p", hdev->name, conn);
> > > > > >
> > > > > > hci_dev_lock(hdev);
> > > > > >
> > > > > > /* Check that the hci_conn is still around */
> > > > > > - rcu_read_lock();
> > > > > > - list_for_each_entry_rcu(c, &hdev->conn_hash.list, list) {
> > > > > > - if (c == conn)
> > > > > > - break;
> > > > > > - }
> > > > > > - rcu_read_unlock();
> > > > > > -
> > > > > > - if (c == conn) {
> > > > > > + if (hci_conn_is_alive(hdev, conn)) {
> > > > >
> > > > > Hmm, I don't think this is safe, except if we are doing hci_conn_get
> > > > > we can't really access the conn pointer since it may be freed already,
> > > > > anyway this is sort of broken already given that we do access
> > > > > conn->hdev already.
> > > >
> > > > hci_conn_get is held here, there's a hci_conn_put at the end of this
> > > > function.
> > > >
> > > > >
> > > > > > hci_connect_le_scan_cleanup(conn, 0x00);
> > > > > > hci_conn_cleanup(conn);
> > > > > > }
> > > > > > --
> > > > > > 2.41.0
> > > > > >
> > > > >
> > > > >
> > > >
> > >
> > >
> >
>
>


2023-06-28 21:24:09

by Luiz Augusto von Dentz

[permalink] [raw]
Subject: Re: [PATCH RFC 1/5] Bluetooth: hci_conn: add hci_conn_is_alive

Hi Pauli,

On Wed, Jun 28, 2023 at 1:11 PM Pauli Virtanen <[email protected]> wrote:
>
> ke, 2023-06-28 kello 12:29 -0700, Luiz Augusto von Dentz kirjoitti:
> > Hi Pauli,
> >
> > On Wed, Jun 28, 2023 at 9:24 AM Pauli Virtanen <[email protected]> wrote:
> > >
> > > Hi Luiz,
> > >
> > > ti, 2023-06-27 kello 16:05 -0700, Luiz Augusto von Dentz kirjoitti:
> > > > Hi Pauli,
> > > >
> > > > On Fri, Jun 23, 2023 at 3:21 PM Pauli Virtanen <[email protected]> wrote:
> > > > >
> > > > > Hi Luiz,
> > > > >
> > > > > pe, 2023-06-23 kello 12:39 -0700, Luiz Augusto von Dentz kirjoitti:
> > > > > > On Fri, Jun 23, 2023 at 10:37 AM Pauli Virtanen <[email protected]> wrote:
> > > > > > >
> > > > > > > A delayed operation such as hci_sync on a given hci_conn needs to take
> > > > > > > hci_conn_get, so that the hci_conn doesn't get freed in the meantime.
> > > > > > > This does not guarantee the conn is still alive in a valid state, as it
> > > > > > > may be cleaned up in the meantime, so one needs to check if it is still
> > > > > > > in conn_hash to know if it's still alive.
> > > > > > >
> > > > > > > Simplify this alive check, using HCI_CONN_DELETED flag. This is also
> > > > > > > meaningful with RCU lock only, but with slightly different semantics.
> > > > > > >
> > > > > > > If hci_conn_is_alive(conn) returns true inside rcu_read_lock, conn was
> > > > > > > in conn_hash from the point of view of the current task when the flag
> > > > > > > was read. Then its deletion cannot complete before rcu_read_unlock.
> > > > > > >
> > > > > > > Signed-off-by: Pauli Virtanen <[email protected]>
> > > > > > > ---
> > > > > > >
> > > > > > > Notes:
> > > > > > > This probably can be done with RCU primitives setting list.prev, but
> > > > > > > that's maybe more magical...
> > > > > > >
> > > > > > > include/net/bluetooth/hci_core.h | 18 ++++++++++++++++++
> > > > > > > net/bluetooth/hci_conn.c | 10 +---------
> > > > > > > 2 files changed, 19 insertions(+), 9 deletions(-)
> > > > > > >
> > > > > > > diff --git a/include/net/bluetooth/hci_core.h b/include/net/bluetooth/hci_core.h
> > > > > > > index 05a9b3ab3f56..cab39bdd0592 100644
> > > > > > > --- a/include/net/bluetooth/hci_core.h
> > > > > > > +++ b/include/net/bluetooth/hci_core.h
> > > > > > > @@ -978,6 +978,7 @@ enum {
> > > > > > > HCI_CONN_PER_ADV,
> > > > > > > HCI_CONN_BIG_CREATED,
> > > > > > > HCI_CONN_CREATE_CIS,
> > > > > > > + HCI_CONN_DELETED,
> > > > > > > };
> > > > > > >
> > > > > > > static inline bool hci_conn_ssp_enabled(struct hci_conn *conn)
> > > > > > > @@ -997,6 +998,7 @@ static inline bool hci_conn_sc_enabled(struct hci_conn *conn)
> > > > > > > static inline void hci_conn_hash_add(struct hci_dev *hdev, struct hci_conn *c)
> > > > > > > {
> > > > > > > struct hci_conn_hash *h = &hdev->conn_hash;
> > > > > > > + WARN_ON(test_bit(HCI_CONN_DELETED, &c->flags));
> > > > > > > list_add_tail_rcu(&c->list, &h->list);
> > > > > > > switch (c->type) {
> > > > > > > case ACL_LINK:
> > > > > > > @@ -1023,6 +1025,10 @@ static inline void hci_conn_hash_add(struct hci_dev *hdev, struct hci_conn *c)
> > > > > > > static inline void hci_conn_hash_del(struct hci_dev *hdev, struct hci_conn *c)
> > > > > > > {
> > > > > > > struct hci_conn_hash *h = &hdev->conn_hash;
> > > > > > > + bool deleted;
> > > > > > > +
> > > > > > > + deleted = test_and_set_bit(HCI_CONN_DELETED, &c->flags);
> > > > > > > + WARN_ON(deleted);
> > > > > > >
> > > > > > > list_del_rcu(&c->list);
> > > > > > > synchronize_rcu();
> > > > > > > @@ -1049,6 +1055,18 @@ static inline void hci_conn_hash_del(struct hci_dev *hdev, struct hci_conn *c)
> > > > > > > }
> > > > > > > }
> > > > > > >
> > > > > > > +/* With hdev->lock: whether hci_conn is in conn_hash.
> > > > > > > + * With RCU: if true, the hci_conn is valid conn_hash iteration cursor and
> > > > > > > + * hci_conn_hash_del has not completed. (Note that if hci_conn was obtained in
> > > > > > > + * this critical section it is always valid, but this may return false!)
> > > > > > > + */
> > > > > > > +static inline bool hci_conn_is_alive(struct hci_dev *hdev, struct hci_conn *c)
> > > > > > > +{
> > > > > > > + RCU_LOCKDEP_WARN(lockdep_is_held(&hdev->lock) || rcu_read_lock_held(),
> > > > > > > + "suspicious locking");
> > > > > > > + return !test_bit(HCI_CONN_DELETED, &c->flags);
> > > > > > > +}
> > > > > >
> > > > > > I think we are better off doing something like
> > > > > > hci_conn_hold_unless_zero like we do in l2cap_chan_hold_unless_zero,
> > > > > > that said we need to check if the hci_conn_drop can still set the ref
> > > > > > below zero, anyway that is probably a bug in itself and we should
> > > > > > probably WARN_ON if that happens.
> > > > >
> > > > > The problem here is that we'd like to have both
> > > > >
> > > > > (1) to have hci_conn_del/cleanup delete the item from conn_hash
> > > > > immediately
> > > > >
> > > > > (2) be able to continue iteration from the conn we held, after
> > > > > releasing and reacquiring RCU or hdev->lock
> > > > >
> > > > > If conn is removed from the list, conn->list.next won't be updated any
> > > > > more, so it is not safe to access after we have left the critical
> > > > > section. So it seems we'd need some marker on whether it is still in
> > > > > the list.
> > > > >
> > > > > Maybe (1) could be given up instead, something like: hci_conn_cleanup
> > > > > sets HCI_CONN_DELETED instead of deleting from the list if refcount is
> > > > > positive, and lookup functions skip items with this flag.
> > > > >
> > > > > Something along these lines could work, need to think a bit.
> > > >
> > > > Ive end up reworking this logic to use something similar to what
> > > > mgmt.c was doing:
> > > >
> > > > https://patchwork.kernel.org/project/bluetooth/patch/[email protected]/
> > > >
> > > > That way we just cancel by handle and don't have to make reference
> > > > left and right, we just lookup by handle if the connection is still
> > > > there when the work is scheduled we abort otherwise we don't have to
> > > > do anything.
> > >
> > > Does this still rely on the conn not being freed concurrently, maybe to
> > > be totally sure holding rcu_read_lock/hdev->lock or having refcount
> > > would be needed around the lookup and *conn access?
> > >
> > > Unless there is something else guaranteeing the call sites of
> > > hci_conn_del cannot occur at the same time?
> > >
> > > IIUC, hci_conn_del is called also from several other places that may
> > > run concurrently if you don't lock, eg. in hci_event.c (seems to run in
> > > different workqueue than hci_sync), and I guess controller could
> > > trigger eg. HCI_Disconnection_Complete spontaneously.
> > >
> > > I'm not sure if these can be serialized behind hci_sync, if a handle is
> > > disconnected it probably needs to do some teardown immediately like
> > > unregistering it from sysfs, so that the same handle value can be
> > > reused.
> > >
> > > This is also problem for using conn->refcnt for keeping it alive: it
> > > seems we want to do partial cleanup anyway even if someone is holding
> > > the conn, so it would in the end to boil down to same as hci_conn_get.
> > > (Having hci_conn_get keep items in list would simplify iteration, but
> > > other parts seem to become more complex so what was in this RFC is
> > > maybe simplest in that direction.)
> >
> > The idea here is that lookup by handle shall always be safe since if
> > hci_conn_del has already been called the conn shall no longer be in
> > the hash, now perhaps what you are afraid is that hci_conn_del is not
> > safe to be called concurrently, which is probably correct, so perhaps
> > we need a flag or something to check that HCI_CONN_DEL is in progress.
>
> For clarification, I mean here that constructs like
>
> conn = hci_conn_hash_lookup_handle(hdev, handle);
> if (!conn)
> return 0;
> do_something(conn);
>
> might not be safe if e.g. hci_event.c is doing hci_conn_del on another
> CPU at the same time (it appears to be in hdev->workqueue, cmd_sync in
> req_workqueue so I'm not sure they are serialized), and frees conn
> after hci_conn_hash_lookup_handle exits the critical section,
> before/during do_something runs. Something like
>
> rcu_read_lock();
> conn = hci_conn_hash_lookup_handle(hdev, handle);
> if (conn)
> handle = conn->handle;
> rcu_read_unlock();
> if (!conn)
> return 0;
> do_something(handle)

Perhaps we need some tests that cover these scenarios, for the most
part all commands shall be serialized, so if the cmd_sync is doing
hci_conn_del while the receive/event work is doing it as well then we
have a bug in the code path. Btw, hci_conn_hash_lookup_handle does
rcu_read_lock already.

> could avoid that, or holding hdev->lock (IIUC in hci_sync callbacks it
> is not automatically held).
>
> Or, maybe there is some other guarantee that can be assumed here, and
> I'm missing something about how the locking/scheduling works here
> (don't know this well enough).
>
> The other comment was about if
>
> hci_cmd_sync_cancel(hdev, -ECANCELED);
>
> could cancel the wrong command. Note conn->state is set to BT_CONNECT
> in hci_connect_le, before hci_connect_le_sync sends its command, so if
> there was already another command in the hci_sync queue belonging to a
> different connection, it seems hci_cmd_sync_cancel might here cancel
> the wrong one.

Right, Im planning to introduce the following changes:

https://gist.github.com/Vudentz/1dd1525980c200bf9617ae002e0eb1ea

> >
> > > >
> > > > > > > static inline unsigned int hci_conn_num(struct hci_dev *hdev, __u8 type)
> > > > > > > {
> > > > > > > struct hci_conn_hash *h = &hdev->conn_hash;
> > > > > > > diff --git a/net/bluetooth/hci_conn.c b/net/bluetooth/hci_conn.c
> > > > > > > index 62a7ccfdfe63..d489a4829be7 100644
> > > > > > > --- a/net/bluetooth/hci_conn.c
> > > > > > > +++ b/net/bluetooth/hci_conn.c
> > > > > > > @@ -183,21 +183,13 @@ static void le_scan_cleanup(struct work_struct *work)
> > > > > > > struct hci_conn *conn = container_of(work, struct hci_conn,
> > > > > > > le_scan_cleanup);
> > > > > > > struct hci_dev *hdev = conn->hdev;
> > > > > > > - struct hci_conn *c = NULL;
> > > > > > >
> > > > > > > BT_DBG("%s hcon %p", hdev->name, conn);
> > > > > > >
> > > > > > > hci_dev_lock(hdev);
> > > > > > >
> > > > > > > /* Check that the hci_conn is still around */
> > > > > > > - rcu_read_lock();
> > > > > > > - list_for_each_entry_rcu(c, &hdev->conn_hash.list, list) {
> > > > > > > - if (c == conn)
> > > > > > > - break;
> > > > > > > - }
> > > > > > > - rcu_read_unlock();
> > > > > > > -
> > > > > > > - if (c == conn) {
> > > > > > > + if (hci_conn_is_alive(hdev, conn)) {
> > > > > >
> > > > > > Hmm, I don't think this is safe, except if we are doing hci_conn_get
> > > > > > we can't really access the conn pointer since it may be freed already,
> > > > > > anyway this is sort of broken already given that we do access
> > > > > > conn->hdev already.
> > > > >
> > > > > hci_conn_get is held here, there's a hci_conn_put at the end of this
> > > > > function.
> > > > >
> > > > > >
> > > > > > > hci_connect_le_scan_cleanup(conn, 0x00);
> > > > > > > hci_conn_cleanup(conn);
> > > > > > > }
> > > > > > > --
> > > > > > > 2.41.0
> > > > > > >
> > > > > >
> > > > > >
> > > > >
> > > >
> > > >
> > >
> >
> >
>


--
Luiz Augusto von Dentz