In hci_abort_conn_sync it is possible that conn is deleted concurrently
by something else, also e.g. when waiting for hdev->lock. This causes
double deletion of the conn, so UAF or conn_hash.list corruption.
Fix by having all code paths check that the connection is still in
conn_hash before deleting it, while holding hdev->lock which prevents
any races.
Log (when powering off while BAP streaming, occurs rarely):
=======================================================================
kernel BUG at lib/list_debug.c:56!
...
? __list_del_entry_valid (lib/list_debug.c:56)
hci_conn_del (net/bluetooth/hci_conn.c:154) bluetooth
hci_abort_conn_sync (net/bluetooth/hci_sync.c:5415) bluetooth
? __pfx_hci_abort_conn_sync+0x10/0x10 [bluetooth]
? lock_release+0x1d5/0x3c0
? hci_disconnect_all_sync.constprop.0+0xb2/0x230 [bluetooth]
? __pfx_lock_release+0x10/0x10
? __kmem_cache_free+0x14d/0x2e0
hci_disconnect_all_sync.constprop.0+0xda/0x230 [bluetooth]
? __pfx_hci_disconnect_all_sync.constprop.0+0x10/0x10 [bluetooth]
? hci_clear_adv_sync+0x14f/0x170 [bluetooth]
? __pfx_set_powered_sync+0x10/0x10 [bluetooth]
hci_set_powered_sync+0x293/0x450 [bluetooth]
=======================================================================
Fixes: 94d9ba9f9888 ("Bluetooth: hci_sync: Fix UAF in hci_disconnect_all_sync")
Signed-off-by: Pauli Virtanen <[email protected]>
---
Notes:
v2: rebased
net/bluetooth/hci_sync.c | 26 ++++++++++++--------------
1 file changed, 12 insertions(+), 14 deletions(-)
diff --git a/net/bluetooth/hci_sync.c b/net/bluetooth/hci_sync.c
index 3640d73f9595..c6f57af88bfd 100644
--- a/net/bluetooth/hci_sync.c
+++ b/net/bluetooth/hci_sync.c
@@ -5380,6 +5380,7 @@ int hci_abort_conn_sync(struct hci_dev *hdev, struct hci_conn *conn, u8 reason)
{
int err = 0;
u16 handle = conn->handle;
+ bool disconnect = false;
struct hci_conn *c;
switch (conn->state) {
@@ -5395,24 +5396,15 @@ int hci_abort_conn_sync(struct hci_dev *hdev, struct hci_conn *conn, u8 reason)
break;
case BT_OPEN:
case BT_BOUND:
- hci_dev_lock(hdev);
- hci_conn_failed(conn, reason);
- hci_dev_unlock(hdev);
- return 0;
+ break;
default:
- hci_dev_lock(hdev);
- conn->state = BT_CLOSED;
- hci_disconn_cfm(conn, reason);
- hci_conn_del(conn);
- hci_dev_unlock(hdev);
- return 0;
+ disconnect = true;
+ break;
}
hci_dev_lock(hdev);
- /* Check if the connection hasn't been cleanup while waiting
- * commands to complete.
- */
+ /* Check if the connection has been cleaned up concurrently */
c = hci_conn_hash_lookup_handle(hdev, handle);
if (!c || c != conn) {
err = 0;
@@ -5424,7 +5416,13 @@ int hci_abort_conn_sync(struct hci_dev *hdev, struct hci_conn *conn, u8 reason)
* or in case of LE it was still scanning so it can be cleanup
* safely.
*/
- hci_conn_failed(conn, reason);
+ if (disconnect) {
+ conn->state = BT_CLOSED;
+ hci_disconn_cfm(conn, reason);
+ hci_conn_del(conn);
+ } else {
+ hci_conn_failed(conn, reason);
+ }
unlock:
hci_dev_unlock(hdev);
--
2.41.0
There is a race condition where a connection handle is reused, after
hci_abort_conn but before abort_conn_sync is processed in hci_sync. In
this case, hci_abort_conn_sync ends up working on the wrong connection,
which can have abort_reason==0, in which case hci_connect_cfm is called
with success status and then connection is deleted, which causes various
use-after-free.
Fix by checking abort_reason is nonzero before calling
hci_abort_conn_sync. If it's zero, then the connection is probably a
different one than we expected and should not be aborted.
Also fix some theoretical UAF / races, where something frees the conn
while hci_abort_conn_sync is working on it.
Fixes: a13f316e90fd ("Bluetooth: hci_conn: Consolidate code for aborting connections")
Reported-by: [email protected]
Closes: https://lore.kernel.org/linux-bluetooth/[email protected]/
Signed-off-by: Pauli Virtanen <[email protected]>
---
Notes:
v2: drop one probably not necessary WARN_ON
Not sure how you'd hit this condition in real controller, but syzbot
does end up calling hci_abort_conn_sync with reason == 0 which then
causes havoc.
This can be verified: with a patch that changes abort_conn_sync to
2874 conn = hci_conn_hash_lookup_handle(hdev, handle);
2875 if (!conn || WARN_ON(!conn->abort_reason))
2876 return 0;
https://syzkaller.appspot.com/text?tag=Patch&x=16eff740680000
it hits that WARN_ON:
https://syzkaller.appspot.com/x/log.txt?x=10affb97a80000
#syz test git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next.git master
net/bluetooth/hci_conn.c | 17 ++++++++++++++++-
1 file changed, 16 insertions(+), 1 deletion(-)
diff --git a/net/bluetooth/hci_conn.c b/net/bluetooth/hci_conn.c
index e62a5f368a51..22de82071e35 100644
--- a/net/bluetooth/hci_conn.c
+++ b/net/bluetooth/hci_conn.c
@@ -2901,12 +2901,25 @@ static int abort_conn_sync(struct hci_dev *hdev, void *data)
{
struct hci_conn *conn;
u16 handle = PTR_UINT(data);
+ u8 reason;
+ int err;
+
+ rcu_read_lock();
conn = hci_conn_hash_lookup_handle(hdev, handle);
+ if (conn) {
+ reason = READ_ONCE(conn->abort_reason);
+ conn = reason ? hci_conn_get(conn) : NULL;
+ }
+
+ rcu_read_unlock();
+
if (!conn)
return 0;
- return hci_abort_conn_sync(hdev, conn, conn->abort_reason);
+ err = hci_abort_conn_sync(hdev, conn, reason);
+ hci_conn_put(conn);
+ return err;
}
int hci_abort_conn(struct hci_conn *conn, u8 reason)
@@ -2918,6 +2931,8 @@ int hci_abort_conn(struct hci_conn *conn, u8 reason)
*/
if (conn->abort_reason)
return 0;
+ if (WARN_ON(!reason))
+ reason = HCI_ERROR_UNSPECIFIED;
bt_dev_dbg(hdev, "handle 0x%2.2x reason 0x%2.2x", conn->handle, reason);
--
2.41.0
Hi Pauli,
On Sat, Sep 30, 2023 at 3:25 PM Pauli Virtanen <[email protected]> wrote:
>
> There is a race condition where a connection handle is reused, after
> hci_abort_conn but before abort_conn_sync is processed in hci_sync. In
> this case, hci_abort_conn_sync ends up working on the wrong connection,
> which can have abort_reason==0, in which case hci_connect_cfm is called
> with success status and then connection is deleted, which causes various
> use-after-free.
>
> Fix by checking abort_reason is nonzero before calling
> hci_abort_conn_sync. If it's zero, then the connection is probably a
> different one than we expected and should not be aborted.
>
> Also fix some theoretical UAF / races, where something frees the conn
> while hci_abort_conn_sync is working on it.
>
> Fixes: a13f316e90fd ("Bluetooth: hci_conn: Consolidate code for aborting connections")
> Reported-by: [email protected]
> Closes: https://lore.kernel.org/linux-bluetooth/[email protected]/
> Signed-off-by: Pauli Virtanen <[email protected]>
> ---
>
> Notes:
> v2: drop one probably not necessary WARN_ON
>
> Not sure how you'd hit this condition in real controller, but syzbot
> does end up calling hci_abort_conn_sync with reason == 0 which then
> causes havoc.
>
> This can be verified: with a patch that changes abort_conn_sync to
>
> 2874 conn = hci_conn_hash_lookup_handle(hdev, handle);
> 2875 if (!conn || WARN_ON(!conn->abort_reason))
> 2876 return 0;
>
> https://syzkaller.appspot.com/text?tag=Patch&x=16eff740680000
>
> it hits that WARN_ON:
>
> https://syzkaller.appspot.com/x/log.txt?x=10affb97a80000
>
> #syz test git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next.git master
>
> net/bluetooth/hci_conn.c | 17 ++++++++++++++++-
> 1 file changed, 16 insertions(+), 1 deletion(-)
>
> diff --git a/net/bluetooth/hci_conn.c b/net/bluetooth/hci_conn.c
> index e62a5f368a51..22de82071e35 100644
> --- a/net/bluetooth/hci_conn.c
> +++ b/net/bluetooth/hci_conn.c
> @@ -2901,12 +2901,25 @@ static int abort_conn_sync(struct hci_dev *hdev, void *data)
> {
> struct hci_conn *conn;
> u16 handle = PTR_UINT(data);
> + u8 reason;
> + int err;
> +
> + rcu_read_lock();
>
> conn = hci_conn_hash_lookup_handle(hdev, handle);
> + if (conn) {
> + reason = READ_ONCE(conn->abort_reason);
> + conn = reason ? hci_conn_get(conn) : NULL;
> + }
> +
> + rcu_read_unlock();
> +
> if (!conn)
> return 0;
>
> - return hci_abort_conn_sync(hdev, conn, conn->abort_reason);
> + err = hci_abort_conn_sync(hdev, conn, reason);
> + hci_conn_put(conn);
> + return err;
> }
>
> int hci_abort_conn(struct hci_conn *conn, u8 reason)
> @@ -2918,6 +2931,8 @@ int hci_abort_conn(struct hci_conn *conn, u8 reason)
> */
> if (conn->abort_reason)
> return 0;
> + if (WARN_ON(!reason))
> + reason = HCI_ERROR_UNSPECIFIED;
>
> bt_dev_dbg(hdev, "handle 0x%2.2x reason 0x%2.2x", conn->handle, reason);
>
> --
> 2.41.0
Don't really like where this is going, the culprit here seems that we
are not able to cancel the callback when freeing the hci_conn so it
stays enqueued while the new connection is created with the same
handle, so I think we need to introduce a function that cancel queued
like Ive sent sometime back, that way we don't need to keep trying to
check if it is the same connection or not.
--
Luiz Augusto von Dentz
Hi Luiz,
ma, 2023-10-02 kello 16:22 -0700, Luiz Augusto von Dentz kirjoitti:
> Hi Pauli,
>
> On Sat, Sep 30, 2023 at 3:25 PM Pauli Virtanen <[email protected]> wrote:
> >
> > There is a race condition where a connection handle is reused, after
> > hci_abort_conn but before abort_conn_sync is processed in hci_sync. In
> > this case, hci_abort_conn_sync ends up working on the wrong connection,
> > which can have abort_reason==0, in which case hci_connect_cfm is called
> > with success status and then connection is deleted, which causes various
> > use-after-free.
> >
> > Fix by checking abort_reason is nonzero before calling
> > hci_abort_conn_sync. If it's zero, then the connection is probably a
> > different one than we expected and should not be aborted.
> >
> > Also fix some theoretical UAF / races, where something frees the conn
> > while hci_abort_conn_sync is working on it.
> >
> > Fixes: a13f316e90fd ("Bluetooth: hci_conn: Consolidate code for aborting connections")
> > Reported-by: [email protected]
> > Closes: https://lore.kernel.org/linux-bluetooth/[email protected]/
> > Signed-off-by: Pauli Virtanen <[email protected]>
> > ---
> >
> > Notes:
> > v2: drop one probably not necessary WARN_ON
> >
> > Not sure how you'd hit this condition in real controller, but syzbot
> > does end up calling hci_abort_conn_sync with reason == 0 which then
> > causes havoc.
> >
> > This can be verified: with a patch that changes abort_conn_sync to
> >
> > 2874 conn = hci_conn_hash_lookup_handle(hdev, handle);
> > 2875 if (!conn || WARN_ON(!conn->abort_reason))
> > 2876 return 0;
> >
> > https://syzkaller.appspot.com/text?tag=Patch&x=16eff740680000
> >
> > it hits that WARN_ON:
> >
> > https://syzkaller.appspot.com/x/log.txt?x=10affb97a80000
> >
> > #syz test git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next.git master
> >
> > net/bluetooth/hci_conn.c | 17 ++++++++++++++++-
> > 1 file changed, 16 insertions(+), 1 deletion(-)
> >
> > diff --git a/net/bluetooth/hci_conn.c b/net/bluetooth/hci_conn.c
> > index e62a5f368a51..22de82071e35 100644
> > --- a/net/bluetooth/hci_conn.c
> > +++ b/net/bluetooth/hci_conn.c
> > @@ -2901,12 +2901,25 @@ static int abort_conn_sync(struct hci_dev *hdev, void *data)
> > {
> > struct hci_conn *conn;
> > u16 handle = PTR_UINT(data);
> > + u8 reason;
> > + int err;
> > +
> > + rcu_read_lock();
> >
> > conn = hci_conn_hash_lookup_handle(hdev, handle);
> > + if (conn) {
> > + reason = READ_ONCE(conn->abort_reason);
> > + conn = reason ? hci_conn_get(conn) : NULL;
> > + }
> > +
> > + rcu_read_unlock();
> > +
> > if (!conn)
> > return 0;
> >
> > - return hci_abort_conn_sync(hdev, conn, conn->abort_reason);
> > + err = hci_abort_conn_sync(hdev, conn, reason);
> > + hci_conn_put(conn);
> > + return err;
> > }
> >
> > int hci_abort_conn(struct hci_conn *conn, u8 reason)
> > @@ -2918,6 +2931,8 @@ int hci_abort_conn(struct hci_conn *conn, u8 reason)
> > */
> > if (conn->abort_reason)
> > return 0;
> > + if (WARN_ON(!reason))
> > + reason = HCI_ERROR_UNSPECIFIED;
> >
> > bt_dev_dbg(hdev, "handle 0x%2.2x reason 0x%2.2x", conn->handle, reason);
> >
> > --
> > 2.41.0
>
> Don't really like where this is going, the culprit here seems that we
> are not able to cancel the callback when freeing the hci_conn so it
> stays enqueued while the new connection is created with the same
> handle, so I think we need to introduce a function that cancel queued
> like Ive sent sometime back, that way we don't need to keep trying to
> check if it is the same connection or not.
Ack.
Note though that the first patch in series is separate issue, and won't
be fixed by canceling callbacks.
Removing the item from hci_sync queue on deletion would also avoid
syzbot hitting the issue in this second patch.
There'd be a remaining theoretical race though in
conn = hci_conn_hash_lookup_handle(hdev, handle);
/* the other workqueue frees conn here -> UAF */
if (!conn)
return 0;
unless you do the other stuff here. I wonder if there instead should be
something like hci_cmd_sync_dev_lock/unlock from
https://lore.kernel.org/linux-bluetooth/15e7863c06bd87cd991ab963132fa9d12ef7e11a.1690399379.git.pav@iki.fi/
to limit the concurrency only to the places where we wait for
controller events.
--
Pauli Virtanen
Hello:
This series was applied to bluetooth/bluetooth-next.git (master)
by Luiz Augusto von Dentz <[email protected]>:
On Sat, 30 Sep 2023 15:53:32 +0300 you wrote:
> In hci_abort_conn_sync it is possible that conn is deleted concurrently
> by something else, also e.g. when waiting for hdev->lock. This causes
> double deletion of the conn, so UAF or conn_hash.list corruption.
>
> Fix by having all code paths check that the connection is still in
> conn_hash before deleting it, while holding hdev->lock which prevents
> any races.
>
> [...]
Here is the summary with links:
- [v2,1/2] Bluetooth: hci_sync: always check if connection is alive before deleting
https://git.kernel.org/bluetooth/bluetooth-next/c/32f6776f0083
- [v2,2/2] Bluetooth: hci_conn: verify connection is to be aborted before doing it
(no matching commit)
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html