2021-09-02 12:40:28

by Lin Ma

Subject: Help needed in patching CVE-2021-3640

Hello there,

There is one bug (CVE-2021-3640: https://www.openwall.com/lists/oss-security/2021/07/22/1) that is similar to the recently fixed CVE-2021-3573.

The key point here is that sco_conn_del() can be called while a syscall such as sco_sock_sendmsg() is still in progress.
I think the easiest fix is to make sco_conn_del() wait by taking lock_sock(), like below.

diff --git a/net/bluetooth/sco.c b/net/bluetooth/sco.c
index d9a4e88dacbb..3da1ad441463 100644
--- a/net/bluetooth/sco.c
+++ b/net/bluetooth/sco.c
@@ -173,10 +173,10 @@ static void sco_conn_del(struct hci_conn *hcon, int err)

 	if (sk) {
 		sock_hold(sk);
-		bh_lock_sock(sk);
+		lock_sock(sk);
 		sco_sock_clear_timer(sk);
 		sco_chan_del(sk, err);
-		bh_unlock_sock(sk);
+		release_sock(sk);
 		sco_sock_kill(sk);
 		sock_put(sk);
 	}

This makes sure the kfree() waits for the sock lock held by sco_sock_sendmsg(). However, this patch can trigger a WARNING report like the one below. (I don't really know whether this report is correct.)

[ 75.147515] ======================================================
[ 75.149955] WARNING: possible circular locking dependency detected
[ 75.150546] 5.11.11+ #58 Not tainted
[ 75.150895] ------------------------------------------------------
[ 75.151485] poc.sco/127 is trying to acquire lock:
[ 75.151947] ffff888012212120 (sk_lock-AF_BLUETOOTH-BTPROTO_SCO){+.+.}-{0:0}, at: sco_conn_del+0xf6/0x0
[ 75.152863]
[ 75.152863] but task is already holding lock:
[ 75.153420] ffffffff85b43948 (hci_cb_list_lock){+.+.}-{3:3}, at: hci_conn_hash_flush+0xb3/0x1f0
[ 75.154256]
[ 75.154256] which lock already depends on the new lock.

P.S. The PoC code can be found in the openwall report.

Given the lesson I learnt from my last bad patch, e305509e678b ("Bluetooth: use correct lock to prevent UAF of hdev object"), I don't really expect this to be the final correct patch.

I then tried to use the technique in e04480920d1e ("Bluetooth: defer cleanup of resources in hci_unregister_dev()"). That is, I want to defer the kfree of the sco_conn object. However, the sco connection/disconnection mechanism is somewhat weird and I don't really understand it yet.

Let's look at the __sco_sock_close() function, which is called from sco_sock_release().

static void __sco_sock_close(struct sock *sk)
{
	BT_DBG("sk %p state %d socket %p", sk, sk->sk_state, sk->sk_socket);

	switch (sk->sk_state) {
	case BT_LISTEN:
		sco_sock_cleanup_listen(sk);
		break;

	case BT_CONNECTED:
	case BT_CONFIG:
		if (sco_pi(sk)->conn->hcon) {
			sk->sk_state = BT_DISCONN;
			sco_sock_set_timer(sk, SCO_DISCONN_TIMEOUT);
			sco_conn_lock(sco_pi(sk)->conn);
			hci_conn_drop(sco_pi(sk)->conn->hcon);
			sco_pi(sk)->conn->hcon = NULL;
			sco_conn_unlock(sco_pi(sk)->conn);
		} else
			sco_chan_del(sk, ECONNRESET);
		break;

	case BT_CONNECT2:
	case BT_CONNECT:
	case BT_DISCONN:
		sco_chan_del(sk, ECONNRESET);
		break;

	default:
		sock_set_flag(sk, SOCK_ZAPPED);
		break;
	}
}

As you can see, even when a socket is in the BT_CONNECTED state, this function just drops the reference on sco_pi(sk)->conn->hcon and does nothing with the sco_pi(sk)->conn object. So how is this conn object released? Where should I defer the deallocation to?
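
To make the "defer the kfree" idea concrete, here is a minimal userspace sketch of a reference-counted object whose memory is released only by the last put. It is an illustration under assumptions, not the sco.c code: conn_model, conn_model_get(), conn_model_put() and the freed_flag test hook are all hypothetical names, and the real sco_conn lifetime rules may differ.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a connection object with a reference count. */
struct conn_model {
	atomic_int refs;
	bool *freed_flag;	/* test hook: set when the object is released */
};

/* Allocate the object with one reference owned by the creator. */
static struct conn_model *conn_model_new(bool *freed_flag)
{
	struct conn_model *c = malloc(sizeof(*c));

	if (!c)
		return NULL;
	atomic_init(&c->refs, 1);
	c->freed_flag = freed_flag;
	return c;
}

/* Pin the object, e.g. for the duration of a send operation. */
static void conn_model_get(struct conn_model *c)
{
	assert(c);
	atomic_fetch_add(&c->refs, 1);
}

/* Drop one reference; only the final put frees the memory. */
static void conn_model_put(struct conn_model *c)
{
	assert(c);
	if (atomic_fetch_sub(&c->refs, 1) == 1) {
		if (c->freed_flag)
			*c->freed_flag = true;
		free(c);
	}
}
```

In this model a sender would call conn_model_get() before touching the object and conn_model_put() afterwards, so a teardown path that drops its own reference in the middle cannot free the memory under the sender.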

I think I need help and discussion to settle on a solution for this. T_T

Best Wishes
Lin Ma


2021-09-02 20:20:49

by Luiz Augusto von Dentz

Subject: Re: Help needed in patching CVE-2021-3640

Hi Lin,

On Thu, Sep 2, 2021 at 5:40 AM LinMa <[email protected]> wrote:
>
> Hello there,
>
> There is one bug (CVE-2021-3640: https://www.openwall.com/lists/oss-security/2021/07/22/1) that is similar to the recently fixed CVE-2021-3573.
>
> The key point here is that sco_conn_del() can be called while a syscall such as sco_sock_sendmsg() is still in progress.
> I think the easiest fix is to make sco_conn_del() wait by taking lock_sock(), like below.
>
> diff --git a/net/bluetooth/sco.c b/net/bluetooth/sco.c
> index d9a4e88dacbb..3da1ad441463 100644
> --- a/net/bluetooth/sco.c
> +++ b/net/bluetooth/sco.c
> @@ -173,10 +173,10 @@ static void sco_conn_del(struct hci_conn *hcon, int err)
>
> if (sk) {
> sock_hold(sk);
> - bh_lock_sock(sk);
> + lock_sock(sk);
> sco_sock_clear_timer(sk);
> sco_chan_del(sk, err);
> - bh_unlock_sock(sk);
> + release_sock(sk);
> sco_sock_kill(sk);
> sock_put(sk);
> }
>
> This makes sure the kfree() waits for the sock lock held by sco_sock_sendmsg(). However, this patch can trigger a WARNING report like the one below. (I don't really know whether this report is correct.)
>
> [ 75.147515] ======================================================
> [ 75.149955] WARNING: possible circular locking dependency detected
> [ 75.150546] 5.11.11+ #58 Not tainted
> [ 75.150895] ------------------------------------------------------
> [ 75.151485] poc.sco/127 is trying to acquire lock:
> [ 75.151947] ffff888012212120 (sk_lock-AF_BLUETOOTH-BTPROTO_SCO){+.+.}-{0:0}, at: sco_conn_del+0xf6/0x0
> [ 75.152863]
> [ 75.152863] but task is already holding lock:
> [ 75.153420] ffffffff85b43948 (hci_cb_list_lock){+.+.}-{3:3}, at: hci_conn_hash_flush+0xb3/0x1f0
> [ 75.154256]
> [ 75.154256] which lock already depends on the new lock.

I'm not really sure what to make of this. They are not the same
lock, so how does it establish a relationship between hci_cb_list_lock
and sock_lock? Anyway, it seems pretty obvious that sock_lock must be used
to prevent concurrent operations like these from happening, but if we can't
use sock_lock then perhaps something needs to change in the way we
acquire hci_cb_list_lock.
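
For what it's worth, lockdep does not need the two locks to be the same. It records every "B acquired while A held" ordering it has seen and warns when a new acquisition would close a loop in that graph. The sketch below is a userspace toy model of that idea, not lockdep itself. The lock identifiers and the example edges are assumptions for illustration only, since the truncated report above does not show the existing dependency chain.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_LOCKS 8

/* Hypothetical lock classes used only in this illustration. */
enum { SK_LOCK, HDEV_LOCK, CB_LIST_LOCK };

/* dep[a][b] is true when lock b was acquired while lock a was held. */
struct lock_graph {
	bool dep[MAX_LOCKS][MAX_LOCKS];
};

static void graph_init(struct lock_graph *g)
{
	memset(g, 0, sizeof(*g));
}

/* Depth-first search: is 'to' reachable from 'from' along recorded edges? */
static bool reachable(const struct lock_graph *g, int from, int to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < MAX_LOCKS; next++) {
		if (g->dep[from][next] && !seen[next] &&
		    reachable(g, next, to, seen))
			return true;
	}
	return false;
}

/*
 * Record "acquiring 'next' while holding 'held'".
 * Returns false, and records nothing, if 'held' is already reachable
 * from 'next', because the new edge would then close a cycle.
 */
static bool graph_acquire(struct lock_graph *g, int held, int next)
{
	bool seen[MAX_LOCKS] = { false };

	assert(held >= 0 && held < MAX_LOCKS);
	assert(next >= 0 && next < MAX_LOCKS);
	if (reachable(g, next, held, seen))
		return false;
	g->dep[held][next] = true;
	return true;
}
```

With SK_LOCK -> HDEV_LOCK and HDEV_LOCK -> CB_LIST_LOCK already recorded, graph_acquire(&g, CB_LIST_LOCK, SK_LOCK) is refused even though no two locks are identical, which is the shape of the report quoted above.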

> P.S. The PoC code can be found in the openwall report.
>
> Given the lesson I learnt from my last bad patch, e305509e678b ("Bluetooth: use correct lock to prevent UAF of hdev object"), I don't really expect this to be the final correct patch.
>
> I then tried to use the technique in e04480920d1e ("Bluetooth: defer cleanup of resources in hci_unregister_dev()"). That is, I want to defer the kfree of the sco_conn object. However, the sco connection/disconnection mechanism is somewhat weird and I don't really understand it yet.
>
> Let's look at the __sco_sock_close() function, which is called from sco_sock_release().
>
> static void __sco_sock_close(struct sock *sk)
> {
> BT_DBG("sk %p state %d socket %p", sk, sk->sk_state, sk->sk_socket);
>
> switch (sk->sk_state) {
> case BT_LISTEN:
> sco_sock_cleanup_listen(sk);
> break;
>
> case BT_CONNECTED:
> case BT_CONFIG:
> if (sco_pi(sk)->conn->hcon) {
> sk->sk_state = BT_DISCONN;
> sco_sock_set_timer(sk, SCO_DISCONN_TIMEOUT);
> sco_conn_lock(sco_pi(sk)->conn);
> hci_conn_drop(sco_pi(sk)->conn->hcon);
> sco_pi(sk)->conn->hcon = NULL;
> sco_conn_unlock(sco_pi(sk)->conn);
> } else
> sco_chan_del(sk, ECONNRESET);
> break;
>
> case BT_CONNECT2:
> case BT_CONNECT:
> case BT_DISCONN:
> sco_chan_del(sk, ECONNRESET);
> break;
>
> default:
> sock_set_flag(sk, SOCK_ZAPPED);
> break;
> }
> }
>
> As you can see, even when a socket is in the BT_CONNECTED state, this function just drops the reference on sco_pi(sk)->conn->hcon and does nothing with the sco_pi(sk)->conn object. So how is this conn object released? Where should I defer the deallocation to?
>
> I think I need help and discussion to settle on a solution for this. T_T
>
> Best Wishes
> Lin Ma



--
Luiz Augusto von Dentz

2021-09-02 23:27:03

by Tetsuo Handa

Subject: Re: Help needed in patching CVE-2021-3640

On 2021/09/02 21:33, LinMa wrote:
> Hello there,
>
> There is one bug (CVE-2021-3640: https://www.openwall.com/lists/oss-security/2021/07/22/1) that is similar to the recently fixed CVE-2021-3573.
>
> The key point here is that sco_conn_del() can be called while a syscall such as sco_sock_sendmsg() is still in progress.

Since hdev->lock is held when sco_conn_del() is called,

3 locks held by poc/6686:
#0: ffff8880158690e0 (&hdev->req_lock){+.+.}-{3:3}, at: hci_dev_do_close+0x44/0x6a0 [bluetooth]
#1: ffff888015868080 (&hdev->lock){+.+.}-{3:3}, at: hci_dev_do_close+0x1ac/0x6a0 [bluetooth]
#2: ffffffffa0630030 (hci_cb_list_lock){+.+.}-{3:3}, at: hci_conn_hash_flush+0x6f/0x140 [bluetooth]

I guess that holding hdev->lock when sco_send_frame() is called would avoid use-after-free.

diff --git a/net/bluetooth/sco.c b/net/bluetooth/sco.c
index d9a4e88dacbb..f5339bfba4a5 100644
--- a/net/bluetooth/sco.c
+++ b/net/bluetooth/sco.c
@@ -727,10 +727,17 @@ static int sco_sock_sendmsg(struct socket *sock, struct msghdr *msg,

 	lock_sock(sk);

-	if (sk->sk_state == BT_CONNECTED)
-		err = sco_send_frame(sk, msg, len);
-	else
-		err = -ENOTCONN;
+	err = -ENOTCONN;
+	if (sk->sk_state == BT_CONNECTED) {
+		struct hci_dev *hdev = hci_get_route(&sco_pi(sk)->dst, &sco_pi(sk)->src, BDADDR_BREDR);
+
+		if (hdev) {
+			hci_dev_lock(hdev);
+			err = sco_send_frame(sk, msg, len);
+			hci_dev_unlock(hdev);
+			hci_dev_put(hdev);
+		}
+	}

 	release_sock(sk);
 	return err;

But I'm not happy with calling hci_get_route() every time.
Can we cache the hdev found upon sco_connect()?
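
The idea of serializing the send path against teardown with one lock can be modeled in userspace as follows. This is only a sketch under assumptions: dev_model, conn_state and the dev_model_*() helpers are hypothetical names, and a pthread mutex stands in for hdev->lock. It shows the intended property, namely that a sender sees either a live connection or none at all, never a freed one.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical per-connection state touched by the send path. */
struct conn_state {
	unsigned long frames;
};

/* Hypothetical device model: one lock guards the conn pointer. */
struct dev_model {
	pthread_mutex_t lock;		/* stands in for hdev->lock */
	struct conn_state *conn;	/* NULL once the connection is deleted */
};

static int dev_model_init(struct dev_model *d)
{
	d->conn = calloc(1, sizeof(*d->conn));
	if (!d->conn)
		return -ENOMEM;
	if (pthread_mutex_init(&d->lock, NULL)) {
		free(d->conn);
		d->conn = NULL;
		return -EAGAIN;
	}
	return 0;
}

/* Teardown path: free and clear the pointer under the lock. */
static void dev_model_conn_del(struct dev_model *d)
{
	pthread_mutex_lock(&d->lock);
	free(d->conn);
	d->conn = NULL;
	pthread_mutex_unlock(&d->lock);
}

/* Send path: check and use the connection under the same lock. */
static int dev_model_send(struct dev_model *d, int len)
{
	int err = -ENOTCONN;

	assert(len >= 0);
	pthread_mutex_lock(&d->lock);
	if (d->conn) {
		d->conn->frames++;
		err = len;
	}
	pthread_mutex_unlock(&d->lock);
	return err;
}
```

The cost is the same one raised above: the send path has to find and take the device lock on every call.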

2021-09-03 03:05:13

by Tetsuo Handa

Subject: [PATCH] Bluetooth: avoid page fault from sco_send_frame()

Since the userfaultfd mechanism allows sleeping with a kernel lock held,
avoiding page faults with a kernel lock held where possible will make
the module more robust. This patch just moves the memcpy_from_msg() call
out of the sock lock.

This patch is an instant mitigation for CVE-2021-3640. To fully close
the race window for this use-after-free problem, we need more changes.

Signed-off-by: Tetsuo Handa <[email protected]>
---
net/bluetooth/sco.c | 21 ++++++++++++++-------
1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/net/bluetooth/sco.c b/net/bluetooth/sco.c
index d9a4e88dacbb..e4b079b31ce9 100644
--- a/net/bluetooth/sco.c
+++ b/net/bluetooth/sco.c
@@ -273,7 +273,7 @@ static int sco_connect(struct sock *sk)
 	return err;
 }

-static int sco_send_frame(struct sock *sk, struct msghdr *msg, int len)
+static int sco_send_frame(struct sock *sk, const void *buf, int len, int flags)
 {
 	struct sco_conn *conn = sco_pi(sk)->conn;
 	struct sk_buff *skb;
@@ -285,14 +285,11 @@ static int sco_send_frame(struct sock *sk, struct msghdr *msg, int len)

 	BT_DBG("sk %p len %d", sk, len);

-	skb = bt_skb_send_alloc(sk, len, msg->msg_flags & MSG_DONTWAIT, &err);
+	skb = bt_skb_send_alloc(sk, len, flags & MSG_DONTWAIT, &err);
 	if (!skb)
 		return err;

-	if (memcpy_from_msg(skb_put(skb, len), msg, len)) {
-		kfree_skb(skb);
-		return -EFAULT;
-	}
+	memcpy(skb_put(skb, len), buf, len);

 	hci_send_sco(conn->hcon, skb);

@@ -714,6 +711,7 @@ static int sco_sock_sendmsg(struct socket *sock, struct msghdr *msg,
 			    size_t len)
 {
 	struct sock *sk = sock->sk;
+	void *buf;
 	int err;

 	BT_DBG("sock %p, sk %p", sock, sk);
@@ -725,14 +723,23 @@ static int sco_sock_sendmsg(struct socket *sock, struct msghdr *msg,
 	if (msg->msg_flags & MSG_OOB)
 		return -EOPNOTSUPP;

+	buf = kmalloc(len, GFP_KERNEL | __GFP_NOWARN);
+	if (!buf)
+		return -ENOMEM;
+	if (memcpy_from_msg(buf, msg, len)) {
+		kfree(buf);
+		return -EFAULT;
+	}
+
 	lock_sock(sk);

 	if (sk->sk_state == BT_CONNECTED)
-		err = sco_send_frame(sk, msg, len);
+		err = sco_send_frame(sk, buf, len, msg->msg_flags);
 	else
 		err = -ENOTCONN;

 	release_sock(sk);
+	kfree(buf);
 	return err;
 }

--
2.30.2
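
The copy-before-lock pattern used by the patch above can be tried out in userspace with a small model. This is a sketch under assumptions rather than kernel code: user_src, copy_from_src(), sock_model and model_sendmsg() are hypothetical names, a pthread mutex stands in for lock_sock()/release_sock(), and the fault flag simulates a user copy that fails. The point is that the possibly slow or failing copy completes before the lock is taken.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a user buffer whose copy may fail. */
struct user_src {
	const void *data;
	int fault;	/* nonzero simulates a failing copy */
};

static int copy_from_src(void *dst, const struct user_src *src, size_t len)
{
	if (src->fault)
		return -EFAULT;
	memcpy(dst, src->data, len);
	return 0;
}

/* Hypothetical socket model. */
struct sock_model {
	pthread_mutex_t lock;	/* stands in for the sock lock */
	int connected;
	unsigned long lock_acquisitions;	/* test hook */
	char last_frame[64];
	size_t last_len;
};

static int sock_model_init(struct sock_model *sk)
{
	memset(sk, 0, sizeof(*sk));
	sk->connected = 1;
	return pthread_mutex_init(&sk->lock, NULL) ? -EAGAIN : 0;
}

static int model_sendmsg(struct sock_model *sk, const struct user_src *src,
			 size_t len)
{
	void *buf;
	int err;

	assert(sk && src);
	if (len > sizeof(sk->last_frame))
		return -EMSGSIZE;

	/* Copy first: any stall or failure happens with no lock held. */
	buf = malloc(len ? len : 1);
	if (!buf)
		return -ENOMEM;
	err = copy_from_src(buf, src, len);
	if (err) {
		free(buf);
		return err;
	}

	/* The critical section only touches memory we already own. */
	pthread_mutex_lock(&sk->lock);
	sk->lock_acquisitions++;
	if (sk->connected) {
		memcpy(sk->last_frame, buf, len);
		sk->last_len = len;
		err = (int)len;
	} else {
		err = -ENOTCONN;
	}
	pthread_mutex_unlock(&sk->lock);

	free(buf);
	return err;
}
```

In this model a failing copy returns -EFAULT without ever taking the lock, so a sender stuck in the copy cannot keep other paths waiting on it.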


2021-09-03 04:09:41

by Luiz Augusto von Dentz

Subject: Re: [PATCH] Bluetooth: avoid page fault from sco_send_frame()

Hi Tetsuo,

On Thu, Sep 2, 2021 at 7:44 PM Tetsuo Handa
<[email protected]> wrote:
>
> Since the userfaultfd mechanism allows sleeping with a kernel lock held,
> avoiding page faults with a kernel lock held where possible will make
> the module more robust. This patch just moves the memcpy_from_msg() call
> out of the sock lock.
>
> This patch is an instant mitigation for CVE-2021-3640. To fully close
> the race window for this use-after-free problem, we need more changes.
>
> Signed-off-by: Tetsuo Handa <[email protected]>
> ---
> net/bluetooth/sco.c | 21 ++++++++++++++-------
> 1 file changed, 14 insertions(+), 7 deletions(-)
>
> diff --git a/net/bluetooth/sco.c b/net/bluetooth/sco.c
> index d9a4e88dacbb..e4b079b31ce9 100644
> --- a/net/bluetooth/sco.c
> +++ b/net/bluetooth/sco.c
> @@ -273,7 +273,7 @@ static int sco_connect(struct sock *sk)
> return err;
> }
>
> -static int sco_send_frame(struct sock *sk, struct msghdr *msg, int len)
> +static int sco_send_frame(struct sock *sk, const void *buf, int len, int flags)
> {
> struct sco_conn *conn = sco_pi(sk)->conn;
> struct sk_buff *skb;
> @@ -285,14 +285,11 @@ static int sco_send_frame(struct sock *sk, struct msghdr *msg, int len)
>
> BT_DBG("sk %p len %d", sk, len);
>
> - skb = bt_skb_send_alloc(sk, len, msg->msg_flags & MSG_DONTWAIT, &err);
> + skb = bt_skb_send_alloc(sk, len, flags & MSG_DONTWAIT, &err);
> if (!skb)
> return err;
>
> - if (memcpy_from_msg(skb_put(skb, len), msg, len)) {
> - kfree_skb(skb);
> - return -EFAULT;
> - }
> + memcpy(skb_put(skb, len), buf, len);
>
> hci_send_sco(conn->hcon, skb);
>
> @@ -714,6 +711,7 @@ static int sco_sock_sendmsg(struct socket *sock, struct msghdr *msg,
> size_t len)
> {
> struct sock *sk = sock->sk;
> + void *buf;
> int err;
>
> BT_DBG("sock %p, sk %p", sock, sk);
> @@ -725,14 +723,23 @@ static int sco_sock_sendmsg(struct socket *sock, struct msghdr *msg,
> if (msg->msg_flags & MSG_OOB)
> return -EOPNOTSUPP;
>
> + buf = kmalloc(len, GFP_KERNEL | __GFP_NOWARN);
> + if (!buf)
> + return -ENOMEM;
> + if (memcpy_from_msg(buf, msg, len)) {
> + kfree(buf);
> + return -EFAULT;
> + }

There is a set already handling this sort of problem:

https://patchwork.kernel.org/project/bluetooth/patch/[email protected]/

> lock_sock(sk);
>
> if (sk->sk_state == BT_CONNECTED)
> - err = sco_send_frame(sk, msg, len);
> + err = sco_send_frame(sk, buf, len, msg->msg_flags);
> else
> err = -ENOTCONN;
>
> release_sock(sk);
> + kfree(buf);
> return err;
> }
>
> --
> 2.30.2
>
>


--
Luiz Augusto von Dentz

2021-09-03 04:43:14

by Tetsuo Handa

Subject: Re: [PATCH] Bluetooth: avoid page fault from sco_send_frame()

On 2021/09/03 12:48, Luiz Augusto von Dentz wrote:
> There is a set already handling this sort of problem:
>
> https://patchwork.kernel.org/project/bluetooth/patch/[email protected]/

OK, I didn't know that. (I'm not subscribed to the bluetooth ML.)

But can we please keep the fix minimal? Multiple distributors have been
waiting for a fix (one that can be backported) for more than a month.

https://security-tracker.debian.org/tracker/CVE-2021-3640
https://access.redhat.com/security/cve/cve-2021-3640

And it looks to me that your
"[3/4] Bluetooth: SCO: Replace use of memcpy_from_msg with bt_skb_sendmsg"
contains a new use-after-free or memory corruption bug... :-(

2021-09-04 02:11:31

by Tetsuo Handa

Subject: Re: [PATCH] Bluetooth: avoid page fault from sco_send_frame()

Commit 99c23da0eed4fd20 ("Bluetooth: sco: Fix lock_sock() blockage by memcpy_from_msg()") in linux-next.git should be sent to linux.git now as a mitigation for CVE-2021-3640.

But I think "[PATCH v3 3/4] Bluetooth: SCO: Replace use of memcpy_from_msg with bt_skb_sendmsg" still contains a bug.

2021-10-11 07:21:18

by Takashi Iwai

Subject: Re: [PATCH] Bluetooth: avoid page fault from sco_send_frame()

On Mon, 11 Oct 2021 09:00:00 +0200,
Salvatore Bonaccorso wrote:
>
> Hi,
>
> On Sat, Sep 04, 2021 at 11:02:58AM +0900, Tetsuo Handa wrote:
> > Commit 99c23da0eed4fd20 ("Bluetooth: sco: Fix lock_sock() blockage
> > by memcpy_from_msg()") in linux-next.git should be sent to linux.git
> > now as a mitigation for CVE-2021-3640.
> >
> > But I think "[PATCH v3 3/4] Bluetooth: SCO: Replace use of
> > memcpy_from_msg with bt_skb_sendmsg" still contains a bug.
>
> Did this one fall through the cracks? I'm confused about the statement
> in https://bugzilla.suse.com/show_bug.cgi?id=1188172#c8 so Cc'ing
> Takashi Iwai as well.

A quite similar fix is already in the subsystem tree,
commit 99c23da0eed4 ("Bluetooth: sco: Fix lock_sock() blockage by
memcpy_from_msg()"). This particular CVE should be covered by that and
its prerequisite patches.


Takashi

2021-10-11 11:38:32

by Salvatore Bonaccorso

Subject: Re: [PATCH] Bluetooth: avoid page fault from sco_send_frame()

Hi,

On Sat, Sep 04, 2021 at 11:02:58AM +0900, Tetsuo Handa wrote:
> Commit 99c23da0eed4fd20 ("Bluetooth: sco: Fix lock_sock() blockage
> by memcpy_from_msg()") in linux-next.git should be sent to linux.git
> now as a mitigation for CVE-2021-3640.
>
> But I think "[PATCH v3 3/4] Bluetooth: SCO: Replace use of
> memcpy_from_msg with bt_skb_sendmsg" still contains a bug.

Did this one fall through the cracks? I'm confused about the statement
in https://bugzilla.suse.com/show_bug.cgi?id=1188172#c8 so Cc'ing
Takashi Iwai as well.

Regards,
Salvatore