2021-06-27 13:21:19

by Tetsuo Handa

[permalink] [raw]
Subject: [PATCH] Bluetooth: call lock_sock() outside of spinlock section

syzbot is hitting might_sleep() warning at hci_sock_dev_event() due to
calling lock_sock() with rw spinlock held [1]. Defer calling lock_sock()
via sock_hold().

Link: https://syzkaller.appspot.com/bug?extid=a5df189917e79d5e59c9 [1]
Reported-by: syzbot <[email protected]>
Signed-off-by: Tetsuo Handa <[email protected]>
Tested-by: syzbot <[email protected]>
Fixes: e305509e678b3a4a ("Bluetooth: use correct lock to prevent UAF of hdev object")
---
net/bluetooth/hci_sock.c | 24 ++++++++++++++++++++++--
1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/net/bluetooth/hci_sock.c b/net/bluetooth/hci_sock.c
index eed0dd066e12..64e54ab0892f 100644
--- a/net/bluetooth/hci_sock.c
+++ b/net/bluetooth/hci_sock.c
@@ -759,19 +759,39 @@ void hci_sock_dev_event(struct hci_dev *hdev, int event)
if (event == HCI_DEV_UNREG) {
struct sock *sk;

+restart:
/* Detach sockets from device */
read_lock(&hci_sk_list.lock);
sk_for_each(sk, &hci_sk_list.head) {
+ /* hci_sk_list.lock is preventing hci_sock_release()
+ * from calling bt_sock_unlink().
+ */
+ if (hci_pi(sk)->hdev != hdev || sk_unhashed(sk))
+ continue;
+ /* Take a ref because we can't call lock_sock() with
+ * hci_sk_list.lock held.
+ */
+ sock_hold(sk);
+ read_unlock(&hci_sk_list.lock);
lock_sock(sk);
- if (hci_pi(sk)->hdev == hdev) {
+ /* Since hci_sock_release() might have already called
+ * bt_sock_unlink() while waiting for lock_sock(),
+ * use sk_hashed(sk) for checking that bt_sock_unlink()
+ * is not yet called.
+ */
+ if (sk_hashed(sk) && hci_pi(sk)->hdev == hdev) {
hci_pi(sk)->hdev = NULL;
sk->sk_err = EPIPE;
sk->sk_state = BT_OPEN;
sk->sk_state_change(sk);
-
hci_dev_put(hdev);
}
release_sock(sk);
+ sock_put(sk);
+ /* Restarting is safe, for hci_pi(sk)->hdev is now NULL
+ * if condition met and will "continue;" otherwise.
+ */
+ goto restart;
}
read_unlock(&hci_sk_list.lock);
}
--
2.18.4


2021-06-27 14:12:31

by bluez.test.bot

[permalink] [raw]
Subject: RE: Bluetooth: call lock_sock() outside of spinlock section

This is automated email and please do not reply to this email!

Dear submitter,

Thank you for submitting the patches to the linux bluetooth mailing list.
This is a CI test results with your patch series:
PW Link:https://patchwork.kernel.org/project/bluetooth/list/?series=507659

---Test result---

Test Summary:
CheckPatch PASS 0.88 seconds
GitLint FAIL 0.12 seconds
BuildKernel PASS 542.37 seconds
TestRunner: Setup PASS 358.20 seconds
TestRunner: l2cap-tester PASS 2.78 seconds
TestRunner: bnep-tester PASS 1.91 seconds
TestRunner: mgmt-tester FAIL 33.12 seconds
TestRunner: rfcomm-tester PASS 2.30 seconds
TestRunner: sco-tester PASS 2.07 seconds
TestRunner: smp-tester PASS 2.25 seconds
TestRunner: userchan-tester PASS 2.08 seconds

Details
##############################
Test: CheckPatch - PASS - 0.88 seconds
Run checkpatch.pl script with rule in .checkpatch.conf


##############################
Test: GitLint - FAIL - 0.12 seconds
Run gitlint with rule in .gitlint
Bluetooth: call lock_sock() outside of spinlock section
11: B1 Line exceeds max length (85>80): "Fixes: e305509e678b3a4a ("Bluetooth: use correct lock to prevent UAF of hdev object")"


##############################
Test: BuildKernel - PASS - 542.37 seconds
Build Kernel with minimal configuration supports Bluetooth


##############################
Test: TestRunner: Setup - PASS - 358.20 seconds
Setup environment for running Test Runner


##############################
Test: TestRunner: l2cap-tester - PASS - 2.78 seconds
Run test-runner with l2cap-tester
Total: 40, Passed: 40 (100.0%), Failed: 0, Not Run: 0

##############################
Test: TestRunner: bnep-tester - PASS - 1.91 seconds
Run test-runner with bnep-tester
Total: 1, Passed: 1 (100.0%), Failed: 0, Not Run: 0

##############################
Test: TestRunner: mgmt-tester - FAIL - 33.12 seconds
Run test-runner with mgmt-tester
Total: 446, Passed: 436 (97.8%), Failed: 5, Not Run: 5

Failed Test Cases
Read Ext Controller Info 1 Failed 0.028 seconds
Read Ext Controller Info 2 Failed 0.028 seconds
Read Ext Controller Info 3 Failed 0.028 seconds
Read Ext Controller Info 4 Failed 0.020 seconds
Read Ext Controller Info 5 Failed 0.024 seconds

##############################
Test: TestRunner: rfcomm-tester - PASS - 2.30 seconds
Run test-runner with rfcomm-tester
Total: 9, Passed: 9 (100.0%), Failed: 0, Not Run: 0

##############################
Test: TestRunner: sco-tester - PASS - 2.07 seconds
Run test-runner with sco-tester
Total: 8, Passed: 8 (100.0%), Failed: 0, Not Run: 0

##############################
Test: TestRunner: smp-tester - PASS - 2.25 seconds
Run test-runner with smp-tester
Total: 8, Passed: 8 (100.0%), Failed: 0, Not Run: 0

##############################
Test: TestRunner: userchan-tester - PASS - 2.08 seconds
Run test-runner with userchan-tester
Total: 3, Passed: 3 (100.0%), Failed: 0, Not Run: 0



---
Regards,
Linux Bluetooth


Attachments:
l2cap-tester.log (43.31 kB)
bnep-tester.log (3.48 kB)
mgmt-tester.log (599.63 kB)
rfcomm-tester.log (11.41 kB)
sco-tester.log (9.68 kB)
smp-tester.log (11.55 kB)
userchan-tester.log (5.33 kB)
Download all attachments

2021-07-07 09:46:04

by Tetsuo Handa

[permalink] [raw]
Subject: [PATCH v2] Bluetooth: call lock_sock() outside of spinlock section

syzbot is hitting might_sleep() warning at hci_sock_dev_event() due to
calling lock_sock() with rw spinlock held [1]. Defer calling lock_sock()
via sock_hold().

Link: https://syzkaller.appspot.com/bug?extid=a5df189917e79d5e59c9 [1]
Reported-by: syzbot <[email protected]>
Signed-off-by: Tetsuo Handa <[email protected]>
Tested-by: syzbot <[email protected]>
Fixes: e305509e678b3a4a ("Bluetooth: use correct lock to prevent UAF of hdev object")
---
Changes in v2:
Take hci_sk_list.lock for write in case bt_sock_unlink() is called after
sk_hashed(sk) test, and defer hci_dev_put(hdev) till schedulable context.

net/bluetooth/hci_sock.c | 32 +++++++++++++++++++++++++++++---
1 file changed, 29 insertions(+), 3 deletions(-)

diff --git a/net/bluetooth/hci_sock.c b/net/bluetooth/hci_sock.c
index b04a5a02ecf3..d8e1ac1ae10d 100644
--- a/net/bluetooth/hci_sock.c
+++ b/net/bluetooth/hci_sock.c
@@ -758,20 +758,46 @@ void hci_sock_dev_event(struct hci_dev *hdev, int event)

if (event == HCI_DEV_UNREG) {
struct sock *sk;
+ bool put_dev;

+restart:
+ put_dev = false;
/* Detach sockets from device */
read_lock(&hci_sk_list.lock);
sk_for_each(sk, &hci_sk_list.head) {
+ /* hci_sk_list.lock is preventing hci_sock_release()
+ * from calling bt_sock_unlink().
+ */
+ if (hci_pi(sk)->hdev != hdev || sk_unhashed(sk))
+ continue;
+ /* Take a ref because we can't call lock_sock() with
+ * hci_sk_list.lock held.
+ */
+ sock_hold(sk);
+ read_unlock(&hci_sk_list.lock);
lock_sock(sk);
- if (hci_pi(sk)->hdev == hdev) {
+ /* Since hci_sock_release() might have already called
+ * bt_sock_unlink() while waiting for lock_sock(),
+ * use sk_hashed(sk) for checking that bt_sock_unlink()
+ * is not yet called.
+ */
+ write_lock(&hci_sk_list.lock);
+ if (sk_hashed(sk) && hci_pi(sk)->hdev == hdev) {
hci_pi(sk)->hdev = NULL;
sk->sk_err = EPIPE;
sk->sk_state = BT_OPEN;
sk->sk_state_change(sk);
-
- hci_dev_put(hdev);
+ put_dev = true;
}
+ write_unlock(&hci_sk_list.lock);
release_sock(sk);
+ sock_put(sk);
+ if (put_dev)
+ hci_dev_put(hdev);
+ /* Restarting is safe, for hci_pi(sk)->hdev != hdev if
+ * condition met and sk_unhashed(sk) == true otherwise.
+ */
+ goto restart;
}
read_unlock(&hci_sk_list.lock);
}
--
2.18.4


2021-07-07 10:10:46

by bluez.test.bot

[permalink] [raw]
Subject: RE: [v2] Bluetooth: call lock_sock() outside of spinlock section

This is automated email and please do not reply to this email!

Dear submitter,

Thank you for submitting the patches to the linux bluetooth mailing list.
This is a CI test results with your patch series:
PW Link:https://patchwork.kernel.org/project/bluetooth/list/?series=511891

---Test result---

Test Summary:
CheckPatch FAIL 0.48 seconds
GitLint FAIL 0.11 seconds
BuildKernel PASS 534.00 seconds
TestRunner: Setup PASS 350.30 seconds
TestRunner: l2cap-tester PASS 2.51 seconds
TestRunner: bnep-tester PASS 1.89 seconds
TestRunner: mgmt-tester FAIL 30.45 seconds
TestRunner: rfcomm-tester PASS 2.07 seconds
TestRunner: sco-tester PASS 2.02 seconds
TestRunner: smp-tester PASS 2.09 seconds
TestRunner: userchan-tester PASS 1.91 seconds

Details
##############################
Test: CheckPatch - FAIL - 0.48 seconds
Run checkpatch.pl script with rule in .checkpatch.conf
Bluetooth: call lock_sock() outside of spinlock section
WARNING: From:/Signed-off-by: email address mismatch: 'From: Tetsuo Handa <[email protected]>' != 'Signed-off-by: Tetsuo Handa <[email protected]>'

total: 0 errors, 1 warnings, 0 checks, 49 lines checked

NOTE: For some of the reported defects, checkpatch may be able to
mechanically convert to the typical style using --fix or --fix-inplace.

"[PATCH] Bluetooth: call lock_sock() outside of spinlock section" has style problems, please review.

NOTE: If any of the errors are false positives, please report
them to the maintainer, see CHECKPATCH in MAINTAINERS.


##############################
Test: GitLint - FAIL - 0.11 seconds
Run gitlint with rule in .gitlint
Bluetooth: call lock_sock() outside of spinlock section
11: B1 Line exceeds max length (85>80): "Fixes: e305509e678b3a4a ("Bluetooth: use correct lock to prevent UAF of hdev object")"


##############################
Test: BuildKernel - PASS - 534.00 seconds
Build Kernel with minimal configuration supports Bluetooth


##############################
Test: TestRunner: Setup - PASS - 350.30 seconds
Setup environment for running Test Runner


##############################
Test: TestRunner: l2cap-tester - PASS - 2.51 seconds
Run test-runner with l2cap-tester
Total: 40, Passed: 40 (100.0%), Failed: 0, Not Run: 0

##############################
Test: TestRunner: bnep-tester - PASS - 1.89 seconds
Run test-runner with bnep-tester
Total: 1, Passed: 1 (100.0%), Failed: 0, Not Run: 0

##############################
Test: TestRunner: mgmt-tester - FAIL - 30.45 seconds
Run test-runner with mgmt-tester
Total: 446, Passed: 436 (97.8%), Failed: 5, Not Run: 5

Failed Test Cases
Read Ext Controller Info 1 Failed 0.014 seconds
Read Ext Controller Info 2 Failed 0.014 seconds
Read Ext Controller Info 3 Failed 0.013 seconds
Read Ext Controller Info 4 Failed 0.013 seconds
Read Ext Controller Info 5 Failed 0.015 seconds

##############################
Test: TestRunner: rfcomm-tester - PASS - 2.07 seconds
Run test-runner with rfcomm-tester
Total: 9, Passed: 9 (100.0%), Failed: 0, Not Run: 0

##############################
Test: TestRunner: sco-tester - PASS - 2.02 seconds
Run test-runner with sco-tester
Total: 8, Passed: 8 (100.0%), Failed: 0, Not Run: 0

##############################
Test: TestRunner: smp-tester - PASS - 2.09 seconds
Run test-runner with smp-tester
Total: 8, Passed: 8 (100.0%), Failed: 0, Not Run: 0

##############################
Test: TestRunner: userchan-tester - PASS - 1.91 seconds
Run test-runner with userchan-tester
Total: 3, Passed: 3 (100.0%), Failed: 0, Not Run: 0



---
Regards,
Linux Bluetooth


Attachments:
l2cap-tester.log (43.31 kB)
bnep-tester.log (3.48 kB)
mgmt-tester.log (599.67 kB)
rfcomm-tester.log (11.41 kB)
sco-tester.log (9.68 kB)
smp-tester.log (11.55 kB)
userchan-tester.log (5.33 kB)
Download all attachments

2021-07-07 18:26:00

by Luiz Augusto von Dentz

[permalink] [raw]
Subject: Re: [PATCH v2] Bluetooth: call lock_sock() outside of spinlock section

Hi Tetsuo,

On Wed, Jul 7, 2021 at 2:43 AM Tetsuo Handa
<[email protected]> wrote:
>
> syzbot is hitting might_sleep() warning at hci_sock_dev_event() due to
> calling lock_sock() with rw spinlock held [1]. Defer calling lock_sock()
> via sock_hold().
>
> Link: https://syzkaller.appspot.com/bug?extid=a5df189917e79d5e59c9 [1]
> Reported-by: syzbot <[email protected]>
> Signed-off-by: Tetsuo Handa <[email protected]>
> Tested-by: syzbot <[email protected]>
> Fixes: e305509e678b3a4a ("Bluetooth: use correct lock to prevent UAF of hdev object")
> ---
> Changes in v2:
> Take hci_sk_list.lock for write in case bt_sock_unlink() is called after
> sk_hashed(sk) test, and defer hci_dev_put(hdev) till schedulable context.
>
> net/bluetooth/hci_sock.c | 32 +++++++++++++++++++++++++++++---
> 1 file changed, 29 insertions(+), 3 deletions(-)
>
> diff --git a/net/bluetooth/hci_sock.c b/net/bluetooth/hci_sock.c
> index b04a5a02ecf3..d8e1ac1ae10d 100644
> --- a/net/bluetooth/hci_sock.c
> +++ b/net/bluetooth/hci_sock.c
> @@ -758,20 +758,46 @@ void hci_sock_dev_event(struct hci_dev *hdev, int event)
>
> if (event == HCI_DEV_UNREG) {
> struct sock *sk;
> + bool put_dev;
>
> +restart:
> + put_dev = false;
> /* Detach sockets from device */
> read_lock(&hci_sk_list.lock);
> sk_for_each(sk, &hci_sk_list.head) {
> + /* hci_sk_list.lock is preventing hci_sock_release()
> + * from calling bt_sock_unlink().
> + */
> + if (hci_pi(sk)->hdev != hdev || sk_unhashed(sk))
> + continue;
> + /* Take a ref because we can't call lock_sock() with
> + * hci_sk_list.lock held.
> + */
> + sock_hold(sk);
> + read_unlock(&hci_sk_list.lock);
> lock_sock(sk);
> - if (hci_pi(sk)->hdev == hdev) {
> + /* Since hci_sock_release() might have already called
> + * bt_sock_unlink() while waiting for lock_sock(),
> + * use sk_hashed(sk) for checking that bt_sock_unlink()
> + * is not yet called.
> + */
> + write_lock(&hci_sk_list.lock);
> + if (sk_hashed(sk) && hci_pi(sk)->hdev == hdev) {
> hci_pi(sk)->hdev = NULL;
> sk->sk_err = EPIPE;
> sk->sk_state = BT_OPEN;
> sk->sk_state_change(sk);
> -
> - hci_dev_put(hdev);
> + put_dev = true;
> }
> + write_unlock(&hci_sk_list.lock);
> release_sock(sk);
> + sock_put(sk);
> + if (put_dev)
> + hci_dev_put(hdev);
> + /* Restarting is safe, for hci_pi(sk)->hdev != hdev if
> + * condition met and sk_unhashed(sk) == true otherwise.
> + */
> + goto restart;

This sounds a little too complicated, afaik backward goto is not even
consider a good practice either, since it appears we don't unlink the
sockets here we could perhaps don't release the reference to hdev
either and leave hci_sock_release to deal with it and then perhaps we
can take away the backward goto, actually why are you restarting to
begin with? It is also weird that this only manifests in the Bluetooth
HCI sockets or other subsystems don't use such locking mechanism
anymore?


> }
> read_unlock(&hci_sk_list.lock);
> }
> --
> 2.18.4
>
>


--
Luiz Augusto von Dentz

2021-07-07 23:36:53

by Tetsuo Handa

[permalink] [raw]
Subject: Re: [PATCH v2] Bluetooth: call lock_sock() outside of spinlock section

On 2021/07/08 3:20, Luiz Augusto von Dentz wrote:
>> diff --git a/net/bluetooth/hci_sock.c b/net/bluetooth/hci_sock.c
>> index b04a5a02ecf3..d8e1ac1ae10d 100644
>> --- a/net/bluetooth/hci_sock.c
>> +++ b/net/bluetooth/hci_sock.c
>> @@ -758,20 +758,46 @@ void hci_sock_dev_event(struct hci_dev *hdev, int event)
>>
>> if (event == HCI_DEV_UNREG) {
>> struct sock *sk;
>> + bool put_dev;
>>
>> +restart:
>> + put_dev = false;
>> /* Detach sockets from device */
>> read_lock(&hci_sk_list.lock);
>> sk_for_each(sk, &hci_sk_list.head) {
>> + /* hci_sk_list.lock is preventing hci_sock_release()
>> + * from calling bt_sock_unlink().
>> + */
>> + if (hci_pi(sk)->hdev != hdev || sk_unhashed(sk))
>> + continue;
>> + /* Take a ref because we can't call lock_sock() with
>> + * hci_sk_list.lock held.
>> + */
>> + sock_hold(sk);
>> + read_unlock(&hci_sk_list.lock);
>> lock_sock(sk);
>> - if (hci_pi(sk)->hdev == hdev) {
>> + /* Since hci_sock_release() might have already called
>> + * bt_sock_unlink() while waiting for lock_sock(),
>> + * use sk_hashed(sk) for checking that bt_sock_unlink()
>> + * is not yet called.
>> + */
>> + write_lock(&hci_sk_list.lock);
>> + if (sk_hashed(sk) && hci_pi(sk)->hdev == hdev) {
>> hci_pi(sk)->hdev = NULL;
>> sk->sk_err = EPIPE;
>> sk->sk_state = BT_OPEN;
>> sk->sk_state_change(sk);
>> -
>> - hci_dev_put(hdev);
>> + put_dev = true;
>> }
>> + write_unlock(&hci_sk_list.lock);
>> release_sock(sk);
>> + sock_put(sk);
>> + if (put_dev)
>> + hci_dev_put(hdev);
>> + /* Restarting is safe, for hci_pi(sk)->hdev != hdev if
>> + * condition met and sk_unhashed(sk) == true otherwise.
>> + */
>> + goto restart;
>
> This sounds a little too complicated, afaik backward goto is not even
> consider a good practice either, since it appears we don't unlink the
> sockets here

Because hci_sock_release() might be concurrently called while
hci_sock_dev_event() from hci_unregister_dev() from vhci_release() is running.

While hci_sock_dev_event() itself does not unlink the sockets from hci_sk_list.head,
bt_sock_unlink() from hci_sock_release() unlinks a socket from hci_sk_list.head.

Therefore, as long as there is possibility that hci_sk_list is modified by other thread
when current thread is traversing this list, we need to be prepared for such race.

> we could perhaps don't release the reference to hdev
> either and leave hci_sock_release to deal with it and then perhaps we
> can take away the backward goto, actually why are you restarting to
> begin with?

Do you mean something like

diff --git a/net/bluetooth/hci_sock.c b/net/bluetooth/hci_sock.c
index b04a5a02ecf3..0525883f4639 100644
--- a/net/bluetooth/hci_sock.c
+++ b/net/bluetooth/hci_sock.c
@@ -759,19 +759,14 @@ void hci_sock_dev_event(struct hci_dev *hdev, int event)
if (event == HCI_DEV_UNREG) {
struct sock *sk;

- /* Detach sockets from device */
+ /* Change socket state and notify */
read_lock(&hci_sk_list.lock);
sk_for_each(sk, &hci_sk_list.head) {
- lock_sock(sk);
if (hci_pi(sk)->hdev == hdev) {
- hci_pi(sk)->hdev = NULL;
sk->sk_err = EPIPE;
sk->sk_state = BT_OPEN;
sk->sk_state_change(sk);
-
- hci_dev_put(hdev);
}
- release_sock(sk);
}
read_unlock(&hci_sk_list.lock);
}

? I can't judge because I don't know how this works. I worry that
without lock_sock()/release_sock(), this races with e.g. hci_sock_bind().

We could take away the backward goto if we can do something like below.

diff --git a/net/bluetooth/hci_sock.c b/net/bluetooth/hci_sock.c
index b04a5a02ecf3..1ca03769badf 100644
--- a/net/bluetooth/hci_sock.c
+++ b/net/bluetooth/hci_sock.c
@@ -43,6 +43,8 @@ static DEFINE_IDA(sock_cookie_ida);

static atomic_t monitor_promisc = ATOMIC_INIT(0);

+static DEFINE_MUTEX(sock_list_lock);
+
/* ----- HCI socket interface ----- */

/* Socket info */
@@ -760,7 +762,7 @@ void hci_sock_dev_event(struct hci_dev *hdev, int event)
struct sock *sk;

/* Detach sockets from device */
- read_lock(&hci_sk_list.lock);
+ mutex_lock(&sock_list_lock);
sk_for_each(sk, &hci_sk_list.head) {
lock_sock(sk);
if (hci_pi(sk)->hdev == hdev) {
@@ -773,7 +775,7 @@ void hci_sock_dev_event(struct hci_dev *hdev, int event)
}
release_sock(sk);
}
- read_unlock(&hci_sk_list.lock);
+ mutex_unlock(&sock_list_lock);
}
}

@@ -838,6 +840,7 @@ static int hci_sock_release(struct socket *sock)
if (!sk)
return 0;

+ mutex_lock(&sock_list_lock);
lock_sock(sk);

switch (hci_pi(sk)->channel) {
@@ -860,6 +863,7 @@ static int hci_sock_release(struct socket *sock)
}

bt_sock_unlink(&hci_sk_list, sk);
+ mutex_unlock(&sock_list_lock);

hdev = hci_pi(sk)->hdev;
if (hdev) {
@@ -2049,7 +2053,9 @@ static int hci_sock_create(struct net *net, struct socket *sock, int protocol,
sock->state = SS_UNCONNECTED;
sk->sk_state = BT_OPEN;

+ mutex_lock(&sock_list_lock);
bt_sock_link(&hci_sk_list, sk);
+ mutex_unlock(&sock_list_lock);
return 0;
}


> It is also weird that this only manifests in the Bluetooth
> HCI sockets or other subsystems don't use such locking mechanism
> anymore?

If other subsystems have similar problem, that should be handled by different
patches. This patch fixes a regression introduced when fixing CVE-2021-3573,
and I think that Linux distributors are waiting for this regression to be fixed
so that they can backport commit e305509e678b3a4a ("Bluetooth: use correct lock
to prevent UAF of hdev object"). Also, this regression is currently 7th top
crashers for syzbot, and I'd like to apply this patch as soon as possible.

I think that this patch can serve as a response to Lin's comment

> In short, I have no idea if there is any lock replacing solution for
> this bug. I need help and suggestions because the lock mechanism is
> just so difficult.

at https://patchwork.kernel.org/project/bluetooth/patch/[email protected]om
without changing behavior.

>
>
>> }
>> read_unlock(&hci_sk_list.lock);
>> }
>> --
>> 2.18.4

2021-07-08 01:03:12

by LinMa

[permalink] [raw]
Subject: Re: Re: [PATCH v2] Bluetooth: call lock_sock() outside of spinlock section

>
> diff --git a/net/bluetooth/hci_sock.c b/net/bluetooth/hci_sock.c
> index b04a5a02ecf3..0525883f4639 100644
> --- a/net/bluetooth/hci_sock.c
> +++ b/net/bluetooth/hci_sock.c
> @@ -759,19 +759,14 @@ void hci_sock_dev_event(struct hci_dev *hdev, int event)
> if (event == HCI_DEV_UNREG) {
> struct sock *sk;
>
> - /* Detach sockets from device */
> + /* Change socket state and notify */
> read_lock(&hci_sk_list.lock);
> sk_for_each(sk, &hci_sk_list.head) {
> - lock_sock(sk);
> if (hci_pi(sk)->hdev == hdev) {
> - hci_pi(sk)->hdev = NULL;
> sk->sk_err = EPIPE;
> sk->sk_state = BT_OPEN;
> sk->sk_state_change(sk);
> -
> - hci_dev_put(hdev);
> }
> - release_sock(sk);
> }
> read_unlock(&hci_sk_list.lock);
> }
>
> ? I can't judge because I don't know how this works. I worry that
> without lock_sock()/release_sock(), this races with e.g. hci_sock_bind().
>
> We could take away the backward goto if we can do something like below.
>
> diff --git a/net/bluetooth/hci_sock.c b/net/bluetooth/hci_sock.c
> index b04a5a02ecf3..1ca03769badf 100644
> --- a/net/bluetooth/hci_sock.c
> +++ b/net/bluetooth/hci_sock.c
> @@ -43,6 +43,8 @@ static DEFINE_IDA(sock_cookie_ida);
>
> static atomic_t monitor_promisc = ATOMIC_INIT(0);
>
> +static DEFINE_MUTEX(sock_list_lock);
> +
> /* ----- HCI socket interface ----- */
>
> /* Socket info */
> @@ -760,7 +762,7 @@ void hci_sock_dev_event(struct hci_dev *hdev, int event)
> struct sock *sk;
>
> /* Detach sockets from device */
> - read_lock(&hci_sk_list.lock);
> + mutex_lock(&sock_list_lock);
> sk_for_each(sk, &hci_sk_list.head) {
> lock_sock(sk);
> if (hci_pi(sk)->hdev == hdev) {
> @@ -773,7 +775,7 @@ void hci_sock_dev_event(struct hci_dev *hdev, int event)
> }
> release_sock(sk);
> }
> - read_unlock(&hci_sk_list.lock);
> + mutex_unlock(&sock_list_lock);
> }
> }
>
> @@ -838,6 +840,7 @@ static int hci_sock_release(struct socket *sock)
> if (!sk)
> return 0;
>
> + mutex_lock(&sock_list_lock);
> lock_sock(sk);
>
> switch (hci_pi(sk)->channel) {
> @@ -860,6 +863,7 @@ static int hci_sock_release(struct socket *sock)
> }
>
> bt_sock_unlink(&hci_sk_list, sk);
> + mutex_unlock(&sock_list_lock);
>
> hdev = hci_pi(sk)->hdev;
> if (hdev) {
> @@ -2049,7 +2053,9 @@ static int hci_sock_create(struct net *net, struct socket *sock, int protocol,
> sock->state = SS_UNCONNECTED;
> sk->sk_state = BT_OPEN;
>
> + mutex_lock(&sock_list_lock);
> bt_sock_link(&hci_sk_list, sk);
> + mutex_unlock(&sock_list_lock);
> return 0;
> }
>
>
> > It is also weird that this only manifests in the Bluetooth
> > HCI sockets or other subsystems don't use such locking mechanism
> > anymore?
>

Hello Tetsuo,

Yeah, that's a great patch indeed. Add one extra mutex lock for handling this.
In fact, I have tried to replace all the hci_sk_list.lock from rwlock_t to mutext.

> https://patchwork.kernel.org/project/bluetooth/patch/[email protected]om/
> However, from the lock principle in the Linux kernel, this lock
> replacement is not appropriate. I take a lot of time to try with other
> lock combinations for this case but failed. For example, I tried to
> replace the rwlock_t in the hci_sk_list with a sleep-able mutex lock.

Because I have seem other part of code in kernel uses this combination: mutex_t + lock_sock. It shouldn't trigger any locking errors. (Will test it)

> Also, this regression is currently 7th top
> crashers for syzbot, and I'd like to apply this patch as soon as possible.
>

XD, Yeah. Because the bug crash point is located at function hci_sock_dev_event(). Whenever syzkaller fuzzes Bluetooth stack and the executor exits, the crash happens.

> I think that this patch can serve as a response to Lin's comment

> > In short, I have no idea if there is any lock replacing solution for
> > this bug. I need help and suggestions because the lock mechanism is
> > just so difficult.

Thanks for that, it's quite appreciating.

Regards
Lin Ma

2021-07-08 07:19:01

by bluez.test.bot

[permalink] [raw]
Subject: RE: [v2] Bluetooth: call lock_sock() outside of spinlock section

This is automated email and please do not reply to this email!

Dear submitter,

Thank you for submitting the patches to the linux bluetooth mailing list.
This is a CI test results with your patch series:
PW Link:https://patchwork.kernel.org/project/bluetooth/list/?series=511891

---Test result---

Test Summary:
CheckPatch FAIL 0.66 seconds
GitLint FAIL 0.13 seconds
BuildKernel PASS 670.72 seconds
TestRunner: Setup PASS 443.19 seconds
TestRunner: l2cap-tester PASS 2.99 seconds
TestRunner: bnep-tester PASS 2.16 seconds
TestRunner: mgmt-tester PASS 34.81 seconds
TestRunner: rfcomm-tester PASS 2.44 seconds
TestRunner: sco-tester PASS 2.40 seconds
TestRunner: smp-tester FAIL 2.49 seconds
TestRunner: userchan-tester PASS 2.14 seconds

Details
##############################
Test: CheckPatch - FAIL - 0.66 seconds
Run checkpatch.pl script with rule in .checkpatch.conf
Bluetooth: call lock_sock() outside of spinlock section
WARNING: From:/Signed-off-by: email address mismatch: 'From: Tetsuo Handa <[email protected]>' != 'Signed-off-by: Tetsuo Handa <[email protected]>'

total: 0 errors, 1 warnings, 0 checks, 49 lines checked

NOTE: For some of the reported defects, checkpatch may be able to
mechanically convert to the typical style using --fix or --fix-inplace.

"[PATCH] Bluetooth: call lock_sock() outside of spinlock section" has style problems, please review.

NOTE: If any of the errors are false positives, please report
them to the maintainer, see CHECKPATCH in MAINTAINERS.


##############################
Test: GitLint - FAIL - 0.13 seconds
Run gitlint with rule in .gitlint
Bluetooth: call lock_sock() outside of spinlock section
11: B1 Line exceeds max length (85>80): "Fixes: e305509e678b3a4a ("Bluetooth: use correct lock to prevent UAF of hdev object")"


##############################
Test: BuildKernel - PASS - 670.72 seconds
Build Kernel with minimal configuration supports Bluetooth


##############################
Test: TestRunner: Setup - PASS - 443.19 seconds
Setup environment for running Test Runner


##############################
Test: TestRunner: l2cap-tester - PASS - 2.99 seconds
Run test-runner with l2cap-tester
Total: 40, Passed: 40 (100.0%), Failed: 0, Not Run: 0

##############################
Test: TestRunner: bnep-tester - PASS - 2.16 seconds
Run test-runner with bnep-tester
Total: 1, Passed: 1 (100.0%), Failed: 0, Not Run: 0

##############################
Test: TestRunner: mgmt-tester - PASS - 34.81 seconds
Run test-runner with mgmt-tester
Total: 446, Passed: 443 (99.3%), Failed: 0, Not Run: 3

##############################
Test: TestRunner: rfcomm-tester - PASS - 2.44 seconds
Run test-runner with rfcomm-tester
Total: 9, Passed: 9 (100.0%), Failed: 0, Not Run: 0

##############################
Test: TestRunner: sco-tester - PASS - 2.40 seconds
Run test-runner with sco-tester
Total: 8, Passed: 8 (100.0%), Failed: 0, Not Run: 0

##############################
Test: TestRunner: smp-tester - FAIL - 2.49 seconds
Run test-runner with smp-tester
Total: 8, Passed: 7 (87.5%), Failed: 1, Not Run: 0

Failed Test Cases
SMP Client - SC Request 2 Failed 0.036 seconds

##############################
Test: TestRunner: userchan-tester - PASS - 2.14 seconds
Run test-runner with userchan-tester
Total: 3, Passed: 3 (100.0%), Failed: 0, Not Run: 0



---
Regards,
Linux Bluetooth


Attachments:
l2cap-tester.log (43.31 kB)
bnep-tester.log (3.47 kB)
mgmt-tester.log (600.03 kB)
rfcomm-tester.log (11.40 kB)
sco-tester.log (9.68 kB)
smp-tester.log (11.43 kB)
userchan-tester.log (5.32 kB)
Download all attachments

2021-07-09 13:51:34

by Tetsuo Handa

[permalink] [raw]
Subject: Re: [PATCH v2] Bluetooth: call lock_sock() outside of spinlock section

It seems that history of this locking problem is a trial and error.

Commit b40df5743ee8aed8 ("[PATCH] bluetooth: fix socket locking in
hci_sock_dev_event()") in 2.6.21-rc4 changed bh_lock_sock() to lock_sock()
as an attempt to fix lockdep warning.

Then, commit 4ce61d1c7a8ef4c1 ("[BLUETOOTH]: Fix locking in
hci_sock_dev_event().") in 2.6.22-rc2 changed lock_sock() to
local_bh_disable() + bh_lock_sock_nested() as an attempt to fix
sleep in atomic context warning.

Then, commit 4b5dd696f81b210c ("Bluetooth: Remove local_bh_disable() from
hci_sock.c") in 3.3-rc1 removed local_bh_disable().

Then, commit e305509e678b3a4a ("Bluetooth: use correct lock to prevent UAF
of hdev object") in 5.13-rc5 again changed bh_lock_sock_nested() to
lock_sock() as an attempt to fix CVE-2021-3573.

But unfortunately it is too difficult to convert rw spinlock into mutex;
we need to live with current rw spinlock.

And we have three choices that can live with current rw spinlock.
Patches for these choices are show bottom. All tested by syzbot.

(1) Introduce a global mutex dedicated for hci_sock_dev_event(), and block
bt_sock_unlink() and concurrent hci_sock_dev_event() callers.

This is simplest if it is guaranteed that total delay for lock_sock()
on all sockets is short enough.

But it is not clear how long lock_sock() might block, for e.g.
hci_sock_bound_ioctl() which is called inside lock_sock() section is
doing copy_from_user()/copy_to_user() which may result in blocking
other lock_sock() waiters for many seconds. I think that POC.zip is
demonstrating that this delay is controllable via userfaultfd.

Of course, the robust fix will be not to call
copy_from_user()/copy_to_user() inside lock_sock() section. But such
big change is not suitable for a fix for commit e305509e678b3a4a
("Bluetooth: use correct lock to prevent UAF of hdev object").

(2) Introduce a global mutex for hci_sock_dev_event(), but don't block
bt_sock_unlink() nor concurrent hci_sock_dev_event() callers (i.e.
use this mutex like a spinlock).

Since it will be safe to resume sk_for_each() as long as currently
accessing socket remains on that list, we can track which socket is
currently accessed, and let bt_sock_unlink() wait if that socket is
currently accessed.

It is possible that total delay becomes long enough for khungtaskd to
complain. Commit 8d0caedb75968304 ("can: bcm/raw/isotp: use per module
netdevice notifier") is an example for avoiding khungtaskd warning
using this choice. Compared to that commit, this choice for
hci_sock_dev_event() case will need to also touch "struct hci_pinfo"
because we need to track concurrent hci_sock_dev_event() callers.

(3) Don't introduce a global mutex for hci_sock_dev_event(), and don't
block bt_sock_unlink() nor concurrent hci_sock_dev_event() callers.

Since it will be safe to resume sk_for_each() as long as currently
accessing socket remains on that list, take a refcount on currently
accessing socket and check if currently accessing socket is still
on the list. This choice needs to touch only hci_sock_dev_event().

Which choice do we want to go?

Patch for choice (1):

----------------------------------------
net/bluetooth/hci_sock.c | 17 ++++++++++++++---
1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/net/bluetooth/hci_sock.c b/net/bluetooth/hci_sock.c
index b04a5a02ecf3..c860ec4ea7b8 100644
--- a/net/bluetooth/hci_sock.c
+++ b/net/bluetooth/hci_sock.c
@@ -150,6 +150,8 @@ static struct bt_sock_list hci_sk_list = {
.lock = __RW_LOCK_UNLOCKED(hci_sk_list.lock)
};

+static DEFINE_MUTEX(hci_sk_list_lock);
+
static bool is_filtered_packet(struct sock *sk, struct sk_buff *skb)
{
struct hci_filter *flt;
@@ -758,10 +760,13 @@ void hci_sock_dev_event(struct hci_dev *hdev, int event)

if (event == HCI_DEV_UNREG) {
struct sock *sk;
+ int put_count = 0;

/* Detach sockets from device */
+ mutex_lock(&hci_sk_list_lock);
read_lock(&hci_sk_list.lock);
sk_for_each(sk, &hci_sk_list.head) {
+ read_unlock(&hci_sk_list.lock);
lock_sock(sk);
if (hci_pi(sk)->hdev == hdev) {
hci_pi(sk)->hdev = NULL;
@@ -769,11 +774,15 @@ void hci_sock_dev_event(struct hci_dev *hdev, int event)
sk->sk_state = BT_OPEN;
sk->sk_state_change(sk);

- hci_dev_put(hdev);
+ put_count++;
}
release_sock(sk);
+ read_lock(&hci_sk_list.lock);
}
read_unlock(&hci_sk_list.lock);
+ mutex_unlock(&hci_sk_list_lock);
+ while (put_count--)
+ hci_dev_put(hdev);
}
}

@@ -838,6 +847,10 @@ static int hci_sock_release(struct socket *sock)
if (!sk)
return 0;

+ mutex_lock(&hci_sk_list_lock);
+ bt_sock_unlink(&hci_sk_list, sk);
+ mutex_unlock(&hci_sk_list_lock);
+
lock_sock(sk);

switch (hci_pi(sk)->channel) {
@@ -859,8 +872,6 @@ static int hci_sock_release(struct socket *sock)
break;
}

- bt_sock_unlink(&hci_sk_list, sk);
-
hdev = hci_pi(sk)->hdev;
if (hdev) {
if (hci_pi(sk)->channel == HCI_CHANNEL_USER) {
----------------------------------------

Patch for choice (2):

----------------------------------------
net/bluetooth/hci_sock.c | 39 ++++++++++++++++++++++++++++++++++++---
1 file changed, 36 insertions(+), 3 deletions(-)

diff --git a/net/bluetooth/hci_sock.c b/net/bluetooth/hci_sock.c
index b04a5a02ecf3..3e65fcc8c9af 100644
--- a/net/bluetooth/hci_sock.c
+++ b/net/bluetooth/hci_sock.c
@@ -43,6 +43,8 @@ static DEFINE_IDA(sock_cookie_ida);

static atomic_t monitor_promisc = ATOMIC_INIT(0);

+static DEFINE_MUTEX(dev_event_lock);
+
/* ----- HCI socket interface ----- */

/* Socket info */
@@ -57,6 +59,7 @@ struct hci_pinfo {
unsigned long flags;
__u32 cookie;
char comm[TASK_COMM_LEN];
+ unsigned int event_in_progress;
};

void hci_sock_set_flag(struct sock *sk, int nr)
@@ -758,10 +761,15 @@ void hci_sock_dev_event(struct hci_dev *hdev, int event)

if (event == HCI_DEV_UNREG) {
struct sock *sk;
+ int put_count = 0;

/* Detach sockets from device */
+ mutex_lock(&dev_event_lock);
read_lock(&hci_sk_list.lock);
sk_for_each(sk, &hci_sk_list.head) {
+ read_unlock(&hci_sk_list.lock);
+ hci_pi(sk)->event_in_progress++;
+ mutex_unlock(&dev_event_lock);
lock_sock(sk);
if (hci_pi(sk)->hdev == hdev) {
hci_pi(sk)->hdev = NULL;
@@ -769,11 +777,17 @@ void hci_sock_dev_event(struct hci_dev *hdev, int event)
sk->sk_state = BT_OPEN;
sk->sk_state_change(sk);

- hci_dev_put(hdev);
+ put_count++;
}
release_sock(sk);
+ mutex_lock(&dev_event_lock);
+ hci_pi(sk)->event_in_progress--;
+ read_lock(&hci_sk_list.lock);
}
read_unlock(&hci_sk_list.lock);
+ mutex_unlock(&dev_event_lock);
+ while (put_count--)
+ hci_dev_put(hdev);
}
}

@@ -838,6 +852,26 @@ static int hci_sock_release(struct socket *sock)
if (!sk)
return 0;

+ /*
+ * Wait for sk_for_each() in hci_sock_dev_event() to stop accessing
+ * this sk before unlinking. Need to unlink before lock_sock(), for
+ * hci_sock_dev_event() calls lock_sock() after incrementing
+ * event_in_progress counter.
+ */
+ while (1) {
+ bool unlinked = true;
+
+ mutex_lock(&dev_event_lock);
+ if (!hci_pi(sk)->event_in_progress)
+ bt_sock_unlink(&hci_sk_list, sk);
+ else
+ unlinked = false;
+ mutex_unlock(&dev_event_lock);
+ if (unlinked)
+ break;
+ schedule_timeout_uninterruptible(1);
+ }
+
lock_sock(sk);

switch (hci_pi(sk)->channel) {
@@ -859,8 +893,6 @@ static int hci_sock_release(struct socket *sock)
break;
}

- bt_sock_unlink(&hci_sk_list, sk);
-
hdev = hci_pi(sk)->hdev;
if (hdev) {
if (hci_pi(sk)->channel == HCI_CHANNEL_USER) {
@@ -2049,6 +2081,7 @@ static int hci_sock_create(struct net *net, struct socket *sock, int protocol,
sock->state = SS_UNCONNECTED;
sk->sk_state = BT_OPEN;

+ hci_pi(sk)->event_in_progress = 0;
bt_sock_link(&hci_sk_list, sk);
return 0;
}
----------------------------------------

Patch for choice (3):

----------------------------------------
net/bluetooth/hci_sock.c | 35 +++++++++++++++++++++++++++++++++--
1 file changed, 33 insertions(+), 2 deletions(-)

diff --git a/net/bluetooth/hci_sock.c b/net/bluetooth/hci_sock.c
index b04a5a02ecf3..38146cf37378 100644
--- a/net/bluetooth/hci_sock.c
+++ b/net/bluetooth/hci_sock.c
@@ -758,22 +758,53 @@ void hci_sock_dev_event(struct hci_dev *hdev, int event)

if (event == HCI_DEV_UNREG) {
struct sock *sk;
+ int put_count = 0;

/* Detach sockets from device */
+restart:
read_lock(&hci_sk_list.lock);
sk_for_each(sk, &hci_sk_list.head) {
+ /* This sock_hold(sk) is safe, for bt_sock_unlink(sk)
+ * is not called yet.
+ */
+ sock_hold(sk);
+ read_unlock(&hci_sk_list.lock);
lock_sock(sk);
- if (hci_pi(sk)->hdev == hdev) {
+ write_lock(&hci_sk_list.lock);
+ /* Check that bt_sock_unlink(sk) is not called yet. */
+ if (sk_hashed(sk) && hci_pi(sk)->hdev == hdev) {
hci_pi(sk)->hdev = NULL;
sk->sk_err = EPIPE;
sk->sk_state = BT_OPEN;
sk->sk_state_change(sk);

- hci_dev_put(hdev);
+ put_count++;
}
+ write_unlock(&hci_sk_list.lock);
release_sock(sk);
+ read_lock(&hci_sk_list.lock);
+ /* If bt_sock_unlink(sk) is not called yet, we can
+ * continue iteration. We can use __sock_put(sk) here
+ * because hci_sock_release() will call sock_put(sk)
+ * after bt_sock_unlink(sk).
+ */
+ if (sk_hashed(sk)) {
+ __sock_put(sk);
+ continue;
+ }
+ /* Otherwise, we need to restart iteration, for the
+ * next socket pointed by sk->next might be already
+ * gone. We can't use __sock_put(sk) here because
+ * hci_sock_release() might have already called
+ * sock_put(sk) after bt_sock_unlink(sk).
+ */
+ read_unlock(&hci_sk_list.lock);
+ sock_put(sk);
+ goto restart;
}
read_unlock(&hci_sk_list.lock);
+ while (put_count--)
+ hci_dev_put(hdev);
}
}

----------------------------------------

2021-07-10 13:39:57

by Tetsuo Handa

[permalink] [raw]
Subject: Re: [PATCH v2] Bluetooth: call lock_sock() outside of spinlock section

On 2021/07/08 8:33, Tetsuo Handa wrote:
>> we could perhaps don't release the reference to hdev
>> either and leave hci_sock_release to deal with it and then perhaps we
>> can take away the backward goto, actually why are you restarting to
>> begin with?
>
> Do you mean something like
>
> diff --git a/net/bluetooth/hci_sock.c b/net/bluetooth/hci_sock.c
> index b04a5a02ecf3..0525883f4639 100644
> --- a/net/bluetooth/hci_sock.c
> +++ b/net/bluetooth/hci_sock.c
> @@ -759,19 +759,14 @@ void hci_sock_dev_event(struct hci_dev *hdev, int event)
> if (event == HCI_DEV_UNREG) {
> struct sock *sk;
>
> - /* Detach sockets from device */
> + /* Change socket state and notify */
> read_lock(&hci_sk_list.lock);
> sk_for_each(sk, &hci_sk_list.head) {
> - lock_sock(sk);
> if (hci_pi(sk)->hdev == hdev) {
> - hci_pi(sk)->hdev = NULL;
> sk->sk_err = EPIPE;
> sk->sk_state = BT_OPEN;
> sk->sk_state_change(sk);
> -
> - hci_dev_put(hdev);
> }
> - release_sock(sk);
> }
> read_unlock(&hci_sk_list.lock);
> }
>
> ? I can't judge because I don't know how this works. I worry that
> without lock_sock()/release_sock(), this races with e.g. hci_sock_bind().
>

I examined hci_unregister_dev() and concluded that this can't work.

hci_sock_dev_event(hdev, HCI_DEV_UNREG) can't defer dropping the reference to
this hdev till hci_sock_release(), for hci_unregister_dev() cleans up everything
related to this hdev and calls hci_dev_put(hdev) and then vhci_release() calls
hci_free_dev(hdev).

That's the reason hci_sock_dev_event() has to use lock_sock() in order not to
miss some hci_dev_put(hdev) calls.

>> This sounds a little too complicated, afaik backward goto is not even
>> consider a good practice either, since it appears we don't unlink the
>> sockets here

Despite your comment, I'd like to go with choice (3) for now. After lock_sock() became
free from delay caused by pagefault handling, we could consider updating to choice (1).

2021-07-13 11:29:31

by Tetsuo Handa

[permalink] [raw]
Subject: [PATCH v3] Bluetooth: call lock_sock() outside of spinlock section

syzbot is hitting might_sleep() warning at hci_sock_dev_event() due to
calling lock_sock() with rw spinlock held [1]. Among three possible
approaches [2], this patch chose holding a refcount via sock_hold() and
revalidating the element via sk_hashed().

Link: https://syzkaller.appspot.com/bug?extid=a5df189917e79d5e59c9 [1]
Link: https://lkml.kernel.org/r/[email protected] [2]
Reported-by: syzbot <[email protected]>
Signed-off-by: Tetsuo Handa <[email protected]>
Tested-by: syzbot <[email protected]>
Fixes: e305509e678b3a4a ("Bluetooth: use correct lock to prevent UAF of hdev object")
---
Changes in v3:
Don't use unlocked hci_pi(sk)->hdev != hdev test, for it is racy.
No need to defer hci_dev_put(hdev), for it can't be the last reference.

Changes in v2:
Take hci_sk_list.lock for write in case bt_sock_unlink() is called after
sk_hashed(sk) test, and defer hci_dev_put(hdev) till schedulable context.

net/bluetooth/hci_sock.c | 30 +++++++++++++++++++++++++++++-
1 file changed, 29 insertions(+), 1 deletion(-)

diff --git a/net/bluetooth/hci_sock.c b/net/bluetooth/hci_sock.c
index b04a5a02ecf3..786a06a232fd 100644
--- a/net/bluetooth/hci_sock.c
+++ b/net/bluetooth/hci_sock.c
@@ -760,10 +760,18 @@ void hci_sock_dev_event(struct hci_dev *hdev, int event)
struct sock *sk;

/* Detach sockets from device */
+restart:
read_lock(&hci_sk_list.lock);
sk_for_each(sk, &hci_sk_list.head) {
+ /* This sock_hold(sk) is safe, for bt_sock_unlink(sk)
+ * is not called yet.
+ */
+ sock_hold(sk);
+ read_unlock(&hci_sk_list.lock);
lock_sock(sk);
- if (hci_pi(sk)->hdev == hdev) {
+ write_lock(&hci_sk_list.lock);
+ /* Check that bt_sock_unlink(sk) is not called yet. */
+ if (sk_hashed(sk) && hci_pi(sk)->hdev == hdev) {
hci_pi(sk)->hdev = NULL;
sk->sk_err = EPIPE;
sk->sk_state = BT_OPEN;
@@ -771,7 +779,27 @@ void hci_sock_dev_event(struct hci_dev *hdev, int event)

hci_dev_put(hdev);
}
+ write_unlock(&hci_sk_list.lock);
release_sock(sk);
+ read_lock(&hci_sk_list.lock);
+ /* If bt_sock_unlink(sk) is not called yet, we can
+ * continue iteration. We can use __sock_put(sk) here
+ * because hci_sock_release() will call sock_put(sk)
+ * after bt_sock_unlink(sk).
+ */
+ if (sk_hashed(sk)) {
+ __sock_put(sk);
+ continue;
+ }
+ /* Otherwise, we need to restart iteration, for the
+ * next socket pointed by sk->next might be already
+ * gone. We can't use __sock_put(sk) here because
+ * hci_sock_release() might have already called
+ * sock_put(sk) after bt_sock_unlink(sk).
+ */
+ read_unlock(&hci_sk_list.lock);
+ sock_put(sk);
+ goto restart;
}
read_unlock(&hci_sk_list.lock);
}
--
2.18.4

2021-07-13 12:00:08

by bluez.test.bot

[permalink] [raw]
Subject: RE: [v3] Bluetooth: call lock_sock() outside of spinlock section

This is automated email and please do not reply to this email!

Dear submitter,

Thank you for submitting the patches to the linux bluetooth mailing list.
This is a CI test results with your patch series:
PW Link:https://patchwork.kernel.org/project/bluetooth/list/?series=514717

---Test result---

Test Summary:
CheckPatch PASS 0.58 seconds
GitLint FAIL 0.10 seconds
BuildKernel PASS 507.74 seconds
TestRunner: Setup PASS 331.78 seconds
TestRunner: l2cap-tester PASS 2.49 seconds
TestRunner: bnep-tester PASS 1.90 seconds
TestRunner: mgmt-tester PASS 30.48 seconds
TestRunner: rfcomm-tester PASS 2.05 seconds
TestRunner: sco-tester PASS 1.98 seconds
TestRunner: smp-tester FAIL 2.02 seconds
TestRunner: userchan-tester PASS 1.88 seconds

Details
##############################
Test: CheckPatch - PASS - 0.58 seconds
Run checkpatch.pl script with rule in .checkpatch.conf


##############################
Test: GitLint - FAIL - 0.10 seconds
Run gitlint with rule in .gitlint
Bluetooth: call lock_sock() outside of spinlock section
9: B1 Line exceeds max length (92>80): "Link: https://lkml.kernel.org/r/[email protected] [2]"
13: B1 Line exceeds max length (85>80): "Fixes: e305509e678b3a4a ("Bluetooth: use correct lock to prevent UAF of hdev object")"


##############################
Test: BuildKernel - PASS - 507.74 seconds
Build Kernel with minimal configuration supports Bluetooth


##############################
Test: TestRunner: Setup - PASS - 331.78 seconds
Setup environment for running Test Runner


##############################
Test: TestRunner: l2cap-tester - PASS - 2.49 seconds
Run test-runner with l2cap-tester
Total: 40, Passed: 40 (100.0%), Failed: 0, Not Run: 0

##############################
Test: TestRunner: bnep-tester - PASS - 1.90 seconds
Run test-runner with bnep-tester
Total: 1, Passed: 1 (100.0%), Failed: 0, Not Run: 0

##############################
Test: TestRunner: mgmt-tester - PASS - 30.48 seconds
Run test-runner with mgmt-tester
Total: 446, Passed: 443 (99.3%), Failed: 0, Not Run: 3

##############################
Test: TestRunner: rfcomm-tester - PASS - 2.05 seconds
Run test-runner with rfcomm-tester
Total: 9, Passed: 9 (100.0%), Failed: 0, Not Run: 0

##############################
Test: TestRunner: sco-tester - PASS - 1.98 seconds
Run test-runner with sco-tester
Total: 8, Passed: 8 (100.0%), Failed: 0, Not Run: 0

##############################
Test: TestRunner: smp-tester - FAIL - 2.02 seconds
Run test-runner with smp-tester
Total: 8, Passed: 7 (87.5%), Failed: 1, Not Run: 0

Failed Test Cases
SMP Client - SC Request 2 Failed 0.020 seconds

##############################
Test: TestRunner: userchan-tester - PASS - 1.88 seconds
Run test-runner with userchan-tester
Total: 3, Passed: 3 (100.0%), Failed: 0, Not Run: 0



---
Regards,
Linux Bluetooth


Attachments:
l2cap-tester.log (43.31 kB)
bnep-tester.log (3.47 kB)
mgmt-tester.log (600.00 kB)
rfcomm-tester.log (11.40 kB)
sco-tester.log (9.68 kB)
smp-tester.log (11.43 kB)
userchan-tester.log (5.33 kB)
Download all attachments

2021-07-14 19:21:54

by Luiz Augusto von Dentz

[permalink] [raw]
Subject: Re: [PATCH v3] Bluetooth: call lock_sock() outside of spinlock section

Hi Tetsuo,

On Tue, Jul 13, 2021 at 4:28 AM Tetsuo Handa
<[email protected]> wrote:
>
> syzbot is hitting might_sleep() warning at hci_sock_dev_event() due to
> calling lock_sock() with rw spinlock held [1]. Among three possible
> approaches [2], this patch chose holding a refcount via sock_hold() and
> revalidating the element via sk_hashed().
>
> Link: https://syzkaller.appspot.com/bug?extid=a5df189917e79d5e59c9 [1]
> Link: https://lkml.kernel.org/r/[email protected] [2]
> Reported-by: syzbot <[email protected]>
> Signed-off-by: Tetsuo Handa <[email protected]>
> Tested-by: syzbot <[email protected]>
> Fixes: e305509e678b3a4a ("Bluetooth: use correct lock to prevent UAF of hdev object")
> ---
> Changes in v3:
> Don't use unlocked hci_pi(sk)->hdev != hdev test, for it is racy.
> No need to defer hci_dev_put(hdev), for it can't be the last reference.
>
> Changes in v2:
> Take hci_sk_list.lock for write in case bt_sock_unlink() is called after
> sk_hashed(sk) test, and defer hci_dev_put(hdev) till schedulable context.

How about we revert back to use bh_lock_sock_nested but use
local_bh_disable like the following patch:

https://patchwork.kernel.org/project/bluetooth/patch/[email protected]/

> net/bluetooth/hci_sock.c | 30 +++++++++++++++++++++++++++++-
> 1 file changed, 29 insertions(+), 1 deletion(-)
>
> diff --git a/net/bluetooth/hci_sock.c b/net/bluetooth/hci_sock.c
> index b04a5a02ecf3..786a06a232fd 100644
> --- a/net/bluetooth/hci_sock.c
> +++ b/net/bluetooth/hci_sock.c
> @@ -760,10 +760,18 @@ void hci_sock_dev_event(struct hci_dev *hdev, int event)
> struct sock *sk;
>
> /* Detach sockets from device */
> +restart:
> read_lock(&hci_sk_list.lock);
> sk_for_each(sk, &hci_sk_list.head) {
> + /* This sock_hold(sk) is safe, for bt_sock_unlink(sk)
> + * is not called yet.
> + */
> + sock_hold(sk);
> + read_unlock(&hci_sk_list.lock);
> lock_sock(sk);
> - if (hci_pi(sk)->hdev == hdev) {
> + write_lock(&hci_sk_list.lock);
> + /* Check that bt_sock_unlink(sk) is not called yet. */
> + if (sk_hashed(sk) && hci_pi(sk)->hdev == hdev) {
> hci_pi(sk)->hdev = NULL;
> sk->sk_err = EPIPE;
> sk->sk_state = BT_OPEN;
> @@ -771,7 +779,27 @@ void hci_sock_dev_event(struct hci_dev *hdev, int event)
>
> hci_dev_put(hdev);
> }
> + write_unlock(&hci_sk_list.lock);
> release_sock(sk);
> + read_lock(&hci_sk_list.lock);
> + /* If bt_sock_unlink(sk) is not called yet, we can
> + * continue iteration. We can use __sock_put(sk) here
> + * because hci_sock_release() will call sock_put(sk)
> + * after bt_sock_unlink(sk).
> + */
> + if (sk_hashed(sk)) {
> + __sock_put(sk);
> + continue;
> + }
> + /* Otherwise, we need to restart iteration, for the
> + * next socket pointed by sk->next might be already
> + * gone. We can't use __sock_put(sk) here because
> + * hci_sock_release() might have already called
> + * sock_put(sk) after bt_sock_unlink(sk).
> + */
> + read_unlock(&hci_sk_list.lock);
> + sock_put(sk);
> + goto restart;
> }
> read_unlock(&hci_sk_list.lock);
> }
> --
> 2.18.4
>


--
Luiz Augusto von Dentz

2021-07-15 03:15:23

by LinMa

[permalink] [raw]
Subject: Re: Re: [PATCH v3] Bluetooth: call lock_sock() outside of spinlock section

Hi there,

I'm just exhilarated to see there have been some new ideas to fix this.

>
> How about we revert back to use bh_lock_sock_nested but use
> local_bh_disable like the following patch:
>
> https://patchwork.kernel.org/project/bluetooth/patch/[email protected]/
>

I have checked that patch and learn about some `local_bh_disable/enable` usage.
To the best of my knowledge, the local_bh_disable() function can be used to disable the processing of bottom halves (softirqs).
Or in another word, if process context function, hci_sock_sendmsg() for example, can mask the BH (hci_dev_do_close()?). It doesn't need to worry about the UAF.

However, after doing some experiments, I failed :(
For instance, I try to do following patch:

--- a/net/bluetooth/hci_sock.c
+++ b/net/bluetooth/hci_sock.c
@@ -1720,6 +1720,7 @@ static int hci_sock_sendmsg(struct socket *sock, struct msghdr *msg,
return -EINVAL;

lock_sock(sk);
+ local_bh_disable();

switch (hci_pi(sk)->channel) {
case HCI_CHANNEL_RAW:
@@ -1832,7 +1833,9 @@ static int hci_sock_sendmsg(struct socket *sock, struct msghdr *msg,
err = len;

done:
+ local_bh_enable();
release_sock(sk);
+
return err;

But the POC code shows error message like below:

[ 18.169155] BUG: sleeping function called from invalid context at include/linux/sched/mm.h:197
[ 18.170181] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 120, name: exp
[ 18.170987] 1 lock held by exp/120:
[ 18.171384] #0: ffff888011dd5120 (sk_lock-AF_BLUETOOTH-BTPROTO_HCI){+.+.}-{0:0}, at: hci_sock_sendmsg+0x11e/0x26c0
[ 18.172300] CPU: 0 PID: 120 Comm: exp Not tainted 5.11.11+ #44
[ 18.172921] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
...

The patch provided by Desmond adds the local_bh_disable() before the bh_lock_sock() so I also try that in

--- a/net/bluetooth/hci_sock.c
+++ b/net/bluetooth/hci_sock.c
@@ -762,6 +762,7 @@ void hci_sock_dev_event(struct hci_dev *hdev, int event)
/* Detach sockets from device */
read_lock(&hci_sk_list.lock);
sk_for_each(sk, &hci_sk_list.head) {
+ local_bh_disable();
bh_lock_sock_nested(sk);
if (hci_pi(sk)->hdev == hdev) {
hci_pi(sk)->hdev = NULL;
@@ -772,6 +773,7 @@ void hci_sock_dev_event(struct hci_dev *hdev, int event)
hci_dev_put(hdev);
}
bh_unlock_sock(sk);
+ local_bh_enable();
}
read_unlock(&hci_sk_list.lock);
}

But this is not useful, the UAF still occurs

[ 13.862117] ==================================================================
[ 13.863064] BUG: KASAN: use-after-free in __lock_acquire+0xe5/0x2ca0
[ 13.863852] Read of size 8 at addr ffff888011d9aeb0 by task exp/119
[ 13.864620]
[ 13.864818] CPU: 0 PID: 119 Comm: exp Not tainted 5.11.11+ #45
[ 13.865543] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
[ 13.866634] Call Trace:
[ 13.866947] dump_stack+0x183/0x22e
[ 13.867389] ? show_regs_print_info+0x12/0x12
[ 13.867927] ? log_buf_vmcoreinfo_setup+0x45d/0x45d
[ 13.868503] ? _raw_spin_lock_irqsave+0xbd/0x100
[ 13.869244] print_address_description+0x7b/0x3a0
[ 13.869828] __kasan_report+0x14e/0x200
[ 13.870288] ? __lock_acquire+0xe5/0x2ca0
[ 13.870768] kasan_report+0x47/0x60
[ 13.871189] __lock_acquire+0xe5/0x2ca0
[ 13.871647] ? lock_acquire+0x168/0x6a0
[ 13.872107] ? trace_lock_release+0x5c/0x120
[ 13.872615] ? do_user_addr_fault+0x9c2/0xdb0
[ 13.873135] ? trace_lock_acquire+0x150/0x150
[ 13.873661] ? rcu_read_lock_sched_held+0x87/0x110
[ 13.874232] ? perf_trace_rcu_barrier+0x360/0x360
[ 13.874790] ? avc_has_perm_noaudit+0x442/0x4c0
[ 13.875332] lock_acquire+0x168/0x6a0
[ 13.875772] ? skb_queue_tail+0x32/0x120
[ 13.876240] ? do_kern_addr_fault+0x230/0x230
[ 13.876756] ? read_lock_is_recursive+0x10/0x10
[ 13.877300] ? exc_page_fault+0xf3/0x1b0
[ 13.877770] ? cred_has_capability+0x191/0x3f0
[ 13.878290] ? cred_has_capability+0x2a1/0x3f0
[ 13.878816] ? rcu_lock_release+0x20/0x20
[ 13.879295] _raw_spin_lock_irqsave+0xb1/0x100
[ 13.879821] ? skb_queue_tail+0x32/0x120
[ 13.880287] ? _raw_spin_lock+0x40/0x40
[ 13.880745] skb_queue_tail+0x32/0x120
[ 13.881194] hci_sock_sendmsg+0x1545/0x26b0

From my point of view, adding the local_bh_disable() cannot prevent current hci_sock_dev_event() to set and decrease the ref-count. It's not quite similar with the cases that Desmond discussed.
(Or maybe just I don't know how to use this).

I recently tried to find some similar cases (and I did, reported to security already but get no reply) and figure out how others are fixed.
Some guideline tells me that (http://books.gigatux.nl/mirror/kerneldevelopment/0672327201/ch07lev1sec6.html)

"If process context code and a bottom half share data, you need to disable bottom-half processing and obtain a lock before accessing the data. Doing both ensures local and SMP protection and prevents a deadlock."

Assuming hci_sock_sendmsg()/hci_sock_bound_ioctl() are the process contexts while the hci_sock_dev_event(), not sure, is the BH context. The fact is that the hci_sock_dev_event() should wait for the process contexts. Hence, I think Tetsuo is on the right way.

Regards
Lock-Noob LinMa



2021-07-16 03:49:59

by Desmond Cheong Zhi Xi

[permalink] [raw]
Subject: Re: [PATCH v3] Bluetooth: call lock_sock() outside of spinlock section

On 15/7/21 11:03 am, LinMa wrote:
> Hi there,
>
> I'm just exhilarated to see there have been some new ideas to fix this.
>
>>
>> How about we revert back to use bh_lock_sock_nested but use
>> local_bh_disable like the following patch:
>>
>> https://patchwork.kernel.org/project/bluetooth/patch/[email protected]/
>>
>
> I have checked that patch and learn about some `local_bh_disable/enable` usage.
> To the best of my knowledge, the local_bh_disable() function can be used to disable the processing of bottom halves (softirqs).
> Or in another word, if process context function, hci_sock_sendmsg() for example, can mask the BH (hci_dev_do_close()?). It doesn't need to worry about the UAF.
>
> However, after doing some experiments, I failed :(
> For instance, I try to do following patch:
>
> --- a/net/bluetooth/hci_sock.c
> +++ b/net/bluetooth/hci_sock.c
> @@ -1720,6 +1720,7 @@ static int hci_sock_sendmsg(struct socket *sock, struct msghdr *msg,
> return -EINVAL;
>
> lock_sock(sk);
> + local_bh_disable();
>
> switch (hci_pi(sk)->channel) {
> case HCI_CHANNEL_RAW:
> @@ -1832,7 +1833,9 @@ static int hci_sock_sendmsg(struct socket *sock, struct msghdr *msg,
> err = len;
>
> done:
> + local_bh_enable();
> release_sock(sk);
> +
> return err;
>
> But the POC code shows error message like below:
>
> [ 18.169155] BUG: sleeping function called from invalid context at include/linux/sched/mm.h:197
> [ 18.170181] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 120, name: exp
> [ 18.170987] 1 lock held by exp/120:
> [ 18.171384] #0: ffff888011dd5120 (sk_lock-AF_BLUETOOTH-BTPROTO_HCI){+.+.}-{0:0}, at: hci_sock_sendmsg+0x11e/0x26c0
> [ 18.172300] CPU: 0 PID: 120 Comm: exp Not tainted 5.11.11+ #44
> [ 18.172921] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
> ...

Hi,

Saw this and thought I'd offer my two cents.

This is the original problem that Tetsuo's patch was trying to fix.
Under the hood of lock_sock, we call lock_sock_nested which might sleep
because of the mutex_acquire. But we shouldn't sleep while holding the
rw spinlock. So we either have to acquire a spinlock instead of a mutex
as was done before, or we need to move lock_sock out of the rw spinlock
critical section as Tetsuo proposes.

>
> The patch provided by Desmond adds the local_bh_disable() before the bh_lock_sock() so I also try that in
>
> --- a/net/bluetooth/hci_sock.c
> +++ b/net/bluetooth/hci_sock.c
> @@ -762,6 +762,7 @@ void hci_sock_dev_event(struct hci_dev *hdev, int event)
> /* Detach sockets from device */
> read_lock(&hci_sk_list.lock);
> sk_for_each(sk, &hci_sk_list.head) {
> + local_bh_disable();
> bh_lock_sock_nested(sk);
> if (hci_pi(sk)->hdev == hdev) {
> hci_pi(sk)->hdev = NULL;
> @@ -772,6 +773,7 @@ void hci_sock_dev_event(struct hci_dev *hdev, int event)
> hci_dev_put(hdev);
> }
> bh_unlock_sock(sk);
> + local_bh_enable();
> }
> read_unlock(&hci_sk_list.lock);
> }
>
> But this is not useful, the UAF still occurs
>

I might be very mistaken on this, but I believe the UAF still happens
because you can't really mix bh_lock_sock* and lock_sock* to protect the
same things. The former holds the spinlock &sk->sk_lock.slock and
synchronizes between user contexts and bottom halves, while the latter
holds a mutex on &sk->sk_lock.dep_map to synchronize between multiple users.

One option I can think of would be to switch instances of lock_sock to
bh_lock_sock_nested for users that might race (such as hci_sock_sendmsg,
hci_sock_bound_ioctl, and others as needed). But I'm not sure if that's
quite what we want, plus we would need to ensure that sleeping functions
aren't called between the bh_lock/unlock.

Best wishes,
Desmond

> [ 13.862117] ==================================================================
> [ 13.863064] BUG: KASAN: use-after-free in __lock_acquire+0xe5/0x2ca0
> [ 13.863852] Read of size 8 at addr ffff888011d9aeb0 by task exp/119
> [ 13.864620]
> [ 13.864818] CPU: 0 PID: 119 Comm: exp Not tainted 5.11.11+ #45
> [ 13.865543] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
> [ 13.866634] Call Trace:
> [ 13.866947] dump_stack+0x183/0x22e
> [ 13.867389] ? show_regs_print_info+0x12/0x12
> [ 13.867927] ? log_buf_vmcoreinfo_setup+0x45d/0x45d
> [ 13.868503] ? _raw_spin_lock_irqsave+0xbd/0x100
> [ 13.869244] print_address_description+0x7b/0x3a0
> [ 13.869828] __kasan_report+0x14e/0x200
> [ 13.870288] ? __lock_acquire+0xe5/0x2ca0
> [ 13.870768] kasan_report+0x47/0x60
> [ 13.871189] __lock_acquire+0xe5/0x2ca0
> [ 13.871647] ? lock_acquire+0x168/0x6a0
> [ 13.872107] ? trace_lock_release+0x5c/0x120
> [ 13.872615] ? do_user_addr_fault+0x9c2/0xdb0
> [ 13.873135] ? trace_lock_acquire+0x150/0x150
> [ 13.873661] ? rcu_read_lock_sched_held+0x87/0x110
> [ 13.874232] ? perf_trace_rcu_barrier+0x360/0x360
> [ 13.874790] ? avc_has_perm_noaudit+0x442/0x4c0
> [ 13.875332] lock_acquire+0x168/0x6a0
> [ 13.875772] ? skb_queue_tail+0x32/0x120
> [ 13.876240] ? do_kern_addr_fault+0x230/0x230
> [ 13.876756] ? read_lock_is_recursive+0x10/0x10
> [ 13.877300] ? exc_page_fault+0xf3/0x1b0
> [ 13.877770] ? cred_has_capability+0x191/0x3f0
> [ 13.878290] ? cred_has_capability+0x2a1/0x3f0
> [ 13.878816] ? rcu_lock_release+0x20/0x20
> [ 13.879295] _raw_spin_lock_irqsave+0xb1/0x100
> [ 13.879821] ? skb_queue_tail+0x32/0x120
> [ 13.880287] ? _raw_spin_lock+0x40/0x40
> [ 13.880745] skb_queue_tail+0x32/0x120
> [ 13.881194] hci_sock_sendmsg+0x1545/0x26b0
>
> From my point of view, adding the local_bh_disable() cannot prevent current hci_sock_dev_event() to set and decrease the ref-count. It's not quite similar with the cases that Desmond discussed.
> (Or maybe just I don't know how to use this).
> > I recently tried to find some similar cases (and I did, reported to
security already but get no reply) and figure out how others are fixed.
> Some guideline tells me that (http://books.gigatux.nl/mirror/kerneldevelopment/0672327201/ch07lev1sec6.html)
>
> "If process context code and a bottom half share data, you need to disable bottom-half processing and obtain a lock before accessing the data. Doing both ensures local and SMP protection and prevents a deadlock."
>
> Assuming hci_sock_sendmsg()/hci_sock_bound_ioctl() are the process contexts while the hci_sock_dev_event(), not sure, is the BH context. The fact is that the hci_sock_dev_event() should wait for the process contexts. Hence, I think Tetsuo is on the right way.
>
> Regards
> Lock-Noob LinMa
>
>
>

2021-07-16 04:16:55

by Desmond Cheong Zhi Xi

[permalink] [raw]
Subject: Re: [PATCH v3] Bluetooth: call lock_sock() outside of spinlock section

On 16/7/21 11:47 am, Desmond Cheong Zhi Xi wrote:
> On 15/7/21 11:03 am, LinMa wrote:
>> Hi there,
>>
>> I'm just exhilarated to see there have been some new ideas to fix this.
>>
>>>
>>> How about we revert back to use bh_lock_sock_nested but use
>>> local_bh_disable like the following patch:
>>>
>>> https://patchwork.kernel.org/project/bluetooth/patch/[email protected]/
>>>
>>>
>>
>> I have checked that patch and learn about some
>> `local_bh_disable/enable` usage.
>> To the best of my knowledge, the local_bh_disable() function can be
>> used to disable the processing of bottom halves (softirqs).
>> Or in another word, if process context function, hci_sock_sendmsg()
>> for example, can mask the BH (hci_dev_do_close()?). It doesn't need to
>> worry about the UAF.
>>
>> However, after doing some experiments, I failed :(
>> For instance, I try to do following patch:
>>
>> --- a/net/bluetooth/hci_sock.c
>> +++ b/net/bluetooth/hci_sock.c
>> @@ -1720,6 +1720,7 @@ static int hci_sock_sendmsg(struct socket *sock,
>> struct msghdr *msg,
>>                  return -EINVAL;
>>
>>          lock_sock(sk);
>> +       local_bh_disable();
>>
>>          switch (hci_pi(sk)->channel) {
>>          case HCI_CHANNEL_RAW:
>> @@ -1832,7 +1833,9 @@ static int hci_sock_sendmsg(struct socket *sock,
>> struct msghdr *msg,
>>          err = len;
>>
>>   done:
>> +       local_bh_enable();
>>          release_sock(sk);
>> +
>>          return err;
>>
>> But the POC code shows error message like below:
>>
>> [   18.169155] BUG: sleeping function called from invalid context at
>> include/linux/sched/mm.h:197
>> [   18.170181] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid:
>> 120, name: exp
>> [   18.170987] 1 lock held by exp/120:
>> [   18.171384]  #0: ffff888011dd5120
>> (sk_lock-AF_BLUETOOTH-BTPROTO_HCI){+.+.}-{0:0}, at:
>> hci_sock_sendmsg+0x11e/0x26c0
>> [   18.172300] CPU: 0 PID: 120 Comm: exp Not tainted 5.11.11+ #44
>> [   18.172921] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
>> BIOS 1.10.2-1ubuntu1 04/01/2014
>> ...
>
> Hi,
>
> Saw this and thought I'd offer my two cents.
> BUG: sleeping function called from invalid context
> This is the original problem that Tetsuo's patch was trying to fix.
> Under the hood of lock_sock, we call lock_sock_nested which might sleep
> because of the mutex_acquire. But we shouldn't sleep while holding the
> rw spinlock. So we either have to acquire a spinlock instead of a mutex
> as was done before, or we need to move lock_sock out of the rw spinlock
> critical section as Tetsuo proposes.
>

My bad, was thinking more about the problem and noticed your poc was for
hci_sock_sendmsg, not hci_sock_dev_event. In this case, it's not clear
to me why the atomic context is being violated.

Sorry for the noise.

>>
>> The patch provided by Desmond adds the local_bh_disable() before the
>> bh_lock_sock() so I also try that in
>>
>> --- a/net/bluetooth/hci_sock.c
>> +++ b/net/bluetooth/hci_sock.c
>> @@ -762,6 +762,7 @@ void hci_sock_dev_event(struct hci_dev *hdev, int
>> event)
>>                  /* Detach sockets from device */
>>                  read_lock(&hci_sk_list.lock);
>>                  sk_for_each(sk, &hci_sk_list.head) {
>> +                       local_bh_disable();
>>                          bh_lock_sock_nested(sk);
>>                          if (hci_pi(sk)->hdev == hdev) {
>>                                  hci_pi(sk)->hdev = NULL;
>> @@ -772,6 +773,7 @@ void hci_sock_dev_event(struct hci_dev *hdev, int
>> event)
>>                                  hci_dev_put(hdev);
>>                          }
>>                          bh_unlock_sock(sk);
>> +                       local_bh_enable();
>>                  }
>>                  read_unlock(&hci_sk_list.lock);
>>          }
>>
>> But this is not useful, the UAF still occurs
>>
>
> I might be very mistaken on this, but I believe the UAF still happens
> because you can't really mix bh_lock_sock* and lock_sock* to protect the
> same things. The former holds the spinlock &sk->sk_lock.slock and
> synchronizes between user contexts and bottom halves, while the latter
> holds a mutex on &sk->sk_lock.dep_map to synchronize between multiple
> users.
>
> One option I can think of would be to switch instances of lock_sock to
> bh_lock_sock_nested for users that might race (such as hci_sock_sendmsg,
> hci_sock_bound_ioctl, and others as needed). But I'm not sure if that's
> quite what we want, plus we would need to ensure that sleeping functions
> aren't called between the bh_lock/unlock.
>
> Best wishes,
> Desmond
>
>> [   13.862117]
>> ==================================================================
>> [   13.863064] BUG: KASAN: use-after-free in __lock_acquire+0xe5/0x2ca0
>> [   13.863852] Read of size 8 at addr ffff888011d9aeb0 by task exp/119
>> [   13.864620]
>> [   13.864818] CPU: 0 PID: 119 Comm: exp Not tainted 5.11.11+ #45
>> [   13.865543] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
>> BIOS 1.10.2-1ubuntu1 04/01/2014
>> [   13.866634] Call Trace:
>> [   13.866947]  dump_stack+0x183/0x22e
>> [   13.867389]  ? show_regs_print_info+0x12/0x12
>> [   13.867927]  ? log_buf_vmcoreinfo_setup+0x45d/0x45d
>> [   13.868503]  ? _raw_spin_lock_irqsave+0xbd/0x100
>> [   13.869244]  print_address_description+0x7b/0x3a0
>> [   13.869828]  __kasan_report+0x14e/0x200
>> [   13.870288]  ? __lock_acquire+0xe5/0x2ca0
>> [   13.870768]  kasan_report+0x47/0x60
>> [   13.871189]  __lock_acquire+0xe5/0x2ca0
>> [   13.871647]  ? lock_acquire+0x168/0x6a0
>> [   13.872107]  ? trace_lock_release+0x5c/0x120
>> [   13.872615]  ? do_user_addr_fault+0x9c2/0xdb0
>> [   13.873135]  ? trace_lock_acquire+0x150/0x150
>> [   13.873661]  ? rcu_read_lock_sched_held+0x87/0x110
>> [   13.874232]  ? perf_trace_rcu_barrier+0x360/0x360
>> [   13.874790]  ? avc_has_perm_noaudit+0x442/0x4c0
>> [   13.875332]  lock_acquire+0x168/0x6a0
>> [   13.875772]  ? skb_queue_tail+0x32/0x120
>> [   13.876240]  ? do_kern_addr_fault+0x230/0x230
>> [   13.876756]  ? read_lock_is_recursive+0x10/0x10
>> [   13.877300]  ? exc_page_fault+0xf3/0x1b0
>> [   13.877770]  ? cred_has_capability+0x191/0x3f0
>> [   13.878290]  ? cred_has_capability+0x2a1/0x3f0
>> [   13.878816]  ? rcu_lock_release+0x20/0x20
>> [   13.879295]  _raw_spin_lock_irqsave+0xb1/0x100
>> [   13.879821]  ? skb_queue_tail+0x32/0x120
>> [   13.880287]  ? _raw_spin_lock+0x40/0x40
>> [   13.880745]  skb_queue_tail+0x32/0x120
>> [   13.881194]  hci_sock_sendmsg+0x1545/0x26b0
>>
>>  From my point of view, adding the local_bh_disable() cannot prevent
>> current hci_sock_dev_event() to set and decrease the ref-count. It's
>> not quite similar with the cases that Desmond discussed.
>> (Or maybe just I don't know how to use this).
>>  > I recently tried to find some similar cases (and I did, reported to
> security already but get no reply) and figure out how others are fixed.
>> Some guideline tells me that
>> (http://books.gigatux.nl/mirror/kerneldevelopment/0672327201/ch07lev1sec6.html)
>>
>>
>> "If process context code and a bottom half share data, you need to
>> disable bottom-half processing and obtain a lock before accessing the
>> data. Doing both ensures local and SMP protection and prevents a
>> deadlock."
>>
>> Assuming hci_sock_sendmsg()/hci_sock_bound_ioctl() are the process
>> contexts while the hci_sock_dev_event(), not sure, is the BH context.
>> The fact is that the hci_sock_dev_event() should wait for the process
>> contexts. Hence, I think Tetsuo is on the right way.
>>
>> Regards
>> Lock-Noob LinMa
>>
>>
>>
>

2021-07-16 14:50:37

by Tetsuo Handa

[permalink] [raw]
Subject: Re: [PATCH v3] Bluetooth: call lock_sock() outside of spinlock section

On 2021/07/16 13:11, Desmond Cheong Zhi Xi wrote:
> On 16/7/21 11:47 am, Desmond Cheong Zhi Xi wrote:
>> Saw this and thought I'd offer my two cents.
>> BUG: sleeping function called from invalid context
>> This is the original problem that Tetsuo's patch was trying to fix.

Yes.

>> Under the hood of lock_sock, we call lock_sock_nested which might sleep
>> because of the mutex_acquire.

Both lock_sock() and lock_sock_nested() might sleep.

>> But we shouldn't sleep while holding the rw spinlock.

Right. In atomic context (e.g. inside interrupt handler, schedulable context
with interrupts or preemption disabled, schedulable context inside RCU read
critical section, schedulable context inside a spinlock section), we must not
call functions (e.g. waiting for a mutex, waiting for a semaphore, waiting for
a page fault) which are not atomic.

>> So we either have to acquire a spinlock instead of a mutex as was done before,

Regarding hci_sock_dev_event(HCI_DEV_UNREG) case, we can't use a spinlock.

Like LinMa explained, lock_sock() has to be used in order to serialize functions
(e.g. hci_sock_sendmsg()) which access hci_pi(sk)->hdev between lock_sock(sk) and
release_sock(sk). And like I explained, we can't defer resetting hci_pi(sk)->hdev
to NULL, for hci_sock_dev_event(HCI_DEV_UNREG) is responsible for resetting
hci_pi(sk)->hdev to NULL because the caller of hci_sock_dev_event(HCI_DEV_UNREG)
immediately destroys resources associated with this hdev.

>> or we need to move lock_sock out of the rw spinlock critical section as Tetsuo proposes.

Exactly. Since this is a regression introduced when fixing CVE-2021-3573, Linux
distributors are waiting for this patch so that they can apply the fix for CVE-2021-3573.
This patch should be sent to linux.git and stables as soon as possible. But due to little
attention on this patch, I'm already testing this patch in linux-next.git via my tree.
I'll drop when Bluetooth maintainers pick this patch up for linux-5.14-rcX. (Or should I
directly send to Linus?)

>>
>
> My bad, was thinking more about the problem and noticed your poc was for hci_sock_sendmsg,
> not hci_sock_dev_event.

I didn't catch this part. Are you talking about a different poc?
As far as I'm aware, exp.c in POC.zip was for hci_sock_bound_ioctl(HCIUNBLOCKADDR).

hci_sock_bound_ioctl(HCIUNBLOCKADDR) (which is called between lock_sock() and release_sock())
calls copy_from_user() which might cause page fault, and userfaultfd mechanism allows an attacker
to slowdown page fault handling enough to hci_sock_dev_event(HCI_DEV_UNREG) to return without
waiting for hci_sock_bound_ioctl(HCIUNBLOCKADDR) to call release_sock(). This race window
results in UAF (doesn't it, LinMa?).

> In this case, it's not clear to me why the atomic context is being violated.

In atomic context (in hci_sock_dev_event(HCI_DEV_UNREG) case, between
read_lock(&hci_sk_list.lock) and read_unlock(&hci_sk_list.lock)), we must not call
lock_sock(sk) which might wait for hci_sock_bound_ioctl(HCIUNBLOCKADDR) to call release_sock().

>
> Sorry for the noise.
>
>>>
>>> The patch provided by Desmond adds the local_bh_disable() before the bh_lock_sock() so I also try that in
>>>
>>> --- a/net/bluetooth/hci_sock.c
>>> +++ b/net/bluetooth/hci_sock.c
>>> @@ -762,6 +762,7 @@ void hci_sock_dev_event(struct hci_dev *hdev, int event)
>>> /* Detach sockets from device */
>>> read_lock(&hci_sk_list.lock);
>>> sk_for_each(sk, &hci_sk_list.head) {
>>> + local_bh_disable();
>>> bh_lock_sock_nested(sk);
>>> if (hci_pi(sk)->hdev == hdev) {
>>> hci_pi(sk)->hdev = NULL;
>>> @@ -772,6 +773,7 @@ void hci_sock_dev_event(struct hci_dev *hdev, int event)
>>> hci_dev_put(hdev);
>>> }
>>> bh_unlock_sock(sk);
>>> + local_bh_enable();
>>> }
>>> read_unlock(&hci_sk_list.lock);
>>> }
>>>
>>> But this is not useful, the UAF still occurs
>>>
>>
>> I might be very mistaken on this, but I believe the UAF still happens because
>> you can't really mix bh_lock_sock* and lock_sock* to protect the same things.

Right. https://www.kernel.org/doc/html/v5.13/kernel-hacking/locking.html

>> The former holds the spinlock &sk->sk_lock.slock and synchronizes between
>> user contexts and bottom halves,

serializes access to resources which might be accessed from atomic (i.e. non-schedulable) contexts

>> while the latter holds a mutex on &sk->sk_lock.dep_map to synchronize between
>> multiple users.

serializes access to resources which are accessed from only schedulable (i.e. non-atomic) contexts

>>
>> One option I can think of would be to switch instances of lock_sock to bh_lock_sock_nested
>> for users that might race (such as hci_sock_sendmsg, hci_sock_bound_ioctl, and others as
>> needed). But I'm not sure if that's quite what we want, plus we would need to ensure that
>> sleeping functions aren't called between the bh_lock/unlock.

We can't do it for hci_sock_dev_event(HCI_DEV_UNREG).

2021-07-16 15:29:13

by LinMa

[permalink] [raw]
Subject: Re: Re: [PATCH v3] Bluetooth: call lock_sock() outside of spinlock section

Hello everyone,

Sorry, it's my fault to cause the misunderstanding.

As I keep mentioning "hci_sock_sendmsg()" instead of "hci_sock_bound_ioctl()". In fact, both these two functions are able to cause the race.

> >>
> >
> > My bad, was thinking more about the problem and noticed your poc was for hci_sock_sendmsg,
> > not hci_sock_dev_event.
>
> I didn't catch this part. Are you talking about a different poc?
> As far as I'm aware, exp.c in POC.zip was for hci_sock_bound_ioctl(HCIUNBLOCKADDR).
>
> hci_sock_bound_ioctl(HCIUNBLOCKADDR) (which is called between lock_sock() and release_sock())
> calls copy_from_user() which might cause page fault, and userfaultfd mechanism allows an attacker
> to slowdown page fault handling enough to hci_sock_dev_event(HCI_DEV_UNREG) to return without
> waiting for hci_sock_bound_ioctl(HCIUNBLOCKADDR) to call release_sock(). This race window
> results in UAF (doesn't it, LinMa?).
>

Your understanding above is quite right. In addition, because the hci_sock_sendmsg() calls the memcpy_from_msg(...), which also in fact fetch data from userspace memory, the userfaultfd can take effect as well.

(When writing the exploit for this CVE, the hci_sock_sendmsg() is much useful... so I recently keep mentioning this function)

Regards
Lin Ma

2021-07-17 15:43:48

by LinMa

[permalink] [raw]
Subject: Yet Another Patch for CVE-2021-3573

Hi everyone,

After reading large lines of code in the net directory of the kernel, I have the following thinkings and may need your guys' suggestions.

>> >> Saw this and thought I'd offer my two cents.
>> >> BUG: sleeping function called from invalid context
>> >> This is the original problem that Tetsuo's patch was trying to fix.
>>
>> Yes.
>>
>> >> Under the hood of lock_sock, we call lock_sock_nested which might sleep
>> >> because of the mutex_acquire.
>>
>> Both lock_sock() and lock_sock_nested() might sleep.
>>
>> >> But we shouldn't sleep while holding the rw spinlock.
>>
>> Right. In atomic context (e.g. inside interrupt handler, schedulable context
>> with interrupts or preemption disabled, schedulable context inside RCU read
>> critical section, schedulable context inside a spinlock section), we must not
>> call functions (e.g. waiting for a mutex, waiting for a semaphore, waiting for
>> a page fault) which are not atomic.
>>
>> >> So we either have to acquire a spinlock instead of a mutex as was done before,
>>
>> Regarding hci_sock_dev_event(HCI_DEV_UNREG) case, we can't use a spinlock.
>>
>> Like LinMa explained, lock_sock() has to be used in order to serialize functions
>> (e.g. hci_sock_sendmsg()) which access hci_pi(sk)->hdev between lock_sock(sk) and
>> release_sock(sk). And like I explained, we can't defer resetting hci_pi(sk)->hdev
>> to NULL, for hci_sock_dev_event(HCI_DEV_UNREG) is responsible for resetting
>> hci_pi(sk)->hdev to NULL because the caller of hci_sock_dev_event(HCI_DEV_UNREG)
>> immediately destroys resources associated with this hdev.
>>

This is the critical part of the BUG here. As you can read some similar code, for example, code at /net/iucv/af_iucv.c.

The iucv_sock_bind() function will bind the device:

static int iucv_sock_bind(struct socket *sock, struct sockaddr *addr,
int addr_len)
{
...
iucv->hs_dev = dev;

And this field will be assigned as NULL only when the socket is closed.

static void iucv_sock_close(struct sock *sk)
{
...
if (iucv->hs_dev) {
dev_put(iucv->hs_dev);
iucv->hs_dev = NULL;
sk->sk_bound_dev_if = 0;
}

Even in the afiucv_netdev_event() function, there is non business with iucv->hs_dev.

So why the hci_sock_dev_event(HCI_DEV_UNREG) need to set the NULL pointer and then decrease the ref-count?
As Tetsuo said, because the hci_unregister_dev() function, which is the caller of hci_sock_dev_event() will
reclaim the resource of the hdev object. It will destroy the workqueue and also clean up the sysfs.

If we achieve our patches like the iucv stack, or some other ref-count idea (https://lkml.org/lkml/2021/6/22/1347)
without care, the bad thing will happen. Because there is nothing useful in the hdev object, any changes to it make no sense.

But wait, the write or dereference for this object can be illegal, but there should be some legal actions, like reading flags?

Hence, we can still delay the release of the hdev object to hci_sock_release (like other net code does).
We just need to take care of the checking part.

One quick patch is shown below, my POC didn't trigger any warning but more checks are needed.

diff --git a/net/bluetooth/hci_sock.c b/net/bluetooth/hci_sock.c
index 251b9128f..db665f78a 100644
--- a/net/bluetooth/hci_sock.c
+++ b/net/bluetooth/hci_sock.c
@@ -764,12 +764,9 @@ void hci_sock_dev_event(struct hci_dev *hdev, int event)
sk_for_each(sk, &hci_sk_list.head) {
bh_lock_sock_nested(sk);
if (hci_pi(sk)->hdev == hdev) {
- hci_pi(sk)->hdev = NULL;
sk->sk_err = EPIPE;
sk->sk_state = BT_OPEN;
sk->sk_state_change(sk);
-
- hci_dev_put(hdev);
}
bh_unlock_sock(sk);
}
@@ -880,6 +877,7 @@ static int hci_sock_release(struct socket *sock)

atomic_dec(&hdev->promisc);
hci_dev_put(hdev);
+ hci_pi(sk)->hdev = NULL;
}

sock_orphan(sk);
@@ -1727,10 +1725,10 @@ static int hci_sock_sendmsg(struct socket *sock, struct msghdr *msg,
break;
case HCI_CHANNEL_MONITOR:
err = -EOPNOTSUPP;
- goto done;
+ goto donefast;
case HCI_CHANNEL_LOGGING:
err = hci_logging_frame(sk, msg, len);
- goto done;
+ goto donefast;
default:
mutex_lock(&mgmt_chan_list_lock);
chan = __hci_mgmt_chan_find(hci_pi(sk)->channel);
@@ -1740,15 +1738,16 @@ static int hci_sock_sendmsg(struct socket *sock, struct msghdr *msg,
err = -EINVAL;

mutex_unlock(&mgmt_chan_list_lock);
- goto done;
+ goto donefast;
}

hdev = hci_pi(sk)->hdev;
if (!hdev) {
err = -EBADFD;
- goto done;
+ goto donefast;
}

+ hci_dev_lock(hdev);
if (!test_bit(HCI_UP, &hdev->flags)) {
err = -ENETDOWN;
goto done;
@@ -1832,6 +1831,8 @@ static int hci_sock_sendmsg(struct socket *sock, struct msghdr *msg,
err = len;

done:
+ hci_dev_unlock(hdev);
+donefast:
release_sock(sk);
return err;


In short, this patch delays the hci_dev_put() to hci_sock_release() and keeps the old bh_lock_sock_nested().

Once we did that, the UAF in hci_sock_bound_ioctl() are fixed. ( The four different commands in hci_sock_bound_ioctl will just
traverse a empty linked list )

For another UAF point: hci_sock_sendmsg(), this patch uses hci_dev_lock() to make sure the flags and resource in hdev will not be
released till the sendmsg is finished. (Dislike the hci_sock_create(), the hci_sock_sendmsg() can sleep so the mutex lock is possible)

Of course, more auditing is needed but I just want to share this to you. Any suggestions and discussions will be much appreciated.


>> >> or we need to move lock_sock out of the rw spinlock critical section as Tetsuo proposes.
>>
>> Exactly. Since this is a regression introduced when fixing CVE-2021-3573, Linux
>> distributors are waiting for this patch so that they can apply the fix for CVE-2021-3573.
>> This patch should be sent to linux.git and stables as soon as possible. But due to little
>> attention on this patch, I'm already testing this patch in linux-next.git via my tree.
>> I'll drop when Bluetooth maintainers pick this patch up for linux-5.14-rcX. (Or should I
>> directly send to Linus?)
>>
>> >>
>> >
>> > My bad, was thinking more about the problem and noticed your poc was for hci_sock_sendmsg,
>> > not hci_sock_dev_event.
>>
>> I didn't catch this part. Are you talking about a different poc?
>> As far as I'm aware, exp.c in POC.zip was for hci_sock_bound_ioctl(HCIUNBLOCKADDR).
>>
>> hci_sock_bound_ioctl(HCIUNBLOCKADDR) (which is called between lock_sock() and release_sock())
>> calls copy_from_user() which might cause page fault, and userfaultfd mechanism allows an attacker
>> to slowdown page fault handling enough to hci_sock_dev_event(HCI_DEV_UNREG) to return without
>> waiting for hci_sock_bound_ioctl(HCIUNBLOCKADDR) to call release_sock(). This race window
>> results in UAF (doesn't it, LinMa?).
>>
>> > In this case, it's not clear to me why the atomic context is being violated.
>>
>> In atomic context (in hci_sock_dev_event(HCI_DEV_UNREG) case, between
>> read_lock(&hci_sk_list.lock) and read_unlock(&hci_sk_list.lock)), we must not call
>> lock_sock(sk) which might wait for hci_sock_bound_ioctl(HCIUNBLOCKADDR) to call release_sock().
>>
>> >
>> > Sorry for the noise.
>> >
>> >>>
>> >>> The patch provided by Desmond adds the local_bh_disable() before the bh_lock_sock() so I also try that in
>> >>>
>> >>> --- a/net/bluetooth/hci_sock.c
>> >>> +++ b/net/bluetooth/hci_sock.c
>> >>> @@ -762,6 +762,7 @@ void hci_sock_dev_event(struct hci_dev *hdev, int event)
>> >>> /* Detach sockets from device */
>> >>> read_lock(&hci_sk_list.lock);
>> >>> sk_for_each(sk, &hci_sk_list.head) {
>> >>> + local_bh_disable();
>> >>> bh_lock_sock_nested(sk);
>> >>> if (hci_pi(sk)->hdev == hdev) {
>> >>> hci_pi(sk)->hdev = NULL;
>> >>> @@ -772,6 +773,7 @@ void hci_sock_dev_event(struct hci_dev *hdev, int event)
>> >>> hci_dev_put(hdev);
>> >>> }
>> >>> bh_unlock_sock(sk);
>> >>> + local_bh_enable();
>> >>> }
>> >>> read_unlock(&hci_sk_list.lock);
>> >>> }
>> >>>
>> >>> But this is not useful, the UAF still occurs
>> >>>
>> >>
>> >> I might be very mistaken on this, but I believe the UAF still happens because
>> >> you can't really mix bh_lock_sock* and lock_sock* to protect the same things.
>>
>> Right. https://www.kernel.org/doc/html/v5.13/kernel-hacking/locking.html
>>
>> >> The former holds the spinlock &sk->sk_lock.slock and synchronizes between
>> >> user contexts and bottom halves,
>>
>> serializes access to resources which might be accessed from atomic (i.e. non-schedulable) contexts
>>
>> >> while the latter holds a mutex on &sk->sk_lock.dep_map to synchronize between
>> >> multiple users.
>>
>> serializes access to resources which are accessed from only schedulable (i.e. non-atomic) contexts
>>
>> >>
>> >> One option I can think of would be to switch instances of lock_sock to bh_lock_sock_nested
>> >> for users that might race (such as hci_sock_sendmsg, hci_sock_bound_ioctl, and others as
>> >> needed). But I'm not sure if that's quite what we want, plus we would need to ensure that
>> >> sleeping functions aren't called between the bh_lock/unlock.
>>
>> We can't do it for hci_sock_dev_event(HCI_DEV_UNREG).

Regards

Lin Ma

2021-07-17 15:47:50

by LinMa

[permalink] [raw]
Subject: Re: Yet Another Patch for CVE-2021-3573

Ooooooops

I just found that Luiz has already figured out one good patch:
https://www.spinics.net/lists/linux-bluetooth/msg92649.html

Sorry for the noise and happy weekend.

Thanks
Lin Ma


> -----Original Messages-----
> From: LinMa <[email protected]>
> Sent Time: 2021-07-17 23:41:22 (Saturday)
> To: "Tetsuo Handa" <[email protected]>
> Cc: "Desmond Cheong Zhi Xi" <[email protected]>, "Luiz Augusto von Dentz" <[email protected]>, "Johan Hedberg" <[email protected]>, "Marcel Holtmann" <[email protected]>, "[email protected]" <[email protected]>, "David S. Miller" <[email protected]>, "Jakub Kicinski" <[email protected]>, "open list:NETWORKING [GENERAL]" <[email protected]>
> Subject: Yet Another Patch for CVE-2021-3573
>
> Hi everyone,
>
> After reading large lines of code in the net directory of the kernel, I have the following thinkings and may need your guys' suggestions.
>
> >> >> Saw this and thought I'd offer my two cents.
> >> >> BUG: sleeping function called from invalid context
> >> >> This is the original problem that Tetsuo's patch was trying to fix.
> >>
> >> Yes.
> >>
> >> >> Under the hood of lock_sock, we call lock_sock_nested which might sleep
> >> >> because of the mutex_acquire.
> >>
> >> Both lock_sock() and lock_sock_nested() might sleep.
> >>
> >> >> But we shouldn't sleep while holding the rw spinlock.
> >>
> >> Right. In atomic context (e.g. inside interrupt handler, schedulable context
> >> with interrupts or preemption disabled, schedulable context inside RCU read
> >> critical section, schedulable context inside a spinlock section), we must not
> >> call functions (e.g. waiting for a mutex, waiting for a semaphore, waiting for
> >> a page fault) which are not atomic.
> >>
> >> >> So we either have to acquire a spinlock instead of a mutex as was done before,
> >>
> >> Regarding hci_sock_dev_event(HCI_DEV_UNREG) case, we can't use a spinlock.
> >>
> >> Like LinMa explained, lock_sock() has to be used in order to serialize functions
> >> (e.g. hci_sock_sendmsg()) which access hci_pi(sk)->hdev between lock_sock(sk) and
> >> release_sock(sk). And like I explained, we can't defer resetting hci_pi(sk)->hdev
> >> to NULL, for hci_sock_dev_event(HCI_DEV_UNREG) is responsible for resetting
> >> hci_pi(sk)->hdev to NULL because the caller of hci_sock_dev_event(HCI_DEV_UNREG)
> >> immediately destroys resources associated with this hdev.
> >>
>
> This is the critical part of the BUG here. As you can read some similar code, for example, code at /net/iucv/af_iucv.c.
>
> The iucv_sock_bind() function will bind the device:
>
> static int iucv_sock_bind(struct socket *sock, struct sockaddr *addr,
> int addr_len)
> {
> ...
> iucv->hs_dev = dev;
>
> And this field will be assigned as NULL only when the socket is closed.
>
> static void iucv_sock_close(struct sock *sk)
> {
> ...
> if (iucv->hs_dev) {
> dev_put(iucv->hs_dev);
> iucv->hs_dev = NULL;
> sk->sk_bound_dev_if = 0;
> }
>
> Even in the afiucv_netdev_event() function, there is non business with iucv->hs_dev.
>
> So why the hci_sock_dev_event(HCI_DEV_UNREG) need to set the NULL pointer and then decrease the ref-count?
> As Tetsuo said, because the hci_unregister_dev() function, which is the caller of hci_sock_dev_event() will
> reclaim the resource of the hdev object. It will destroy the workqueue and also clean up the sysfs.
>
> If we achieve our patches like the iucv stack, or some other ref-count idea (https://lkml.org/lkml/2021/6/22/1347)
> without care, the bad thing will happen. Because there is nothing useful in the hdev object, any changes to it make no sense.
>
> But wait, the write or dereference for this object can be illegal, but there should be some legal actions, like reading flags?
>
> Hence, we can still delay the release of the hdev object to hci_sock_release (like other net code does).
> We just need to take care of the checking part.
>
> One quick patch is shown below, my POC didn't trigger any warning but more checks are needed.
>
> diff --git a/net/bluetooth/hci_sock.c b/net/bluetooth/hci_sock.c
> index 251b9128f..db665f78a 100644
> --- a/net/bluetooth/hci_sock.c
> +++ b/net/bluetooth/hci_sock.c
> @@ -764,12 +764,9 @@ void hci_sock_dev_event(struct hci_dev *hdev, int event)
> sk_for_each(sk, &hci_sk_list.head) {
> bh_lock_sock_nested(sk);
> if (hci_pi(sk)->hdev == hdev) {
> - hci_pi(sk)->hdev = NULL;
> sk->sk_err = EPIPE;
> sk->sk_state = BT_OPEN;
> sk->sk_state_change(sk);
> -
> - hci_dev_put(hdev);
> }
> bh_unlock_sock(sk);
> }
> @@ -880,6 +877,7 @@ static int hci_sock_release(struct socket *sock)
>
> atomic_dec(&hdev->promisc);
> hci_dev_put(hdev);
> + hci_pi(sk)->hdev = NULL;
> }
>
> sock_orphan(sk);
> @@ -1727,10 +1725,10 @@ static int hci_sock_sendmsg(struct socket *sock, struct msghdr *msg,
> break;
> case HCI_CHANNEL_MONITOR:
> err = -EOPNOTSUPP;
> - goto done;
> + goto donefast;
> case HCI_CHANNEL_LOGGING:
> err = hci_logging_frame(sk, msg, len);
> - goto done;
> + goto donefast;
> default:
> mutex_lock(&mgmt_chan_list_lock);
> chan = __hci_mgmt_chan_find(hci_pi(sk)->channel);
> @@ -1740,15 +1738,16 @@ static int hci_sock_sendmsg(struct socket *sock, struct msghdr *msg,
> err = -EINVAL;
>
> mutex_unlock(&mgmt_chan_list_lock);
> - goto done;
> + goto donefast;
> }
>
> hdev = hci_pi(sk)->hdev;
> if (!hdev) {
> err = -EBADFD;
> - goto done;
> + goto donefast;
> }
>
> + hci_dev_lock(hdev);
> if (!test_bit(HCI_UP, &hdev->flags)) {
> err = -ENETDOWN;
> goto done;
> @@ -1832,6 +1831,8 @@ static int hci_sock_sendmsg(struct socket *sock, struct msghdr *msg,
> err = len;
>
> done:
> + hci_dev_unlock(hdev);
> +donefast:
> release_sock(sk);
> return err;
>
>
> In short, this patch delays the hci_dev_put() to hci_sock_release() and keeps the old bh_lock_sock_nested().
>
> Once we did that, the UAF in hci_sock_bound_ioctl() are fixed. ( The four different commands in hci_sock_bound_ioctl will just
> traverse a empty linked list )
>
> For another UAF point: hci_sock_sendmsg(), this patch uses hci_dev_lock() to make sure the flags and resource in hdev will not be
> released till the sendmsg is finished. (Dislike the hci_sock_create(), the hci_sock_sendmsg() can sleep so the mutex lock is possible)
>
> Of course, more auditing is needed but I just want to share this to you. Any suggestions and discussions will be much appreciated.
>
>
> >> >> or we need to move lock_sock out of the rw spinlock critical section as Tetsuo proposes.
> >>
> >> Exactly. Since this is a regression introduced when fixing CVE-2021-3573, Linux
> >> distributors are waiting for this patch so that they can apply the fix for CVE-2021-3573.
> >> This patch should be sent to linux.git and stables as soon as possible. But due to little
> >> attention on this patch, I'm already testing this patch in linux-next.git via my tree.
> >> I'll drop when Bluetooth maintainers pick this patch up for linux-5.14-rcX. (Or should I
> >> directly send to Linus?)
> >>
> >> >>
> >> >
> >> > My bad, was thinking more about the problem and noticed your poc was for hci_sock_sendmsg,
> >> > not hci_sock_dev_event.
> >>
> >> I didn't catch this part. Are you talking about a different poc?
> >> As far as I'm aware, exp.c in POC.zip was for hci_sock_bound_ioctl(HCIUNBLOCKADDR).
> >>
> >> hci_sock_bound_ioctl(HCIUNBLOCKADDR) (which is called between lock_sock() and release_sock())
> >> calls copy_from_user() which might cause page fault, and userfaultfd mechanism allows an attacker
> >> to slowdown page fault handling enough to hci_sock_dev_event(HCI_DEV_UNREG) to return without
> >> waiting for hci_sock_bound_ioctl(HCIUNBLOCKADDR) to call release_sock(). This race window
> >> results in UAF (doesn't it, LinMa?).
> >>
> >> > In this case, it's not clear to me why the atomic context is being violated.
> >>
> >> In atomic context (in hci_sock_dev_event(HCI_DEV_UNREG) case, between
> >> read_lock(&hci_sk_list.lock) and read_unlock(&hci_sk_list.lock)), we must not call
> >> lock_sock(sk) which might wait for hci_sock_bound_ioctl(HCIUNBLOCKADDR) to call release_sock().
> >>
> >> >
> >> > Sorry for the noise.
> >> >
> >> >>>
> >> >>> The patch provided by Desmond adds the local_bh_disable() before the bh_lock_sock() so I also try that in
> >> >>>
> >> >>> --- a/net/bluetooth/hci_sock.c
> >> >>> +++ b/net/bluetooth/hci_sock.c
> >> >>> @@ -762,6 +762,7 @@ void hci_sock_dev_event(struct hci_dev *hdev, int event)
> >> >>> /* Detach sockets from device */
> >> >>> read_lock(&hci_sk_list.lock);
> >> >>> sk_for_each(sk, &hci_sk_list.head) {
> >> >>> + local_bh_disable();
> >> >>> bh_lock_sock_nested(sk);
> >> >>> if (hci_pi(sk)->hdev == hdev) {
> >> >>> hci_pi(sk)->hdev = NULL;
> >> >>> @@ -772,6 +773,7 @@ void hci_sock_dev_event(struct hci_dev *hdev, int event)
> >> >>> hci_dev_put(hdev);
> >> >>> }
> >> >>> bh_unlock_sock(sk);
> >> >>> + local_bh_enable();
> >> >>> }
> >> >>> read_unlock(&hci_sk_list.lock);
> >> >>> }
> >> >>>
> >> >>> But this is not useful, the UAF still occurs
> >> >>>
> >> >>
> >> >> I might be very mistaken on this, but I believe the UAF still happens because
> >> >> you can't really mix bh_lock_sock* and lock_sock* to protect the same things.
> >>
> >> Right. https://www.kernel.org/doc/html/v5.13/kernel-hacking/locking.html
> >>
> >> >> The former holds the spinlock &sk->sk_lock.slock and synchronizes between
> >> >> user contexts and bottom halves,
> >>
> >> serializes access to resources which might be accessed from atomic (i.e. non-schedulable) contexts
> >>
> >> >> while the latter holds a mutex on &sk->sk_lock.dep_map to synchronize between
> >> >> multiple users.
> >>
> >> serializes access to resources which are accessed from only schedulable (i.e. non-atomic) contexts
> >>
> >> >>
> >> >> One option I can think of would be to switch instances of lock_sock to bh_lock_sock_nested
> >> >> for users that might race (such as hci_sock_sendmsg, hci_sock_bound_ioctl, and others as
> >> >> needed). But I'm not sure if that's quite what we want, plus we would need to ensure that
> >> >> sleeping functions aren't called between the bh_lock/unlock.
> >>
> >> We can't do it for hci_sock_dev_event(HCI_DEV_UNREG).
>
> Regards
>
> Lin Ma

2021-07-22 04:50:36

by LinMa

[permalink] [raw]
Subject: Re: Re: [PATCH v3] Bluetooth: call lock_sock() outside of spinlock section

Hi Tetsuo,

Just find out another interesting function: sock_owned_by_user(). (I am just a noob of kernel locks)

Hence I think the following patch has the same 'effect' as the old patch e305509e678b3 ("Bluetooth: use correct lock to prevent UAF of hdev object")

diff --git a/net/bluetooth/hci_sock.c b/net/bluetooth/hci_sock.c
index b04a5a02ecf3..0cc4b88daa96 100644
--- a/net/bluetooth/hci_sock.c
+++ b/net/bluetooth/hci_sock.c
@@ -762,7 +762,11 @@ void hci_sock_dev_event(struct hci_dev *hdev, int event)
/* Detach sockets from device */
read_lock(&hci_sk_list.lock);
sk_for_each(sk, &hci_sk_list.head) {
- lock_sock(sk);
+ bh_lock_sock_nested(sk);
+busywait:
+ if (sock_owned_by_user(sk))
+ goto busywait;
+
if (hci_pi(sk)->hdev == hdev) {
hci_pi(sk)->hdev = NULL;
sk->sk_err = EPIPE;
@@ -771,7 +775,7 @@ void hci_sock_dev_event(struct hci_dev *hdev, int event)

hci_dev_put(hdev);
}
- release_sock(sk);
+ bh_unlock_sock(sk);
}
read_unlock(&hci_sk_list.lock);
}

The sad thing is that it seems will cost CPU resource to do meaningless wait...

What do you think? Can this sock_owned_by_user() function do any help?

Thanks
Lin Ma


> From: "Tetsuo Handa" <[email protected]>
> Sent Time: 2021-07-16 22:48:13 (Friday)
> To: "Desmond Cheong Zhi Xi" <[email protected]>, LinMa <[email protected]>, "Luiz Augusto von Dentz" <[email protected]>, "Johan Hedberg" <[email protected]>, "Marcel Holtmann" <[email protected]>
> Cc: "[email protected]" <[email protected]>, "David S. Miller" <[email protected]>, "Jakub Kicinski" <[email protected]>, "open list:NETWORKING [GENERAL]" <[email protected]>
> Subject: Re: [PATCH v3] Bluetooth: call lock_sock() outside of spinlock section
>
> On 2021/07/16 13:11, Desmond Cheong Zhi Xi wrote:
> > On 16/7/21 11:47 am, Desmond Cheong Zhi Xi wrote:
> >> Saw this and thought I'd offer my two cents.
> >> BUG: sleeping function called from invalid context
> >> This is the original problem that Tetsuo's patch was trying to fix.
>
> Yes.
>
> >> Under the hood of lock_sock, we call lock_sock_nested which might sleep
> >> because of the mutex_acquire.
>
> Both lock_sock() and lock_sock_nested() might sleep.
>
> >> But we shouldn't sleep while holding the rw spinlock.
>
> Right. In atomic context (e.g. inside interrupt handler, schedulable context
> with interrupts or preemption disabled, schedulable context inside RCU read
> critical section, schedulable context inside a spinlock section), we must not
> call functions (e.g. waiting for a mutex, waiting for a semaphore, waiting for
> a page fault) which are not atomic.
>
> >> So we either have to acquire a spinlock instead of a mutex as was done before,
>
> Regarding hci_sock_dev_event(HCI_DEV_UNREG) case, we can't use a spinlock.
>
> Like LinMa explained, lock_sock() has to be used in order to serialize functions
> (e.g. hci_sock_sendmsg()) which access hci_pi(sk)->hdev between lock_sock(sk) and
> release_sock(sk). And like I explained, we can't defer resetting hci_pi(sk)->hdev
> to NULL, for hci_sock_dev_event(HCI_DEV_UNREG) is responsible for resetting
> hci_pi(sk)->hdev to NULL because the caller of hci_sock_dev_event(HCI_DEV_UNREG)
> immediately destroys resources associated with this hdev.
>
> >> or we need to move lock_sock out of the rw spinlock critical section as Tetsuo proposes.
>
> Exactly. Since this is a regression introduced when fixing CVE-2021-3573, Linux
> distributors are waiting for this patch so that they can apply the fix for CVE-2021-3573.
> This patch should be sent to linux.git and stables as soon as possible. But due to little
> attention on this patch, I'm already testing this patch in linux-next.git via my tree.
> I'll drop when Bluetooth maintainers pick this patch up for linux-5.14-rcX. (Or should I
> directly send to Linus?)
>
> >>
> >
> > My bad, was thinking more about the problem and noticed your poc was for hci_sock_sendmsg,
> > not hci_sock_dev_event.
>
> I didn't catch this part. Are you talking about a different poc?
> As far as I'm aware, exp.c in POC.zip was for hci_sock_bound_ioctl(HCIUNBLOCKADDR).
>
> hci_sock_bound_ioctl(HCIUNBLOCKADDR) (which is called between lock_sock() and release_sock())
> calls copy_from_user() which might cause page fault, and userfaultfd mechanism allows an attacker
> to slowdown page fault handling enough to hci_sock_dev_event(HCI_DEV_UNREG) to return without
> waiting for hci_sock_bound_ioctl(HCIUNBLOCKADDR) to call release_sock(). This race window
> results in UAF (doesn't it, LinMa?).
>
> > In this case, it's not clear to me why the atomic context is being violated.
>
> In atomic context (in hci_sock_dev_event(HCI_DEV_UNREG) case, between
> read_lock(&hci_sk_list.lock) and read_unlock(&hci_sk_list.lock)), we must not call
> lock_sock(sk) which might wait for hci_sock_bound_ioctl(HCIUNBLOCKADDR) to call release_sock().
>
> >
> > Sorry for the noise.
> >
> >>>
> >>> The patch provided by Desmond adds the local_bh_disable() before the bh_lock_sock() so I also try that in
> >>>
> >>> --- a/net/bluetooth/hci_sock.c
> >>> +++ b/net/bluetooth/hci_sock.c
> >>> @@ -762,6 +762,7 @@ void hci_sock_dev_event(struct hci_dev *hdev, int event)
> >>> /* Detach sockets from device */
> >>> read_lock(&hci_sk_list.lock);
> >>> sk_for_each(sk, &hci_sk_list.head) {
> >>> + local_bh_disable();
> >>> bh_lock_sock_nested(sk);
> >>> if (hci_pi(sk)->hdev == hdev) {
> >>> hci_pi(sk)->hdev = NULL;
> >>> @@ -772,6 +773,7 @@ void hci_sock_dev_event(struct hci_dev *hdev, int event)
> >>> hci_dev_put(hdev);
> >>> }
> >>> bh_unlock_sock(sk);
> >>> + local_bh_enable();
> >>> }
> >>> read_unlock(&hci_sk_list.lock);
> >>> }
> >>>
> >>> But this is not useful, the UAF still occurs
> >>>
> >>
> >> I might be very mistaken on this, but I believe the UAF still happens because
> >> you can't really mix bh_lock_sock* and lock_sock* to protect the same things.
>
> Right. https://www.kernel.org/doc/html/v5.13/kernel-hacking/locking.html
>
> >> The former holds the spinlock &sk->sk_lock.slock and synchronizes between
> >> user contexts and bottom halves,
>
> serializes access to resources which might be accessed from atomic (i.e. non-schedulable) contexts
>
> >> while the latter holds a mutex on &sk->sk_lock.dep_map to synchronize between
> >> multiple users.
>
> serializes access to resources which are accessed from only schedulable (i.e. non-atomic) contexts
>
> >>
> >> One option I can think of would be to switch instances of lock_sock to bh_lock_sock_nested
> >> for users that might race (such as hci_sock_sendmsg, hci_sock_bound_ioctl, and others as
> >> needed). But I'm not sure if that's quite what we want, plus we would need to ensure that
> >> sleeping functions aren't called between the bh_lock/unlock.
>
> We can't do it for hci_sock_dev_event(HCI_DEV_UNREG).

2021-07-22 05:18:45

by Tetsuo Handa

[permalink] [raw]
Subject: Re: [PATCH v3] Bluetooth: call lock_sock() outside of spinlock section

On 2021/07/22 13:47, LinMa wrote:
> Hi Tetsuo,
>
> Just find out another interesting function: sock_owned_by_user(). (I am just a noob of kernel locks)
>
> Hence I think the following patch has the same 'effect' as the old patch e305509e678b3 ("Bluetooth: use correct lock to prevent UAF of hdev object")
>
> diff --git a/net/bluetooth/hci_sock.c b/net/bluetooth/hci_sock.c
> index b04a5a02ecf3..0cc4b88daa96 100644
> --- a/net/bluetooth/hci_sock.c
> +++ b/net/bluetooth/hci_sock.c
> @@ -762,7 +762,11 @@ void hci_sock_dev_event(struct hci_dev *hdev, int event)
> /* Detach sockets from device */
> read_lock(&hci_sk_list.lock);
> sk_for_each(sk, &hci_sk_list.head) {
> - lock_sock(sk);
> + bh_lock_sock_nested(sk);
> +busywait:
> + if (sock_owned_by_user(sk))
> + goto busywait;
> +
> if (hci_pi(sk)->hdev == hdev) {
> hci_pi(sk)->hdev = NULL;
> sk->sk_err = EPIPE;
> @@ -771,7 +775,7 @@ void hci_sock_dev_event(struct hci_dev *hdev, int event)
>
> hci_dev_put(hdev);
> }
> - release_sock(sk);
> + bh_unlock_sock(sk);
> }
> read_unlock(&hci_sk_list.lock);
> }
>
> The sad thing is that it seems will cost CPU resource to do meaningless wait...
>
> What do you think? Can this sock_owned_by_user() function do any help?

I don't think it helps.

One of problems we are seeing (and my patch will fix) is a race window that
this sk_for_each() loop needs to wait using lock_sock() because
"hci_pi(sk)->hdev = hdev;" is called by hci_sock_bind() under lock_sock().
Doing hci_pi(sk)->hdev == hdev comparison without lock_sock() will lead to
failing to "/* Detach sockets from device */".

2021-07-22 09:38:58

by Tetsuo Handa

[permalink] [raw]
Subject: Re: [PATCH v3] Bluetooth: call lock_sock() outside of spinlock section

On 2021/07/17 0:26, LinMa wrote:
> Hello everyone,
>
> Sorry, it's my fault to cause the misunderstanding.
>
> As I keep mentioning "hci_sock_sendmsg()" instead of "hci_sock_bound_ioctl()". In fact,
> both these two functions are able to cause the race.

I sent two patches for avoiding page fault with kernel lock held.

[PATCH v2] Bluetooth: reorganize ioctls from hci_sock_bound_ioctl()
https://lkml.kernel.org/r/[email protected]

[PATCH] Bluetooth: reorganize functions from hci_sock_sendmsg()
https://lkml.kernel.org/r/[email protected]

These two patches will eliminate user-controlled delay at lock_sock() which

[PATCH v3] Bluetooth: call lock_sock() outside of spinlock section
https://lkml.kernel.org/r/[email protected]

would suffer.

Are you aware of more locations which trigger page fault with sock lock held?
If none, we can send these three patches together.

If we are absolutely sure that there is no more locations, we could try choice (1) or (2)
at https://lkml.kernel.org/r/[email protected] .