2016-05-10 01:15:35

by Yichen Zhao

Subject: [PATCH] Bluetooth: Fix l2cap_sock_teardown_cb race condition with bt_accept_dequeue

Fix a race condition between l2cap_sock_teardown_cb on an L2CAP socket
and bt_accept_dequeue on its parent socket. When the race is hit,
bt_accept_dequeue may call bt_accept_unlink on an already unlinked
socket, resulting in a NULL pointer dereference.

Even if bt_accept_unlink is not called by bt_accept_dequeue,
bt_accept_unlink called by l2cap_sock_teardown_cb can race with
list_for_each_entry_safe in bt_accept_dequeue, causing the latter to
loop indefinitely on the unlinked socket, until release_sock crashes
with a NULL pointer dereference when the sock pointer is freed.

The race condition is fixed by locking the parent socket in
l2cap_sock_teardown_cb.

[50510.241632] BUG: unable to handle kernel NULL pointer dereference at 00000000000001a8
[50510.241694] IP: [<ffffffffc01243f7>] bt_accept_unlink+0x47/0xa0 [bluetooth]
[50510.241759] PGD 0
[50510.241776] Oops: 0002 [#1] SMP
[50510.241802] Modules linked in: rtl8192cu rtl_usb rtlwifi rtl8192c_common 8021q garp stp mrp llc rfcomm bnep nls_iso8859_1 intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp arc4 ath9k ath9k_common ath9k_hw ath kvm eeepc_wmi asus_wmi mac80211 snd_hda_codec_hdmi snd_hda_codec_realtek sparse_keymap crct10dif_pclmul snd_hda_codec_generic crc32_pclmul snd_hda_intel snd_hda_controller cfg80211 snd_hda_codec i915 snd_hwdep snd_pcm ghash_clmulni_intel snd_timer snd soundcore serio_raw cryptd drm_kms_helper drm i2c_algo_bit shpchp ath3k mei_me lpc_ich btusb bluetooth 6lowpan_iphc mei lp parport wmi video mac_hid psmouse ahci libahci r8169 mii
[50510.242279] CPU: 0 PID: 934 Comm: krfcommd Not tainted 3.16.0-49-generic #65~14.04.1-Ubuntu
[50510.242327] Hardware name: ASUSTeK Computer INC. VM40B/VM40B, BIOS 1501 12/09/2014
[50510.242370] task: ffff8800d9068a30 ti: ffff8800d7a54000 task.ti: ffff8800d7a54000
[50510.242413] RIP: 0010:[<ffffffffc01243f7>] [<ffffffffc01243f7>] bt_accept_unlink+0x47/0xa0 [bluetooth]
[50510.242480] RSP: 0018:ffff8800d7a57d58 EFLAGS: 00010246
[50510.242511] RAX: 0000000000000000 RBX: ffff880119bb8c00 RCX: ffff880119bb8eb0
[50510.242552] RDX: ffff880119bb8eb0 RSI: 00000000fffffe01 RDI: ffff880119bb8c00
[50510.242592] RBP: ffff8800d7a57d60 R08: 0000000000000283 R09: 0000000000000001
[50510.242633] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8800d8da9eb0
[50510.242673] R13: ffff8800d74fdb80 R14: ffff880119bb8c00 R15: ffff8800d8da9c00
[50510.242715] FS: 0000000000000000(0000) GS:ffff88011fa00000(0000) knlGS:0000000000000000
[50510.242761] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[50510.242794] CR2: 00000000000001a8 CR3: 0000000001c13000 CR4: 00000000001407f0
[50510.242835] Stack:
[50510.242849] ffff880119bb8eb0 ffff8800d7a57da0 ffffffffc0124506 ffff8800d8da9eb0
[50510.242899] ffff8800d8da9c00 ffff8800d9068a30 0000000000000000 ffff8800d74fdb80
[50510.242949] ffff8800d6f85208 ffff8800d7a57e08 ffffffffc0159985 000000000000001f
[50510.242999] Call Trace:
[50510.243027] [<ffffffffc0124506>] bt_accept_dequeue+0xb6/0x180 [bluetooth]
[50510.243085] [<ffffffffc0159985>] l2cap_sock_accept+0x125/0x220 [bluetooth]
[50510.243128] [<ffffffff810a1b30>] ? wake_up_state+0x20/0x20
[50510.243163] [<ffffffff8164946e>] kernel_accept+0x4e/0xa0
[50510.243200] [<ffffffffc05b97cd>] rfcomm_run+0x1ad/0x890 [rfcomm]
[50510.243238] [<ffffffffc05b9620>] ? rfcomm_process_rx+0x8a0/0x8a0 [rfcomm]
[50510.243281] [<ffffffff81091572>] kthread+0xd2/0xf0
[50510.243312] [<ffffffff810914a0>] ? kthread_create_on_node+0x1c0/0x1c0
[50510.243353] [<ffffffff8176e9d8>] ret_from_fork+0x58/0x90
[50510.243387] [<ffffffff810914a0>] ? kthread_create_on_node+0x1c0/0x1c0
[50510.243424] Code: 00 48 8b 93 b8 02 00 00 48 8d 83 b0 02 00 00 48 89 51 08 48 89 0a 48 89 83 b0 02 00 00 48 89 83 b8 02 00 00 48 8b 83 c0 02 00 00 <66> 83 a8 a8 01 00 00 01 48 c7 83 c0 02 00 00 00 00 00 00 f0 ff
[50510.243685] RIP [<ffffffffc01243f7>] bt_accept_unlink+0x47/0xa0 [bluetooth]
[50510.243737] RSP <ffff8800d7a57d58>
[50510.243758] CR2: 00000000000001a8
[50510.249457] ---[ end trace bb984f932c4e3ab3 ]---

Signed-off-by: Yichen Zhao <[email protected]>
---
net/bluetooth/l2cap_sock.c | 18 +++++++++++++++++-
1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/net/bluetooth/l2cap_sock.c b/net/bluetooth/l2cap_sock.c
index e4cae72..ff1c821 100644
--- a/net/bluetooth/l2cap_sock.c
+++ b/net/bluetooth/l2cap_sock.c
@@ -1307,6 +1307,15 @@ static void l2cap_sock_teardown_cb(struct l2cap_chan *chan, int err)

 	BT_DBG("chan %p state %s", chan, state_to_string(chan->state));

+	parent = bt_sk(sk)->parent;
+
+	/* The parent sock must be locked if its state is mutated by
+	 * bt_accept_unlink. It must be locked before sk to maintain the same
+	 * locking order as bt_accept_dequeue.
+	 */
+	if (parent)
+		lock_sock_nested(parent, L2CAP_NESTING_PARENT);
+
 	/* This callback can be called both for server (BT_LISTEN)
 	 * sockets as well as "normal" ones. To avoid lockdep warnings
 	 * with child socket locking (through l2cap_sock_cleanup_listen)
@@ -1316,7 +1325,11 @@ static void l2cap_sock_teardown_cb(struct l2cap_chan *chan, int err)
 	 */
 	lock_sock_nested(sk, atomic_read(&chan->nesting));

-	parent = bt_sk(sk)->parent;
+	/* bt_accept_unlink could have been called before locking parent. */
+	if (parent && !bt_sk(sk)->parent) {
+		release_sock(parent);
+		parent = NULL;
+	}

 	sock_set_flag(sk, SOCK_ZAPPED);

@@ -1348,6 +1361,9 @@ static void l2cap_sock_teardown_cb(struct l2cap_chan *chan, int err)
 	}

 	release_sock(sk);
+
+	if (parent)
+		release_sock(parent);
 }

static void l2cap_sock_state_change_cb(struct l2cap_chan *chan, int state,
--
2.8.0.rc3.226.g39d4020


2016-05-13 21:00:30

by Yichen Zhao

Subject: Re: [PATCH] Bluetooth: Fix l2cap_sock_teardown_cb race condition with bt_accept_dequeue

Hi Marcel,

> So I am not a big fan of the conditional locking depending on whether parent is set or not. Do you have a test case that reproduces the mentioned race? I would love to have that in tools/l2cap-tester or similar.

So far I could only reproduce the bug by repeatedly performing RFCOMM connections and resets. I'll try to implement something in rfcomm-tester or l2cap-tester.

Since this is a race condition, I'm not confident that I can help you reproduce the bug reliably on a different test setup. I'd appreciate it very much if you can offer any tips on triggering a race condition faster in a test case.

> Maybe the code needs some restructuring to avoid the conditional locking.

I agree that my patch is not very elegant, and I'd love any way to improve it.
I have some ideas, but I'm not familiar enough with kernel development to know whether other solutions are safe to implement, such as:

* Remove bt_accept_unlink from l2cap_sock_teardown_cb, and rely on bt_accept_dequeue to unlink the socket when it's enumerated. Is it safe to leave a zapped sock in accept_q?
* Perform "unlock sock; lock parent; lock sock" before calling bt_accept_unlink in l2cap_sock_teardown_cb. This is still conditional locking, but around a smaller block of code. Is it safe to unlock a zapped sock?
* Use RCU for handling accept_q. Is this appropriate?
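
To make the second idea concrete, here is a rough userspace sketch of the "unlock sock; lock parent; lock sock" sequence. It is only a model and not kernel code: pthread mutexes stand in for lock_sock()/release_sock(), struct demo_sock and the function names are made up for this example, and object lifetime is ignored (in the kernel the parent would presumably need to be kept alive across the window where neither lock is held). The important part is the re-check of the parent pointer after both locks are taken, because another thread may have unlinked the sock while it was unlocked.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a Bluetooth sock; not the kernel structure. */
struct demo_sock {
	pthread_mutex_t lock;
	struct demo_sock *parent;	/* models bt_sk(sk)->parent */
	int ack_backlog;		/* models sk_ack_backlog on the parent */
};

/* Entered and left with sk->lock held.
 * Returns 1 if this call unlinked sk, 0 if there was nothing to unlink. */
static int relock_and_unlink(struct demo_sock *sk)
{
	struct demo_sock *parent = sk->parent;
	int unlinked = 0;

	if (!parent)
		return 0;

	/* Drop the child lock so the parent can be taken first. */
	pthread_mutex_unlock(&sk->lock);
	pthread_mutex_lock(&parent->lock);
	pthread_mutex_lock(&sk->lock);

	/* Re-check: sk may have been unlinked while it was unlocked. */
	if (sk->parent == parent) {
		parent->ack_backlog--;
		sk->parent = NULL;
		unlinked = 1;
	}

	pthread_mutex_unlock(&parent->lock);
	return unlinked;
}

/* Single-threaded walk-through: the first call unlinks, the second is a no-op. */
static int relock_demo(void)
{
	struct demo_sock parent, child;
	int first, second;

	pthread_mutex_init(&parent.lock, NULL);
	parent.parent = NULL;
	parent.ack_backlog = 1;

	pthread_mutex_init(&child.lock, NULL);
	child.parent = &parent;
	child.ack_backlog = 0;

	pthread_mutex_lock(&child.lock);
	first = relock_and_unlink(&child);
	second = relock_and_unlink(&child);
	pthread_mutex_unlock(&child.lock);

	assert(parent.ack_backlog == 0);

	pthread_mutex_destroy(&child.lock);
	pthread_mutex_destroy(&parent.lock);
	return first == 1 && second == 0;
}
```

The sketch keeps the conditional part confined to one helper, which is the trade-off mentioned in the bullet above; whether unlocking a zapped sock is acceptable in the real code is exactly the open question.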

Please let me know what you think.

Regards,

Yichen Zhao

2016-05-13 14:54:35

by Marcel Holtmann

Subject: Re: [PATCH] Bluetooth: Fix l2cap_sock_teardown_cb race condition with bt_accept_dequeue

Hi Yichen,

> Fix a race condition between l2cap_sock_teardown_cb on an L2CAP socket
> and bt_accept_dequeue on its parent socket. When the race condition is
> encountered bt_accept_dequeue may call bt_accept_unlink on an already
> unlinked socket and result in a NULL pointer dereference.
>
> Even if bt_accept_unlink is not called by bt_accept_dequeue,
> bt_accept_unlink called by l2cap_sock_teardown_cb can race with
> list_for_each_entry_safe in bt_accept_dequeue, causing the latter to
> loop indefinitely on the unlinked socket, until release_sock crashes
> with a NULL pointer dereference when the sock pointer is freed.
>
> The race condition is fixed by locking the parent socket in
> l2cap_sock_teardown_cb.
>
> [50510.241632] BUG: unable to handle kernel NULL pointer dereference at 00000000000001a8
> [50510.241694] IP: [<ffffffffc01243f7>] bt_accept_unlink+0x47/0xa0 [bluetooth]
> [50510.241759] PGD 0
> [50510.241776] Oops: 0002 [#1] SMP
> [50510.241802] Modules linked in: rtl8192cu rtl_usb rtlwifi rtl8192c_common 8021q garp stp mrp llc rfcomm bnep nls_iso8859_1 intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp arc4 ath9k ath9k_common ath9k_hw ath kvm eeepc_wmi asus_wmi mac80211 snd_hda_codec_hdmi snd_hda_codec_realtek sparse_keymap crct10dif_pclmul snd_hda_codec_generic crc32_pclmul snd_hda_intel snd_hda_controller cfg80211 snd_hda_codec i915 snd_hwdep snd_pcm ghash_clmulni_intel snd_timer snd soundcore serio_raw cryptd drm_kms_helper drm i2c_algo_bit shpchp ath3k mei_me lpc_ich btusb bluetooth 6lowpan_iphc mei lp parport wmi video mac_hid psmouse ahci libahci r8169 mii
> [50510.242279] CPU: 0 PID: 934 Comm: krfcommd Not tainted 3.16.0-49-generic #65~14.04.1-Ubuntu
> [50510.242327] Hardware name: ASUSTeK Computer INC. VM40B/VM40B, BIOS 1501 12/09/2014
> [50510.242370] task: ffff8800d9068a30 ti: ffff8800d7a54000 task.ti: ffff8800d7a54000
> [50510.242413] RIP: 0010:[<ffffffffc01243f7>] [<ffffffffc01243f7>] bt_accept_unlink+0x47/0xa0 [bluetooth]
> [50510.242480] RSP: 0018:ffff8800d7a57d58 EFLAGS: 00010246
> [50510.242511] RAX: 0000000000000000 RBX: ffff880119bb8c00 RCX: ffff880119bb8eb0
> [50510.242552] RDX: ffff880119bb8eb0 RSI: 00000000fffffe01 RDI: ffff880119bb8c00
> [50510.242592] RBP: ffff8800d7a57d60 R08: 0000000000000283 R09: 0000000000000001
> [50510.242633] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8800d8da9eb0
> [50510.242673] R13: ffff8800d74fdb80 R14: ffff880119bb8c00 R15: ffff8800d8da9c00
> [50510.242715] FS: 0000000000000000(0000) GS:ffff88011fa00000(0000) knlGS:0000000000000000
> [50510.242761] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [50510.242794] CR2: 00000000000001a8 CR3: 0000000001c13000 CR4: 00000000001407f0
> [50510.242835] Stack:
> [50510.242849] ffff880119bb8eb0 ffff8800d7a57da0 ffffffffc0124506 ffff8800d8da9eb0
> [50510.242899] ffff8800d8da9c00 ffff8800d9068a30 0000000000000000 ffff8800d74fdb80
> [50510.242949] ffff8800d6f85208 ffff8800d7a57e08 ffffffffc0159985 000000000000001f
> [50510.242999] Call Trace:
> [50510.243027] [<ffffffffc0124506>] bt_accept_dequeue+0xb6/0x180 [bluetooth]
> [50510.243085] [<ffffffffc0159985>] l2cap_sock_accept+0x125/0x220 [bluetooth]
> [50510.243128] [<ffffffff810a1b30>] ? wake_up_state+0x20/0x20
> [50510.243163] [<ffffffff8164946e>] kernel_accept+0x4e/0xa0
> [50510.243200] [<ffffffffc05b97cd>] rfcomm_run+0x1ad/0x890 [rfcomm]
> [50510.243238] [<ffffffffc05b9620>] ? rfcomm_process_rx+0x8a0/0x8a0 [rfcomm]
> [50510.243281] [<ffffffff81091572>] kthread+0xd2/0xf0
> [50510.243312] [<ffffffff810914a0>] ? kthread_create_on_node+0x1c0/0x1c0
> [50510.243353] [<ffffffff8176e9d8>] ret_from_fork+0x58/0x90
> [50510.243387] [<ffffffff810914a0>] ? kthread_create_on_node+0x1c0/0x1c0
> [50510.243424] Code: 00 48 8b 93 b8 02 00 00 48 8d 83 b0 02 00 00 48 89 51 08 48 89 0a 48 89 83 b0 02 00 00 48 89 83 b8 02 00 00 48 8b 83 c0 02 00 00 <66> 83 a8 a8 01 00 00 01 48 c7 83 c0 02 00 00 00 00 00 00 f0 ff
> [50510.243685] RIP [<ffffffffc01243f7>] bt_accept_unlink+0x47/0xa0 [bluetooth]
> [50510.243737] RSP <ffff8800d7a57d58>
> [50510.243758] CR2: 00000000000001a8
> [50510.249457] ---[ end trace bb984f932c4e3ab3 ]---
>
> Signed-off-by: Yichen Zhao <[email protected]>
> ---
> net/bluetooth/l2cap_sock.c | 18 +++++++++++++++++-
> 1 file changed, 17 insertions(+), 1 deletion(-)
>
> diff --git a/net/bluetooth/l2cap_sock.c b/net/bluetooth/l2cap_sock.c
> index e4cae72..ff1c821 100644
> --- a/net/bluetooth/l2cap_sock.c
> +++ b/net/bluetooth/l2cap_sock.c
> @@ -1307,6 +1307,15 @@ static void l2cap_sock_teardown_cb(struct l2cap_chan *chan, int err)
>
> BT_DBG("chan %p state %s", chan, state_to_string(chan->state));
>
> + parent = bt_sk(sk)->parent;
> +
> + /* The parent sock must be locked if its state is mutated by
> + * bt_accept_unlink. It must be locked before sk to maintain the same
> + * locking order as bt_accept_dequeue.
> + */
> + if (parent)
> + lock_sock_nested(parent, L2CAP_NESTING_PARENT);
> +
> /* This callback can be called both for server (BT_LISTEN)
> * sockets as well as "normal" ones. To avoid lockdep warnings
> * with child socket locking (through l2cap_sock_cleanup_listen)
> @@ -1316,7 +1325,11 @@ static void l2cap_sock_teardown_cb(struct l2cap_chan *chan, int err)
> */
> lock_sock_nested(sk, atomic_read(&chan->nesting));
>
> - parent = bt_sk(sk)->parent;
> + /* bt_accept_unlink could have been called before locking parent. */
> + if (parent && !bt_sk(sk)->parent) {
> + release_sock(parent);
> + parent = NULL;
> + }
>
> sock_set_flag(sk, SOCK_ZAPPED);
>
> @@ -1348,6 +1361,9 @@ static void l2cap_sock_teardown_cb(struct l2cap_chan *chan, int err)
> }
>
> release_sock(sk);
> +
> + if (parent)
> + release_sock(parent);

So I am not a big fan of the conditional locking depending on whether parent is set or not. Do you have a test case that reproduces the mentioned race? I would love to have that in tools/l2cap-tester or similar. Maybe the code needs some restructuring to avoid the conditional locking.

Regards

Marcel

2017-02-15 10:14:37

by Dean Jenkins

Subject: L2CAP l2cap_sock_teardown_cb() race condition with RFCOMM rfcomm_accept_connection()

Hi Marcel,

My E-mail below is based on Yichen Zhao's 2016 E-mail from the mailing
list thread
http://www.spinics.net/lists/linux-bluetooth/msg67218.html

On 13/05/16 23:00, Yichen Zhao wrote:
>> So I am not a big fan of the conditional locking depending on whether parent is set or not. Do you have a test case that reproduces the mentioned race? I would love to have that in tools/l2cap-tester or similar.
> So far I could only reproduce the bug by repeatedly performing RFCOMM connections and resets. I'll try to implement something in rfcomm-tester or l2cap-tester.
>
> Since this is a race condition, I'm not confident that I can help you reproduce the bug reliably on a different test setup. I'd appreciate it very much if you can offer any tips on triggering a race condition faster in a test case.

We see a similar crash in a highly modified 3.14 kernel on ARM but I
think this failure is still possible in bluetooth-next based on kernel
4.10-rc2.

I think the failure scenario is as follows:

RFCOMM is executing rfcomm_accept_connection() whilst L2CAP is
tearing down the L2CAP channel in l2cap_sock_teardown_cb().

This means there are 2 threads racing with each other; there is the
RFCOMM krfcommd kernel thread and the userland system call thread
demanding L2CAP is torn down via l2cap_chan_del() (I think).

In the real world, the failure is seen when userland abruptly terminates
Bluetooth connections, for example when switching to a pairing mode. Note
that our system does not use the BlueZ userland.

>> Maybe the code needs some restructuring to avoid the conditional locking.
> I agree that my patch is not very elegant, and I'd love any way to improve it.
> I have some ideas, but I'm not familiar enough with kernel development to know whether other solutions are safe to implement, such as:
>
> * Remove bt_accept_unlink from l2cap_sock_teardown_cb, and rely on bt_accept_dequeue to unlink the socket when it's enumerated. Is it safe to leave a zapped sock in accept_q?
> * Perform "unlock sock; lock parent; lock sock" before calling bt_accept_unlink in l2cap_sock_teardown_cb. This is still conditional locking, but around a smaller block of code. Is it safe to unlock a zapped sock?
> * Use RCU for handling accept_q. Is this appropriate?
>
> Please let me know what you think.

Our analysis suggests that there is a locking weakness here in
net/bluetooth/af_bluetooth.c:

From bluetooth-next based on kernel 4.10-rc2

> struct sock *bt_accept_dequeue(struct sock *parent, struct socket *newsock)
> {
> 	struct bt_sock *s, *n;
> 	struct sock *sk;
>
> 	BT_DBG("parent %p", parent);
>
> 	list_for_each_entry_safe(s, n, &bt_sk(parent)->accept_q, accept_q) {
> 		sk = (struct sock *)s;

There is a risk that the sk socket gets modified here by another thread
or via pre-emption because there is no protection of sk with respect to
list operations on the parent. Any locking of the parent is done outside
of bt_accept_dequeue().

In other words, sk can be removed from the list by another thread which
could mean that sk has been freed. Therefore, performing a lock on sk
that has been freed is invalid and dangerous.

We suspect that bt_accept_unlink(sk) gets called by the other thread
which will remove sk from the list and set bt_sk(sk)->parent = NULL;
also potentially sk is freed as well.

> 		lock_sock(sk);

Taking the sk lock here can be too late as sk may already have been
removed from the list.
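
To illustrate why a concurrent unlink is dangerous for the list walk itself, here is a minimal userspace sketch. It is not kernel code: struct list_head and the helpers below are simplified stand-ins, the names walk() and demo_iterations() are made up for this example, and the second thread is simulated by unlinking the saved next entry in the middle of the first iteration. Assuming the unlink is done with list_del_init(), the saved pointer n ends up pointing at itself, so the walk never reaches the list head again. That would match the indefinite loop described in Yichen's commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static void list_add_tail(struct list_head *n, struct list_head *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Same effect as list_del_init(): unlink the entry, then point it at itself. */
static void list_del_init(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	init_list_head(e);
}

/* Walk the list the way list_for_each_entry_safe() does (current entry s,
 * saved next entry n). While the first entry is being "processed", a
 * simulated other thread unlinks victim. Returns the number of loop
 * iterations, capped at limit so that the demo terminates. */
static int walk(struct list_head *head, struct list_head *victim, int limit)
{
	struct list_head *s, *n;
	int iterations = 0;

	assert(limit > 0);

	for (s = head->next, n = s->next; s != head; s = n, n = s->next) {
		if (++iterations >= limit)
			break;
		if (iterations == 1 && victim)
			list_del_init(victim);	/* the racing unlink */
	}
	return iterations;
}

/* Build head -> a -> b and walk it, optionally unlinking b mid-walk. */
static int demo_iterations(int unlink_during_walk)
{
	struct list_head head, a, b;

	init_list_head(&head);
	list_add_tail(&a, &head);
	list_add_tail(&b, &head);

	return walk(&head, unlink_during_walk ? &b : NULL, 1000);
}
```

Without the racing unlink the walk visits the two entries and stops. With it, the walk only ends because of the artificial cap, since the "safe" iterator protects against removing the current entry and not against another context removing the saved next entry.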

>
> 		/* FIXME: Is this check still needed */
> 		if (sk->sk_state == BT_CLOSED) {
> 			bt_accept_unlink(sk);

A NULL pointer dereference occurs on this second call to
bt_accept_unlink(sk): the first call, made by the other thread, has
already set bt_sk(sk)->parent to NULL, so this line in
bt_accept_unlink() crashes:
bt_sk(sk)->parent->sk_ack_backlog--;
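
The double call can be replayed in a small userspace model. This is only a sketch under stated assumptions: struct model_sock and model_accept_unlink() are made-up names, the list removal and the reference drop are left out, and the model returns -1 at the point where the kernel would dereference the NULL parent instead of actually crashing. The oops in the original report appears consistent with this: the faulting instruction is a 16-bit subtract of 1 at offset 0x1a8 from RAX, RAX is 0, and CR2 is 0x1a8.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a Bluetooth sock; not the kernel structure. */
struct model_sock {
	struct model_sock *parent;	/* models bt_sk(sk)->parent */
	int ack_backlog;		/* models sk_ack_backlog */
};

/* Models the parent bookkeeping done by bt_accept_unlink().
 * Returns 0 on success and -1 where the kernel would dereference NULL.
 * The NULL check exists only so the model can report the problem; it is
 * not a proposed fix. */
static int model_accept_unlink(struct model_sock *sk)
{
	if (!sk->parent)
		return -1;

	sk->parent->ack_backlog--;
	sk->parent = NULL;
	return 0;
}

/* Replay the suspected interleaving: the teardown path unlinks first,
 * then the dequeue path unlinks the same sock again. Returns the result
 * of the second call. */
static int replay_double_unlink(void)
{
	struct model_sock parent = { NULL, 1 };
	struct model_sock child = { &parent, 0 };
	int first, second;

	first = model_accept_unlink(&child);	/* l2cap_sock_teardown_cb() */
	second = model_accept_unlink(&child);	/* bt_accept_dequeue() */

	assert(first == 0);
	assert(parent.ack_backlog == 0);
	return second;
}
```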

> 			release_sock(sk);
> 			continue;
> 		}
>
> 		if (sk->sk_state == BT_CONNECTED || !newsock ||
> 		    test_bit(BT_SK_DEFER_SETUP, &bt_sk(parent)->flags)) {
> 			bt_accept_unlink(sk);
> 			if (newsock)
> 				sock_graft(sk, newsock);
>
> 			release_sock(sk);
> 			return sk;
> 		}
>
> 		release_sock(sk);
> 	}
>
> 	return NULL;
> }

We think the other thread is running l2cap_sock_teardown_cb() as pointed
out by Yichen Zhao (thanks for the hint).

We made the assumption that l2cap_sock_teardown_cb is acting on the same
sk socket as sk in bt_accept_dequeue().

In net/bluetooth/l2cap_sock.c from bluetooth-next based on kernel 4.10-rc2

> static void l2cap_sock_teardown_cb(struct l2cap_chan *chan, int err)
> {
> 	struct sock *sk = chan->data;
> 	struct sock *parent;
>
> 	BT_DBG("chan %p state %s", chan, state_to_string(chan->state));
>
> 	/* This callback can be called both for server (BT_LISTEN)
> 	 * sockets as well as "normal" ones. To avoid lockdep warnings
> 	 * with child socket locking (through l2cap_sock_cleanup_listen)
> 	 * we need separation into separate nesting levels. The simplest
> 	 * way to accomplish this is to inherit the nesting level used
> 	 * for the channel.
> 	 */
> 	lock_sock_nested(sk, atomic_read(&chan->nesting));

Taking the lock on sk here does not prevent the failure, because
bt_accept_dequeue() can already have picked sk from the list and then
simply waits in lock_sock(sk) until this function releases it. The lock
is likely to only serialise the 2 threads, which results in
bt_accept_unlink(sk) being called twice for the same sk.

>
> 	parent = bt_sk(sk)->parent;

Here parent is taken from the sk socket.
If parent is not NULL, sk is still on the parent's accept_q list that
bt_accept_dequeue() walks.

>
> 	sock_set_flag(sk, SOCK_ZAPPED);
>
> 	switch (chan->state) {
> 	case BT_OPEN:
> 	case BT_BOUND:
> 	case BT_CLOSED:
> 		break;
> 	case BT_LISTEN:
> 		l2cap_sock_cleanup_listen(sk);
> 		sk->sk_state = BT_CLOSED;
> 		chan->state = BT_CLOSED;
>
> 		break;
> 	default:
> 		sk->sk_state = BT_CLOSED;
> 		chan->state = BT_CLOSED;
>
> 		sk->sk_err = err;
>
> 		if (parent) {
> 			bt_accept_unlink(sk);

This call to bt_accept_unlink() removes sk from the list, sets
bt_sk(sk)->parent to NULL and potentially frees sk.

We think this can trigger this crash in RFCOMM:

rfcomm_run() calls
rfcomm_accept_connection() calls
kernel_accept() calls
l2cap_sock_accept() calls
bt_accept_dequeue() which runs concurrently with l2cap_sock_teardown_cb()
bt_accept_unlink() is called from l2cap_sock_teardown_cb()
bt_accept_unlink() is called from bt_accept_dequeue() causing a NULL
pointer dereference crash

> 			parent->sk_data_ready(parent);
> 		} else {
> 			sk->sk_state_change(sk);
> 		}
>
> 		break;
> 	}
>
> 	release_sock(sk);
> }

I am not familiar with rfcomm-tester or l2cap-tester so I don't know
whether these tools are capable of reproducing this failure case.

Yichen Zhao's patch shown in
http://www.spinics.net/lists/linux-bluetooth/msg67189.html would seem to
be a solution to this crash. We have not yet tested the patch but the
principle looks right to me.

I agree that conditional locking looks strange but if parent is not NULL
then bt_accept_unlink() must not be called without the parent lock being
held.
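
As a sanity check of the principle, the locking order used in the patch can be modelled in userspace. This is a sketch under stated assumptions and not the kernel implementation: pthread mutexes stand in for the sock locks, struct lk_sock and all function names are made up, "still queued" is modelled by the parent pointer alone instead of the accept_q list, and the two paths are replayed one after the other in both orders instead of running in real threads. In both orders the sock should be unlinked exactly once.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a Bluetooth sock; not the kernel structure. */
struct lk_sock {
	pthread_mutex_t lock;
	struct lk_sock *parent;		/* models bt_sk(sk)->parent */
	int ack_backlog;		/* models sk_ack_backlog */
};

/* Must be called with both the parent lock and the sk lock held. */
static void lk_unlink(struct lk_sock *sk)
{
	assert(sk->parent);
	sk->parent->ack_backlog--;
	sk->parent = NULL;
}

/* Teardown side, using the order from the patch: parent first, then sk,
 * then re-check whether sk was unlinked in the meantime. */
static int lk_teardown(struct lk_sock *sk)
{
	struct lk_sock *parent = sk->parent;	/* unlocked read, re-checked below */
	int unlinked = 0;

	if (parent)
		pthread_mutex_lock(&parent->lock);
	pthread_mutex_lock(&sk->lock);

	if (parent && !sk->parent) {
		/* Someone unlinked sk before the parent lock was taken. */
		pthread_mutex_unlock(&parent->lock);
		parent = NULL;
	}

	if (parent) {
		lk_unlink(sk);
		unlinked = 1;
	}

	pthread_mutex_unlock(&sk->lock);
	if (parent)
		pthread_mutex_unlock(&parent->lock);
	return unlinked;
}

/* Dequeue side: the parent lock is held around the walk, sk is locked inside. */
static int lk_dequeue(struct lk_sock *parent, struct lk_sock *sk)
{
	int unlinked = 0;

	pthread_mutex_lock(&parent->lock);
	if (sk->parent == parent) {		/* sk is still queued */
		pthread_mutex_lock(&sk->lock);
		lk_unlink(sk);
		unlinked = 1;
		pthread_mutex_unlock(&sk->lock);
	}
	pthread_mutex_unlock(&parent->lock);
	return unlinked;
}

/* Run both paths in the given order and return the total number of unlinks. */
static int replay_patch_order(int teardown_first)
{
	struct lk_sock parent, child;
	int total = 0;

	pthread_mutex_init(&parent.lock, NULL);
	parent.parent = NULL;
	parent.ack_backlog = 1;

	pthread_mutex_init(&child.lock, NULL);
	child.parent = &parent;
	child.ack_backlog = 0;

	if (teardown_first) {
		total += lk_teardown(&child);
		total += lk_dequeue(&parent, &child);
	} else {
		total += lk_dequeue(&parent, &child);
		total += lk_teardown(&child);
	}

	assert(parent.ack_backlog == 0);

	pthread_mutex_destroy(&child.lock);
	pthread_mutex_destroy(&parent.lock);
	return total;
}
```

A sequential replay like this cannot prove the absence of a race; it only shows that, once both paths take the parent lock before touching the queue state, the second path sees the unlink done by the first and skips it.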

Do you have any thoughts on the issue?

Thanks,

Regards,
Dean

--

Dean Jenkins
Embedded Software Engineer
Linux Transportation Solutions
Mentor Embedded Software Division
Mentor Graphics (UK) Ltd.

http://www.mentor.com/embedded