2022-06-22 08:31:03

by Lee Jones

[permalink] [raw]
Subject: [RESEND 1/1] Bluetooth: Use chan_list_lock to protect the whole put/destroy invokation

This change prevents a use-after-free caused by one of the worker
threads starting up (see below) *after* the final channel reference
has been put() during sock_close() but *before* the references to the
channel have been destroyed.

refcount_t: increment on 0; use-after-free.
BUG: KASAN: use-after-free in refcount_dec_and_test+0x20/0xd0
Read of size 4 at addr ffffffc114f5bf18 by task kworker/u17:14/705

CPU: 4 PID: 705 Comm: kworker/u17:14 Tainted: G S W 4.14.234-00003-g1fb6d0bd49a4-dirty #28
Hardware name: Qualcomm Technologies, Inc. SM8150 V2 PM8150 Google Inc. MSM sm8150 Flame DVT (DT)
Workqueue: hci0 hci_rx_work
Call trace:
dump_backtrace+0x0/0x378
show_stack+0x20/0x2c
dump_stack+0x124/0x148
print_address_description+0x80/0x2e8
__kasan_report+0x168/0x188
kasan_report+0x10/0x18
__asan_load4+0x84/0x8c
refcount_dec_and_test+0x20/0xd0
l2cap_chan_put+0x48/0x12c
l2cap_recv_frame+0x4770/0x6550
l2cap_recv_acldata+0x44c/0x7a4
hci_acldata_packet+0x100/0x188
hci_rx_work+0x178/0x23c
process_one_work+0x35c/0x95c
worker_thread+0x4cc/0x960
kthread+0x1a8/0x1c4
ret_from_fork+0x10/0x18

Cc: [email protected]
Cc: Marcel Holtmann <[email protected]>
Cc: Johan Hedberg <[email protected]>
Cc: Luiz Augusto von Dentz <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Jakub Kicinski <[email protected]>
Cc: Paolo Abeni <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Lee Jones <[email protected]>
---
net/bluetooth/l2cap_core.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/bluetooth/l2cap_core.c b/net/bluetooth/l2cap_core.c
index ae78490ecd3d4..82279c5919fd8 100644
--- a/net/bluetooth/l2cap_core.c
+++ b/net/bluetooth/l2cap_core.c
@@ -483,9 +483,7 @@ static void l2cap_chan_destroy(struct kref *kref)

BT_DBG("chan %p", chan);

- write_lock(&chan_list_lock);
list_del(&chan->global_l);
- write_unlock(&chan_list_lock);

kfree(chan);
}
@@ -501,7 +499,9 @@ void l2cap_chan_put(struct l2cap_chan *c)
{
BT_DBG("chan %p orig refcnt %u", c, kref_read(&c->kref));

+ write_lock(&chan_list_lock);
kref_put(&c->kref, l2cap_chan_destroy);
+ write_unlock(&chan_list_lock);
}
EXPORT_SYMBOL_GPL(l2cap_chan_put);

--
2.36.1.255.ge46751e96f-goog


2022-06-22 09:32:51

by bluez.test.bot

[permalink] [raw]
Subject: RE: [RESEND,1/1] Bluetooth: Use chan_list_lock to protect the whole put/destroy invokation

This is automated email and please do not reply to this email!

Dear submitter,

Thank you for submitting the patches to the linux bluetooth mailing list.
This is a CI test results with your patch series:
PW Link:https://patchwork.kernel.org/project/bluetooth/list/?series=652672

---Test result---

Test Summary:
CheckPatch FAIL 1.22 seconds
GitLint FAIL 0.73 seconds
SubjectPrefix PASS 0.63 seconds
BuildKernel PASS 41.39 seconds
BuildKernel32 PASS 36.81 seconds
Incremental Build with patchesPASS 47.55 seconds
TestRunner: Setup PASS 648.42 seconds
TestRunner: l2cap-tester PASS 15.98 seconds
TestRunner: bnep-tester PASS 5.24 seconds
TestRunner: mgmt-tester PASS 96.63 seconds
TestRunner: rfcomm-tester PASS 8.55 seconds
TestRunner: sco-tester PASS 8.32 seconds
TestRunner: smp-tester PASS 8.43 seconds
TestRunner: userchan-tester PASS 5.41 seconds

Details
##############################
Test: CheckPatch - FAIL - 1.22 seconds
Run checkpatch.pl script with rule in .checkpatch.conf
[RESEND,1/1] Bluetooth: Use chan_list_lock to protect the whole put/destroy invokation\WARNING:COMMIT_LOG_LONG_LINE: Possible unwrapped commit description (prefer a maximum 75 chars per line)
#96:
CPU: 4 PID: 705 Comm: kworker/u17:14 Tainted: G S W 4.14.234-00003-g1fb6d0bd49a4-dirty #28

total: 0 errors, 1 warnings, 0 checks, 18 lines checked

NOTE: For some of the reported defects, checkpatch may be able to
mechanically convert to the typical style using --fix or --fix-inplace.

/github/workspace/src/12890300.patch has style problems, please review.

NOTE: Ignored message types: UNKNOWN_COMMIT_ID

NOTE: If any of the errors are false positives, please report
them to the maintainer, see CHECKPATCH in MAINTAINERS.


##############################
Test: GitLint - FAIL - 0.73 seconds
Run gitlint with rule in .gitlint
[RESEND,1/1] Bluetooth: Use chan_list_lock to protect the whole put/destroy invokation
1: T1 Title exceeds max length (86>80): "[RESEND,1/1] Bluetooth: Use chan_list_lock to protect the whole put/destroy invokation"
12: B1 Line exceeds max length (103>80): " CPU: 4 PID: 705 Comm: kworker/u17:14 Tainted: G S W 4.14.234-00003-g1fb6d0bd49a4-dirty #28"
13: B1 Line exceeds max length (99>80): " Hardware name: Qualcomm Technologies, Inc. SM8150 V2 PM8150 Google Inc. MSM sm8150 Flame DVT (DT)"




---
Regards,
Linux Bluetooth

2022-06-27 14:45:41

by Eric Dumazet

[permalink] [raw]
Subject: Re: [RESEND 1/1] Bluetooth: Use chan_list_lock to protect the whole put/destroy invokation

On Wed, Jun 22, 2022 at 10:27 AM Lee Jones <[email protected]> wrote:
>
> This change prevents a use-after-free caused by one of the worker
> threads starting up (see below) *after* the final channel reference
> has been put() during sock_close() but *before* the references to the
> channel have been destroyed.
>
> refcount_t: increment on 0; use-after-free.
> BUG: KASAN: use-after-free in refcount_dec_and_test+0x20/0xd0
> Read of size 4 at addr ffffffc114f5bf18 by task kworker/u17:14/705
>
> CPU: 4 PID: 705 Comm: kworker/u17:14 Tainted: G S W 4.14.234-00003-g1fb6d0bd49a4-dirty #28
> Hardware name: Qualcomm Technologies, Inc. SM8150 V2 PM8150 Google Inc. MSM sm8150 Flame DVT (DT)
> Workqueue: hci0 hci_rx_work
> Call trace:
> dump_backtrace+0x0/0x378
> show_stack+0x20/0x2c
> dump_stack+0x124/0x148
> print_address_description+0x80/0x2e8
> __kasan_report+0x168/0x188
> kasan_report+0x10/0x18
> __asan_load4+0x84/0x8c
> refcount_dec_and_test+0x20/0xd0
> l2cap_chan_put+0x48/0x12c
> l2cap_recv_frame+0x4770/0x6550
> l2cap_recv_acldata+0x44c/0x7a4
> hci_acldata_packet+0x100/0x188
> hci_rx_work+0x178/0x23c
> process_one_work+0x35c/0x95c
> worker_thread+0x4cc/0x960
> kthread+0x1a8/0x1c4
> ret_from_fork+0x10/0x18
>
> Cc: [email protected]

When was the bug added ? (Fixes: tag please)

> Cc: Marcel Holtmann <[email protected]>
> Cc: Johan Hedberg <[email protected]>
> Cc: Luiz Augusto von Dentz <[email protected]>
> Cc: "David S. Miller" <[email protected]>
> Cc: Eric Dumazet <[email protected]>
> Cc: Jakub Kicinski <[email protected]>
> Cc: Paolo Abeni <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Signed-off-by: Lee Jones <[email protected]>
> ---
> net/bluetooth/l2cap_core.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/net/bluetooth/l2cap_core.c b/net/bluetooth/l2cap_core.c
> index ae78490ecd3d4..82279c5919fd8 100644
> --- a/net/bluetooth/l2cap_core.c
> +++ b/net/bluetooth/l2cap_core.c
> @@ -483,9 +483,7 @@ static void l2cap_chan_destroy(struct kref *kref)
>
> BT_DBG("chan %p", chan);
>
> - write_lock(&chan_list_lock);
> list_del(&chan->global_l);
> - write_unlock(&chan_list_lock);
>
> kfree(chan);
> }
> @@ -501,7 +499,9 @@ void l2cap_chan_put(struct l2cap_chan *c)
> {
> BT_DBG("chan %p orig refcnt %u", c, kref_read(&c->kref));
>
> + write_lock(&chan_list_lock);
> kref_put(&c->kref, l2cap_chan_destroy);
> + write_unlock(&chan_list_lock);
> }
> EXPORT_SYMBOL_GPL(l2cap_chan_put);
>
> --
> 2.36.1.255.ge46751e96f-goog
>

I do not think this patch is correct.

a kref does not need to be protected by a write lock.

This might shuffle things enough to work around a particular repro you have.

If the patch was correct why not protect kref_get() sides ?

Before the &hdev->rx_work is scheduled (queue_work(hdev->workqueue,
&hdev->rx_work),
a reference must be taken.

Then this reference must be released at the end of hci_rx_work() or
when hdev->workqueue
is canceled.

This refcount is not needed _if_ the workqueue is properly canceled at
device dismantle,
in a synchronous way.

I do not see this hdev->rx_work being canceled, maybe this is the real issue.

There is a call to drain_workqueue() but this is not enough I think,
because hci_recv_frame()
can re-arm
queue_work(hdev->workqueue, &hdev->rx_work);

2022-06-27 23:53:41

by Luiz Augusto von Dentz

[permalink] [raw]
Subject: Re: [RESEND 1/1] Bluetooth: Use chan_list_lock to protect the whole put/destroy invokation

Hi Eric, Lee,

On Mon, Jun 27, 2022 at 7:41 AM Eric Dumazet <[email protected]> wrote:
>
> On Wed, Jun 22, 2022 at 10:27 AM Lee Jones <[email protected]> wrote:
> >
> > This change prevents a use-after-free caused by one of the worker
> > threads starting up (see below) *after* the final channel reference
> > has been put() during sock_close() but *before* the references to the
> > channel have been destroyed.
> >
> > refcount_t: increment on 0; use-after-free.
> > BUG: KASAN: use-after-free in refcount_dec_and_test+0x20/0xd0
> > Read of size 4 at addr ffffffc114f5bf18 by task kworker/u17:14/705
> >
> > CPU: 4 PID: 705 Comm: kworker/u17:14 Tainted: G S W 4.14.234-00003-g1fb6d0bd49a4-dirty #28
> > Hardware name: Qualcomm Technologies, Inc. SM8150 V2 PM8150 Google Inc. MSM sm8150 Flame DVT (DT)
> > Workqueue: hci0 hci_rx_work
> > Call trace:
> > dump_backtrace+0x0/0x378
> > show_stack+0x20/0x2c
> > dump_stack+0x124/0x148
> > print_address_description+0x80/0x2e8
> > __kasan_report+0x168/0x188
> > kasan_report+0x10/0x18
> > __asan_load4+0x84/0x8c
> > refcount_dec_and_test+0x20/0xd0
> > l2cap_chan_put+0x48/0x12c
> > l2cap_recv_frame+0x4770/0x6550
> > l2cap_recv_acldata+0x44c/0x7a4
> > hci_acldata_packet+0x100/0x188
> > hci_rx_work+0x178/0x23c
> > process_one_work+0x35c/0x95c
> > worker_thread+0x4cc/0x960
> > kthread+0x1a8/0x1c4
> > ret_from_fork+0x10/0x18
> >
> > Cc: [email protected]
>
> When was the bug added ? (Fixes: tag please)
>
> > Cc: Marcel Holtmann <[email protected]>
> > Cc: Johan Hedberg <[email protected]>
> > Cc: Luiz Augusto von Dentz <[email protected]>
> > Cc: "David S. Miller" <[email protected]>
> > Cc: Eric Dumazet <[email protected]>
> > Cc: Jakub Kicinski <[email protected]>
> > Cc: Paolo Abeni <[email protected]>
> > Cc: [email protected]
> > Cc: [email protected]
> > Signed-off-by: Lee Jones <[email protected]>
> > ---
> > net/bluetooth/l2cap_core.c | 4 ++--
> > 1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/net/bluetooth/l2cap_core.c b/net/bluetooth/l2cap_core.c
> > index ae78490ecd3d4..82279c5919fd8 100644
> > --- a/net/bluetooth/l2cap_core.c
> > +++ b/net/bluetooth/l2cap_core.c
> > @@ -483,9 +483,7 @@ static void l2cap_chan_destroy(struct kref *kref)
> >
> > BT_DBG("chan %p", chan);
> >
> > - write_lock(&chan_list_lock);
> > list_del(&chan->global_l);
> > - write_unlock(&chan_list_lock);
> >
> > kfree(chan);
> > }
> > @@ -501,7 +499,9 @@ void l2cap_chan_put(struct l2cap_chan *c)
> > {
> > BT_DBG("chan %p orig refcnt %u", c, kref_read(&c->kref));
> >
> > + write_lock(&chan_list_lock);
> > kref_put(&c->kref, l2cap_chan_destroy);
> > + write_unlock(&chan_list_lock);
> > }
> > EXPORT_SYMBOL_GPL(l2cap_chan_put);
> >
> > --
> > 2.36.1.255.ge46751e96f-goog
> >
>
> I do not think this patch is correct.
>
> a kref does not need to be protected by a write lock.
>
> This might shuffle things enough to work around a particular repro you have.
>
> If the patch was correct why not protect kref_get() sides ?
>
> Before the &hdev->rx_work is scheduled (queue_work(hdev->workqueue,
> &hdev->rx_work),
> a reference must be taken.
>
> Then this reference must be released at the end of hci_rx_work() or
> when hdev->workqueue
> is canceled.
>
> This refcount is not needed _if_ the workqueue is properly canceled at
> device dismantle,
> in a synchronous way.
>
> I do not see this hdev->rx_work being canceled, maybe this is the real issue.
>
> There is a call to drain_workqueue() but this is not enough I think,
> because hci_recv_frame()
> can re-arm
> queue_work(hdev->workqueue, &hdev->rx_work);

I suspect this likely a refcount problem, we do l2cap_get_chan_by_scid:

/* Find channel with given SCID.
* Returns locked channel. */
static struct l2cap_chan *l2cap_get_chan_by_scid(struct l2cap_conn
*conn, u16 cid)

So we return a locked channel but that doesn't prevent another thread
to call l2cap_chan_put which doesn't care about l2cap_chan_lock so
perhaps we actually need to host a reference while we have the lock,
at least we do something like that on l2cap_sock.c:

l2cap_chan_hold(chan);
l2cap_chan_lock(chan);

__clear_chan_timer(chan);
l2cap_chan_close(chan, ECONNRESET);
l2cap_sock_kill(sk);

l2cap_chan_unlock(chan);
l2cap_chan_put(chan);


--
Luiz Augusto von Dentz

2022-06-28 18:43:30

by Luiz Augusto von Dentz

[permalink] [raw]
Subject: Re: [RESEND 1/1] Bluetooth: Use chan_list_lock to protect the whole put/destroy invokation

Hi Eric, Lee,

On Mon, Jun 27, 2022 at 4:39 PM Luiz Augusto von Dentz
<[email protected]> wrote:
>
> Hi Eric, Lee,
>
> On Mon, Jun 27, 2022 at 7:41 AM Eric Dumazet <[email protected]> wrote:
> >
> > On Wed, Jun 22, 2022 at 10:27 AM Lee Jones <[email protected]> wrote:
> > >
> > > This change prevents a use-after-free caused by one of the worker
> > > threads starting up (see below) *after* the final channel reference
> > > has been put() during sock_close() but *before* the references to the
> > > channel have been destroyed.
> > >
> > > refcount_t: increment on 0; use-after-free.
> > > BUG: KASAN: use-after-free in refcount_dec_and_test+0x20/0xd0
> > > Read of size 4 at addr ffffffc114f5bf18 by task kworker/u17:14/705
> > >
> > > CPU: 4 PID: 705 Comm: kworker/u17:14 Tainted: G S W 4.14.234-00003-g1fb6d0bd49a4-dirty #28
> > > Hardware name: Qualcomm Technologies, Inc. SM8150 V2 PM8150 Google Inc. MSM sm8150 Flame DVT (DT)
> > > Workqueue: hci0 hci_rx_work
> > > Call trace:
> > > dump_backtrace+0x0/0x378
> > > show_stack+0x20/0x2c
> > > dump_stack+0x124/0x148
> > > print_address_description+0x80/0x2e8
> > > __kasan_report+0x168/0x188
> > > kasan_report+0x10/0x18
> > > __asan_load4+0x84/0x8c
> > > refcount_dec_and_test+0x20/0xd0
> > > l2cap_chan_put+0x48/0x12c
> > > l2cap_recv_frame+0x4770/0x6550
> > > l2cap_recv_acldata+0x44c/0x7a4
> > > hci_acldata_packet+0x100/0x188
> > > hci_rx_work+0x178/0x23c
> > > process_one_work+0x35c/0x95c
> > > worker_thread+0x4cc/0x960
> > > kthread+0x1a8/0x1c4
> > > ret_from_fork+0x10/0x18
> > >
> > > Cc: [email protected]
> >
> > When was the bug added ? (Fixes: tag please)
> >
> > > Cc: Marcel Holtmann <[email protected]>
> > > Cc: Johan Hedberg <[email protected]>
> > > Cc: Luiz Augusto von Dentz <[email protected]>
> > > Cc: "David S. Miller" <[email protected]>
> > > Cc: Eric Dumazet <[email protected]>
> > > Cc: Jakub Kicinski <[email protected]>
> > > Cc: Paolo Abeni <[email protected]>
> > > Cc: [email protected]
> > > Cc: [email protected]
> > > Signed-off-by: Lee Jones <[email protected]>
> > > ---
> > > net/bluetooth/l2cap_core.c | 4 ++--
> > > 1 file changed, 2 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/net/bluetooth/l2cap_core.c b/net/bluetooth/l2cap_core.c
> > > index ae78490ecd3d4..82279c5919fd8 100644
> > > --- a/net/bluetooth/l2cap_core.c
> > > +++ b/net/bluetooth/l2cap_core.c
> > > @@ -483,9 +483,7 @@ static void l2cap_chan_destroy(struct kref *kref)
> > >
> > > BT_DBG("chan %p", chan);
> > >
> > > - write_lock(&chan_list_lock);
> > > list_del(&chan->global_l);
> > > - write_unlock(&chan_list_lock);
> > >
> > > kfree(chan);
> > > }
> > > @@ -501,7 +499,9 @@ void l2cap_chan_put(struct l2cap_chan *c)
> > > {
> > > BT_DBG("chan %p orig refcnt %u", c, kref_read(&c->kref));
> > >
> > > + write_lock(&chan_list_lock);
> > > kref_put(&c->kref, l2cap_chan_destroy);
> > > + write_unlock(&chan_list_lock);
> > > }
> > > EXPORT_SYMBOL_GPL(l2cap_chan_put);
> > >
> > > --
> > > 2.36.1.255.ge46751e96f-goog
> > >
> >
> > I do not think this patch is correct.
> >
> > a kref does not need to be protected by a write lock.
> >
> > This might shuffle things enough to work around a particular repro you have.
> >
> > If the patch was correct why not protect kref_get() sides ?
> >
> > Before the &hdev->rx_work is scheduled (queue_work(hdev->workqueue,
> > &hdev->rx_work),
> > a reference must be taken.
> >
> > Then this reference must be released at the end of hci_rx_work() or
> > when hdev->workqueue
> > is canceled.
> >
> > This refcount is not needed _if_ the workqueue is properly canceled at
> > device dismantle,
> > in a synchronous way.
> >
> > I do not see this hdev->rx_work being canceled, maybe this is the real issue.
> >
> > There is a call to drain_workqueue() but this is not enough I think,
> > because hci_recv_frame()
> > can re-arm
> > queue_work(hdev->workqueue, &hdev->rx_work);
>
> I suspect this likely a refcount problem, we do l2cap_get_chan_by_scid:
>
> /* Find channel with given SCID.
> * Returns locked channel. */
> static struct l2cap_chan *l2cap_get_chan_by_scid(struct l2cap_conn
> *conn, u16 cid)
>
> So we return a locked channel but that doesn't prevent another thread
> to call l2cap_chan_put which doesn't care about l2cap_chan_lock so
> perhaps we actually need to host a reference while we have the lock,
> at least we do something like that on l2cap_sock.c:
>
> l2cap_chan_hold(chan);
> l2cap_chan_lock(chan);
>
> __clear_chan_timer(chan);
> l2cap_chan_close(chan, ECONNRESET);
> l2cap_sock_kill(sk);
>
> l2cap_chan_unlock(chan);
> l2cap_chan_put(chan);

Perhaps something like this:

diff --git a/net/bluetooth/l2cap_core.c b/net/bluetooth/l2cap_core.c
index 09ecaf556de5..9050b6af3577 100644
--- a/net/bluetooth/l2cap_core.c
+++ b/net/bluetooth/l2cap_core.c
@@ -111,7 +111,7 @@ static struct l2cap_chan
*__l2cap_get_chan_by_scid(struct l2cap_conn *conn,
}

/* Find channel with given SCID.
- * Returns locked channel. */
+ * Returns a reference locked channel. */
static struct l2cap_chan *l2cap_get_chan_by_scid(struct l2cap_conn *conn,
u16 cid)
{
@@ -119,15 +119,17 @@ static struct l2cap_chan
*l2cap_get_chan_by_scid(struct l2cap_conn *conn,

mutex_lock(&conn->chan_lock);
c = __l2cap_get_chan_by_scid(conn, cid);
- if (c)
+ if (c) {
+ l2cap_chan_hold(c);
l2cap_chan_lock(c);
+ }
mutex_unlock(&conn->chan_lock);

return c;
}

/* Find channel with given DCID.
- * Returns locked channel.
+ * Returns a reference locked channel.
*/
static struct l2cap_chan *l2cap_get_chan_by_dcid(struct l2cap_conn *conn,
u16 cid)
@@ -136,8 +138,10 @@ static struct l2cap_chan
*l2cap_get_chan_by_dcid(struct l2cap_conn *conn,

mutex_lock(&conn->chan_lock);
c = __l2cap_get_chan_by_dcid(conn, cid);
- if (c)
+ if (c) {
+ l2cap_chan_hold(c);
l2cap_chan_lock(c);
+ }
mutex_unlock(&conn->chan_lock);

return c;
@@ -4464,6 +4468,7 @@ static inline int l2cap_config_req(struct
l2cap_conn *conn,

unlock:
l2cap_chan_unlock(chan);
+ l2cap_chan_put(chan);
return err;
}

@@ -4578,6 +4583,7 @@ static inline int l2cap_config_rsp(struct
l2cap_conn *conn,

done:
l2cap_chan_unlock(chan);
+ l2cap_chan_put(chan);
return err;
}

@@ -5305,6 +5311,7 @@ static inline int l2cap_move_channel_req(struct
l2cap_conn *conn,
l2cap_send_move_chan_rsp(chan, result);

l2cap_chan_unlock(chan);
+ l2cap_chan_put(chan);

return 0;
}
@@ -5397,6 +5404,7 @@ static void l2cap_move_continue(struct
l2cap_conn *conn, u16 icid, u16 result)
}

l2cap_chan_unlock(chan);
+ l2cap_chan_put(chan);
}

static void l2cap_move_fail(struct l2cap_conn *conn, u8 ident, u16 icid,
@@ -5489,6 +5497,7 @@ static int l2cap_move_channel_confirm(struct
l2cap_conn *conn,
l2cap_send_move_chan_cfm_rsp(conn, cmd->ident, icid);

l2cap_chan_unlock(chan);
+ l2cap_chan_put(chan);

return 0;
}
@@ -5524,6 +5533,7 @@ static inline int
l2cap_move_channel_confirm_rsp(struct l2cap_conn *conn,
}

l2cap_chan_unlock(chan);
+ l2cap_chan_put(chan);

return 0;
}
@@ -5896,12 +5906,11 @@ static inline int l2cap_le_credits(struct
l2cap_conn *conn,
if (credits > max_credits) {
BT_ERR("LE credits overflow");
l2cap_send_disconn_req(chan, ECONNRESET);
- l2cap_chan_unlock(chan);

/* Return 0 so that we don't trigger an unnecessary
* command reject packet.
*/
- return 0;
+ goto unlock;
}

chan->tx_credits += credits;
@@ -5912,7 +5921,9 @@ static inline int l2cap_le_credits(struct
l2cap_conn *conn,
if (chan->tx_credits)
chan->ops->resume(chan);

+unlock:
l2cap_chan_unlock(chan);
+ l2cap_chan_put(chan);

return 0;
}
@@ -7598,6 +7609,7 @@ static void l2cap_data_channel(struct l2cap_conn
*conn, u16 cid,

done:
l2cap_chan_unlock(chan);
+ l2cap_chan_put(chan);
}

static void l2cap_conless_channel(struct l2cap_conn *conn, __le16 psm,


>
> --
> Luiz Augusto von Dentz



--
Luiz Augusto von Dentz

2022-06-29 15:40:39

by Lee Jones

[permalink] [raw]
Subject: Re: [RESEND 1/1] Bluetooth: Use chan_list_lock to protect the whole put/destroy invokation

On Tue, 28 Jun 2022, Luiz Augusto von Dentz wrote:

> Hi Eric, Lee,
>
> On Mon, Jun 27, 2022 at 4:39 PM Luiz Augusto von Dentz
> <[email protected]> wrote:
> >
> > Hi Eric, Lee,
> >
> > On Mon, Jun 27, 2022 at 7:41 AM Eric Dumazet <[email protected]> wrote:
> > >
> > > On Wed, Jun 22, 2022 at 10:27 AM Lee Jones <[email protected]> wrote:
> > > >
> > > > This change prevents a use-after-free caused by one of the worker
> > > > threads starting up (see below) *after* the final channel reference
> > > > has been put() during sock_close() but *before* the references to the
> > > > channel have been destroyed.
> > > >
> > > > refcount_t: increment on 0; use-after-free.
> > > > BUG: KASAN: use-after-free in refcount_dec_and_test+0x20/0xd0
> > > > Read of size 4 at addr ffffffc114f5bf18 by task kworker/u17:14/705
> > > >
> > > > CPU: 4 PID: 705 Comm: kworker/u17:14 Tainted: G S W 4.14.234-00003-g1fb6d0bd49a4-dirty #28
> > > > Hardware name: Qualcomm Technologies, Inc. SM8150 V2 PM8150 Google Inc. MSM sm8150 Flame DVT (DT)
> > > > Workqueue: hci0 hci_rx_work
> > > > Call trace:
> > > > dump_backtrace+0x0/0x378
> > > > show_stack+0x20/0x2c
> > > > dump_stack+0x124/0x148
> > > > print_address_description+0x80/0x2e8
> > > > __kasan_report+0x168/0x188
> > > > kasan_report+0x10/0x18
> > > > __asan_load4+0x84/0x8c
> > > > refcount_dec_and_test+0x20/0xd0
> > > > l2cap_chan_put+0x48/0x12c
> > > > l2cap_recv_frame+0x4770/0x6550
> > > > l2cap_recv_acldata+0x44c/0x7a4
> > > > hci_acldata_packet+0x100/0x188
> > > > hci_rx_work+0x178/0x23c
> > > > process_one_work+0x35c/0x95c
> > > > worker_thread+0x4cc/0x960
> > > > kthread+0x1a8/0x1c4
> > > > ret_from_fork+0x10/0x18
> > > >
> > > > Cc: [email protected]
> > >
> > > When was the bug added ? (Fixes: tag please)
> > >
> > > > Cc: Marcel Holtmann <[email protected]>
> > > > Cc: Johan Hedberg <[email protected]>
> > > > Cc: Luiz Augusto von Dentz <[email protected]>
> > > > Cc: "David S. Miller" <[email protected]>
> > > > Cc: Eric Dumazet <[email protected]>
> > > > Cc: Jakub Kicinski <[email protected]>
> > > > Cc: Paolo Abeni <[email protected]>
> > > > Cc: [email protected]
> > > > Cc: [email protected]
> > > > Signed-off-by: Lee Jones <[email protected]>
> > > > ---
> > > > net/bluetooth/l2cap_core.c | 4 ++--
> > > > 1 file changed, 2 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/net/bluetooth/l2cap_core.c b/net/bluetooth/l2cap_core.c
> > > > index ae78490ecd3d4..82279c5919fd8 100644
> > > > --- a/net/bluetooth/l2cap_core.c
> > > > +++ b/net/bluetooth/l2cap_core.c
> > > > @@ -483,9 +483,7 @@ static void l2cap_chan_destroy(struct kref *kref)
> > > >
> > > > BT_DBG("chan %p", chan);
> > > >
> > > > - write_lock(&chan_list_lock);
> > > > list_del(&chan->global_l);
> > > > - write_unlock(&chan_list_lock);
> > > >
> > > > kfree(chan);
> > > > }
> > > > @@ -501,7 +499,9 @@ void l2cap_chan_put(struct l2cap_chan *c)
> > > > {
> > > > BT_DBG("chan %p orig refcnt %u", c, kref_read(&c->kref));
> > > >
> > > > + write_lock(&chan_list_lock);
> > > > kref_put(&c->kref, l2cap_chan_destroy);
> > > > + write_unlock(&chan_list_lock);
> > > > }
> > > > EXPORT_SYMBOL_GPL(l2cap_chan_put);
> > > >
> > > >
> > >
> > > I do not think this patch is correct.
> > >
> > > a kref does not need to be protected by a write lock.
> > >
> > > This might shuffle things enough to work around a particular repro you have.
> > >
> > > If the patch was correct why not protect kref_get() sides ?
> > >
> > > Before the &hdev->rx_work is scheduled (queue_work(hdev->workqueue,
> > > &hdev->rx_work),
> > > a reference must be taken.
> > >
> > > Then this reference must be released at the end of hci_rx_work() or
> > > when hdev->workqueue
> > > is canceled.
> > >
> > > This refcount is not needed _if_ the workqueue is properly canceled at
> > > device dismantle,
> > > in a synchronous way.
> > >
> > > I do not see this hdev->rx_work being canceled, maybe this is the real issue.
> > >
> > > There is a call to drain_workqueue() but this is not enough I think,
> > > because hci_recv_frame()
> > > can re-arm
> > > queue_work(hdev->workqueue, &hdev->rx_work);
> >
> > I suspect this likely a refcount problem, we do l2cap_get_chan_by_scid:
> >
> > /* Find channel with given SCID.
> > * Returns locked channel. */
> > static struct l2cap_chan *l2cap_get_chan_by_scid(struct l2cap_conn
> > *conn, u16 cid)
> >
> > So we return a locked channel but that doesn't prevent another thread
> > to call l2cap_chan_put which doesn't care about l2cap_chan_lock so
> > perhaps we actually need to host a reference while we have the lock,
> > at least we do something like that on l2cap_sock.c:
> >
> > l2cap_chan_hold(chan);
> > l2cap_chan_lock(chan);
> >
> > __clear_chan_timer(chan);
> > l2cap_chan_close(chan, ECONNRESET);
> > l2cap_sock_kill(sk);
> >
> > l2cap_chan_unlock(chan);
> > l2cap_chan_put(chan);
>
> Perhaps something like this:

I'm struggling to apply this for test:

"error: corrupt patch at line 6"

Please could you attach a `git format-patch`ed patch instead?

> diff --git a/net/bluetooth/l2cap_core.c b/net/bluetooth/l2cap_core.c
> index 09ecaf556de5..9050b6af3577 100644
> --- a/net/bluetooth/l2cap_core.c
> +++ b/net/bluetooth/l2cap_core.c
> @@ -111,7 +111,7 @@ static struct l2cap_chan
> *__l2cap_get_chan_by_scid(struct l2cap_conn *conn,
> }
>
> /* Find channel with given SCID.
> - * Returns locked channel. */
> + * Returns a reference locked channel. */
> static struct l2cap_chan *l2cap_get_chan_by_scid(struct l2cap_conn *conn,
> u16 cid)
> {
> @@ -119,15 +119,17 @@ static struct l2cap_chan
> *l2cap_get_chan_by_scid(struct l2cap_conn *conn,
>
> mutex_lock(&conn->chan_lock);
> c = __l2cap_get_chan_by_scid(conn, cid);
> - if (c)
> + if (c) {
> + l2cap_chan_hold(c);
> l2cap_chan_lock(c);
> + }
> mutex_unlock(&conn->chan_lock);
>
> return c;
> }
>
> /* Find channel with given DCID.
> - * Returns locked channel.
> + * Returns a reference locked channel.
> */
> static struct l2cap_chan *l2cap_get_chan_by_dcid(struct l2cap_conn *conn,
> u16 cid)
> @@ -136,8 +138,10 @@ static struct l2cap_chan
> *l2cap_get_chan_by_dcid(struct l2cap_conn *conn,
>
> mutex_lock(&conn->chan_lock);
> c = __l2cap_get_chan_by_dcid(conn, cid);
> - if (c)
> + if (c) {
> + l2cap_chan_hold(c);
> l2cap_chan_lock(c);
> + }
> mutex_unlock(&conn->chan_lock);
>
> return c;
> @@ -4464,6 +4468,7 @@ static inline int l2cap_config_req(struct
> l2cap_conn *conn,
>
> unlock:
> l2cap_chan_unlock(chan);
> + l2cap_chan_put(chan);
> return err;
> }
>
> @@ -4578,6 +4583,7 @@ static inline int l2cap_config_rsp(struct
> l2cap_conn *conn,
>
> done:
> l2cap_chan_unlock(chan);
> + l2cap_chan_put(chan);
> return err;
> }
>
> @@ -5305,6 +5311,7 @@ static inline int l2cap_move_channel_req(struct
> l2cap_conn *conn,
> l2cap_send_move_chan_rsp(chan, result);
>
> l2cap_chan_unlock(chan);
> + l2cap_chan_put(chan);
>
> return 0;
> }
> @@ -5397,6 +5404,7 @@ static void l2cap_move_continue(struct
> l2cap_conn *conn, u16 icid, u16 result)
> }
>
> l2cap_chan_unlock(chan);
> + l2cap_chan_put(chan);
> }
>
> static void l2cap_move_fail(struct l2cap_conn *conn, u8 ident, u16 icid,
> @@ -5489,6 +5497,7 @@ static int l2cap_move_channel_confirm(struct
> l2cap_conn *conn,
> l2cap_send_move_chan_cfm_rsp(conn, cmd->ident, icid);
>
> l2cap_chan_unlock(chan);
> + l2cap_chan_put(chan);
>
> return 0;
> }
> @@ -5524,6 +5533,7 @@ static inline int
> l2cap_move_channel_confirm_rsp(struct l2cap_conn *conn,
> }
>
> l2cap_chan_unlock(chan);
> + l2cap_chan_put(chan);
>
> return 0;
> }
> @@ -5896,12 +5906,11 @@ static inline int l2cap_le_credits(struct
> l2cap_conn *conn,
> if (credits > max_credits) {
> BT_ERR("LE credits overflow");
> l2cap_send_disconn_req(chan, ECONNRESET);
> - l2cap_chan_unlock(chan);
>
> /* Return 0 so that we don't trigger an unnecessary
> * command reject packet.
> */
> - return 0;
> + goto unlock;
> }
>
> chan->tx_credits += credits;
> @@ -5912,7 +5921,9 @@ static inline int l2cap_le_credits(struct
> l2cap_conn *conn,
> if (chan->tx_credits)
> chan->ops->resume(chan);
>
> +unlock:
> l2cap_chan_unlock(chan);
> + l2cap_chan_put(chan);
>
> return 0;
> }
> @@ -7598,6 +7609,7 @@ static void l2cap_data_channel(struct l2cap_conn
> *conn, u16 cid,
>
> done:
> l2cap_chan_unlock(chan);
> + l2cap_chan_put(chan);
> }
>
> static void l2cap_conless_channel(struct l2cap_conn *conn, __le16 psm,
>
>
> >
>
>
>

--
Lee Jones [李琼斯]
Principal Technical Lead - Developer Services
Linaro.org │ Open source software for Arm SoCs
Follow Linaro: Facebook | Twitter | Blog

2022-07-05 17:30:15

by Luiz Augusto von Dentz

[permalink] [raw]
Subject: Re: [RESEND 1/1] Bluetooth: Use chan_list_lock to protect the whole put/destroy invokation

Hi Lee,

On Wed, Jun 29, 2022 at 8:28 AM Lee Jones <[email protected]> wrote:
>
> On Tue, 28 Jun 2022, Luiz Augusto von Dentz wrote:
>
> > Hi Eric, Lee,
> >
> > On Mon, Jun 27, 2022 at 4:39 PM Luiz Augusto von Dentz
> > <[email protected]> wrote:
> > >
> > > Hi Eric, Lee,
> > >
> > > On Mon, Jun 27, 2022 at 7:41 AM Eric Dumazet <[email protected]> wrote:
> > > >
> > > > On Wed, Jun 22, 2022 at 10:27 AM Lee Jones <[email protected]> wrote:
> > > > >
> > > > > This change prevents a use-after-free caused by one of the worker
> > > > > threads starting up (see below) *after* the final channel reference
> > > > > has been put() during sock_close() but *before* the references to the
> > > > > channel have been destroyed.
> > > > >
> > > > > refcount_t: increment on 0; use-after-free.
> > > > > BUG: KASAN: use-after-free in refcount_dec_and_test+0x20/0xd0
> > > > > Read of size 4 at addr ffffffc114f5bf18 by task kworker/u17:14/705
> > > > >
> > > > > CPU: 4 PID: 705 Comm: kworker/u17:14 Tainted: G S W 4.14.234-00003-g1fb6d0bd49a4-dirty #28
> > > > > Hardware name: Qualcomm Technologies, Inc. SM8150 V2 PM8150 Google Inc. MSM sm8150 Flame DVT (DT)
> > > > > Workqueue: hci0 hci_rx_work
> > > > > Call trace:
> > > > > dump_backtrace+0x0/0x378
> > > > > show_stack+0x20/0x2c
> > > > > dump_stack+0x124/0x148
> > > > > print_address_description+0x80/0x2e8
> > > > > __kasan_report+0x168/0x188
> > > > > kasan_report+0x10/0x18
> > > > > __asan_load4+0x84/0x8c
> > > > > refcount_dec_and_test+0x20/0xd0
> > > > > l2cap_chan_put+0x48/0x12c
> > > > > l2cap_recv_frame+0x4770/0x6550
> > > > > l2cap_recv_acldata+0x44c/0x7a4
> > > > > hci_acldata_packet+0x100/0x188
> > > > > hci_rx_work+0x178/0x23c
> > > > > process_one_work+0x35c/0x95c
> > > > > worker_thread+0x4cc/0x960
> > > > > kthread+0x1a8/0x1c4
> > > > > ret_from_fork+0x10/0x18
> > > > >
> > > > > Cc: [email protected]
> > > >
> > > > When was the bug added ? (Fixes: tag please)
> > > >
> > > > > Cc: Marcel Holtmann <[email protected]>
> > > > > Cc: Johan Hedberg <[email protected]>
> > > > > Cc: Luiz Augusto von Dentz <[email protected]>
> > > > > Cc: "David S. Miller" <[email protected]>
> > > > > Cc: Eric Dumazet <[email protected]>
> > > > > Cc: Jakub Kicinski <[email protected]>
> > > > > Cc: Paolo Abeni <[email protected]>
> > > > > Cc: [email protected]
> > > > > Cc: [email protected]
> > > > > Signed-off-by: Lee Jones <[email protected]>
> > > > > ---
> > > > > net/bluetooth/l2cap_core.c | 4 ++--
> > > > > 1 file changed, 2 insertions(+), 2 deletions(-)
> > > > >
> > > > > diff --git a/net/bluetooth/l2cap_core.c b/net/bluetooth/l2cap_core.c
> > > > > index ae78490ecd3d4..82279c5919fd8 100644
> > > > > --- a/net/bluetooth/l2cap_core.c
> > > > > +++ b/net/bluetooth/l2cap_core.c
> > > > > @@ -483,9 +483,7 @@ static void l2cap_chan_destroy(struct kref *kref)
> > > > >
> > > > > BT_DBG("chan %p", chan);
> > > > >
> > > > > - write_lock(&chan_list_lock);
> > > > > list_del(&chan->global_l);
> > > > > - write_unlock(&chan_list_lock);
> > > > >
> > > > > kfree(chan);
> > > > > }
> > > > > @@ -501,7 +499,9 @@ void l2cap_chan_put(struct l2cap_chan *c)
> > > > > {
> > > > > BT_DBG("chan %p orig refcnt %u", c, kref_read(&c->kref));
> > > > >
> > > > > + write_lock(&chan_list_lock);
> > > > > kref_put(&c->kref, l2cap_chan_destroy);
> > > > > + write_unlock(&chan_list_lock);
> > > > > }
> > > > > EXPORT_SYMBOL_GPL(l2cap_chan_put);
> > > > >
> > > > >
> > > >
> > > > I do not think this patch is correct.
> > > >
> > > > a kref does not need to be protected by a write lock.
> > > >
> > > > This might shuffle things enough to work around a particular repro you have.
> > > >
> > > > If the patch was correct why not protect kref_get() sides ?
> > > >
> > > > Before the &hdev->rx_work is scheduled (queue_work(hdev->workqueue,
> > > > &hdev->rx_work),
> > > > a reference must be taken.
> > > >
> > > > Then this reference must be released at the end of hci_rx_work() or
> > > > when hdev->workqueue
> > > > is canceled.
> > > >
> > > > This refcount is not needed _if_ the workqueue is properly canceled at
> > > > device dismantle,
> > > > in a synchronous way.
> > > >
> > > > I do not see this hdev->rx_work being canceled, maybe this is the real issue.
> > > >
> > > > There is a call to drain_workqueue() but this is not enough I think,
> > > > because hci_recv_frame()
> > > > can re-arm
> > > > queue_work(hdev->workqueue, &hdev->rx_work);
> > >
> > > I suspect this likely a refcount problem, we do l2cap_get_chan_by_scid:
> > >
> > > /* Find channel with given SCID.
> > > * Returns locked channel. */
> > > static struct l2cap_chan *l2cap_get_chan_by_scid(struct l2cap_conn
> > > *conn, u16 cid)
> > >
> > > So we return a locked channel but that doesn't prevent another thread
> > > to call l2cap_chan_put which doesn't care about l2cap_chan_lock so
> > > perhaps we actually need to host a reference while we have the lock,
> > > at least we do something like that on l2cap_sock.c:
> > >
> > > l2cap_chan_hold(chan);
> > > l2cap_chan_lock(chan);
> > >
> > > __clear_chan_timer(chan);
> > > l2cap_chan_close(chan, ECONNRESET);
> > > l2cap_sock_kill(sk);
> > >
> > > l2cap_chan_unlock(chan);
> > > l2cap_chan_put(chan);
> >
> > Perhaps something like this:
>
> I'm struggling to apply this for test:
>
> "error: corrupt patch at line 6"

Check with the attached patch.


--
Luiz Augusto von Dentz


Attachments:
0001-Bluetooth-L2CAP-WIP.patch (3.54 kB)

2022-07-06 11:09:18

by Lee Jones

[permalink] [raw]
Subject: Re: [RESEND 1/1] Bluetooth: Use chan_list_lock to protect the whole put/destroy invokation

On Tue, 05 Jul 2022, Luiz Augusto von Dentz wrote:

> Hi Lee,
>
> On Wed, Jun 29, 2022 at 8:28 AM Lee Jones <[email protected]> wrote:
> >
> > On Tue, 28 Jun 2022, Luiz Augusto von Dentz wrote:
> >
> > > Hi Eric, Lee,
> > >
> > > On Mon, Jun 27, 2022 at 4:39 PM Luiz Augusto von Dentz
> > > <[email protected]> wrote:
> > > >
> > > > Hi Eric, Lee,
> > > >
> > > > On Mon, Jun 27, 2022 at 7:41 AM Eric Dumazet <[email protected]> wrote:
> > > > >
> > > > > On Wed, Jun 22, 2022 at 10:27 AM Lee Jones <[email protected]> wrote:
> > > > > >
> > > > > > This change prevents a use-after-free caused by one of the worker
> > > > > > threads starting up (see below) *after* the final channel reference
> > > > > > has been put() during sock_close() but *before* the references to the
> > > > > > channel have been destroyed.
> > > > > >
> > > > > > refcount_t: increment on 0; use-after-free.
> > > > > > BUG: KASAN: use-after-free in refcount_dec_and_test+0x20/0xd0
> > > > > > Read of size 4 at addr ffffffc114f5bf18 by task kworker/u17:14/705
> > > > > >
> > > > > > CPU: 4 PID: 705 Comm: kworker/u17:14 Tainted: G S W 4.14.234-00003-g1fb6d0bd49a4-dirty #28
> > > > > > Hardware name: Qualcomm Technologies, Inc. SM8150 V2 PM8150 Google Inc. MSM sm8150 Flame DVT (DT)
> > > > > > Workqueue: hci0 hci_rx_work
> > > > > > Call trace:
> > > > > > dump_backtrace+0x0/0x378
> > > > > > show_stack+0x20/0x2c
> > > > > > dump_stack+0x124/0x148
> > > > > > print_address_description+0x80/0x2e8
> > > > > > __kasan_report+0x168/0x188
> > > > > > kasan_report+0x10/0x18
> > > > > > __asan_load4+0x84/0x8c
> > > > > > refcount_dec_and_test+0x20/0xd0
> > > > > > l2cap_chan_put+0x48/0x12c
> > > > > > l2cap_recv_frame+0x4770/0x6550
> > > > > > l2cap_recv_acldata+0x44c/0x7a4
> > > > > > hci_acldata_packet+0x100/0x188
> > > > > > hci_rx_work+0x178/0x23c
> > > > > > process_one_work+0x35c/0x95c
> > > > > > worker_thread+0x4cc/0x960
> > > > > > kthread+0x1a8/0x1c4
> > > > > > ret_from_fork+0x10/0x18
> > > > > >
> > > > > > Cc: [email protected]
> > > > >
> > > > > When was the bug added ? (Fixes: tag please)
> > > > >
> > > > > > Cc: Marcel Holtmann <[email protected]>
> > > > > > Cc: Johan Hedberg <[email protected]>
> > > > > > Cc: Luiz Augusto von Dentz <[email protected]>
> > > > > > Cc: "David S. Miller" <[email protected]>
> > > > > > Cc: Eric Dumazet <[email protected]>
> > > > > > Cc: Jakub Kicinski <[email protected]>
> > > > > > Cc: Paolo Abeni <[email protected]>
> > > > > > Cc: [email protected]
> > > > > > Cc: [email protected]
> > > > > > Signed-off-by: Lee Jones <[email protected]>
> > > > > > ---
> > > > > > net/bluetooth/l2cap_core.c | 4 ++--
> > > > > > 1 file changed, 2 insertions(+), 2 deletions(-)
> > > > > >
> > > > > > diff --git a/net/bluetooth/l2cap_core.c b/net/bluetooth/l2cap_core.c
> > > > > > index ae78490ecd3d4..82279c5919fd8 100644
> > > > > > --- a/net/bluetooth/l2cap_core.c
> > > > > > +++ b/net/bluetooth/l2cap_core.c
> > > > > > @@ -483,9 +483,7 @@ static void l2cap_chan_destroy(struct kref *kref)
> > > > > >
> > > > > > BT_DBG("chan %p", chan);
> > > > > >
> > > > > > - write_lock(&chan_list_lock);
> > > > > > list_del(&chan->global_l);
> > > > > > - write_unlock(&chan_list_lock);
> > > > > >
> > > > > > kfree(chan);
> > > > > > }
> > > > > > @@ -501,7 +499,9 @@ void l2cap_chan_put(struct l2cap_chan *c)
> > > > > > {
> > > > > > BT_DBG("chan %p orig refcnt %u", c, kref_read(&c->kref));
> > > > > >
> > > > > > + write_lock(&chan_list_lock);
> > > > > > kref_put(&c->kref, l2cap_chan_destroy);
> > > > > > + write_unlock(&chan_list_lock);
> > > > > > }
> > > > > > EXPORT_SYMBOL_GPL(l2cap_chan_put);
> > > > > >
> > > > > >
> > > > >
> > > > > I do not think this patch is correct.
> > > > >
> > > > > a kref does not need to be protected by a write lock.
> > > > >
> > > > > This might shuffle things enough to work around a particular repro you have.
> > > > >
> > > > > If the patch was correct why not protect kref_get() sides ?
> > > > >
> > > > > Before the &hdev->rx_work is scheduled (queue_work(hdev->workqueue,
> > > > > &hdev->rx_work),
> > > > > a reference must be taken.
> > > > >
> > > > > Then this reference must be released at the end of hci_rx_work() or
> > > > > when hdev->workqueue
> > > > > is canceled.
> > > > >
> > > > > This refcount is not needed _if_ the workqueue is properly canceled at
> > > > > device dismantle,
> > > > > in a synchronous way.
> > > > >
> > > > > I do not see this hdev->rx_work being canceled, maybe this is the real issue.
> > > > >
> > > > > There is a call to drain_workqueue() but this is not enough I think,
> > > > > because hci_recv_frame()
> > > > > can re-arm
> > > > > queue_work(hdev->workqueue, &hdev->rx_work);
> > > >
> > > > I suspect this likely a refcount problem, we do l2cap_get_chan_by_scid:
> > > >
> > > > /* Find channel with given SCID.
> > > > * Returns locked channel. */
> > > > static struct l2cap_chan *l2cap_get_chan_by_scid(struct l2cap_conn
> > > > *conn, u16 cid)
> > > >
> > > > So we return a locked channel but that doesn't prevent another thread
> > > > to call l2cap_chan_put which doesn't care about l2cap_chan_lock so
> > > > perhaps we actually need to host a reference while we have the lock,
> > > > at least we do something like that on l2cap_sock.c:
> > > >
> > > > l2cap_chan_hold(chan);
> > > > l2cap_chan_lock(chan);
> > > >
> > > > __clear_chan_timer(chan);
> > > > l2cap_chan_close(chan, ECONNRESET);
> > > > l2cap_sock_kill(sk);
> > > >
> > > > l2cap_chan_unlock(chan);
> > > > l2cap_chan_put(chan);
> > >
> > > Perhaps something like this:
> >
> > I'm struggling to apply this for test:
> >
> > "error: corrupt patch at line 6"
>
> Check with the attached patch.

With the patch applied:

[ 188.825418][ T75] refcount_t: addition on 0; use-after-free.
[ 188.825418][ T75] refcount_t: addition on 0; use-after-free.
[ 188.840629][ T75] WARNING: CPU: 5 PID: 75 at lib/refcount.c:25 refcount_warn_saturate+0x147/0x1b0
[ 188.840629][ T75] WARNING: CPU: 5 PID: 75 at lib/refcount.c:25 refcount_warn_saturate+0x147/0x1b0
[ 188.862642][ T75] Modules linked in:
[ 188.862642][ T75] Modules linked in:
[ 188.871686][ T75] CPU: 5 PID: 75 Comm: kworker/u17:0 Not tainted 5.18.0-00005-gc3401a7ad65f #8
[ 188.871686][ T75] CPU: 5 PID: 75 Comm: kworker/u17:0 Not tainted 5.18.0-00005-gc3401a7ad65f #8
[ 188.892806][ T75] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-debian-1.16.0-4 04/01/2014
[ 188.892806][ T75] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-debian-1.16.0-4 04/01/2014
[ 188.917398][ T75] Workqueue: hci0 hci_rx_work
[ 188.917398][ T75] Workqueue: hci0 hci_rx_work
[ 188.928515][ T75] RIP: 0010:refcount_warn_saturate+0x147/0x1b0
[ 188.928515][ T75] RIP: 0010:refcount_warn_saturate+0x147/0x1b0
[ 188.943176][ T75] Code: c7 e0 a1 70 85 31 c0 e8 d7 c2 e8 fe 0f 0b eb a1 e8 fe 33 15 ff c6 05 f9 e2 a5 04 01 48 c7 c7 80 a2 70 80
[ 188.943176][ T75] Code: c7 e0 a1 70 85 31 c0 e8 d7 c2 e8 fe 0f 0b eb a1 e8 fe 33 15 ff c6 05 f9 e2 a5 04 01 48 c7 c7 80 a2 70 80
[ 188.990053][ T75] RSP: 0018:ffffc9000156f800 EFLAGS: 00010246
[ 188.990053][ T75] RSP: 0018:ffffc9000156f800 EFLAGS: 00010246
[ 189.004337][ T75] RAX: 118d918bf1a47e00 RBX: 0000000000000002 RCX: ffff8881130ce000
[ 189.004337][ T75] RAX: 118d918bf1a47e00 RBX: 0000000000000002 RCX: ffff8881130ce000
[ 189.023131][ T75] RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000000
[ 189.023131][ T75] RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000000
[ 189.042044][ T75] RBP: ffffc9000156f810 R08: ffffffff81574218 R09: ffffed107de165d1
[ 189.042044][ T75] RBP: ffffc9000156f810 R08: ffffffff81574218 R09: ffffed107de165d1
[ 189.060967][ T75] R10: ffffed107de165d1 R11: 0000000000000000 R12: 1ffff920002adf10
[ 189.060967][ T75] R10: ffffed107de165d1 R11: 0000000000000000 R12: 1ffff920002adf10
[ 189.079650][ T75] R13: dffffc0000000000 R14: 0000000000000002 R15: ffff888139224818
[ 189.079650][ T75] R13: dffffc0000000000 R14: 0000000000000002 R15: ffff888139224818
[ 189.098573][ T75] FS: 0000000000000000(0000) GS:ffff8883ef080000(0000) knlGS:0000000000000000
[ 189.098573][ T75] FS: 0000000000000000(0000) GS:ffff8883ef080000(0000) knlGS:0000000000000000
[ 189.119604][ T75] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 189.119604][ T75] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 189.135313][ T75] CR2: 00000000004a2e98 CR3: 000000000680f000 CR4: 0000000000350ea0
[ 189.135313][ T75] CR2: 00000000004a2e98 CR3: 000000000680f000 CR4: 0000000000350ea0
[ 189.154165][ T75] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 189.154165][ T75] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 189.173465][ T75] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 189.173465][ T75] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 189.192075][ T75] Call Trace:
[ 189.192075][ T75] Call Trace:
[ 189.199853][ T75] <TASK>
[ 189.199853][ T75] <TASK>
[ 189.206820][ T75] l2cap_global_chan_by_psm+0x55a/0x5a0
[ 189.206820][ T75] l2cap_global_chan_by_psm+0x55a/0x5a0
[ 189.219891][ T75] ? l2cap_connect+0x12c0/0x12c0
[ 189.219891][ T75] ? l2cap_connect+0x12c0/0x12c0
[ 189.231602][ T75] ? __kfree_skb+0x13e/0x1c0
[ 189.231602][ T75] ? __kfree_skb+0x13e/0x1c0
[ 189.242612][ T75] ? l2cap_recv_frame+0xf5c/0x95c0
[ 189.242612][ T75] ? l2cap_recv_frame+0xf5c/0x95c0
[ 189.254714][ T75] ? skb_pull+0xde/0x150
[ 189.254714][ T75] ? skb_pull+0xde/0x150
[ 189.264776][ T75] l2cap_recv_frame+0x5bd/0x95c0
[ 189.264776][ T75] l2cap_recv_frame+0x5bd/0x95c0
[ 189.276329][ T75] ? debug_smp_processor_id+0x1c/0x20
[ 189.276329][ T75] ? debug_smp_processor_id+0x1c/0x20
[ 189.288985][ T75] ? update_cfs_rq_load_avg+0x412/0x4f0
[ 189.288985][ T75] ? update_cfs_rq_load_avg+0x412/0x4f0
[ 189.302202][ T75] ? l2cap_recv_acldata+0x1a60/0x1a60
[ 189.302202][ T75] ? l2cap_recv_acldata+0x1a60/0x1a60
[ 189.314998][ T75] ? __kasan_check_write+0x14/0x20
[ 189.314998][ T75] ? __kasan_check_write+0x14/0x20
[ 189.326877][ T75] ? __switch_to+0x617/0x1060
[ 189.326877][ T75] ? __switch_to+0x617/0x1060
[ 189.337858][ T75] ? __kasan_check_write+0x14/0x20
[ 189.337858][ T75] ? __kasan_check_write+0x14/0x20
[ 189.349712][ T75] ? _raw_spin_lock_irqsave+0xdc/0x1f0
[ 189.349712][ T75] ? _raw_spin_lock_irqsave+0xdc/0x1f0
[ 189.362793][ T75] ? compat_start_thread+0x20/0x20
[ 189.362793][ T75] ? compat_start_thread+0x20/0x20
[ 189.375316][ T75] l2cap_recv_acldata+0x5c9/0x1a60
[ 189.375316][ T75] l2cap_recv_acldata+0x5c9/0x1a60
[ 189.387262][ T75] ? hci_connect_sco+0x9b0/0x9b0
[ 189.387262][ T75] ? hci_connect_sco+0x9b0/0x9b0
[ 189.398857][ T75] hci_rx_work+0x54b/0x750
[ 189.398857][ T75] hci_rx_work+0x54b/0x750
[ 189.409172][ T75] process_one_work+0x6eb/0x1080
[ 189.409172][ T75] process_one_work+0x6eb/0x1080
[ 189.420829][ T75] worker_thread+0xb2b/0x13d0
[ 189.420829][ T75] worker_thread+0xb2b/0x13d0
[ 189.431922][ T75] kthread+0x2b1/0x2d0
[ 189.431922][ T75] kthread+0x2b1/0x2d0
[ 189.441724][ T75] ? process_one_work+0x1080/0x1080
[ 189.441724][ T75] ? process_one_work+0x1080/0x1080
[ 189.454559][ T75] ? kthread_blkcg+0xd0/0xd0
[ 189.454559][ T75] ? kthread_blkcg+0xd0/0xd0
[ 189.465457][ T75] ret_from_fork+0x1f/0x30
[ 189.465457][ T75] ret_from_fork+0x1f/0x30
[ 189.475777][ T75] </TASK>
[ 189.475777][ T75] </TASK>
[ 189.482907][ T75] ---[ end trace 0000000000000000 ]---
[ 189.482907][ T75] ---[ end trace 0000000000000000 ]---
[ 189.496567][ T75] ------------[ cut here ]------------
[ 189.496567][ T75] ------------[ cut here ]------------
[ 189.510250][ T75] refcount_t: underflow; use-after-free.
[ 189.510250][ T75] refcount_t: underflow; use-after-free.

> From 88cf6b4f2b0c9ed0bd7ef3b0d38574b412264609 Mon Sep 17 00:00:00 2001
> From: Luiz Augusto von Dentz <[email protected]>
> Date: Tue, 28 Jun 2022 15:46:04 -0700
> Subject: [PATCH] Bluetooth: L2CAP: WIP
>
> Signed-off-by: Luiz Augusto von Dentz <[email protected]>
> ---
> net/bluetooth/l2cap_core.c | 25 +++++++++++++++++++------
> 1 file changed, 19 insertions(+), 6 deletions(-)
>
> diff --git a/net/bluetooth/l2cap_core.c b/net/bluetooth/l2cap_core.c
> index 09ecaf556de5..359fb1ce4372 100644
> --- a/net/bluetooth/l2cap_core.c
> +++ b/net/bluetooth/l2cap_core.c
> @@ -111,7 +111,8 @@ static struct l2cap_chan *__l2cap_get_chan_by_scid(struct l2cap_conn *conn,
> }
>
> /* Find channel with given SCID.
> - * Returns locked channel. */
> + * Returns a reference locked channel.
> + */
> static struct l2cap_chan *l2cap_get_chan_by_scid(struct l2cap_conn *conn,
> u16 cid)
> {
> @@ -119,15 +120,17 @@ static struct l2cap_chan *l2cap_get_chan_by_scid(struct l2cap_conn *conn,
>
> mutex_lock(&conn->chan_lock);
> c = __l2cap_get_chan_by_scid(conn, cid);
> - if (c)
> + if (c) {
> + l2cap_chan_hold(c);
> l2cap_chan_lock(c);
> + }
> mutex_unlock(&conn->chan_lock);
>
> return c;
> }
>
> /* Find channel with given DCID.
> - * Returns locked channel.
> + * Returns a reference locked channel.
> */
> static struct l2cap_chan *l2cap_get_chan_by_dcid(struct l2cap_conn *conn,
> u16 cid)
> @@ -136,8 +139,10 @@ static struct l2cap_chan *l2cap_get_chan_by_dcid(struct l2cap_conn *conn,
>
> mutex_lock(&conn->chan_lock);
> c = __l2cap_get_chan_by_dcid(conn, cid);
> - if (c)
> + if (c) {
> + l2cap_chan_hold(c);
> l2cap_chan_lock(c);
> + }
> mutex_unlock(&conn->chan_lock);
>
> return c;
> @@ -4464,6 +4469,7 @@ static inline int l2cap_config_req(struct l2cap_conn *conn,
>
> unlock:
> l2cap_chan_unlock(chan);
> + l2cap_chan_put(chan);
> return err;
> }
>
> @@ -4578,6 +4584,7 @@ static inline int l2cap_config_rsp(struct l2cap_conn *conn,
>
> done:
> l2cap_chan_unlock(chan);
> + l2cap_chan_put(chan);
> return err;
> }
>
> @@ -5305,6 +5312,7 @@ static inline int l2cap_move_channel_req(struct l2cap_conn *conn,
> l2cap_send_move_chan_rsp(chan, result);
>
> l2cap_chan_unlock(chan);
> + l2cap_chan_put(chan);
>
> return 0;
> }
> @@ -5397,6 +5405,7 @@ static void l2cap_move_continue(struct l2cap_conn *conn, u16 icid, u16 result)
> }
>
> l2cap_chan_unlock(chan);
> + l2cap_chan_put(chan);
> }
>
> static void l2cap_move_fail(struct l2cap_conn *conn, u8 ident, u16 icid,
> @@ -5489,6 +5498,7 @@ static int l2cap_move_channel_confirm(struct l2cap_conn *conn,
> l2cap_send_move_chan_cfm_rsp(conn, cmd->ident, icid);
>
> l2cap_chan_unlock(chan);
> + l2cap_chan_put(chan);
>
> return 0;
> }
> @@ -5524,6 +5534,7 @@ static inline int l2cap_move_channel_confirm_rsp(struct l2cap_conn *conn,
> }
>
> l2cap_chan_unlock(chan);
> + l2cap_chan_put(chan);
>
> return 0;
> }
> @@ -5896,12 +5907,11 @@ static inline int l2cap_le_credits(struct l2cap_conn *conn,
> if (credits > max_credits) {
> BT_ERR("LE credits overflow");
> l2cap_send_disconn_req(chan, ECONNRESET);
> - l2cap_chan_unlock(chan);
>
> /* Return 0 so that we don't trigger an unnecessary
> * command reject packet.
> */
> - return 0;
> + goto unlock;
> }
>
> chan->tx_credits += credits;
> @@ -5912,7 +5922,9 @@ static inline int l2cap_le_credits(struct l2cap_conn *conn,
> if (chan->tx_credits)
> chan->ops->resume(chan);
>
> +unlock:
> l2cap_chan_unlock(chan);
> + l2cap_chan_put(chan);
>
> return 0;
> }
> @@ -7598,6 +7610,7 @@ static void l2cap_data_channel(struct l2cap_conn *conn, u16 cid,
>
> done:
> l2cap_chan_unlock(chan);
> + l2cap_chan_put(chan);
> }
>
> static void l2cap_conless_channel(struct l2cap_conn *conn, __le16 psm,


--
Lee Jones [李琼斯]
Principal Technical Lead - Developer Services
Linaro.org │ Open source software for Arm SoCs
Follow Linaro: Facebook | Twitter | Blog

2022-07-06 20:45:56

by Luiz Augusto von Dentz

[permalink] [raw]
Subject: Re: [RESEND 1/1] Bluetooth: Use chan_list_lock to protect the whole put/destroy invokation

Hi Lee,

On Wed, Jul 6, 2022 at 3:53 AM Lee Jones <[email protected]> wrote:
>
> On Tue, 05 Jul 2022, Luiz Augusto von Dentz wrote:
>
> > Hi Lee,
> >
> > On Wed, Jun 29, 2022 at 8:28 AM Lee Jones <[email protected]> wrote:
> > >
> > > On Tue, 28 Jun 2022, Luiz Augusto von Dentz wrote:
> > >
> > > > Hi Eric, Lee,
> > > >
> > > > On Mon, Jun 27, 2022 at 4:39 PM Luiz Augusto von Dentz
> > > > <[email protected]> wrote:
> > > > >
> > > > > Hi Eric, Lee,
> > > > >
> > > > > On Mon, Jun 27, 2022 at 7:41 AM Eric Dumazet <[email protected]> wrote:
> > > > > >
> > > > > > On Wed, Jun 22, 2022 at 10:27 AM Lee Jones <[email protected]> wrote:
> > > > > > >
> > > > > > > This change prevents a use-after-free caused by one of the worker
> > > > > > > threads starting up (see below) *after* the final channel reference
> > > > > > > has been put() during sock_close() but *before* the references to the
> > > > > > > channel have been destroyed.
> > > > > > >
> > > > > > > refcount_t: increment on 0; use-after-free.
> > > > > > > BUG: KASAN: use-after-free in refcount_dec_and_test+0x20/0xd0
> > > > > > > Read of size 4 at addr ffffffc114f5bf18 by task kworker/u17:14/705
> > > > > > >
> > > > > > > CPU: 4 PID: 705 Comm: kworker/u17:14 Tainted: G S W 4.14.234-00003-g1fb6d0bd49a4-dirty #28
> > > > > > > Hardware name: Qualcomm Technologies, Inc. SM8150 V2 PM8150 Google Inc. MSM sm8150 Flame DVT (DT)
> > > > > > > Workqueue: hci0 hci_rx_work
> > > > > > > Call trace:
> > > > > > > dump_backtrace+0x0/0x378
> > > > > > > show_stack+0x20/0x2c
> > > > > > > dump_stack+0x124/0x148
> > > > > > > print_address_description+0x80/0x2e8
> > > > > > > __kasan_report+0x168/0x188
> > > > > > > kasan_report+0x10/0x18
> > > > > > > __asan_load4+0x84/0x8c
> > > > > > > refcount_dec_and_test+0x20/0xd0
> > > > > > > l2cap_chan_put+0x48/0x12c
> > > > > > > l2cap_recv_frame+0x4770/0x6550
> > > > > > > l2cap_recv_acldata+0x44c/0x7a4
> > > > > > > hci_acldata_packet+0x100/0x188
> > > > > > > hci_rx_work+0x178/0x23c
> > > > > > > process_one_work+0x35c/0x95c
> > > > > > > worker_thread+0x4cc/0x960
> > > > > > > kthread+0x1a8/0x1c4
> > > > > > > ret_from_fork+0x10/0x18
> > > > > > >
> > > > > > > Cc: [email protected]
> > > > > >
> > > > > > When was the bug added ? (Fixes: tag please)
> > > > > >
> > > > > > > Cc: Marcel Holtmann <[email protected]>
> > > > > > > Cc: Johan Hedberg <[email protected]>
> > > > > > > Cc: Luiz Augusto von Dentz <[email protected]>
> > > > > > > Cc: "David S. Miller" <[email protected]>
> > > > > > > Cc: Eric Dumazet <[email protected]>
> > > > > > > Cc: Jakub Kicinski <[email protected]>
> > > > > > > Cc: Paolo Abeni <[email protected]>
> > > > > > > Cc: [email protected]
> > > > > > > Cc: [email protected]
> > > > > > > Signed-off-by: Lee Jones <[email protected]>
> > > > > > > ---
> > > > > > > net/bluetooth/l2cap_core.c | 4 ++--
> > > > > > > 1 file changed, 2 insertions(+), 2 deletions(-)
> > > > > > >
> > > > > > > diff --git a/net/bluetooth/l2cap_core.c b/net/bluetooth/l2cap_core.c
> > > > > > > index ae78490ecd3d4..82279c5919fd8 100644
> > > > > > > --- a/net/bluetooth/l2cap_core.c
> > > > > > > +++ b/net/bluetooth/l2cap_core.c
> > > > > > > @@ -483,9 +483,7 @@ static void l2cap_chan_destroy(struct kref *kref)
> > > > > > >
> > > > > > > BT_DBG("chan %p", chan);
> > > > > > >
> > > > > > > - write_lock(&chan_list_lock);
> > > > > > > list_del(&chan->global_l);
> > > > > > > - write_unlock(&chan_list_lock);
> > > > > > >
> > > > > > > kfree(chan);
> > > > > > > }
> > > > > > > @@ -501,7 +499,9 @@ void l2cap_chan_put(struct l2cap_chan *c)
> > > > > > > {
> > > > > > > BT_DBG("chan %p orig refcnt %u", c, kref_read(&c->kref));
> > > > > > >
> > > > > > > + write_lock(&chan_list_lock);
> > > > > > > kref_put(&c->kref, l2cap_chan_destroy);
> > > > > > > + write_unlock(&chan_list_lock);
> > > > > > > }
> > > > > > > EXPORT_SYMBOL_GPL(l2cap_chan_put);
> > > > > > >
> > > > > > >
> > > > > >
> > > > > > I do not think this patch is correct.
> > > > > >
> > > > > > a kref does not need to be protected by a write lock.
> > > > > >
> > > > > > This might shuffle things enough to work around a particular repro you have.
> > > > > >
> > > > > > If the patch was correct why not protect kref_get() sides ?
> > > > > >
> > > > > > Before the &hdev->rx_work is scheduled (queue_work(hdev->workqueue,
> > > > > > &hdev->rx_work),
> > > > > > a reference must be taken.
> > > > > >
> > > > > > Then this reference must be released at the end of hci_rx_work() or
> > > > > > when hdev->workqueue
> > > > > > is canceled.
> > > > > >
> > > > > > This refcount is not needed _if_ the workqueue is properly canceled at
> > > > > > device dismantle,
> > > > > > in a synchronous way.
> > > > > >
> > > > > > I do not see this hdev->rx_work being canceled, maybe this is the real issue.
> > > > > >
> > > > > > There is a call to drain_workqueue() but this is not enough I think,
> > > > > > because hci_recv_frame()
> > > > > > can re-arm
> > > > > > queue_work(hdev->workqueue, &hdev->rx_work);
> > > > >
> > > > > I suspect this likely a refcount problem, we do l2cap_get_chan_by_scid:
> > > > >
> > > > > /* Find channel with given SCID.
> > > > > * Returns locked channel. */
> > > > > static struct l2cap_chan *l2cap_get_chan_by_scid(struct l2cap_conn
> > > > > *conn, u16 cid)
> > > > >
> > > > > So we return a locked channel but that doesn't prevent another thread
> > > > > to call l2cap_chan_put which doesn't care about l2cap_chan_lock so
> > > > > perhaps we actually need to host a reference while we have the lock,
> > > > > at least we do something like that on l2cap_sock.c:
> > > > >
> > > > > l2cap_chan_hold(chan);
> > > > > l2cap_chan_lock(chan);
> > > > >
> > > > > __clear_chan_timer(chan);
> > > > > l2cap_chan_close(chan, ECONNRESET);
> > > > > l2cap_sock_kill(sk);
> > > > >
> > > > > l2cap_chan_unlock(chan);
> > > > > l2cap_chan_put(chan);
> > > >
> > > > Perhaps something like this:
> > >
> > > I'm struggling to apply this for test:
> > >
> > > "error: corrupt patch at line 6"
> >
> > Check with the attached patch.
>
> With the patch applied:
>
> [ 188.825418][ T75] refcount_t: addition on 0; use-after-free.
> [ 188.825418][ T75] refcount_t: addition on 0; use-after-free.

Looks like the changes just make the issue more visible since we are
trying to add a refcount when it is already 0 so this proves the
design is not quite right since it is removing the object from the
list only when destroying it while we probably need to do it before.

How about we use kref_get_unless_zero as it appears it was introduced
exactly for such cases (patch attached.)

Luiz Augusto von Dentz


Attachments:
0001-Bluetooth-L2CAP-WIP.patch (5.25 kB)

2022-07-06 21:06:31

by Luiz Augusto von Dentz

[permalink] [raw]
Subject: Re: [RESEND 1/1] Bluetooth: Use chan_list_lock to protect the whole put/destroy invokation

Hi,

On Wed, Jul 6, 2022 at 1:36 PM Luiz Augusto von Dentz
<[email protected]> wrote:
>
> Hi Lee,
>
> On Wed, Jul 6, 2022 at 3:53 AM Lee Jones <[email protected]> wrote:
> >
> > On Tue, 05 Jul 2022, Luiz Augusto von Dentz wrote:
> >
> > > Hi Lee,
> > >
> > > On Wed, Jun 29, 2022 at 8:28 AM Lee Jones <[email protected]> wrote:
> > > >
> > > > On Tue, 28 Jun 2022, Luiz Augusto von Dentz wrote:
> > > >
> > > > > Hi Eric, Lee,
> > > > >
> > > > > On Mon, Jun 27, 2022 at 4:39 PM Luiz Augusto von Dentz
> > > > > <[email protected]> wrote:
> > > > > >
> > > > > > Hi Eric, Lee,
> > > > > >
> > > > > > On Mon, Jun 27, 2022 at 7:41 AM Eric Dumazet <[email protected]> wrote:
> > > > > > >
> > > > > > > On Wed, Jun 22, 2022 at 10:27 AM Lee Jones <[email protected]> wrote:
> > > > > > > >
> > > > > > > > This change prevents a use-after-free caused by one of the worker
> > > > > > > > threads starting up (see below) *after* the final channel reference
> > > > > > > > has been put() during sock_close() but *before* the references to the
> > > > > > > > channel have been destroyed.
> > > > > > > >
> > > > > > > > refcount_t: increment on 0; use-after-free.
> > > > > > > > BUG: KASAN: use-after-free in refcount_dec_and_test+0x20/0xd0
> > > > > > > > Read of size 4 at addr ffffffc114f5bf18 by task kworker/u17:14/705
> > > > > > > >
> > > > > > > > CPU: 4 PID: 705 Comm: kworker/u17:14 Tainted: G S W 4.14.234-00003-g1fb6d0bd49a4-dirty #28
> > > > > > > > Hardware name: Qualcomm Technologies, Inc. SM8150 V2 PM8150 Google Inc. MSM sm8150 Flame DVT (DT)
> > > > > > > > Workqueue: hci0 hci_rx_work
> > > > > > > > Call trace:
> > > > > > > > dump_backtrace+0x0/0x378
> > > > > > > > show_stack+0x20/0x2c
> > > > > > > > dump_stack+0x124/0x148
> > > > > > > > print_address_description+0x80/0x2e8
> > > > > > > > __kasan_report+0x168/0x188
> > > > > > > > kasan_report+0x10/0x18
> > > > > > > > __asan_load4+0x84/0x8c
> > > > > > > > refcount_dec_and_test+0x20/0xd0
> > > > > > > > l2cap_chan_put+0x48/0x12c
> > > > > > > > l2cap_recv_frame+0x4770/0x6550
> > > > > > > > l2cap_recv_acldata+0x44c/0x7a4
> > > > > > > > hci_acldata_packet+0x100/0x188
> > > > > > > > hci_rx_work+0x178/0x23c
> > > > > > > > process_one_work+0x35c/0x95c
> > > > > > > > worker_thread+0x4cc/0x960
> > > > > > > > kthread+0x1a8/0x1c4
> > > > > > > > ret_from_fork+0x10/0x18
> > > > > > > >
> > > > > > > > Cc: [email protected]
> > > > > > >
> > > > > > > When was the bug added ? (Fixes: tag please)
> > > > > > >
> > > > > > > > Cc: Marcel Holtmann <[email protected]>
> > > > > > > > Cc: Johan Hedberg <[email protected]>
> > > > > > > > Cc: Luiz Augusto von Dentz <[email protected]>
> > > > > > > > Cc: "David S. Miller" <[email protected]>
> > > > > > > > Cc: Eric Dumazet <[email protected]>
> > > > > > > > Cc: Jakub Kicinski <[email protected]>
> > > > > > > > Cc: Paolo Abeni <[email protected]>
> > > > > > > > Cc: [email protected]
> > > > > > > > Cc: [email protected]
> > > > > > > > Signed-off-by: Lee Jones <[email protected]>
> > > > > > > > ---
> > > > > > > > net/bluetooth/l2cap_core.c | 4 ++--
> > > > > > > > 1 file changed, 2 insertions(+), 2 deletions(-)
> > > > > > > >
> > > > > > > > diff --git a/net/bluetooth/l2cap_core.c b/net/bluetooth/l2cap_core.c
> > > > > > > > index ae78490ecd3d4..82279c5919fd8 100644
> > > > > > > > --- a/net/bluetooth/l2cap_core.c
> > > > > > > > +++ b/net/bluetooth/l2cap_core.c
> > > > > > > > @@ -483,9 +483,7 @@ static void l2cap_chan_destroy(struct kref *kref)
> > > > > > > >
> > > > > > > > BT_DBG("chan %p", chan);
> > > > > > > >
> > > > > > > > - write_lock(&chan_list_lock);
> > > > > > > > list_del(&chan->global_l);
> > > > > > > > - write_unlock(&chan_list_lock);
> > > > > > > >
> > > > > > > > kfree(chan);
> > > > > > > > }
> > > > > > > > @@ -501,7 +499,9 @@ void l2cap_chan_put(struct l2cap_chan *c)
> > > > > > > > {
> > > > > > > > BT_DBG("chan %p orig refcnt %u", c, kref_read(&c->kref));
> > > > > > > >
> > > > > > > > + write_lock(&chan_list_lock);
> > > > > > > > kref_put(&c->kref, l2cap_chan_destroy);
> > > > > > > > + write_unlock(&chan_list_lock);
> > > > > > > > }
> > > > > > > > EXPORT_SYMBOL_GPL(l2cap_chan_put);
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > > I do not think this patch is correct.
> > > > > > >
> > > > > > > a kref does not need to be protected by a write lock.
> > > > > > >
> > > > > > > This might shuffle things enough to work around a particular repro you have.
> > > > > > >
> > > > > > > If the patch was correct why not protect kref_get() sides ?
> > > > > > >
> > > > > > > Before the &hdev->rx_work is scheduled (queue_work(hdev->workqueue,
> > > > > > > &hdev->rx_work),
> > > > > > > a reference must be taken.
> > > > > > >
> > > > > > > Then this reference must be released at the end of hci_rx_work() or
> > > > > > > when hdev->workqueue
> > > > > > > is canceled.
> > > > > > >
> > > > > > > This refcount is not needed _if_ the workqueue is properly canceled at
> > > > > > > device dismantle,
> > > > > > > in a synchronous way.
> > > > > > >
> > > > > > > I do not see this hdev->rx_work being canceled, maybe this is the real issue.
> > > > > > >
> > > > > > > There is a call to drain_workqueue() but this is not enough I think,
> > > > > > > because hci_recv_frame()
> > > > > > > can re-arm
> > > > > > > queue_work(hdev->workqueue, &hdev->rx_work);
> > > > > >
> > > > > > I suspect this likely a refcount problem, we do l2cap_get_chan_by_scid:
> > > > > >
> > > > > > /* Find channel with given SCID.
> > > > > > * Returns locked channel. */
> > > > > > static struct l2cap_chan *l2cap_get_chan_by_scid(struct l2cap_conn
> > > > > > *conn, u16 cid)
> > > > > >
> > > > > > So we return a locked channel but that doesn't prevent another thread
> > > > > > to call l2cap_chan_put which doesn't care about l2cap_chan_lock so
> > > > > > perhaps we actually need to host a reference while we have the lock,
> > > > > > at least we do something like that on l2cap_sock.c:
> > > > > >
> > > > > > l2cap_chan_hold(chan);
> > > > > > l2cap_chan_lock(chan);
> > > > > >
> > > > > > __clear_chan_timer(chan);
> > > > > > l2cap_chan_close(chan, ECONNRESET);
> > > > > > l2cap_sock_kill(sk);
> > > > > >
> > > > > > l2cap_chan_unlock(chan);
> > > > > > l2cap_chan_put(chan);
> > > > >
> > > > > Perhaps something like this:
> > > >
> > > > I'm struggling to apply this for test:
> > > >
> > > > "error: corrupt patch at line 6"
> > >
> > > Check with the attached patch.
> >
> > With the patch applied:
> >
> > [ 188.825418][ T75] refcount_t: addition on 0; use-after-free.
> > [ 188.825418][ T75] refcount_t: addition on 0; use-after-free.
>
> Looks like the changes just make the issue more visible since we are
> trying to add a refcount when it is already 0 so this proves the
> design is not quite right since it is removing the object from the
> list only when destroying it while we probably need to do it before.
>
> How about we use kref_get_unless_zero as it appears it was introduced
> exactly for such cases (patch attached.)

Looks like I missed a few places like l2cap_global_chan_by_psm so here
is another version.

> Luiz Augusto von Dentz



--
Luiz Augusto von Dentz


Attachments:
0001-Bluetooth-L2CAP-WIP.patch (6.03 kB)

2022-07-20 11:57:20

by Lee Jones

[permalink] [raw]
Subject: Re: [RESEND 1/1] Bluetooth: Use chan_list_lock to protect the whole put/destroy invokation

On Wed, 06 Jul 2022, Luiz Augusto von Dentz wrote:
> > > > > > Perhaps something like this:
> > > > >
> > > > > I'm struggling to apply this for test:
> > > > >
> > > > > "error: corrupt patch at line 6"
> > > >
> > > > Check with the attached patch.
> > >
> > > With the patch applied:
> > >
> > > [ 188.825418][ T75] refcount_t: addition on 0; use-after-free.
> > > [ 188.825418][ T75] refcount_t: addition on 0; use-after-free.
> >
> > Looks like the changes just make the issue more visible since we are
> > trying to add a refcount when it is already 0 so this proves the
> > design is not quite right since it is removing the object from the
> > list only when destroying it while we probably need to do it before.
> >
> > How about we use kref_get_unless_zero as it appears it was introduced
> > exactly for such cases (patch attached.)
>
> Looks like I missed a few places like l2cap_global_chan_by_psm so here
> is another version.

Okay, with the patch below the kernel doesn't produce a back-trace.

Only this, which I assume is expected?

[ 535.398255][ T495] Bluetooth: hci0: unexpected cc 0x0c03 length: 249 > 1
[ 535.398255][ T495] Bluetooth: hci0: unexpected cc 0x0c03 length: 249 > 1
[ 535.417007][ T495] Bluetooth: hci0: unexpected cc 0x1003 length: 249 > 9
[ 535.417007][ T495] Bluetooth: hci0: unexpected cc 0x1003 length: 249 > 9
[ 535.434810][ T495] Bluetooth: hci0: unexpected cc 0x1001 length: 249 > 9
[ 535.434810][ T495] Bluetooth: hci0: unexpected cc 0x1001 length: 249 > 9
[ 535.452886][ T495] Bluetooth: hci0: unexpected cc 0x0c23 length: 249 > 4
[ 535.452886][ T495] Bluetooth: hci0: unexpected cc 0x0c23 length: 249 > 4
[ 535.470574][ T495] Bluetooth: hci0: unexpected cc 0x0c25 length: 249 > 3
[ 535.470574][ T495] Bluetooth: hci0: unexpected cc 0x0c25 length: 249 > 3
[ 535.488009][ T495] Bluetooth: hci0: unexpected cc 0x0c38 length: 249 > 2
[ 535.488009][ T495] Bluetooth: hci0: unexpected cc 0x0c38 length: 249 > 2
[ 537.551677][ T74] Bluetooth: hci0: command 0x0409 tx timeout
[ 537.551677][ T74] Bluetooth: hci0: command 0x0409 tx timeout
[ 539.641362][ T373] Bluetooth: hci0: command 0x041b tx timeout
[ 539.641362][ T373] Bluetooth: hci0: command 0x041b tx timeout
[ 541.711056][ T274] Bluetooth: hci0: command 0x040f tx timeout
[ 541.711056][ T274] Bluetooth: hci0: command 0x040f tx timeout
[ 543.790939][ T66] Bluetooth: hci0: command 0x0419 tx timeout
[ 543.790939][ T66] Bluetooth: hci0: command 0x0419 tx timeout

> From 235937ac7a39d16e5dabbfca0ac1d58e4cc814d9 Mon Sep 17 00:00:00 2001
> From: Luiz Augusto von Dentz <[email protected]>
> Date: Tue, 28 Jun 2022 15:46:04 -0700
> Subject: [PATCH] Bluetooth: L2CAP: WIP
>
> Signed-off-by: Luiz Augusto von Dentz <[email protected]>
> ---
> include/net/bluetooth/l2cap.h | 1 +
> net/bluetooth/l2cap_core.c | 58 +++++++++++++++++++++++++++--------
> 2 files changed, 46 insertions(+), 13 deletions(-)
>
> diff --git a/include/net/bluetooth/l2cap.h b/include/net/bluetooth/l2cap.h
> index 3c4f550e5a8b..2f766e3437ce 100644
> --- a/include/net/bluetooth/l2cap.h
> +++ b/include/net/bluetooth/l2cap.h
> @@ -847,6 +847,7 @@ enum {
> };
>
> void l2cap_chan_hold(struct l2cap_chan *c);
> +struct l2cap_chan *l2cap_chan_hold_unless_zero(struct l2cap_chan *c);
> void l2cap_chan_put(struct l2cap_chan *c);
>
> static inline void l2cap_chan_lock(struct l2cap_chan *chan)
> diff --git a/net/bluetooth/l2cap_core.c b/net/bluetooth/l2cap_core.c
> index 09ecaf556de5..3e5d81e971cc 100644
> --- a/net/bluetooth/l2cap_core.c
> +++ b/net/bluetooth/l2cap_core.c
> @@ -111,7 +111,8 @@ static struct l2cap_chan *__l2cap_get_chan_by_scid(struct l2cap_conn *conn,
> }
>
> /* Find channel with given SCID.
> - * Returns locked channel. */
> + * Returns a reference locked channel.
> + */
> static struct l2cap_chan *l2cap_get_chan_by_scid(struct l2cap_conn *conn,
> u16 cid)
> {
> @@ -119,15 +120,19 @@ static struct l2cap_chan *l2cap_get_chan_by_scid(struct l2cap_conn *conn,
>
> mutex_lock(&conn->chan_lock);
> c = __l2cap_get_chan_by_scid(conn, cid);
> - if (c)
> - l2cap_chan_lock(c);
> + if (c) {
> + /* Only lock if chan reference is not 0 */
> + c = l2cap_chan_hold_unless_zero(c);
> + if (c)
> + l2cap_chan_lock(c);
> + }
> mutex_unlock(&conn->chan_lock);
>
> return c;
> }
>
> /* Find channel with given DCID.
> - * Returns locked channel.
> + * Returns a reference locked channel.
> */
> static struct l2cap_chan *l2cap_get_chan_by_dcid(struct l2cap_conn *conn,
> u16 cid)
> @@ -136,8 +141,12 @@ static struct l2cap_chan *l2cap_get_chan_by_dcid(struct l2cap_conn *conn,
>
> mutex_lock(&conn->chan_lock);
> c = __l2cap_get_chan_by_dcid(conn, cid);
> - if (c)
> - l2cap_chan_lock(c);
> + if (c) {
> + /* Only lock if chan reference is not 0 */
> + c = l2cap_chan_hold_unless_zero(c);
> + if (c)
> + l2cap_chan_lock(c);
> + }
> mutex_unlock(&conn->chan_lock);
>
> return c;
> @@ -162,8 +171,12 @@ static struct l2cap_chan *l2cap_get_chan_by_ident(struct l2cap_conn *conn,
>
> mutex_lock(&conn->chan_lock);
> c = __l2cap_get_chan_by_ident(conn, ident);
> - if (c)
> - l2cap_chan_lock(c);
> + if (c) {
> + /* Only lock if chan reference is not 0 */
> + c = l2cap_chan_hold_unless_zero(c);
> + if (c)
> + l2cap_chan_lock(c);
> + }
> mutex_unlock(&conn->chan_lock);
>
> return c;
> @@ -497,6 +510,16 @@ void l2cap_chan_hold(struct l2cap_chan *c)
> kref_get(&c->kref);
> }
>
> +struct l2cap_chan *l2cap_chan_hold_unless_zero(struct l2cap_chan *c)
> +{
> + BT_DBG("chan %p orig refcnt %u", c, kref_read(&c->kref));
> +
> + if (!kref_get_unless_zero(&c->kref))
> + return NULL;
> +
> + return c;
> +}
> +
> void l2cap_chan_put(struct l2cap_chan *c)
> {
> BT_DBG("chan %p orig refcnt %u", c, kref_read(&c->kref));
> @@ -1969,7 +1992,7 @@ static struct l2cap_chan *l2cap_global_chan_by_psm(int state, __le16 psm,
> src_match = !bacmp(&c->src, src);
> dst_match = !bacmp(&c->dst, dst);
> if (src_match && dst_match) {
> - l2cap_chan_hold(c);
> + c = l2cap_chan_hold_unless_zero(c);
> read_unlock(&chan_list_lock);
> return c;
> }
> @@ -1984,7 +2007,7 @@ static struct l2cap_chan *l2cap_global_chan_by_psm(int state, __le16 psm,
> }
>
> if (c1)
> - l2cap_chan_hold(c1);
> + c1 = l2cap_chan_hold_unless_zero(c1);
>
> read_unlock(&chan_list_lock);
>
> @@ -4464,6 +4487,7 @@ static inline int l2cap_config_req(struct l2cap_conn *conn,
>
> unlock:
> l2cap_chan_unlock(chan);
> + l2cap_chan_put(chan);
> return err;
> }
>
> @@ -4578,6 +4602,7 @@ static inline int l2cap_config_rsp(struct l2cap_conn *conn,
>
> done:
> l2cap_chan_unlock(chan);
> + l2cap_chan_put(chan);
> return err;
> }
>
> @@ -5305,6 +5330,7 @@ static inline int l2cap_move_channel_req(struct l2cap_conn *conn,
> l2cap_send_move_chan_rsp(chan, result);
>
> l2cap_chan_unlock(chan);
> + l2cap_chan_put(chan);
>
> return 0;
> }
> @@ -5397,6 +5423,7 @@ static void l2cap_move_continue(struct l2cap_conn *conn, u16 icid, u16 result)
> }
>
> l2cap_chan_unlock(chan);
> + l2cap_chan_put(chan);
> }
>
> static void l2cap_move_fail(struct l2cap_conn *conn, u8 ident, u16 icid,
> @@ -5426,6 +5453,7 @@ static void l2cap_move_fail(struct l2cap_conn *conn, u8 ident, u16 icid,
> l2cap_send_move_chan_cfm(chan, L2CAP_MC_UNCONFIRMED);
>
> l2cap_chan_unlock(chan);
> + l2cap_chan_put(chan);
> }
>
> static int l2cap_move_channel_rsp(struct l2cap_conn *conn,
> @@ -5489,6 +5517,7 @@ static int l2cap_move_channel_confirm(struct l2cap_conn *conn,
> l2cap_send_move_chan_cfm_rsp(conn, cmd->ident, icid);
>
> l2cap_chan_unlock(chan);
> + l2cap_chan_put(chan);
>
> return 0;
> }
> @@ -5524,6 +5553,7 @@ static inline int l2cap_move_channel_confirm_rsp(struct l2cap_conn *conn,
> }
>
> l2cap_chan_unlock(chan);
> + l2cap_chan_put(chan);
>
> return 0;
> }
> @@ -5896,12 +5926,11 @@ static inline int l2cap_le_credits(struct l2cap_conn *conn,
> if (credits > max_credits) {
> BT_ERR("LE credits overflow");
> l2cap_send_disconn_req(chan, ECONNRESET);
> - l2cap_chan_unlock(chan);
>
> /* Return 0 so that we don't trigger an unnecessary
> * command reject packet.
> */
> - return 0;
> + goto unlock;
> }
>
> chan->tx_credits += credits;
> @@ -5912,7 +5941,9 @@ static inline int l2cap_le_credits(struct l2cap_conn *conn,
> if (chan->tx_credits)
> chan->ops->resume(chan);
>
> +unlock:
> l2cap_chan_unlock(chan);
> + l2cap_chan_put(chan);
>
> return 0;
> }
> @@ -7598,6 +7629,7 @@ static void l2cap_data_channel(struct l2cap_conn *conn, u16 cid,
>
> done:
> l2cap_chan_unlock(chan);
> + l2cap_chan_put(chan);
> }
>
> static void l2cap_conless_channel(struct l2cap_conn *conn, __le16 psm,
> @@ -8086,7 +8118,7 @@ static struct l2cap_chan *l2cap_global_fixed_chan(struct l2cap_chan *c,
> if (src_type != c->src_type)
> continue;
>
> - l2cap_chan_hold(c);
> + c = l2cap_chan_hold_unless_zero(c);
> read_unlock(&chan_list_lock);
> return c;
> }


--
Lee Jones [李琼斯]
Principal Technical Lead - Developer Services
Linaro.org │ Open source software for Arm SoCs
Follow Linaro: Facebook | Twitter | Blog