2023-02-24 19:53:21

by Zhengping Jiang

[permalink] [raw]
Subject: [kernel PATCH v2 0/1] Clear workqueue to avoid use-after-free


After the hci_sync rework, cmd_sync_work was cleared when calling
hci_unregister_dev, but not when powering off the adapter.
Use-after-free errors happen when a work is still scheduled
when cmd is freed by __mgmt_power_off.

Changes in v2:
- Add function to clear the queue without stop the timer

Changes in v1:
- Clear cmd_sync_work queue before clearing the mgmt cmd list

Zhengping Jiang (1):
Bluetooth: hci_sync: clear workqueue before clear mgmt cmd

net/bluetooth/hci_sync.c | 21 ++++++++++++++++++++-
1 file changed, 20 insertions(+), 1 deletion(-)

--
2.39.2.722.g9855ee24e9-goog



2023-02-24 19:53:24

by Zhengping Jiang

[permalink] [raw]
Subject: [kernel PATCH v2 1/1] Bluetooth: hci_sync: clear workqueue before clear mgmt cmd

Clear cmd_sync_work queue before clearing the mgmt cmd list to avoid
racing conditions which cause use-after-free.

When powering off the adapter, the mgmt cmd list will be cleared. If a
work is queued in the cmd_sync_work queue at the same time, it will
cause the risk of use-after-free, as the cmd pointer is not checked
before use.

Signed-off-by: Zhengping Jiang <[email protected]>
---

Changes in v2:
- Add function to clear the queue without stop the timer

Changes in v1:
- Clear cmd_sync_work queue before clearing the mgmt cmd list

net/bluetooth/hci_sync.c | 21 ++++++++++++++++++++-
1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/net/bluetooth/hci_sync.c b/net/bluetooth/hci_sync.c
index 117eedb6f709..b70365dfff0c 100644
--- a/net/bluetooth/hci_sync.c
+++ b/net/bluetooth/hci_sync.c
@@ -636,6 +636,23 @@ void hci_cmd_sync_init(struct hci_dev *hdev)
INIT_DELAYED_WORK(&hdev->adv_instance_expire, adv_timeout_expire);
}

+static void hci_pend_cmd_sync_clear(struct hci_dev *hdev)
+{
+ struct hci_cmd_sync_work_entry *entry, *tmp;
+
+ mutex_lock(&hdev->cmd_sync_work_lock);
+ list_for_each_entry_safe(entry, tmp, &hdev->cmd_sync_work_list, list) {
+ if (entry->destroy) {
+ hci_req_sync_lock(hdev);
+ entry->destroy(hdev, entry->data, -ECANCELED);
+ hci_req_sync_unlock(hdev);
+ }
+ list_del(&entry->list);
+ kfree(entry);
+ }
+ mutex_unlock(&hdev->cmd_sync_work_lock);
+}
+
void hci_cmd_sync_clear(struct hci_dev *hdev)
{
struct hci_cmd_sync_work_entry *entry, *tmp;
@@ -4842,8 +4859,10 @@ int hci_dev_close_sync(struct hci_dev *hdev)

if (!auto_off && hdev->dev_type == HCI_PRIMARY &&
!hci_dev_test_flag(hdev, HCI_USER_CHANNEL) &&
- hci_dev_test_flag(hdev, HCI_MGMT))
+ hci_dev_test_flag(hdev, HCI_MGMT)) {
+ hci_pend_cmd_sync_clear(hdev);
__mgmt_power_off(hdev);
+ }

hci_inquiry_cache_flush(hdev);
hci_pend_le_actions_clear(hdev);
--
2.39.2.722.g9855ee24e9-goog


2023-02-24 21:02:51

by Luiz Augusto von Dentz

[permalink] [raw]
Subject: Re: [kernel PATCH v2 1/1] Bluetooth: hci_sync: clear workqueue before clear mgmt cmd

Hi Zhengping,

On Fri, Feb 24, 2023 at 11:53 AM Zhengping Jiang <[email protected]> wrote:
>
> Clear cmd_sync_work queue before clearing the mgmt cmd list to avoid
> racing conditions which cause use-after-free.
>
> When powering off the adapter, the mgmt cmd list will be cleared. If a
> work is queued in the cmd_sync_work queue at the same time, it will
> cause the risk of use-after-free, as the cmd pointer is not checked
> before use.
>
> Signed-off-by: Zhengping Jiang <[email protected]>
> ---
>
> Changes in v2:
> - Add function to clear the queue without stop the timer
>
> Changes in v1:
> - Clear cmd_sync_work queue before clearing the mgmt cmd list
>
> net/bluetooth/hci_sync.c | 21 ++++++++++++++++++++-
> 1 file changed, 20 insertions(+), 1 deletion(-)
>
> diff --git a/net/bluetooth/hci_sync.c b/net/bluetooth/hci_sync.c
> index 117eedb6f709..b70365dfff0c 100644
> --- a/net/bluetooth/hci_sync.c
> +++ b/net/bluetooth/hci_sync.c
> @@ -636,6 +636,23 @@ void hci_cmd_sync_init(struct hci_dev *hdev)
> INIT_DELAYED_WORK(&hdev->adv_instance_expire, adv_timeout_expire);
> }
>
> +static void hci_pend_cmd_sync_clear(struct hci_dev *hdev)
> +{
> + struct hci_cmd_sync_work_entry *entry, *tmp;
> +
> + mutex_lock(&hdev->cmd_sync_work_lock);
> + list_for_each_entry_safe(entry, tmp, &hdev->cmd_sync_work_list, list) {
> + if (entry->destroy) {
> + hci_req_sync_lock(hdev);
> + entry->destroy(hdev, entry->data, -ECANCELED);
> + hci_req_sync_unlock(hdev);
> + }
> + list_del(&entry->list);
> + kfree(entry);
> + }
> + mutex_unlock(&hdev->cmd_sync_work_lock);
> +}
> +
> void hci_cmd_sync_clear(struct hci_dev *hdev)
> {
> struct hci_cmd_sync_work_entry *entry, *tmp;
> @@ -4842,8 +4859,10 @@ int hci_dev_close_sync(struct hci_dev *hdev)
>
> if (!auto_off && hdev->dev_type == HCI_PRIMARY &&
> !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) &&
> - hci_dev_test_flag(hdev, HCI_MGMT))
> + hci_dev_test_flag(hdev, HCI_MGMT)) {
> + hci_pend_cmd_sync_clear(hdev);

Any particular reason why you are not using hci_cmd_sync_clear
instead? We also may want to move the clearing logic to
hci_dev_close_sync since it should be equivalent to
hci_request_cancel_all.

> __mgmt_power_off(hdev);
> + }
>
> hci_inquiry_cache_flush(hdev);
> hci_pend_le_actions_clear(hdev);
> --
> 2.39.2.722.g9855ee24e9-goog
>


--
Luiz Augusto von Dentz

2023-02-24 22:38:02

by Zhengping Jiang

[permalink] [raw]
Subject: Re: [kernel PATCH v2 1/1] Bluetooth: hci_sync: clear workqueue before clear mgmt cmd

Hi Luiz,

> Any particular reason why you are not using hci_cmd_sync_clear
> instead?

That is a good question and we used hci_cmd_sync_clear in the first
version, but it will clear the queue and also close the timer. As a
result, when the adapter is turned on again, the timer will not
schedule any new jobs. So the option is to use hci_cmd_sync_clear and
re-initiate the queue or to write a new function which only clears the
queue.

> We also may want to move the clearing logic to
> hci_dev_close_sync since it should be equivalent to
> hci_request_cancel_all.

I actually have a question here. I saw
"drain_workqueue(hdev->workqueue)" in hci_dev_close_sync and thought
it should force clearing the cmd_sync queue. But it seems cannot
prevent the use-after-free situation.

Any suggestions to improve the solution?

Thanks,
Zhengping


On Fri, Feb 24, 2023 at 1:02 PM Luiz Augusto von Dentz
<[email protected]> wrote:
>
> Hi Zhengping,
>
> On Fri, Feb 24, 2023 at 11:53 AM Zhengping Jiang <[email protected]> wrote:
> >
> > Clear cmd_sync_work queue before clearing the mgmt cmd list to avoid
> > racing conditions which cause use-after-free.
> >
> > When powering off the adapter, the mgmt cmd list will be cleared. If a
> > work is queued in the cmd_sync_work queue at the same time, it will
> > cause the risk of use-after-free, as the cmd pointer is not checked
> > before use.
> >
> > Signed-off-by: Zhengping Jiang <[email protected]>
> > ---
> >
> > Changes in v2:
> > - Add function to clear the queue without stop the timer
> >
> > Changes in v1:
> > - Clear cmd_sync_work queue before clearing the mgmt cmd list
> >
> > net/bluetooth/hci_sync.c | 21 ++++++++++++++++++++-
> > 1 file changed, 20 insertions(+), 1 deletion(-)
> >
> > diff --git a/net/bluetooth/hci_sync.c b/net/bluetooth/hci_sync.c
> > index 117eedb6f709..b70365dfff0c 100644
> > --- a/net/bluetooth/hci_sync.c
> > +++ b/net/bluetooth/hci_sync.c
> > @@ -636,6 +636,23 @@ void hci_cmd_sync_init(struct hci_dev *hdev)
> > INIT_DELAYED_WORK(&hdev->adv_instance_expire, adv_timeout_expire);
> > }
> >
> > +static void hci_pend_cmd_sync_clear(struct hci_dev *hdev)
> > +{
> > + struct hci_cmd_sync_work_entry *entry, *tmp;
> > +
> > + mutex_lock(&hdev->cmd_sync_work_lock);
> > + list_for_each_entry_safe(entry, tmp, &hdev->cmd_sync_work_list, list) {
> > + if (entry->destroy) {
> > + hci_req_sync_lock(hdev);
> > + entry->destroy(hdev, entry->data, -ECANCELED);
> > + hci_req_sync_unlock(hdev);
> > + }
> > + list_del(&entry->list);
> > + kfree(entry);
> > + }
> > + mutex_unlock(&hdev->cmd_sync_work_lock);
> > +}
> > +
> > void hci_cmd_sync_clear(struct hci_dev *hdev)
> > {
> > struct hci_cmd_sync_work_entry *entry, *tmp;
> > @@ -4842,8 +4859,10 @@ int hci_dev_close_sync(struct hci_dev *hdev)
> >
> > if (!auto_off && hdev->dev_type == HCI_PRIMARY &&
> > !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) &&
> > - hci_dev_test_flag(hdev, HCI_MGMT))
> > + hci_dev_test_flag(hdev, HCI_MGMT)) {
> > + hci_pend_cmd_sync_clear(hdev);
>
> Any particular reason why you are not using hci_cmd_sync_clear
> instead? We also may want to move the clearing logic to
> hci_dev_close_sync since it should be equivalent to
> hci_request_cancel_all.
>
> > __mgmt_power_off(hdev);
> > + }
> >
> > hci_inquiry_cache_flush(hdev);
> > hci_pend_le_actions_clear(hdev);
> > --
> > 2.39.2.722.g9855ee24e9-goog
> >
>
>
> --
> Luiz Augusto von Dentz

2023-02-27 07:18:52

by Zhengping Jiang

[permalink] [raw]
Subject: Re: [kernel PATCH v2 1/1] Bluetooth: hci_sync: clear workqueue before clear mgmt cmd

Hi Luiz,

I have a question. Given that each command in the cmd_sync queue
should clean up the memory in a callback function. I was wondering if
the call to cmd_complete_rsp in __mgmt_power_off function is still
necessary? Will this always risk a race condition that cmd has been
released when the complete callback or _sync function is run?

Thanks,
Zhengping

On Fri, Feb 24, 2023 at 2:37 PM Zhengping Jiang <[email protected]> wrote:
>
> Hi Luiz,
>
> > Any particular reason why you are not using hci_cmd_sync_clear
> > instead?
>
> That is a good question and we used hci_cmd_sync_clear in the first
> version, but it will clear the queue and also close the timer. As a
> result, when the adapter is turned on again, the timer will not
> schedule any new jobs. So the option is to use hci_cmd_sync_clear and
> re-initiate the queue or to write a new function which only clears the
> queue.
>
> > We also may want to move the clearing logic to
> > hci_dev_close_sync since it should be equivalent to
> > hci_request_cancel_all.
>
> I actually have a question here. I saw
> "drain_workqueue(hdev->workqueue)" in hci_dev_close_sync and thought
> it should force clearing the cmd_sync queue. But it seems cannot
> prevent the use-after-free situation.
>
> Any suggestions to improve the solution?
>
> Thanks,
> Zhengping
>
>
> On Fri, Feb 24, 2023 at 1:02 PM Luiz Augusto von Dentz
> <[email protected]> wrote:
> >
> > Hi Zhengping,
> >
> > On Fri, Feb 24, 2023 at 11:53 AM Zhengping Jiang <[email protected]> wrote:
> > >
> > > Clear cmd_sync_work queue before clearing the mgmt cmd list to avoid
> > > racing conditions which cause use-after-free.
> > >
> > > When powering off the adapter, the mgmt cmd list will be cleared. If a
> > > work is queued in the cmd_sync_work queue at the same time, it will
> > > cause the risk of use-after-free, as the cmd pointer is not checked
> > > before use.
> > >
> > > Signed-off-by: Zhengping Jiang <[email protected]>
> > > ---
> > >
> > > Changes in v2:
> > > - Add function to clear the queue without stop the timer
> > >
> > > Changes in v1:
> > > - Clear cmd_sync_work queue before clearing the mgmt cmd list
> > >
> > > net/bluetooth/hci_sync.c | 21 ++++++++++++++++++++-
> > > 1 file changed, 20 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/net/bluetooth/hci_sync.c b/net/bluetooth/hci_sync.c
> > > index 117eedb6f709..b70365dfff0c 100644
> > > --- a/net/bluetooth/hci_sync.c
> > > +++ b/net/bluetooth/hci_sync.c
> > > @@ -636,6 +636,23 @@ void hci_cmd_sync_init(struct hci_dev *hdev)
> > > INIT_DELAYED_WORK(&hdev->adv_instance_expire, adv_timeout_expire);
> > > }
> > >
> > > +static void hci_pend_cmd_sync_clear(struct hci_dev *hdev)
> > > +{
> > > + struct hci_cmd_sync_work_entry *entry, *tmp;
> > > +
> > > + mutex_lock(&hdev->cmd_sync_work_lock);
> > > + list_for_each_entry_safe(entry, tmp, &hdev->cmd_sync_work_list, list) {
> > > + if (entry->destroy) {
> > > + hci_req_sync_lock(hdev);
> > > + entry->destroy(hdev, entry->data, -ECANCELED);
> > > + hci_req_sync_unlock(hdev);
> > > + }
> > > + list_del(&entry->list);
> > > + kfree(entry);
> > > + }
> > > + mutex_unlock(&hdev->cmd_sync_work_lock);
> > > +}
> > > +
> > > void hci_cmd_sync_clear(struct hci_dev *hdev)
> > > {
> > > struct hci_cmd_sync_work_entry *entry, *tmp;
> > > @@ -4842,8 +4859,10 @@ int hci_dev_close_sync(struct hci_dev *hdev)
> > >
> > > if (!auto_off && hdev->dev_type == HCI_PRIMARY &&
> > > !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) &&
> > > - hci_dev_test_flag(hdev, HCI_MGMT))
> > > + hci_dev_test_flag(hdev, HCI_MGMT)) {
> > > + hci_pend_cmd_sync_clear(hdev);
> >
> > Any particular reason why you are not using hci_cmd_sync_clear
> > instead? We also may want to move the clearing logic to
> > hci_dev_close_sync since it should be equivalent to
> > hci_request_cancel_all.
> >
> > > __mgmt_power_off(hdev);
> > > + }
> > >
> > > hci_inquiry_cache_flush(hdev);
> > > hci_pend_le_actions_clear(hdev);
> > > --
> > > 2.39.2.722.g9855ee24e9-goog
> > >
> >
> >
> > --
> > Luiz Augusto von Dentz

2023-02-27 23:41:42

by Luiz Augusto von Dentz

[permalink] [raw]
Subject: Re: [kernel PATCH v2 1/1] Bluetooth: hci_sync: clear workqueue before clear mgmt cmd

Hi Zhengping,

On Sun, Feb 26, 2023 at 11:18 PM Zhengping Jiang <[email protected]> wrote:
>
> Hi Luiz,
>
> I have a question. Given that each command in the cmd_sync queue
> should clean up the memory in a callback function. I was wondering if
> the call to cmd_complete_rsp in __mgmt_power_off function is still
> necessary? Will this always risk a race condition that cmd has been
> released when the complete callback or _sync function is run?

Not sure I follow you here, do you have a stack trace when the user
after free occurs?

> Thanks,
> Zhengping
>
> On Fri, Feb 24, 2023 at 2:37 PM Zhengping Jiang <[email protected]> wrote:
> >
> > Hi Luiz,
> >
> > > Any particular reason why you are not using hci_cmd_sync_clear
> > > instead?
> >
> > That is a good question and we used hci_cmd_sync_clear in the first
> > version, but it will clear the queue and also close the timer. As a
> > result, when the adapter is turned on again, the timer will not
> > schedule any new jobs. So the option is to use hci_cmd_sync_clear and
> > re-initiate the queue or to write a new function which only clears the
> > queue.
> >
> > > We also may want to move the clearing logic to
> > > hci_dev_close_sync since it should be equivalent to
> > > hci_request_cancel_all.
> >
> > I actually have a question here. I saw
> > "drain_workqueue(hdev->workqueue)" in hci_dev_close_sync and thought
> > it should force clearing the cmd_sync queue. But it seems cannot
> > prevent the use-after-free situation.
> >
> > Any suggestions to improve the solution?
> >
> > Thanks,
> > Zhengping
> >
> >
> > On Fri, Feb 24, 2023 at 1:02 PM Luiz Augusto von Dentz
> > <[email protected]> wrote:
> > >
> > > Hi Zhengping,
> > >
> > > On Fri, Feb 24, 2023 at 11:53 AM Zhengping Jiang <[email protected]> wrote:
> > > >
> > > > Clear cmd_sync_work queue before clearing the mgmt cmd list to avoid
> > > > racing conditions which cause use-after-free.
> > > >
> > > > When powering off the adapter, the mgmt cmd list will be cleared. If a
> > > > work is queued in the cmd_sync_work queue at the same time, it will
> > > > cause the risk of use-after-free, as the cmd pointer is not checked
> > > > before use.
> > > >
> > > > Signed-off-by: Zhengping Jiang <[email protected]>
> > > > ---
> > > >
> > > > Changes in v2:
> > > > - Add function to clear the queue without stop the timer
> > > >
> > > > Changes in v1:
> > > > - Clear cmd_sync_work queue before clearing the mgmt cmd list
> > > >
> > > > net/bluetooth/hci_sync.c | 21 ++++++++++++++++++++-
> > > > 1 file changed, 20 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/net/bluetooth/hci_sync.c b/net/bluetooth/hci_sync.c
> > > > index 117eedb6f709..b70365dfff0c 100644
> > > > --- a/net/bluetooth/hci_sync.c
> > > > +++ b/net/bluetooth/hci_sync.c
> > > > @@ -636,6 +636,23 @@ void hci_cmd_sync_init(struct hci_dev *hdev)
> > > > INIT_DELAYED_WORK(&hdev->adv_instance_expire, adv_timeout_expire);
> > > > }
> > > >
> > > > +static void hci_pend_cmd_sync_clear(struct hci_dev *hdev)
> > > > +{
> > > > + struct hci_cmd_sync_work_entry *entry, *tmp;
> > > > +
> > > > + mutex_lock(&hdev->cmd_sync_work_lock);
> > > > + list_for_each_entry_safe(entry, tmp, &hdev->cmd_sync_work_list, list) {
> > > > + if (entry->destroy) {
> > > > + hci_req_sync_lock(hdev);
> > > > + entry->destroy(hdev, entry->data, -ECANCELED);
> > > > + hci_req_sync_unlock(hdev);
> > > > + }
> > > > + list_del(&entry->list);
> > > > + kfree(entry);
> > > > + }
> > > > + mutex_unlock(&hdev->cmd_sync_work_lock);
> > > > +}
> > > > +
> > > > void hci_cmd_sync_clear(struct hci_dev *hdev)
> > > > {
> > > > struct hci_cmd_sync_work_entry *entry, *tmp;
> > > > @@ -4842,8 +4859,10 @@ int hci_dev_close_sync(struct hci_dev *hdev)
> > > >
> > > > if (!auto_off && hdev->dev_type == HCI_PRIMARY &&
> > > > !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) &&
> > > > - hci_dev_test_flag(hdev, HCI_MGMT))
> > > > + hci_dev_test_flag(hdev, HCI_MGMT)) {
> > > > + hci_pend_cmd_sync_clear(hdev);
> > >
> > > Any particular reason why you are not using hci_cmd_sync_clear
> > > instead? We also may want to move the clearing logic to
> > > hci_dev_close_sync since it should be equivalent to
> > > hci_request_cancel_all.
> > >
> > > > __mgmt_power_off(hdev);
> > > > + }
> > > >
> > > > hci_inquiry_cache_flush(hdev);
> > > > hci_pend_le_actions_clear(hdev);
> > > > --
> > > > 2.39.2.722.g9855ee24e9-goog
> > > >
> > >
> > >
> > > --
> > > Luiz Augusto von Dentz



--
Luiz Augusto von Dentz

2023-02-27 23:58:50

by Zhengping Jiang

[permalink] [raw]
Subject: Re: [kernel PATCH v2 1/1] Bluetooth: hci_sync: clear workqueue before clear mgmt cmd

Hi Luiz,

Sure. Hope this helps.
Here is one log from the user.

[ 53.368740] ==================================================================
[ 53.376167] BUG: KASAN: use-after-free in set_ssp_sync+0x44/0x154 [bluetooth]
[ 53.384303] Read of size 8 at addr ffffff80b7ee0318 by task kworker/u17:0/190
[ 53.396342] CPU: 7 PID: 190 Comm: kworker/u17:0 Tainted: G W
5.15.59-lockdep #1 29eed131ef0afd42bc369a6a0ca1c69d3653699b
[ 53.408868] Hardware name: Qualcomm Technologies, Inc. sc7280 CRD
platform (rev5+) (DT)
[ 53.417095] Workqueue: hci0 hci_cmd_sync_work [bluetooth]
[ 53.422780] Call trace:
[ 53.425310] dump_backtrace+0x0/0x424
[ 53.429108] show_stack+0x20/0x2c

[ 53.432534] dump_stack_lvl+0x84/0xb4
[ 53.436514] print_address_description+0x30/0x2fc
[ 53.441369] kasan_report+0x15c/0x19c
[ 53.445975] __asan_report_load8_noabort+0x44/0x50
[ 53.450910] set_ssp_sync+0x44/0x154 [bluetooth
34f6fa2bbf49f3d7faf6ea04e8755ae16590a6b3]
[ 53.460136] hci_cmd_sync_work+0x1c8/0x2c8 [bluetooth
34f6fa2bbf49f3d7faf6ea04e8755ae16590a6b3]
[ 53.472214] process_one_work+0x59c/0xa88
[ 53.476990] worker_thread+0x81c/0xd18
[ 53.480854] kthread+0x2d4/0x3d8
[ 53.484272] ret_from_fork+0x10/0x20

[ 53.489733] Allocated by task 1162:
[ 53.493336] kasan_save_stack+0x38/0x68
[ 53.498115] __kasan_kmalloc+0xb4/0xd0
[ 53.501993] kmem_cache_alloc_trace+0x29c/0x374
[ 53.506661] mgmt_pending_new+0x74/0x200 [bluetooth]
[ 53.511905] mgmt_pending_add+0x28/0xec [bluetooth]
[ 53.517059] set_ssp+0x2d8/0x5b0 [bluetooth]
[ 53.521575] hci_mgmt_cmd+0x5c4/0x8b0 [bluetooth]
[ 53.526538] hci_sock_sendmsg+0x28c/0x95c [bluetooth]
[ 53.531850] sock_sendmsg+0xb4/0xd8
[ 53.535454] sock_write_iter+0x1c0/0x2d0
[ 53.539494] do_iter_readv_writev+0x350/0x4e0
[ 53.543980] do_iter_write+0xf0/0x2e4
[ 53.547747] vfs_writev+0xd0/0x13c
[ 53.551254] do_writev+0xe8/0x1fc
[ 53.554672] __arm64_sys_writev+0x84/0x98
[ 53.558805] invoke_syscall+0x78/0x20c
[ 53.562665] el0_svc_common+0x12c/0x2f0
[ 53.566618] do_el0_svc+0x94/0x13c
[ 53.570125] el0_svc+0x5c/0x108
[ 53.573374] el0t_64_sync_handler+0x78/0x108
[ 53.577773] el0t_64_sync+0x1a4/0x1a8

[ 53.583089] Freed by task 3207:
[ 53.586325] kasan_save_stack+0x38/0x68
[ 53.590282] kasan_set_track+0x28/0x3c
[ 53.594153] kasan_set_free_info+0x28/0x4c
[ 53.598369] ____kasan_slab_free+0x138/0x17c
[ 53.602767] __kasan_slab_free+0x18/0x28
[ 53.606803] slab_free_freelist_hook+0x188/0x260
[ 53.611559] kfree+0x138/0x29c
[ 53.614708] mgmt_pending_free+0xac/0xdc [bluetooth]
[ 53.619948] mgmt_pending_remove+0xd8/0xf0 [bluetooth]
[ 53.625357] cmd_complete_rsp+0xc8/0x178 [bluetooth]
[ 53.630586] mgmt_pending_foreach+0xa8/0xf8 [bluetooth]
[ 53.636076] __mgmt_power_off+0x114/0x26c [bluetooth]
[ 53.641390] hci_dev_close_sync+0x314/0x814 [bluetooth]
[ 53.646882] hci_dev_do_close+0x3c/0x7c [bluetooth]
[ 53.652017] hci_dev_close+0xa4/0x15c [bluetooth]
[ 53.656980] hci_sock_ioctl+0x298/0x444 [bluetooth]
[ 53.662117] sock_do_ioctl+0xd0/0x1e8
[ 53.665900] sock_ioctl+0x4fc/0x72c
[ 53.669500] __arm64_sys_ioctl+0x118/0x154
[ 53.673726] invoke_syscall+0x78/0x20c
[ 53.677587] el0_svc_common+0x12c/0x2f0
[ 53.681533] do_el0_svc+0x94/0x13c
[ 53.685043] el0_svc+0x5c/0x108
[ 53.688278] el0t_64_sync_handler+0x78/0x108
[ 53.692677] el0t_64_sync+0x1a4/0x1a8

[ 53.697988] Last potentially related work creation:
[ 53.703009] kasan_save_stack+0x38/0x68
[ 53.706962] kasan_record_aux_stack+0x104/0x130
[ 53.711622] __call_rcu+0x14c/0x860
[ 53.715212] call_rcu+0x18/0x24
[ 53.718448] sk_filter_uncharge+0xc0/0x120
[ 53.722667] __sk_destruct+0xb4/0x4a8
[ 53.726435] sk_destruct+0x78/0xa0
[ 53.729941] __sk_free+0x190/0x270
[ 53.733453] sk_free+0x54/0x8c
[ 53.736603] deferred_put_nlk_sk+0x1d4/0x20c
[ 53.741000] rcu_do_batch+0x3e8/0xd08
[ 53.744772] nocb_cb_wait+0xc8/0xa3c
[ 53.748453] rcu_nocb_cb_kthread+0x48/0x134
[ 53.752768] kthread+0x2d4/0x3d8
[ 53.756098] ret_from_fork+0x10/0x20

This is another one at a different function but with the same signature.

[ 43.363512] ==================================================================
[ 43.370966] BUG: KASAN: use-after-free in
mgmt_remove_adv_monitor_sync+0x40/0xcc [bluetooth]
[ 43.379813] Read of size 8 at addr ffffff8096c28d18 by task kworker/u17:0/192
[ 43.387158]
[ 43.388705] CPU: 6 PID: 192 Comm: kworker/u17:0 Tainted: G W
5.15.59-lockdep #1 59f35e3dfc07f6688b084869895c7a39892c891a
localhost ~ # [ 43.410184] Workqueue: hci0 hci_cmd_sync_work [bluetooth]

[ 43.418887] Call trace:
[ 43.422407] dump_backtrace+0x0/0x424
[ 43.426191] show_stack+0x20/0x2c
[ 43.429608] dump_stack_lvl+0x84/0xb4
[ 43.433395] print_address_description+0x30/0x2fc
[ 43.438243] kasan_report+0x15c/0x19c
[ 43.442070] __asan_report_load8_noabort+0x44/0x50
hciconfig hci0 up
[ 43.447009] mgmt_remove_adv_monitor_sync+0x40/0xcc [bluetooth
8dae3a82177133cfa9626e7322b3b0c8f665102d]
[ 43.458568] hci_cmd_sync_work+0x1bc/0x2bc [bluetooth
8dae3a82177133cfa9626e7322b3b0c8f665102d]
[ 43.467656] process_one_work+0x59c/0xa88
[ 43.472530] worker_thread+0x81c/0xd18
[ 43.476410] kthread+0x2d4/0x3d8
localhost ~ # [ 43.479753] ret_from_fork+0x10/0x20
[ 43.486588]
[ 43.488156] Allocated by task 1118:
[ 43.491751] kasan_save_stack+0x38/0x68
[ 43.495709] __kasan_kmalloc+0xb4/0xd0
[ 43.499577] kmem_cache_alloc_trace+0x29c/0x374
[ 43.504238] mgmt_pending_new+0x74/0x200 [bluetooth]
sleep 2[ 43.509509] mgmt_pending_add+0x28/0xec [bluetooth]

[ 43.515244] remove_adv_monitor+0xf8/0x174 [bluetooth]
[ 43.521533] hci_mgmt_cmd+0x5c4/0x8b0 [bluetooth]
[ 43.526527] hci_sock_sendmsg+0x28c/0x95c [bluetooth]
[ 43.531873] sock_sendmsg+0xb4/0xd8
[ 43.535472] sock_write_iter+0x1c0/0x2d0
[ 43.539519] do_iter_readv_writev+0x350/0x4e0
[ 43.544012] do_iter_write+0xf0/0x2e4
[ 43.547788] vfs_writev+0xd0/0x13c
[ 43.551295] do_writev+0xe8/0x1fc
[ 43.554710] __arm64_sys_writev+0x84/0x98
[ 43.558838] invoke_syscall+0x78/0x20c
[ 43.562709] el0_svc_common+0x12c/0x2f0
[ 43.566654] do_el0_svc+0x94/0x13c
[ 43.570155] el0_svc+0x5c/0x108
[ 43.573391] el0t_64_sync_handler+0x78/0x108
[ 43.577785] el0t_64_sync+0x1a4/0x1a8
[ 43.581564]
[ 43.583115] Freed by task 3217:
[ 43.586356] kasan_save_stack+0x38/0x68
[ 43.590314] kasan_set_track+0x28/0x3c
[ 43.594180] kasan_set_free_info+0x28/0x4c
[ 43.598396] ____kasan_slab_free+0x138/0x17c
[ 43.602794] __kasan_slab_free+0x18/0x28
[ 43.606838] slab_free_freelist_hook+0x188/0x260
[ 43.611591] kfree+0x138/0x29c
[ 43.614741] mgmt_pending_free+0xac/0xdc [bluetooth]
[ 43.620003] mgmt_pending_remove+0xd8/0xf0 [bluetooth]
[ 43.625434] cmd_complete_rsp+0xc8/0x178 [bluetooth]
[ 43.630686] mgmt_pending_foreach+0xa8/0xf8 [bluetooth]
[ 43.636198] __mgmt_power_off+0x114/0x26c [bluetooth]
[ 43.641532] hci_dev_close_sync+0x2ec/0x7ec [bluetooth]
[ 43.647049] hci_dev_do_close+0x3c/0x7c [bluetooth]
[ 43.652209] hci_dev_close+0xac/0x164 [bluetooth]
[ 43.657190] hci_sock_ioctl+0x298/0x444 [bluetooth]
[ 43.662353] sock_do_ioctl+0xd0/0x1e8
[ 43.666134] sock_ioctl+0x4fc/0x72c
[ 43.669736] __arm64_sys_ioctl+0x118/0x154
[ 43.673961] invoke_syscall+0x78/0x20c
[ 43.677820] el0_svc_common+0x12c/0x2f0
[ 43.681770] do_el0_svc+0x94/0x13c
[ 43.685278] el0_svc+0x5c/0x108
[ 43.688514] el0t_64_sync_handler+0x78/0x108
[ 43.692913] el0t_64_sync+0x1a4/0x1a8

Thanks,
Zhengping

On Mon, Feb 27, 2023 at 3:41 PM Luiz Augusto von Dentz
<[email protected]> wrote:
>
> Hi Zhengping,
>
> On Sun, Feb 26, 2023 at 11:18 PM Zhengping Jiang <[email protected]> wrote:
> >
> > Hi Luiz,
> >
> > I have a question. Given that each command in the cmd_sync queue
> > should clean up the memory in a callback function. I was wondering if
> > the call to cmd_complete_rsp in __mgmt_power_off function is still
> > necessary? Will this always risk a race condition that cmd has been
> > released when the complete callback or _sync function is run?
>
> Not sure I follow you here, do you have a stack trace when the user
> after free occurs?
>
> > Thanks,
> > Zhengping
> >
> > On Fri, Feb 24, 2023 at 2:37 PM Zhengping Jiang <[email protected]> wrote:
> > >
> > > Hi Luiz,
> > >
> > > > Any particular reason why you are not using hci_cmd_sync_clear
> > > > instead?
> > >
> > > That is a good question and we used hci_cmd_sync_clear in the first
> > > version, but it will clear the queue and also close the timer. As a
> > > result, when the adapter is turned on again, the timer will not
> > > schedule any new jobs. So the option is to use hci_cmd_sync_clear and
> > > re-initiate the queue or to write a new function which only clears the
> > > queue.
> > >
> > > > We also may want to move the clearing logic to
> > > > hci_dev_close_sync since it should be equivalent to
> > > > hci_request_cancel_all.
> > >
> > > I actually have a question here. I saw
> > > "drain_workqueue(hdev->workqueue)" in hci_dev_close_sync and thought
> > > it should force clearing the cmd_sync queue. But it seems cannot
> > > prevent the use-after-free situation.
> > >
> > > Any suggestions to improve the solution?
> > >
> > > Thanks,
> > > Zhengping
> > >
> > >
> > > On Fri, Feb 24, 2023 at 1:02 PM Luiz Augusto von Dentz
> > > <[email protected]> wrote:
> > > >
> > > > Hi Zhengping,
> > > >
> > > > On Fri, Feb 24, 2023 at 11:53 AM Zhengping Jiang <[email protected]> wrote:
> > > > >
> > > > > Clear cmd_sync_work queue before clearing the mgmt cmd list to avoid
> > > > > racing conditions which cause use-after-free.
> > > > >
> > > > > When powering off the adapter, the mgmt cmd list will be cleared. If a
> > > > > work is queued in the cmd_sync_work queue at the same time, it will
> > > > > cause the risk of use-after-free, as the cmd pointer is not checked
> > > > > before use.
> > > > >
> > > > > Signed-off-by: Zhengping Jiang <[email protected]>
> > > > > ---
> > > > >
> > > > > Changes in v2:
> > > > > - Add function to clear the queue without stop the timer
> > > > >
> > > > > Changes in v1:
> > > > > - Clear cmd_sync_work queue before clearing the mgmt cmd list
> > > > >
> > > > > net/bluetooth/hci_sync.c | 21 ++++++++++++++++++++-
> > > > > 1 file changed, 20 insertions(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/net/bluetooth/hci_sync.c b/net/bluetooth/hci_sync.c
> > > > > index 117eedb6f709..b70365dfff0c 100644
> > > > > --- a/net/bluetooth/hci_sync.c
> > > > > +++ b/net/bluetooth/hci_sync.c
> > > > > @@ -636,6 +636,23 @@ void hci_cmd_sync_init(struct hci_dev *hdev)
> > > > > INIT_DELAYED_WORK(&hdev->adv_instance_expire, adv_timeout_expire);
> > > > > }
> > > > >
> > > > > +static void hci_pend_cmd_sync_clear(struct hci_dev *hdev)
> > > > > +{
> > > > > + struct hci_cmd_sync_work_entry *entry, *tmp;
> > > > > +
> > > > > + mutex_lock(&hdev->cmd_sync_work_lock);
> > > > > + list_for_each_entry_safe(entry, tmp, &hdev->cmd_sync_work_list, list) {
> > > > > + if (entry->destroy) {
> > > > > + hci_req_sync_lock(hdev);
> > > > > + entry->destroy(hdev, entry->data, -ECANCELED);
> > > > > + hci_req_sync_unlock(hdev);
> > > > > + }
> > > > > + list_del(&entry->list);
> > > > > + kfree(entry);
> > > > > + }
> > > > > + mutex_unlock(&hdev->cmd_sync_work_lock);
> > > > > +}
> > > > > +
> > > > > void hci_cmd_sync_clear(struct hci_dev *hdev)
> > > > > {
> > > > > struct hci_cmd_sync_work_entry *entry, *tmp;
> > > > > @@ -4842,8 +4859,10 @@ int hci_dev_close_sync(struct hci_dev *hdev)
> > > > >
> > > > > if (!auto_off && hdev->dev_type == HCI_PRIMARY &&
> > > > > !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) &&
> > > > > - hci_dev_test_flag(hdev, HCI_MGMT))
> > > > > + hci_dev_test_flag(hdev, HCI_MGMT)) {
> > > > > + hci_pend_cmd_sync_clear(hdev);
> > > >
> > > > Any particular reason why you are not using hci_cmd_sync_clear
> > > > instead? We also may want to move the clearing logic to
> > > > hci_dev_close_sync since it should be equivalent to
> > > > hci_request_cancel_all.
> > > >
> > > > > __mgmt_power_off(hdev);
> > > > > + }
> > > > >
> > > > > hci_inquiry_cache_flush(hdev);
> > > > > hci_pend_le_actions_clear(hdev);
> > > > > --
> > > > > 2.39.2.722.g9855ee24e9-goog
> > > > >
> > > >
> > > >
> > > > --
> > > > Luiz Augusto von Dentz
>
>
>
> --
> Luiz Augusto von Dentz

2023-02-28 21:12:43

by Luiz Augusto von Dentz

[permalink] [raw]
Subject: Re: [kernel PATCH v2 1/1] Bluetooth: hci_sync: clear workqueue before clear mgmt cmd

Hi Zhengping,

On Mon, Feb 27, 2023 at 3:58 PM Zhengping Jiang <[email protected]> wrote:
>
> Hi Luiz,
>
> Sure. Hope this helps.
> Here is one log from the user.
>
> [ 53.368740] ==================================================================
> [ 53.376167] BUG: KASAN: use-after-free in set_ssp_sync+0x44/0x154 [bluetooth]
> [ 53.384303] Read of size 8 at addr ffffff80b7ee0318 by task kworker/u17:0/190
> [ 53.396342] CPU: 7 PID: 190 Comm: kworker/u17:0 Tainted: G W
> 5.15.59-lockdep #1 29eed131ef0afd42bc369a6a0ca1c69d3653699b
> [ 53.408868] Hardware name: Qualcomm Technologies, Inc. sc7280 CRD
> platform (rev5+) (DT)
> [ 53.417095] Workqueue: hci0 hci_cmd_sync_work [bluetooth]
> [ 53.422780] Call trace:
> [ 53.425310] dump_backtrace+0x0/0x424
> [ 53.429108] show_stack+0x20/0x2c
>
> [ 53.432534] dump_stack_lvl+0x84/0xb4
> [ 53.436514] print_address_description+0x30/0x2fc
> [ 53.441369] kasan_report+0x15c/0x19c
> [ 53.445975] __asan_report_load8_noabort+0x44/0x50
> [ 53.450910] set_ssp_sync+0x44/0x154 [bluetooth
> 34f6fa2bbf49f3d7faf6ea04e8755ae16590a6b3]
> [ 53.460136] hci_cmd_sync_work+0x1c8/0x2c8 [bluetooth
> 34f6fa2bbf49f3d7faf6ea04e8755ae16590a6b3]
> [ 53.472214] process_one_work+0x59c/0xa88
> [ 53.476990] worker_thread+0x81c/0xd18
> [ 53.480854] kthread+0x2d4/0x3d8
> [ 53.484272] ret_from_fork+0x10/0x20
>
> [ 53.489733] Allocated by task 1162:
> [ 53.493336] kasan_save_stack+0x38/0x68
> [ 53.498115] __kasan_kmalloc+0xb4/0xd0
> [ 53.501993] kmem_cache_alloc_trace+0x29c/0x374
> [ 53.506661] mgmt_pending_new+0x74/0x200 [bluetooth]
> [ 53.511905] mgmt_pending_add+0x28/0xec [bluetooth]
> [ 53.517059] set_ssp+0x2d8/0x5b0 [bluetooth]
> [ 53.521575] hci_mgmt_cmd+0x5c4/0x8b0 [bluetooth]
> [ 53.526538] hci_sock_sendmsg+0x28c/0x95c [bluetooth]
> [ 53.531850] sock_sendmsg+0xb4/0xd8
> [ 53.535454] sock_write_iter+0x1c0/0x2d0
> [ 53.539494] do_iter_readv_writev+0x350/0x4e0
> [ 53.543980] do_iter_write+0xf0/0x2e4
> [ 53.547747] vfs_writev+0xd0/0x13c
> [ 53.551254] do_writev+0xe8/0x1fc
> [ 53.554672] __arm64_sys_writev+0x84/0x98
> [ 53.558805] invoke_syscall+0x78/0x20c
> [ 53.562665] el0_svc_common+0x12c/0x2f0
> [ 53.566618] do_el0_svc+0x94/0x13c
> [ 53.570125] el0_svc+0x5c/0x108
> [ 53.573374] el0t_64_sync_handler+0x78/0x108
> [ 53.577773] el0t_64_sync+0x1a4/0x1a8
>
> [ 53.583089] Freed by task 3207:
> [ 53.586325] kasan_save_stack+0x38/0x68
> [ 53.590282] kasan_set_track+0x28/0x3c
> [ 53.594153] kasan_set_free_info+0x28/0x4c
> [ 53.598369] ____kasan_slab_free+0x138/0x17c
> [ 53.602767] __kasan_slab_free+0x18/0x28
> [ 53.606803] slab_free_freelist_hook+0x188/0x260
> [ 53.611559] kfree+0x138/0x29c
> [ 53.614708] mgmt_pending_free+0xac/0xdc [bluetooth]
> [ 53.619948] mgmt_pending_remove+0xd8/0xf0 [bluetooth]
> [ 53.625357] cmd_complete_rsp+0xc8/0x178 [bluetooth]
> [ 53.630586] mgmt_pending_foreach+0xa8/0xf8 [bluetooth]
> [ 53.636076] __mgmt_power_off+0x114/0x26c [bluetooth]
> [ 53.641390] hci_dev_close_sync+0x314/0x814 [bluetooth]
> [ 53.646882] hci_dev_do_close+0x3c/0x7c [bluetooth]
> [ 53.652017] hci_dev_close+0xa4/0x15c [bluetooth]
> [ 53.656980] hci_sock_ioctl+0x298/0x444 [bluetooth]
> [ 53.662117] sock_do_ioctl+0xd0/0x1e8
> [ 53.665900] sock_ioctl+0x4fc/0x72c
> [ 53.669500] __arm64_sys_ioctl+0x118/0x154
> [ 53.673726] invoke_syscall+0x78/0x20c
> [ 53.677587] el0_svc_common+0x12c/0x2f0
> [ 53.681533] do_el0_svc+0x94/0x13c
> [ 53.685043] el0_svc+0x5c/0x108
> [ 53.688278] el0t_64_sync_handler+0x78/0x108
> [ 53.692677] el0t_64_sync+0x1a4/0x1a8
>
> [ 53.697988] Last potentially related work creation:
> [ 53.703009] kasan_save_stack+0x38/0x68
> [ 53.706962] kasan_record_aux_stack+0x104/0x130
> [ 53.711622] __call_rcu+0x14c/0x860
> [ 53.715212] call_rcu+0x18/0x24
> [ 53.718448] sk_filter_uncharge+0xc0/0x120
> [ 53.722667] __sk_destruct+0xb4/0x4a8
> [ 53.726435] sk_destruct+0x78/0xa0
> [ 53.729941] __sk_free+0x190/0x270
> [ 53.733453] sk_free+0x54/0x8c
> [ 53.736603] deferred_put_nlk_sk+0x1d4/0x20c
> [ 53.741000] rcu_do_batch+0x3e8/0xd08
> [ 53.744772] nocb_cb_wait+0xc8/0xa3c
> [ 53.748453] rcu_nocb_cb_kthread+0x48/0x134
> [ 53.752768] kthread+0x2d4/0x3d8
> [ 53.756098] ret_from_fork+0x10/0x20
>
> This is another one at a different function but with the same signature.
>
> [ 43.363512] ==================================================================
> [ 43.370966] BUG: KASAN: use-after-free in
> mgmt_remove_adv_monitor_sync+0x40/0xcc [bluetooth]
> [ 43.379813] Read of size 8 at addr ffffff8096c28d18 by task kworker/u17:0/192
> [ 43.387158]
> [ 43.388705] CPU: 6 PID: 192 Comm: kworker/u17:0 Tainted: G W
> 5.15.59-lockdep #1 59f35e3dfc07f6688b084869895c7a39892c891a
> localhost ~ # [ 43.410184] Workqueue: hci0 hci_cmd_sync_work [bluetooth]
>
> [ 43.418887] Call trace:
> [ 43.422407] dump_backtrace+0x0/0x424
> [ 43.426191] show_stack+0x20/0x2c
> [ 43.429608] dump_stack_lvl+0x84/0xb4
> [ 43.433395] print_address_description+0x30/0x2fc
> [ 43.438243] kasan_report+0x15c/0x19c
> [ 43.442070] __asan_report_load8_noabort+0x44/0x50
> hciconfig hci0 up
> [ 43.447009] mgmt_remove_adv_monitor_sync+0x40/0xcc [bluetooth
> 8dae3a82177133cfa9626e7322b3b0c8f665102d]
> [ 43.458568] hci_cmd_sync_work+0x1bc/0x2bc [bluetooth
> 8dae3a82177133cfa9626e7322b3b0c8f665102d]
> [ 43.467656] process_one_work+0x59c/0xa88
> [ 43.472530] worker_thread+0x81c/0xd18
> [ 43.476410] kthread+0x2d4/0x3d8
> localhost ~ # [ 43.479753] ret_from_fork+0x10/0x20
> [ 43.486588]
> [ 43.488156] Allocated by task 1118:
> [ 43.491751] kasan_save_stack+0x38/0x68
> [ 43.495709] __kasan_kmalloc+0xb4/0xd0
> [ 43.499577] kmem_cache_alloc_trace+0x29c/0x374
> [ 43.504238] mgmt_pending_new+0x74/0x200 [bluetooth]
> sleep 2[ 43.509509] mgmt_pending_add+0x28/0xec [bluetooth]
>
> [ 43.515244] remove_adv_monitor+0xf8/0x174 [bluetooth]
> [ 43.521533] hci_mgmt_cmd+0x5c4/0x8b0 [bluetooth]
> [ 43.526527] hci_sock_sendmsg+0x28c/0x95c [bluetooth]
> [ 43.531873] sock_sendmsg+0xb4/0xd8
> [ 43.535472] sock_write_iter+0x1c0/0x2d0
> [ 43.539519] do_iter_readv_writev+0x350/0x4e0
> [ 43.544012] do_iter_write+0xf0/0x2e4
> [ 43.547788] vfs_writev+0xd0/0x13c
> [ 43.551295] do_writev+0xe8/0x1fc
> [ 43.554710] __arm64_sys_writev+0x84/0x98
> [ 43.558838] invoke_syscall+0x78/0x20c
> [ 43.562709] el0_svc_common+0x12c/0x2f0
> [ 43.566654] do_el0_svc+0x94/0x13c
> [ 43.570155] el0_svc+0x5c/0x108
> [ 43.573391] el0t_64_sync_handler+0x78/0x108
> [ 43.577785] el0t_64_sync+0x1a4/0x1a8
> [ 43.581564]
> [ 43.583115] Freed by task 3217:
> [ 43.586356] kasan_save_stack+0x38/0x68
> [ 43.590314] kasan_set_track+0x28/0x3c
> [ 43.594180] kasan_set_free_info+0x28/0x4c
> [ 43.598396] ____kasan_slab_free+0x138/0x17c
> [ 43.602794] __kasan_slab_free+0x18/0x28
> [ 43.606838] slab_free_freelist_hook+0x188/0x260
> [ 43.611591] kfree+0x138/0x29c
> [ 43.614741] mgmt_pending_free+0xac/0xdc [bluetooth]
> [ 43.620003] mgmt_pending_remove+0xd8/0xf0 [bluetooth]
> [ 43.625434] cmd_complete_rsp+0xc8/0x178 [bluetooth]
> [ 43.630686] mgmt_pending_foreach+0xa8/0xf8 [bluetooth]
> [ 43.636198] __mgmt_power_off+0x114/0x26c [bluetooth]
> [ 43.641532] hci_dev_close_sync+0x2ec/0x7ec [bluetooth]
> [ 43.647049] hci_dev_do_close+0x3c/0x7c [bluetooth]
> [ 43.652209] hci_dev_close+0xac/0x164 [bluetooth]
> [ 43.657190] hci_sock_ioctl+0x298/0x444 [bluetooth]
> [ 43.662353] sock_do_ioctl+0xd0/0x1e8
> [ 43.666134] sock_ioctl+0x4fc/0x72c
> [ 43.669736] __arm64_sys_ioctl+0x118/0x154
> [ 43.673961] invoke_syscall+0x78/0x20c
> [ 43.677820] el0_svc_common+0x12c/0x2f0
> [ 43.681770] do_el0_svc+0x94/0x13c
> [ 43.685278] el0_svc+0x5c/0x108
> [ 43.688514] el0t_64_sync_handler+0x78/0x108
> [ 43.692913] el0t_64_sync+0x1a4/0x1a8
>
> Thanks,
> Zhengping

Ok, how about we do something like the following:

https://gist.github.com/Vudentz/365d664275e4d2e2af157e47f0502f50

The actual real culprit seem to be __mgmt_power_off does cleanup
mgmt_pending but that is still accessible via cmd_sync_work_list, this
is probably how hci_request was designed but in case of cmd_sync we
normally have the data as part of cmd_sync_work_list.

> On Mon, Feb 27, 2023 at 3:41 PM Luiz Augusto von Dentz
> <[email protected]> wrote:
> >
> > Hi Zhengping,
> >
> > On Sun, Feb 26, 2023 at 11:18 PM Zhengping Jiang <[email protected]> wrote:
> > >
> > > Hi Luiz,
> > >
> > > I have a question. Given that each command in the cmd_sync queue
> > > should clean up the memory in a callback function. I was wondering if
> > > the call to cmd_complete_rsp in __mgmt_power_off function is still
> > > necessary? Will this always risk a race condition that cmd has been
> > > released when the complete callback or _sync function is run?
> >
> > Not sure I follow you here, do you have a stack trace when the user
> > after free occurs?
> >
> > > Thanks,
> > > Zhengping
> > >
> > > On Fri, Feb 24, 2023 at 2:37 PM Zhengping Jiang <[email protected]> wrote:
> > > >
> > > > Hi Luiz,
> > > >
> > > > > Any particular reason why you are not using hci_cmd_sync_clear
> > > > > instead?
> > > >
> > > > That is a good question and we used hci_cmd_sync_clear in the first
> > > > version, but it will clear the queue and also close the timer. As a
> > > > result, when the adapter is turned on again, the timer will not
> > > > schedule any new jobs. So the option is to use hci_cmd_sync_clear and
> > > > re-initiate the queue or to write a new function which only clears the
> > > > queue.
> > > >
> > > > > We also may want to move the clearing logic to
> > > > > hci_dev_close_sync since it should be equivalent to
> > > > > hci_request_cancel_all.
> > > >
> > > > I actually have a question here. I saw
> > > > "drain_workqueue(hdev->workqueue)" in hci_dev_close_sync and thought
> > > > it should force clearing the cmd_sync queue. But it seems cannot
> > > > prevent the use-after-free situation.
> > > >
> > > > Any suggestions to improve the solution?
> > > >
> > > > Thanks,
> > > > Zhengping
> > > >
> > > >
> > > > On Fri, Feb 24, 2023 at 1:02 PM Luiz Augusto von Dentz
> > > > <[email protected]> wrote:
> > > > >
> > > > > Hi Zhengping,
> > > > >
> > > > > On Fri, Feb 24, 2023 at 11:53 AM Zhengping Jiang <[email protected]> wrote:
> > > > > >
> > > > > > Clear cmd_sync_work queue before clearing the mgmt cmd list to avoid
> > > > > > racing conditions which cause use-after-free.
> > > > > >
> > > > > > When powering off the adapter, the mgmt cmd list will be cleared. If a
> > > > > > work is queued in the cmd_sync_work queue at the same time, it will
> > > > > > cause the risk of use-after-free, as the cmd pointer is not checked
> > > > > > before use.
> > > > > >
> > > > > > Signed-off-by: Zhengping Jiang <[email protected]>
> > > > > > ---
> > > > > >
> > > > > > Changes in v2:
> > > > > > - Add function to clear the queue without stop the timer
> > > > > >
> > > > > > Changes in v1:
> > > > > > - Clear cmd_sync_work queue before clearing the mgmt cmd list
> > > > > >
> > > > > > net/bluetooth/hci_sync.c | 21 ++++++++++++++++++++-
> > > > > > 1 file changed, 20 insertions(+), 1 deletion(-)
> > > > > >
> > > > > > diff --git a/net/bluetooth/hci_sync.c b/net/bluetooth/hci_sync.c
> > > > > > index 117eedb6f709..b70365dfff0c 100644
> > > > > > --- a/net/bluetooth/hci_sync.c
> > > > > > +++ b/net/bluetooth/hci_sync.c
> > > > > > @@ -636,6 +636,23 @@ void hci_cmd_sync_init(struct hci_dev *hdev)
> > > > > > INIT_DELAYED_WORK(&hdev->adv_instance_expire, adv_timeout_expire);
> > > > > > }
> > > > > >
> > > > > > +static void hci_pend_cmd_sync_clear(struct hci_dev *hdev)
> > > > > > +{
> > > > > > + struct hci_cmd_sync_work_entry *entry, *tmp;
> > > > > > +
> > > > > > + mutex_lock(&hdev->cmd_sync_work_lock);
> > > > > > + list_for_each_entry_safe(entry, tmp, &hdev->cmd_sync_work_list, list) {
> > > > > > + if (entry->destroy) {
> > > > > > + hci_req_sync_lock(hdev);
> > > > > > + entry->destroy(hdev, entry->data, -ECANCELED);
> > > > > > + hci_req_sync_unlock(hdev);
> > > > > > + }
> > > > > > + list_del(&entry->list);
> > > > > > + kfree(entry);
> > > > > > + }
> > > > > > + mutex_unlock(&hdev->cmd_sync_work_lock);
> > > > > > +}
> > > > > > +
> > > > > > void hci_cmd_sync_clear(struct hci_dev *hdev)
> > > > > > {
> > > > > > struct hci_cmd_sync_work_entry *entry, *tmp;
> > > > > > @@ -4842,8 +4859,10 @@ int hci_dev_close_sync(struct hci_dev *hdev)
> > > > > >
> > > > > > if (!auto_off && hdev->dev_type == HCI_PRIMARY &&
> > > > > > !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) &&
> > > > > > - hci_dev_test_flag(hdev, HCI_MGMT))
> > > > > > + hci_dev_test_flag(hdev, HCI_MGMT)) {
> > > > > > + hci_pend_cmd_sync_clear(hdev);
> > > > >
> > > > > Any particular reason why you are not using hci_cmd_sync_clear
> > > > > instead? We also may want to move the clearing logic to
> > > > > hci_dev_close_sync since it should be equivalent to
> > > > > hci_request_cancel_all.
> > > > >
> > > > > > __mgmt_power_off(hdev);
> > > > > > + }
> > > > > >
> > > > > > hci_inquiry_cache_flush(hdev);
> > > > > > hci_pend_le_actions_clear(hdev);
> > > > > > --
> > > > > > 2.39.2.722.g9855ee24e9-goog
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Luiz Augusto von Dentz
> >
> >
> >
> > --
> > Luiz Augusto von Dentz



--
Luiz Augusto von Dentz

2023-03-01 00:18:58

by Zhengping Jiang

[permalink] [raw]
Subject: Re: [kernel PATCH v2 1/1] Bluetooth: hci_sync: clear workqueue before clear mgmt cmd

Hi Luiz,

This looks good to me. I still have a question. Does this prevent a
job scheduled between "hci_cmd_sync_work_list_clear(hdev);" and
"__mgmt_power_off(hdev);"? Otherwise, the chance for a race condition
is still there. Maybe using cancel_work_sync and re-init the workqueue
timer is the right thing to do?

Thanks,
Zhengping

On Tue, Feb 28, 2023 at 1:11 PM Luiz Augusto von Dentz
<[email protected]> wrote:
>
> Hi Zhengping,
>
> On Mon, Feb 27, 2023 at 3:58 PM Zhengping Jiang <[email protected]> wrote:
> >
> > Hi Luiz,
> >
> > Sure. Hope this helps.
> > Here is one log from the user.
> >
> > [ 53.368740] ==================================================================
> > [ 53.376167] BUG: KASAN: use-after-free in set_ssp_sync+0x44/0x154 [bluetooth]
> > [ 53.384303] Read of size 8 at addr ffffff80b7ee0318 by task kworker/u17:0/190
> > [ 53.396342] CPU: 7 PID: 190 Comm: kworker/u17:0 Tainted: G W
> > 5.15.59-lockdep #1 29eed131ef0afd42bc369a6a0ca1c69d3653699b
> > [ 53.408868] Hardware name: Qualcomm Technologies, Inc. sc7280 CRD
> > platform (rev5+) (DT)
> > [ 53.417095] Workqueue: hci0 hci_cmd_sync_work [bluetooth]
> > [ 53.422780] Call trace:
> > [ 53.425310] dump_backtrace+0x0/0x424
> > [ 53.429108] show_stack+0x20/0x2c
> >
> > [ 53.432534] dump_stack_lvl+0x84/0xb4
> > [ 53.436514] print_address_description+0x30/0x2fc
> > [ 53.441369] kasan_report+0x15c/0x19c
> > [ 53.445975] __asan_report_load8_noabort+0x44/0x50
> > [ 53.450910] set_ssp_sync+0x44/0x154 [bluetooth
> > 34f6fa2bbf49f3d7faf6ea04e8755ae16590a6b3]
> > [ 53.460136] hci_cmd_sync_work+0x1c8/0x2c8 [bluetooth
> > 34f6fa2bbf49f3d7faf6ea04e8755ae16590a6b3]
> > [ 53.472214] process_one_work+0x59c/0xa88
> > [ 53.476990] worker_thread+0x81c/0xd18
> > [ 53.480854] kthread+0x2d4/0x3d8
> > [ 53.484272] ret_from_fork+0x10/0x20
> >
> > [ 53.489733] Allocated by task 1162:
> > [ 53.493336] kasan_save_stack+0x38/0x68
> > [ 53.498115] __kasan_kmalloc+0xb4/0xd0
> > [ 53.501993] kmem_cache_alloc_trace+0x29c/0x374
> > [ 53.506661] mgmt_pending_new+0x74/0x200 [bluetooth]
> > [ 53.511905] mgmt_pending_add+0x28/0xec [bluetooth]
> > [ 53.517059] set_ssp+0x2d8/0x5b0 [bluetooth]
> > [ 53.521575] hci_mgmt_cmd+0x5c4/0x8b0 [bluetooth]
> > [ 53.526538] hci_sock_sendmsg+0x28c/0x95c [bluetooth]
> > [ 53.531850] sock_sendmsg+0xb4/0xd8
> > [ 53.535454] sock_write_iter+0x1c0/0x2d0
> > [ 53.539494] do_iter_readv_writev+0x350/0x4e0
> > [ 53.543980] do_iter_write+0xf0/0x2e4
> > [ 53.547747] vfs_writev+0xd0/0x13c
> > [ 53.551254] do_writev+0xe8/0x1fc
> > [ 53.554672] __arm64_sys_writev+0x84/0x98
> > [ 53.558805] invoke_syscall+0x78/0x20c
> > [ 53.562665] el0_svc_common+0x12c/0x2f0
> > [ 53.566618] do_el0_svc+0x94/0x13c
> > [ 53.570125] el0_svc+0x5c/0x108
> > [ 53.573374] el0t_64_sync_handler+0x78/0x108
> > [ 53.577773] el0t_64_sync+0x1a4/0x1a8
> >
> > [ 53.583089] Freed by task 3207:
> > [ 53.586325] kasan_save_stack+0x38/0x68
> > [ 53.590282] kasan_set_track+0x28/0x3c
> > [ 53.594153] kasan_set_free_info+0x28/0x4c
> > [ 53.598369] ____kasan_slab_free+0x138/0x17c
> > [ 53.602767] __kasan_slab_free+0x18/0x28
> > [ 53.606803] slab_free_freelist_hook+0x188/0x260
> > [ 53.611559] kfree+0x138/0x29c
> > [ 53.614708] mgmt_pending_free+0xac/0xdc [bluetooth]
> > [ 53.619948] mgmt_pending_remove+0xd8/0xf0 [bluetooth]
> > [ 53.625357] cmd_complete_rsp+0xc8/0x178 [bluetooth]
> > [ 53.630586] mgmt_pending_foreach+0xa8/0xf8 [bluetooth]
> > [ 53.636076] __mgmt_power_off+0x114/0x26c [bluetooth]
> > [ 53.641390] hci_dev_close_sync+0x314/0x814 [bluetooth]
> > [ 53.646882] hci_dev_do_close+0x3c/0x7c [bluetooth]
> > [ 53.652017] hci_dev_close+0xa4/0x15c [bluetooth]
> > [ 53.656980] hci_sock_ioctl+0x298/0x444 [bluetooth]
> > [ 53.662117] sock_do_ioctl+0xd0/0x1e8
> > [ 53.665900] sock_ioctl+0x4fc/0x72c
> > [ 53.669500] __arm64_sys_ioctl+0x118/0x154
> > [ 53.673726] invoke_syscall+0x78/0x20c
> > [ 53.677587] el0_svc_common+0x12c/0x2f0
> > [ 53.681533] do_el0_svc+0x94/0x13c
> > [ 53.685043] el0_svc+0x5c/0x108
> > [ 53.688278] el0t_64_sync_handler+0x78/0x108
> > [ 53.692677] el0t_64_sync+0x1a4/0x1a8
> >
> > [ 53.697988] Last potentially related work creation:
> > [ 53.703009] kasan_save_stack+0x38/0x68
> > [ 53.706962] kasan_record_aux_stack+0x104/0x130
> > [ 53.711622] __call_rcu+0x14c/0x860
> > [ 53.715212] call_rcu+0x18/0x24
> > [ 53.718448] sk_filter_uncharge+0xc0/0x120
> > [ 53.722667] __sk_destruct+0xb4/0x4a8
> > [ 53.726435] sk_destruct+0x78/0xa0
> > [ 53.729941] __sk_free+0x190/0x270
> > [ 53.733453] sk_free+0x54/0x8c
> > [ 53.736603] deferred_put_nlk_sk+0x1d4/0x20c
> > [ 53.741000] rcu_do_batch+0x3e8/0xd08
> > [ 53.744772] nocb_cb_wait+0xc8/0xa3c
> > [ 53.748453] rcu_nocb_cb_kthread+0x48/0x134
> > [ 53.752768] kthread+0x2d4/0x3d8
> > [ 53.756098] ret_from_fork+0x10/0x20
> >
> > This is another one at a different function but with the same signature.
> >
> > [ 43.363512] ==================================================================
> > [ 43.370966] BUG: KASAN: use-after-free in
> > mgmt_remove_adv_monitor_sync+0x40/0xcc [bluetooth]
> > [ 43.379813] Read of size 8 at addr ffffff8096c28d18 by task kworker/u17:0/192
> > [ 43.387158]
> > [ 43.388705] CPU: 6 PID: 192 Comm: kworker/u17:0 Tainted: G W
> > 5.15.59-lockdep #1 59f35e3dfc07f6688b084869895c7a39892c891a
> > localhost ~ # [ 43.410184] Workqueue: hci0 hci_cmd_sync_work [bluetooth]
> >
> > [ 43.418887] Call trace:
> > [ 43.422407] dump_backtrace+0x0/0x424
> > [ 43.426191] show_stack+0x20/0x2c
> > [ 43.429608] dump_stack_lvl+0x84/0xb4
> > [ 43.433395] print_address_description+0x30/0x2fc
> > [ 43.438243] kasan_report+0x15c/0x19c
> > [ 43.442070] __asan_report_load8_noabort+0x44/0x50
> > hciconfig hci0 up
> > [ 43.447009] mgmt_remove_adv_monitor_sync+0x40/0xcc [bluetooth
> > 8dae3a82177133cfa9626e7322b3b0c8f665102d]
> > [ 43.458568] hci_cmd_sync_work+0x1bc/0x2bc [bluetooth
> > 8dae3a82177133cfa9626e7322b3b0c8f665102d]
> > [ 43.467656] process_one_work+0x59c/0xa88
> > [ 43.472530] worker_thread+0x81c/0xd18
> > [ 43.476410] kthread+0x2d4/0x3d8
> > localhost ~ # [ 43.479753] ret_from_fork+0x10/0x20
> > [ 43.486588]
> > [ 43.488156] Allocated by task 1118:
> > [ 43.491751] kasan_save_stack+0x38/0x68
> > [ 43.495709] __kasan_kmalloc+0xb4/0xd0
> > [ 43.499577] kmem_cache_alloc_trace+0x29c/0x374
> > [ 43.504238] mgmt_pending_new+0x74/0x200 [bluetooth]
> > sleep 2[ 43.509509] mgmt_pending_add+0x28/0xec [bluetooth]
> >
> > [ 43.515244] remove_adv_monitor+0xf8/0x174 [bluetooth]
> > [ 43.521533] hci_mgmt_cmd+0x5c4/0x8b0 [bluetooth]
> > [ 43.526527] hci_sock_sendmsg+0x28c/0x95c [bluetooth]
> > [ 43.531873] sock_sendmsg+0xb4/0xd8
> > [ 43.535472] sock_write_iter+0x1c0/0x2d0
> > [ 43.539519] do_iter_readv_writev+0x350/0x4e0
> > [ 43.544012] do_iter_write+0xf0/0x2e4
> > [ 43.547788] vfs_writev+0xd0/0x13c
> > [ 43.551295] do_writev+0xe8/0x1fc
> > [ 43.554710] __arm64_sys_writev+0x84/0x98
> > [ 43.558838] invoke_syscall+0x78/0x20c
> > [ 43.562709] el0_svc_common+0x12c/0x2f0
> > [ 43.566654] do_el0_svc+0x94/0x13c
> > [ 43.570155] el0_svc+0x5c/0x108
> > [ 43.573391] el0t_64_sync_handler+0x78/0x108
> > [ 43.577785] el0t_64_sync+0x1a4/0x1a8
> > [ 43.581564]
> > [ 43.583115] Freed by task 3217:
> > [ 43.586356] kasan_save_stack+0x38/0x68
> > [ 43.590314] kasan_set_track+0x28/0x3c
> > [ 43.594180] kasan_set_free_info+0x28/0x4c
> > [ 43.598396] ____kasan_slab_free+0x138/0x17c
> > [ 43.602794] __kasan_slab_free+0x18/0x28
> > [ 43.606838] slab_free_freelist_hook+0x188/0x260
> > [ 43.611591] kfree+0x138/0x29c
> > [ 43.614741] mgmt_pending_free+0xac/0xdc [bluetooth]
> > [ 43.620003] mgmt_pending_remove+0xd8/0xf0 [bluetooth]
> > [ 43.625434] cmd_complete_rsp+0xc8/0x178 [bluetooth]
> > [ 43.630686] mgmt_pending_foreach+0xa8/0xf8 [bluetooth]
> > [ 43.636198] __mgmt_power_off+0x114/0x26c [bluetooth]
> > [ 43.641532] hci_dev_close_sync+0x2ec/0x7ec [bluetooth]
> > [ 43.647049] hci_dev_do_close+0x3c/0x7c [bluetooth]
> > [ 43.652209] hci_dev_close+0xac/0x164 [bluetooth]
> > [ 43.657190] hci_sock_ioctl+0x298/0x444 [bluetooth]
> > [ 43.662353] sock_do_ioctl+0xd0/0x1e8
> > [ 43.666134] sock_ioctl+0x4fc/0x72c
> > [ 43.669736] __arm64_sys_ioctl+0x118/0x154
> > [ 43.673961] invoke_syscall+0x78/0x20c
> > [ 43.677820] el0_svc_common+0x12c/0x2f0
> > [ 43.681770] do_el0_svc+0x94/0x13c
> > [ 43.685278] el0_svc+0x5c/0x108
> > [ 43.688514] el0t_64_sync_handler+0x78/0x108
> > [ 43.692913] el0t_64_sync+0x1a4/0x1a8
> >
> > Thanks,
> > Zhengping
>
> Ok, how about we do something like the following:
>
> https://gist.github.com/Vudentz/365d664275e4d2e2af157e47f0502f50
>
> The actual real culprit seem to be __mgmt_power_off does cleanup
> mgmt_pending but that is still accessible via cmd_sync_work_list, this
> is probably how hci_request was designed but in case of cmd_sync we
> normally have the data as part of cmd_sync_work_list.
>
> > On Mon, Feb 27, 2023 at 3:41 PM Luiz Augusto von Dentz
> > <[email protected]> wrote:
> > >
> > > Hi Zhengping,
> > >
> > > On Sun, Feb 26, 2023 at 11:18 PM Zhengping Jiang <[email protected]> wrote:
> > > >
> > > > Hi Luiz,
> > > >
> > > > I have a question. Given that each command in the cmd_sync queue
> > > > should clean up the memory in a callback function. I was wondering if
> > > > the call to cmd_complete_rsp in __mgmt_power_off function is still
> > > > necessary? Will this always risk a race condition that cmd has been
> > > > released when the complete callback or _sync function is run?
> > >
> > > Not sure I follow you here, do you have a stack trace when the user
> > > after free occurs?
> > >
> > > > Thanks,
> > > > Zhengping
> > > >
> > > > On Fri, Feb 24, 2023 at 2:37 PM Zhengping Jiang <[email protected]> wrote:
> > > > >
> > > > > Hi Luiz,
> > > > >
> > > > > > Any particular reason why you are not using hci_cmd_sync_clear
> > > > > > instead?
> > > > >
> > > > > That is a good question and we used hci_cmd_sync_clear in the first
> > > > > version, but it will clear the queue and also close the timer. As a
> > > > > result, when the adapter is turned on again, the timer will not
> > > > > schedule any new jobs. So the option is to use hci_cmd_sync_clear and
> > > > > re-initiate the queue or to write a new function which only clears the
> > > > > queue.
> > > > >
> > > > > > We also may want to move the clearing logic to
> > > > > > hci_dev_close_sync since it should be equivalent to
> > > > > > hci_request_cancel_all.
> > > > >
> > > > > I actually have a question here. I saw
> > > > > "drain_workqueue(hdev->workqueue)" in hci_dev_close_sync and thought
> > > > > it should force clearing the cmd_sync queue. But it seems cannot
> > > > > prevent the use-after-free situation.
> > > > >
> > > > > Any suggestions to improve the solution?
> > > > >
> > > > > Thanks,
> > > > > Zhengping
> > > > >
> > > > >
> > > > > On Fri, Feb 24, 2023 at 1:02 PM Luiz Augusto von Dentz
> > > > > <[email protected]> wrote:
> > > > > >
> > > > > > Hi Zhengping,
> > > > > >
> > > > > > On Fri, Feb 24, 2023 at 11:53 AM Zhengping Jiang <[email protected]> wrote:
> > > > > > >
> > > > > > > Clear cmd_sync_work queue before clearing the mgmt cmd list to avoid
> > > > > > > racing conditions which cause use-after-free.
> > > > > > >
> > > > > > > When powering off the adapter, the mgmt cmd list will be cleared. If a
> > > > > > > work is queued in the cmd_sync_work queue at the same time, it will
> > > > > > > cause the risk of use-after-free, as the cmd pointer is not checked
> > > > > > > before use.
> > > > > > >
> > > > > > > Signed-off-by: Zhengping Jiang <[email protected]>
> > > > > > > ---
> > > > > > >
> > > > > > > Changes in v2:
> > > > > > > - Add function to clear the queue without stop the timer
> > > > > > >
> > > > > > > Changes in v1:
> > > > > > > - Clear cmd_sync_work queue before clearing the mgmt cmd list
> > > > > > >
> > > > > > > net/bluetooth/hci_sync.c | 21 ++++++++++++++++++++-
> > > > > > > 1 file changed, 20 insertions(+), 1 deletion(-)
> > > > > > >
> > > > > > > diff --git a/net/bluetooth/hci_sync.c b/net/bluetooth/hci_sync.c
> > > > > > > index 117eedb6f709..b70365dfff0c 100644
> > > > > > > --- a/net/bluetooth/hci_sync.c
> > > > > > > +++ b/net/bluetooth/hci_sync.c
> > > > > > > @@ -636,6 +636,23 @@ void hci_cmd_sync_init(struct hci_dev *hdev)
> > > > > > > INIT_DELAYED_WORK(&hdev->adv_instance_expire, adv_timeout_expire);
> > > > > > > }
> > > > > > >
> > > > > > > +static void hci_pend_cmd_sync_clear(struct hci_dev *hdev)
> > > > > > > +{
> > > > > > > + struct hci_cmd_sync_work_entry *entry, *tmp;
> > > > > > > +
> > > > > > > + mutex_lock(&hdev->cmd_sync_work_lock);
> > > > > > > + list_for_each_entry_safe(entry, tmp, &hdev->cmd_sync_work_list, list) {
> > > > > > > + if (entry->destroy) {
> > > > > > > + hci_req_sync_lock(hdev);
> > > > > > > + entry->destroy(hdev, entry->data, -ECANCELED);
> > > > > > > + hci_req_sync_unlock(hdev);
> > > > > > > + }
> > > > > > > + list_del(&entry->list);
> > > > > > > + kfree(entry);
> > > > > > > + }
> > > > > > > + mutex_unlock(&hdev->cmd_sync_work_lock);
> > > > > > > +}
> > > > > > > +
> > > > > > > void hci_cmd_sync_clear(struct hci_dev *hdev)
> > > > > > > {
> > > > > > > struct hci_cmd_sync_work_entry *entry, *tmp;
> > > > > > > @@ -4842,8 +4859,10 @@ int hci_dev_close_sync(struct hci_dev *hdev)
> > > > > > >
> > > > > > > if (!auto_off && hdev->dev_type == HCI_PRIMARY &&
> > > > > > > !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) &&
> > > > > > > - hci_dev_test_flag(hdev, HCI_MGMT))
> > > > > > > + hci_dev_test_flag(hdev, HCI_MGMT)) {
> > > > > > > + hci_pend_cmd_sync_clear(hdev);
> > > > > >
> > > > > > Any particular reason why you are not using hci_cmd_sync_clear
> > > > > > instead? We also may want to move the clearing logic to
> > > > > > hci_dev_close_sync since it should be equivalent to
> > > > > > hci_request_cancel_all.
> > > > > >
> > > > > > > __mgmt_power_off(hdev);
> > > > > > > + }
> > > > > > >
> > > > > > > hci_inquiry_cache_flush(hdev);
> > > > > > > hci_pend_le_actions_clear(hdev);
> > > > > > > --
> > > > > > > 2.39.2.722.g9855ee24e9-goog
> > > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Luiz Augusto von Dentz
> > >
> > >
> > >
> > > --
> > > Luiz Augusto von Dentz
>
>
>
> --
> Luiz Augusto von Dentz

2023-03-01 01:54:45

by Luiz Augusto von Dentz

[permalink] [raw]
Subject: Re: [kernel PATCH v2 1/1] Bluetooth: hci_sync: clear workqueue before clear mgmt cmd

Hi Zhengping,

On Tue, Feb 28, 2023 at 4:18 PM Zhengping Jiang <[email protected]> wrote:
>
> Hi Luiz,
>
> This looks good to me. I still have a question. Does this prevent a
> job scheduled between "hci_cmd_sync_work_list_clear(hdev);" and
> "__mgmt_power_off(hdev);"? Otherwise, the chance for a race condition
> is still there. Maybe using cancel_work_sync and re-init the workqueue
> timer is the right thing to do?

I tried the cancel_work_sync but it doesn't work since to
cmd_sync_work itself can call hci_dev_close_sync so it deadlocks, Ive
also tried stopping new scheduling of new work based on HCI_UP flag
but that causes some tests not to run, perhaps we need a dedicated
flag to ensure cmd_sync cannot be schedule after a certain point but I
could found the exact point it is, anyway I fine leaving this to when
we actually have a more clear understanding or a reproducer.

> Thanks,
> Zhengping
>
> On Tue, Feb 28, 2023 at 1:11 PM Luiz Augusto von Dentz
> <[email protected]> wrote:
> >
> > Hi Zhengping,
> >
> > On Mon, Feb 27, 2023 at 3:58 PM Zhengping Jiang <[email protected]> wrote:
> > >
> > > Hi Luiz,
> > >
> > > Sure. Hope this helps.
> > > Here is one log from the user.
> > >
> > > [ 53.368740] ==================================================================
> > > [ 53.376167] BUG: KASAN: use-after-free in set_ssp_sync+0x44/0x154 [bluetooth]
> > > [ 53.384303] Read of size 8 at addr ffffff80b7ee0318 by task kworker/u17:0/190
> > > [ 53.396342] CPU: 7 PID: 190 Comm: kworker/u17:0 Tainted: G W
> > > 5.15.59-lockdep #1 29eed131ef0afd42bc369a6a0ca1c69d3653699b
> > > [ 53.408868] Hardware name: Qualcomm Technologies, Inc. sc7280 CRD
> > > platform (rev5+) (DT)
> > > [ 53.417095] Workqueue: hci0 hci_cmd_sync_work [bluetooth]
> > > [ 53.422780] Call trace:
> > > [ 53.425310] dump_backtrace+0x0/0x424
> > > [ 53.429108] show_stack+0x20/0x2c
> > >
> > > [ 53.432534] dump_stack_lvl+0x84/0xb4
> > > [ 53.436514] print_address_description+0x30/0x2fc
> > > [ 53.441369] kasan_report+0x15c/0x19c
> > > [ 53.445975] __asan_report_load8_noabort+0x44/0x50
> > > [ 53.450910] set_ssp_sync+0x44/0x154 [bluetooth
> > > 34f6fa2bbf49f3d7faf6ea04e8755ae16590a6b3]
> > > [ 53.460136] hci_cmd_sync_work+0x1c8/0x2c8 [bluetooth
> > > 34f6fa2bbf49f3d7faf6ea04e8755ae16590a6b3]
> > > [ 53.472214] process_one_work+0x59c/0xa88
> > > [ 53.476990] worker_thread+0x81c/0xd18
> > > [ 53.480854] kthread+0x2d4/0x3d8
> > > [ 53.484272] ret_from_fork+0x10/0x20
> > >
> > > [ 53.489733] Allocated by task 1162:
> > > [ 53.493336] kasan_save_stack+0x38/0x68
> > > [ 53.498115] __kasan_kmalloc+0xb4/0xd0
> > > [ 53.501993] kmem_cache_alloc_trace+0x29c/0x374
> > > [ 53.506661] mgmt_pending_new+0x74/0x200 [bluetooth]
> > > [ 53.511905] mgmt_pending_add+0x28/0xec [bluetooth]
> > > [ 53.517059] set_ssp+0x2d8/0x5b0 [bluetooth]
> > > [ 53.521575] hci_mgmt_cmd+0x5c4/0x8b0 [bluetooth]
> > > [ 53.526538] hci_sock_sendmsg+0x28c/0x95c [bluetooth]
> > > [ 53.531850] sock_sendmsg+0xb4/0xd8
> > > [ 53.535454] sock_write_iter+0x1c0/0x2d0
> > > [ 53.539494] do_iter_readv_writev+0x350/0x4e0
> > > [ 53.543980] do_iter_write+0xf0/0x2e4
> > > [ 53.547747] vfs_writev+0xd0/0x13c
> > > [ 53.551254] do_writev+0xe8/0x1fc
> > > [ 53.554672] __arm64_sys_writev+0x84/0x98
> > > [ 53.558805] invoke_syscall+0x78/0x20c
> > > [ 53.562665] el0_svc_common+0x12c/0x2f0
> > > [ 53.566618] do_el0_svc+0x94/0x13c
> > > [ 53.570125] el0_svc+0x5c/0x108
> > > [ 53.573374] el0t_64_sync_handler+0x78/0x108
> > > [ 53.577773] el0t_64_sync+0x1a4/0x1a8
> > >
> > > [ 53.583089] Freed by task 3207:
> > > [ 53.586325] kasan_save_stack+0x38/0x68
> > > [ 53.590282] kasan_set_track+0x28/0x3c
> > > [ 53.594153] kasan_set_free_info+0x28/0x4c
> > > [ 53.598369] ____kasan_slab_free+0x138/0x17c
> > > [ 53.602767] __kasan_slab_free+0x18/0x28
> > > [ 53.606803] slab_free_freelist_hook+0x188/0x260
> > > [ 53.611559] kfree+0x138/0x29c
> > > [ 53.614708] mgmt_pending_free+0xac/0xdc [bluetooth]
> > > [ 53.619948] mgmt_pending_remove+0xd8/0xf0 [bluetooth]
> > > [ 53.625357] cmd_complete_rsp+0xc8/0x178 [bluetooth]
> > > [ 53.630586] mgmt_pending_foreach+0xa8/0xf8 [bluetooth]
> > > [ 53.636076] __mgmt_power_off+0x114/0x26c [bluetooth]
> > > [ 53.641390] hci_dev_close_sync+0x314/0x814 [bluetooth]
> > > [ 53.646882] hci_dev_do_close+0x3c/0x7c [bluetooth]
> > > [ 53.652017] hci_dev_close+0xa4/0x15c [bluetooth]
> > > [ 53.656980] hci_sock_ioctl+0x298/0x444 [bluetooth]
> > > [ 53.662117] sock_do_ioctl+0xd0/0x1e8
> > > [ 53.665900] sock_ioctl+0x4fc/0x72c
> > > [ 53.669500] __arm64_sys_ioctl+0x118/0x154
> > > [ 53.673726] invoke_syscall+0x78/0x20c
> > > [ 53.677587] el0_svc_common+0x12c/0x2f0
> > > [ 53.681533] do_el0_svc+0x94/0x13c
> > > [ 53.685043] el0_svc+0x5c/0x108
> > > [ 53.688278] el0t_64_sync_handler+0x78/0x108
> > > [ 53.692677] el0t_64_sync+0x1a4/0x1a8
> > >
> > > [ 53.697988] Last potentially related work creation:
> > > [ 53.703009] kasan_save_stack+0x38/0x68
> > > [ 53.706962] kasan_record_aux_stack+0x104/0x130
> > > [ 53.711622] __call_rcu+0x14c/0x860
> > > [ 53.715212] call_rcu+0x18/0x24
> > > [ 53.718448] sk_filter_uncharge+0xc0/0x120
> > > [ 53.722667] __sk_destruct+0xb4/0x4a8
> > > [ 53.726435] sk_destruct+0x78/0xa0
> > > [ 53.729941] __sk_free+0x190/0x270
> > > [ 53.733453] sk_free+0x54/0x8c
> > > [ 53.736603] deferred_put_nlk_sk+0x1d4/0x20c
> > > [ 53.741000] rcu_do_batch+0x3e8/0xd08
> > > [ 53.744772] nocb_cb_wait+0xc8/0xa3c
> > > [ 53.748453] rcu_nocb_cb_kthread+0x48/0x134
> > > [ 53.752768] kthread+0x2d4/0x3d8
> > > [ 53.756098] ret_from_fork+0x10/0x20
> > >
> > > This is another one at a different function but with the same signature.
> > >
> > > [ 43.363512] ==================================================================
> > > [ 43.370966] BUG: KASAN: use-after-free in
> > > mgmt_remove_adv_monitor_sync+0x40/0xcc [bluetooth]
> > > [ 43.379813] Read of size 8 at addr ffffff8096c28d18 by task kworker/u17:0/192
> > > [ 43.387158]
> > > [ 43.388705] CPU: 6 PID: 192 Comm: kworker/u17:0 Tainted: G W
> > > 5.15.59-lockdep #1 59f35e3dfc07f6688b084869895c7a39892c891a
> > > localhost ~ # [ 43.410184] Workqueue: hci0 hci_cmd_sync_work [bluetooth]
> > >
> > > [ 43.418887] Call trace:
> > > [ 43.422407] dump_backtrace+0x0/0x424
> > > [ 43.426191] show_stack+0x20/0x2c
> > > [ 43.429608] dump_stack_lvl+0x84/0xb4
> > > [ 43.433395] print_address_description+0x30/0x2fc
> > > [ 43.438243] kasan_report+0x15c/0x19c
> > > [ 43.442070] __asan_report_load8_noabort+0x44/0x50
> > > hciconfig hci0 up
> > > [ 43.447009] mgmt_remove_adv_monitor_sync+0x40/0xcc [bluetooth
> > > 8dae3a82177133cfa9626e7322b3b0c8f665102d]
> > > [ 43.458568] hci_cmd_sync_work+0x1bc/0x2bc [bluetooth
> > > 8dae3a82177133cfa9626e7322b3b0c8f665102d]
> > > [ 43.467656] process_one_work+0x59c/0xa88
> > > [ 43.472530] worker_thread+0x81c/0xd18
> > > [ 43.476410] kthread+0x2d4/0x3d8
> > > localhost ~ # [ 43.479753] ret_from_fork+0x10/0x20
> > > [ 43.486588]
> > > [ 43.488156] Allocated by task 1118:
> > > [ 43.491751] kasan_save_stack+0x38/0x68
> > > [ 43.495709] __kasan_kmalloc+0xb4/0xd0
> > > [ 43.499577] kmem_cache_alloc_trace+0x29c/0x374
> > > [ 43.504238] mgmt_pending_new+0x74/0x200 [bluetooth]
> > > sleep 2[ 43.509509] mgmt_pending_add+0x28/0xec [bluetooth]
> > >
> > > [ 43.515244] remove_adv_monitor+0xf8/0x174 [bluetooth]
> > > [ 43.521533] hci_mgmt_cmd+0x5c4/0x8b0 [bluetooth]
> > > [ 43.526527] hci_sock_sendmsg+0x28c/0x95c [bluetooth]
> > > [ 43.531873] sock_sendmsg+0xb4/0xd8
> > > [ 43.535472] sock_write_iter+0x1c0/0x2d0
> > > [ 43.539519] do_iter_readv_writev+0x350/0x4e0
> > > [ 43.544012] do_iter_write+0xf0/0x2e4
> > > [ 43.547788] vfs_writev+0xd0/0x13c
> > > [ 43.551295] do_writev+0xe8/0x1fc
> > > [ 43.554710] __arm64_sys_writev+0x84/0x98
> > > [ 43.558838] invoke_syscall+0x78/0x20c
> > > [ 43.562709] el0_svc_common+0x12c/0x2f0
> > > [ 43.566654] do_el0_svc+0x94/0x13c
> > > [ 43.570155] el0_svc+0x5c/0x108
> > > [ 43.573391] el0t_64_sync_handler+0x78/0x108
> > > [ 43.577785] el0t_64_sync+0x1a4/0x1a8
> > > [ 43.581564]
> > > [ 43.583115] Freed by task 3217:
> > > [ 43.586356] kasan_save_stack+0x38/0x68
> > > [ 43.590314] kasan_set_track+0x28/0x3c
> > > [ 43.594180] kasan_set_free_info+0x28/0x4c
> > > [ 43.598396] ____kasan_slab_free+0x138/0x17c
> > > [ 43.602794] __kasan_slab_free+0x18/0x28
> > > [ 43.606838] slab_free_freelist_hook+0x188/0x260
> > > [ 43.611591] kfree+0x138/0x29c
> > > [ 43.614741] mgmt_pending_free+0xac/0xdc [bluetooth]
> > > [ 43.620003] mgmt_pending_remove+0xd8/0xf0 [bluetooth]
> > > [ 43.625434] cmd_complete_rsp+0xc8/0x178 [bluetooth]
> > > [ 43.630686] mgmt_pending_foreach+0xa8/0xf8 [bluetooth]
> > > [ 43.636198] __mgmt_power_off+0x114/0x26c [bluetooth]
> > > [ 43.641532] hci_dev_close_sync+0x2ec/0x7ec [bluetooth]
> > > [ 43.647049] hci_dev_do_close+0x3c/0x7c [bluetooth]
> > > [ 43.652209] hci_dev_close+0xac/0x164 [bluetooth]
> > > [ 43.657190] hci_sock_ioctl+0x298/0x444 [bluetooth]
> > > [ 43.662353] sock_do_ioctl+0xd0/0x1e8
> > > [ 43.666134] sock_ioctl+0x4fc/0x72c
> > > [ 43.669736] __arm64_sys_ioctl+0x118/0x154
> > > [ 43.673961] invoke_syscall+0x78/0x20c
> > > [ 43.677820] el0_svc_common+0x12c/0x2f0
> > > [ 43.681770] do_el0_svc+0x94/0x13c
> > > [ 43.685278] el0_svc+0x5c/0x108
> > > [ 43.688514] el0t_64_sync_handler+0x78/0x108
> > > [ 43.692913] el0t_64_sync+0x1a4/0x1a8
> > >
> > > Thanks,
> > > Zhengping
> >
> > Ok, how about we do something like the following:
> >
> > https://gist.github.com/Vudentz/365d664275e4d2e2af157e47f0502f50
> >
> > The actual real culprit seem to be __mgmt_power_off does cleanup
> > mgmt_pending but that is still accessible via cmd_sync_work_list, this
> > is probably how hci_request was designed but in case of cmd_sync we
> > normally have the data as part of cmd_sync_work_list.
> >
> > > On Mon, Feb 27, 2023 at 3:41 PM Luiz Augusto von Dentz
> > > <[email protected]> wrote:
> > > >
> > > > Hi Zhengping,
> > > >
> > > > On Sun, Feb 26, 2023 at 11:18 PM Zhengping Jiang <[email protected]> wrote:
> > > > >
> > > > > Hi Luiz,
> > > > >
> > > > > I have a question. Given that each command in the cmd_sync queue
> > > > > should clean up the memory in a callback function. I was wondering if
> > > > > the call to cmd_complete_rsp in __mgmt_power_off function is still
> > > > > necessary? Will this always risk a race condition that cmd has been
> > > > > released when the complete callback or _sync function is run?
> > > >
> > > > Not sure I follow you here, do you have a stack trace when the user
> > > > after free occurs?
> > > >
> > > > > Thanks,
> > > > > Zhengping
> > > > >
> > > > > On Fri, Feb 24, 2023 at 2:37 PM Zhengping Jiang <[email protected]> wrote:
> > > > > >
> > > > > > Hi Luiz,
> > > > > >
> > > > > > > Any particular reason why you are not using hci_cmd_sync_clear
> > > > > > > instead?
> > > > > >
> > > > > > That is a good question and we used hci_cmd_sync_clear in the first
> > > > > > version, but it will clear the queue and also close the timer. As a
> > > > > > result, when the adapter is turned on again, the timer will not
> > > > > > schedule any new jobs. So the option is to use hci_cmd_sync_clear and
> > > > > > re-initiate the queue or to write a new function which only clears the
> > > > > > queue.
> > > > > >
> > > > > > > We also may want to move the clearing logic to
> > > > > > > hci_dev_close_sync since it should be equivalent to
> > > > > > > hci_request_cancel_all.
> > > > > >
> > > > > > I actually have a question here. I saw
> > > > > > "drain_workqueue(hdev->workqueue)" in hci_dev_close_sync and thought
> > > > > > it should force clearing the cmd_sync queue. But it seems cannot
> > > > > > prevent the use-after-free situation.
> > > > > >
> > > > > > Any suggestions to improve the solution?
> > > > > >
> > > > > > Thanks,
> > > > > > Zhengping
> > > > > >
> > > > > >
> > > > > > On Fri, Feb 24, 2023 at 1:02 PM Luiz Augusto von Dentz
> > > > > > <[email protected]> wrote:
> > > > > > >
> > > > > > > Hi Zhengping,
> > > > > > >
> > > > > > > On Fri, Feb 24, 2023 at 11:53 AM Zhengping Jiang <[email protected]> wrote:
> > > > > > > >
> > > > > > > > Clear cmd_sync_work queue before clearing the mgmt cmd list to avoid
> > > > > > > > racing conditions which cause use-after-free.
> > > > > > > >
> > > > > > > > When powering off the adapter, the mgmt cmd list will be cleared. If a
> > > > > > > > work is queued in the cmd_sync_work queue at the same time, it will
> > > > > > > > cause the risk of use-after-free, as the cmd pointer is not checked
> > > > > > > > before use.
> > > > > > > >
> > > > > > > > Signed-off-by: Zhengping Jiang <[email protected]>
> > > > > > > > ---
> > > > > > > >
> > > > > > > > Changes in v2:
> > > > > > > > - Add function to clear the queue without stop the timer
> > > > > > > >
> > > > > > > > Changes in v1:
> > > > > > > > - Clear cmd_sync_work queue before clearing the mgmt cmd list
> > > > > > > >
> > > > > > > > net/bluetooth/hci_sync.c | 21 ++++++++++++++++++++-
> > > > > > > > 1 file changed, 20 insertions(+), 1 deletion(-)
> > > > > > > >
> > > > > > > > diff --git a/net/bluetooth/hci_sync.c b/net/bluetooth/hci_sync.c
> > > > > > > > index 117eedb6f709..b70365dfff0c 100644
> > > > > > > > --- a/net/bluetooth/hci_sync.c
> > > > > > > > +++ b/net/bluetooth/hci_sync.c
> > > > > > > > @@ -636,6 +636,23 @@ void hci_cmd_sync_init(struct hci_dev *hdev)
> > > > > > > > INIT_DELAYED_WORK(&hdev->adv_instance_expire, adv_timeout_expire);
> > > > > > > > }
> > > > > > > >
> > > > > > > > +static void hci_pend_cmd_sync_clear(struct hci_dev *hdev)
> > > > > > > > +{
> > > > > > > > + struct hci_cmd_sync_work_entry *entry, *tmp;
> > > > > > > > +
> > > > > > > > + mutex_lock(&hdev->cmd_sync_work_lock);
> > > > > > > > + list_for_each_entry_safe(entry, tmp, &hdev->cmd_sync_work_list, list) {
> > > > > > > > + if (entry->destroy) {
> > > > > > > > + hci_req_sync_lock(hdev);
> > > > > > > > + entry->destroy(hdev, entry->data, -ECANCELED);
> > > > > > > > + hci_req_sync_unlock(hdev);
> > > > > > > > + }
> > > > > > > > + list_del(&entry->list);
> > > > > > > > + kfree(entry);
> > > > > > > > + }
> > > > > > > > + mutex_unlock(&hdev->cmd_sync_work_lock);
> > > > > > > > +}
> > > > > > > > +
> > > > > > > > void hci_cmd_sync_clear(struct hci_dev *hdev)
> > > > > > > > {
> > > > > > > > struct hci_cmd_sync_work_entry *entry, *tmp;
> > > > > > > > @@ -4842,8 +4859,10 @@ int hci_dev_close_sync(struct hci_dev *hdev)
> > > > > > > >
> > > > > > > > if (!auto_off && hdev->dev_type == HCI_PRIMARY &&
> > > > > > > > !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) &&
> > > > > > > > - hci_dev_test_flag(hdev, HCI_MGMT))
> > > > > > > > + hci_dev_test_flag(hdev, HCI_MGMT)) {
> > > > > > > > + hci_pend_cmd_sync_clear(hdev);
> > > > > > >
> > > > > > > Any particular reason why you are not using hci_cmd_sync_clear
> > > > > > > instead? We also may want to move the clearing logic to
> > > > > > > hci_dev_close_sync since it should be equivalent to
> > > > > > > hci_request_cancel_all.
> > > > > > >
> > > > > > > > __mgmt_power_off(hdev);
> > > > > > > > + }
> > > > > > > >
> > > > > > > > hci_inquiry_cache_flush(hdev);
> > > > > > > > hci_pend_le_actions_clear(hdev);
> > > > > > > > --
> > > > > > > > 2.39.2.722.g9855ee24e9-goog
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Luiz Augusto von Dentz
> > > >
> > > >
> > > >
> > > > --
> > > > Luiz Augusto von Dentz
> >
> >
> >
> > --
> > Luiz Augusto von Dentz



--
Luiz Augusto von Dentz

2023-03-01 02:12:18

by Zhengping Jiang

[permalink] [raw]
Subject: Re: [kernel PATCH v2 1/1] Bluetooth: hci_sync: clear workqueue before clear mgmt cmd

Hi Luiz,

Thanks for testing these options!

> perhaps we need a dedicated flag to ensure cmd_sync cannot be schedule after a certain point
This actually sounds promising to me. I would think about this.

This does not happen in regular use, but one of our customers has a
script to run a stress test by turning on/off the adapter and
rebooting for a few cycles. Then the crash can be reproduced. If you
have any new ideas, I can schedule a test.

Just to confirm if you will submit the current patch or you would hold
it for a solid solution? The current patch to clear the cmd_sync list
indeed reduces the crash frequency.

Best,
Zhengping

On Tue, Feb 28, 2023 at 5:53 PM Luiz Augusto von Dentz
<[email protected]> wrote:
>
> Hi Zhengping,
>
> On Tue, Feb 28, 2023 at 4:18 PM Zhengping Jiang <[email protected]> wrote:
> >
> > Hi Luiz,
> >
> > This looks good to me. I still have a question. Does this prevent a
> > job scheduled between "hci_cmd_sync_work_list_clear(hdev);" and
> > "__mgmt_power_off(hdev);"? Otherwise, the chance for a race condition
> > is still there. Maybe using cancel_work_sync and re-init the workqueue
> > timer is the right thing to do?
>
> I tried the cancel_work_sync but it doesn't work since to
> cmd_sync_work itself can call hci_dev_close_sync so it deadlocks, Ive
> also tried stopping new scheduling of new work based on HCI_UP flag
> but that causes some tests not to run, perhaps we need a dedicated
> flag to ensure cmd_sync cannot be schedule after a certain point but I
> could found the exact point it is, anyway I fine leaving this to when
> we actually have a more clear understanding or a reproducer.
>
> > Thanks,
> > Zhengping
> >
> > On Tue, Feb 28, 2023 at 1:11 PM Luiz Augusto von Dentz
> > <[email protected]> wrote:
> > >
> > > Hi Zhengping,
> > >
> > > On Mon, Feb 27, 2023 at 3:58 PM Zhengping Jiang <[email protected]> wrote:
> > > >
> > > > Hi Luiz,
> > > >
> > > > Sure. Hope this helps.
> > > > Here is one log from the user.
> > > >
> > > > [ 53.368740] ==================================================================
> > > > [ 53.376167] BUG: KASAN: use-after-free in set_ssp_sync+0x44/0x154 [bluetooth]
> > > > [ 53.384303] Read of size 8 at addr ffffff80b7ee0318 by task kworker/u17:0/190
> > > > [ 53.396342] CPU: 7 PID: 190 Comm: kworker/u17:0 Tainted: G W
> > > > 5.15.59-lockdep #1 29eed131ef0afd42bc369a6a0ca1c69d3653699b
> > > > [ 53.408868] Hardware name: Qualcomm Technologies, Inc. sc7280 CRD
> > > > platform (rev5+) (DT)
> > > > [ 53.417095] Workqueue: hci0 hci_cmd_sync_work [bluetooth]
> > > > [ 53.422780] Call trace:
> > > > [ 53.425310] dump_backtrace+0x0/0x424
> > > > [ 53.429108] show_stack+0x20/0x2c
> > > >
> > > > [ 53.432534] dump_stack_lvl+0x84/0xb4
> > > > [ 53.436514] print_address_description+0x30/0x2fc
> > > > [ 53.441369] kasan_report+0x15c/0x19c
> > > > [ 53.445975] __asan_report_load8_noabort+0x44/0x50
> > > > [ 53.450910] set_ssp_sync+0x44/0x154 [bluetooth
> > > > 34f6fa2bbf49f3d7faf6ea04e8755ae16590a6b3]
> > > > [ 53.460136] hci_cmd_sync_work+0x1c8/0x2c8 [bluetooth
> > > > 34f6fa2bbf49f3d7faf6ea04e8755ae16590a6b3]
> > > > [ 53.472214] process_one_work+0x59c/0xa88
> > > > [ 53.476990] worker_thread+0x81c/0xd18
> > > > [ 53.480854] kthread+0x2d4/0x3d8
> > > > [ 53.484272] ret_from_fork+0x10/0x20
> > > >
> > > > [ 53.489733] Allocated by task 1162:
> > > > [ 53.493336] kasan_save_stack+0x38/0x68
> > > > [ 53.498115] __kasan_kmalloc+0xb4/0xd0
> > > > [ 53.501993] kmem_cache_alloc_trace+0x29c/0x374
> > > > [ 53.506661] mgmt_pending_new+0x74/0x200 [bluetooth]
> > > > [ 53.511905] mgmt_pending_add+0x28/0xec [bluetooth]
> > > > [ 53.517059] set_ssp+0x2d8/0x5b0 [bluetooth]
> > > > [ 53.521575] hci_mgmt_cmd+0x5c4/0x8b0 [bluetooth]
> > > > [ 53.526538] hci_sock_sendmsg+0x28c/0x95c [bluetooth]
> > > > [ 53.531850] sock_sendmsg+0xb4/0xd8
> > > > [ 53.535454] sock_write_iter+0x1c0/0x2d0
> > > > [ 53.539494] do_iter_readv_writev+0x350/0x4e0
> > > > [ 53.543980] do_iter_write+0xf0/0x2e4
> > > > [ 53.547747] vfs_writev+0xd0/0x13c
> > > > [ 53.551254] do_writev+0xe8/0x1fc
> > > > [ 53.554672] __arm64_sys_writev+0x84/0x98
> > > > [ 53.558805] invoke_syscall+0x78/0x20c
> > > > [ 53.562665] el0_svc_common+0x12c/0x2f0
> > > > [ 53.566618] do_el0_svc+0x94/0x13c
> > > > [ 53.570125] el0_svc+0x5c/0x108
> > > > [ 53.573374] el0t_64_sync_handler+0x78/0x108
> > > > [ 53.577773] el0t_64_sync+0x1a4/0x1a8
> > > >
> > > > [ 53.583089] Freed by task 3207:
> > > > [ 53.586325] kasan_save_stack+0x38/0x68
> > > > [ 53.590282] kasan_set_track+0x28/0x3c
> > > > [ 53.594153] kasan_set_free_info+0x28/0x4c
> > > > [ 53.598369] ____kasan_slab_free+0x138/0x17c
> > > > [ 53.602767] __kasan_slab_free+0x18/0x28
> > > > [ 53.606803] slab_free_freelist_hook+0x188/0x260
> > > > [ 53.611559] kfree+0x138/0x29c
> > > > [ 53.614708] mgmt_pending_free+0xac/0xdc [bluetooth]
> > > > [ 53.619948] mgmt_pending_remove+0xd8/0xf0 [bluetooth]
> > > > [ 53.625357] cmd_complete_rsp+0xc8/0x178 [bluetooth]
> > > > [ 53.630586] mgmt_pending_foreach+0xa8/0xf8 [bluetooth]
> > > > [ 53.636076] __mgmt_power_off+0x114/0x26c [bluetooth]
> > > > [ 53.641390] hci_dev_close_sync+0x314/0x814 [bluetooth]
> > > > [ 53.646882] hci_dev_do_close+0x3c/0x7c [bluetooth]
> > > > [ 53.652017] hci_dev_close+0xa4/0x15c [bluetooth]
> > > > [ 53.656980] hci_sock_ioctl+0x298/0x444 [bluetooth]
> > > > [ 53.662117] sock_do_ioctl+0xd0/0x1e8
> > > > [ 53.665900] sock_ioctl+0x4fc/0x72c
> > > > [ 53.669500] __arm64_sys_ioctl+0x118/0x154
> > > > [ 53.673726] invoke_syscall+0x78/0x20c
> > > > [ 53.677587] el0_svc_common+0x12c/0x2f0
> > > > [ 53.681533] do_el0_svc+0x94/0x13c
> > > > [ 53.685043] el0_svc+0x5c/0x108
> > > > [ 53.688278] el0t_64_sync_handler+0x78/0x108
> > > > [ 53.692677] el0t_64_sync+0x1a4/0x1a8
> > > >
> > > > [ 53.697988] Last potentially related work creation:
> > > > [ 53.703009] kasan_save_stack+0x38/0x68
> > > > [ 53.706962] kasan_record_aux_stack+0x104/0x130
> > > > [ 53.711622] __call_rcu+0x14c/0x860
> > > > [ 53.715212] call_rcu+0x18/0x24
> > > > [ 53.718448] sk_filter_uncharge+0xc0/0x120
> > > > [ 53.722667] __sk_destruct+0xb4/0x4a8
> > > > [ 53.726435] sk_destruct+0x78/0xa0
> > > > [ 53.729941] __sk_free+0x190/0x270
> > > > [ 53.733453] sk_free+0x54/0x8c
> > > > [ 53.736603] deferred_put_nlk_sk+0x1d4/0x20c
> > > > [ 53.741000] rcu_do_batch+0x3e8/0xd08
> > > > [ 53.744772] nocb_cb_wait+0xc8/0xa3c
> > > > [ 53.748453] rcu_nocb_cb_kthread+0x48/0x134
> > > > [ 53.752768] kthread+0x2d4/0x3d8
> > > > [ 53.756098] ret_from_fork+0x10/0x20
> > > >
> > > > This is another one at a different function but with the same signature.
> > > >
> > > > [ 43.363512] ==================================================================
> > > > [ 43.370966] BUG: KASAN: use-after-free in
> > > > mgmt_remove_adv_monitor_sync+0x40/0xcc [bluetooth]
> > > > [ 43.379813] Read of size 8 at addr ffffff8096c28d18 by task kworker/u17:0/192
> > > > [ 43.387158]
> > > > [ 43.388705] CPU: 6 PID: 192 Comm: kworker/u17:0 Tainted: G W
> > > > 5.15.59-lockdep #1 59f35e3dfc07f6688b084869895c7a39892c891a
> > > > localhost ~ # [ 43.410184] Workqueue: hci0 hci_cmd_sync_work [bluetooth]
> > > >
> > > > [ 43.418887] Call trace:
> > > > [ 43.422407] dump_backtrace+0x0/0x424
> > > > [ 43.426191] show_stack+0x20/0x2c
> > > > [ 43.429608] dump_stack_lvl+0x84/0xb4
> > > > [ 43.433395] print_address_description+0x30/0x2fc
> > > > [ 43.438243] kasan_report+0x15c/0x19c
> > > > [ 43.442070] __asan_report_load8_noabort+0x44/0x50
> > > > hciconfig hci0 up
> > > > [ 43.447009] mgmt_remove_adv_monitor_sync+0x40/0xcc [bluetooth
> > > > 8dae3a82177133cfa9626e7322b3b0c8f665102d]
> > > > [ 43.458568] hci_cmd_sync_work+0x1bc/0x2bc [bluetooth
> > > > 8dae3a82177133cfa9626e7322b3b0c8f665102d]
> > > > [ 43.467656] process_one_work+0x59c/0xa88
> > > > [ 43.472530] worker_thread+0x81c/0xd18
> > > > [ 43.476410] kthread+0x2d4/0x3d8
> > > > localhost ~ # [ 43.479753] ret_from_fork+0x10/0x20
> > > > [ 43.486588]
> > > > [ 43.488156] Allocated by task 1118:
> > > > [ 43.491751] kasan_save_stack+0x38/0x68
> > > > [ 43.495709] __kasan_kmalloc+0xb4/0xd0
> > > > [ 43.499577] kmem_cache_alloc_trace+0x29c/0x374
> > > > [ 43.504238] mgmt_pending_new+0x74/0x200 [bluetooth]
> > > > sleep 2[ 43.509509] mgmt_pending_add+0x28/0xec [bluetooth]
> > > >
> > > > [ 43.515244] remove_adv_monitor+0xf8/0x174 [bluetooth]
> > > > [ 43.521533] hci_mgmt_cmd+0x5c4/0x8b0 [bluetooth]
> > > > [ 43.526527] hci_sock_sendmsg+0x28c/0x95c [bluetooth]
> > > > [ 43.531873] sock_sendmsg+0xb4/0xd8
> > > > [ 43.535472] sock_write_iter+0x1c0/0x2d0
> > > > [ 43.539519] do_iter_readv_writev+0x350/0x4e0
> > > > [ 43.544012] do_iter_write+0xf0/0x2e4
> > > > [ 43.547788] vfs_writev+0xd0/0x13c
> > > > [ 43.551295] do_writev+0xe8/0x1fc
> > > > [ 43.554710] __arm64_sys_writev+0x84/0x98
> > > > [ 43.558838] invoke_syscall+0x78/0x20c
> > > > [ 43.562709] el0_svc_common+0x12c/0x2f0
> > > > [ 43.566654] do_el0_svc+0x94/0x13c
> > > > [ 43.570155] el0_svc+0x5c/0x108
> > > > [ 43.573391] el0t_64_sync_handler+0x78/0x108
> > > > [ 43.577785] el0t_64_sync+0x1a4/0x1a8
> > > > [ 43.581564]
> > > > [ 43.583115] Freed by task 3217:
> > > > [ 43.586356] kasan_save_stack+0x38/0x68
> > > > [ 43.590314] kasan_set_track+0x28/0x3c
> > > > [ 43.594180] kasan_set_free_info+0x28/0x4c
> > > > [ 43.598396] ____kasan_slab_free+0x138/0x17c
> > > > [ 43.602794] __kasan_slab_free+0x18/0x28
> > > > [ 43.606838] slab_free_freelist_hook+0x188/0x260
> > > > [ 43.611591] kfree+0x138/0x29c
> > > > [ 43.614741] mgmt_pending_free+0xac/0xdc [bluetooth]
> > > > [ 43.620003] mgmt_pending_remove+0xd8/0xf0 [bluetooth]
> > > > [ 43.625434] cmd_complete_rsp+0xc8/0x178 [bluetooth]
> > > > [ 43.630686] mgmt_pending_foreach+0xa8/0xf8 [bluetooth]
> > > > [ 43.636198] __mgmt_power_off+0x114/0x26c [bluetooth]
> > > > [ 43.641532] hci_dev_close_sync+0x2ec/0x7ec [bluetooth]
> > > > [ 43.647049] hci_dev_do_close+0x3c/0x7c [bluetooth]
> > > > [ 43.652209] hci_dev_close+0xac/0x164 [bluetooth]
> > > > [ 43.657190] hci_sock_ioctl+0x298/0x444 [bluetooth]
> > > > [ 43.662353] sock_do_ioctl+0xd0/0x1e8
> > > > [ 43.666134] sock_ioctl+0x4fc/0x72c
> > > > [ 43.669736] __arm64_sys_ioctl+0x118/0x154
> > > > [ 43.673961] invoke_syscall+0x78/0x20c
> > > > [ 43.677820] el0_svc_common+0x12c/0x2f0
> > > > [ 43.681770] do_el0_svc+0x94/0x13c
> > > > [ 43.685278] el0_svc+0x5c/0x108
> > > > [ 43.688514] el0t_64_sync_handler+0x78/0x108
> > > > [ 43.692913] el0t_64_sync+0x1a4/0x1a8
> > > >
> > > > Thanks,
> > > > Zhengping
> > >
> > > Ok, how about we do something like the following:
> > >
> > > https://gist.github.com/Vudentz/365d664275e4d2e2af157e47f0502f50
> > >
> > > The actual real culprit seem to be __mgmt_power_off does cleanup
> > > mgmt_pending but that is still accessible via cmd_sync_work_list, this
> > > is probably how hci_request was designed but in case of cmd_sync we
> > > normally have the data as part of cmd_sync_work_list.
> > >
> > > > On Mon, Feb 27, 2023 at 3:41 PM Luiz Augusto von Dentz
> > > > <[email protected]> wrote:
> > > > >
> > > > > Hi Zhengping,
> > > > >
> > > > > On Sun, Feb 26, 2023 at 11:18 PM Zhengping Jiang <[email protected]> wrote:
> > > > > >
> > > > > > Hi Luiz,
> > > > > >
> > > > > > I have a question. Given that each command in the cmd_sync queue
> > > > > > should clean up the memory in a callback function. I was wondering if
> > > > > > the call to cmd_complete_rsp in __mgmt_power_off function is still
> > > > > > necessary? Will this always risk a race condition that cmd has been
> > > > > > released when the complete callback or _sync function is run?
> > > > >
> > > > > Not sure I follow you here, do you have a stack trace when the user
> > > > > after free occurs?
> > > > >
> > > > > > Thanks,
> > > > > > Zhengping
> > > > > >
> > > > > > On Fri, Feb 24, 2023 at 2:37 PM Zhengping Jiang <[email protected]> wrote:
> > > > > > >
> > > > > > > Hi Luiz,
> > > > > > >
> > > > > > > > Any particular reason why you are not using hci_cmd_sync_clear
> > > > > > > > instead?
> > > > > > >
> > > > > > > That is a good question and we used hci_cmd_sync_clear in the first
> > > > > > > version, but it will clear the queue and also close the timer. As a
> > > > > > > result, when the adapter is turned on again, the timer will not
> > > > > > > schedule any new jobs. So the option is to use hci_cmd_sync_clear and
> > > > > > > re-initiate the queue or to write a new function which only clears the
> > > > > > > queue.
> > > > > > >
> > > > > > > > We also may want to move the clearing logic to
> > > > > > > > hci_dev_close_sync since it should be equivalent to
> > > > > > > > hci_request_cancel_all.
> > > > > > >
> > > > > > > I actually have a question here. I saw
> > > > > > > "drain_workqueue(hdev->workqueue)" in hci_dev_close_sync and thought
> > > > > > > it should force clearing the cmd_sync queue. But it seems cannot
> > > > > > > prevent the use-after-free situation.
> > > > > > >
> > > > > > > Any suggestions to improve the solution?
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Zhengping
> > > > > > >
> > > > > > >
> > > > > > > On Fri, Feb 24, 2023 at 1:02 PM Luiz Augusto von Dentz
> > > > > > > <[email protected]> wrote:
> > > > > > > >
> > > > > > > > Hi Zhengping,
> > > > > > > >
> > > > > > > > On Fri, Feb 24, 2023 at 11:53 AM Zhengping Jiang <[email protected]> wrote:
> > > > > > > > >
> > > > > > > > > Clear cmd_sync_work queue before clearing the mgmt cmd list to avoid
> > > > > > > > > racing conditions which cause use-after-free.
> > > > > > > > >
> > > > > > > > > When powering off the adapter, the mgmt cmd list will be cleared. If a
> > > > > > > > > work is queued in the cmd_sync_work queue at the same time, it will
> > > > > > > > > cause the risk of use-after-free, as the cmd pointer is not checked
> > > > > > > > > before use.
> > > > > > > > >
> > > > > > > > > Signed-off-by: Zhengping Jiang <[email protected]>
> > > > > > > > > ---
> > > > > > > > >
> > > > > > > > > Changes in v2:
> > > > > > > > > - Add function to clear the queue without stop the timer
> > > > > > > > >
> > > > > > > > > Changes in v1:
> > > > > > > > > - Clear cmd_sync_work queue before clearing the mgmt cmd list
> > > > > > > > >
> > > > > > > > > net/bluetooth/hci_sync.c | 21 ++++++++++++++++++++-
> > > > > > > > > 1 file changed, 20 insertions(+), 1 deletion(-)
> > > > > > > > >
> > > > > > > > > diff --git a/net/bluetooth/hci_sync.c b/net/bluetooth/hci_sync.c
> > > > > > > > > index 117eedb6f709..b70365dfff0c 100644
> > > > > > > > > --- a/net/bluetooth/hci_sync.c
> > > > > > > > > +++ b/net/bluetooth/hci_sync.c
> > > > > > > > > @@ -636,6 +636,23 @@ void hci_cmd_sync_init(struct hci_dev *hdev)
> > > > > > > > > INIT_DELAYED_WORK(&hdev->adv_instance_expire, adv_timeout_expire);
> > > > > > > > > }
> > > > > > > > >
> > > > > > > > > +static void hci_pend_cmd_sync_clear(struct hci_dev *hdev)
> > > > > > > > > +{
> > > > > > > > > + struct hci_cmd_sync_work_entry *entry, *tmp;
> > > > > > > > > +
> > > > > > > > > + mutex_lock(&hdev->cmd_sync_work_lock);
> > > > > > > > > + list_for_each_entry_safe(entry, tmp, &hdev->cmd_sync_work_list, list) {
> > > > > > > > > + if (entry->destroy) {
> > > > > > > > > + hci_req_sync_lock(hdev);
> > > > > > > > > + entry->destroy(hdev, entry->data, -ECANCELED);
> > > > > > > > > + hci_req_sync_unlock(hdev);
> > > > > > > > > + }
> > > > > > > > > + list_del(&entry->list);
> > > > > > > > > + kfree(entry);
> > > > > > > > > + }
> > > > > > > > > + mutex_unlock(&hdev->cmd_sync_work_lock);
> > > > > > > > > +}
> > > > > > > > > +
> > > > > > > > > void hci_cmd_sync_clear(struct hci_dev *hdev)
> > > > > > > > > {
> > > > > > > > > struct hci_cmd_sync_work_entry *entry, *tmp;
> > > > > > > > > @@ -4842,8 +4859,10 @@ int hci_dev_close_sync(struct hci_dev *hdev)
> > > > > > > > >
> > > > > > > > > if (!auto_off && hdev->dev_type == HCI_PRIMARY &&
> > > > > > > > > !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) &&
> > > > > > > > > - hci_dev_test_flag(hdev, HCI_MGMT))
> > > > > > > > > + hci_dev_test_flag(hdev, HCI_MGMT)) {
> > > > > > > > > + hci_pend_cmd_sync_clear(hdev);
> > > > > > > >
> > > > > > > > Any particular reason why you are not using hci_cmd_sync_clear
> > > > > > > > instead? We also may want to move the clearing logic to
> > > > > > > > hci_dev_close_sync since it should be equivalent to
> > > > > > > > hci_request_cancel_all.
> > > > > > > >
> > > > > > > > > __mgmt_power_off(hdev);
> > > > > > > > > + }
> > > > > > > > >
> > > > > > > > > hci_inquiry_cache_flush(hdev);
> > > > > > > > > hci_pend_le_actions_clear(hdev);
> > > > > > > > > --
> > > > > > > > > 2.39.2.722.g9855ee24e9-goog
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > Luiz Augusto von Dentz
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Luiz Augusto von Dentz
> > >
> > >
> > >
> > > --
> > > Luiz Augusto von Dentz
>
>
>
> --
> Luiz Augusto von Dentz