2022-09-02 11:28:57

by Tetsuo Handa

[permalink] [raw]
Subject: [PATCH] Bluetooth: use hdev->workqueue when queuing hdev->{cmd,ncmd}_timer works

syzbot is reporting attempt to schedule hdev->cmd_work work from system_wq
WQ into hdev->workqueue WQ which is under draining operation [1], for
commit c8efcc2589464ac7 ("workqueue: allow chained queueing during
destruction") does not allow such operation.

The check introduced by commit 877afadad2dce8aa ("Bluetooth: When HCI work
queue is drained, only queue chained work") was incomplete.

Use hdev->workqueue WQ when queuing hdev->{cmd,ncmd}_timer works because
hci_{cmd,ncmd}_timeout() calls queue_work(hdev->workqueue). Also, protect
the queuing operation with RCU read lock in order to avoid calling
queue_delayed_work() after cancel_delayed_work() completed.

Link: https://syzkaller.appspot.com/bug?extid=243b7d89777f90f7613b [1]
Reported-by: syzbot <[email protected]>
Signed-off-by: Tetsuo Handa <[email protected]>
Fixes: 877afadad2dce8aa ("Bluetooth: When HCI work queue is drained, only queue chained work")
---
This is a difficult to trigger race condition, and therefore reproducer is
not available. Please do logical check in addition to automated testing.

net/bluetooth/hci_core.c | 15 +++++++++++++--
net/bluetooth/hci_event.c | 6 ++++--
2 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
index b3a5a3cc9372..9873d2e67988 100644
--- a/net/bluetooth/hci_core.c
+++ b/net/bluetooth/hci_core.c
@@ -597,6 +597,15 @@ static int hci_dev_do_reset(struct hci_dev *hdev)

/* Cancel these to avoid queueing non-chained pending work */
hci_dev_set_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE);
+ /* Wait for
+ *
+ * if (!hci_dev_test_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE))
+ * queue_delayed_work(&hdev->{cmd,ncmd}_timer)
+ *
+ * inside RCU section to see the flag or complete scheduling.
+ */
+ synchronize_rcu();
+ /* Explicitly cancel works in case scheduled after setting the flag. */
cancel_delayed_work(&hdev->cmd_timer);
cancel_delayed_work(&hdev->ncmd_timer);

@@ -4056,12 +4065,14 @@ static void hci_cmd_work(struct work_struct *work)
if (res < 0)
__hci_cmd_sync_cancel(hdev, -res);

+ rcu_read_lock();
if (test_bit(HCI_RESET, &hdev->flags) ||
hci_dev_test_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE))
cancel_delayed_work(&hdev->cmd_timer);
else
- schedule_delayed_work(&hdev->cmd_timer,
- HCI_CMD_TIMEOUT);
+ queue_delayed_work(hdev->workqueue, &hdev->cmd_timer,
+ HCI_CMD_TIMEOUT);
+ rcu_read_unlock();
} else {
skb_queue_head(&hdev->cmd_q, skb);
queue_work(hdev->workqueue, &hdev->cmd_work);
diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c
index 6643c9c20fa4..d6f0e6ca0e7e 100644
--- a/net/bluetooth/hci_event.c
+++ b/net/bluetooth/hci_event.c
@@ -3766,16 +3766,18 @@ static inline void handle_cmd_cnt_and_timer(struct hci_dev *hdev, u8 ncmd)
{
cancel_delayed_work(&hdev->cmd_timer);

+ rcu_read_lock();
if (!test_bit(HCI_RESET, &hdev->flags)) {
if (ncmd) {
cancel_delayed_work(&hdev->ncmd_timer);
atomic_set(&hdev->cmd_cnt, 1);
} else {
if (!hci_dev_test_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE))
- schedule_delayed_work(&hdev->ncmd_timer,
- HCI_NCMD_TIMEOUT);
+ queue_delayed_work(hdev->workqueue, &hdev->ncmd_timer,
+ HCI_NCMD_TIMEOUT);
}
}
+ rcu_read_unlock();
}

static u8 hci_cc_le_read_buffer_size_v2(struct hci_dev *hdev, void *data,
--
2.18.4


2022-09-02 12:01:36

by bluez.test.bot

[permalink] [raw]
Subject: RE: Bluetooth: use hdev->workqueue when queuing hdev->{cmd,ncmd}_timer works

This is automated email and please do not reply to this email!

Dear submitter,

Thank you for submitting the patches to the linux bluetooth mailing list.
This is a CI test results with your patch series:
PW Link:https://patchwork.kernel.org/project/bluetooth/list/?series=673604

---Test result---

Test Summary:
CheckPatch FAIL 1.66 seconds
GitLint PASS 0.82 seconds
SubjectPrefix PASS 0.67 seconds
BuildKernel PASS 34.07 seconds
BuildKernel32 PASS 30.21 seconds
Incremental Build with patchesPASS 44.33 seconds
TestRunner: Setup PASS 512.05 seconds
TestRunner: l2cap-tester PASS 16.75 seconds
TestRunner: iso-tester PASS 15.58 seconds
TestRunner: bnep-tester PASS 6.27 seconds
TestRunner: mgmt-tester PASS 100.77 seconds
TestRunner: rfcomm-tester PASS 9.46 seconds
TestRunner: sco-tester PASS 9.39 seconds
TestRunner: smp-tester PASS 9.49 seconds
TestRunner: userchan-tester PASS 6.46 seconds

Details
##############################
Test: CheckPatch - FAIL - 1.66 seconds
Run checkpatch.pl script with rule in .checkpatch.conf
Bluetooth: use hdev->workqueue when queuing hdev->{cmd,ncmd}_timer works\ERROR:GIT_COMMIT_ID: Please use git commit description style 'commit <12+ chars of sha1> ("<title line>")' - ie: 'commit fatal: unsaf ("ace/src' is owned by someone else)")'
#66:
commit c8efcc2589464ac7 ("workqueue: allow chained queueing during
destruction") does not allow such operation.

ERROR:GIT_COMMIT_ID: Please use git commit description style 'commit <12+ chars of sha1> ("<title line>")' - ie: 'commit fatal: unsaf ("ace/src' is owned by someone else)")'
#69:
The check introduced by commit 877afadad2dce8aa ("Bluetooth: When HCI work
queue is drained, only queue chained work") was incomplete.

total: 2 errors, 0 warnings, 0 checks, 51 lines checked

NOTE: For some of the reported defects, checkpatch may be able to
mechanically convert to the typical style using --fix or --fix-inplace.

/github/workspace/src/12964073.patch has style problems, please review.

NOTE: Ignored message types: UNKNOWN_COMMIT_ID

NOTE: If any of the errors are false positives, please report
them to the maintainer, see CHECKPATCH in MAINTAINERS.




---
Regards,
Linux Bluetooth

2022-09-02 18:58:17

by Luiz Augusto von Dentz

[permalink] [raw]
Subject: Re: [PATCH] Bluetooth: use hdev->workqueue when queuing hdev->{cmd,ncmd}_timer works

Hi Tetsuo,

On Fri, Sep 2, 2022 at 4:23 AM Tetsuo Handa
<[email protected]> wrote:
>
> syzbot is reporting attempt to schedule hdev->cmd_work work from system_wq
> WQ into hdev->workqueue WQ which is under draining operation [1], for
> commit c8efcc2589464ac7 ("workqueue: allow chained queueing during
> destruction") does not allow such operation.
>
> The check introduced by commit 877afadad2dce8aa ("Bluetooth: When HCI work
> queue is drained, only queue chained work") was incomplete.
>
> Use hdev->workqueue WQ when queuing hdev->{cmd,ncmd}_timer works because
> hci_{cmd,ncmd}_timeout() calls queue_work(hdev->workqueue). Also, protect
> the queuing operation with RCU read lock in order to avoid calling
> queue_delayed_work() after cancel_delayed_work() completed.

Didn't we introduce HCI_CMD_DRAIN_WORKQUEUE exactly to avoid queuing
after the cancel pattern? I wonder if wouldn't be better to introduce
some function that disables/enables the workqueue so we don't have to
do extra tracking in the driver/subsystem?

> Link: https://syzkaller.appspot.com/bug?extid=243b7d89777f90f7613b [1]
> Reported-by: syzbot <[email protected]>
> Signed-off-by: Tetsuo Handa <[email protected]>
> Fixes: 877afadad2dce8aa ("Bluetooth: When HCI work queue is drained, only queue chained work")
> ---
> This is a difficult to trigger race condition, and therefore reproducer is
> not available. Please do logical check in addition to automated testing.
>
> net/bluetooth/hci_core.c | 15 +++++++++++++--
> net/bluetooth/hci_event.c | 6 ++++--
> 2 files changed, 17 insertions(+), 4 deletions(-)
>
> diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
> index b3a5a3cc9372..9873d2e67988 100644
> --- a/net/bluetooth/hci_core.c
> +++ b/net/bluetooth/hci_core.c
> @@ -597,6 +597,15 @@ static int hci_dev_do_reset(struct hci_dev *hdev)
>
> /* Cancel these to avoid queueing non-chained pending work */
> hci_dev_set_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE);
> + /* Wait for
> + *
> + * if (!hci_dev_test_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE))
> + * queue_delayed_work(&hdev->{cmd,ncmd}_timer)
> + *
> + * inside RCU section to see the flag or complete scheduling.
> + */
> + synchronize_rcu();
> + /* Explicitly cancel works in case scheduled after setting the flag. */
> cancel_delayed_work(&hdev->cmd_timer);
> cancel_delayed_work(&hdev->ncmd_timer);
>
> @@ -4056,12 +4065,14 @@ static void hci_cmd_work(struct work_struct *work)
> if (res < 0)
> __hci_cmd_sync_cancel(hdev, -res);
>
> + rcu_read_lock();
> if (test_bit(HCI_RESET, &hdev->flags) ||
> hci_dev_test_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE))
> cancel_delayed_work(&hdev->cmd_timer);
> else
> - schedule_delayed_work(&hdev->cmd_timer,
> - HCI_CMD_TIMEOUT);
> + queue_delayed_work(hdev->workqueue, &hdev->cmd_timer,
> + HCI_CMD_TIMEOUT);
> + rcu_read_unlock();
> } else {
> skb_queue_head(&hdev->cmd_q, skb);
> queue_work(hdev->workqueue, &hdev->cmd_work);
> diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c
> index 6643c9c20fa4..d6f0e6ca0e7e 100644
> --- a/net/bluetooth/hci_event.c
> +++ b/net/bluetooth/hci_event.c
> @@ -3766,16 +3766,18 @@ static inline void handle_cmd_cnt_and_timer(struct hci_dev *hdev, u8 ncmd)
> {
> cancel_delayed_work(&hdev->cmd_timer);
>
> + rcu_read_lock();
> if (!test_bit(HCI_RESET, &hdev->flags)) {
> if (ncmd) {
> cancel_delayed_work(&hdev->ncmd_timer);
> atomic_set(&hdev->cmd_cnt, 1);
> } else {
> if (!hci_dev_test_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE))
> - schedule_delayed_work(&hdev->ncmd_timer,
> - HCI_NCMD_TIMEOUT);
> + queue_delayed_work(hdev->workqueue, &hdev->ncmd_timer,
> + HCI_NCMD_TIMEOUT);
> }
> }
> + rcu_read_unlock();
> }
>
> static u8 hci_cc_le_read_buffer_size_v2(struct hci_dev *hdev, void *data,
> --
> 2.18.4
>


--
Luiz Augusto von Dentz

2022-09-02 21:35:09

by Luiz Augusto von Dentz

[permalink] [raw]
Subject: Re: [PATCH] Bluetooth: use hdev->workqueue when queuing hdev->{cmd,ncmd}_timer works

Hi Tetsuo,

On Fri, Sep 2, 2022 at 11:45 AM Luiz Augusto von Dentz
<[email protected]> wrote:
>
> Hi Tetsuo,
>
> On Fri, Sep 2, 2022 at 4:23 AM Tetsuo Handa
> <[email protected]> wrote:
> >
> > syzbot is reporting attempt to schedule hdev->cmd_work work from system_wq
> > WQ into hdev->workqueue WQ which is under draining operation [1], for
> > commit c8efcc2589464ac7 ("workqueue: allow chained queueing during
> > destruction") does not allow such operation.
> >
> > The check introduced by commit 877afadad2dce8aa ("Bluetooth: When HCI work
> > queue is drained, only queue chained work") was incomplete.
> >
> > Use hdev->workqueue WQ when queuing hdev->{cmd,ncmd}_timer works because
> > hci_{cmd,ncmd}_timeout() calls queue_work(hdev->workqueue). Also, protect
> > the queuing operation with RCU read lock in order to avoid calling
> > queue_delayed_work() after cancel_delayed_work() completed.
>
> Didn't we introduce HCI_CMD_DRAIN_WORKQUEUE exactly to avoid queuing
> after the cancel pattern? I wonder if wouldn't be better to introduce
> some function that disables/enables the workqueue so we don't have to
> do extra tracking in the driver/subsystem?
>
> > Link: https://syzkaller.appspot.com/bug?extid=243b7d89777f90f7613b [1]
> > Reported-by: syzbot <[email protected]>
> > Signed-off-by: Tetsuo Handa <[email protected]>
> > Fixes: 877afadad2dce8aa ("Bluetooth: When HCI work queue is drained, only queue chained work")
> > ---
> > This is a difficult to trigger race condition, and therefore reproducer is
> > not available. Please do logical check in addition to automated testing.
> >
> > net/bluetooth/hci_core.c | 15 +++++++++++++--
> > net/bluetooth/hci_event.c | 6 ++++--
> > 2 files changed, 17 insertions(+), 4 deletions(-)
> >
> > diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
> > index b3a5a3cc9372..9873d2e67988 100644
> > --- a/net/bluetooth/hci_core.c
> > +++ b/net/bluetooth/hci_core.c
> > @@ -597,6 +597,15 @@ static int hci_dev_do_reset(struct hci_dev *hdev)
> >
> > /* Cancel these to avoid queueing non-chained pending work */
> > hci_dev_set_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE);
> > + /* Wait for
> > + *
> > + * if (!hci_dev_test_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE))
> > + * queue_delayed_work(&hdev->{cmd,ncmd}_timer)
> > + *
> > + * inside RCU section to see the flag or complete scheduling.
> > + */
> > + synchronize_rcu();
> > + /* Explicitly cancel works in case scheduled after setting the flag. */
> > cancel_delayed_work(&hdev->cmd_timer);
> > cancel_delayed_work(&hdev->ncmd_timer);
> >
> > @@ -4056,12 +4065,14 @@ static void hci_cmd_work(struct work_struct *work)
> > if (res < 0)
> > __hci_cmd_sync_cancel(hdev, -res);
> >
> > + rcu_read_lock();
> > if (test_bit(HCI_RESET, &hdev->flags) ||
> > hci_dev_test_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE))
> > cancel_delayed_work(&hdev->cmd_timer);
> > else
> > - schedule_delayed_work(&hdev->cmd_timer,
> > - HCI_CMD_TIMEOUT);
> > + queue_delayed_work(hdev->workqueue, &hdev->cmd_timer,
> > + HCI_CMD_TIMEOUT);
> > + rcu_read_unlock();
> > } else {
> > skb_queue_head(&hdev->cmd_q, skb);
> > queue_work(hdev->workqueue, &hdev->cmd_work);
> > diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c
> > index 6643c9c20fa4..d6f0e6ca0e7e 100644
> > --- a/net/bluetooth/hci_event.c
> > +++ b/net/bluetooth/hci_event.c
> > @@ -3766,16 +3766,18 @@ static inline void handle_cmd_cnt_and_timer(struct hci_dev *hdev, u8 ncmd)
> > {
> > cancel_delayed_work(&hdev->cmd_timer);
> >
> > + rcu_read_lock();
> > if (!test_bit(HCI_RESET, &hdev->flags)) {
> > if (ncmd) {
> > cancel_delayed_work(&hdev->ncmd_timer);
> > atomic_set(&hdev->cmd_cnt, 1);
> > } else {
> > if (!hci_dev_test_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE))
> > - schedule_delayed_work(&hdev->ncmd_timer,
> > - HCI_NCMD_TIMEOUT);
> > + queue_delayed_work(hdev->workqueue, &hdev->ncmd_timer,
> > + HCI_NCMD_TIMEOUT);
> > }
> > }
> > + rcu_read_unlock();
> > }
> >
> > static u8 hci_cc_le_read_buffer_size_v2(struct hci_dev *hdev, void *data,
> > --
> > 2.18.4
> >

I was thinking on doing something like the following:

https://gist.github.com/Vudentz/a2288015fedbed366fcdb612264a9d16

Since there is no reason to queue any command if we are draining and
are gonna reset at the end it is pretty useless to queue commands at
that point.

>
> --
> Luiz Augusto von Dentz



--
Luiz Augusto von Dentz

2022-09-03 06:51:55

by Tetsuo Handa

[permalink] [raw]
Subject: Re: [PATCH] Bluetooth: use hdev->workqueue when queuing hdev->{cmd,ncmd}_timer works

On 2022/09/03 6:31, Luiz Augusto von Dentz wrote:
> Hi Tetsuo,
>
> On Fri, Sep 2, 2022 at 11:45 AM Luiz Augusto von Dentz <[email protected]> wrote:
>> Didn't we introduce HCI_CMD_DRAIN_WORKQUEUE exactly to avoid queuing
>> after the cancel pattern?

HCI_CMD_DRAIN_WORKQUEUE does not help for this case.

What extid=243b7d89777f90f7613b is reporting is

hci_cmd_timeout() { hci_dev_do_reset() {
starts sleeping due to e.g. preemption
hci_dev_set_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE); // Sets HCI_CMD_DRAIN_WORKQUEUE flag
cancel_delayed_work(&hdev->cmd_timer); // does nothing because hci_cmd_timeout() is already running
cancel_delayed_work(&hdev->ncmd_timer);
drain_workqueue(hdev->workqueue) {
sets __WQ_DRAINING flag on hdev->workqueue
starts waiting for completion of all works on hdev->workqueue
finishes sleeping due to e.g. preemption
queue_work(hdev->workqueue, &hdev->cmd_work) // <= complains attempt to queue work from system_wq into __WQ_DRAINING hdev->workqueue
}
finishes waiting for completion of all works on hdev->workqueue
clears __WQ_DRAINING flag
}
}

race condition. Notice that cancel_delayed_work() does not wait for
completion of already started hci_cmd_timeout() callback.

If you need to wait for completion of already started callback,
you need to use _sync version (e.g. cancel_delayed_work_sync()).
And watch out for locking dependency when using _sync version.

>> I wonder if wouldn't be better to introduce
>> some function that disables/enables the workqueue so we don't have to
>> do extra tracking in the driver/subsystem?
>>
>
> I was thinking on doing something like the following:
>
> https://gist.github.com/Vudentz/a2288015fedbed366fcdb612264a9d16

That patch does not close race, for

@@ -4037,6 +4038,10 @@ static void hci_cmd_work(struct work_struct *work)
BT_DBG("%s cmd_cnt %d cmd queued %d", hdev->name,
atomic_read(&hdev->cmd_cnt), skb_queue_len(&hdev->cmd_q));

+ /* Don't queue while draining */
+ if (hci_dev_test_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE))
+ return;
/*
* BUG: WE ARE FREE TO SLEEP FOR ARBITRARY DURATION IMMEDIATELY AFTER CHECKING THE FLAG.
* ANY "TEST AND DO SOMETHING" NEEDS TO BE PROTECTED BY A LOCK MECHANISM.
*/
+
/* Send queued commands */
if (atomic_read(&hdev->cmd_cnt)) {
skb = skb_dequeue(&hdev->cmd_q);

. In other words, HCI_CMD_DRAIN_WORKQUEUE does not fix what extid=63bed493aebbf6872647 is reporting.

If "TEST AND DO SOMETHING" does not sleep, RCU is a handy lock mechanism.

>
> Since there is no reason to queue any command if we are draining and
> are gonna reset at the end it is pretty useless to queue commands at
> that point.

Then, you can add that check.

2022-09-04 02:15:44

by Luiz Augusto von Dentz

[permalink] [raw]
Subject: Re: [PATCH] Bluetooth: use hdev->workqueue when queuing hdev->{cmd,ncmd}_timer works

Hi Tetsuo,

On Fri, Sep 2, 2022 at 11:49 PM Tetsuo Handa
<[email protected]> wrote:
>
> On 2022/09/03 6:31, Luiz Augusto von Dentz wrote:
> > Hi Tetsuo,
> >
> > On Fri, Sep 2, 2022 at 11:45 AM Luiz Augusto von Dentz <[email protected]> wrote:
> >> Didn't we introduce HCI_CMD_DRAIN_WORKQUEUE exactly to avoid queuing
> >> after the cancel pattern?
>
> HCI_CMD_DRAIN_WORKQUEUE does not help for this case.
>
> What extid=243b7d89777f90f7613b is reporting is
>
> hci_cmd_timeout() { hci_dev_do_reset() {
> starts sleeping due to e.g. preemption
> hci_dev_set_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE); // Sets HCI_CMD_DRAIN_WORKQUEUE flag
> cancel_delayed_work(&hdev->cmd_timer); // does nothing because hci_cmd_timeout() is already running
> cancel_delayed_work(&hdev->ncmd_timer);
> drain_workqueue(hdev->workqueue) {
> sets __WQ_DRAINING flag on hdev->workqueue
> starts waiting for completion of all works on hdev->workqueue
> finishes sleeping due to e.g. preemption
> queue_work(hdev->workqueue, &hdev->cmd_work) // <= complains attempt to queue work from system_wq into __WQ_DRAINING hdev->workqueue

And we can check for __WQ_DRAINING? Anyway checking
HCI_CMD_DRAIN_WORKQUEUE seems useless so we either have to check if
queue_work can be used or not.

> }
> finishes waiting for completion of all works on hdev->workqueue
> clears __WQ_DRAINING flag
> }
> }
>
> race condition. Notice that cancel_delayed_work() does not wait for
> completion of already started hci_cmd_timeout() callback.
>
> If you need to wait for completion of already started callback,
> you need to use _sync version (e.g. cancel_delayed_work_sync()).
> And watch out for locking dependency when using _sync version.
>
> >> I wonder if wouldn't be better to introduce
> >> some function that disables/enables the workqueue so we don't have to
> >> do extra tracking in the driver/subsystem?
> >>
> >
> > I was thinking on doing something like the following:
> >
> > https://gist.github.com/Vudentz/a2288015fedbed366fcdb612264a9d16
>
> That patch does not close race, for
>
> @@ -4037,6 +4038,10 @@ static void hci_cmd_work(struct work_struct *work)
> BT_DBG("%s cmd_cnt %d cmd queued %d", hdev->name,
> atomic_read(&hdev->cmd_cnt), skb_queue_len(&hdev->cmd_q));
>
> + /* Don't queue while draining */
> + if (hci_dev_test_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE))
> + return;
> /*
> * BUG: WE ARE FREE TO SLEEP FOR ARBITRARY DURATION IMMEDIATELY AFTER CHECKING THE FLAG.
> * ANY "TEST AND DO SOMETHING" NEEDS TO BE PROTECTED BY A LOCK MECHANISM.
> */

Then we need a lock not a flag.

> /* Send queued commands */
> if (atomic_read(&hdev->cmd_cnt)) {
> skb = skb_dequeue(&hdev->cmd_q);
>
> . In other words, HCI_CMD_DRAIN_WORKQUEUE does not fix what extid=63bed493aebbf6872647 is reporting.
>
> If "TEST AND DO SOMETHING" does not sleep, RCU is a handy lock mechanism.
>
> >
> > Since there is no reason to queue any command if we are draining and
> > are gonna reset at the end it is pretty useless to queue commands at
> > that point.
>
> Then, you can add that check.
>


--
Luiz Augusto von Dentz

2022-09-04 02:28:09

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH] Bluetooth: use hdev->workqueue when queuing hdev->{cmd,ncmd}_timer works

Hello,

On Sat, Sep 03, 2022 at 07:11:18PM -0700, Luiz Augusto von Dentz wrote:
> And we can check for __WQ_DRAINING? Anyway checking

Please don't do that. That's an internal flag. It shouldn't be *that*
difficult to avoid this without peeking into wq internal state.

Thanks.

--
tejun

2022-09-05 08:33:40

by Schspa Shi

[permalink] [raw]
Subject: Re: [PATCH] Bluetooth: use hdev->workqueue when queuing hdev->{cmd,ncmd}_timer works


Tejun Heo <[email protected]> writes:

> Hello,
>
> On Sat, Sep 03, 2022 at 07:11:18PM -0700, Luiz Augusto von Dentz wrote:
>> And we can check for __WQ_DRAINING? Anyway checking
>
> Please don't do that. That's an internal flag. It shouldn't be *that*
> difficult to avoid this without peeking into wq internal state.
>
> Thanks.

It seems we only need to change hdev->{cmd,ncmd}_timer to
hdev->workqueue, there will be no race because drain_workqueue will
flush all pending work internally.
Any new timeout work will see HCI_CMD_DRAIN_WORKQUEUE flags after we
cancel and flushed all the delayed work.

--
BRs
Schspa Shi

2022-09-05 11:40:26

by Tetsuo Handa

[permalink] [raw]
Subject: Re: [PATCH] Bluetooth: use hdev->workqueue when queuing hdev->{cmd,ncmd}_timer works

On 2022/09/05 17:24, Schspa Shi wrote:
>
> Tejun Heo <[email protected]> writes:
>
>> Hello,
>>
>> On Sat, Sep 03, 2022 at 07:11:18PM -0700, Luiz Augusto von Dentz wrote:
>>> And we can check for __WQ_DRAINING? Anyway checking
>>
>> Please don't do that. That's an internal flag. It shouldn't be *that*
>> difficult to avoid this without peeking into wq internal state.
>>
>> Thanks.
>
> It seems we only need to change hdev->{cmd,ncmd}_timer to
> hdev->workqueue, there will be no race because drain_workqueue will
> flush all pending work internally.

True for queue_work(), not always true for queue_delayed_work(). Explained below.

> Any new timeout work will see HCI_CMD_DRAIN_WORKQUEUE flags after we
> cancel and flushed all the delayed work.
>

If you don't mind calling

queue_work(&hdev->cmd_work) followed by hci_cmd_work() (case A below)

and/or

queue_delayed_work(&hdev->ncmd_timer) potentially followed by hci_ncmd_timeout()/hci_reset_dev() (case B and C below)

after observing HCI_CMD_DRAIN_WORKQUEUE flag.
We need to use RCU protection if you mind one of these.



Case A:

hci_dev_do_reset() {
hci_cmd_work() {
if (test_bit(HCI_RESET, &hdev->flags) ||
hci_dev_test_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE))
cancel_delayed_work(&hdev->cmd_timer);
else
queue_delayed_work(hdev->workqueue, &hdev->cmd_timer, HCI_CMD_TIMEOUT);
} else {
skb_queue_head(&hdev->cmd_q, skb);
hci_dev_set_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE);
cancel_delayed_work(&hdev->cmd_timer);
cancel_delayed_work(&hdev->ncmd_timer);
queue_work(hdev->workqueue, &hdev->cmd_work); // Queuing after setting HCI_CMD_DRAIN_WORKQUEUE despite the intent of HCI_CMD_DRAIN_WORKQUEUE...
drain_workqueue(hdev->workqueue); // Will wait for hci_cmd_timeout() queued by queue_work() to complete.

}

// Actual flush() happens here.

hci_dev_clear_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE);
}



Case B:

hci_dev_do_reset() {
handle_cmd_cnt_and_timer(struct hci_dev *hdev, u8 ncmd) {
if (!hci_dev_test_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE))
hci_dev_set_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE);
cancel_delayed_work(&hdev->cmd_timer);
queue_delayed_work(hdev->workqueue, &hdev->ncmd_timer, HCI_NCMD_TIMEOUT);
cancel_delayed_work(&hdev->ncmd_timer); // May or may not cancel hci_ncmd_timeout() queued by queue_delayed_work().
drain_workqueue(hdev->workqueue); // Will wait for hci_ncmd_timeout() queued by queue_delayed_work() to complete if cancel_delayed_work() failed to cancel.

}

// Actual flush() happens here.

hci_dev_clear_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE);
}



Case C:

hci_dev_do_reset() {
handle_cmd_cnt_and_timer(struct hci_dev *hdev, u8 ncmd) {
if (!hci_dev_test_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE))
hci_dev_set_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE);
cancel_delayed_work(&hdev->cmd_timer);
cancel_delayed_work(&hdev->ncmd_timer); // Does nothing.
queue_delayed_work(hdev->workqueue, &hdev->ncmd_timer, HCI_NCMD_TIMEOUT);
drain_workqueue(hdev->workqueue); // Will wait for hci_ncmd_timeout() queued by queue_delayed_work() to complete if delay timer has expired.

}

// Actual flush() happens here, but hci_ncmd_timeout() queued by queue_delayed_work() can be running if delay timer has not expired as of calling drain_workqueue().

hci_dev_clear_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE);
}

2022-09-05 12:43:13

by Schspa Shi

[permalink] [raw]
Subject: Re: [PATCH] Bluetooth: use hdev->workqueue when queuing hdev->{cmd,ncmd}_timer works


Tetsuo Handa <[email protected]> writes:

> On 2022/09/05 17:24, Schspa Shi wrote:
>>
>> Tejun Heo <[email protected]> writes:
>>
>>> Hello,
>>>
>>> On Sat, Sep 03, 2022 at 07:11:18PM -0700, Luiz Augusto von Dentz wrote:
>>>> And we can check for __WQ_DRAINING? Anyway checking
>>>
>>> Please don't do that. That's an internal flag. It shouldn't be *that*
>>> difficult to avoid this without peeking into wq internal state.
>>>
>>> Thanks.
>>
>> It seems we only need to change hdev->{cmd,ncmd}_timer to
>> hdev->workqueue, there will be no race because drain_workqueue will
>> flush all pending work internally.
>
> True for queue_work(), not always true for queue_delayed_work(). Explained below.
>

Ok, you are right, got it now.

>> Any new timeout work will see HCI_CMD_DRAIN_WORKQUEUE flags after we
>> cancel and flushed all the delayed work.
>>
>
> If you don't mind calling
>
> queue_work(&hdev->cmd_work) followed by hci_cmd_work() (case A below)
>
> and/or
>
> queue_delayed_work(&hdev->ncmd_timer) potentially followed by hci_ncmd_timeout()/hci_reset_dev() (case B and C below)
>
> after observing HCI_CMD_DRAIN_WORKQUEUE flag.
> We need to use RCU protection if you mind one of these.
>
>
>
> Case A:
>
> hci_dev_do_reset() {
> hci_cmd_work() {
> if (test_bit(HCI_RESET, &hdev->flags) ||
> hci_dev_test_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE))
> cancel_delayed_work(&hdev->cmd_timer);
> else
> queue_delayed_work(hdev->workqueue, &hdev->cmd_timer, HCI_CMD_TIMEOUT);
> } else {
> skb_queue_head(&hdev->cmd_q, skb);
> hci_dev_set_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE);
> cancel_delayed_work(&hdev->cmd_timer);
> cancel_delayed_work(&hdev->ncmd_timer);
> queue_work(hdev->workqueue, &hdev->cmd_work); // Queuing after setting HCI_CMD_DRAIN_WORKQUEUE despite the intent of HCI_CMD_DRAIN_WORKQUEUE...
> drain_workqueue(hdev->workqueue); // Will wait for hci_cmd_timeout() queued by queue_work() to complete.
>
> }
>
> // Actual flush() happens here.
>
> hci_dev_clear_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE);
> }
>
>
>
> Case B:
>
> hci_dev_do_reset() {
> handle_cmd_cnt_and_timer(struct hci_dev *hdev, u8 ncmd) {
> if (!hci_dev_test_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE))
> hci_dev_set_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE);
> cancel_delayed_work(&hdev->cmd_timer);
> queue_delayed_work(hdev->workqueue, &hdev->ncmd_timer, HCI_NCMD_TIMEOUT);
> cancel_delayed_work(&hdev->ncmd_timer); // May or may not cancel hci_ncmd_timeout() queued by queue_delayed_work().
> drain_workqueue(hdev->workqueue); // Will wait for hci_ncmd_timeout() queued by queue_delayed_work() to complete if cancel_delayed_work() failed to cancel.
>
> }
>
> // Actual flush() happens here.
>
> hci_dev_clear_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE);
> }
>
>
>
> Case C:
>
> hci_dev_do_reset() {
> handle_cmd_cnt_and_timer(struct hci_dev *hdev, u8 ncmd) {
> if (!hci_dev_test_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE))
> hci_dev_set_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE);
> cancel_delayed_work(&hdev->cmd_timer);
> cancel_delayed_work(&hdev->ncmd_timer); // Does nothing.
> queue_delayed_work(hdev->workqueue, &hdev->ncmd_timer, HCI_NCMD_TIMEOUT);
> drain_workqueue(hdev->workqueue); // Will wait for hci_ncmd_timeout() queued by queue_delayed_work() to complete if delay timer has expired.
>
> }
>
> // Actual flush() happens here, but hci_ncmd_timeout() queued by queue_delayed_work() can be running if delay timer has not expired as of calling drain_workqueue().
>
> hci_dev_clear_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE);
> }

--
BRs
Schspa Shi

2022-09-19 17:46:24

by patchwork-bot+bluetooth

[permalink] [raw]
Subject: Re: [PATCH] Bluetooth: use hdev->workqueue when queuing hdev->{cmd,ncmd}_timer works

Hello:

This patch was applied to bluetooth/bluetooth-next.git (master)
by Luiz Augusto von Dentz <[email protected]>:

On Fri, 2 Sep 2022 20:23:48 +0900 you wrote:
> syzbot is reporting attempt to schedule hdev->cmd_work work from system_wq
> WQ into hdev->workqueue WQ which is under draining operation [1], for
> commit c8efcc2589464ac7 ("workqueue: allow chained queueing during
> destruction") does not allow such operation.
>
> The check introduced by commit 877afadad2dce8aa ("Bluetooth: When HCI work
> queue is drained, only queue chained work") was incomplete.
>
> [...]

Here is the summary with links:
- Bluetooth: use hdev->workqueue when queuing hdev->{cmd,ncmd}_timer works
https://git.kernel.org/bluetooth/bluetooth-next/c/deee93d13d38

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html