From: Ganapathi Bhat <[email protected]>
We came across a corner case issue during reboot stress test
in which hciconfig shows BD address is all zero. Reason is we
don't get response for HCI RESET command during initialization
The issue is tracked to a race where USB subsystem calls
btusb_intr_complete() to deliver a data(NOOP frame) received
on interrupt endpoint. HCI_RUNNING flag is not yet set by
bluetooth subsystem. So we ignore that frame and return.
As we missed to resubmit the buffer to interrupt endpoint in
this case, we don't get response for BT reset command downloaded
after this.
This patch handles the corner case to resolve zero BD address
problem.
Signed-off-by: Ganapathi Bhat <[email protected]>
Signed-off-by: Amitkumar Karwar <[email protected]>
---
drivers/bluetooth/btusb.c | 5 +----
1 file changed, 1 insertion(+), 4 deletions(-)
diff --git a/drivers/bluetooth/btusb.c b/drivers/bluetooth/btusb.c
index 811f9b9..b5596ac 100644
--- a/drivers/bluetooth/btusb.c
+++ b/drivers/bluetooth/btusb.c
@@ -607,10 +607,7 @@ static void btusb_intr_complete(struct urb *urb)
BT_DBG("%s urb %p status %d count %d", hdev->name, urb, urb->status,
urb->actual_length);
- if (!test_bit(HCI_RUNNING, &hdev->flags))
- return;
-
- if (urb->status == 0) {
+ if (urb->status == 0 && test_bit(HCI_RUNNING, &hdev->flags)) {
hdev->stat.byte_rx += urb->actual_length;
if (btusb_recv_intr(data, urb->transfer_buffer,
--
1.9.1
Hi Marcel,
> From: Amitkumar Karwar [mailto:[email protected]]
> Sent: Tuesday, October 18, 2016 6:27 PM
> To: [email protected]
> Cc: [email protected]; [email protected]; Cathy Luo;
> Nishant Sarmukadam; Ganapathi Bhat; Amitkumar Karwar
> Subject: [PATCH] btusb: fix zero BD address problem during stress test
>
> From: Ganapathi Bhat <[email protected]>
>
> We came across a corner case issue during reboot stress test in which
> hciconfig shows BD address is all zero. Reason is we don't get response
> for HCI RESET command during initialization
>
> The issue is tracked to a race where USB subsystem calls
> btusb_intr_complete() to deliver a data(NOOP frame) received on
> interrupt endpoint. HCI_RUNNING flag is not yet set by bluetooth
> subsystem. So we ignore that frame and return.
>
> As we missed to resubmit the buffer to interrupt endpoint in this case,
> we don't get response for BT reset command downloaded after this.
>
> This patch handles the corner case to resolve zero BD address problem.
>
> Signed-off-by: Ganapathi Bhat <[email protected]>
> Signed-off-by: Amitkumar Karwar <[email protected]>
> ---
> drivers/bluetooth/btusb.c | 5 +----
> 1 file changed, 1 insertion(+), 4 deletions(-)
>
> diff --git a/drivers/bluetooth/btusb.c b/drivers/bluetooth/btusb.c
> index 811f9b9..b5596ac 100644
> --- a/drivers/bluetooth/btusb.c
> +++ b/drivers/bluetooth/btusb.c
> @@ -607,10 +607,7 @@ static void btusb_intr_complete(struct urb *urb)
> BT_DBG("%s urb %p status %d count %d", hdev->name, urb, urb-
> >status,
> urb->actual_length);
>
> - if (!test_bit(HCI_RUNNING, &hdev->flags))
> - return;
> -
> - if (urb->status == 0) {
> + if (urb->status == 0 && test_bit(HCI_RUNNING, &hdev->flags)) {
> hdev->stat.byte_rx += urb->actual_length;
>
> if (btusb_recv_intr(data, urb->transfer_buffer,
Did you get a chance to check this?
Please let us know if you have any review comments.
Regards,
Amitkumar
Hi Amitkumar,
>> From: Amitkumar Karwar [mailto:[email protected]]
>> Sent: Tuesday, October 18, 2016 6:27 PM
>> To: [email protected]
>> Cc: [email protected]; [email protected]; Cathy Luo;
>> Nishant Sarmukadam; Ganapathi Bhat; Amitkumar Karwar
>> Subject: [PATCH] btusb: fix zero BD address problem during stress test
>>
>> From: Ganapathi Bhat <[email protected]>
>>
>> We came across a corner case issue during reboot stress test in which
>> hciconfig shows BD address is all zero. Reason is we don't get response
>> for HCI RESET command during initialization
>>
>> The issue is tracked to a race where USB subsystem calls
>> btusb_intr_complete() to deliver a data(NOOP frame) received on
>> interrupt endpoint. HCI_RUNNING flag is not yet set by bluetooth
>> subsystem. So we ignore that frame and return.
>>
>> As we missed to resubmit the buffer to interrupt endpoint in this case,
>> we don't get response for BT reset command downloaded after this.
>>
>> This patch handles the corner case to resolve zero BD address problem.
>>
>> Signed-off-by: Ganapathi Bhat <[email protected]>
>> Signed-off-by: Amitkumar Karwar <[email protected]>
>> ---
>> drivers/bluetooth/btusb.c | 5 +----
>> 1 file changed, 1 insertion(+), 4 deletions(-)
>>
>> diff --git a/drivers/bluetooth/btusb.c b/drivers/bluetooth/btusb.c
>> index 811f9b9..b5596ac 100644
>> --- a/drivers/bluetooth/btusb.c
>> +++ b/drivers/bluetooth/btusb.c
>> @@ -607,10 +607,7 @@ static void btusb_intr_complete(struct urb *urb)
>> BT_DBG("%s urb %p status %d count %d", hdev->name, urb, urb-
>>> status,
>> urb->actual_length);
>>
>> - if (!test_bit(HCI_RUNNING, &hdev->flags))
>> - return;
>> -
>> - if (urb->status == 0) {
>> + if (urb->status == 0 && test_bit(HCI_RUNNING, &hdev->flags)) {
>> hdev->stat.byte_rx += urb->actual_length;
>>
>> if (btusb_recv_intr(data, urb->transfer_buffer,
>
> Did you get a chance to check this?
> Please let us know if you have any review comments.
can you explain how this is correct and show me the HCI traces for this.
Regards
Marcel
Hi Marcel,
> From: Marcel Holtmann [mailto:[email protected]]
> Sent: Wednesday, November 23, 2016 3:25 AM
> To: Amitkumar Karwar
> Cc: [email protected]; [email protected];
> Cathy Luo; Nishant Sarmukadam; Ganapathi Bhat
> Subject: Re: [PATCH] btusb: fix zero BD address problem during stress
> test
>
> Hi Amitkumar,
>
> >> From: Amitkumar Karwar [mailto:[email protected]]
> >> Sent: Tuesday, October 18, 2016 6:27 PM
> >> To: [email protected]
> >> Cc: [email protected]; [email protected]; Cathy Luo;
> >> Nishant Sarmukadam; Ganapathi Bhat; Amitkumar Karwar
> >> Subject: [PATCH] btusb: fix zero BD address problem during stress
> >> test
> >>
> >> From: Ganapathi Bhat <[email protected]>
> >>
> >> We came across a corner case issue during reboot stress test in
> which
> >> hciconfig shows BD address is all zero. Reason is we don't get
> >> response for HCI RESET command during initialization
> >>
> >> The issue is tracked to a race where USB subsystem calls
> >> btusb_intr_complete() to deliver a data(NOOP frame) received on
> >> interrupt endpoint. HCI_RUNNING flag is not yet set by bluetooth
> >> subsystem. So we ignore that frame and return.
> >>
> >> As we missed to resubmit the buffer to interrupt endpoint in this
> >> case, we don't get response for BT reset command downloaded after
> this.
> >>
> >> This patch handles the corner case to resolve zero BD address
> problem.
> >>
> >> Signed-off-by: Ganapathi Bhat <[email protected]>
> >> Signed-off-by: Amitkumar Karwar <[email protected]>
> >> ---
> >> drivers/bluetooth/btusb.c | 5 +----
> >> 1 file changed, 1 insertion(+), 4 deletions(-)
> >>
> >> diff --git a/drivers/bluetooth/btusb.c b/drivers/bluetooth/btusb.c
> >> index 811f9b9..b5596ac 100644
> >> --- a/drivers/bluetooth/btusb.c
> >> +++ b/drivers/bluetooth/btusb.c
> >> @@ -607,10 +607,7 @@ static void btusb_intr_complete(struct urb
> *urb)
> >> BT_DBG("%s urb %p status %d count %d", hdev->name, urb, urb-
> >>> status,
> >> urb->actual_length);
> >>
> >> - if (!test_bit(HCI_RUNNING, &hdev->flags))
> >> - return;
> >> -
> >> - if (urb->status == 0) {
> >> + if (urb->status == 0 && test_bit(HCI_RUNNING, &hdev->flags)) {
> >> hdev->stat.byte_rx += urb->actual_length;
> >>
> >> if (btusb_recv_intr(data, urb->transfer_buffer,
> >
> > Did you get a chance to check this?
> > Please let us know if you have any review comments.
>
> can you explain how this is correct and show me the HCI traces for
> this.
>
I suppose HCI trace means hcidump logs here. As device hasn't yet initialized, hcidump won't show anything.
We had added debug info in btusb driver to trace the data received on all USB endpoints and also checked usbmon logs.
Here is the sequence of events we observed in a corner case while running stress test.
1) Inside btusb_open() call ------ Thread 1
2) btusb_submit_intr_urb() submits the URB for receiving data on interrupt endpoint ---- Thread 1
3) btusb_intr_complete() gets called to deliver NOP frame from HCI controller ---- Thread 2
4) HCI_RUNNING isn't set yet. So we return from btusb_intr_complete() without resubmitting the buffer --- Thread 2
5) Exit btusb_open() ---- Thread 1
6) "set_bit(HCI_RUNNING, &hdev->flags)" done by bluetooth core ---- Thread 1
Later HCI_RESET command gets timedout, as we haven't re-submitted buffer for interrupt endpoint in step (4) above.
Please find attached logs. usbmon log shows first frame received on interrupt endpoint is NOP(Search for Marvell in log).
Here is what bluetooth spec says about NOP frame.
"To indicate to the Host that the Controller is ready to receive HCI command packets,
the Controller generates a Command Status event with Status 0x00 and
Command_Opcode 0x0000, and the Num_HCI_Command_Packets event
parameter is set to 1 or more. Command_Opcode, 0x0000 is a NOP (No Operation)"
Regards,
Amitkumar
Hi Amitkumar,
>>>> From: Amitkumar Karwar [mailto:[email protected]]
>>>> Sent: Tuesday, October 18, 2016 6:27 PM
>>>> To: [email protected]
>>>> Cc: [email protected]; [email protected]; Cathy Luo;
>>>> Nishant Sarmukadam; Ganapathi Bhat; Amitkumar Karwar
>>>> Subject: [PATCH] btusb: fix zero BD address problem during stress
>>>> test
>>>>
>>>> From: Ganapathi Bhat <[email protected]>
>>>>
>>>> We came across a corner case issue during reboot stress test in
>> which
>>>> hciconfig shows BD address is all zero. Reason is we don't get
>>>> response for HCI RESET command during initialization
>>>>
>>>> The issue is tracked to a race where USB subsystem calls
>>>> btusb_intr_complete() to deliver a data(NOOP frame) received on
>>>> interrupt endpoint. HCI_RUNNING flag is not yet set by bluetooth
>>>> subsystem. So we ignore that frame and return.
>>>>
>>>> As we missed to resubmit the buffer to interrupt endpoint in this
>>>> case, we don't get response for BT reset command downloaded after
>> this.
>>>>
>>>> This patch handles the corner case to resolve zero BD address
>> problem.
>>>>
>>>> Signed-off-by: Ganapathi Bhat <[email protected]>
>>>> Signed-off-by: Amitkumar Karwar <[email protected]>
>>>> ---
>>>> drivers/bluetooth/btusb.c | 5 +----
>>>> 1 file changed, 1 insertion(+), 4 deletions(-)
>>>>
>>>> diff --git a/drivers/bluetooth/btusb.c b/drivers/bluetooth/btusb.c
>>>> index 811f9b9..b5596ac 100644
>>>> --- a/drivers/bluetooth/btusb.c
>>>> +++ b/drivers/bluetooth/btusb.c
>>>> @@ -607,10 +607,7 @@ static void btusb_intr_complete(struct urb
>> *urb)
>>>> BT_DBG("%s urb %p status %d count %d", hdev->name, urb, urb-
>>>>> status,
>>>> urb->actual_length);
>>>>
>>>> - if (!test_bit(HCI_RUNNING, &hdev->flags))
>>>> - return;
>>>> -
>>>> - if (urb->status == 0) {
>>>> + if (urb->status == 0 && test_bit(HCI_RUNNING, &hdev->flags)) {
>>>> hdev->stat.byte_rx += urb->actual_length;
>>>>
>>>> if (btusb_recv_intr(data, urb->transfer_buffer,
>>>
>>> Did you get a chance to check this?
>>> Please let us know if you have any review comments.
>>
>> can you explain how this is correct and show me the HCI traces for
>> this.
>>
>
> I suppose HCI trace means hcidump logs here. As device hasn't yet initialized, hcidump won't show anything.
> We had added debug info in btusb driver to trace the data received on all USB endpoints and also checked usbmon logs.
use btmon and it will show it.
> Here is the sequence of events we observed in a corner case while running stress test.
> 1) Inside btusb_open() call ------ Thread 1
> 2) btusb_submit_intr_urb() submits the URB for receiving data on interrupt endpoint ---- Thread 1
> 3) btusb_intr_complete() gets called to deliver NOP frame from HCI controller ---- Thread 2
> 4) HCI_RUNNING isn't set yet. So we return from btusb_intr_complete() without resubmitting the buffer --- Thread 2
> 5) Exit btusb_open() ---- Thread 1
> 6) "set_bit(HCI_RUNNING, &hdev->flags)" done by bluetooth core ---- Thread 1
>
> Later HCI_RESET command gets timedout, as we haven't re-submitted buffer for interrupt endpoint in step (4) above.
>
> Please find attached logs. usbmon log shows first frame received on interrupt endpoint is NOP(Search for Marvell in log).
>
> Here is what bluetooth spec says about NOP frame.
>
> "To indicate to the Host that the Controller is ready to receive HCI command packets,
> the Controller generates a Command Status event with Status 0x00 and
> Command_Opcode 0x0000, and the Num_HCI_Command_Packets event
> parameter is set to 1 or more. Command_Opcode, 0x0000 is a NOP (No Operation)”
So I wonder if we need to remove the HCI_RUNNING logic from the drivers. It is only left in a few USB drivers and I removed the rest and moved it into the core. I am not in favour of papering over this issue. I need to understand what is wrong. And actually HCI_RUNNING needs to be taken away from the drivers. So the question is if btusb.c still needs it or if that is just a leftover. Meaning is it protecting anything?
Regards
Marcel
Hi Marcel,
> From: Marcel Holtmann [mailto:[email protected]]
> Sent: Wednesday, November 23, 2016 1:46 PM
> To: Amitkumar Karwar
> Cc: [email protected]; [email protected];
> Cathy Luo; Nishant Sarmukadam; Ganapathi Bhat
> Subject: Re: [PATCH] btusb: fix zero BD address problem during stress
> test
>
> Hi Amitkumar,
>
> >>>> From: Amitkumar Karwar [mailto:[email protected]]
> >>>> Sent: Tuesday, October 18, 2016 6:27 PM
> >>>> To: [email protected]
> >>>> Cc: [email protected]; [email protected]; Cathy Luo;
> >>>> Nishant Sarmukadam; Ganapathi Bhat; Amitkumar Karwar
> >>>> Subject: [PATCH] btusb: fix zero BD address problem during stress
> >>>> test
> >>>>
> >>>> From: Ganapathi Bhat <[email protected]>
> >>>>
> >>>> We came across a corner case issue during reboot stress test in
> >> which
> >>>> hciconfig shows BD address is all zero. Reason is we don't get
> >>>> response for HCI RESET command during initialization
> >>>>
> >>>> The issue is tracked to a race where USB subsystem calls
> >>>> btusb_intr_complete() to deliver a data(NOOP frame) received on
> >>>> interrupt endpoint. HCI_RUNNING flag is not yet set by bluetooth
> >>>> subsystem. So we ignore that frame and return.
> >>>>
> >>>> As we missed to resubmit the buffer to interrupt endpoint in this
> >>>> case, we don't get response for BT reset command downloaded after
> >> this.
> >>>>
> >>>> This patch handles the corner case to resolve zero BD address
> >> problem.
> >>>>
> >>>> Signed-off-by: Ganapathi Bhat <[email protected]>
> >>>> Signed-off-by: Amitkumar Karwar <[email protected]>
> >>>> ---
> >>>> drivers/bluetooth/btusb.c | 5 +----
> >>>> 1 file changed, 1 insertion(+), 4 deletions(-)
> >>>>
> >>>> diff --git a/drivers/bluetooth/btusb.c b/drivers/bluetooth/btusb.c
> >>>> index 811f9b9..b5596ac 100644
> >>>> --- a/drivers/bluetooth/btusb.c
> >>>> +++ b/drivers/bluetooth/btusb.c
> >>>> @@ -607,10 +607,7 @@ static void btusb_intr_complete(struct urb
> >> *urb)
> >>>> BT_DBG("%s urb %p status %d count %d", hdev->name, urb, urb-
> >>>>> status,
> >>>> urb->actual_length);
> >>>>
> >>>> - if (!test_bit(HCI_RUNNING, &hdev->flags))
> >>>> - return;
> >>>> -
> >>>> - if (urb->status == 0) {
> >>>> + if (urb->status == 0 && test_bit(HCI_RUNNING, &hdev-
> >flags)) {
> >>>> hdev->stat.byte_rx += urb->actual_length;
> >>>>
> >>>> if (btusb_recv_intr(data, urb->transfer_buffer,
> >>>
> >>> Did you get a chance to check this?
> >>> Please let us know if you have any review comments.
> >>
> >> can you explain how this is correct and show me the HCI traces for
> >> this.
> >>
> >
> > I suppose HCI trace means hcidump logs here. As device hasn't yet
> initialized, hcidump won't show anything.
> > We had added debug info in btusb driver to trace the data received on
> all USB endpoints and also checked usbmon logs.
>
> use btmon and it will show it.
>
> > Here is the sequence of events we observed in a corner case while
> running stress test.
> > 1) Inside btusb_open() call ------ Thread 1
> > 2) btusb_submit_intr_urb() submits the URB for receiving data on
> > interrupt endpoint ---- Thread 1
> > 3) btusb_intr_complete() gets called to deliver NOP frame from HCI
> > controller ---- Thread 2
> > 4) HCI_RUNNING isn't set yet. So we return from btusb_intr_complete()
> > without resubmitting the buffer --- Thread 2
> > 5) Exit btusb_open() ---- Thread 1
> > 6) "set_bit(HCI_RUNNING, &hdev->flags)" done by bluetooth core ----
> > Thread 1
> >
> > Later HCI_RESET command gets timedout, as we haven't re-submitted
> buffer for interrupt endpoint in step (4) above.
> >
> > Please find attached logs. usbmon log shows first frame received on
> interrupt endpoint is NOP(Search for Marvell in log).
> >
> > Here is what bluetooth spec says about NOP frame.
> >
> > "To indicate to the Host that the Controller is ready to receive HCI
> > command packets, the Controller generates a Command Status event with
> > Status 0x00 and Command_Opcode 0x0000, and the
> Num_HCI_Command_Packets
> > event parameter is set to 1 or more. Command_Opcode, 0x0000 is a NOP
> (No Operation)”
>
> So I wonder if we need to remove the HCI_RUNNING logic from the
> drivers. It is only left in a few USB drivers and I removed the rest
> and moved it into the core. I am not in favour of papering over this
> issue. I need to understand what is wrong. And actually HCI_RUNNING
> needs to be taken away from the drivers. So the question is if btusb.c
> still needs it or if that is just a leftover. Meaning is it protecting
> anything?
>
I can see HCI_RUNNING check being used at below places in btusb.c
1) btusb_intr_complete
2) btusb_bulk_complete
3) btusb_isoc_complete
4) btusb_tx_complete
5) btusb_isoc_tx_complete
6) btusb_bcm_set_diag
7) btusb_resume
I think, it can be removed from (1), (2) and (3), as we already have below check inside hci_recv_frame()
if (!hdev || (!test_bit(HCI_UP, &hdev->flags)
&& !test_bit(HCI_INIT, &hdev->flags))) {
kfree_skb(skb);
return -ENXIO;
}
For (4) and (5), it's just used to decide if we should update "hdev->stat" or not.
Regards,
Amitkumar
Hi Marcel,
> From: Amitkumar Karwar
> Sent: Wednesday, November 23, 2016 3:14 PM
> To: 'Marcel Holtmann'
> Cc: [email protected]; [email protected];
> Cathy Luo; Nishant Sarmukadam; Ganapathi Bhat
> Subject: RE: [PATCH] btusb: fix zero BD address problem during stress
> test
>
> Hi Marcel,
>
> > From: Marcel Holtmann [mailto:[email protected]]
> > Sent: Wednesday, November 23, 2016 1:46 PM
> > To: Amitkumar Karwar
> > Cc: [email protected]; [email protected];
> > Cathy Luo; Nishant Sarmukadam; Ganapathi Bhat
> > Subject: Re: [PATCH] btusb: fix zero BD address problem during stress
> > test
> >
> > Hi Amitkumar,
> >
> > >>>> From: Amitkumar Karwar [mailto:[email protected]]
> > >>>> Sent: Tuesday, October 18, 2016 6:27 PM
> > >>>> To: [email protected]
> > >>>> Cc: [email protected]; [email protected]; Cathy
> Luo;
> > >>>> Nishant Sarmukadam; Ganapathi Bhat; Amitkumar Karwar
> > >>>> Subject: [PATCH] btusb: fix zero BD address problem during
> stress
> > >>>> test
> > >>>>
> > >>>> From: Ganapathi Bhat <[email protected]>
> > >>>>
> > >>>> We came across a corner case issue during reboot stress test in
> > >> which
> > >>>> hciconfig shows BD address is all zero. Reason is we don't get
> > >>>> response for HCI RESET command during initialization
> > >>>>
> > >>>> The issue is tracked to a race where USB subsystem calls
> > >>>> btusb_intr_complete() to deliver a data(NOOP frame) received on
> > >>>> interrupt endpoint. HCI_RUNNING flag is not yet set by bluetooth
> > >>>> subsystem. So we ignore that frame and return.
> > >>>>
> > >>>> As we missed to resubmit the buffer to interrupt endpoint in
> this
> > >>>> case, we don't get response for BT reset command downloaded
> after
> > >> this.
> > >>>>
> > >>>> This patch handles the corner case to resolve zero BD address
> > >> problem.
> > >>>>
> > >>>> Signed-off-by: Ganapathi Bhat <[email protected]>
> > >>>> Signed-off-by: Amitkumar Karwar <[email protected]>
> > >>>> ---
> > >>>> drivers/bluetooth/btusb.c | 5 +----
> > >>>> 1 file changed, 1 insertion(+), 4 deletions(-)
> > >>>>
> > >>>> diff --git a/drivers/bluetooth/btusb.c
> > >>>> b/drivers/bluetooth/btusb.c index 811f9b9..b5596ac 100644
> > >>>> --- a/drivers/bluetooth/btusb.c
> > >>>> +++ b/drivers/bluetooth/btusb.c
> > >>>> @@ -607,10 +607,7 @@ static void btusb_intr_complete(struct urb
> > >> *urb)
> > >>>> BT_DBG("%s urb %p status %d count %d", hdev->name, urb,
> urb-
> > >>>>> status,
> > >>>> urb->actual_length);
> > >>>>
> > >>>> - if (!test_bit(HCI_RUNNING, &hdev->flags))
> > >>>> - return;
> > >>>> -
> > >>>> - if (urb->status == 0) {
> > >>>> + if (urb->status == 0 && test_bit(HCI_RUNNING, &hdev-
> > >flags)) {
> > >>>> hdev->stat.byte_rx += urb->actual_length;
> > >>>>
> > >>>> if (btusb_recv_intr(data, urb->transfer_buffer,
> > >>>
> > >>> Did you get a chance to check this?
> > >>> Please let us know if you have any review comments.
> > >>
> > >> can you explain how this is correct and show me the HCI traces for
> > >> this.
> > >>
> > >
> > > I suppose HCI trace means hcidump logs here. As device hasn't yet
> > initialized, hcidump won't show anything.
> > > We had added debug info in btusb driver to trace the data received
> > > on
> > all USB endpoints and also checked usbmon logs.
> >
> > use btmon and it will show it.
> >
> > > Here is the sequence of events we observed in a corner case while
> > running stress test.
> > > 1) Inside btusb_open() call ------ Thread 1
> > > 2) btusb_submit_intr_urb() submits the URB for receiving data on
> > > interrupt endpoint ---- Thread 1
> > > 3) btusb_intr_complete() gets called to deliver NOP frame from HCI
> > > controller ---- Thread 2
> > > 4) HCI_RUNNING isn't set yet. So we return from
> > > btusb_intr_complete() without resubmitting the buffer --- Thread 2
> > > 5) Exit btusb_open() ---- Thread 1
> > > 6) "set_bit(HCI_RUNNING, &hdev->flags)" done by bluetooth core ----
> > > Thread 1
> > >
> > > Later HCI_RESET command gets timedout, as we haven't re-submitted
> > buffer for interrupt endpoint in step (4) above.
> > >
> > > Please find attached logs. usbmon log shows first frame received on
> > interrupt endpoint is NOP(Search for Marvell in log).
> > >
> > > Here is what bluetooth spec says about NOP frame.
> > >
> > > "To indicate to the Host that the Controller is ready to receive
> HCI
> > > command packets, the Controller generates a Command Status event
> > > with Status 0x00 and Command_Opcode 0x0000, and the
> > Num_HCI_Command_Packets
> > > event parameter is set to 1 or more. Command_Opcode, 0x0000 is a
> NOP
> > (No Operation)”
> >
> > So I wonder if we need to remove the HCI_RUNNING logic from the
> > drivers. It is only left in a few USB drivers and I removed the rest
> > and moved it into the core. I am not in favour of papering over this
> > issue. I need to understand what is wrong. And actually HCI_RUNNING
> > needs to be taken away from the drivers. So the question is if
> btusb.c
> > still needs it or if that is just a leftover. Meaning is it
> protecting
> > anything?
> >
>
> I can see HCI_RUNNING check being used at below places in btusb.c
>
> 1) btusb_intr_complete
> 2) btusb_bulk_complete
> 3) btusb_isoc_complete
> 4) btusb_tx_complete
> 5) btusb_isoc_tx_complete
> 6) btusb_bcm_set_diag
> 7) btusb_resume
>
> I think, it can be removed from (1), (2) and (3), as we already have
> below check inside hci_recv_frame()
>
> if (!hdev || (!test_bit(HCI_UP, &hdev->flags)
> && !test_bit(HCI_INIT, &hdev->flags))) {
> kfree_skb(skb);
> return -ENXIO;
> }
>
> For (4) and (5), it's just used to decide if we should update "hdev-
> >stat" or not.
>
Please guide on how should we fix this issue. We observe timeout for HCI_RESET command sometimes when system is booted. This happens due to missing urb_submitt() when first frame(NOP) received on INT endpoint is ignored. Below change would also help.
--- a/net/bluetooth/hci_core.c
+++ b/net/bluetooth/hci_core.c
@@ -1318,12 +1318,13 @@ static int hci_dev_do_open(struct hci_dev *hdev)
goto done;
}
+ set_bit(HCI_RUNNING, &hdev->flags);
if (hdev->open(hdev)) {
+ clear_bit(HCI_RUNNING, &hdev->flags);
ret = -EIO;
goto done;
}
- set_bit(HCI_RUNNING, &hdev->flags);
hci_sock_dev_event(hdev, HCI_DEV_OPEN);
Regards,
Amitkumar
Hi Marcel,
> From: Amitkumar Karwar
> Sent: Friday, November 25, 2016 4:51 PM
> To: 'Marcel Holtmann'
> Cc: '[email protected]'; '[email protected]';
> Cathy Luo; Nishant Sarmukadam; Ganapathi Bhat
> Subject: RE: [PATCH] btusb: fix zero BD address problem during stress
> test
>
> Hi Marcel,
>
> > From: Amitkumar Karwar
> > Sent: Wednesday, November 23, 2016 3:14 PM
> > To: 'Marcel Holtmann'
> > Cc: [email protected]; [email protected];
> > Cathy Luo; Nishant Sarmukadam; Ganapathi Bhat
> > Subject: RE: [PATCH] btusb: fix zero BD address problem during stress
> > test
> >
> > Hi Marcel,
> >
> > > From: Marcel Holtmann [mailto:[email protected]]
> > > Sent: Wednesday, November 23, 2016 1:46 PM
> > > To: Amitkumar Karwar
> > > Cc: [email protected]; [email protected];
> > > Cathy Luo; Nishant Sarmukadam; Ganapathi Bhat
> > > Subject: Re: [PATCH] btusb: fix zero BD address problem during
> > > stress test
> > >
> > > Hi Amitkumar,
> > >
> > > >>>> From: Amitkumar Karwar [mailto:[email protected]]
> > > >>>> Sent: Tuesday, October 18, 2016 6:27 PM
> > > >>>> To: [email protected]
> > > >>>> Cc: [email protected]; [email protected]; Cathy
> > Luo;
> > > >>>> Nishant Sarmukadam; Ganapathi Bhat; Amitkumar Karwar
> > > >>>> Subject: [PATCH] btusb: fix zero BD address problem during
> > stress
> > > >>>> test
> > > >>>>
> > > >>>> From: Ganapathi Bhat <[email protected]>
> > > >>>>
> > > >>>> We came across a corner case issue during reboot stress test
> in
> > > >> which
> > > >>>> hciconfig shows BD address is all zero. Reason is we don't get
> > > >>>> response for HCI RESET command during initialization
> > > >>>>
> > > >>>> The issue is tracked to a race where USB subsystem calls
> > > >>>> btusb_intr_complete() to deliver a data(NOOP frame) received
> on
> > > >>>> interrupt endpoint. HCI_RUNNING flag is not yet set by
> > > >>>> bluetooth subsystem. So we ignore that frame and return.
> > > >>>>
> > > >>>> As we missed to resubmit the buffer to interrupt endpoint in
> > this
> > > >>>> case, we don't get response for BT reset command downloaded
> > after
> > > >> this.
> > > >>>>
> > > >>>> This patch handles the corner case to resolve zero BD address
> > > >> problem.
> > > >>>>
> > > >>>> Signed-off-by: Ganapathi Bhat <[email protected]>
> > > >>>> Signed-off-by: Amitkumar Karwar <[email protected]>
> > > >>>> ---
> > > >>>> drivers/bluetooth/btusb.c | 5 +----
> > > >>>> 1 file changed, 1 insertion(+), 4 deletions(-)
> > > >>>>
> > > >>>> diff --git a/drivers/bluetooth/btusb.c
> > > >>>> b/drivers/bluetooth/btusb.c index 811f9b9..b5596ac 100644
> > > >>>> --- a/drivers/bluetooth/btusb.c
> > > >>>> +++ b/drivers/bluetooth/btusb.c
> > > >>>> @@ -607,10 +607,7 @@ static void btusb_intr_complete(struct
> urb
> > > >> *urb)
> > > >>>> BT_DBG("%s urb %p status %d count %d", hdev->name, urb,
> > urb-
> > > >>>>> status,
> > > >>>> urb->actual_length);
> > > >>>>
> > > >>>> - if (!test_bit(HCI_RUNNING, &hdev->flags))
> > > >>>> - return;
> > > >>>> -
> > > >>>> - if (urb->status == 0) {
> > > >>>> + if (urb->status == 0 && test_bit(HCI_RUNNING, &hdev-
> > > >flags)) {
> > > >>>> hdev->stat.byte_rx += urb->actual_length;
> > > >>>>
> > > >>>> if (btusb_recv_intr(data, urb->transfer_buffer,
> > > >>>
> > > >>> Did you get a chance to check this?
> > > >>> Please let us know if you have any review comments.
> > > >>
> > > >> can you explain how this is correct and show me the HCI traces
> > > >> for this.
> > > >>
> > > >
> > > > I suppose HCI trace means hcidump logs here. As device hasn't yet
> > > initialized, hcidump won't show anything.
> > > > We had added debug info in btusb driver to trace the data
> received
> > > > on
> > > all USB endpoints and also checked usbmon logs.
> > >
> > > use btmon and it will show it.
> > >
> > > > Here is the sequence of events we observed in a corner case while
> > > running stress test.
> > > > 1) Inside btusb_open() call ------ Thread 1
> > > > 2) btusb_submit_intr_urb() submits the URB for receiving data on
> > > > interrupt endpoint ---- Thread 1
> > > > 3) btusb_intr_complete() gets called to deliver NOP frame from
> HCI
> > > > controller ---- Thread 2
> > > > 4) HCI_RUNNING isn't set yet. So we return from
> > > > btusb_intr_complete() without resubmitting the buffer --- Thread
> 2
> > > > 5) Exit btusb_open() ---- Thread 1
> > > > 6) "set_bit(HCI_RUNNING, &hdev->flags)" done by bluetooth core
> > > > ---- Thread 1
> > > >
> > > > Later HCI_RESET command gets timedout, as we haven't re-submitted
> > > buffer for interrupt endpoint in step (4) above.
> > > >
> > > > Please find attached logs. usbmon log shows first frame received
> > > > on
> > > interrupt endpoint is NOP(Search for Marvell in log).
> > > >
> > > > Here is what bluetooth spec says about NOP frame.
> > > >
> > > > "To indicate to the Host that the Controller is ready to receive
> > HCI
> > > > command packets, the Controller generates a Command Status event
> > > > with Status 0x00 and Command_Opcode 0x0000, and the
> > > Num_HCI_Command_Packets
> > > > event parameter is set to 1 or more. Command_Opcode, 0x0000 is a
> > NOP
> > > (No Operation)”
> > >
> > > So I wonder if we need to remove the HCI_RUNNING logic from the
> > > drivers. It is only left in a few USB drivers and I removed the
> rest
> > > and moved it into the core. I am not in favour of papering over
> this
> > > issue. I need to understand what is wrong. And actually HCI_RUNNING
> > > needs to be taken away from the drivers. So the question is if
> > btusb.c
> > > still needs it or if that is just a leftover. Meaning is it
> > protecting
> > > anything?
> > >
> >
> > I can see HCI_RUNNING check being used at below places in btusb.c
> >
> > 1) btusb_intr_complete
> > 2) btusb_bulk_complete
> > 3) btusb_isoc_complete
> > 4) btusb_tx_complete
> > 5) btusb_isoc_tx_complete
> > 6) btusb_bcm_set_diag
> > 7) btusb_resume
> >
> > I think, it can be removed from (1), (2) and (3), as we already have
> > below check inside hci_recv_frame()
> >
> > if (!hdev || (!test_bit(HCI_UP, &hdev->flags)
> > && !test_bit(HCI_INIT, &hdev->flags))) {
> > kfree_skb(skb);
> > return -ENXIO;
> > }
> >
> > For (4) and (5), it's just used to decide if we should update "hdev-
> > >stat" or not.
> >
>
> Please guide on how should we fix this issue. We observe timeout for
> HCI_RESET command sometimes when system is booted. This happens due to
> missing urb_submitt() when first frame(NOP) received on INT endpoint is
> ignored. Below change would also help.
>
> --- a/net/bluetooth/hci_core.c
> +++ b/net/bluetooth/hci_core.c
> @@ -1318,12 +1318,13 @@ static int hci_dev_do_open(struct hci_dev
> *hdev)
> goto done;
> }
>
> + set_bit(HCI_RUNNING, &hdev->flags);
> if (hdev->open(hdev)) {
> + clear_bit(HCI_RUNNING, &hdev->flags);
> ret = -EIO;
> goto done;
> }
>
> - set_bit(HCI_RUNNING, &hdev->flags);
> hci_sock_dev_event(hdev, HCI_DEV_OPEN);
How can we proceed on this? I'm not much familiar with btusb driver, so not sure if HCI_RUNNING logic can be completely removed.
Meanwhile could you accept proposed fix in this patch for the potential issue we spotted during stress tests.
The change doesn't have any side-effects. During teardown, race between btusb_close() and btusb_intr_complete() is appropriately taken care of with the help of BTUSB_INTR_RUNNING flag.
Regards,
Amitkumar Karwar
Hi Amitkumar,
>>>>>>>> From: Amitkumar Karwar [mailto:[email protected]]
>>>>>>>> Sent: Tuesday, October 18, 2016 6:27 PM
>>>>>>>> To: [email protected]
>>>>>>>> Cc: [email protected]; [email protected]; Cathy
>>> Luo;
>>>>>>>> Nishant Sarmukadam; Ganapathi Bhat; Amitkumar Karwar
>>>>>>>> Subject: [PATCH] btusb: fix zero BD address problem during
>>> stress
>>>>>>>> test
>>>>>>>>
>>>>>>>> From: Ganapathi Bhat <[email protected]>
>>>>>>>>
>>>>>>>> We came across a corner case issue during reboot stress test
>> in
>>>>>> which
>>>>>>>> hciconfig shows BD address is all zero. Reason is we don't get
>>>>>>>> response for HCI RESET command during initialization
>>>>>>>>
>>>>>>>> The issue is tracked to a race where USB subsystem calls
>>>>>>>> btusb_intr_complete() to deliver a data(NOOP frame) received
>> on
>>>>>>>> interrupt endpoint. HCI_RUNNING flag is not yet set by
>>>>>>>> bluetooth subsystem. So we ignore that frame and return.
>>>>>>>>
>>>>>>>> As we missed to resubmit the buffer to interrupt endpoint in
>>> this
>>>>>>>> case, we don't get response for BT reset command downloaded
>>> after
>>>>>> this.
>>>>>>>>
>>>>>>>> This patch handles the corner case to resolve zero BD address
>>>>>> problem.
>>>>>>>>
>>>>>>>> Signed-off-by: Ganapathi Bhat <[email protected]>
>>>>>>>> Signed-off-by: Amitkumar Karwar <[email protected]>
>>>>>>>> ---
>>>>>>>> drivers/bluetooth/btusb.c | 5 +----
>>>>>>>> 1 file changed, 1 insertion(+), 4 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/drivers/bluetooth/btusb.c
>>>>>>>> b/drivers/bluetooth/btusb.c index 811f9b9..b5596ac 100644
>>>>>>>> --- a/drivers/bluetooth/btusb.c
>>>>>>>> +++ b/drivers/bluetooth/btusb.c
>>>>>>>> @@ -607,10 +607,7 @@ static void btusb_intr_complete(struct
>> urb
>>>>>> *urb)
>>>>>>>> BT_DBG("%s urb %p status %d count %d", hdev->name, urb,
>>> urb-
>>>>>>>>> status,
>>>>>>>> urb->actual_length);
>>>>>>>>
>>>>>>>> - if (!test_bit(HCI_RUNNING, &hdev->flags))
>>>>>>>> - return;
>>>>>>>> -
>>>>>>>> - if (urb->status == 0) {
>>>>>>>> + if (urb->status == 0 && test_bit(HCI_RUNNING, &hdev-
>>>>> flags)) {
>>>>>>>> hdev->stat.byte_rx += urb->actual_length;
>>>>>>>>
>>>>>>>> if (btusb_recv_intr(data, urb->transfer_buffer,
>>>>>>>
>>>>>>> Did you get a chance to check this?
>>>>>>> Please let us know if you have any review comments.
>>>>>>
>>>>>> can you explain how this is correct and show me the HCI traces
>>>>>> for this.
>>>>>>
>>>>>
>>>>> I suppose HCI trace means hcidump logs here. As device hasn't yet
>>>> initialized, hcidump won't show anything.
>>>>> We had added debug info in btusb driver to trace the data
>> received
>>>>> on
>>>> all USB endpoints and also checked usbmon logs.
>>>>
>>>> use btmon and it will show it.
>>>>
>>>>> Here is the sequence of events we observed in a corner case while
>>>> running stress test.
>>>>> 1) Inside btusb_open() call ------ Thread 1
>>>>> 2) btusb_submit_intr_urb() submits the URB for receiving data on
>>>>> interrupt endpoint ---- Thread 1
>>>>> 3) btusb_intr_complete() gets called to deliver NOP frame from
>> HCI
>>>>> controller ---- Thread 2
>>>>> 4) HCI_RUNNING isn't set yet. So we return from
>>>>> btusb_intr_complete() without resubmitting the buffer --- Thread
>> 2
>>>>> 5) Exit btusb_open() ---- Thread 1
>>>>> 6) "set_bit(HCI_RUNNING, &hdev->flags)" done by bluetooth core
>>>>> ---- Thread 1
>>>>>
>>>>> Later HCI_RESET command gets timedout, as we haven't re-submitted
>>>> buffer for interrupt endpoint in step (4) above.
>>>>>
>>>>> Please find attached logs. usbmon log shows first frame received
>>>>> on
>>>> interrupt endpoint is NOP(Search for Marvell in log).
>>>>>
>>>>> Here is what bluetooth spec says about NOP frame.
>>>>>
>>>>> "To indicate to the Host that the Controller is ready to receive
>>> HCI
>>>>> command packets, the Controller generates a Command Status event
>>>>> with Status 0x00 and Command_Opcode 0x0000, and the
>>>> Num_HCI_Command_Packets
>>>>> event parameter is set to 1 or more. Command_Opcode, 0x0000 is a
>>> NOP
>>>> (No Operation)”
>>>>
>>>> So I wonder if we need to remove the HCI_RUNNING logic from the
>>>> drivers. It is only left in a few USB drivers and I removed the
>> rest
>>>> and moved it into the core. I am not in favour of papering over
>> this
>>>> issue. I need to understand what is wrong. And actually HCI_RUNNING
>>>> needs to be taken away from the drivers. So the question is if
>>> btusb.c
>>>> still needs it or if that is just a leftover. Meaning is it
>>> protecting
>>>> anything?
>>>>
>>>
>>> I can see HCI_RUNNING check being used at below places in btusb.c
>>>
>>> 1) btusb_intr_complete
>>> 2) btusb_bulk_complete
>>> 3) btusb_isoc_complete
>>> 4) btusb_tx_complete
>>> 5) btusb_isoc_tx_complete
>>> 6) btusb_bcm_set_diag
>>> 7) btusb_resume
>>>
>>> I think, it can be removed from (1), (2) and (3), as we already have
>>> below check inside hci_recv_frame()
>>>
>>> if (!hdev || (!test_bit(HCI_UP, &hdev->flags)
>>> && !test_bit(HCI_INIT, &hdev->flags))) {
>>> kfree_skb(skb);
>>> return -ENXIO;
>>> }
>>>
>>> For (4) and (5), it's just used to decide if we should update "hdev-
>>>> stat" or not.
>>>
>>
>> Please guide on how should we fix this issue. We observe timeout for
>> HCI_RESET command sometimes when system is booted. This happens due to
>> missing urb_submitt() when first frame(NOP) received on INT endpoint is
>> ignored. Below change would also help.
>>
>> --- a/net/bluetooth/hci_core.c
>> +++ b/net/bluetooth/hci_core.c
>> @@ -1318,12 +1318,13 @@ static int hci_dev_do_open(struct hci_dev
>> *hdev)
>> goto done;
>> }
>>
>> + set_bit(HCI_RUNNING, &hdev->flags);
>> if (hdev->open(hdev)) {
>> + clear_bit(HCI_RUNNING, &hdev->flags);
>> ret = -EIO;
>> goto done;
>> }
>>
>> - set_bit(HCI_RUNNING, &hdev->flags);
>> hci_sock_dev_event(hdev, HCI_DEV_OPEN);
>
> How can we proceed on this? I'm not much familiar with btusb driver, so not sure if HCI_RUNNING logic can be completely removed.
> Meanwhile could you accept proposed fix in this patch for the potential issue we spotted during stress tests.
>
> The change doesn't have any side-effects. During teardown, race between btusb_close() and btusb_intr_complete() is appropriately taken care of with the help of BTUSB_INTR_RUNNING flag.
so what is the downside of just replacing all HCI_RUNNING with the appropriate BTUSB_INTR_RUNNING, BTUSB_BULK_RUNNING etc.?
Regards
Marcel