2011-03-01 18:16:09

by Andrzej Kaczmarek

[permalink] [raw]
Subject: [PATCH] Bluetooth: Add counter for not acked HCI commands

Adds counter for HCI commands which were sent but are not yet acked.
This is to prevent race conditions in scenarios where HCI commands
are sent between complete event and command status event, i.e.
last sent HCI command is not accounted in credits number returned
by command status event.

< HCI Command: Create Connection (0x01|0x0005) plen 13
bdaddr 00:23:76:E3:24:58 ptype 0xcc18 rswitch 0x01 clkoffset 0x0000
Packet type: DM1 DM3 DM5 DH1 DH3 DH5
> HCI Event: Command Status (0x0f) plen 4
Create Connection (0x01|0x0005) status 0x00 ncmd 1
> HCI Event: Connect Complete (0x03) plen 11
status 0x00 handle 1 bdaddr 00:23:76:E3:24:58 type ACL encrypt 0x00
< HCI Command: Read Remote Supported Features (0x01|0x001b) plen 2
handle 1
> HCI Event: Command Status (0x0f) plen 4
Unknown (0x00|0x0000) status 0x00 ncmd 2
> HCI Event: Command Status (0x0f) plen 4
Read Remote Supported Features (0x01|0x001b) status 0x00 ncmd 1
> HCI Event: Max Slots Change (0x1b) plen 3
handle 1 slots 5
> HCI Event: Read Remote Supported Features (0x0b) plen 11
status 0x00 handle 1
Features: 0xbf 0xfe 0x8f 0xfe 0x9b 0xff 0x79 0x83
< HCI Command: Read Remote Extended Features (0x01|0x001c) plen 3
handle 1 page 1
> HCI Event: Command Status (0x0f) plen 4
Unknown (0x00|0x0000) status 0x00 ncmd 2
> HCI Event: Command Status (0x0f) plen 4
Read Remote Extended Features (0x01|0x001c) status 0x00 ncmd 1
> HCI Event: Read Remote Extended Features (0x23) plen 13
status 0x00 handle 1 page 1 max 1
Features: 0x01 0x00 0x00 0x00 0x00 0x00 0x00 0x00
< HCI Command: Authentication Requested (0x01|0x0011) plen 2
handle 1
> HCI Event: Command Status (0x0f) plen 4
Unknown (0x00|0x0000) status 0x00 ncmd 2
< HCI Command: Remote Name Request (0x01|0x0019) plen 10
bdaddr 00:23:76:E3:24:58 mode 2 clkoffset 0x0000
> HCI Event: Command Status (0x0f) plen 4
Authentication Requested (0x01|0x0011) status 0x00 ncmd 1
> HCI Event: Link Key Request (0x17) plen 6
bdaddr 00:23:76:E3:24:58

Following command should not be sent here since we have effectively no
credits for HCI command, i.e. 1 credit returned by CS for Authentication
Requested was already consumed by Remote Name Request.

< HCI Command: Link Key Request Negative Reply (0x01|0x000c) plen 6
bdaddr 00:23:76:E3:24:58
> HCI Event: Command Status (0x0f) plen 4
Remote Name Request (0x01|0x0019) status 0x00 ncmd 0
> HCI Event: Command Complete (0x0e) plen 10
Link Key Request Negative Reply (0x01|0x000c) ncmd 0
status 0x0c bdaddr 00:23:76:E3:24:58
Error: Command Disallowed

Signed-off-by: Andrzej Kaczmarek <[email protected]>
---
include/net/bluetooth/hci_core.h | 1 +
net/bluetooth/hci_core.c | 10 ++++++++--
net/bluetooth/hci_event.c | 8 ++++++--
3 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/include/net/bluetooth/hci_core.h b/include/net/bluetooth/hci_core.h
index 441dadb..baf190b 100644
--- a/include/net/bluetooth/hci_core.h
+++ b/include/net/bluetooth/hci_core.h
@@ -121,6 +121,7 @@ struct hci_dev {
unsigned long quirks;

atomic_t cmd_cnt;
+ atomic_t cmd_not_ack;
unsigned int acl_cnt;
unsigned int sco_cnt;
unsigned int le_cnt;
diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
index b372fb8..1c6886d 100644
--- a/net/bluetooth/hci_core.c
+++ b/net/bluetooth/hci_core.c
@@ -531,6 +531,7 @@ int hci_dev_open(__u16 dev)

if (!test_bit(HCI_RAW, &hdev->flags)) {
atomic_set(&hdev->cmd_cnt, 1);
+ atomic_set(&hdev->cmd_not_ack, 0);
set_bit(HCI_INIT, &hdev->flags);
hdev->init_last_cmd = 0;

@@ -606,6 +607,7 @@ static int hci_dev_do_close(struct hci_dev *hdev)
/* Reset device */
skb_queue_purge(&hdev->cmd_q);
atomic_set(&hdev->cmd_cnt, 1);
+ atomic_set(&hdev->cmd_not_ack, 0);
if (!test_bit(HCI_RAW, &hdev->flags)) {
set_bit(HCI_INIT, &hdev->flags);
__hci_request(hdev, hci_reset_req, 0,
@@ -684,6 +686,7 @@ int hci_dev_reset(__u16 dev)
hdev->flush(hdev);

atomic_set(&hdev->cmd_cnt, 1);
+ atomic_set(&hdev->cmd_not_ack, 0);
hdev->acl_cnt = 0; hdev->sco_cnt = 0; hdev->le_cnt = 0;

if (!test_bit(HCI_RAW, &hdev->flags))
@@ -1074,6 +1077,7 @@ static void hci_cmd_timer(unsigned long arg)

BT_ERR("%s command tx timeout", hdev->name);
atomic_set(&hdev->cmd_cnt, 1);
+ atomic_add_unless(&hdev->cmd_not_ack, -1, 0);
tasklet_schedule(&hdev->cmd_task);
}

@@ -2015,10 +2019,11 @@ static void hci_cmd_task(unsigned long arg)
struct hci_dev *hdev = (struct hci_dev *) arg;
struct sk_buff *skb;

- BT_DBG("%s cmd %d", hdev->name, atomic_read(&hdev->cmd_cnt));
+ BT_DBG("%s cnt %d not_ack %d", hdev->name, atomic_read(&hdev->cmd_cnt),
+ atomic_read(&hdev->cmd_not_ack));

/* Send queued commands */
- if (atomic_read(&hdev->cmd_cnt)) {
+ if (atomic_read(&hdev->cmd_cnt) > atomic_read(&hdev->cmd_not_ack)) {
skb = skb_dequeue(&hdev->cmd_q);
if (!skb)
return;
@@ -2028,6 +2033,7 @@ static void hci_cmd_task(unsigned long arg)
hdev->sent_cmd = skb_clone(skb, GFP_ATOMIC);
if (hdev->sent_cmd) {
atomic_dec(&hdev->cmd_cnt);
+ atomic_inc(&hdev->cmd_not_ack);
hci_send_frame(skb);
mod_timer(&hdev->cmd_timer,
jiffies + msecs_to_jiffies(HCI_CMD_TIMEOUT));
diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c
index 3fbfa50..3cf63a1 100644
--- a/net/bluetooth/hci_event.c
+++ b/net/bluetooth/hci_event.c
@@ -1766,8 +1766,10 @@ static inline void hci_cmd_complete_evt(struct hci_dev *hdev, struct sk_buff *sk
break;
}

- if (ev->opcode != HCI_OP_NOP)
+ if (ev->opcode != HCI_OP_NOP) {
del_timer(&hdev->cmd_timer);
+ atomic_add_unless(&hdev->cmd_not_ack, -1, 0);
+ }

if (ev->ncmd) {
atomic_set(&hdev->cmd_cnt, 1);
@@ -1844,8 +1846,10 @@ static inline void hci_cmd_status_evt(struct hci_dev *hdev, struct sk_buff *skb)
break;
}

- if (ev->opcode != HCI_OP_NOP)
+ if (ev->opcode != HCI_OP_NOP) {
del_timer(&hdev->cmd_timer);
+ atomic_add_unless(&hdev->cmd_not_ack, -1, 0);
+ }

if (ev->ncmd) {
atomic_set(&hdev->cmd_cnt, 1);
--
1.7.0.4

Patch on behalf of ST-Ericsson


2011-03-18 09:22:16

by Andrzej Kaczmarek

[permalink] [raw]
Subject: Re: [PATCH] Bluetooth: Add counter for not acked HCI commands

Hi Gustavo,

On 17.03.2011 00:27, Gustavo F. Padovan wrote:
>>> That's exactly how linux stack work, and I'm seeing where we could be doing
>>> wrong, so I believe your patch is fixing nothing. A best version of your
>>> patch is:
>>>
>>> diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c
>>> index cebe7588..2d4d441 100644
>>> --- a/net/bluetooth/hci_event.c
>>> +++ b/net/bluetooth/hci_event.c
>>> @@ -1771,7 +1771,7 @@ static inline void hci_cmd_complete_evt(struct hci_dev *hdev, struct sk_buff *sk
>>> if (ev->opcode != HCI_OP_NOP)
>>> del_timer(&hdev->cmd_timer);
>>>
>>> - if (ev->ncmd) {
>>> + if (ev->ncmd&& ev->opcode != HCI_OP_NOP) {
>>> atomic_set(&hdev->cmd_cnt, 1);
>>> if (!skb_queue_empty(&hdev->cmd_q))
>>> tasklet_schedule(&hdev->cmd_task);
>>
>> I don't think that this is correct.
>
> Yes, this is wrong. It's just a simplified version of Andrzej's patch.

No, my patch does not break other scenarios, i.e. the one mentioned by
Brian ;-)

I agree with him that we can make proper decision in every case only if
we have both credit and unacked count separately due to asynchronous
nature of HCI acks. And it's a good point that we do not have to count
unacked commands but instead just store information whether last sent
cmd is not yet acked.

I was going to send v2 patch with suggestions from Brian included but I
didn't get answer whether cmd_cnt has to be atomic_t for some reason
other than historical. Both acl_cnt and sco_cnt are simple integers, for
example.

>> Second: How is cmd_cnt being used? I see it being decremented when a
>> command is sent in hci_cmd_task, or being set to "1" when one of the two
>> events are received. It is also set to "1" for time-outs and channel
>> open/close/reset activities.
>>
>> It looks to me like the cmd_cnt is being misused. It should in fact be
>> set to the value indicated by the event, and not just "1". It also does
>> not address the original problem observed by Andrzej, which is that the
>> Asyncronous link between the Host and Baseband can cause them to be
>> momentarily out of sync with each other.
>
> We limit cmd_cnt to 1 in our stack, to be sure that one command a time will be
> sent, I don't for the reason for that but Marcel certainly has one.

Pretty good reason could be that last command sent is stored in hci_dev
and then used in cc/cs events. I don't have problem with this kind of
solution, I just think it's not enough here.

I also don't see point of making it a counter while it behaves like a
weird flag: set to 1, but decrease instead of set to 0. A bit
inconsistent in my opinion.

BR,
Andrzej

2011-03-16 23:27:39

by Gustavo Padovan

[permalink] [raw]
Subject: Re: [PATCH] Bluetooth: Add counter for not acked HCI commands

Hi Brian,

* Brian Gix <[email protected]> [2011-03-16 15:50:52 -0700]:

> Hi Gustavo & Andrzej,
>
> On 3/16/2011 2:30 PM, Gustavo F. Padovan wrote:
> > Hi Andrzej,
> >
> > * Brian Gix<[email protected]> [2011-03-04 08:12:45 -0800]:
> >
> >> Hi Andrzej,
> >>
> >> On 3/4/2011 4:37 AM, Andrzej Kaczmarek wrote:
> >>> Hi Brian,
> >>>
> >>> On 01.03.2011 20:25, Brian Gix wrote:
> >>>> The problem you describe sounds like one I had to solve in the past, but
> >>>> unfortunately, I think it may be a little more difficult to solve here.
> >>>> This particular baseband appears to have an outstanding Cmd queue of
> >>>> 2. It also appears to consume one of them for extended periods of time
> >>>> when making requests of the remote device, and using the NOP
> >>>> Cmd-Status-Event to inform the host that the slot is now free.
> >>>>
> >>>> As you are observing, the completion of the task (triggering additional
> >>>> requests locally) overlaps with these NOP responses, giving a false
> >>>> count to the host of available cmd slots.
> >>>>
> >>>> Personally, I consider this to be a baseband bug, which could have been
> >>>> avoided by having a max outstanding queue of 1.
> >>>
> >>> This particular controller uses 1 credit for each command that is being
> >>> processed and having max outstanding queue of 1 would make some
> >>> scenarios impossible - consider authentication with
> >>> HCI_Authentication_Request pending and other HCI command to be sent in
> >>> parallel.
> >>>
> >>
> >> The adjustment I suggest doesn't disallow this. I was having a
> >> theory-of-operation talk with a baseband guy once, and this is what he
> >> had to say:
> >>
> >> The HCI interface is intended to be an interface that immediately
> >> responds to *every* command. The problem is that some commands are
> >> intended for the local baseband (and can be handled immediately) and
> >> others require interaction outside of the control of the local baseband,
> >> and take an indeterminate amount of time.
> >>
> >> So two response mechanism were created:
> >>
> >> Command Immediate Rsp Delayed Rsp
> >> Cmd --> Cmd Complete Evt (Cmds handled Locally)
> >> Cmd --> Cmd Status Evt --> Cmplt Event ("long" Async Cmds)
> >>
> >> The HCI flow control is contained in both the Cmd-Complt-Evt and the
> >> Cmd-Status-Evt.
> >>
> >> So it is assumed that both flow control response event types will be
> >> delivered immediately after the baseband receives them. Of course
> >> because of the communication link, these response are still asyncronous
> >> in most cases including the BlueZ case.
> >>
> >> The baseband guy basically said that "the baseband" does not expect the
> >> next command until the host has processed the (immediate) response to
> >> the previous one. And that the (immediate) response to the previous one
> >> should be RXed in milliseconds at the most.
> >>
> >> So I would always delay sending the next command until the prior
> >> commands CmdStatus or CmdCmplt has been received. This should work
> >> unless there is something seriously wrong with the baseband.
> >
> > That's exactly how linux stack work, and I'm seeing where we could be doing
> > wrong, so I believe your patch is fixing nothing. A best version of your
> > patch is:
> >
> > diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c
> > index cebe7588..2d4d441 100644
> > --- a/net/bluetooth/hci_event.c
> > +++ b/net/bluetooth/hci_event.c
> > @@ -1771,7 +1771,7 @@ static inline void hci_cmd_complete_evt(struct hci_dev *hdev, struct sk_buff *sk
> > if (ev->opcode != HCI_OP_NOP)
> > del_timer(&hdev->cmd_timer);
> >
> > - if (ev->ncmd) {
> > + if (ev->ncmd&& ev->opcode != HCI_OP_NOP) {
> > atomic_set(&hdev->cmd_cnt, 1);
> > if (!skb_queue_empty(&hdev->cmd_q))
> > tasklet_schedule(&hdev->cmd_task);
>
> I don't think that this is correct.

Yes, this is wrong. It's just a simplified version of Andrzej's patch.

>
> First: Any change made here in hci_cmd_cmplete_evt would need to also be
> made in hci_cmd_status_evt. The "NOP" event can be either a Cmd Cmplt
> or a Cmd Status, and in the example given by Andrzej, it was a Cmd Status.

Yes, of course. My patch is just a concept (as it is wrong)

>
> Second: How is cmd_cnt being used? I see it being decremented when a
> command is sent in hci_cmd_task, or being set to "1" when one of the two
> events are received. It is also set to "1" for time-outs and channel
> open/close/reset activities.
>
> It looks to me like the cmd_cnt is being misused. It should in fact be
> set to the value indicated by the event, and not just "1". It also does
> not address the original problem observed by Andrzej, which is that the
> Asyncronous link between the Host and Baseband can cause them to be
> momentarily out of sync with each other.

We limit cmd_cnt to 1 in our stack, to be sure that one command a time will be
sent, I don't for the reason for that but Marcel certainly has one.

--
Gustavo F. Padovan
http://profusion.mobi

2011-03-16 22:50:52

by Brian Gix

[permalink] [raw]
Subject: Re: [PATCH] Bluetooth: Add counter for not acked HCI commands

Hi Gustavo & Andrzej,

On 3/16/2011 2:30 PM, Gustavo F. Padovan wrote:
> Hi Andrzej,
>
> * Brian Gix<[email protected]> [2011-03-04 08:12:45 -0800]:
>
>> Hi Andrzej,
>>
>> On 3/4/2011 4:37 AM, Andrzej Kaczmarek wrote:
>>> Hi Brian,
>>>
>>> On 01.03.2011 20:25, Brian Gix wrote:
>>>> The problem you describe sounds like one I had to solve in the past, but
>>>> unfortunately, I think it may be a little more difficult to solve here.
>>>> This particular baseband appears to have an outstanding Cmd queue of
>>>> 2. It also appears to consume one of them for extended periods of time
>>>> when making requests of the remote device, and using the NOP
>>>> Cmd-Status-Event to inform the host that the slot is now free.
>>>>
>>>> As you are observing, the completion of the task (triggering additional
>>>> requests locally) overlaps with these NOP responses, giving a false
>>>> count to the host of available cmd slots.
>>>>
>>>> Personally, I consider this to be a baseband bug, which could have been
>>>> avoided by having a max outstanding queue of 1.
>>>
>>> This particular controller uses 1 credit for each command that is being
>>> processed and having max outstanding queue of 1 would make some
>>> scenarios impossible - consider authentication with
>>> HCI_Authentication_Request pending and other HCI command to be sent in
>>> parallel.
>>>
>>
>> The adjustment I suggest doesn't disallow this. I was having a
>> theory-of-operation talk with a baseband guy once, and this is what he
>> had to say:
>>
>> The HCI interface is intended to be an interface that immediately
>> responds to *every* command. The problem is that some commands are
>> intended for the local baseband (and can be handled immediately) and
>> others require interaction outside of the control of the local baseband,
>> and take an indeterminate amount of time.
>>
>> So two response mechanism were created:
>>
>> Command Immediate Rsp Delayed Rsp
>> Cmd --> Cmd Complete Evt (Cmds handled Locally)
>> Cmd --> Cmd Status Evt --> Cmplt Event ("long" Async Cmds)
>>
>> The HCI flow control is contained in both the Cmd-Complt-Evt and the
>> Cmd-Status-Evt.
>>
>> So it is assumed that both flow control response event types will be
>> delivered immediately after the baseband receives them. Of course
>> because of the communication link, these response are still asyncronous
>> in most cases including the BlueZ case.
>>
>> The baseband guy basically said that "the baseband" does not expect the
>> next command until the host has processed the (immediate) response to
>> the previous one. And that the (immediate) response to the previous one
>> should be RXed in milliseconds at the most.
>>
>> So I would always delay sending the next command until the prior
>> commands CmdStatus or CmdCmplt has been received. This should work
>> unless there is something seriously wrong with the baseband.
>
> That's exactly how linux stack work, and I'm seeing where we could be doing
> wrong, so I believe your patch is fixing nothing. A best version of your
> patch is:
>
> diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c
> index cebe7588..2d4d441 100644
> --- a/net/bluetooth/hci_event.c
> +++ b/net/bluetooth/hci_event.c
> @@ -1771,7 +1771,7 @@ static inline void hci_cmd_complete_evt(struct hci_dev *hdev, struct sk_buff *sk
> if (ev->opcode != HCI_OP_NOP)
> del_timer(&hdev->cmd_timer);
>
> - if (ev->ncmd) {
> + if (ev->ncmd&& ev->opcode != HCI_OP_NOP) {
> atomic_set(&hdev->cmd_cnt, 1);
> if (!skb_queue_empty(&hdev->cmd_q))
> tasklet_schedule(&hdev->cmd_task);

I don't think that this is correct.

First: Any change made here in hci_cmd_cmplete_evt would need to also be
made in hci_cmd_status_evt. The "NOP" event can be either a Cmd Cmplt
or a Cmd Status, and in the example given by Andrzej, it was a Cmd Status.

Second: How is cmd_cnt being used? I see it being decremented when a
command is sent in hci_cmd_task, or being set to "1" when one of the two
events are received. It is also set to "1" for time-outs and channel
open/close/reset activities.

It looks to me like the cmd_cnt is being misused. It should in fact be
set to the value indicated by the event, and not just "1". It also does
not address the original problem observed by Andrzej, which is that the
Asyncronous link between the Host and Baseband can cause them to be
momentarily out of sync with each other.

Gustavo's suggestion misses the problem, that the reason basebands use
NOPs at all is to hold onto available command slots, only to release
them later with the NOP. It is entirely possible that a baseband could
indicate with Cmd Status an ev->ncmd == 0, and the entire transaction
then end with a NOP (of either type) with ev->ncmd > 0. If that were to
occur, the cmd_q would never get serviced.

I don't see any solution to this other than to *separately* track the
existence of an unAcked cmd, and conditioning the servicing of the cmd_q
on *both* the (non) existence of an unAcked cmd *and* the availablity of
slots in cmd_cnt.


>
>
> That patch means that want to avoid send new commands when a Command Status
> Event arrives, but we want exactly the opposite.
>


--
Brian Gix
[email protected]
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum

2011-03-16 21:30:20

by Gustavo Padovan

[permalink] [raw]
Subject: Re: [PATCH] Bluetooth: Add counter for not acked HCI commands

Hi Andrzej,

* Brian Gix <[email protected]> [2011-03-04 08:12:45 -0800]:

> Hi Andrzej,
>
> On 3/4/2011 4:37 AM, Andrzej Kaczmarek wrote:
> > Hi Brian,
> >
> > On 01.03.2011 20:25, Brian Gix wrote:
> >> The problem you describe sounds like one I had to solve in the past, but
> >> unfortunately, I think it may be a little more difficult to solve here.
> >> This particular baseband appears to have an outstanding Cmd queue of
> >> 2. It also appears to consume one of them for extended periods of time
> >> when making requests of the remote device, and using the NOP
> >> Cmd-Status-Event to inform the host that the slot is now free.
> >>
> >> As you are observing, the completion of the task (triggering additional
> >> requests locally) overlaps with these NOP responses, giving a false
> >> count to the host of available cmd slots.
> >>
> >> Personally, I consider this to be a baseband bug, which could have been
> >> avoided by having a max outstanding queue of 1.
> >
> > This particular controller uses 1 credit for each command that is being
> > processed and having max outstanding queue of 1 would make some
> > scenarios impossible - consider authentication with
> > HCI_Authentication_Request pending and other HCI command to be sent in
> > parallel.
> >
>
> The adjustment I suggest doesn't disallow this. I was having a
> theory-of-operation talk with a baseband guy once, and this is what he
> had to say:
>
> The HCI interface is intended to be an interface that immediately
> responds to *every* command. The problem is that some commands are
> intended for the local baseband (and can be handled immediately) and
> others require interaction outside of the control of the local baseband,
> and take an indeterminate amount of time.
>
> So two response mechanism were created:
>
> Command Immediate Rsp Delayed Rsp
> Cmd --> Cmd Complete Evt (Cmds handled Locally)
> Cmd --> Cmd Status Evt --> Cmplt Event ("long" Async Cmds)
>
> The HCI flow control is contained in both the Cmd-Complt-Evt and the
> Cmd-Status-Evt.
>
> So it is assumed that both flow control response event types will be
> delivered immediately after the baseband receives them. Of course
> because of the communication link, these response are still asyncronous
> in most cases including the BlueZ case.
>
> The baseband guy basically said that "the baseband" does not expect the
> next command until the host has processed the (immediate) response to
> the previous one. And that the (immediate) response to the previous one
> should be RXed in milliseconds at the most.
>
> So I would always delay sending the next command until the prior
> commands CmdStatus or CmdCmplt has been received. This should work
> unless there is something seriously wrong with the baseband.

That's exactly how linux stack work, and I'm seeing where we could be doing
wrong, so I believe your patch is fixing nothing. A best version of your
patch is:

diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c
index cebe7588..2d4d441 100644
--- a/net/bluetooth/hci_event.c
+++ b/net/bluetooth/hci_event.c
@@ -1771,7 +1771,7 @@ static inline void hci_cmd_complete_evt(struct hci_dev *hdev, struct sk_buff *sk
if (ev->opcode != HCI_OP_NOP)
del_timer(&hdev->cmd_timer);

- if (ev->ncmd) {
+ if (ev->ncmd && ev->opcode != HCI_OP_NOP) {
atomic_set(&hdev->cmd_cnt, 1);
if (!skb_queue_empty(&hdev->cmd_q))
tasklet_schedule(&hdev->cmd_task);


That patch means that want to avoid send new commands when a Command Status
Event arrives, but we want exactly the opposite.

--
Gustavo F. Padovan
http://profusion.mobi

2011-03-04 16:12:45

by Brian Gix

[permalink] [raw]
Subject: Re: [PATCH] Bluetooth: Add counter for not acked HCI commands

Hi Andrzej,

On 3/4/2011 4:37 AM, Andrzej Kaczmarek wrote:
> Hi Brian,
>
> On 01.03.2011 20:25, Brian Gix wrote:
>> The problem you describe sounds like one I had to solve in the past, but
>> unfortunately, I think it may be a little more difficult to solve here.
>> This particular baseband appears to have an outstanding Cmd queue of
>> 2. It also appears to consume one of them for extended periods of time
>> when making requests of the remote device, and using the NOP
>> Cmd-Status-Event to inform the host that the slot is now free.
>>
>> As you are observing, the completion of the task (triggering additional
>> requests locally) overlaps with these NOP responses, giving a false
>> count to the host of available cmd slots.
>>
>> Personally, I consider this to be a baseband bug, which could have been
>> avoided by having a max outstanding queue of 1.
>
> This particular controller uses 1 credit for each command that is being
> processed and having max outstanding queue of 1 would make some
> scenarios impossible - consider authentication with
> HCI_Authentication_Request pending and other HCI command to be sent in
> parallel.
>

The adjustment I suggest doesn't disallow this. I was having a
theory-of-operation talk with a baseband guy once, and this is what he
had to say:

The HCI interface is intended to be an interface that immediately
responds to *every* command. The problem is that some commands are
intended for the local baseband (and can be handled immediately) and
others require interaction outside of the control of the local baseband,
and take an indeterminate amount of time.

So two response mechanism were created:

Command Immediate Rsp Delayed Rsp
Cmd --> Cmd Complete Evt (Cmds handled Locally)
Cmd --> Cmd Status Evt --> Cmplt Event ("long" Async Cmds)

The HCI flow control is contained in both the Cmd-Complt-Evt and the
Cmd-Status-Evt.

So it is assumed that both flow control response event types will be
delivered immediately after the baseband receives them. Of course
because of the communication link, these response are still asyncronous
in most cases including the BlueZ case.

The baseband guy basically said that "the baseband" does not expect the
next command until the host has processed the (immediate) response to
the previous one. And that the (immediate) response to the previous one
should be RXed in milliseconds at the most.

So I would always delay sending the next command until the prior
commands CmdStatus or CmdCmplt has been received. This should work
unless there is something seriously wrong with the baseband.


--
Brian Gix
[email protected]
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum

2011-03-04 12:37:21

by Andrzej Kaczmarek

[permalink] [raw]
Subject: Re: [PATCH] Bluetooth: Add counter for not acked HCI commands

Hi Brian,

On 01.03.2011 20:25, Brian Gix wrote:
> The problem you describe sounds like one I had to solve in the past, but
> unfortunately, I think it may be a little more difficult to solve here.
> This particular baseband appears to have an outstanding Cmd queue of
> 2. It also appears to consume one of them for extended periods of time
> when making requests of the remote device, and using the NOP
> Cmd-Status-Event to inform the host that the slot is now free.
>
> As you are observing, the completion of the task (triggering additional
> requests locally) overlaps with these NOP responses, giving a false
> count to the host of available cmd slots.
>
> Personally, I consider this to be a baseband bug, which could have been
> avoided by having a max outstanding queue of 1.

This particular controller uses 1 credit for each command that is being
processed and having max outstanding queue of 1 would make some
scenarios impossible - consider authentication with
HCI_Authentication_Request pending and other HCI command to be sent in
parallel.

Since spec does not specify how credits should be used so I don't
consider this a baseband bug, more like a design decision.

> Note 1:
> My biggest problem with your patch is the point marked above. I agree
> that it would solve the problem you observed, and your overall analysis
> of the situation, but because of the way this particular baseband is
> operating, and the way I have seen other basebands operate in the past,
> I think this decision point as to when to send the next command is
> incorrect.
>
> A flaw in the HCI command handshaking paradigm is that a baseband can
> decide to take away or not take away one of these slots, and give you
> notification asynchronously. I have seen basebands with presumptively
> two slots return a cmd status with ncmd 0 when getting a remote device
> name, if no ACL connection was currently established, for example.
>
> The safest way I have seen this problem handled is to force a
> psuedo-single-threading of commands on the system by NEVER sending a
> command while another command has not yet received it's Cmd-Status or
> Cmd-Cmplt event. In other words, instead of the test for cmd_cnt>
> cmd_not_acked, I would simply make into cmd_cnt&& !cmd_not_acked.

I never saw baseband which behaves as you described so I assumed that
each command should take no more that 1 credit. But again, since spec is
not very specific at this point perhaps this assumption was wrong, so
making this psuedo-single-threading as you described sounds reasonable
to me.

>> atomic_dec(&hdev->cmd_cnt);
>> + atomic_inc(&hdev->cmd_not_ack);
> [...]
>
> Also, I am not sure this means what you may intend it to mean. I think
> that this is intended to do both of these operations "together
> atomically", but that as written, it is possible for an interruption to
> occur between the two operations.

No, I didn't mean to do both of these operations together atomically. I
followed scheme to update cmd_cnt atomically. Honestly, I'm not sure why
cmd_cnt is updated atomically while other counters are not, it does not
guarantee us proper updating of this value - see hci_cmd_task for example.

Perhaps atomic functions should be stripped from cmd_cnt or code should
be updated more thoroughly to lock access to counters properly. I'll be
glad to hear opinion on this as well.

BR,
Andrzej

2011-03-01 19:29:42

by Brian Gix

[permalink] [raw]
Subject: Re: [PATCH] Bluetooth: Add counter for not acked HCI commands

Hi Again,

On 11-03-01 10:16 AM, Andrzej Kaczmarek wrote:
[...]

> atomic_dec(&hdev->cmd_cnt);
> + atomic_inc(&hdev->cmd_not_ack);
[...]

Also, I am not sure this means what you may intend it to mean. I think
that this is intended to do both of these operations "together
atomically", but that as written, it is possible for an interruption to
occur between the two operations.


--
Brian Gix
[email protected]
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum

2011-03-01 19:25:02

by Brian Gix

[permalink] [raw]
Subject: Re: [PATCH] Bluetooth: Add counter for not acked HCI commands

Hi Andrzej,

On 11-03-01 10:16 AM, Andrzej Kaczmarek wrote:
> Adds counter for HCI commands which were sent but are not yet acked.
> This is to prevent race conditions in scenarios where HCI commands
> are sent between complete event and command status event, i.e.
> last sent HCI command is not accounted in credits number returned
> by command status event.
>
> < HCI Command: Create Connection (0x01|0x0005) plen 13
> bdaddr 00:23:76:E3:24:58 ptype 0xcc18 rswitch 0x01 clkoffset 0x0000
> Packet type: DM1 DM3 DM5 DH1 DH3 DH5
>> HCI Event: Command Status (0x0f) plen 4
> Create Connection (0x01|0x0005) status 0x00 ncmd 1
>> HCI Event: Connect Complete (0x03) plen 11
> status 0x00 handle 1 bdaddr 00:23:76:E3:24:58 type ACL encrypt 0x00
> < HCI Command: Read Remote Supported Features (0x01|0x001b) plen 2
> handle 1
>> HCI Event: Command Status (0x0f) plen 4
> Unknown (0x00|0x0000) status 0x00 ncmd 2
>> HCI Event: Command Status (0x0f) plen 4
> Read Remote Supported Features (0x01|0x001b) status 0x00 ncmd 1
>> HCI Event: Max Slots Change (0x1b) plen 3
> handle 1 slots 5
>> HCI Event: Read Remote Supported Features (0x0b) plen 11
> status 0x00 handle 1
> Features: 0xbf 0xfe 0x8f 0xfe 0x9b 0xff 0x79 0x83
> < HCI Command: Read Remote Extended Features (0x01|0x001c) plen 3
> handle 1 page 1
>> HCI Event: Command Status (0x0f) plen 4
> Unknown (0x00|0x0000) status 0x00 ncmd 2
>> HCI Event: Command Status (0x0f) plen 4
> Read Remote Extended Features (0x01|0x001c) status 0x00 ncmd 1
>> HCI Event: Read Remote Extended Features (0x23) plen 13
> status 0x00 handle 1 page 1 max 1
> Features: 0x01 0x00 0x00 0x00 0x00 0x00 0x00 0x00
> < HCI Command: Authentication Requested (0x01|0x0011) plen 2
> handle 1
>> HCI Event: Command Status (0x0f) plen 4
> Unknown (0x00|0x0000) status 0x00 ncmd 2
> < HCI Command: Remote Name Request (0x01|0x0019) plen 10
> bdaddr 00:23:76:E3:24:58 mode 2 clkoffset 0x0000
>> HCI Event: Command Status (0x0f) plen 4
> Authentication Requested (0x01|0x0011) status 0x00 ncmd 1
>> HCI Event: Link Key Request (0x17) plen 6
> bdaddr 00:23:76:E3:24:58
>
> Following command should not be sent here since we have effectively no
> credits for HCI command, i.e. 1 credit returned by CS for Authentication
> Requested was already consumed by Remote Name Request.
>
> < HCI Command: Link Key Request Negative Reply (0x01|0x000c) plen 6
> bdaddr 00:23:76:E3:24:58
>> HCI Event: Command Status (0x0f) plen 4
> Remote Name Request (0x01|0x0019) status 0x00 ncmd 0
>> HCI Event: Command Complete (0x0e) plen 10
> Link Key Request Negative Reply (0x01|0x000c) ncmd 0
> status 0x0c bdaddr 00:23:76:E3:24:58
> Error: Command Disallowed
>
> Signed-off-by: Andrzej Kaczmarek<[email protected]>
> ---
> include/net/bluetooth/hci_core.h | 1 +
> net/bluetooth/hci_core.c | 10 ++++++++--
> net/bluetooth/hci_event.c | 8 ++++++--
> 3 files changed, 15 insertions(+), 4 deletions(-)
>
> diff --git a/include/net/bluetooth/hci_core.h b/include/net/bluetooth/hci_core.h
> index 441dadb..baf190b 100644
> --- a/include/net/bluetooth/hci_core.h
> +++ b/include/net/bluetooth/hci_core.h
> @@ -121,6 +121,7 @@ struct hci_dev {
> unsigned long quirks;
>
> atomic_t cmd_cnt;
> + atomic_t cmd_not_ack;
> unsigned int acl_cnt;
> unsigned int sco_cnt;
> unsigned int le_cnt;
> diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
> index b372fb8..1c6886d 100644
> --- a/net/bluetooth/hci_core.c
> +++ b/net/bluetooth/hci_core.c
> @@ -531,6 +531,7 @@ int hci_dev_open(__u16 dev)
>
> if (!test_bit(HCI_RAW,&hdev->flags)) {
> atomic_set(&hdev->cmd_cnt, 1);
> + atomic_set(&hdev->cmd_not_ack, 0);
> set_bit(HCI_INIT,&hdev->flags);
> hdev->init_last_cmd = 0;
>
> @@ -606,6 +607,7 @@ static int hci_dev_do_close(struct hci_dev *hdev)
> /* Reset device */
> skb_queue_purge(&hdev->cmd_q);
> atomic_set(&hdev->cmd_cnt, 1);
> + atomic_set(&hdev->cmd_not_ack, 0);
> if (!test_bit(HCI_RAW,&hdev->flags)) {
> set_bit(HCI_INIT,&hdev->flags);
> __hci_request(hdev, hci_reset_req, 0,
> @@ -684,6 +686,7 @@ int hci_dev_reset(__u16 dev)
> hdev->flush(hdev);
>
> atomic_set(&hdev->cmd_cnt, 1);
> + atomic_set(&hdev->cmd_not_ack, 0);
> hdev->acl_cnt = 0; hdev->sco_cnt = 0; hdev->le_cnt = 0;
>
> if (!test_bit(HCI_RAW,&hdev->flags))
> @@ -1074,6 +1077,7 @@ static void hci_cmd_timer(unsigned long arg)
>
> BT_ERR("%s command tx timeout", hdev->name);
> atomic_set(&hdev->cmd_cnt, 1);
> + atomic_add_unless(&hdev->cmd_not_ack, -1, 0);
> tasklet_schedule(&hdev->cmd_task);
> }
>
> @@ -2015,10 +2019,11 @@ static void hci_cmd_task(unsigned long arg)
> struct hci_dev *hdev = (struct hci_dev *) arg;
> struct sk_buff *skb;
>
> - BT_DBG("%s cmd %d", hdev->name, atomic_read(&hdev->cmd_cnt));
> + BT_DBG("%s cnt %d not_ack %d", hdev->name, atomic_read(&hdev->cmd_cnt),
> + atomic_read(&hdev->cmd_not_ack));
>
> /* Send queued commands */
> - if (atomic_read(&hdev->cmd_cnt)) {
> + if (atomic_read(&hdev->cmd_cnt)> atomic_read(&hdev->cmd_not_ack)) {

See below for fuller explanation of my opinion.

But I would make this test be--> cmd_cnt && !cmd_not_acked

> skb = skb_dequeue(&hdev->cmd_q);
> if (!skb)
> return;
> @@ -2028,6 +2033,7 @@ static void hci_cmd_task(unsigned long arg)
> hdev->sent_cmd = skb_clone(skb, GFP_ATOMIC);
> if (hdev->sent_cmd) {
> atomic_dec(&hdev->cmd_cnt);
> + atomic_inc(&hdev->cmd_not_ack);
> hci_send_frame(skb);
> mod_timer(&hdev->cmd_timer,
> jiffies + msecs_to_jiffies(HCI_CMD_TIMEOUT));
> diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c
> index 3fbfa50..3cf63a1 100644
> --- a/net/bluetooth/hci_event.c
> +++ b/net/bluetooth/hci_event.c
> @@ -1766,8 +1766,10 @@ static inline void hci_cmd_complete_evt(struct hci_dev *hdev, struct sk_buff *sk
> break;
> }
>
> - if (ev->opcode != HCI_OP_NOP)
> + if (ev->opcode != HCI_OP_NOP) {
> del_timer(&hdev->cmd_timer);
> + atomic_add_unless(&hdev->cmd_not_ack, -1, 0);
> + }
>
> if (ev->ncmd) {
> atomic_set(&hdev->cmd_cnt, 1);
> @@ -1844,8 +1846,10 @@ static inline void hci_cmd_status_evt(struct hci_dev *hdev, struct sk_buff *skb)
> break;
> }
>
> - if (ev->opcode != HCI_OP_NOP)
> + if (ev->opcode != HCI_OP_NOP) {
> del_timer(&hdev->cmd_timer);
> + atomic_add_unless(&hdev->cmd_not_ack, -1, 0);
> + }
>
> if (ev->ncmd) {
> atomic_set(&hdev->cmd_cnt, 1);


The problem you describe sounds like one I had to solve in the past, but
unfortunately, I think it may be a little more difficult to solve here.
This particular baseband appears to have an outstanding Cmd queue of
2. It also appears to consume one of them for extended periods of time
when making requests of the remote device, and using the NOP
Cmd-Status-Event to inform the host that the slot is now free.

As you are observing, the completion of the task (triggering additional
requests locally) overlaps with these NOP responses, giving a false
count to the host of available cmd slots.

Personally, I consider this to be a baseband bug, which could have been
avoided by having a max outstanding queue of 1.

Note 1:
My biggest problem with your patch is the point marked above. I agree
that it would solve the problem you observed, and your overall analysis
of the situation, but because of the way this particular baseband is
operating, and the way I have seen other basebands operate in the past,
I think this decision point as to when to send the next command is
incorrect.

A flaw in the HCI command handshaking paradigm is that a baseband can
decide to take away or not take away one of these slots, and give you
notification asynchronously. I have seen basebands with presumptively
two slots return a cmd status with ncmd 0 when getting a remote device
name, if no ACL connection was currently established, for example.

The safest way I have seen this problem handled is to force a
psuedo-single-threading of commands on the system by NEVER sending a
command while another command has not yet received it's Cmd-Status or
Cmd-Cmplt event. In other words, instead of the test for cmd_cnt >
cmd_not_acked, I would simply make into cmd_cnt && !cmd_not_acked.

I think the rest of it is basically OK.

Note that it is still then possible to have multiple "very-asynchronous"
commands that include remote interaction, but it ensures that that part
that is suppose to be functionally syncrounous (Cmd + Status || Cmplt)
is done the way most basebands are expecting them to be. I believe this
to be the intended usage mode of the HCI paradigm.




--
Brian Gix
[email protected]
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum