2014-05-14 14:00:24

by Alexander Holler

[permalink] [raw]
Subject: [PATCH 1/2] bluetooth: don't include local processing of HCI commands in the command timeout

I assume the timeout for processing HCI commands was originally intended to
detect hung bluetooth devices and should not include the time needed locally
to handle the response to an HCI command. That is important because the time
needed locally (by the kernel or even userland) to process responses to HCI
commands varies a lot between systems and HCI commands. That's even more true
since many actions to HCI command responses are handled inside works which
might be delayed quiet some time, depending on the actual system load.

So stop the timeout as soon as a response to an HCI command was received.

This fixes various problems which resulted in HCI command timeouts and an
afterwards non-working bluetooth stack, especially on slower systems like
some ARM devices.

Drawback is that in-kernel problems like deadlocks aren't detected by HCI
command timeouts anymore, but such problems should be detected and handled
by other means and not by a timeout where it is hard to specify a value
reasonable for all possible systems (-configurations, -loads).

Furthermore, if the timeout includes local processing of HCI command
responses, in-kernel errors like hung tasks might be masked by the
timeout, because the hung task would be killed by the timeout before
the hung task would be detected (by other means).

Signed-off-by: Alexander Holler <[email protected]>
---
net/bluetooth/hci_event.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c
index 15010a2..94c2dc0 100644
--- a/net/bluetooth/hci_event.c
+++ b/net/bluetooth/hci_event.c
@@ -2338,6 +2338,9 @@ static void hci_cmd_complete_evt(struct hci_dev *hdev, struct sk_buff *skb)

opcode = __le16_to_cpu(ev->opcode);

+ if (opcode != HCI_OP_NOP)
+ del_timer(&hdev->cmd_timer);
+
switch (opcode) {
case HCI_OP_INQUIRY_CANCEL:
hci_cc_inquiry_cancel(hdev, skb);
@@ -2584,9 +2587,6 @@ static void hci_cmd_complete_evt(struct hci_dev *hdev, struct sk_buff *skb)
break;
}

- if (opcode != HCI_OP_NOP)
- del_timer(&hdev->cmd_timer);
-
hci_req_cmd_complete(hdev, opcode, status);

if (ev->ncmd && !test_bit(HCI_RESET, &hdev->flags)) {
@@ -2605,6 +2605,9 @@ static void hci_cmd_status_evt(struct hci_dev *hdev, struct sk_buff *skb)

opcode = __le16_to_cpu(ev->opcode);

+ if (opcode != HCI_OP_NOP)
+ del_timer(&hdev->cmd_timer);
+
switch (opcode) {
case HCI_OP_INQUIRY:
hci_cs_inquiry(hdev, ev->status);
@@ -2675,9 +2678,6 @@ static void hci_cmd_status_evt(struct hci_dev *hdev, struct sk_buff *skb)
break;
}

- if (opcode != HCI_OP_NOP)
- del_timer(&hdev->cmd_timer);
-
if (ev->status ||
(hdev->sent_cmd && !bt_cb(hdev->sent_cmd)->req.event))
hci_req_cmd_complete(hdev, opcode, ev->status);
--
1.8.3.1


2014-05-31 07:09:40

by Alexander Holler

[permalink] [raw]
Subject: Re: [PATCH 1/2] bluetooth: don't include local processing of HCI commands in the command timeout

Am 31.05.2014 08:45, schrieb Alexander Holler:
> Am 31.05.2014 07:28, schrieb Marcel Holtmann:

>> so I actually wonder if we should move away from timer and move to a delayed work item to handle the timeout and if that would actually fix this issue.
>
> The problem is that I have absolutely no clue where these timeouts do
> come from. They appear for different commands and almost always only at
> boot. If the machine did come up without hci command timeouts, there
> never was one afterwards. So I digged in the dark and the above patch
> was one of the results. But I still had to increase the command timeout.
> And I can't do much testing as I use this box. I just experience this
> problem almost always when I boot it (to do kernel updates) and since a
> very long time (more than a year I think).

And I should mention that I had these timeouts with a different dongle
too (I switched from a bt 3.x dongle to a bt 4.x dongle around a year
ago). But as said, in the commit msg, the old behaviour might have
masked different problems like deadlocks or similiar too, so I just
might have experienced several different problems.

Regards,

Alexander Holler

.

2014-05-31 06:45:41

by Alexander Holler

[permalink] [raw]
Subject: Re: [PATCH 1/2] bluetooth: don't include local processing of HCI commands in the command timeout

Am 31.05.2014 07:28, schrieb Marcel Holtmann:
> Hi Alexander,
>
>> I assume the timeout for processing HCI commands was originally intended to
>> detect hung bluetooth devices and should not include the time needed locally
>> to handle the response to an HCI command. That is important because the time
>> needed locally (by the kernel or even userland) to process responses to HCI
>> commands varies a lot between systems and HCI commands. That's even more true
>> since many actions to HCI command responses are handled inside works which
>> might be delayed quiet some time, depending on the actual system load.
>>
>> So stop the timeout as soon as a response to an HCI command was received.
>>
>> This fixes various problems which resulted in HCI command timeouts and an
>> afterwards non-working bluetooth stack, especially on slower systems like
>> some ARM devices.
>>
>> Drawback is that in-kernel problems like deadlocks aren't detected by HCI
>> command timeouts anymore, but such problems should be detected and handled
>> by other means and not by a timeout where it is hard to specify a value
>> reasonable for all possible systems (-configurations, -loads).
>>
>> Furthermore, if the timeout includes local processing of HCI command
>> responses, in-kernel errors like hung tasks might be masked by the
>> timeout, because the hung task would be killed by the timeout before
>> the hung task would be detected (by other means).
>>
>> Signed-off-by: Alexander Holler <[email protected]>
>> ---
>> net/bluetooth/hci_event.c | 12 ++++++------
>> 1 file changed, 6 insertions(+), 6 deletions(-)
>>
>> diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c
>> index 15010a2..94c2dc0 100644
>> --- a/net/bluetooth/hci_event.c
>> +++ b/net/bluetooth/hci_event.c
>> @@ -2338,6 +2338,9 @@ static void hci_cmd_complete_evt(struct hci_dev *hdev, struct sk_buff *skb)
>>
>> opcode = __le16_to_cpu(ev->opcode);
>>
>> + if (opcode != HCI_OP_NOP)
>> + del_timer(&hdev->cmd_timer);
>> +
>
> so I actually wonder if we should move away from timer and move to a delayed work item to handle the timeout and if that would actually fix this issue.

The problem is that I have absolutely no clue where these timeouts do
come from. They appear for different commands and almost always only at
boot. If the machine did come up without hci command timeouts, there
never was one afterwards. So I digged in the dark and the above patch
was one of the results. But I still had to increase the command timeout.
And I can't do much testing as I use this box. I just experience this
problem almost always when I boot it (to do kernel updates) and since a
very long time (more than a year I think).

Regards,

Alexander Holler

2014-05-31 06:36:54

by Alexander Holler

[permalink] [raw]
Subject: Re: [PATCH 2/2] bluetooth: raise HCI_CMD_TIMEOUT from 2s to 8s

Am 31.05.2014 07:32, schrieb Marcel Holtmann:
> Hi Alexander,
>
>> The reasoning to do this is the following:
>>
>> - If a timeout occurs, the HCI-communication is broken afterwards and the
>> dongle isn't usable anymore.
>> - If it works after e.g. waiting 4s everyone is still happy but if it
>> just breaks after only waiting 2s nothing is gained.
>> - Having to wait some more seconds until an error occurs doesn't change
>> anything.
>>
>> So there is no disadvantage in rasing the timeout but a great advantage
>> in case the dongle needs more than 2s to process an HCI command.
>> E.g. I had sometimes HCI command timeouts at boot (but never after the BT stack
>> was successfull started). I assume the reason might be the USB-probing which
>> happend before through the bootloader, which might have confused the dongle
>> such that it needs a bit more time, but I'm not sure.
>>
>> Together with the patch which limits the timeout only to the actual time the
>> dongle needs to process an HCI command (and doesn't include the time the
>> kernel needs to process the answer to an HCI command), my problems were gone.
>>
>> Signed-off-by: Alexander Holler <[email protected]>
>> ---
>> include/net/bluetooth/hci.h | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/include/net/bluetooth/hci.h b/include/net/bluetooth/hci.h
>> index be150cf..d50fd34 100644
>> --- a/include/net/bluetooth/hci.h
>> +++ b/include/net/bluetooth/hci.h
>> @@ -180,7 +180,7 @@ enum {
>> #define HCI_DISCONN_TIMEOUT msecs_to_jiffies(2000) /* 2 seconds */
>> #define HCI_PAIRING_TIMEOUT msecs_to_jiffies(60000) /* 60 seconds */
>> #define HCI_INIT_TIMEOUT msecs_to_jiffies(10000) /* 10 seconds */
>> -#define HCI_CMD_TIMEOUT msecs_to_jiffies(2000) /* 2 seconds */
>> +#define HCI_CMD_TIMEOUT msecs_to_jiffies(8000) /* 8 seconds */
>> #define HCI_ACL_TX_TIMEOUT msecs_to_jiffies(45000) /* 45 seconds */
>> #define HCI_AUTO_OFF_TIMEOUT msecs_to_jiffies(2000) /* 2 seconds */
>> #define HCI_POWER_OFF_TIMEOUT msecs_to_jiffies(5000) /* 5 seconds */
>
> I think moving the command timeout handling into a delayed work struct might actually solve this problem nicely and does not force us to increase the timeout. A chip that does not respond for 8 seconds is a pretty bad chip.

As I said in another mail, I don't think it is the chip. On the system
where I'm experiencing these timeouts there's still the USB-subsysten
inbetween. And this system boots from USB too, which means there's a lot
of other traffic on the USB-bus besides the one for the USB-BT-dongle.
And I don't know how the USB-stack (and hw) schedules the traffic, if he
is able to schedule that at all.

Regards,

Alexander

2014-05-31 05:32:31

by Marcel Holtmann

[permalink] [raw]
Subject: Re: [PATCH 2/2] bluetooth: raise HCI_CMD_TIMEOUT from 2s to 8s

Hi Alexander,

> The reasoning to do this is the following:
>
> - If a timeout occurs, the HCI-communication is broken afterwards and the
> dongle isn't usable anymore.
> - If it works after e.g. waiting 4s everyone is still happy but if it
> just breaks after only waiting 2s nothing is gained.
> - Having to wait some more seconds until an error occurs doesn't change
> anything.
>
> So there is no disadvantage in rasing the timeout but a great advantage
> in case the dongle needs more than 2s to process an HCI command.
> E.g. I had sometimes HCI command timeouts at boot (but never after the BT stack
> was successfull started). I assume the reason might be the USB-probing which
> happend before through the bootloader, which might have confused the dongle
> such that it needs a bit more time, but I'm not sure.
>
> Together with the patch which limits the timeout only to the actual time the
> dongle needs to process an HCI command (and doesn't include the time the
> kernel needs to process the answer to an HCI command), my problems were gone.
>
> Signed-off-by: Alexander Holler <[email protected]>
> ---
> include/net/bluetooth/hci.h | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/include/net/bluetooth/hci.h b/include/net/bluetooth/hci.h
> index be150cf..d50fd34 100644
> --- a/include/net/bluetooth/hci.h
> +++ b/include/net/bluetooth/hci.h
> @@ -180,7 +180,7 @@ enum {
> #define HCI_DISCONN_TIMEOUT msecs_to_jiffies(2000) /* 2 seconds */
> #define HCI_PAIRING_TIMEOUT msecs_to_jiffies(60000) /* 60 seconds */
> #define HCI_INIT_TIMEOUT msecs_to_jiffies(10000) /* 10 seconds */
> -#define HCI_CMD_TIMEOUT msecs_to_jiffies(2000) /* 2 seconds */
> +#define HCI_CMD_TIMEOUT msecs_to_jiffies(8000) /* 8 seconds */
> #define HCI_ACL_TX_TIMEOUT msecs_to_jiffies(45000) /* 45 seconds */
> #define HCI_AUTO_OFF_TIMEOUT msecs_to_jiffies(2000) /* 2 seconds */
> #define HCI_POWER_OFF_TIMEOUT msecs_to_jiffies(5000) /* 5 seconds */

I think moving the command timeout handling into a delayed work struct might actually solve this problem nicely and does not force us to increase the timeout. A chip that does not respond for 8 seconds is a pretty bad chip.

Regards

Marcel


2014-05-31 05:28:08

by Marcel Holtmann

[permalink] [raw]
Subject: Re: [PATCH 1/2] bluetooth: don't include local processing of HCI commands in the command timeout

Hi Alexander,

> I assume the timeout for processing HCI commands was originally intended to
> detect hung bluetooth devices and should not include the time needed locally
> to handle the response to an HCI command. That is important because the time
> needed locally (by the kernel or even userland) to process responses to HCI
> commands varies a lot between systems and HCI commands. That's even more true
> since many actions to HCI command responses are handled inside works which
> might be delayed quiet some time, depending on the actual system load.
>
> So stop the timeout as soon as a response to an HCI command was received.
>
> This fixes various problems which resulted in HCI command timeouts and an
> afterwards non-working bluetooth stack, especially on slower systems like
> some ARM devices.
>
> Drawback is that in-kernel problems like deadlocks aren't detected by HCI
> command timeouts anymore, but such problems should be detected and handled
> by other means and not by a timeout where it is hard to specify a value
> reasonable for all possible systems (-configurations, -loads).
>
> Furthermore, if the timeout includes local processing of HCI command
> responses, in-kernel errors like hung tasks might be masked by the
> timeout, because the hung task would be killed by the timeout before
> the hung task would be detected (by other means).
>
> Signed-off-by: Alexander Holler <[email protected]>
> ---
> net/bluetooth/hci_event.c | 12 ++++++------
> 1 file changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c
> index 15010a2..94c2dc0 100644
> --- a/net/bluetooth/hci_event.c
> +++ b/net/bluetooth/hci_event.c
> @@ -2338,6 +2338,9 @@ static void hci_cmd_complete_evt(struct hci_dev *hdev, struct sk_buff *skb)
>
> opcode = __le16_to_cpu(ev->opcode);
>
> + if (opcode != HCI_OP_NOP)
> + del_timer(&hdev->cmd_timer);
> +

so I actually wonder if we should move away from timer and move to a delayed work item to handle the timeout and if that would actually fix this issue.

Regards

Marcel


2014-05-27 15:19:51

by Alexander Holler

[permalink] [raw]
Subject: Re: [PATCH 2/2] bluetooth: raise HCI_CMD_TIMEOUT from 2s to 8s

Am 16.05.2014 07:35, schrieb Alexander Holler:
> Am 15.05.2014 17:19, schrieb Alexander Holler:
>> Am 15.05.2014 16:50, schrieb Alexander Holler:
>>> Am 15.05.2014 14:54, schrieb Luiz Augusto von Dentz:
>>
>>>> This timeout seems arbitrary so I suppose we can increase it if we
>>>> feel it is necessary but we used already different timeout for
>>>> different commands like HCI_POWER_OFF_TIMEOUT, so perhaps if we can
>>>> identify which command is more likely to timeout.
>>>>
>>>> We could perhaps auto reset if a command timeout if there is really no
>>>> other way to recover.
>>>
>>> It is arbitrary but 2s is not enough here. And as I've written in the
>>> description, there is absolutely no reason to keep this timeout
>>> unnecessarily short. No one cares if an error message appears after 2s
>>> or 8s if the communication with the dongle is in both cases broken
>>> afterwards.
>>>
>>> One of the commands I experieced the problem with was e.g.
>>> HCI_OP_DELETE_STORED_LINK_KEY or HCI_OP_WRITE_SSP_MODE.
>>
>> The problem is that you can never be sure what the origin of a timeouted
>> command was. It might have been e.g. the USB-subsystem through wich the
>> command and the response has to travel (in case of USB dongles) and not
>> the dongle itself.
>
> To explain a bit more, the box I'm experiencing these problems boots
> from USB2.0 HD. So it's likely that there is quiet some action on the
> bus and that shouldn't affect the operation of the BT-stack (besides
> slowing it maybe a bit down).

Anything wrong with my conclusion?

I don't know what's the origin of the current timeout of 2s, but I don't
think it takes the used transport into account.

So if the source of these 2s is the spec for BT-devices, the 2s would
make sense as a timeout to test BT-devices, but it already becomes an
arbitrary value whenever the transport isn't realtime and/or exclusive
for the device but e.g. an I/O scheduler is inbetween (which is the case
for USB).

Regards,

Alexander Holler

2014-05-16 05:35:49

by Alexander Holler

[permalink] [raw]
Subject: Re: [PATCH 2/2] bluetooth: raise HCI_CMD_TIMEOUT from 2s to 8s

Am 15.05.2014 17:19, schrieb Alexander Holler:
> Am 15.05.2014 16:50, schrieb Alexander Holler:
>> Am 15.05.2014 14:54, schrieb Luiz Augusto von Dentz:
>
>>> This timeout seems arbitrary so I suppose we can increase it if we
>>> feel it is necessary but we used already different timeout for
>>> different commands like HCI_POWER_OFF_TIMEOUT, so perhaps if we can
>>> identify which command is more likely to timeout.
>>>
>>> We could perhaps auto reset if a command timeout if there is really no
>>> other way to recover.
>>
>> It is arbitrary but 2s is not enough here. And as I've written in the
>> description, there is absolutely no reason to keep this timeout
>> unnecessarily short. No one cares if an error message appears after 2s
>> or 8s if the communication with the dongle is in both cases broken
>> afterwards.
>>
>> One of the commands I experieced the problem with was e.g.
>> HCI_OP_DELETE_STORED_LINK_KEY or HCI_OP_WRITE_SSP_MODE.
>
> The problem is that you can never be sure what the origin of a timeouted
> command was. It might have been e.g. the USB-subsystem through wich the
> command and the response has to travel (in case of USB dongles) and not
> the dongle itself.

To explain a bit more, the box I'm experiencing these problems boots
from USB2.0 HD. So it's likely that there is quiet some action on the
bus and that shouldn't affect the operation of the BT-stack (besides
slowing it maybe a bit down).

Regards,

Alexander Holler

2014-05-15 15:19:11

by Alexander Holler

[permalink] [raw]
Subject: Re: [PATCH 2/2] bluetooth: raise HCI_CMD_TIMEOUT from 2s to 8s

Am 15.05.2014 16:50, schrieb Alexander Holler:
> Am 15.05.2014 14:54, schrieb Luiz Augusto von Dentz:

>> This timeout seems arbitrary so I suppose we can increase it if we
>> feel it is necessary but we used already different timeout for
>> different commands like HCI_POWER_OFF_TIMEOUT, so perhaps if we can
>> identify which command is more likely to timeout.
>>
>> We could perhaps auto reset if a command timeout if there is really no
>> other way to recover.
>
> It is arbitrary but 2s is not enough here. And as I've written in the
> description, there is absolutely no reason to keep this timeout
> unnecessarily short. No one cares if an error message appears after 2s
> or 8s if the communication with the dongle is in both cases broken
> afterwards.
>
> One of the commands I experieced the problem with was e.g.
> HCI_OP_DELETE_STORED_LINK_KEY or HCI_OP_WRITE_SSP_MODE.

The problem is that you can never be sure what the origin of a timeouted
command was. It might have been e.g. the USB-subsystem through wich the
command and the response has to travel (in case of USB dongles) and not
the dongle itself.

Regards,

Alexander Holler

2014-05-15 14:50:58

by Alexander Holler

[permalink] [raw]
Subject: Re: [PATCH 2/2] bluetooth: raise HCI_CMD_TIMEOUT from 2s to 8s

Am 15.05.2014 14:54, schrieb Luiz Augusto von Dentz:
> Hi Alexander,
>
> On Wed, May 14, 2014 at 5:00 PM, Alexander Holler <[email protected]> wrote:
>> The reasoning to do this is the following:
>>
>> - If a timeout occurs, the HCI-communication is broken afterwards and the
>> dongle isn't usable anymore.
>> - If it works after e.g. waiting 4s everyone is still happy but if it
>> just breaks after only waiting 2s nothing is gained.
>> - Having to wait some more seconds until an error occurs doesn't change
>> anything.
>>
>> So there is no disadvantage in rasing the timeout but a great advantage
>> in case the dongle needs more than 2s to process an HCI command.
>> E.g. I had sometimes HCI command timeouts at boot (but never after the BT stack
>> was successfull started). I assume the reason might be the USB-probing which
>> happend before through the bootloader, which might have confused the dongle
>> such that it needs a bit more time, but I'm not sure.
>>
>> Together with the patch which limits the timeout only to the actual time the
>> dongle needs to process an HCI command (and doesn't include the time the
>> kernel needs to process the answer to an HCI command), my problems were gone.
>>
>> Signed-off-by: Alexander Holler <[email protected]>
>> ---
>> include/net/bluetooth/hci.h | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/include/net/bluetooth/hci.h b/include/net/bluetooth/hci.h
>> index be150cf..d50fd34 100644
>> --- a/include/net/bluetooth/hci.h
>> +++ b/include/net/bluetooth/hci.h
>> @@ -180,7 +180,7 @@ enum {
>> #define HCI_DISCONN_TIMEOUT msecs_to_jiffies(2000) /* 2 seconds */
>> #define HCI_PAIRING_TIMEOUT msecs_to_jiffies(60000) /* 60 seconds */
>> #define HCI_INIT_TIMEOUT msecs_to_jiffies(10000) /* 10 seconds */
>> -#define HCI_CMD_TIMEOUT msecs_to_jiffies(2000) /* 2 seconds */
>> +#define HCI_CMD_TIMEOUT msecs_to_jiffies(8000) /* 8 seconds */
>> #define HCI_ACL_TX_TIMEOUT msecs_to_jiffies(45000) /* 45 seconds */
>> #define HCI_AUTO_OFF_TIMEOUT msecs_to_jiffies(2000) /* 2 seconds */
>> #define HCI_POWER_OFF_TIMEOUT msecs_to_jiffies(5000) /* 5 seconds */
>> --
>> 1.8.3.1
>
> This timeout seems arbitrary so I suppose we can increase it if we
> feel it is necessary but we used already different timeout for
> different commands like HCI_POWER_OFF_TIMEOUT, so perhaps if we can
> identify which command is more likely to timeout.
>
> We could perhaps auto reset if a command timeout if there is really no
> other way to recover.

It is arbitrary but 2s is not enough here. And as I've written in the
description, there is absolutely no reason to keep this timeout
unnecessarily short. No one cares if an error message appears after 2s
or 8s if the communication with the dongle is in both cases broken
afterwards.

One of the commands I experieced the problem with was e.g.
HCI_OP_DELETE_STORED_LINK_KEY or HCI_OP_WRITE_SSP_MODE.

Regards,

Alexander Holler

2014-05-15 12:54:00

by Luiz Augusto von Dentz

[permalink] [raw]
Subject: Re: [PATCH 2/2] bluetooth: raise HCI_CMD_TIMEOUT from 2s to 8s

Hi Alexander,

On Wed, May 14, 2014 at 5:00 PM, Alexander Holler <[email protected]> wrote:
> The reasoning to do this is the following:
>
> - If a timeout occurs, the HCI-communication is broken afterwards and the
> dongle isn't usable anymore.
> - If it works after e.g. waiting 4s everyone is still happy but if it
> just breaks after only waiting 2s nothing is gained.
> - Having to wait some more seconds until an error occurs doesn't change
> anything.
>
> So there is no disadvantage in rasing the timeout but a great advantage
> in case the dongle needs more than 2s to process an HCI command.
> E.g. I had sometimes HCI command timeouts at boot (but never after the BT stack
> was successfull started). I assume the reason might be the USB-probing which
> happend before through the bootloader, which might have confused the dongle
> such that it needs a bit more time, but I'm not sure.
>
> Together with the patch which limits the timeout only to the actual time the
> dongle needs to process an HCI command (and doesn't include the time the
> kernel needs to process the answer to an HCI command), my problems were gone.
>
> Signed-off-by: Alexander Holler <[email protected]>
> ---
> include/net/bluetooth/hci.h | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/include/net/bluetooth/hci.h b/include/net/bluetooth/hci.h
> index be150cf..d50fd34 100644
> --- a/include/net/bluetooth/hci.h
> +++ b/include/net/bluetooth/hci.h
> @@ -180,7 +180,7 @@ enum {
> #define HCI_DISCONN_TIMEOUT msecs_to_jiffies(2000) /* 2 seconds */
> #define HCI_PAIRING_TIMEOUT msecs_to_jiffies(60000) /* 60 seconds */
> #define HCI_INIT_TIMEOUT msecs_to_jiffies(10000) /* 10 seconds */
> -#define HCI_CMD_TIMEOUT msecs_to_jiffies(2000) /* 2 seconds */
> +#define HCI_CMD_TIMEOUT msecs_to_jiffies(8000) /* 8 seconds */
> #define HCI_ACL_TX_TIMEOUT msecs_to_jiffies(45000) /* 45 seconds */
> #define HCI_AUTO_OFF_TIMEOUT msecs_to_jiffies(2000) /* 2 seconds */
> #define HCI_POWER_OFF_TIMEOUT msecs_to_jiffies(5000) /* 5 seconds */
> --
> 1.8.3.1

This timeout seems arbitrary so I suppose we can increase it if we
feel it is necessary but we used already different timeout for
different commands like HCI_POWER_OFF_TIMEOUT, so perhaps if we can
identify which command is more likely to timeout.

We could perhaps auto reset if a command timeout if there is really no
other way to recover.


--
Luiz Augusto von Dentz

2014-05-14 14:00:25

by Alexander Holler

[permalink] [raw]
Subject: [PATCH 2/2] bluetooth: raise HCI_CMD_TIMEOUT from 2s to 8s

The reasoning to do this is the following:

- If a timeout occurs, the HCI-communication is broken afterwards and the
dongle isn't usable anymore.
- If it works after e.g. waiting 4s everyone is still happy but if it
just breaks after only waiting 2s nothing is gained.
- Having to wait some more seconds until an error occurs doesn't change
anything.

So there is no disadvantage in rasing the timeout but a great advantage
in case the dongle needs more than 2s to process an HCI command.
E.g. I had sometimes HCI command timeouts at boot (but never after the BT stack
was successfull started). I assume the reason might be the USB-probing which
happend before through the bootloader, which might have confused the dongle
such that it needs a bit more time, but I'm not sure.

Together with the patch which limits the timeout only to the actual time the
dongle needs to process an HCI command (and doesn't include the time the
kernel needs to process the answer to an HCI command), my problems were gone.

Signed-off-by: Alexander Holler <[email protected]>
---
include/net/bluetooth/hci.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/net/bluetooth/hci.h b/include/net/bluetooth/hci.h
index be150cf..d50fd34 100644
--- a/include/net/bluetooth/hci.h
+++ b/include/net/bluetooth/hci.h
@@ -180,7 +180,7 @@ enum {
#define HCI_DISCONN_TIMEOUT msecs_to_jiffies(2000) /* 2 seconds */
#define HCI_PAIRING_TIMEOUT msecs_to_jiffies(60000) /* 60 seconds */
#define HCI_INIT_TIMEOUT msecs_to_jiffies(10000) /* 10 seconds */
-#define HCI_CMD_TIMEOUT msecs_to_jiffies(2000) /* 2 seconds */
+#define HCI_CMD_TIMEOUT msecs_to_jiffies(8000) /* 8 seconds */
#define HCI_ACL_TX_TIMEOUT msecs_to_jiffies(45000) /* 45 seconds */
#define HCI_AUTO_OFF_TIMEOUT msecs_to_jiffies(2000) /* 2 seconds */
#define HCI_POWER_OFF_TIMEOUT msecs_to_jiffies(5000) /* 5 seconds */
--
1.8.3.1

2014-06-01 01:42:55

by Marcel Holtmann

[permalink] [raw]
Subject: Re: [PATCH 2/2] bluetooth: raise HCI_CMD_TIMEOUT from 2s to 8s

Hi Alexander,

>>> The reasoning to do this is the following:
>>>
>>> - If a timeout occurs, the HCI-communication is broken afterwards and the
>>> dongle isn't usable anymore.
>>> - If it works after e.g. waiting 4s everyone is still happy but if it
>>> just breaks after only waiting 2s nothing is gained.
>>> - Having to wait some more seconds until an error occurs doesn't change
>>> anything.
>>>
>>> So there is no disadvantage in rasing the timeout but a great advantage
>>> in case the dongle needs more than 2s to process an HCI command.
>>> E.g. I had sometimes HCI command timeouts at boot (but never after the BT stack
>>> was successfull started). I assume the reason might be the USB-probing which
>>> happend before through the bootloader, which might have confused the dongle
>>> such that it needs a bit more time, but I'm not sure.
>>>
>>> Together with the patch which limits the timeout only to the actual time the
>>> dongle needs to process an HCI command (and doesn't include the time the
>>> kernel needs to process the answer to an HCI command), my problems were gone.
>>>
>>> Signed-off-by: Alexander Holler <[email protected]>
>>> ---
>>> include/net/bluetooth/hci.h | 2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/include/net/bluetooth/hci.h b/include/net/bluetooth/hci.h
>>> index be150cf..d50fd34 100644
>>> --- a/include/net/bluetooth/hci.h
>>> +++ b/include/net/bluetooth/hci.h
>>> @@ -180,7 +180,7 @@ enum {
>>> #define HCI_DISCONN_TIMEOUT msecs_to_jiffies(2000) /* 2 seconds */
>>> #define HCI_PAIRING_TIMEOUT msecs_to_jiffies(60000) /* 60 seconds */
>>> #define HCI_INIT_TIMEOUT msecs_to_jiffies(10000) /* 10 seconds */
>>> -#define HCI_CMD_TIMEOUT msecs_to_jiffies(2000) /* 2 seconds */
>>> +#define HCI_CMD_TIMEOUT msecs_to_jiffies(8000) /* 8 seconds */
>>> #define HCI_ACL_TX_TIMEOUT msecs_to_jiffies(45000) /* 45 seconds */
>>> #define HCI_AUTO_OFF_TIMEOUT msecs_to_jiffies(2000) /* 2 seconds */
>>> #define HCI_POWER_OFF_TIMEOUT msecs_to_jiffies(5000) /* 5 seconds */
>>
>> I think moving the command timeout handling into a delayed work struct might actually solve this problem nicely and does not force us to increase the timeout. A chip that does not respond for 8 seconds is a pretty bad chip.
>
> As I said in another mail, I don't think it is the chip. On the system
> where I'm experiencing these timeouts there's still the USB-subsysten
> inbetween. And this system boots from USB too, which means there's a lot
> of other traffic on the USB-bus besides the one for the USB-BT-dongle.
> And I don't know how the USB-stack (and hw) schedules the traffic, if he
> is able to schedule that at all.

I posted an experimental patch for changing cmd_timer into a delayed_work item and this might solve your scheduling problem. However that patch is untested and might needs some additional thinking on which queue the delayed work should be scheduled on. Nevertheless, you see the direction that we should explore first.

The L2CAP layer is not using struct timer_list at all anymore and in general we should remove timer_list usage from the Bluetooth subsystem since we do all our command processing in a workqueue. At least for lower protocols like HCI, L2CAP, SMP and A2DP.

Regards

Marcel