2024-01-08 22:46:42

by Jonas Dreßler

Subject: [PATCH v3 0/4] Bluetooth: Improve retrying of connection attempts

Since commit 4c67bc74f016 ("[Bluetooth] Support concurrent connect
requests"), the kernel supports trying to connect again in case the
bluetooth card is busy and fails to connect.

The logic that should handle this became a bit spotty over time, and also
cards these days appear to fail with more errors than just "Command
Disallowed".

This series refactors the handling of concurrent connection requests
by serializing all "Create Connection" commands for ACL connections
similar to how we do it for LE connections.

---

v1: https://lore.kernel.org/linux-bluetooth/[email protected]/
v2: https://lore.kernel.org/linux-bluetooth/[email protected]/
v3:
 - Move the new sync function to hci_sync.c as requested by review
 - Abort connection on failure using hci_abort_conn_sync() instead of
   hci_abort_conn()
 - Make the last commit message a bit more precise regarding the meaning
   of BT_CONNECT2 state

Jonas Dreßler (4):
  Bluetooth: Remove superfluous call to hci_conn_check_pending()
  Bluetooth: hci_event: Use HCI error defines instead of magic values
  Bluetooth: hci_conn: Only do ACL connections sequentially
  Bluetooth: Remove pending ACL connection attempts

 include/net/bluetooth/hci.h      |  3 ++
 include/net/bluetooth/hci_core.h |  1 -
 include/net/bluetooth/hci_sync.h |  3 ++
 net/bluetooth/hci_conn.c         | 83 +++-----------------------------
 net/bluetooth/hci_event.c        | 29 +++--------
 net/bluetooth/hci_sync.c         | 72 +++++++++++++++++++++++++++
 6 files changed, 93 insertions(+), 98 deletions(-)

--
2.43.0



2024-01-08 22:47:02

by Jonas Dreßler

Subject: [PATCH v3 1/4] Bluetooth: Remove superfluous call to hci_conn_check_pending()

The "pending connections" feature was originally introduced with commit
4c67bc74f016 ("[Bluetooth] Support concurrent connect requests") and
6bd57416127e ("[Bluetooth] Handling pending connect attempts after
inquiry") to handle controllers supporting only a single connection request
at a time. Later things were extended to also cancel ongoing inquiries on
connect() with commit 89e65975fea5 ("Bluetooth: Cancel Inquiry before
Create Connection").

With commit a9de9248064b ("[Bluetooth] Switch from OGF+OCF to using only
opcodes"), hci_conn_check_pending() was introduced as a helper to
consolidate a few places where we check for pending connections (indicated
by the BT_CONNECT2 state) and then try to connect.

This refactoring commit also snuck in two more calls to
hci_conn_check_pending():

- One is in the failure callback of hci_cs_inquiry(). This one probably
  makes sense: if we send an "HCI Inquiry" command and then immediately
  afterwards a "Create Connection" command, the "Create Connection" command
  might fail before the "HCI Inquiry" command, and then we want to retry the
  "Create Connection" on failure of the "HCI Inquiry".

- The other added call to hci_conn_check_pending() is in the event handler
  for the "Remote Name" event. This seems unrelated and is possibly a
  copy-paste error, so remove that one.

Fixes: a9de9248064b ("[Bluetooth] Switch from OGF+OCF to using only opcodes")
Signed-off-by: Jonas Dreßler <[email protected]>
---
 net/bluetooth/hci_event.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c
index 1e1c91473..9423394f6 100644
--- a/net/bluetooth/hci_event.c
+++ b/net/bluetooth/hci_event.c
@@ -3547,8 +3547,6 @@ static void hci_remote_name_evt(struct hci_dev *hdev, void *data,

 	bt_dev_dbg(hdev, "status 0x%2.2x", ev->status);

-	hci_conn_check_pending(hdev);
-
 	hci_dev_lock(hdev);

 	conn = hci_conn_hash_lookup_ba(hdev, ACL_LINK, &ev->bdaddr);
--
2.43.0


2024-01-08 22:47:54

by Jonas Dreßler

Subject: [PATCH v3 3/4] Bluetooth: hci_conn: Only do ACL connections sequentially

Pretty much all Bluetooth chipsets only support paging a single device at
a time, and if they don't reject a second "Create Connection" request
while another is still ongoing, they'll most likely serialize those
requests in the firmware.

With commit 4c67bc74f016 ("[Bluetooth] Support concurrent connect
requests") we started adding some serialization of our own in case the
adapter returns the "Command Disallowed" HCI error.

That commit used the BT_CONNECT2 state for the serialization. This
state is also used for a few more things (most notably to indicate we're
waiting for an inquiry to cancel) and is therefore a bit unreliable. Also,
not all BT firmware responds with "Command Disallowed" on too many
connection requests: some respond with "Hardware Failure"
(BCM4378), and others error out later and send a "Connect Complete"
event with error "Rejected Limited Resources" (Marvell 88W8897).

We can clean things up a bit and also make the serialization more reliable
by using our hci_sync machinery to always do "Create Connection" requests
in a sequential manner.

This is very similar to what we're already doing for establishing LE
connections, and it works well there.
---
 include/net/bluetooth/hci.h      |  1 +
 include/net/bluetooth/hci_sync.h |  3 ++
 net/bluetooth/hci_conn.c         | 69 ++++--------------------------
 net/bluetooth/hci_sync.c         | 72 ++++++++++++++++++++++++++++++++
 4 files changed, 85 insertions(+), 60 deletions(-)

diff --git a/include/net/bluetooth/hci.h b/include/net/bluetooth/hci.h
index 63f84e185..a84102ad5 100644
--- a/include/net/bluetooth/hci.h
+++ b/include/net/bluetooth/hci.h
@@ -437,6 +437,7 @@ enum {
 #define HCI_ACL_TX_TIMEOUT	msecs_to_jiffies(45000)	/* 45 seconds */
 #define HCI_AUTO_OFF_TIMEOUT	msecs_to_jiffies(2000)	/* 2 seconds */
 #define HCI_POWER_OFF_TIMEOUT	msecs_to_jiffies(5000)	/* 5 seconds */
+#define HCI_ACL_CONN_TIMEOUT	msecs_to_jiffies(20000)	/* 20 seconds */
 #define HCI_LE_CONN_TIMEOUT	msecs_to_jiffies(20000)	/* 20 seconds */
 #define HCI_LE_AUTOCONN_TIMEOUT	msecs_to_jiffies(4000)	/* 4 seconds */

diff --git a/include/net/bluetooth/hci_sync.h b/include/net/bluetooth/hci_sync.h
index 57eeb07ae..2bc3235f3 100644
--- a/include/net/bluetooth/hci_sync.h
+++ b/include/net/bluetooth/hci_sync.h
@@ -136,3 +136,6 @@ int hci_le_terminate_big_sync(struct hci_dev *hdev, u8 handle, u8 reason);
 int hci_le_big_terminate_sync(struct hci_dev *hdev, u8 handle);

 int hci_le_pa_terminate_sync(struct hci_dev *hdev, u16 handle);
+
+int hci_acl_create_connection_sync(struct hci_dev *hdev,
+				   struct hci_conn *conn);
diff --git a/net/bluetooth/hci_conn.c b/net/bluetooth/hci_conn.c
index 73470cc35..c9a5734fc 100644
--- a/net/bluetooth/hci_conn.c
+++ b/net/bluetooth/hci_conn.c
@@ -178,64 +178,6 @@ static void hci_conn_cleanup(struct hci_conn *conn)
 	hci_conn_put(conn);
 }

-static void hci_acl_create_connection(struct hci_conn *conn)
-{
-	struct hci_dev *hdev = conn->hdev;
-	struct inquiry_entry *ie;
-	struct hci_cp_create_conn cp;
-
-	BT_DBG("hcon %p", conn);
-
-	/* Many controllers disallow HCI Create Connection while it is doing
-	 * HCI Inquiry. So we cancel the Inquiry first before issuing HCI Create
-	 * Connection. This may cause the MGMT discovering state to become false
-	 * without user space's request but it is okay since the MGMT Discovery
-	 * APIs do not promise that discovery should be done forever. Instead,
-	 * the user space monitors the status of MGMT discovering and it may
-	 * request for discovery again when this flag becomes false.
-	 */
-	if (test_bit(HCI_INQUIRY, &hdev->flags)) {
-		/* Put this connection to "pending" state so that it will be
-		 * executed after the inquiry cancel command complete event.
-		 */
-		conn->state = BT_CONNECT2;
-		hci_send_cmd(hdev, HCI_OP_INQUIRY_CANCEL, 0, NULL);
-		return;
-	}
-
-	conn->state = BT_CONNECT;
-	conn->out = true;
-	conn->role = HCI_ROLE_MASTER;
-
-	conn->attempt++;
-
-	conn->link_policy = hdev->link_policy;
-
-	memset(&cp, 0, sizeof(cp));
-	bacpy(&cp.bdaddr, &conn->dst);
-	cp.pscan_rep_mode = 0x02;
-
-	ie = hci_inquiry_cache_lookup(hdev, &conn->dst);
-	if (ie) {
-		if (inquiry_entry_age(ie) <= INQUIRY_ENTRY_AGE_MAX) {
-			cp.pscan_rep_mode = ie->data.pscan_rep_mode;
-			cp.pscan_mode = ie->data.pscan_mode;
-			cp.clock_offset = ie->data.clock_offset |
-					  cpu_to_le16(0x8000);
-		}
-
-		memcpy(conn->dev_class, ie->data.dev_class, 3);
-	}
-
-	cp.pkt_type = cpu_to_le16(conn->pkt_type);
-	if (lmp_rswitch_capable(hdev) && !(hdev->link_mode & HCI_LM_MASTER))
-		cp.role_switch = 0x01;
-	else
-		cp.role_switch = 0x00;
-
-	hci_send_cmd(hdev, HCI_OP_CREATE_CONN, sizeof(cp), &cp);
-}
-
 int hci_disconnect(struct hci_conn *conn, __u8 reason)
 {
 	BT_DBG("hcon %p", conn);
@@ -1647,10 +1589,17 @@ struct hci_conn *hci_connect_acl(struct hci_dev *hdev, bdaddr_t *dst,

 	acl->conn_reason = conn_reason;
 	if (acl->state == BT_OPEN || acl->state == BT_CLOSED) {
+		int err;
+
 		acl->sec_level = BT_SECURITY_LOW;
 		acl->pending_sec_level = sec_level;
 		acl->auth_type = auth_type;
-		hci_acl_create_connection(acl);
+
+		err = hci_acl_create_connection_sync(hdev, acl);
+		if (err) {
+			hci_conn_del(acl);
+			return ERR_PTR(err);
+		}
 	}

 	return acl;
@@ -2580,7 +2529,7 @@ void hci_conn_check_pending(struct hci_dev *hdev)

 	conn = hci_conn_hash_lookup_state(hdev, ACL_LINK, BT_CONNECT2);
 	if (conn)
-		hci_acl_create_connection(conn);
+		hci_acl_create_connection_sync(hdev, conn);

 	hci_dev_unlock(hdev);
 }
diff --git a/net/bluetooth/hci_sync.c b/net/bluetooth/hci_sync.c
index a15ab0b87..067d44570 100644
--- a/net/bluetooth/hci_sync.c
+++ b/net/bluetooth/hci_sync.c
@@ -6565,3 +6565,75 @@ int hci_update_adv_data(struct hci_dev *hdev, u8 instance)
 	return hci_cmd_sync_queue(hdev, _update_adv_data_sync,
 				  UINT_PTR(instance), NULL);
 }
+
+static int __hci_acl_create_connection_sync(struct hci_dev *hdev, void *data)
+{
+	struct hci_conn *conn = data;
+	struct inquiry_entry *ie;
+	struct hci_cp_create_conn cp;
+	int err;
+
+	BT_DBG("hcon %p", conn);
+
+	/* Many controllers disallow HCI Create Connection while it is doing
+	 * HCI Inquiry. So we cancel the Inquiry first before issuing HCI Create
+	 * Connection. This may cause the MGMT discovering state to become false
+	 * without user space's request but it is okay since the MGMT Discovery
+	 * APIs do not promise that discovery should be done forever. Instead,
+	 * the user space monitors the status of MGMT discovering and it may
+	 * request for discovery again when this flag becomes false.
+	 */
+	if (test_bit(HCI_INQUIRY, &hdev->flags)) {
+		err = __hci_cmd_sync_status(hdev, HCI_OP_INQUIRY_CANCEL, 0,
+					    NULL, HCI_CMD_TIMEOUT);
+		if (err)
+			bt_dev_warn(hdev, "Failed to cancel inquiry %d", err);
+	}
+
+	conn->state = BT_CONNECT;
+	conn->out = true;
+	conn->role = HCI_ROLE_MASTER;
+
+	conn->attempt++;
+
+	conn->link_policy = hdev->link_policy;
+
+	memset(&cp, 0, sizeof(cp));
+	bacpy(&cp.bdaddr, &conn->dst);
+	cp.pscan_rep_mode = 0x02;
+
+	ie = hci_inquiry_cache_lookup(hdev, &conn->dst);
+	if (ie) {
+		if (inquiry_entry_age(ie) <= INQUIRY_ENTRY_AGE_MAX) {
+			cp.pscan_rep_mode = ie->data.pscan_rep_mode;
+			cp.pscan_mode = ie->data.pscan_mode;
+			cp.clock_offset = ie->data.clock_offset |
+					  cpu_to_le16(0x8000);
+		}
+
+		memcpy(conn->dev_class, ie->data.dev_class, 3);
+	}
+
+	cp.pkt_type = cpu_to_le16(conn->pkt_type);
+	if (lmp_rswitch_capable(hdev) && !(hdev->link_mode & HCI_LM_MASTER))
+		cp.role_switch = 0x01;
+	else
+		cp.role_switch = 0x00;
+
+	err = __hci_cmd_sync_status_sk(hdev, HCI_OP_CREATE_CONN,
+				       sizeof(cp), &cp,
+				       HCI_EV_CONN_COMPLETE,
+				       HCI_ACL_CONN_TIMEOUT, NULL);
+
+	if (err == -ETIMEDOUT)
+		hci_abort_conn_sync(hdev, conn, HCI_ERROR_LOCAL_HOST_TERM);
+
+	return err;
+}
+
+int hci_acl_create_connection_sync(struct hci_dev *hdev,
+				   struct hci_conn *conn)
+{
+	return hci_cmd_sync_queue(hdev, __hci_acl_create_connection_sync,
+				  conn, NULL);
+}
--
2.43.0
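
The serialization idea in this patch can be illustrated outside the kernel. The following is only a minimal user-space sketch, not the hci_sync implementation: request_queue, queue_request(), drain_queue() and the fake_* helpers are hypothetical names invented for this illustration. It shows requests being queued, run strictly one after another in submission order, and cleaned up when one of them times out.

```c
#include <errno.h>
#include <stddef.h>

#define QUEUE_MAX 8

/* Hypothetical request type, standing in for a queued sync work item. */
typedef int (*request_fn)(void *data);

struct request_queue {
	request_fn fn[QUEUE_MAX];
	void *data[QUEUE_MAX];
	size_t head;	/* next entry to run */
	size_t tail;	/* next free slot */
};

/* Append a request; nothing runs until the queue is drained. */
static int queue_request(struct request_queue *q, request_fn fn, void *data)
{
	if (q->tail == QUEUE_MAX)
		return -ENOMEM;

	q->fn[q->tail] = fn;
	q->data[q->tail] = data;
	q->tail++;
	return 0;
}

/* Run queued requests one at a time, in submission order. A request that
 * times out is handed to abort_fn before the next request starts.
 * Returns the number of requests that failed.
 */
static int drain_queue(struct request_queue *q, void (*abort_fn)(void *data))
{
	int failed = 0;

	while (q->head < q->tail) {
		size_t i = q->head++;
		int err = q->fn[i](q->data[i]);

		if (err == -ETIMEDOUT && abort_fn)
			abort_fn(q->data[i]);
		if (err)
			failed++;
	}

	return failed;
}

/* Hypothetical connection record, used only for this illustration. */
struct fake_conn {
	int result;	/* what the simulated controller returns */
	int started;	/* 1-based order in which the request ran */
	int aborted;	/* set when the timeout path cleaned up */
};

static int next_start;

/* Simulated "Create Connection": records its run order, returns result. */
static int fake_create_conn(void *data)
{
	struct fake_conn *conn = data;

	conn->started = ++next_start;
	return conn->result;
}

/* Simulated cleanup for a connection attempt that timed out. */
static void fake_abort_conn(void *data)
{
	struct fake_conn *conn = data;

	conn->aborted = 1;
}
```

Queueing two fake_conn records and then calling drain_queue() runs them in order; the second attempt only starts once the first has completed or has been aborted, which is the property the patch relies on.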


2024-01-08 23:13:13

by bluez.test.bot

Subject: RE: Bluetooth: Improve retrying of connection attempts

This is an automated email and please do not reply to this email.

Dear Submitter,

Thank you for submitting the patches to the linux bluetooth mailing list.
While preparing the CI tests, the patches you submitted couldn't be applied to the current HEAD of the repository.

----- Output -----

error: patch failed: include/net/bluetooth/hci.h:437
error: include/net/bluetooth/hci.h: patch does not apply
error: patch failed: net/bluetooth/hci_conn.c:178
error: net/bluetooth/hci_conn.c: patch does not apply
hint: Use 'git am --show-current-patch' to see the failed patch

Please resolve the issue and submit the patches again.


---
Regards,
Linux Bluetooth

2024-01-09 17:54:35

by Luiz Augusto von Dentz

Subject: Re: [PATCH v3 0/4] Bluetooth: Improve retrying of connection attempts

Hi Jonas,

After rebasing and fixing a little bit here and there (see v4), it looks
like this change is affecting the following test, run with
mgmt-tester -s "Pair Device - Power off 1":

Pair Device - Power off 1 - init
  Read Version callback
    Status: Success (0x00)
    Version 1.22
  Read Commands callback
    Status: Success (0x00)
  Read Index List callback
    Status: Success (0x00)
  Index Added callback
    Index: 0x0000
  Enable management Mesh interface
  Enabling Mesh feature
  Read Info callback
    Status: Success (0x00)
    Address: 00:AA:01:00:00:00
    Version: 0x09
    Manufacturer: 0x05f1
    Supported settings: 0x0001bfff
    Current settings: 0x00000080
    Class: 0x000000
    Name:
    Short name:
  Mesh feature is enabled
Pair Device - Power off 1 - setup
  Setup sending Set Bondable (0x0009)
  Setup sending Set Powered (0x0005)
  Initial settings completed
  Test setup condition added, total 1
  Client set connectable: Success (0x00)
  Test setup condition complete, 0 left
Pair Device - Power off 1 - setup complete
Pair Device - Power off 1 - run
  Sending Pair Device (0x0019)
Bluetooth: hci0: command 0x0405 tx timeout
Bluetooth: hci0: command 0x0408 tx timeout
  Test condition added, total 1
Pair Device - Power off 1 - test timed out
  Pair Device (0x0019): Disconnected (0x0e)
Pair Device - Power off 1 - test not run
Pair Device - Power off 1 - teardown
Pair Device - Power off 1 - teardown
  Index Removed callback
    Index: 0x0000
Pair Device - Power off 1 - teardown complete
Pair Device - Power off 1 - done
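
For reading the two "tx timeout" lines in the log above: an HCI opcode packs a 6-bit OGF in its upper bits and a 10-bit OCF in its lower bits. The sketch below decodes the opcodes by hand; the helper names are made up for this illustration, and the name table only covers the Link Control commands relevant to this thread. Under that decoding, 0x0405 is "Create Connection" and 0x0408 is "Create Connection Cancel", both in OGF 0x01.

```c
#include <stddef.h>
#include <stdint.h>

/* Upper 6 bits: Opcode Group Field. */
static inline uint16_t sketch_opcode_ogf(uint16_t opcode)
{
	return opcode >> 10;
}

/* Lower 10 bits: Opcode Command Field. */
static inline uint16_t sketch_opcode_ocf(uint16_t opcode)
{
	return opcode & 0x03ff;
}

/* Inverse operation: build an opcode from its two fields. */
static inline uint16_t sketch_opcode_pack(uint16_t ogf, uint16_t ocf)
{
	return (uint16_t)((ogf << 10) | (ocf & 0x03ff));
}

/* Names for the few Link Control (OGF 0x01) commands seen in this thread.
 * Returns NULL for anything this small excerpt does not know about.
 */
static const char *sketch_opcode_name(uint16_t opcode)
{
	switch (opcode) {
	case 0x0402:
		return "Inquiry Cancel";
	case 0x0405:
		return "Create Connection";
	case 0x0408:
		return "Create Connection Cancel";
	default:
		return NULL;
	}
}
```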

--
Luiz Augusto von Dentz

2024-01-09 21:58:12

by Jonas Dreßler

Subject: Re: [PATCH v3 0/4] Bluetooth: Improve retrying of connection attempts

Hi Luiz,

Thanks for landing the first two commits!

I think this is actually the same issue causing the test failure
as in the other issue I had:
https://lore.kernel.org/linux-bluetooth/[email protected]/

It seems that the emulator is unable to reply to HCI commands sent
from the hci_sync machinery, possibly because that is sending things
on a separate thread?

Cheers,
Jonas

2024-01-24 16:17:55

by Jonas Dreßler

Subject: Re: [PATCH v3 0/4] Bluetooth: Improve retrying of connection attempts

Hi Luiz,

Okay, I did some further digging: it turns out this is actually not a problem
with vhci and the emulator. In this test case the command is intended to time
out, because force_power_off is TRUE and the HCI device gets shut down right
after the MGMT command is sent.

The test broke because the "Command Complete" MGMT event comes back with status
"Disconnected" instead of "Not Powered". The reason is the hci_abort_conn_sync()
call that I added for the case where the "Create Connection" HCI command
times out. hci_abort_conn_sync() calls hci_conn_failed() with
HCI_ERROR_LOCAL_HOST_TERM as expected, which in turn calls the hci_connect_cfm()
callback (pairing_complete_cb). There we look up HCI_ERROR_LOCAL_HOST_TERM
in mgmt_status_table and end up with MGMT_STATUS_DISCONNECTED.
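
The status translation described above can be sketched as a small lookup. This is a stand-alone illustration, not the kernel's table: the struct, the two-entry excerpt and mgmt_status_sketch() are hypothetical, and only the mapping discussed here is included. Unknown codes fall back to a generic failure in this sketch, which is a simplification of what a complete table would do.

```c
#include <stddef.h>
#include <stdint.h>

/* Numeric values as used by the HCI and MGMT interfaces. */
#define HCI_ERROR_LOCAL_HOST_TERM	0x16
#define MGMT_STATUS_SUCCESS		0x00
#define MGMT_STATUS_FAILED		0x03
#define MGMT_STATUS_DISCONNECTED	0x0e
#define MGMT_STATUS_NOT_POWERED		0x0f

/* Hypothetical pair type; the real table is indexed by HCI status. */
struct status_map {
	uint8_t hci;
	uint8_t mgmt;
};

/* Excerpt containing only the entries relevant to this thread. */
static const struct status_map status_excerpt[] = {
	{ 0x00, MGMT_STATUS_SUCCESS },
	{ HCI_ERROR_LOCAL_HOST_TERM, MGMT_STATUS_DISCONNECTED },
};

/* Translate an HCI status to an MGMT status using the excerpt above. */
static uint8_t mgmt_status_sketch(uint8_t hci_status)
{
	size_t i;

	for (i = 0; i < sizeof(status_excerpt) / sizeof(status_excerpt[0]); i++) {
		if (status_excerpt[i].hci == hci_status)
			return status_excerpt[i].mgmt;
	}

	/* Assumption of this sketch: unknown codes become a generic failure. */
	return MGMT_STATUS_FAILED;
}
```

With this mapping, aborting with HCI_ERROR_LOCAL_HOST_TERM yields 0x0e, matching the "Disconnected (0x0e)" line in the test log rather than "Not Powered" (0x0f).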

When I remove the hci_abort_conn_sync() call we get the "Not Powered" failure again.
I'm not exactly sure why that happens (I assume there's some kind of generic mgmt
failure return handler that checks hdev_is_powered() and then sets the error).

So the question now is: do we want to adjust the test (and possibly bluetoothd?)
to expect "Disconnected" instead of "Not Powered", or should I get rid of the
hci_abort_conn_sync() call again? FWIW, in hci_le_create_conn_sync() we also clean
up like this on ETIMEDOUT (maybe the spec is just different there?), so
for consistency it seems better to adjust the test to expect "Disconnected".

Cheers,
Jonas


2024-01-24 16:34:32

by Luiz Augusto von Dentz

Subject: Re: [PATCH v3 0/4] Bluetooth: Improve retrying of connection attempts

Hi Jonas,

Great that you found time to dig into this. Yes, I think it is fine
to expect a different error if we clean up using hci_abort_conn_sync()
in the process; we just need to make sure nothing else is affected
by this change.



--
Luiz Augusto von Dentz