2023-06-02 09:51:39

by Herman Meerlo

[permalink] [raw]
Subject: Intermittent Not connectable devices reported

I'm working on a Raspberry Pi device that performs periodic BLE discovery and tries to connect to multiple devices. Most of the times this goes fine but especially in BLE heavy environments every now and then it fails because it says that the device is not found. I have been debugging this issue for a few weeks now and it comes down to this:

• Connect to DBus to start the discovery
• Devices are found, interfacesAdded callbacks are made, everything looks fine
• Stop discovery
• Directly some devices are removed by the interfacesRemoved callback
• My code does not get the device it searches for -> unhappy user

What happens under the hood is that DBus removes all devices from it's cache that have been indicated by Bluez as being Not connectable. I.e. there is no use in keeping them around, you can't connect to them anyway. But... this is not true for the device I'm looking for. It is marked as Not connectable incorrectly.
So to chase down the problem I create a btmon dump which shows indeed that the device is reported to be Not connectable after having received a SCAN_RSP:

> HCI Event: LE Meta Event (0x3e) plen 38 #73 [hci0] 34.359921
LE Advertising Report (0x02)
Num reports: 1
Event type: Scan response - SCAN_RSP (0x04)
Address type: Public (0x00)
Address: F4:B8:5E:64:02:55 (Texas Instruments)
Data length: 26
Name (complete): BeeWi SmartLite
Peripheral Conn. Interval: 0x0028 - 0x0050
TX power: 0 dBm
RSSI: -42 dBm (0xd6)
@ MGMT Event: Device Found (0x0012) plen 40 {0x0001} [hci0] 34.360057
LE Address: F4:B8:5E:64:02:55 (Texas Instruments)
RSSI: -42 dBm (0xd6)
Flags: 0x00000004
Not Connectable
Data length: 26
Name (complete): BeeWi SmartLite
Peripheral Conn. Interval: 0x0028 - 0x0050
TX power: 0 dBm

But all ADV_IND PDU's before that clearly indicate that the device is connectable, it is only after this SCAN_RSP that it is reported as Not connectable:

> HCI Event: LE Meta Event (0x3e) plen 27 #46 [hci0] 34.152817
LE Advertising Report (0x02)
Num reports: 1
Event type: Connectable undirected - ADV_IND (0x00)
Address type: Public (0x00)
Address: F4:B8:5E:64:02:55 (Texas Instruments)
Data length: 15
Flags: 0x06
LE General Discoverable Mode
BR/EDR Not Supported
Company: Texas Instruments Inc. (13)
Data: 06030108b0e408f7
RSSI: -43 dBm (0xd5)
@ MGMT Event: Device Found (0x0012) plen 31 {0x0001} [hci0] 34.152905
LE Address: 44:6E:FF:00:0D:65 (Resolvable)
RSSI: -74 dBm (0xb6)
Flags: 0x00000000
Data length: 17
Flags: 0x1a
LE General Discoverable Mode
Simultaneous LE and BR/EDR (Controller)
Simultaneous LE and BR/EDR (Host)
TX power: 9 dBm
Company: Apple, Inc. (76)
Type: Unknown (16)
Data: 01188898dc
> HCI Event: LE Meta Event (0x3e) plen 41 #47 [hci0] 34.156958
LE Advertising Report (0x02)
Num reports: 1
Event type: Connectable undirected - ADV_IND (0x00)
Address type: Random (0x01)
Address: FA:BD:8D:12:26:BF (Static)
Data length: 29
Name (short): P mesh
Flags: 0x04
BR/EDR Not Supported
128-bit Service UUIDs (partial): 1 entry
Vendor specific
RSSI: -47 dBm (0xd1)
@ MGMT Event: Device Found (0x0012) plen 29 {0x0001} [hci0] 34.157030
LE Address: F4:B8:5E:64:02:55 (Texas Instruments)
RSSI: -43 dBm (0xd5)
Flags: 0x00000000
Data length: 15
Flags: 0x06
LE General Discoverable Mode
BR/EDR Not Supported
Company: Texas Instruments Inc. (13)
Data: 06030108b0e408f7

So I am heavily doubting whether the Linux kernel code has a bug in handling the SCAN_RSP. Please look at this part of the kernel:

https://github.com/torvalds/linux/blob/48b1320a674e1ff5de2fad8606bee38f724594dc/net/bluetooth/hci_event.c#L6326

It sets the NOT CONNECTABLE flag by default for a SCAN_RSP and will overwrite it with any flags it has received with a previous ADV_IND. But it does not seem to take into account that in BLE heavy environments the previous ADV_IND might be of a totally different device. So every now and then it will enter the first path where it will just report the device with a NOT_CONNECTABLE flag. Or am I missing something here?


2023-06-07 06:52:13

by Herman Meerlo

[permalink] [raw]
Subject: Re: Intermittent Not connectable devices reported


> On 2 Jun 2023, at 11:45, Herman Meerlo <[email protected]> wrote:
>
> I'm working on a Raspberry Pi device that performs periodic BLE discovery and tries to connect to multiple devices. Most of the times this goes fine but especially in BLE heavy environments every now and then it fails because it says that the device is not found. I have been debugging this issue for a few weeks now and it comes down to this:
>
> • Connect to DBus to start the discovery
> • Devices are found, interfacesAdded callbacks are made, everything looks fine
> • Stop discovery
> • Directly some devices are removed by the interfacesRemoved callback
> • My code does not get the device it searches for -> unhappy user
>
> What happens under the hood is that DBus removes all devices from it's cache that have been indicated by Bluez as being Not connectable. I.e. there is no use in keeping them around, you can't connect to them anyway. But... this is not true for the device I'm looking for. It is marked as Not connectable incorrectly.
> So to chase down the problem I create a btmon dump which shows indeed that the device is reported to be Not connectable after having received a SCAN_RSP:
>
>> HCI Event: LE Meta Event (0x3e) plen 38 #73 [hci0] 34.359921
> LE Advertising Report (0x02)
> Num reports: 1
> Event type: Scan response - SCAN_RSP (0x04)
> Address type: Public (0x00)
> Address: F4:B8:5E:64:02:55 (Texas Instruments)
> Data length: 26
> Name (complete): BeeWi SmartLite
> Peripheral Conn. Interval: 0x0028 - 0x0050
> TX power: 0 dBm
> RSSI: -42 dBm (0xd6)
> @ MGMT Event: Device Found (0x0012) plen 40 {0x0001} [hci0] 34.360057
> LE Address: F4:B8:5E:64:02:55 (Texas Instruments)
> RSSI: -42 dBm (0xd6)
> Flags: 0x00000004
> Not Connectable
> Data length: 26
> Name (complete): BeeWi SmartLite
> Peripheral Conn. Interval: 0x0028 - 0x0050
> TX power: 0 dBm
>
> But all ADV_IND PDU's before that clearly indicate that the device is connectable, it is only after this SCAN_RSP that it is reported as Not connectable:
>
>> HCI Event: LE Meta Event (0x3e) plen 27 #46 [hci0] 34.152817
> LE Advertising Report (0x02)
> Num reports: 1
> Event type: Connectable undirected - ADV_IND (0x00)
> Address type: Public (0x00)
> Address: F4:B8:5E:64:02:55 (Texas Instruments)
> Data length: 15
> Flags: 0x06
> LE General Discoverable Mode
> BR/EDR Not Supported
> Company: Texas Instruments Inc. (13)
> Data: 06030108b0e408f7
> RSSI: -43 dBm (0xd5)
> @ MGMT Event: Device Found (0x0012) plen 31 {0x0001} [hci0] 34.152905
> LE Address: 44:6E:FF:00:0D:65 (Resolvable)
> RSSI: -74 dBm (0xb6)
> Flags: 0x00000000
> Data length: 17
> Flags: 0x1a
> LE General Discoverable Mode
> Simultaneous LE and BR/EDR (Controller)
> Simultaneous LE and BR/EDR (Host)
> TX power: 9 dBm
> Company: Apple, Inc. (76)
> Type: Unknown (16)
> Data: 01188898dc
>> HCI Event: LE Meta Event (0x3e) plen 41 #47 [hci0] 34.156958
> LE Advertising Report (0x02)
> Num reports: 1
> Event type: Connectable undirected - ADV_IND (0x00)
> Address type: Random (0x01)
> Address: FA:BD:8D:12:26:BF (Static)
> Data length: 29
> Name (short): P mesh
> Flags: 0x04
> BR/EDR Not Supported
> 128-bit Service UUIDs (partial): 1 entry
> Vendor specific
> RSSI: -47 dBm (0xd1)
> @ MGMT Event: Device Found (0x0012) plen 29 {0x0001} [hci0] 34.157030
> LE Address: F4:B8:5E:64:02:55 (Texas Instruments)
> RSSI: -43 dBm (0xd5)
> Flags: 0x00000000
> Data length: 15
> Flags: 0x06
> LE General Discoverable Mode
> BR/EDR Not Supported
> Company: Texas Instruments Inc. (13)
> Data: 06030108b0e408f7
>
> So I am heavily doubting whether the Linux kernel code has a bug in handling the SCAN_RSP. Please look at this part of the kernel:
>
> https://github.com/torvalds/linux/blob/48b1320a674e1ff5de2fad8606bee38f724594dc/net/bluetooth/hci_event.c#L6326
>
> It sets the NOT CONNECTABLE flag by default for a SCAN_RSP and will overwrite it with any flags it has received with a previous ADV_IND. But it does not seem to take into account that in BLE heavy environments the previous ADV_IND might be of a totally different device. So every now and then it will enter the first path where it will just report the device with a NOT_CONNECTABLE flag. Or am I missing something here?

Can someone take a look at this? It is happening quite often and I can not imagine that I’m the only one with this issue.



2023-06-07 19:25:45

by Luiz Augusto von Dentz

[permalink] [raw]
Subject: Re: Intermittent Not connectable devices reported

Hi Herman,

On Fri, Jun 2, 2023 at 2:51 AM Herman Meerlo <[email protected]> wrote:
>
> I'm working on a Raspberry Pi device that performs periodic BLE discovery and tries to connect to multiple devices. Most of the times this goes fine but especially in BLE heavy environments every now and then it fails because it says that the device is not found. I have been debugging this issue for a few weeks now and it comes down to this:
>
> • Connect to DBus to start the discovery
> • Devices are found, interfacesAdded callbacks are made, everything looks fine
> • Stop discovery
> • Directly some devices are removed by the interfacesRemoved callback
> • My code does not get the device it searches for -> unhappy user
>
> What happens under the hood is that DBus removes all devices from it's cache that have been indicated by Bluez as being Not connectable. I.e. there is no use in keeping them around, you can't connect to them anyway. But... this is not true for the device I'm looking for. It is marked as Not connectable incorrectly.
> So to chase down the problem I create a btmon dump which shows indeed that the device is reported to be Not connectable after having received a SCAN_RSP:
>
> > HCI Event: LE Meta Event (0x3e) plen 38 #73 [hci0] 34.359921
> LE Advertising Report (0x02)
> Num reports: 1
> Event type: Scan response - SCAN_RSP (0x04)
> Address type: Public (0x00)
> Address: F4:B8:5E:64:02:55 (Texas Instruments)
> Data length: 26
> Name (complete): BeeWi SmartLite
> Peripheral Conn. Interval: 0x0028 - 0x0050
> TX power: 0 dBm
> RSSI: -42 dBm (0xd6)
> @ MGMT Event: Device Found (0x0012) plen 40 {0x0001} [hci0] 34.360057
> LE Address: F4:B8:5E:64:02:55 (Texas Instruments)
> RSSI: -42 dBm (0xd6)
> Flags: 0x00000004
> Not Connectable
> Data length: 26
> Name (complete): BeeWi SmartLite
> Peripheral Conn. Interval: 0x0028 - 0x0050
> TX power: 0 dBm
>
> But all ADV_IND PDU's before that clearly indicate that the device is connectable, it is only after this SCAN_RSP that it is reported as Not connectable:
>
> > HCI Event: LE Meta Event (0x3e) plen 27 #46 [hci0] 34.152817
> LE Advertising Report (0x02)
> Num reports: 1
> Event type: Connectable undirected - ADV_IND (0x00)
> Address type: Public (0x00)
> Address: F4:B8:5E:64:02:55 (Texas Instruments)
> Data length: 15
> Flags: 0x06
> LE General Discoverable Mode
> BR/EDR Not Supported
> Company: Texas Instruments Inc. (13)
> Data: 06030108b0e408f7
> RSSI: -43 dBm (0xd5)
> @ MGMT Event: Device Found (0x0012) plen 31 {0x0001} [hci0] 34.152905
> LE Address: 44:6E:FF:00:0D:65 (Resolvable)
> RSSI: -74 dBm (0xb6)
> Flags: 0x00000000
> Data length: 17
> Flags: 0x1a
> LE General Discoverable Mode
> Simultaneous LE and BR/EDR (Controller)
> Simultaneous LE and BR/EDR (Host)
> TX power: 9 dBm
> Company: Apple, Inc. (76)
> Type: Unknown (16)
> Data: 01188898dc
> > HCI Event: LE Meta Event (0x3e) plen 41 #47 [hci0] 34.156958
> LE Advertising Report (0x02)
> Num reports: 1
> Event type: Connectable undirected - ADV_IND (0x00)
> Address type: Random (0x01)
> Address: FA:BD:8D:12:26:BF (Static)
> Data length: 29
> Name (short): P mesh
> Flags: 0x04
> BR/EDR Not Supported
> 128-bit Service UUIDs (partial): 1 entry
> Vendor specific
> RSSI: -47 dBm (0xd1)
> @ MGMT Event: Device Found (0x0012) plen 29 {0x0001} [hci0] 34.157030
> LE Address: F4:B8:5E:64:02:55 (Texas Instruments)
> RSSI: -43 dBm (0xd5)
> Flags: 0x00000000
> Data length: 15
> Flags: 0x06
> LE General Discoverable Mode
> BR/EDR Not Supported
> Company: Texas Instruments Inc. (13)
> Data: 06030108b0e408f7
>
> So I am heavily doubting whether the Linux kernel code has a bug in handling the SCAN_RSP. Please look at this part of the kernel:
>
> https://github.com/torvalds/linux/blob/48b1320a674e1ff5de2fad8606bee38f724594dc/net/bluetooth/hci_event.c#L6326
>
> It sets the NOT CONNECTABLE flag by default for a SCAN_RSP and will overwrite it with any flags it has received with a previous ADV_IND. But it does not seem to take into account that in BLE heavy environments the previous ADV_IND might be of a totally different device. So every now and then it will enter the first path where it will just report the device with a NOT_CONNECTABLE flag. Or am I missing something here?

You are saying that the controller would interleave ADV_IND of
different peers before SCAN_RSP, in that case yes that would be
possible that the device would be marked as NOT_CONNECTABLE, usually
that is not really the case which is why we end up adding the code
above:

* In the really unlikely case that a controller get confused
* and just sends a scan response event, then it is marked as
* not connectable as well.

I guess it is not really unlikely after all, so perhaps we need a flag
indicating this is a standalone SCAN_RSP e.g. MGMT_DEV_FOUND_SCAN_RSP,
then we can treat it accordingly in userspace.

--
Luiz Augusto von Dentz

2023-06-08 16:36:44

by Luiz Augusto von Dentz

[permalink] [raw]
Subject: Re: Intermittent Not connectable devices reported

Hi Herman,

On Wed, Jun 7, 2023 at 12:21 PM Luiz Augusto von Dentz
<[email protected]> wrote:
>
> Hi Herman,
>
> On Fri, Jun 2, 2023 at 2:51 AM Herman Meerlo <[email protected]> wrote:
> >
> > I'm working on a Raspberry Pi device that performs periodic BLE discovery and tries to connect to multiple devices. Most of the times this goes fine but especially in BLE heavy environments every now and then it fails because it says that the device is not found. I have been debugging this issue for a few weeks now and it comes down to this:
> >
> > • Connect to DBus to start the discovery
> > • Devices are found, interfacesAdded callbacks are made, everything looks fine
> > • Stop discovery
> > • Directly some devices are removed by the interfacesRemoved callback
> > • My code does not get the device it searches for -> unhappy user
> >
> > What happens under the hood is that DBus removes all devices from it's cache that have been indicated by Bluez as being Not connectable. I.e. there is no use in keeping them around, you can't connect to them anyway. But... this is not true for the device I'm looking for. It is marked as Not connectable incorrectly.
> > So to chase down the problem I create a btmon dump which shows indeed that the device is reported to be Not connectable after having received a SCAN_RSP:
> >
> > > HCI Event: LE Meta Event (0x3e) plen 38 #73 [hci0] 34.359921
> > LE Advertising Report (0x02)
> > Num reports: 1
> > Event type: Scan response - SCAN_RSP (0x04)
> > Address type: Public (0x00)
> > Address: F4:B8:5E:64:02:55 (Texas Instruments)
> > Data length: 26
> > Name (complete): BeeWi SmartLite
> > Peripheral Conn. Interval: 0x0028 - 0x0050
> > TX power: 0 dBm
> > RSSI: -42 dBm (0xd6)
> > @ MGMT Event: Device Found (0x0012) plen 40 {0x0001} [hci0] 34.360057
> > LE Address: F4:B8:5E:64:02:55 (Texas Instruments)
> > RSSI: -42 dBm (0xd6)
> > Flags: 0x00000004
> > Not Connectable
> > Data length: 26
> > Name (complete): BeeWi SmartLite
> > Peripheral Conn. Interval: 0x0028 - 0x0050
> > TX power: 0 dBm
> >
> > But all ADV_IND PDU's before that clearly indicate that the device is connectable, it is only after this SCAN_RSP that it is reported as Not connectable:
> >
> > > HCI Event: LE Meta Event (0x3e) plen 27 #46 [hci0] 34.152817
> > LE Advertising Report (0x02)
> > Num reports: 1
> > Event type: Connectable undirected - ADV_IND (0x00)
> > Address type: Public (0x00)
> > Address: F4:B8:5E:64:02:55 (Texas Instruments)
> > Data length: 15
> > Flags: 0x06
> > LE General Discoverable Mode
> > BR/EDR Not Supported
> > Company: Texas Instruments Inc. (13)
> > Data: 06030108b0e408f7
> > RSSI: -43 dBm (0xd5)
> > @ MGMT Event: Device Found (0x0012) plen 31 {0x0001} [hci0] 34.152905
> > LE Address: 44:6E:FF:00:0D:65 (Resolvable)
> > RSSI: -74 dBm (0xb6)
> > Flags: 0x00000000
> > Data length: 17
> > Flags: 0x1a
> > LE General Discoverable Mode
> > Simultaneous LE and BR/EDR (Controller)
> > Simultaneous LE and BR/EDR (Host)
> > TX power: 9 dBm
> > Company: Apple, Inc. (76)
> > Type: Unknown (16)
> > Data: 01188898dc
> > > HCI Event: LE Meta Event (0x3e) plen 41 #47 [hci0] 34.156958
> > LE Advertising Report (0x02)
> > Num reports: 1
> > Event type: Connectable undirected - ADV_IND (0x00)
> > Address type: Random (0x01)
> > Address: FA:BD:8D:12:26:BF (Static)
> > Data length: 29
> > Name (short): P mesh
> > Flags: 0x04
> > BR/EDR Not Supported
> > 128-bit Service UUIDs (partial): 1 entry
> > Vendor specific
> > RSSI: -47 dBm (0xd1)
> > @ MGMT Event: Device Found (0x0012) plen 29 {0x0001} [hci0] 34.157030
> > LE Address: F4:B8:5E:64:02:55 (Texas Instruments)
> > RSSI: -43 dBm (0xd5)
> > Flags: 0x00000000
> > Data length: 15
> > Flags: 0x06
> > LE General Discoverable Mode
> > BR/EDR Not Supported
> > Company: Texas Instruments Inc. (13)
> > Data: 06030108b0e408f7
> >
> > So I am heavily doubting whether the Linux kernel code has a bug in handling the SCAN_RSP. Please look at this part of the kernel:
> >
> > https://github.com/torvalds/linux/blob/48b1320a674e1ff5de2fad8606bee38f724594dc/net/bluetooth/hci_event.c#L6326
> >
> > It sets the NOT CONNECTABLE flag by default for a SCAN_RSP and will overwrite it with any flags it has received with a previous ADV_IND. But it does not seem to take into account that in BLE heavy environments the previous ADV_IND might be of a totally different device. So every now and then it will enter the first path where it will just report the device with a NOT_CONNECTABLE flag. Or am I missing something here?
>
> You are saying that the controller would interleave ADV_IND of
> different peers before SCAN_RSP, in that case yes that would be
> possible that the device would be marked as NOT_CONNECTABLE, usually
> that is not really the case which is why we end up adding the code
> above:
>
> * In the really unlikely case that a controller get confused
> * and just sends a scan response event, then it is marked as
> * not connectable as well.
>
> I guess it is not really unlikely after all, so perhaps we need a flag
> indicating this is a standalone SCAN_RSP e.g. MGMT_DEV_FOUND_SCAN_RSP,
> then we can treat it accordingly in userspace.

Can you try with the following change:

https://patchwork.kernel.org/project/bluetooth/patch/[email protected]/

--
Luiz Augusto von Dentz

2023-06-09 11:13:12

by Herman Meerlo

[permalink] [raw]
Subject: Re: Intermittent Not connectable devices reported



> On 8 Jun 2023, at 18:32, Luiz Augusto von Dentz <[email protected]> wrote:
>
> Hi Herman,
>
> On Wed, Jun 7, 2023 at 12:21 PM Luiz Augusto von Dentz
> <[email protected]> wrote:
>>
>> Hi Herman,
>>
>> On Fri, Jun 2, 2023 at 2:51 AM Herman Meerlo <[email protected]> wrote:
>>>
>>> I'm working on a Raspberry Pi device that performs periodic BLE discovery and tries to connect to multiple devices. Most of the times this goes fine but especially in BLE heavy environments every now and then it fails because it says that the device is not found. I have been debugging this issue for a few weeks now and it comes down to this:
>>>
>>> • Connect to DBus to start the discovery
>>> • Devices are found, interfacesAdded callbacks are made, everything looks fine
>>> • Stop discovery
>>> • Directly some devices are removed by the interfacesRemoved callback
>>> • My code does not get the device it searches for -> unhappy user
>>>
>>> What happens under the hood is that DBus removes all devices from it's cache that have been indicated by Bluez as being Not connectable. I.e. there is no use in keeping them around, you can't connect to them anyway. But... this is not true for the device I'm looking for. It is marked as Not connectable incorrectly.
>>> So to chase down the problem I create a btmon dump which shows indeed that the device is reported to be Not connectable after having received a SCAN_RSP:
>>>
>>>> HCI Event: LE Meta Event (0x3e) plen 38 #73 [hci0] 34.359921
>>> LE Advertising Report (0x02)
>>> Num reports: 1
>>> Event type: Scan response - SCAN_RSP (0x04)
>>> Address type: Public (0x00)
>>> Address: F4:B8:5E:64:02:55 (Texas Instruments)
>>> Data length: 26
>>> Name (complete): BeeWi SmartLite
>>> Peripheral Conn. Interval: 0x0028 - 0x0050
>>> TX power: 0 dBm
>>> RSSI: -42 dBm (0xd6)
>>> @ MGMT Event: Device Found (0x0012) plen 40 {0x0001} [hci0] 34.360057
>>> LE Address: F4:B8:5E:64:02:55 (Texas Instruments)
>>> RSSI: -42 dBm (0xd6)
>>> Flags: 0x00000004
>>> Not Connectable
>>> Data length: 26
>>> Name (complete): BeeWi SmartLite
>>> Peripheral Conn. Interval: 0x0028 - 0x0050
>>> TX power: 0 dBm
>>>
>>> But all ADV_IND PDU's before that clearly indicate that the device is connectable, it is only after this SCAN_RSP that it is reported as Not connectable:
>>>
>>>> HCI Event: LE Meta Event (0x3e) plen 27 #46 [hci0] 34.152817
>>> LE Advertising Report (0x02)
>>> Num reports: 1
>>> Event type: Connectable undirected - ADV_IND (0x00)
>>> Address type: Public (0x00)
>>> Address: F4:B8:5E:64:02:55 (Texas Instruments)
>>> Data length: 15
>>> Flags: 0x06
>>> LE General Discoverable Mode
>>> BR/EDR Not Supported
>>> Company: Texas Instruments Inc. (13)
>>> Data: 06030108b0e408f7
>>> RSSI: -43 dBm (0xd5)
>>> @ MGMT Event: Device Found (0x0012) plen 31 {0x0001} [hci0] 34.152905
>>> LE Address: 44:6E:FF:00:0D:65 (Resolvable)
>>> RSSI: -74 dBm (0xb6)
>>> Flags: 0x00000000
>>> Data length: 17
>>> Flags: 0x1a
>>> LE General Discoverable Mode
>>> Simultaneous LE and BR/EDR (Controller)
>>> Simultaneous LE and BR/EDR (Host)
>>> TX power: 9 dBm
>>> Company: Apple, Inc. (76)
>>> Type: Unknown (16)
>>> Data: 01188898dc
>>>> HCI Event: LE Meta Event (0x3e) plen 41 #47 [hci0] 34.156958
>>> LE Advertising Report (0x02)
>>> Num reports: 1
>>> Event type: Connectable undirected - ADV_IND (0x00)
>>> Address type: Random (0x01)
>>> Address: FA:BD:8D:12:26:BF (Static)
>>> Data length: 29
>>> Name (short): P mesh
>>> Flags: 0x04
>>> BR/EDR Not Supported
>>> 128-bit Service UUIDs (partial): 1 entry
>>> Vendor specific
>>> RSSI: -47 dBm (0xd1)
>>> @ MGMT Event: Device Found (0x0012) plen 29 {0x0001} [hci0] 34.157030
>>> LE Address: F4:B8:5E:64:02:55 (Texas Instruments)
>>> RSSI: -43 dBm (0xd5)
>>> Flags: 0x00000000
>>> Data length: 15
>>> Flags: 0x06
>>> LE General Discoverable Mode
>>> BR/EDR Not Supported
>>> Company: Texas Instruments Inc. (13)
>>> Data: 06030108b0e408f7
>>>
>>> So I am heavily doubting whether the Linux kernel code has a bug in handling the SCAN_RSP. Please look at this part of the kernel:
>>>
>>> https://github.com/torvalds/linux/blob/48b1320a674e1ff5de2fad8606bee38f724594dc/net/bluetooth/hci_event.c#L6326
>>>
>>> It sets the NOT CONNECTABLE flag by default for a SCAN_RSP and will overwrite it with any flags it has received with a previous ADV_IND. But it does not seem to take into account that in BLE heavy environments the previous ADV_IND might be of a totally different device. So every now and then it will enter the first path where it will just report the device with a NOT_CONNECTABLE flag. Or am I missing something here?
>>
>> You are saying that the controller would interleave ADV_IND of
>> different peers before SCAN_RSP, in that case yes that would be
>> possible that the device would be marked as NOT_CONNECTABLE, usually
>> that is not really the case which is why we end up adding the code
>> above:
>>
>> * In the really unlikely case that a controller get confused
>> * and just sends a scan response event, then it is marked as
>> * not connectable as well.
>>
>> I guess it is not really unlikely after all, so perhaps we need a flag
>> indicating this is a standalone SCAN_RSP e.g. MGMT_DEV_FOUND_SCAN_RSP,
>> then we can treat it accordingly in userspace.
>
> Can you try with the following change:
>
> https://patchwork.kernel.org/project/bluetooth/patch/[email protected]/
>
> --
> Luiz Augusto von Dentz

Yes of course, I am going to give it a try. It will require a complete new build setup so it will probably take a while. But really thankful that you made a potential fix for this problem. I’ll keep you posted.


2023-06-14 08:49:53

by Herman Meerlo

[permalink] [raw]
Subject: Re: Intermittent Not connectable devices reported



> On 8 Jun 2023, at 18:32, Luiz Augusto von Dentz <[email protected]> wrote:
>
> Hi Herman,
>
> On Wed, Jun 7, 2023 at 12:21 PM Luiz Augusto von Dentz
> <[email protected]> wrote:
>>
>> Hi Herman,
>>
>> On Fri, Jun 2, 2023 at 2:51 AM Herman Meerlo <[email protected]> wrote:
>>>
>>> I'm working on a Raspberry Pi device that performs periodic BLE discovery and tries to connect to multiple devices. Most of the times this goes fine but especially in BLE heavy environments every now and then it fails because it says that the device is not found. I have been debugging this issue for a few weeks now and it comes down to this:
>>>
>>> • Connect to DBus to start the discovery
>>> • Devices are found, interfacesAdded callbacks are made, everything looks fine
>>> • Stop discovery
>>> • Directly some devices are removed by the interfacesRemoved callback
>>> • My code does not get the device it searches for -> unhappy user
>>>
>>> What happens under the hood is that DBus removes all devices from it's cache that have been indicated by Bluez as being Not connectable. I.e. there is no use in keeping them around, you can't connect to them anyway. But... this is not true for the device I'm looking for. It is marked as Not connectable incorrectly.
>>> So to chase down the problem I create a btmon dump which shows indeed that the device is reported to be Not connectable after having received a SCAN_RSP:
>>>
>>>> HCI Event: LE Meta Event (0x3e) plen 38 #73 [hci0] 34.359921
>>> LE Advertising Report (0x02)
>>> Num reports: 1
>>> Event type: Scan response - SCAN_RSP (0x04)
>>> Address type: Public (0x00)
>>> Address: F4:B8:5E:64:02:55 (Texas Instruments)
>>> Data length: 26
>>> Name (complete): BeeWi SmartLite
>>> Peripheral Conn. Interval: 0x0028 - 0x0050
>>> TX power: 0 dBm
>>> RSSI: -42 dBm (0xd6)
>>> @ MGMT Event: Device Found (0x0012) plen 40 {0x0001} [hci0] 34.360057
>>> LE Address: F4:B8:5E:64:02:55 (Texas Instruments)
>>> RSSI: -42 dBm (0xd6)
>>> Flags: 0x00000004
>>> Not Connectable
>>> Data length: 26
>>> Name (complete): BeeWi SmartLite
>>> Peripheral Conn. Interval: 0x0028 - 0x0050
>>> TX power: 0 dBm
>>>
>>> But all ADV_IND PDU's before that clearly indicate that the device is connectable, it is only after this SCAN_RSP that it is reported as Not connectable:
>>>
>>>> HCI Event: LE Meta Event (0x3e) plen 27 #46 [hci0] 34.152817
>>> LE Advertising Report (0x02)
>>> Num reports: 1
>>> Event type: Connectable undirected - ADV_IND (0x00)
>>> Address type: Public (0x00)
>>> Address: F4:B8:5E:64:02:55 (Texas Instruments)
>>> Data length: 15
>>> Flags: 0x06
>>> LE General Discoverable Mode
>>> BR/EDR Not Supported
>>> Company: Texas Instruments Inc. (13)
>>> Data: 06030108b0e408f7
>>> RSSI: -43 dBm (0xd5)
>>> @ MGMT Event: Device Found (0x0012) plen 31 {0x0001} [hci0] 34.152905
>>> LE Address: 44:6E:FF:00:0D:65 (Resolvable)
>>> RSSI: -74 dBm (0xb6)
>>> Flags: 0x00000000
>>> Data length: 17
>>> Flags: 0x1a
>>> LE General Discoverable Mode
>>> Simultaneous LE and BR/EDR (Controller)
>>> Simultaneous LE and BR/EDR (Host)
>>> TX power: 9 dBm
>>> Company: Apple, Inc. (76)
>>> Type: Unknown (16)
>>> Data: 01188898dc
>>>> HCI Event: LE Meta Event (0x3e) plen 41 #47 [hci0] 34.156958
>>> LE Advertising Report (0x02)
>>> Num reports: 1
>>> Event type: Connectable undirected - ADV_IND (0x00)
>>> Address type: Random (0x01)
>>> Address: FA:BD:8D:12:26:BF (Static)
>>> Data length: 29
>>> Name (short): P mesh
>>> Flags: 0x04
>>> BR/EDR Not Supported
>>> 128-bit Service UUIDs (partial): 1 entry
>>> Vendor specific
>>> RSSI: -47 dBm (0xd1)
>>> @ MGMT Event: Device Found (0x0012) plen 29 {0x0001} [hci0] 34.157030
>>> LE Address: F4:B8:5E:64:02:55 (Texas Instruments)
>>> RSSI: -43 dBm (0xd5)
>>> Flags: 0x00000000
>>> Data length: 15
>>> Flags: 0x06
>>> LE General Discoverable Mode
>>> BR/EDR Not Supported
>>> Company: Texas Instruments Inc. (13)
>>> Data: 06030108b0e408f7
>>>
>>> So I am heavily doubting whether the Linux kernel code has a bug in handling the SCAN_RSP. Please look at this part of the kernel:
>>>
>>> https://github.com/torvalds/linux/blob/48b1320a674e1ff5de2fad8606bee38f724594dc/net/bluetooth/hci_event.c#L6326
>>>
>>> It sets the NOT CONNECTABLE flag by default for a SCAN_RSP and will overwrite it with any flags it has received with a previous ADV_IND. But it does not seem to take into account that in BLE heavy environments the previous ADV_IND might be of a totally different device. So every now and then it will enter the first path where it will just report the device with a NOT_CONNECTABLE flag. Or am I missing something here?
>>
>> You are saying that the controller would interleave ADV_IND of
>> different peers before SCAN_RSP, in that case yes that would be
>> possible that the device would be marked as NOT_CONNECTABLE, usually
>> that is not really the case which is why we end up adding the code
>> above:
>>
>> * In the really unlikely case that a controller get confused
>> * and just sends a scan response event, then it is marked as
>> * not connectable as well.
>>
>> I guess it is not really unlikely after all, so perhaps we need a flag
>> indicating this is a standalone SCAN_RSP e.g. MGMT_DEV_FOUND_SCAN_RSP,
>> then we can treat it accordingly in userspace.
>
> Can you try with the following change:
>
> https://patchwork.kernel.org/project/bluetooth/patch/[email protected]/
>
> --
> Luiz Augusto von Dentz

I can confirm this change works. I see this now in the batman logs instead of the “Not Connectable”:


@ MGMT Event: Device Found (0x0012) plen 34 {0x0001} [hci0] 37.727785
LE Address: E1:A1:94:4C:15:D8 (Static)
RSSI: -80 dBm (0xb0)
Flags: 0x00000020. <————— the new flag!
Unknown device flag (0x00000020)
Data length: 20
Name (complete): F48F92CBE54430A8EA


2023-06-14 17:19:20

by Luiz Augusto von Dentz

[permalink] [raw]
Subject: Re: Intermittent Not connectable devices reported

Hi Herman,

On Wed, Jun 14, 2023 at 1:29 AM Herman Meerlo <[email protected]> wrote:
>
>
>
> > On 8 Jun 2023, at 18:32, Luiz Augusto von Dentz <[email protected]> wrote:
> >
> > Hi Herman,
> >
> > On Wed, Jun 7, 2023 at 12:21 PM Luiz Augusto von Dentz
> > <[email protected]> wrote:
> >>
> >> Hi Herman,
> >>
> >> On Fri, Jun 2, 2023 at 2:51 AM Herman Meerlo <[email protected]> wrote:
> >>>
> >>> I'm working on a Raspberry Pi device that performs periodic BLE discovery and tries to connect to multiple devices. Most of the times this goes fine but especially in BLE heavy environments every now and then it fails because it says that the device is not found. I have been debugging this issue for a few weeks now and it comes down to this:
> >>>
> >>> • Connect to DBus to start the discovery
> >>> • Devices are found, interfacesAdded callbacks are made, everything looks fine
> >>> • Stop discovery
> >>> • Directly some devices are removed by the interfacesRemoved callback
> >>> • My code does not get the device it searches for -> unhappy user
> >>>
> >>> What happens under the hood is that DBus removes all devices from it's cache that have been indicated by Bluez as being Not connectable. I.e. there is no use in keeping them around, you can't connect to them anyway. But... this is not true for the device I'm looking for. It is marked as Not connectable incorrectly.
> >>> So to chase down the problem I create a btmon dump which shows indeed that the device is reported to be Not connectable after having received a SCAN_RSP:
> >>>
> >>>> HCI Event: LE Meta Event (0x3e) plen 38 #73 [hci0] 34.359921
> >>> LE Advertising Report (0x02)
> >>> Num reports: 1
> >>> Event type: Scan response - SCAN_RSP (0x04)
> >>> Address type: Public (0x00)
> >>> Address: F4:B8:5E:64:02:55 (Texas Instruments)
> >>> Data length: 26
> >>> Name (complete): BeeWi SmartLite
> >>> Peripheral Conn. Interval: 0x0028 - 0x0050
> >>> TX power: 0 dBm
> >>> RSSI: -42 dBm (0xd6)
> >>> @ MGMT Event: Device Found (0x0012) plen 40 {0x0001} [hci0] 34.360057
> >>> LE Address: F4:B8:5E:64:02:55 (Texas Instruments)
> >>> RSSI: -42 dBm (0xd6)
> >>> Flags: 0x00000004
> >>> Not Connectable
> >>> Data length: 26
> >>> Name (complete): BeeWi SmartLite
> >>> Peripheral Conn. Interval: 0x0028 - 0x0050
> >>> TX power: 0 dBm
> >>>
> >>> But all ADV_IND PDU's before that clearly indicate that the device is connectable, it is only after this SCAN_RSP that it is reported as Not connectable:
> >>>
> >>>> HCI Event: LE Meta Event (0x3e) plen 27 #46 [hci0] 34.152817
> >>> LE Advertising Report (0x02)
> >>> Num reports: 1
> >>> Event type: Connectable undirected - ADV_IND (0x00)
> >>> Address type: Public (0x00)
> >>> Address: F4:B8:5E:64:02:55 (Texas Instruments)
> >>> Data length: 15
> >>> Flags: 0x06
> >>> LE General Discoverable Mode
> >>> BR/EDR Not Supported
> >>> Company: Texas Instruments Inc. (13)
> >>> Data: 06030108b0e408f7
> >>> RSSI: -43 dBm (0xd5)
> >>> @ MGMT Event: Device Found (0x0012) plen 31 {0x0001} [hci0] 34.152905
> >>> LE Address: 44:6E:FF:00:0D:65 (Resolvable)
> >>> RSSI: -74 dBm (0xb6)
> >>> Flags: 0x00000000
> >>> Data length: 17
> >>> Flags: 0x1a
> >>> LE General Discoverable Mode
> >>> Simultaneous LE and BR/EDR (Controller)
> >>> Simultaneous LE and BR/EDR (Host)
> >>> TX power: 9 dBm
> >>> Company: Apple, Inc. (76)
> >>> Type: Unknown (16)
> >>> Data: 01188898dc
> >>>> HCI Event: LE Meta Event (0x3e) plen 41 #47 [hci0] 34.156958
> >>> LE Advertising Report (0x02)
> >>> Num reports: 1
> >>> Event type: Connectable undirected - ADV_IND (0x00)
> >>> Address type: Random (0x01)
> >>> Address: FA:BD:8D:12:26:BF (Static)
> >>> Data length: 29
> >>> Name (short): P mesh
> >>> Flags: 0x04
> >>> BR/EDR Not Supported
> >>> 128-bit Service UUIDs (partial): 1 entry
> >>> Vendor specific
> >>> RSSI: -47 dBm (0xd1)
> >>> @ MGMT Event: Device Found (0x0012) plen 29 {0x0001} [hci0] 34.157030
> >>> LE Address: F4:B8:5E:64:02:55 (Texas Instruments)
> >>> RSSI: -43 dBm (0xd5)
> >>> Flags: 0x00000000
> >>> Data length: 15
> >>> Flags: 0x06
> >>> LE General Discoverable Mode
> >>> BR/EDR Not Supported
> >>> Company: Texas Instruments Inc. (13)
> >>> Data: 06030108b0e408f7
> >>>
> >>> So I am heavily doubting whether the Linux kernel code has a bug in handling the SCAN_RSP. Please look at this part of the kernel:
> >>>
> >>> https://github.com/torvalds/linux/blob/48b1320a674e1ff5de2fad8606bee38f724594dc/net/bluetooth/hci_event.c#L6326
> >>>
> >>> It sets the NOT CONNECTABLE flag by default for a SCAN_RSP and will overwrite it with any flags it has received with a previous ADV_IND. But it does not seem to take into account that in BLE heavy environments the previous ADV_IND might be of a totally different device. So every now and then it will enter the first path where it will just report the device with a NOT_CONNECTABLE flag. Or am I missing something here?
> >>
> >> You are saying that the controller would interleave ADV_IND of
> >> different peers before SCAN_RSP, in that case yes that would be
> >> possible that the device would be marked as NOT_CONNECTABLE, usually
> >> that is not really the case which is why we end up adding the code
> >> above:
> >>
> >> * In the really unlikely case that a controller get confused
> >> * and just sends a scan response event, then it is marked as
> >> * not connectable as well.
> >>
> >> I guess it is not really unlikely after all, so perhaps we need a flag
> >> indicating this is a standalone SCAN_RSP e.g. MGMT_DEV_FOUND_SCAN_RSP,
> >> then we can treat it accordingly in userspace.
> >
> > Can you try with the following change:
> >
> > https://patchwork.kernel.org/project/bluetooth/patch/[email protected]/
> >
> > --
> > Luiz Augusto von Dentz
>
> I can confirm this change works. I see this now in the batman logs instead of the “Not Connectable”:
>
>
> @ MGMT Event: Device Found (0x0012) plen 34 {0x0001} [hci0] 37.727785
> LE Address: E1:A1:94:4C:15:D8 (Static)
> RSSI: -80 dBm (0xb0)
> Flags: 0x00000020. <————— the new flag!
> Unknown device flag (0x00000020)
> Data length: 20
> Name (complete): F48F92CBE54430A8EA
>

Yeah, that is to be expected, I will document and add support to btmon
to print the flag properly.

--
Luiz Augusto von Dentz