It is possible that the Bluetooth management will receive scan enabled
signal and LE meta events one by another without any delay. Since the
start discovery procedure is performed in an asynchronous manner, it is
possible that these HCI events will be processed concurrently by two
different worker threads. In such case, it is possible that the LE meta
event, which reports new device, will be discarded, because discovering
is still in the starting state.
The problem is most prominent with the btvirt userspace tool, which
sends LE Meta events just after reporting scan as enabled. Testing
scenario:
1. Create two HCI interfaces:
> btvirt -l2
2. Setup BLE advertisement on hci1:
> bluetoothctl
>> select 00:AA:01:00:00:00
>> menu advertise
>> uuids 03B80E5A-EDE8-4B33-A751-6CE34EC4C700
>> discoverable on
>> back
>> advertise peripheral
3. Start scanning on hci2:
> bluetoothctl
>> select 00:AA:01:01:00:01
>> scan le
// From time to time, new device is not reported
To make this issue 100% reproducible, one can simply add slight delay,
e.g. msleep(100) at the beginning of the start_discovery_complete()
function in the net/bluetooth/mgmt.c file.
This patch adds synchronization for start discovery procedure and device
found reporting by the Bluetooth management. In case of discovering
being in the starting state, the worker which processes LE Meta event
will wait for the cmd_sync_work on which the start discovery procedure
is queued.
Signed-off-by: Arkadiusz Bokowy <[email protected]>
---
include/net/bluetooth/hci_core.h | 5 +++++
include/net/bluetooth/hci_sync.h | 1 +
net/bluetooth/hci_sync.c | 7 +++++++
net/bluetooth/mgmt.c | 17 +++++++++++++++--
4 files changed, 28 insertions(+), 2 deletions(-)
diff --git a/include/net/bluetooth/hci_core.h b/include/net/bluetooth/hci_core.h
index f36c1fd5d64e..456bbdf56246 100644
--- a/include/net/bluetooth/hci_core.h
+++ b/include/net/bluetooth/hci_core.h
@@ -916,6 +916,11 @@ static inline void hci_discovery_filter_clear(struct hci_dev *hdev)
bool hci_discovery_active(struct hci_dev *hdev);
+static inline bool hci_discovery_starting(struct hci_dev *hdev)
+{
+ return hdev->discovery.state == DISCOVERY_STARTING;
+}
+
void hci_discovery_set_state(struct hci_dev *hdev, int state);
static inline int inquiry_cache_empty(struct hci_dev *hdev)
diff --git a/include/net/bluetooth/hci_sync.h b/include/net/bluetooth/hci_sync.h
index 6efbc2152146..67cf6689a692 100644
--- a/include/net/bluetooth/hci_sync.h
+++ b/include/net/bluetooth/hci_sync.h
@@ -43,6 +43,7 @@ void hci_cmd_sync_init(struct hci_dev *hdev);
void hci_cmd_sync_clear(struct hci_dev *hdev);
void hci_cmd_sync_cancel(struct hci_dev *hdev, int err);
void __hci_cmd_sync_cancel(struct hci_dev *hdev, int err);
+void hci_cmd_sync_flush(struct hci_dev *hdev);
int hci_cmd_sync_submit(struct hci_dev *hdev, hci_cmd_sync_work_func_t func,
void *data, hci_cmd_sync_work_destroy_t destroy);
diff --git a/net/bluetooth/hci_sync.c b/net/bluetooth/hci_sync.c
index 3640d73f9595..58905a5b7b1e 100644
--- a/net/bluetooth/hci_sync.c
+++ b/net/bluetooth/hci_sync.c
@@ -681,6 +681,13 @@ void hci_cmd_sync_cancel(struct hci_dev *hdev, int err)
}
EXPORT_SYMBOL(hci_cmd_sync_cancel);
+/* Wait for all pending HCI commands to complete.
+ */
+void hci_cmd_sync_flush(struct hci_dev *hdev)
+{
+ flush_work(&hdev->cmd_sync_work);
+}
+
/* Submit HCI command to be run in as cmd_sync_work:
*
* - hdev must _not_ be unregistered
diff --git a/net/bluetooth/mgmt.c b/net/bluetooth/mgmt.c
index ba2e00646e8e..fc494348f2f7 100644
--- a/net/bluetooth/mgmt.c
+++ b/net/bluetooth/mgmt.c
@@ -10374,18 +10374,31 @@ void mgmt_device_found(struct hci_dev *hdev, bdaddr_t *bdaddr, u8 link_type,
{
struct sk_buff *skb;
struct mgmt_ev_device_found *ev;
- bool report_device = hci_discovery_active(hdev);
+ bool report_device;
if (hci_dev_test_flag(hdev, HCI_MESH) && link_type == LE_LINK)
mesh_device_found(hdev, bdaddr, addr_type, rssi, flags,
eir, eir_len, scan_rsp, scan_rsp_len,
instant);
+ /* Discovery start procedure is perfomed on a workqueue in an
+ * asynchronous manner. This procedure is finished when the scan
+ * enabled signal is received from the controller. Just after
+ * this signal, the controller might send another event (e.g. LE
+ * Meta). In such case, we need to make sure that the discovery
+ * procedure is finished, because otherwise we might omit some
+ * scan results.
+ */
+ if (hci_discovery_starting(hdev))
+ hci_cmd_sync_flush(hdev);
+
+ report_device = hci_discovery_active(hdev);
+
/* Don't send events for a non-kernel initiated discovery. With
* LE one exception is if we have pend_le_reports > 0 in which
* case we're doing passive scanning and want these events.
*/
- if (!hci_discovery_active(hdev)) {
+ if (!report_device) {
if (link_type == ACL_LINK)
return;
if (link_type == LE_LINK && !list_empty(&hdev->pend_le_reports))
--
2.34.1
Hi Arkadiusz,
On Thu, Sep 28, 2023 at 4:05 PM Arkadiusz Bokowy <[email protected]> wrote:
>
> It is possible that the Bluetooth management will receive scan enabled
> signal and LE meta events one by another without any delay. Since the
> start discovery procedure is performed in an asynchronous manner, it is
> possible that these HCI events will be processed concurrently by two
> different worker threads. In such case, it is possible that the LE meta
> event, which reports new device, will be discarded, because discovering
> is still in the starting state.
>
> The problem is most prominent with the btvirt userspace tool, which
> sends LE Meta events just after reporting scan as enabled. Testing
> scenario:
>
> 1. Create two HCI interfaces:
> > btvirt -l2
>
> 2. Setup BLE advertisement on hci1:
> > bluetoothctl
> >> select 00:AA:01:00:00:00
> >> menu advertise
> >> uuids 03B80E5A-EDE8-4B33-A751-6CE34EC4C700
> >> discoverable on
> >> back
> >> advertise peripheral
>
> 3. Start scanning on hci2:
> > bluetoothctl
> >> select 00:AA:01:01:00:01
> >> scan le
> // From time to time, new device is not reported
>
> To make this issue 100% reproducible, one can simply add slight delay,
> e.g. msleep(100) at the beginning of the start_discovery_complete()
> function in the net/bluetooth/mgmt.c file.
>
> This patch adds synchronization for start discovery procedure and device
> found reporting by the Bluetooth management. In case of discovering
> being in the starting state, the worker which processes LE Meta event
> will wait for the cmd_sync_work on which the start discovery procedure
> is queued.
>
> Signed-off-by: Arkadiusz Bokowy <[email protected]>
> ---
> include/net/bluetooth/hci_core.h | 5 +++++
> include/net/bluetooth/hci_sync.h | 1 +
> net/bluetooth/hci_sync.c | 7 +++++++
> net/bluetooth/mgmt.c | 17 +++++++++++++++--
> 4 files changed, 28 insertions(+), 2 deletions(-)
>
> diff --git a/include/net/bluetooth/hci_core.h b/include/net/bluetooth/hci_core.h
> index f36c1fd5d64e..456bbdf56246 100644
> --- a/include/net/bluetooth/hci_core.h
> +++ b/include/net/bluetooth/hci_core.h
> @@ -916,6 +916,11 @@ static inline void hci_discovery_filter_clear(struct hci_dev *hdev)
>
> bool hci_discovery_active(struct hci_dev *hdev);
>
> +static inline bool hci_discovery_starting(struct hci_dev *hdev)
> +{
> + return hdev->discovery.state == DISCOVERY_STARTING;
> +}
> +
> void hci_discovery_set_state(struct hci_dev *hdev, int state);
>
> static inline int inquiry_cache_empty(struct hci_dev *hdev)
> diff --git a/include/net/bluetooth/hci_sync.h b/include/net/bluetooth/hci_sync.h
> index 6efbc2152146..67cf6689a692 100644
> --- a/include/net/bluetooth/hci_sync.h
> +++ b/include/net/bluetooth/hci_sync.h
> @@ -43,6 +43,7 @@ void hci_cmd_sync_init(struct hci_dev *hdev);
> void hci_cmd_sync_clear(struct hci_dev *hdev);
> void hci_cmd_sync_cancel(struct hci_dev *hdev, int err);
> void __hci_cmd_sync_cancel(struct hci_dev *hdev, int err);
> +void hci_cmd_sync_flush(struct hci_dev *hdev);
>
> int hci_cmd_sync_submit(struct hci_dev *hdev, hci_cmd_sync_work_func_t func,
> void *data, hci_cmd_sync_work_destroy_t destroy);
> diff --git a/net/bluetooth/hci_sync.c b/net/bluetooth/hci_sync.c
> index 3640d73f9595..58905a5b7b1e 100644
> --- a/net/bluetooth/hci_sync.c
> +++ b/net/bluetooth/hci_sync.c
> @@ -681,6 +681,13 @@ void hci_cmd_sync_cancel(struct hci_dev *hdev, int err)
> }
> EXPORT_SYMBOL(hci_cmd_sync_cancel);
>
> +/* Wait for all pending HCI commands to complete.
> + */
> +void hci_cmd_sync_flush(struct hci_dev *hdev)
> +{
> + flush_work(&hdev->cmd_sync_work);
Afaik this will block waiting the work to complete which sounds a
little dangerous especially if hdev has been locked.
> +}
> +
> /* Submit HCI command to be run in as cmd_sync_work:
> *
> * - hdev must _not_ be unregistered
> diff --git a/net/bluetooth/mgmt.c b/net/bluetooth/mgmt.c
> index ba2e00646e8e..fc494348f2f7 100644
> --- a/net/bluetooth/mgmt.c
> +++ b/net/bluetooth/mgmt.c
> @@ -10374,18 +10374,31 @@ void mgmt_device_found(struct hci_dev *hdev, bdaddr_t *bdaddr, u8 link_type,
> {
> struct sk_buff *skb;
> struct mgmt_ev_device_found *ev;
> - bool report_device = hci_discovery_active(hdev);
> + bool report_device;
>
> if (hci_dev_test_flag(hdev, HCI_MESH) && link_type == LE_LINK)
> mesh_device_found(hdev, bdaddr, addr_type, rssi, flags,
> eir, eir_len, scan_rsp, scan_rsp_len,
> instant);
>
> + /* Discovery start procedure is perfomed on a workqueue in an
> + * asynchronous manner. This procedure is finished when the scan
> + * enabled signal is received from the controller. Just after
> + * this signal, the controller might send another event (e.g. LE
> + * Meta). In such case, we need to make sure that the discovery
> + * procedure is finished, because otherwise we might omit some
> + * scan results.
> + */
> + if (hci_discovery_starting(hdev))
> + hci_cmd_sync_flush(hdev);
> +
> + report_device = hci_discovery_active(hdev);
Couldn't we just do:
diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
index 195aea2198a9..78f0a8fb0a19 100644
--- a/net/bluetooth/hci_core.c
+++ b/net/bluetooth/hci_core.c
@@ -136,6 +136,7 @@ bool hci_discovery_active(struct hci_dev *hdev)
struct discovery_state *discov = &hdev->discovery;
switch (discov->state) {
+ case DISCOVERY_STARTING:
case DISCOVERY_FINDING:
case DISCOVERY_RESOLVING:
return true;
> /* Don't send events for a non-kernel initiated discovery. With
> * LE one exception is if we have pend_le_reports > 0 in which
> * case we're doing passive scanning and want these events.
> */
> - if (!hci_discovery_active(hdev)) {
> + if (!report_device) {
> if (link_type == ACL_LINK)
> return;
> if (link_type == LE_LINK && !list_empty(&hdev->pend_le_reports))
> --
> 2.34.1
>
--
Luiz Augusto von Dentz
Hi,
>> +/* Wait for all pending HCI commands to complete.
>> + */
>> +void hci_cmd_sync_flush(struct hci_dev *hdev)
>> +{
>> + flush_work(&hdev->cmd_sync_work);
>
> Afaik this will block waiting the work to complete which sounds a
> little dangerous especially if hdev has been locked.
Yes, this will block wait for all tasks queued on the cmd_synd_work.
Unfortunately, I'm not very familiar (not yet) with BlueZ kernel
component, so I'm not saying that this solution is correct. I hoped
that someone with actual kernel knowledge will review it :)
Anyway, my simple test case passes with such solution without any lockups.
Alternatively, I can move this block wait before hdev lock in
hci_le_*_adv_report_evt() functions.
> Couldn't we just do:
>
> diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
> index 195aea2198a9..78f0a8fb0a19 100644
> --- a/net/bluetooth/hci_core.c
> +++ b/net/bluetooth/hci_core.c
> @@ -136,6 +136,7 @@ bool hci_discovery_active(struct hci_dev *hdev)
> struct discovery_state *discov = &hdev->discovery;
>
> switch (discov->state) {
> + case DISCOVERY_STARTING:
> case DISCOVERY_FINDING:
> case DISCOVERY_RESOLVING:
> return true;
I'm not sure whether it will fix the issue... I've tested it and it does
not pass my test with a delay added to the start_discovery_complete()
function. The problem here is with synchronization. Since the LE meta
event (device found) and start discovery completion might be processed
simultaneously... Also, it will not be true that discovery is active if
the state is "starting", because HCI might return error when enabling
scanning.
There is other solution to my problem, though. In a real world case
scenario, it's not an issue that the LE meta event coming just after
scan enabled signal will be dropped, because there will be more such
events later. The problem is with btvirt, which does not "broadcasts" LE
meta events when discovering is enabled. So, I can "fix" btvirt instead
of patching the kernel, by repeatedly signaling LE meta events. This
will slightly increase CPU load with btvirt, but will work. What do you
think?
Regards,
Arek
Hi Arek,
On Mon, Oct 9, 2023 at 2:25 AM Arkadiusz Bokowy <[email protected]> wrote:
>
> Hi,
>
> >> +/* Wait for all pending HCI commands to complete.
> >> + */
> >> +void hci_cmd_sync_flush(struct hci_dev *hdev)
> >> +{
> >> + flush_work(&hdev->cmd_sync_work);
> >
> > Afaik this will block waiting the work to complete which sounds a
> > little dangerous especially if hdev has been locked.
>
> Yes, this will block wait for all tasks queued on the cmd_synd_work.
> Unfortunately, I'm not very familiar (not yet) with BlueZ kernel
> component, so I'm not saying that this solution is correct. I hoped
> that someone with actual kernel knowledge will review it :)
>
> Anyway, my simple test case passes with such solution without any lockups.
>
> Alternatively, I can move this block wait before hdev lock in
> hci_le_*_adv_report_evt() functions.
>
> > Couldn't we just do:
> >
> > diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
> > index 195aea2198a9..78f0a8fb0a19 100644
> > --- a/net/bluetooth/hci_core.c
> > +++ b/net/bluetooth/hci_core.c
> > @@ -136,6 +136,7 @@ bool hci_discovery_active(struct hci_dev *hdev)
> > struct discovery_state *discov = &hdev->discovery;
> >
> > switch (discov->state) {
> > + case DISCOVERY_STARTING:
> > case DISCOVERY_FINDING:
> > case DISCOVERY_RESOLVING:
> > return true;
>
> I'm not sure whether it will fix the issue... I've tested it and it does
> not pass my test with a delay added to the start_discovery_complete()
> function. The problem here is with synchronization. Since the LE meta
> event (device found) and start discovery completion might be processed
> simultaneously... Also, it will not be true that discovery is active if
> the state is "starting", because HCI might return error when enabling
> scanning.
>
> There is other solution to my problem, though. In a real world case
> scenario, it's not an issue that the LE meta event coming just after
> scan enabled signal will be dropped, because there will be more such
> events later. The problem is with btvirt, which does not "broadcasts" LE
> meta events when discovering is enabled. So, I can "fix" btvirt instead
> of patching the kernel, by repeatedly signaling LE meta events. This
> will slightly increase CPU load with btvirt, but will work. What do you
> think?
Yeah, I think it is more of an issue with btdev then, perhaps we need
to add a delay or something to generate the reports, or we keep
generating them based on the scan parameters that way it would emulate
a little bit better how it works with real controllers, that said keep
in mind that we need to cancel all timers properly as well if we go
ahead with this change.
> Regards,
> Arek
--
Luiz Augusto von Dentz