LinuxLists.cc - Handling EBUSY for LE connections.

2014-11-13 04:27:04

Subject: Handling EBUSY for LE connections.

Hi all,

We have a use case in which multiple LE connection attempts are made
in rapid succession to a device that frequently rotates its private
address. The connect call frequently fails with
"org.bluez.Error.Failed: Device or resource is busy (16)". I tracked
this down to a line in net/bluetooth/hci_conn.c:hci_connect_le that
has the following comment before returning -EBUSY:

/* Since the controller supports only one LE connection attempt at a
* a time, we return -EBUSY if there is any connection attempt running.
*/

This is all fine and good, however the org.bluez.Error.Failed error
name is too generic for an application to determine how to recover
from this case and it would be nice if there was a way for an
application to know that it should perhaps try connecting again later,
especially without having to parse the error message string to make
sense of things while there can be a clear error name.

So, would it make sense for bluetoothd to return
org.bluez.Error.InProgress or even org.bluez.Error.Busy here? Another
potential solution I had in mind was to have bluetoothd actually queue
LE connection attempts if one is already pending, or perhaps
automatically schedule a retry if kernel returns EBUSY.

What do you think is the best solution here?

Thanks,
Arman

2014-11-21 10:08:18

by Johan Hedberg

[permalink] [raw]

Subject: Re: Handling EBUSY for LE connections.

Hi Arman,

On Thu, Nov 20, 2014, Johan Hedberg wrote:
> On Wed, Nov 19, 2014, Arman Uguray wrote:
> > I now have a few of questions before I start implementing anything:
> >
> > 1. I'm confused about how the current code handles the case where
> > hci_connect_le returns -EBUSY while handling an advertisement report.
> > The code simply ignores the error and hci_conn object gets created, so
> > the next time an advertisement report gets received and we call
> > hci_connect_le, won't that just return early since
> > hci_conn_hash_lookup_ba will return something? How will this actually
> > ever get to adding the LE Create Connection request?
>
> Regarding advertising reports while there's a connect attempt in
> progress, hci_connect_le() disables advertising before issuing
> HCI_LE_Create_Conn so advertising should never be active while we're
> trying to connect. Only once the connection times out or we're
> successfully connected does advertising get re-enabled and we're ready
> again to connect any other device that might be around.

Sorry, you must have gotten quite confused by my bs here. I was
obviously mixing advertising and scanning. My point still stands though:
hci_connect_le also disables scanning for the connection attempt so no
advertising reports should be coming while trying to connect.

Johan

2014-11-20 09:34:03

by Johan Hedberg

[permalink] [raw]

Subject: Re: Handling EBUSY for LE connections.

Hi Arman,

On Wed, Nov 19, 2014, Arman Uguray wrote:
> I now have a few of questions before I start implementing anything:
>
> 1. I'm confused about how the current code handles the case where
> hci_connect_le returns -EBUSY while handling an advertisement report.
> The code simply ignores the error and hci_conn object gets created, so
> the next time an advertisement report gets received and we call
> hci_connect_le, won't that just return early since
> hci_conn_hash_lookup_ba will return something? How will this actually
> ever get to adding the LE Create Connection request?

Regarding advertising reports while there's a connect attempt in
progress, hci_connect_le() disables advertising before issuing
HCI_LE_Create_Conn so advertising should never be active while we're
trying to connect. Only once the connection times out or we're
successfully connected does advertising get re-enabled and we're ready
again to connect any other device that might be around.

Regarding conn objects stuck in BT_CONNECT, the state is only set to
that later in hci_connect_le() when calling hci_req_add_le_create_conn()
i.e. when we know we can issue the command, so I don't see any obvious
problems here.

> 2. The background scan code currently assigns ownership of the
> hci_conn structure to the hci_conn_params structure that triggered it.
> I guess it's OK if a reference is held both by the parameters AND the
> l2cap_conn structure if this is triggered by an L2CAP connect instead?

I don't see any problems in having multiple references like this.

> 3. The background scan code currently passes BT_SECURITY_LOW to
> hci_connect_le, but we probably need a way to obtain the security
> level from the l2cap sock options in the l2cap connect case. What's
> the best way to pass this data along?

The first thing that comes to mind is to add an extra u8 sec_level
variable to the hci_conn_params struct.

Johan

2014-11-19 22:20:10

by Arman Uguray

[permalink] [raw]

Subject: Re: Handling EBUSY for LE connections.

Hi Johan,

> On Wed, Nov 19, 2014 at 1:27 PM, Johan Hedberg <[email protected]> wrote:
> Hi Arman,
>
> On Mon, Nov 17, 2014, Arman Uguray wrote:
>> >> I guess I'm seeking some programming advice now as a net/bluetooth
>> >> noob. So it looks like in l2cap_code.c:l2cap_chan_connect I want to
>> >> replace the call to "hci_connect_le" with some variant of
>> >> hci_conn_params_set. Based on my understanding, eventually if an ADV
>> >> report is received, hci_event.c:process_adv_report will handle
>> >> creating the connection for us (in hci_event.c:check_pending_le_conn).
>> >>
>> >> From here, what would be the best (and proper) way to propagate this
>> >> information and the newly created hci_conn structure to the l2cap
>> >> layer so that the l2cap_conn structure can be set up?
>> >
>> > that is a good question. I think Johan is a bit deeper in the code.
>> > Might worthwhile to jump on IRC and brainstorm with him on this.
>> >
>>
>> I'm pretty much always out of luck getting ahold of people on IRC, so
>> hopefully Johan will see this thread and respond eventually.
>>
>> > He has done connection creation for the slave side using direct
>> > advertising and that would have some similar logical handling. At
>> > least I think it does. And if not, we might first need a few
>> > cleanups to make the LE connection handling for L2CAP sockets more
>> > streamlined and easy to integrate.
>> >
>>
>> So, from looking at the code, I see that l2cap_sock_connect calls
>> l2cap_chan_connect, which internally creates the hci_conn via
>> hci_connect_le. If all goes well, l2cap_sock_connect simply sits and
>> waits until the socket state becomes BT_CONNECTED. So, I guess we may
>> want something similar to that. We could revise hci_connect_le so that
>> it automatically creates the temporary whitelist entry and listens for
>> AD. Or perhaps we could have an additional API that sends an LE Create
>> Connection that accepts an hci_conn instead of an hci_dev, which can
>> then be called from hci_event.c:check_pending_le_conn.
>>
>> Anyway, I think I have a rough idea of what needs to be done overall
>> but I'd appreciate Johan's input.
>
> It sounds like you're pretty much on the right track. Creating a
> temporary entry to the same list that's used for the usual passive
> scanning based connections should work. All other code that deals with
> notifying the l2cap_chan once we get the LE_Connection_Complete event
> should already be there and do the right thing, so the only new thing
> that should be needed at that point is to permanently remove the
> temporary entry you added.
>
> Anyway, this is what I *think* should work. Once you have some initial
> patches ready it'll be easier to see how things really are :)
>

I now have a few of questions before I start implementing anything:

1. I'm confused about how the current code handles the case where
hci_connect_le returns -EBUSY while handling an advertisement report.
The code simply ignores the error and hci_conn object gets created, so
the next time an advertisement report gets received and we call
hci_connect_le, won't that just return early since
hci_conn_hash_lookup_ba will return something? How will this actually
ever get to adding the LE Create Connection request?

2. The background scan code currently assigns ownership of the
hci_conn structure to the hci_conn_params structure that triggered it.
I guess it's OK if a reference is held both by the parameters AND the
l2cap_conn structure if this is triggered by an L2CAP connect instead?

3. The background scan code currently passes BT_SECURITY_LOW to
hci_connect_le, but we probably need a way to obtain the security
level from the l2cap sock options in the l2cap connect case. What's
the best way to pass this data along?

> Johan

Thanks!
Arman

2014-11-19 21:27:51

by Johan Hedberg

[permalink] [raw]

Subject: Re: Handling EBUSY for LE connections.

Hi Arman,

On Mon, Nov 17, 2014, Arman Uguray wrote:
> >> I guess I'm seeking some programming advice now as a net/bluetooth
> >> noob. So it looks like in l2cap_code.c:l2cap_chan_connect I want to
> >> replace the call to "hci_connect_le" with some variant of
> >> hci_conn_params_set. Based on my understanding, eventually if an ADV
> >> report is received, hci_event.c:process_adv_report will handle
> >> creating the connection for us (in hci_event.c:check_pending_le_conn).
> >>
> >> From here, what would be the best (and proper) way to propagate this
> >> information and the newly created hci_conn structure to the l2cap
> >> layer so that the l2cap_conn structure can be set up?
> >
> > that is a good question. I think Johan is a bit deeper in the code.
> > Might worthwhile to jump on IRC and brainstorm with him on this.
> >
>
> I'm pretty much always out of luck getting ahold of people on IRC, so
> hopefully Johan will see this thread and respond eventually.
>
> > He has done connection creation for the slave side using direct
> > advertising and that would have some similar logical handling. At
> > least I think it does. And if not, we might first need a few
> > cleanups to make the LE connection handling for L2CAP sockets more
> > streamlined and easy to integrate.
> >
>
> So, from looking at the code, I see that l2cap_sock_connect calls
> l2cap_chan_connect, which internally creates the hci_conn via
> hci_connect_le. If all goes well, l2cap_sock_connect simply sits and
> waits until the socket state becomes BT_CONNECTED. So, I guess we may
> want something similar to that. We could revise hci_connect_le so that
> it automatically creates the temporary whitelist entry and listens for
> AD. Or perhaps we could have an additional API that sends an LE Create
> Connection that accepts an hci_conn instead of an hci_dev, which can
> then be called from hci_event.c:check_pending_le_conn.
>
> Anyway, I think I have a rough idea of what needs to be done overall
> but I'd appreciate Johan's input.

It sounds like you're pretty much on the right track. Creating a
temporary entry to the same list that's used for the usual passive
scanning based connections should work. All other code that deals with
notifying the l2cap_chan once we get the LE_Connection_Complete event
should already be there and do the right thing, so the only new thing
that should be needed at that point is to permanently remove the
temporary entry you added.

Anyway, this is what I *think* should work. Once you have some initial
patches ready it'll be easier to see how things really are :)

Johan

2014-11-17 23:07:11

by Arman Uguray

[permalink] [raw]

Subject: Re: Handling EBUSY for LE connections.

Hi Marcel,

> On Fri, Nov 14, 2014 at 4:35 PM, Marcel Holtmann <[email protected]> wr=
ote:
> Hi Arman,
>
>>>>>>> We have a use case in which multiple LE connection attempts are mad=
e
>>>>>>> in rapid succession to a device that frequently rotates its private
>>>>>>> address. The connect call frequently fails with
>>>>>>> "org.bluez.Error.Failed: Device or resource is busy (16)". I tracke=
d
>>>>>>> this down to a line in net/bluetooth/hci_conn.c:hci_connect_le that
>>>>>>> has the following comment before returning -EBUSY:
>>>>>>>
>>>>>>> /* Since the controller supports only one LE connection attempt at=
a
>>>>>>> * a time, we return -EBUSY if there is any connection attempt run=
ning.
>>>>>>> */
>>>>>>>
>>>>>>> This is all fine and good, however the org.bluez.Error.Failed error
>>>>>>> name is too generic for an application to determine how to recover
>>>>>>> from this case and it would be nice if there was a way for an
>>>>>>> application to know that it should perhaps try connecting again lat=
er,
>>>>>>> especially without having to parse the error message string to make
>>>>>>> sense of things while there can be a clear error name.
>>>>>>>
>>>>>>> So, would it make sense for bluetoothd to return
>>>>>>> org.bluez.Error.InProgress or even org.bluez.Error.Busy here? Anoth=
er
>>>>>>> potential solution I had in mind was to have bluetoothd actually qu=
eue
>>>>>>> LE connection attempts if one is already pending, or perhaps
>>>>>>> automatically schedule a retry if kernel returns EBUSY.
>>>>>>>
>>>>>>> What do you think is the best solution here?
>>>>>>
>>>>>> we should fix that directly in the kernel. Back in the days we had a=
similar issue with BR/EDR and some more "dumb" controllers. We queued the =
connection attempt in the kernel and issued it whenever the controller was =
free again.
>>>>>>
>>>>>> One other thing that we should be doing for "direct" LE connection a=
ttempts is to run them through the LE passive background scanning. That way=
we have the same connection logic handling the background scanning and the=
connection attempts.
>>>>>>
>>>>>> Mainly I am thinking that if we have a L2CAP connect() request, we j=
ust add it to the LE passive background scanning. The only difference would=
be that these can actually time out. Otherwise it should be exactly the sa=
me.
>>>>>
>>>>> OK, so what is the work that I would need to do here? I'm not super
>>>>> familiar with the kernel code, hence I'd appreciate some guidance.
>>>>> From what you said, I gather two different work items:
>>>>>
>>>>> 1. Queue connection attempts and reissue when the controller becomes =
free.
>>>>> 2. Make use of background scanning.
>>>>>
>>>>> If we do #2 would we need #1 as well? I assume #2 will only be
>>>>> possible on later kernels, is there a solution that would easily
>>>>> backport to older kernels?
>>>>
>>>> the background scanning is part of 3.17 and later, and you want to bac=
kport that anyway. Main reason is that it using the controller whitelist an=
d is needed for proper LE HID support anyway.
>>>>
>>>
>>> OK. For ChromiumOS's 3.10 and 3.14 kernels Scott has already done a
>>> backport from the latest bluetooth-next but we had concerns for kernel
>>> 3.8. I guess for now I'll go with your background scan based
>>> suggestion and worry about those older platforms later.
>>>
>>>> So the only thing that should be needed is the capabilities to add an =
entry to the connection list that has a timeout. Right now only userspace v=
ia mgmt Add/Remove Device can modify this list. These entries would come fr=
om L2CAP connect() calls and need to fail if the device in question can not=
be found in x seconds.
>>>>
>>>> This means the connection logic from the background scanning should ge=
t a delayed workqueue for the timeout and we have to add timeout values to =
each entry in our list. And then remove them when timeout has been reached =
and send a proper socket error back to userspace. The real advantage comes =
into play that as a result the connection handling would then also use the =
controller whitelist and with that can easily connect multiple device after=
each other. For the automatic connections that we trigger via mgmt Add/Rem=
ove Device this already works right now.
>>>>
>>>
>>> OK, that sounds straightforward actually. I'll start looking into this
>>> and probably send out some RFC patches later.
>>>
>>
>> I guess I'm seeking some programming advice now as a net/bluetooth
>> noob. So it looks like in l2cap_code.c:l2cap_chan_connect I want to
>> replace the call to "hci_connect_le" with some variant of
>> hci_conn_params_set. Based on my understanding, eventually if an ADV
>> report is received, hci_event.c:process_adv_report will handle
>> creating the connection for us (in hci_event.c:check_pending_le_conn).
>>
>> From here, what would be the best (and proper) way to propagate this
>> information and the newly created hci_conn structure to the l2cap
>> layer so that the l2cap_conn structure can be set up?
>
> that is a good question. I think Johan is a bit deeper in the code. Might=
worthwhile to jump on IRC and brainstorm with him on this.
>

I'm pretty much always out of luck getting ahold of people on IRC, so
hopefully Johan will see this thread and respond eventually.

> He has done connection creation for the slave side using direct advertisi=
ng and that would have some similar logical handling. At least I think it d=
oes. And if not, we might first need a few cleanups to make the LE connecti=
on handling for L2CAP sockets more streamlined and easy to integrate.
>

So, from looking at the code, I see that l2cap_sock_connect calls
l2cap_chan_connect, which internally creates the hci_conn via
hci_connect_le. If all goes well, l2cap_sock_connect simply sits and
waits until the socket state becomes BT_CONNECTED. So, I guess we may
want something similar to that. We could revise hci_connect_le so that
it automatically creates the temporary whitelist entry and listens for
AD. Or perhaps we could have an additional API that sends an LE Create
Connection that accepts an hci_conn instead of an hci_dev, which can
then be called from hci_event.c:check_pending_le_conn.

Anyway, I think I have a rough idea of what needs to be done overall
but I'd appreciate Johan's input.

> Regards
>
> Marcel
>

Thanks,
Arman

2014-11-15 00:35:12

by Marcel Holtmann

[permalink] [raw]

Subject: Re: Handling EBUSY for LE connections.

Hi Arman,

>>>>>> We have a use case in which multiple LE connection attempts are made
>>>>>> in rapid succession to a device that frequently rotates its private
>>>>>> address. The connect call frequently fails with
>>>>>> "org.bluez.Error.Failed: Device or resource is busy (16)". I tracked
>>>>>> this down to a line in net/bluetooth/hci_conn.c:hci_connect_le that
>>>>>> has the following comment before returning -EBUSY:
>>>>>>
>>>>>> /* Since the controller supports only one LE connection attempt at a
>>>>>> * a time, we return -EBUSY if there is any connection attempt running.
>>>>>> */
>>>>>>
>>>>>> This is all fine and good, however the org.bluez.Error.Failed error
>>>>>> name is too generic for an application to determine how to recover
>>>>>> from this case and it would be nice if there was a way for an
>>>>>> application to know that it should perhaps try connecting again later,
>>>>>> especially without having to parse the error message string to make
>>>>>> sense of things while there can be a clear error name.
>>>>>>
>>>>>> So, would it make sense for bluetoothd to return
>>>>>> org.bluez.Error.InProgress or even org.bluez.Error.Busy here? Another
>>>>>> potential solution I had in mind was to have bluetoothd actually queue
>>>>>> LE connection attempts if one is already pending, or perhaps
>>>>>> automatically schedule a retry if kernel returns EBUSY.
>>>>>>
>>>>>> What do you think is the best solution here?
>>>>>
>>>>> we should fix that directly in the kernel. Back in the days we had a similar issue with BR/EDR and some more "dumb" controllers. We queued the connection attempt in the kernel and issued it whenever the controller was free again.
>>>>>
>>>>> One other thing that we should be doing for "direct" LE connection attempts is to run them through the LE passive background scanning. That way we have the same connection logic handling the background scanning and the connection attempts.
>>>>>
>>>>> Mainly I am thinking that if we have a L2CAP connect() request, we just add it to the LE passive background scanning. The only difference would be that these can actually time out. Otherwise it should be exactly the same.
>>>>
>>>> OK, so what is the work that I would need to do here? I'm not super
>>>> familiar with the kernel code, hence I'd appreciate some guidance.
>>>> From what you said, I gather two different work items:
>>>>
>>>> 1. Queue connection attempts and reissue when the controller becomes free.
>>>> 2. Make use of background scanning.
>>>>
>>>> If we do #2 would we need #1 as well? I assume #2 will only be
>>>> possible on later kernels, is there a solution that would easily
>>>> backport to older kernels?
>>>
>>> the background scanning is part of 3.17 and later, and you want to backport that anyway. Main reason is that it using the controller whitelist and is needed for proper LE HID support anyway.
>>>
>>
>> OK. For ChromiumOS's 3.10 and 3.14 kernels Scott has already done a
>> backport from the latest bluetooth-next but we had concerns for kernel
>> 3.8. I guess for now I'll go with your background scan based
>> suggestion and worry about those older platforms later.
>>
>>> So the only thing that should be needed is the capabilities to add an entry to the connection list that has a timeout. Right now only userspace via mgmt Add/Remove Device can modify this list. These entries would come from L2CAP connect() calls and need to fail if the device in question can not be found in x seconds.
>>>
>>> This means the connection logic from the background scanning should get a delayed workqueue for the timeout and we have to add timeout values to each entry in our list. And then remove them when timeout has been reached and send a proper socket error back to userspace. The real advantage comes into play that as a result the connection handling would then also use the controller whitelist and with that can easily connect multiple device after each other. For the automatic connections that we trigger via mgmt Add/Remove Device this already works right now.
>>>
>>
>> OK, that sounds straightforward actually. I'll start looking into this
>> and probably send out some RFC patches later.
>>
>
> I guess I'm seeking some programming advice now as a net/bluetooth
> noob. So it looks like in l2cap_code.c:l2cap_chan_connect I want to
> replace the call to "hci_connect_le" with some variant of
> hci_conn_params_set. Based on my understanding, eventually if an ADV
> report is received, hci_event.c:process_adv_report will handle
> creating the connection for us (in hci_event.c:check_pending_le_conn).
>
> From here, what would be the best (and proper) way to propagate this
> information and the newly created hci_conn structure to the l2cap
> layer so that the l2cap_conn structure can be set up?

that is a good question. I think Johan is a bit deeper in the code. Might worthwhile to jump on IRC and brainstorm with him on this.

He has done connection creation for the slave side using direct advertising and that would have some similar logical handling. At least I think it does. And if not, we might first need a few cleanups to make the LE connection handling for L2CAP sockets more streamlined and easy to integrate.

Regards

Marcel

2014-11-14 23:17:29

by Arman Uguray

[permalink] [raw]

Subject: Re: Handling EBUSY for LE connections.

Hi Marcel,

> On Wed, Nov 12, 2014 at 9:28 PM, Arman Uguray <[email protected]> wr=
ote:
> Hi Marcel,
>
>> On Wed, Nov 12, 2014 at 9:10 PM, Marcel Holtmann <[email protected]> w=
rote:
>> Hi Arman,
>>
>>>>> We have a use case in which multiple LE connection attempts are made
>>>>> in rapid succession to a device that frequently rotates its private
>>>>> address. The connect call frequently fails with
>>>>> "org.bluez.Error.Failed: Device or resource is busy (16)". I tracked
>>>>> this down to a line in net/bluetooth/hci_conn.c:hci_connect_le that
>>>>> has the following comment before returning -EBUSY:
>>>>>
>>>>> /* Since the controller supports only one LE connection attempt at =
a
>>>>> * a time, we return -EBUSY if there is any connection attempt runn=
ing.
>>>>> */
>>>>>
>>>>> This is all fine and good, however the org.bluez.Error.Failed error
>>>>> name is too generic for an application to determine how to recover
>>>>> from this case and it would be nice if there was a way for an
>>>>> application to know that it should perhaps try connecting again later=
,
>>>>> especially without having to parse the error message string to make
>>>>> sense of things while there can be a clear error name.
>>>>>
>>>>> So, would it make sense for bluetoothd to return
>>>>> org.bluez.Error.InProgress or even org.bluez.Error.Busy here? Another
>>>>> potential solution I had in mind was to have bluetoothd actually queu=
e
>>>>> LE connection attempts if one is already pending, or perhaps
>>>>> automatically schedule a retry if kernel returns EBUSY.
>>>>>
>>>>> What do you think is the best solution here?
>>>>
>>>> we should fix that directly in the kernel. Back in the days we had a s=
imilar issue with BR/EDR and some more "dumb" controllers. We queued the co=
nnection attempt in the kernel and issued it whenever the controller was fr=
ee again.
>>>>
>>>> One other thing that we should be doing for "direct" LE connection att=
empts is to run them through the LE passive background scanning. That way w=
e have the same connection logic handling the background scanning and the c=
onnection attempts.
>>>>
>>>> Mainly I am thinking that if we have a L2CAP connect() request, we jus=
t add it to the LE passive background scanning. The only difference would b=
e that these can actually time out. Otherwise it should be exactly the same=
.
>>>
>>> OK, so what is the work that I would need to do here? I'm not super
>>> familiar with the kernel code, hence I'd appreciate some guidance.
>>> From what you said, I gather two different work items:
>>>
>>> 1. Queue connection attempts and reissue when the controller becomes f=
ree.
>>> 2. Make use of background scanning.
>>>
>>> If we do #2 would we need #1 as well? I assume #2 will only be
>>> possible on later kernels, is there a solution that would easily
>>> backport to older kernels?
>>
>> the background scanning is part of 3.17 and later, and you want to backp=
ort that anyway. Main reason is that it using the controller whitelist and =
is needed for proper LE HID support anyway.
>>
>
> OK. For ChromiumOS's 3.10 and 3.14 kernels Scott has already done a
> backport from the latest bluetooth-next but we had concerns for kernel
> 3.8. I guess for now I'll go with your background scan based
> suggestion and worry about those older platforms later.
>
>> So the only thing that should be needed is the capabilities to add an en=
try to the connection list that has a timeout. Right now only userspace via=
mgmt Add/Remove Device can modify this list. These entries would come from=
L2CAP connect() calls and need to fail if the device in question can not b=
e found in x seconds.
>>
>> This means the connection logic from the background scanning should get =
a delayed workqueue for the timeout and we have to add timeout values to ea=
ch entry in our list. And then remove them when timeout has been reached an=
d send a proper socket error back to userspace. The real advantage comes in=
to play that as a result the connection handling would then also use the co=
ntroller whitelist and with that can easily connect multiple device after e=
ach other. For the automatic connections that we trigger via mgmt Add/Remov=
e Device this already works right now.
>>
>
> OK, that sounds straightforward actually. I'll start looking into this
> and probably send out some RFC patches later.
>
>> Regards
>>
>> Marcel
>>
>
> Thanks,
> Arman

I guess I'm seeking some programming advice now as a net/bluetooth
noob. So it looks like in l2cap_code.c:l2cap_chan_connect I want to
replace the call to "hci_connect_le" with some variant of
hci_conn_params_set. Based on my understanding, eventually if an ADV
report is received, hci_event.c:process_adv_report will handle
creating the connection for us (in hci_event.c:check_pending_le_conn).

>From here, what would be the best (and proper) way to propagate this
information and the newly created hci_conn structure to the l2cap
layer so that the l2cap_conn structure can be set up?

Thanks,
Arman

2014-11-13 05:28:14

by Arman Uguray

[permalink] [raw]

Subject: Re: Handling EBUSY for LE connections.

Hi Marcel,

> On Wed, Nov 12, 2014 at 9:10 PM, Marcel Holtmann <[email protected]> wr=
ote:
> Hi Arman,
>
>>>> We have a use case in which multiple LE connection attempts are made
>>>> in rapid succession to a device that frequently rotates its private
>>>> address. The connect call frequently fails with
>>>> "org.bluez.Error.Failed: Device or resource is busy (16)". I tracked
>>>> this down to a line in net/bluetooth/hci_conn.c:hci_connect_le that
>>>> has the following comment before returning -EBUSY:
>>>>
>>>> /* Since the controller supports only one LE connection attempt at a
>>>> * a time, we return -EBUSY if there is any connection attempt runni=
ng.
>>>> */
>>>>
>>>> This is all fine and good, however the org.bluez.Error.Failed error
>>>> name is too generic for an application to determine how to recover
>>>> from this case and it would be nice if there was a way for an
>>>> application to know that it should perhaps try connecting again later,
>>>> especially without having to parse the error message string to make
>>>> sense of things while there can be a clear error name.
>>>>
>>>> So, would it make sense for bluetoothd to return
>>>> org.bluez.Error.InProgress or even org.bluez.Error.Busy here? Another
>>>> potential solution I had in mind was to have bluetoothd actually queue
>>>> LE connection attempts if one is already pending, or perhaps
>>>> automatically schedule a retry if kernel returns EBUSY.
>>>>
>>>> What do you think is the best solution here?
>>>
>>> we should fix that directly in the kernel. Back in the days we had a si=
milar issue with BR/EDR and some more "dumb" controllers. We queued the con=
nection attempt in the kernel and issued it whenever the controller was fre=
e again.
>>>
>>> One other thing that we should be doing for "direct" LE connection atte=
mpts is to run them through the LE passive background scanning. That way we=
have the same connection logic handling the background scanning and the co=
nnection attempts.
>>>
>>> Mainly I am thinking that if we have a L2CAP connect() request, we just=
add it to the LE passive background scanning. The only difference would be=
that these can actually time out. Otherwise it should be exactly the same.
>>
>> OK, so what is the work that I would need to do here? I'm not super
>> familiar with the kernel code, hence I'd appreciate some guidance.
>> From what you said, I gather two different work items:
>>
>> 1. Queue connection attempts and reissue when the controller becomes fr=
ee.
>> 2. Make use of background scanning.
>>
>> If we do #2 would we need #1 as well? I assume #2 will only be
>> possible on later kernels, is there a solution that would easily
>> backport to older kernels?
>
> the background scanning is part of 3.17 and later, and you want to backpo=
rt that anyway. Main reason is that it using the controller whitelist and i=
s needed for proper LE HID support anyway.
>

OK. For ChromiumOS's 3.10 and 3.14 kernels Scott has already done a
backport from the latest bluetooth-next but we had concerns for kernel
3.8. I guess for now I'll go with your background scan based
suggestion and worry about those older platforms later.

> So the only thing that should be needed is the capabilities to add an ent=
ry to the connection list that has a timeout. Right now only userspace via =
mgmt Add/Remove Device can modify this list. These entries would come from =
L2CAP connect() calls and need to fail if the device in question can not be=
found in x seconds.
>
> This means the connection logic from the background scanning should get a=
delayed workqueue for the timeout and we have to add timeout values to eac=
h entry in our list. And then remove them when timeout has been reached and=
send a proper socket error back to userspace. The real advantage comes int=
o play that as a result the connection handling would then also use the con=
troller whitelist and with that can easily connect multiple device after ea=
ch other. For the automatic connections that we trigger via mgmt Add/Remove=
Device this already works right now.
>

OK, that sounds straightforward actually. I'll start looking into this
and probably send out some RFC patches later.

> Regards
>
> Marcel
>

Thanks,
Arman

2014-11-13 05:10:24

by Marcel Holtmann

[permalink] [raw]

Subject: Re: Handling EBUSY for LE connections.

Hi Arman,

>>> We have a use case in which multiple LE connection attempts are made
>>> in rapid succession to a device that frequently rotates its private
>>> address. The connect call frequently fails with
>>> "org.bluez.Error.Failed: Device or resource is busy (16)". I tracked
>>> this down to a line in net/bluetooth/hci_conn.c:hci_connect_le that
>>> has the following comment before returning -EBUSY:
>>>
>>> /* Since the controller supports only one LE connection attempt at a
>>> * a time, we return -EBUSY if there is any connection attempt running.
>>> */
>>>
>>> This is all fine and good, however the org.bluez.Error.Failed error
>>> name is too generic for an application to determine how to recover
>>> from this case and it would be nice if there was a way for an
>>> application to know that it should perhaps try connecting again later,
>>> especially without having to parse the error message string to make
>>> sense of things while there can be a clear error name.
>>>
>>> So, would it make sense for bluetoothd to return
>>> org.bluez.Error.InProgress or even org.bluez.Error.Busy here? Another
>>> potential solution I had in mind was to have bluetoothd actually queue
>>> LE connection attempts if one is already pending, or perhaps
>>> automatically schedule a retry if kernel returns EBUSY.
>>>
>>> What do you think is the best solution here?
>>
>> we should fix that directly in the kernel. Back in the days we had a similar issue with BR/EDR and some more "dumb" controllers. We queued the connection attempt in the kernel and issued it whenever the controller was free again.
>>
>> One other thing that we should be doing for "direct" LE connection attempts is to run them through the LE passive background scanning. That way we have the same connection logic handling the background scanning and the connection attempts.
>>
>> Mainly I am thinking that if we have a L2CAP connect() request, we just add it to the LE passive background scanning. The only difference would be that these can actually time out. Otherwise it should be exactly the same.
>
> OK, so what is the work that I would need to do here? I'm not super
> familiar with the kernel code, hence I'd appreciate some guidance.
> From what you said, I gather two different work items:
>
> 1. Queue connection attempts and reissue when the controller becomes free.
> 2. Make use of background scanning.
>
> If we do #2 would we need #1 as well? I assume #2 will only be
> possible on later kernels, is there a solution that would easily
> backport to older kernels?

the background scanning is part of 3.17 and later, and you want to backport that anyway. Main reason is that it using the controller whitelist and is needed for proper LE HID support anyway.

So the only thing that should be needed is the capabilities to add an entry to the connection list that has a timeout. Right now only userspace via mgmt Add/Remove Device can modify this list. These entries would come from L2CAP connect() calls and need to fail if the device in question can not be found in x seconds.

This means the connection logic from the background scanning should get a delayed workqueue for the timeout and we have to add timeout values to each entry in our list. And then remove them when timeout has been reached and send a proper socket error back to userspace. The real advantage comes into play that as a result the connection handling would then also use the controller whitelist and with that can easily connect multiple device after each other. For the automatic connections that we trigger via mgmt Add/Remove Device this already works right now.

Regards

Marcel

2014-11-13 04:51:53

by Arman Uguray

[permalink] [raw]

Subject: Re: Handling EBUSY for LE connections.

Hi Marcel,

> On Wed, Nov 12, 2014 at 8:41 PM, Marcel Holtmann <[email protected]> wrote:
> Hi Arman,
>
>> We have a use case in which multiple LE connection attempts are made
>> in rapid succession to a device that frequently rotates its private
>> address. The connect call frequently fails with
>> "org.bluez.Error.Failed: Device or resource is busy (16)". I tracked
>> this down to a line in net/bluetooth/hci_conn.c:hci_connect_le that
>> has the following comment before returning -EBUSY:
>>
>> /* Since the controller supports only one LE connection attempt at a
>> * a time, we return -EBUSY if there is any connection attempt running.
>> */
>>
>> This is all fine and good, however the org.bluez.Error.Failed error
>> name is too generic for an application to determine how to recover
>> from this case and it would be nice if there was a way for an
>> application to know that it should perhaps try connecting again later,
>> especially without having to parse the error message string to make
>> sense of things while there can be a clear error name.
>>
>> So, would it make sense for bluetoothd to return
>> org.bluez.Error.InProgress or even org.bluez.Error.Busy here? Another
>> potential solution I had in mind was to have bluetoothd actually queue
>> LE connection attempts if one is already pending, or perhaps
>> automatically schedule a retry if kernel returns EBUSY.
>>
>> What do you think is the best solution here?
>
> we should fix that directly in the kernel. Back in the days we had a similar issue with BR/EDR and some more "dumb" controllers. We queued the connection attempt in the kernel and issued it whenever the controller was free again.
>
> One other thing that we should be doing for "direct" LE connection attempts is to run them through the LE passive background scanning. That way we have the same connection logic handling the background scanning and the connection attempts.
>
> Mainly I am thinking that if we have a L2CAP connect() request, we just add it to the LE passive background scanning. The only difference would be that these can actually time out. Otherwise it should be exactly the same.
>
> Regards
>
> Marcel
>

OK, so what is the work that I would need to do here? I'm not super
familiar with the kernel code, hence I'd appreciate some guidance.
>From what you said, I gather two different work items:

1. Queue connection attempts and reissue when the controller becomes free.
2. Make use of background scanning.

If we do #2 would we need #1 as well? I assume #2 will only be
possible on later kernels, is there a solution that would easily
backport to older kernels?

Thanks,
Arman

2014-11-13 04:41:36

by Marcel Holtmann

[permalink] [raw]

Subject: Re: Handling EBUSY for LE connections.

Hi Arman,

> We have a use case in which multiple LE connection attempts are made
> in rapid succession to a device that frequently rotates its private
> address. The connect call frequently fails with
> "org.bluez.Error.Failed: Device or resource is busy (16)". I tracked
> this down to a line in net/bluetooth/hci_conn.c:hci_connect_le that
> has the following comment before returning -EBUSY:
>
> /* Since the controller supports only one LE connection attempt at a
> * a time, we return -EBUSY if there is any connection attempt running.
> */
>
> This is all fine and good, however the org.bluez.Error.Failed error
> name is too generic for an application to determine how to recover
> from this case and it would be nice if there was a way for an
> application to know that it should perhaps try connecting again later,
> especially without having to parse the error message string to make
> sense of things while there can be a clear error name.
>
> So, would it make sense for bluetoothd to return
> org.bluez.Error.InProgress or even org.bluez.Error.Busy here? Another
> potential solution I had in mind was to have bluetoothd actually queue
> LE connection attempts if one is already pending, or perhaps
> automatically schedule a retry if kernel returns EBUSY.
>
> What do you think is the best solution here?

we should fix that directly in the kernel. Back in the days we had a similar issue with BR/EDR and some more "dumb" controllers. We queued the connection attempt in the kernel and issued it whenever the controller was free again.

One other thing that we should be doing for "direct" LE connection attempts is to run them through the LE passive background scanning. That way we have the same connection logic handling the background scanning and the connection attempts.

Mainly I am thinking that if we have a L2CAP connect() request, we just add it to the LE passive background scanning. The only difference would be that these can actually time out. Otherwise it should be exactly the same.

Regards

Marcel