Return-Path: <cyrus@holtmann.org>
Content-Type: text/plain; charset=us-ascii
Mime-Version: 1.0 (Mac OS X Mail 8.0 \(1990.1\))
Subject: Re: Handling EBUSY for LE connections.
From: Marcel Holtmann <marcel@holtmann.org>
In-Reply-To: <CAHrH25S5X5UZJwH0JuoWGW70bDdO48iS8Ex5pF0K9Cd7JsRV8A@mail.gmail.com>
Date: Sat, 15 Nov 2014 09:35:12 +0900
Cc: BlueZ development <linux-bluetooth@vger.kernel.org>,
	Jakub Pawlowski <jpawlowski@google.com>
Message-Id: <7ED5AE30-EF53-47EA-A9CE-DD371CE10910@holtmann.org>
References: <CAHrH25TMxuHY3hqQQFWe+=5ghzc5rrEG2ED5sa410ddiJ-YuDA@mail.gmail.com> <50181A20-3313-48C7-924B-05C67B975139@holtmann.org> <CAHrH25T6tmUO2Zvsr=R7+yHwQE0_hY_sMt4UQAmCBMZBOH0PRQ@mail.gmail.com> <ED9629AB-B978-49FE-A7D7-5D9E62642504@holtmann.org> <CAHrH25Te4J6ospnJYA00mb1RofwHrS4=KD8LjmTLvo5Y4Jy8EQ@mail.gmail.com> <CAHrH25S5X5UZJwH0JuoWGW70bDdO48iS8Ex5pF0K9Cd7JsRV8A@mail.gmail.com>
To: Arman Uguray <armansito@chromium.org>
Sender: linux-bluetooth-owner@vger.kernel.org
List-ID: <linux-bluetooth.vger.kernel.org>

Hi Arman,

>>>>>> We have a use case in which multiple LE connection attempts are made
>>>>>> in rapid succession to a device that frequently rotates its private
>>>>>> address. The connect call frequently fails with
>>>>>> "org.bluez.Error.Failed: Device or resource is busy (16)". I tracked
>>>>>> this down to a line in net/bluetooth/hci_conn.c:hci_connect_le that
>>>>>> has the following comment before returning -EBUSY:
>>>>>> 
>>>>>>  /* Since the controller supports only one LE connection attempt at a
>>>>>>   * time, we return -EBUSY if there is any connection attempt running.
>>>>>>   */
>>>>>> 
>>>>>> This is all fine and good, however the org.bluez.Error.Failed error
>>>>>> name is too generic for an application to determine how to recover
>>>>>> from this case and it would be nice if there was a way for an
>>>>>> application to know that it should perhaps try connecting again later,
>>>>>> especially without having to parse the error message string to make
>>>>>> sense of things while there can be a clear error name.
>>>>>> 
>>>>>> So, would it make sense for bluetoothd to return
>>>>>> org.bluez.Error.InProgress or even org.bluez.Error.Busy here? Another
>>>>>> potential solution I had in mind was to have bluetoothd actually queue
>>>>>> LE connection attempts if one is already pending, or perhaps
>>>>>> automatically schedule a retry if kernel returns EBUSY.
>>>>>> 
>>>>>> What do you think is the best solution here?
>>>>> 
>>>>> we should fix that directly in the kernel. Back in the day we had a similar issue with BR/EDR and some more "dumb" controllers. We queued the connection attempt in the kernel and issued it whenever the controller was free again.
>>>>> 
>>>>> One other thing that we should be doing for "direct" LE connection attempts is to run them through the LE passive background scanning. That way we have the same connection logic handling the background scanning and the connection attempts.
>>>>> 
>>>>> Mainly I am thinking that if we have a L2CAP connect() request, we just add it to the LE passive background scanning. The only difference would be that these can actually time out. Otherwise it should be exactly the same.
>>>> 
>>>> OK, so what is the work that I would need to do here? I'm not super
>>>> familiar with the kernel code, hence I'd appreciate some guidance.
>>>> From what you said, I gather two different work items:
>>>> 
>>>> 1. Queue connection attempts and reissue when the controller becomes free.
>>>> 2. Make use of background scanning.
>>>> 
>>>> If we do #2 would we need #1 as well? I assume #2 will only be
>>>> possible on later kernels, is there a solution that would easily
>>>> backport to older kernels?
>>> 
>>> the background scanning is part of 3.17 and later, and you want to backport that anyway. The main reason is that it uses the controller whitelist and is needed for proper LE HID support anyway.
>>> 
>> 
>> OK. For ChromiumOS's 3.10 and 3.14 kernels Scott has already done a
>> backport from the latest bluetooth-next but we had concerns for kernel
>> 3.8. I guess for now I'll go with your background scan based
>> suggestion and worry about those older platforms later.
>> 
>>> So the only thing that should be needed is the capability to add an entry with a timeout to the connection list. Right now only userspace via mgmt Add/Remove Device can modify this list. These entries would come from L2CAP connect() calls and need to fail if the device in question cannot be found in x seconds.
>>> 
>>> This means the connection logic from the background scanning should get a delayed workqueue for the timeout and we have to add timeout values to each entry in our list. We then remove entries when their timeout has been reached and send a proper socket error back to userspace. The real advantage is that the connection handling would then also use the controller whitelist and with that can easily connect to multiple devices one after another. For the automatic connections that we trigger via mgmt Add/Remove Device this already works right now.
>>> 
>> 
>> OK, that sounds straightforward actually. I'll start looking into this
>> and probably send out some RFC patches later.
>> 
> 
> I guess I'm seeking some programming advice now as a net/bluetooth
> noob. So it looks like in l2cap_core.c:l2cap_chan_connect I want to
> replace the call to "hci_connect_le" with some variant of
> hci_conn_params_set. Based on my understanding, eventually if an ADV
> report is received, hci_event.c:process_adv_report will handle
> creating the connection for us (in hci_event.c:check_pending_le_conn).
> 
> From here, what would be the best (and proper) way to propagate this
> information and the newly created hci_conn structure to the l2cap
> layer so that the l2cap_conn structure can be set up?

that is a good question. I think Johan is a bit deeper in the code. It might be worthwhile to jump on IRC and brainstorm with him on this.

He has done connection creation for the slave side using directed advertising and that would have some similar logical handling. At least I think it does. And if not, we might first need a few cleanups to make the LE connection handling for L2CAP sockets more streamlined and easy to integrate.
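
To make the timeout idea from earlier in the thread a bit more concrete, here is a rough userspace sketch in C. It is not kernel code, and none of the names in it (pend_list, pend_entry, pend_add, pend_match, pend_expire, PEND_MGMT, PEND_SOCKET) exist in net/bluetooth; they are made up for illustration. It assumes entries that come from a connect() call carry an absolute deadline and are dropped on a match or on expiry, while entries added via mgmt stay in the list. In the kernel, the pend_expire step would presumably run from a delayed work item and report a socket error for each expired entry.

```c
/* Userspace sketch only: every name here is hypothetical. */
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_PENDING 8

enum pend_origin { PEND_MGMT, PEND_SOCKET };

struct pend_entry {
	char addr[18];           /* "AA:BB:CC:DD:EE:FF" plus NUL */
	enum pend_origin origin; /* who added the entry */
	long deadline;           /* absolute seconds; only used for PEND_SOCKET */
	int in_use;
};

struct pend_list {
	struct pend_entry e[MAX_PENDING];
};

/* Add an entry; returns 0 on success, -1 if the list is full. */
static int pend_add(struct pend_list *l, const char *addr,
		    enum pend_origin origin, long now, long timeout)
{
	size_t i;

	assert(l && addr);
	for (i = 0; i < MAX_PENDING; i++) {
		struct pend_entry *p = &l->e[i];

		if (p->in_use)
			continue;
		strncpy(p->addr, addr, sizeof(p->addr) - 1);
		p->addr[sizeof(p->addr) - 1] = '\0';
		p->origin = origin;
		p->deadline = now + timeout;
		p->in_use = 1;
		return 0;
	}
	return -1;
}

/* An advertising report for addr arrived. Returns 1 if an entry matched
 * (a connection would be created), 0 otherwise. Socket entries are
 * one-shot and get removed; mgmt entries stay (assumption).
 */
static int pend_match(struct pend_list *l, const char *addr)
{
	size_t i;

	for (i = 0; i < MAX_PENDING; i++) {
		struct pend_entry *p = &l->e[i];

		if (!p->in_use || strcmp(p->addr, addr) != 0)
			continue;
		if (p->origin == PEND_SOCKET)
			p->in_use = 0;
		return 1;
	}
	return 0;
}

/* Stand-in for the delayed work: drop socket entries whose deadline has
 * passed and return how many were dropped. Each of them would translate
 * into a socket error for the caller of connect().
 */
static int pend_expire(struct pend_list *l, long now)
{
	size_t i;
	int expired = 0;

	for (i = 0; i < MAX_PENDING; i++) {
		struct pend_entry *p = &l->e[i];

		if (p->in_use && p->origin == PEND_SOCKET && now >= p->deadline) {
			p->in_use = 0;
			expired++;
		}
	}
	return expired;
}
```

The point of the sketch is only that both kinds of entries live in one list and go through the same matching path, with the timeout being the single difference for the socket-originated ones.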

Regards

Marcel