Return-Path: MIME-Version: 1.0 Sender: armansito@google.com In-Reply-To: References: <50181A20-3313-48C7-924B-05C67B975139@holtmann.org> Date: Fri, 14 Nov 2014 15:17:29 -0800 Message-ID: Subject: Re: Handling EBUSY for LE connections. From: Arman Uguray To: Marcel Holtmann Cc: BlueZ development , Jakub Pawlowski Content-Type: text/plain; charset=UTF-8 List-ID: Hi Marcel, > On Wed, Nov 12, 2014 at 9:28 PM, Arman Uguray wr= ote: > Hi Marcel, > >> On Wed, Nov 12, 2014 at 9:10 PM, Marcel Holtmann w= rote: >> Hi Arman, >> >>>>> We have a use case in which multiple LE connection attempts are made >>>>> in rapid succession to a device that frequently rotates its private >>>>> address. The connect call frequently fails with >>>>> "org.bluez.Error.Failed: Device or resource is busy (16)". I tracked >>>>> this down to a line in net/bluetooth/hci_conn.c:hci_connect_le that >>>>> has the following comment before returning -EBUSY: >>>>> >>>>> /* Since the controller supports only one LE connection attempt at = a >>>>> * a time, we return -EBUSY if there is any connection attempt runn= ing. >>>>> */ >>>>> >>>>> This is all fine and good, however the org.bluez.Error.Failed error >>>>> name is too generic for an application to determine how to recover >>>>> from this case and it would be nice if there was a way for an >>>>> application to know that it should perhaps try connecting again later= , >>>>> especially without having to parse the error message string to make >>>>> sense of things while there can be a clear error name. >>>>> >>>>> So, would it make sense for bluetoothd to return >>>>> org.bluez.Error.InProgress or even org.bluez.Error.Busy here? Another >>>>> potential solution I had in mind was to have bluetoothd actually queu= e >>>>> LE connection attempts if one is already pending, or perhaps >>>>> automatically schedule a retry if kernel returns EBUSY. >>>>> >>>>> What do you think is the best solution here? >>>> >>>> we should fix that directly in the kernel. Back in the days we had a s= imilar issue with BR/EDR and some more "dumb" controllers. We queued the co= nnection attempt in the kernel and issued it whenever the controller was fr= ee again. >>>> >>>> One other thing that we should be doing for "direct" LE connection att= empts is to run them through the LE passive background scanning. That way w= e have the same connection logic handling the background scanning and the c= onnection attempts. >>>> >>>> Mainly I am thinking that if we have a L2CAP connect() request, we jus= t add it to the LE passive background scanning. The only difference would b= e that these can actually time out. Otherwise it should be exactly the same= . >>> >>> OK, so what is the work that I would need to do here? I'm not super >>> familiar with the kernel code, hence I'd appreciate some guidance. >>> From what you said, I gather two different work items: >>> >>> 1. Queue connection attempts and reissue when the controller becomes f= ree. >>> 2. Make use of background scanning. >>> >>> If we do #2 would we need #1 as well? I assume #2 will only be >>> possible on later kernels, is there a solution that would easily >>> backport to older kernels? >> >> the background scanning is part of 3.17 and later, and you want to backp= ort that anyway. Main reason is that it using the controller whitelist and = is needed for proper LE HID support anyway. >> > > OK. For ChromiumOS's 3.10 and 3.14 kernels Scott has already done a > backport from the latest bluetooth-next but we had concerns for kernel > 3.8. I guess for now I'll go with your background scan based > suggestion and worry about those older platforms later. > >> So the only thing that should be needed is the capabilities to add an en= try to the connection list that has a timeout. Right now only userspace via= mgmt Add/Remove Device can modify this list. These entries would come from= L2CAP connect() calls and need to fail if the device in question can not b= e found in x seconds. >> >> This means the connection logic from the background scanning should get = a delayed workqueue for the timeout and we have to add timeout values to ea= ch entry in our list. And then remove them when timeout has been reached an= d send a proper socket error back to userspace. The real advantage comes in= to play that as a result the connection handling would then also use the co= ntroller whitelist and with that can easily connect multiple device after e= ach other. For the automatic connections that we trigger via mgmt Add/Remov= e Device this already works right now. >> > > OK, that sounds straightforward actually. I'll start looking into this > and probably send out some RFC patches later. > >> Regards >> >> Marcel >> > > Thanks, > Arman I guess I'm seeking some programming advice now as a net/bluetooth noob. So it looks like in l2cap_code.c:l2cap_chan_connect I want to replace the call to "hci_connect_le" with some variant of hci_conn_params_set. Based on my understanding, eventually if an ADV report is received, hci_event.c:process_adv_report will handle creating the connection for us (in hci_event.c:check_pending_le_conn). >From here, what would be the best (and proper) way to propagate this information and the newly created hci_conn structure to the l2cap layer so that the l2cap_conn structure can be set up? Thanks, Arman