2014-03-11 21:44:46

by Scott James Remnant

[permalink] [raw]
Subject: rctest -c "Can't connect: Device or resource busy (16)"

Is this expected behavior for RFCOMM? When using rctest -r on one
side, and rctest -c (connect/disconnect/connect...) on the other, the
client side fails to connect a significant percentage of the time:

rctest[23705]: Can't connect: Device or resource busy (16)
rctest[23705]: Can't connect: Device or resource busy (16)
rctest[23705]: Can't connect: Device or resource busy (16)
rctest[23705]: Can't connect: Device or resource busy (16)
rctest[23705]: Can't connect: Device or resource busy (16)
rctest[23705]: Can't connect: Device or resource busy (16)
rctest[23705]: Connected [handle 34817, class 0xff8801, priority 0]
rctest[23705]: Can't connect: Device or resource busy (16)
rctest[23705]: Can't connect: Device or resource busy (16)
rctest[23705]: Can't connect: Device or resource busy (16)
rctest[23705]: Can't connect: Device or resource busy (16)
rctest[23705]: Can't connect: Device or resource busy (16)
rctest[23705]: Can't connect: Device or resource busy (16)

Scott


2014-03-28 17:49:42

by Scott James Remnant

[permalink] [raw]
Subject: Re: rctest -c "Can't connect: Device or resource busy (16)"

On Wed, Mar 12, 2014 at 10:18 AM, Peter Hurley <[email protected]> wrote:
>> On 03/11/2014 10:14 PM, Marcel Holtmann wrote:

> There are 2 separate problems going on here.
>
> The failure to reconnect is happening because after sending DISC,
> the sender still replies to stale MCC packets, which the receiver
> (whose state now reflects the DISC) replies with a DM packet which
> gets received on the re-opened dlc, causing it to disconnect with
> error. This is trivially fixable, and AFAICT, is not a regression.
>
> The 2nd problem of not being able to reconnect in a reasonable
> number of times is due to the huge round-trip delays after receiving
> DISC and replying with UA.
>

Not sure whether this is related, or a different problem:

Using BlueZ's Profile APIs to make an RFCOMM connection to a paired
device, write data, receive reply, then disconnect after a while
(within 10 mins usually) we end up in a state where further
connections to that device fail. On 3.8 they seem to fail with
org.bluez.Error.InProgress while on bluetooth-next they fail with
org.bluez.Error.Failed.

"connect ADDR" in bluetoothctl returns the same error, and there isn't
any btmon activity for the attempt.

Restarting bluetoothd, resetting the radio, and even rmmod/modprobe of
btusb do not fix the issue - only a full reboot.

(btmon of the entire session attached)

Scott


Attachments:
355651.btsnoop.bz2 (222.90 kB)

2014-03-13 16:43:06

by Peter Hurley

[permalink] [raw]
Subject: [PATCH 3/3] bluetooth: rfcomm: Defer session teardown after last dlc

Commit 9cf5b0ea3a7f1432c61029f7aaf4b8b338628884,
[Bluetooth] Disconnect L2CAP connection after last RFCOMM DLC,
immediately disconnects the session when closing the last dlc.
This commit predates the later addition of a session timer [1]
which sets up a timer to automatically close the session after the
last dlc is closed. This automatic behavior was further noted
noted in [2]:

When the last RFCOMM data channel is closed, a timer is normally set
up to disconnect the control channel at a later time. If the control
channel disconnect command is sent with the timer pending, the timer
needs to be cancelled.

Rely on automatic session disconnect; this fulfills the responsibility
of closing the multiplexer while allowing the last dlc to be closed and
reopened without needing to handshake a new l2cap connection.

[1] commit 9e726b17422bade75fba94e625cd35fd1353e682
Bluetooth: Fix rejected connection not disconnecting ACL link

[2] commit 79e654787c67f6b05f73366ff8ccac72ba7249e6
Bluetooth: Clear RFCOMM session timer when disconnecting last channel

Signed-off-by: Peter Hurley <[email protected]>
---
net/bluetooth/rfcomm/core.c | 7 -------
1 file changed, 7 deletions(-)

diff --git a/net/bluetooth/rfcomm/core.c b/net/bluetooth/rfcomm/core.c
index 292322c..062a96c 100644
--- a/net/bluetooth/rfcomm/core.c
+++ b/net/bluetooth/rfcomm/core.c
@@ -1193,13 +1193,6 @@ static struct rfcomm_session *rfcomm_recv_ua(struct rfcomm_session *s, u8 dlci)
case BT_DISCONN:
d->state = BT_CLOSED;
__rfcomm_dlc_close(d, 0);
-
- if (list_empty(&s->dlcs)) {
- s->state = BT_DISCONN;
- rfcomm_send_disc(s, 0);
- rfcomm_session_clear_timer(s);
- }
-
break;
}
} else {
--
1.8.1.2

2014-03-13 16:43:05

by Peter Hurley

[permalink] [raw]
Subject: [PATCH 2/3] bluetooth: rfcomm: Create new session if closing old session

If the rfcomm session has already sent DISC (thus in BT_DISCONN
state), allow dlc opens to create a new session instead of trying
to open a dlc in a closing session (and instead of returning an
error).

Allow multiple rfcomm sessions with the same endpoints as long as
the existing sessions are either in BT_DISCONN or BT_CLOSED states.

Fixes multi-second delays when running rctest -c on one station
and rctest -r on the other.

Reported-by: Scott James Remnant <[email protected]>
Signed-off-by: Peter Hurley <[email protected]>
---
net/bluetooth/rfcomm/core.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/net/bluetooth/rfcomm/core.c b/net/bluetooth/rfcomm/core.c
index 6aa90c0..292322c 100644
--- a/net/bluetooth/rfcomm/core.c
+++ b/net/bluetooth/rfcomm/core.c
@@ -695,6 +695,9 @@ static struct rfcomm_session *rfcomm_session_get(bdaddr_t *src, bdaddr_t *dst)
s = list_entry(p, struct rfcomm_session, list);
chan = l2cap_pi(s->sock->sk)->chan;

+ if (s->state == BT_DISCONN || s->state == BT_CLOSED)
+ continue;
+
if ((!bacmp(src, BDADDR_ANY) || !bacmp(&chan->src, src)) &&
!bacmp(&chan->dst, dst))
return s;
--
1.8.1.2


2014-03-13 16:43:04

by Peter Hurley

[permalink] [raw]
Subject: [PATCH 1/3] bluetooth: rfcomm: Reply with DM after dlc disconnect

Stale commands and data may be received after DISC has already
been sent for a dlc; specifically the MSC, RLS, RPN and DISC commands
must reply with DM for a dlc already closing. [The PN command receive
already handles this case and other TS 0710 commands are not dlc-specific.]

Fixes when a stale reply to a stale command causes a DM response
on a newly reopened dlc. For example,

Station A Station B

MSC --->| |
| |<--- DISC
| MSC ----> |
| <---- DISC |---> MSC
DISC <---| |<--- MSC
UA --->| <---- MSC |

The dlc is now closed on Station A.

| UA ----> |
MSC <---| |---> UA

The dlc is now closed on Station B.

DM --->| | Open new dlc @ same dlci
| DM ----> |
| |---> DM
| | Prematurely closes new dlc.

Reported-by: Scott James Remnant <[email protected]>
Signed-off-by: Peter Hurley <[email protected]>
---
net/bluetooth/rfcomm/core.c | 22 +++++++++++++++++++---
1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/net/bluetooth/rfcomm/core.c b/net/bluetooth/rfcomm/core.c
index 112749c..6aa90c0 100644
--- a/net/bluetooth/rfcomm/core.c
+++ b/net/bluetooth/rfcomm/core.c
@@ -1253,7 +1253,7 @@ static struct rfcomm_session *rfcomm_recv_disc(struct rfcomm_session *s,

if (dlci) {
struct rfcomm_dlc *d = rfcomm_dlc_get(s, dlci);
- if (d) {
+ if (d && d->state != BT_DISCONN) {
rfcomm_send_ua(s, dlci);

if (d->state == BT_CONNECT || d->state == BT_CONFIG)
@@ -1445,6 +1445,7 @@ static int rfcomm_recv_pn(struct rfcomm_session *s, int cr, struct sk_buff *skb)

static int rfcomm_recv_rpn(struct rfcomm_session *s, int cr, int len, struct sk_buff *skb)
{
+ struct rfcomm_dlc *d;
struct rfcomm_rpn *rpn = (void *) skb->data;
u8 dlci = __get_dlci(rpn->dlci);

@@ -1461,6 +1462,12 @@ static int rfcomm_recv_rpn(struct rfcomm_session *s, int cr, int len, struct sk_
dlci, cr, len, rpn->bit_rate, rpn->line_settings, rpn->flow_ctrl,
rpn->xon_char, rpn->xoff_char, rpn->param_mask);

+ d = rfcomm_dlc_get(s, dlci);
+ if (!d || d->state == BT_DISCONN) {
+ rfcomm_send_dm(s, dlci);
+ return 0;
+ }
+
if (!cr)
return 0;

@@ -1551,11 +1558,18 @@ rpn_out:

static int rfcomm_recv_rls(struct rfcomm_session *s, int cr, struct sk_buff *skb)
{
+ struct rfcomm_dlc *d;
struct rfcomm_rls *rls = (void *) skb->data;
u8 dlci = __get_dlci(rls->dlci);

BT_DBG("dlci %d cr %d status 0x%x", dlci, cr, rls->status);

+ d = rfcomm_dlc_get(s, dlci);
+ if (!d || d->state == BT_DISCONN) {
+ rfcomm_send_dm(s, dlci);
+ return 0;
+ }
+
if (!cr)
return 0;

@@ -1577,8 +1591,10 @@ static int rfcomm_recv_msc(struct rfcomm_session *s, int cr, struct sk_buff *skb
BT_DBG("dlci %d cr %d v24 0x%x", dlci, cr, msc->v24_sig);

d = rfcomm_dlc_get(s, dlci);
- if (!d)
+ if (!d || d->state == BT_DISCONN) {
+ rfcomm_send_dm(s, dlci);
return 0;
+ }

if (cr) {
if (msc->v24_sig & RFCOMM_V24_FC && !d->cfc)
@@ -1671,7 +1687,7 @@ static int rfcomm_recv_data(struct rfcomm_session *s, u8 dlci, int pf, struct sk
BT_DBG("session %p state %ld dlci %d pf %d", s, s->state, dlci, pf);

d = rfcomm_dlc_get(s, dlci);
- if (!d) {
+ if (!d || d->state == BT_DISCONN) {
rfcomm_send_dm(s, dlci);
goto drop;
}
--
1.8.1.2

2014-03-13 16:43:03

by Peter Hurley

[permalink] [raw]
Subject: [PATCH 0/3] Fix RFCOMM connect/disconn races

Marcel,

This patch series fixes the observed errors when running
rctest -r on one station and rctest -c on the other.

Please note that patch 2/3 is somewhat risky in that it allows
multiple RFCOMM sessions between the same endpoints to co-exist,
provided that all but one of the sessions is closing or is closed.
I did test this, but there may be some subtlety I've overlooked.

[I note this because you may prefer the simpler and more bulletproof
solution of returning -EBUSY from __rfcomm_dlc_open() in this
situation; I can understand if you have lost patience for RFCOMM
breakage.]

This series does not address the excessive latency from sending
the DISC command to receiving the UA reply. I think a partial
solution would be to skip the rfcomm thread and run
rfcomm_process_sessions() directly from rfcomm_l2data_ready();
this would require some rework where sk->sk_data_ready() is
called directly from contexts that aren't suitable for running
rfcomm_process_sessions(). And this wouldn't address the
USB polling latency at all.

Regards,

Peter Hurley (3):
bluetooth: rfcomm: Reply with DM after dlc disconnect
bluetooth: rfcomm: Create new session if closing old session
bluetooth: rfcomm: Defer session teardown after last dlc

net/bluetooth/rfcomm/core.c | 32 ++++++++++++++++++++++----------
1 file changed, 22 insertions(+), 10 deletions(-)

--
1.8.1.2

2014-03-12 17:18:50

by Peter Hurley

[permalink] [raw]
Subject: Re: rctest -c "Can't connect: Device or resource busy (16)"

> On 03/11/2014 10:14 PM, Marcel Holtmann wrote:
>> Hi Scott,
>>
>>> 3.10 with bluetooth-next applied - btmon of both sides attached
>>
>> I am seeing the same with bluetooth-next on a 3.13 kernel. It even has a weird pattern for me.
>>
>> rctest[52844]: Connected [handle 11, class 0x000000, priority 0]
>> rctest[52844]: Can't connect: Device or resource busy (16)
>> rctest[52844]: Can't connect: Device or resource busy (16)
>> rctest[52844]: Can't connect: Device or resource busy (16)
>> rctest[52844]: Can't connect: Device or resource busy (16)
>> rctest[52844]: Can't connect: Device or resource busy (16)
>> rctest[52844]: Can't connect: Device or resource busy (16)
>> rctest[52844]: Can't connect: Device or resource busy (16)
>> rctest[52844]: Can't connect: Device or resource busy (16)
>> rctest[52844]: Can't connect: Device or resource busy (16)
>> rctest[52844]: Can't connect: Device or resource busy (16)
>> rctest[52844]: Can't connect: Device or resource busy (16)
>> rctest[52844]: Can't connect: Device or resource busy (16)
>> rctest[52844]: Can't connect: Connection refused (111)
>> rctest[52844]: Can't connect: Connection reset by peer (104)
>> rctest[52844]: Can't connect: Connection reset by peer (104)
>> rctest[52844]: Connected [handle 11, class 0x000000, priority 0]
>>
>> Unfortunately I currently have no idea on why this happens. When using
>> SO_LINGER of 60 seconds with rctest -L 60 -c <bdaddr> I am getting some
>> really odd tracing behavior.
>>
>>> ACL Data RX: Handle 11 flags 0x02 dlen 14
>> L2CAP: Command Reject (0x01) ident 7 len 6
>> Reason: Invalid CID in request (0x0002)
>> Destination CID: 64
>> Source CID: 0
>>
>> This should never happen and makes me wonder if something is
>> even screwed up one down in L2CAP. Do you happen to have time
>> to bisect this and figure out which patch introduced this behavior.

There are 2 separate problems going on here.

The failure to reconnect is happening because after sending DISC,
the sender still replies to stale MCC packets, which the receiver
(whose state now reflects the DISC) replies with a DM packet which
gets received on the re-opened dlc, causing it to disconnect with
error. This is trivially fixable, and AFAICT, is not a regression.

The 2nd problem of not being able to reconnect in a reasonable
number of times is due to the huge round-trip delays after receiving
DISC and replying with UA.

This is a result of the excessive number of execution contexts
being used on the BT stack.

1) from USB interrupt -> btusb_bulk_complete -> queue_work(rx_work)
2) hci_rx_work -> l2cap_data_channel -> l2cap_sock_recv_cb -> sock_queue_rcv_skb
3) -> rfcomm_l2data_ready -> rfcomm_schedule
4) rfcomm_process_rx -> rfcomm_recv_disc -> rfcomm_send_ua -> l2cap_do_send -> hci_queue_acl
5) hci_tx_work -> btusb_send_frame

On a 8-core preempt workstation, this takes ~200us, which is a lot for
receiving a 12-byte packet and replying with a 12-byte packet.
Plus, it doesn't help that USB tacks on another 918us to actually
complete the URB.

On a 1-core non-preempt laptop, this takes 51ms!

Regards,
Peter Hurley

2014-03-12 04:27:57

by Peter Hurley

[permalink] [raw]
Subject: Re: rctest -c "Can't connect: Device or resource busy (16)"

On 03/11/2014 11:01 PM, Peter Hurley wrote:
> On 03/11/2014 10:14 PM, Marcel Holtmann wrote:
>> Hi Scott,
>>
>>> 3.10 with bluetooth-next applied - btmon of both sides attached
>>
>> I am seeing the same with bluetooth-next on a 3.13 kernel. It even has a weird pattern for me.
>>
>> rctest[52844]: Connected [handle 11, class 0x000000, priority 0]
>> rctest[52844]: Can't connect: Device or resource busy (16)
>> rctest[52844]: Can't connect: Device or resource busy (16)
>> rctest[52844]: Can't connect: Device or resource busy (16)
>> rctest[52844]: Can't connect: Device or resource busy (16)
>> rctest[52844]: Can't connect: Device or resource busy (16)
>> rctest[52844]: Can't connect: Device or resource busy (16)
>> rctest[52844]: Can't connect: Device or resource busy (16)
>> rctest[52844]: Can't connect: Device or resource busy (16)
>> rctest[52844]: Can't connect: Device or resource busy (16)
>> rctest[52844]: Can't connect: Device or resource busy (16)
>> rctest[52844]: Can't connect: Device or resource busy (16)
>> rctest[52844]: Can't connect: Device or resource busy (16)
>> rctest[52844]: Can't connect: Connection refused (111)
>> rctest[52844]: Can't connect: Connection reset by peer (104)
>> rctest[52844]: Can't connect: Connection reset by peer (104)
>> rctest[52844]: Connected [handle 11, class 0x000000, priority 0]
>>
>> Unfortunately I currently have no idea on why this happens. When using
>> SO_LINGER of 60 seconds with rctest -L 60 -c <bdaddr> I am getting some
>> really odd tracing behavior.
>>
>>> ACL Data RX: Handle 11 flags 0x02 dlen 14
>> L2CAP: Command Reject (0x01) ident 7 len 6
>> Reason: Invalid CID in request (0x0002)
>> Destination CID: 64
>> Source CID: 0
>>
>> This should never happen and makes me wonder if something is
>> even screwed up one down in L2CAP. Do you happen to have time
>> to bisect this and figure out which patch introduced this behavior.

Sorry, I was wrong about the UA non-receipt; found it much further
down in the kernel trace. There's quite a lag between sending
the DISC and receiving the UA reply; about 12ms.

2014-03-12 03:27:29

by Peter Hurley

[permalink] [raw]
Subject: Re: rctest -c "Can't connect: Device or resource busy (16)"

On 03/11/2014 11:01 PM, Peter Hurley wrote:
> On 03/11/2014 10:14 PM, Marcel Holtmann wrote:
>> Hi Scott,
>>
>>> 3.10 with bluetooth-next applied - btmon of both sides attached
>>
>> I am seeing the same with bluetooth-next on a 3.13 kernel. It even has a weird pattern for me.

The pattern occurs because this the dlc is not getting
unlinked on this end until the watchdog timer goes off and
closes it. [The listener is unlinking the dlc properly. ]

>> rctest[52844]: Connected [handle 11, class 0x000000, priority 0]
>> rctest[52844]: Can't connect: Device or resource busy (16)
>> rctest[52844]: Can't connect: Device or resource busy (16)
>> rctest[52844]: Can't connect: Device or resource busy (16)
>> rctest[52844]: Can't connect: Device or resource busy (16)
>> rctest[52844]: Can't connect: Device or resource busy (16)
>> rctest[52844]: Can't connect: Device or resource busy (16)
>> rctest[52844]: Can't connect: Device or resource busy (16)
>> rctest[52844]: Can't connect: Device or resource busy (16)
>> rctest[52844]: Can't connect: Device or resource busy (16)
>> rctest[52844]: Can't connect: Device or resource busy (16)
>> rctest[52844]: Can't connect: Device or resource busy (16)
>> rctest[52844]: Can't connect: Device or resource busy (16)
>> rctest[52844]: Can't connect: Connection refused (111)
>> rctest[52844]: Can't connect: Connection reset by peer (104)
>> rctest[52844]: Can't connect: Connection reset by peer (104)
>> rctest[52844]: Connected [handle 11, class 0x000000, priority 0]
>>
>> Unfortunately I currently have no idea on why this happens. When using
>> SO_LINGER of 60 seconds with rctest -L 60 -c <bdaddr> I am getting some
>> really odd tracing behavior.
>>
>>> ACL Data RX: Handle 11 flags 0x02 dlen 14
>> L2CAP: Command Reject (0x01) ident 7 len 6
>> Reason: Invalid CID in request (0x0002)
>> Destination CID: 64
>> Source CID: 0
>>
>> This should never happen and makes me wonder if something is
>> even screwed up one down in L2CAP. Do you happen to have time
>> to bisect this and figure out which patch introduced this behavior.
>
> Marcel,
>
> I'm looking into this on the assumption I broke something in
> the refactor of __rfcomm_dlc_close.

Ok, it's nothing I broke. This is repeatable on 3.12 (sender and
receiver). I'll keep looking this anyway since at least I know
what's broken.

> I can reproduce this and looking at the kernel trace shows
> that that UA reply of the dlc DISC is not being processed, because
> the rfcomm thread is not getting scheduled, but I'm not sure where
> that was supposed to happen.
>
> IOW,
>
> rctest -r | rctest -c xx:xx:....
> | ...
> | rfcomm_sock_shutdown
> | __rfcomm_sock_close
> | rfcomm_dlc_close dlci 20
> | __rfcomm_dlc_close
> | rfcomm_send_disc dlci 20
> rfcomm_recv_disc state 1 dlci 20 |
> rfcomm_send_ua dlci 20 |
> __rfcomm_dlc_close state 9 |
> rfcomm_dlc_unlink |
> ... | UA never shows up here because
> | rfcomm_process_sessions isn't run
> | so rfcomm_recv_ua never happens

Regards,
Peter Hurley

2014-03-12 03:01:23

by Peter Hurley

[permalink] [raw]
Subject: Re: rctest -c "Can't connect: Device or resource busy (16)"

On 03/11/2014 10:14 PM, Marcel Holtmann wrote:
> Hi Scott,
>
>> 3.10 with bluetooth-next applied - btmon of both sides attached
>
> I am seeing the same with bluetooth-next on a 3.13 kernel. It even has a weird pattern for me.
>
> rctest[52844]: Connected [handle 11, class 0x000000, priority 0]
> rctest[52844]: Can't connect: Device or resource busy (16)
> rctest[52844]: Can't connect: Device or resource busy (16)
> rctest[52844]: Can't connect: Device or resource busy (16)
> rctest[52844]: Can't connect: Device or resource busy (16)
> rctest[52844]: Can't connect: Device or resource busy (16)
> rctest[52844]: Can't connect: Device or resource busy (16)
> rctest[52844]: Can't connect: Device or resource busy (16)
> rctest[52844]: Can't connect: Device or resource busy (16)
> rctest[52844]: Can't connect: Device or resource busy (16)
> rctest[52844]: Can't connect: Device or resource busy (16)
> rctest[52844]: Can't connect: Device or resource busy (16)
> rctest[52844]: Can't connect: Device or resource busy (16)
> rctest[52844]: Can't connect: Connection refused (111)
> rctest[52844]: Can't connect: Connection reset by peer (104)
> rctest[52844]: Can't connect: Connection reset by peer (104)
> rctest[52844]: Connected [handle 11, class 0x000000, priority 0]
>
> Unfortunately I currently have no idea on why this happens. When using
> SO_LINGER of 60 seconds with rctest -L 60 -c <bdaddr> I am getting some
> really odd tracing behavior.
>
>> ACL Data RX: Handle 11 flags 0x02 dlen 14
> L2CAP: Command Reject (0x01) ident 7 len 6
> Reason: Invalid CID in request (0x0002)
> Destination CID: 64
> Source CID: 0
>
> This should never happen and makes me wonder if something is
> even screwed up one down in L2CAP. Do you happen to have time
> to bisect this and figure out which patch introduced this behavior.

Marcel,

I'm looking into this on the assumption I broke something in
the refactor of __rfcomm_dlc_close.

I can reproduce this and looking at the kernel trace shows
that that UA reply of the dlc DISC is not being processed, because
the rfcomm thread is not getting scheduled, but I'm not sure where
that was supposed to happen.

IOW,

rctest -r | rctest -c xx:xx:....
| ...
| rfcomm_sock_shutdown
| __rfcomm_sock_close
| rfcomm_dlc_close dlci 20
| __rfcomm_dlc_close
| rfcomm_send_disc dlci 20
rfcomm_recv_disc state 1 dlci 20 |
rfcomm_send_ua dlci 20 |
__rfcomm_dlc_close state 9 |
rfcomm_dlc_unlink |
... | UA never shows up here because
| rfcomm_process_sessions isn't run
| so rfcomm_recv_ua never happens

Regards,
Peter Hurley

2014-03-12 02:14:18

by Marcel Holtmann

[permalink] [raw]
Subject: Re: rctest -c "Can't connect: Device or resource busy (16)"

Hi Scott,

> 3.10 with bluetooth-next applied - btmon of both sides attached

I am seeing the same with bluetooth-next on a 3.13 kernel. It even has a weird pattern for me.

rctest[52844]: Connected [handle 11, class 0x000000, priority 0]
rctest[52844]: Can't connect: Device or resource busy (16)
rctest[52844]: Can't connect: Device or resource busy (16)
rctest[52844]: Can't connect: Device or resource busy (16)
rctest[52844]: Can't connect: Device or resource busy (16)
rctest[52844]: Can't connect: Device or resource busy (16)
rctest[52844]: Can't connect: Device or resource busy (16)
rctest[52844]: Can't connect: Device or resource busy (16)
rctest[52844]: Can't connect: Device or resource busy (16)
rctest[52844]: Can't connect: Device or resource busy (16)
rctest[52844]: Can't connect: Device or resource busy (16)
rctest[52844]: Can't connect: Device or resource busy (16)
rctest[52844]: Can't connect: Device or resource busy (16)
rctest[52844]: Can't connect: Connection refused (111)
rctest[52844]: Can't connect: Connection reset by peer (104)
rctest[52844]: Can't connect: Connection reset by peer (104)
rctest[52844]: Connected [handle 11, class 0x000000, priority 0]

Unfortunately I currently have no idea on why this happens. When using SO_LINGER of 60 seconds with rctest -L 60 -c <bdaddr> I am getting some really odd tracing behavior.

> ACL Data RX: Handle 11 flags 0x02 dlen 14
L2CAP: Command Reject (0x01) ident 7 len 6
Reason: Invalid CID in request (0x0002)
Destination CID: 64
Source CID: 0

This should never happen and makes me wonder if something is even screwed up one down in L2CAP. Do you happen to have time to bisect this and figure out which patch introduced this behavior.

Regards

Marcel


2014-03-11 23:37:44

by Scott James Remnant

[permalink] [raw]
Subject: Re: rctest -c "Can't connect: Device or resource busy (16)"

3.10 with bluetooth-next applied - btmon of both sides attached

On Tue, Mar 11, 2014 at 4:03 PM, Marcel Holtmann <[email protected]> wrote:
> Hi Scott,
>
>> Is this expected behavior for RFCOMM? When using rctest -r on one
>> side, and rctest -c (connect/disconnect/connect...) on the other, the
>> client side fails to connect a significant percentage of the time:
>>
>> rctest[23705]: Can't connect: Device or resource busy (16)
>> rctest[23705]: Can't connect: Device or resource busy (16)
>> rctest[23705]: Can't connect: Device or resource busy (16)
>> rctest[23705]: Can't connect: Device or resource busy (16)
>> rctest[23705]: Can't connect: Device or resource busy (16)
>> rctest[23705]: Can't connect: Device or resource busy (16)
>> rctest[23705]: Connected [handle 34817, class 0xff8801, priority 0]
>> rctest[23705]: Can't connect: Device or resource busy (16)
>> rctest[23705]: Can't connect: Device or resource busy (16)
>> rctest[23705]: Can't connect: Device or resource busy (16)
>> rctest[23705]: Can't connect: Device or resource busy (16)
>> rctest[23705]: Can't connect: Device or resource busy (16)
>> rctest[23705]: Can't connect: Device or resource busy (16)
>
> it is not an expected behavior. Can you create a btmon trace for this and see what error code we get on the HCI level.
>
> Without any kind of analysis this sound like a race condition in the disconnect handling where the connection is not fully terminated and you try to re-connect too quickly.
>
> What kernel version is this? And as usual the first question, do you still see this behavior with a bluetooth-next kernel?
>
> Regards
>
> Marcel
>


Attachments:
connect.btsnoop (13.32 kB)
listen.btsnoop (13.73 kB)
Download all attachments

2014-03-11 23:03:56

by Marcel Holtmann

[permalink] [raw]
Subject: Re: rctest -c "Can't connect: Device or resource busy (16)"

Hi Scott,

> Is this expected behavior for RFCOMM? When using rctest -r on one
> side, and rctest -c (connect/disconnect/connect...) on the other, the
> client side fails to connect a significant percentage of the time:
>
> rctest[23705]: Can't connect: Device or resource busy (16)
> rctest[23705]: Can't connect: Device or resource busy (16)
> rctest[23705]: Can't connect: Device or resource busy (16)
> rctest[23705]: Can't connect: Device or resource busy (16)
> rctest[23705]: Can't connect: Device or resource busy (16)
> rctest[23705]: Can't connect: Device or resource busy (16)
> rctest[23705]: Connected [handle 34817, class 0xff8801, priority 0]
> rctest[23705]: Can't connect: Device or resource busy (16)
> rctest[23705]: Can't connect: Device or resource busy (16)
> rctest[23705]: Can't connect: Device or resource busy (16)
> rctest[23705]: Can't connect: Device or resource busy (16)
> rctest[23705]: Can't connect: Device or resource busy (16)
> rctest[23705]: Can't connect: Device or resource busy (16)

it is not an expected behavior. Can you create a btmon trace for this and see what error code we get on the HCI level.

Without any kind of analysis this sound like a race condition in the disconnect handling where the connection is not fully terminated and you try to re-connect too quickly.

What kernel version is this? And as usual the first question, do you still see this behavior with a bluetooth-next kernel?

Regards

Marcel


2014-04-02 20:04:06

by Marcel Holtmann

[permalink] [raw]
Subject: Re: [PATCH 0/3] Fix RFCOMM connect/disconn races

Hi Peter,

> This patch series fixes the observed errors when running
> rctest -r on one station and rctest -c on the other.
>
> Please note that patch 2/3 is somewhat risky in that it allows
> multiple RFCOMM sessions between the same endpoints to co-exist,
> provided that all but one of the sessions is closing or is closed.
> I did test this, but there may be some subtlety I've overlooked.
>
> [I note this because you may prefer the simpler and more bulletproof
> solution of returning -EBUSY from __rfcomm_dlc_open() in this
> situation; I can understand if you have lost patience for RFCOMM
> breakage.]

I do not know if we can allow tow sessions with the same endpoint and not causing any other error. Especially also not breaking existing Bluetooth qualification test cases.

Would it be safe to take 1/3 and 3/3 and still think about 2/3 a bit more.

> This series does not address the excessive latency from sending
> the DISC command to receiving the UA reply. I think a partial
> solution would be to skip the rfcomm thread and run
> rfcomm_process_sessions() directly from rfcomm_l2data_ready();
> this would require some rework where sk->sk_data_ready() is
> called directly from contexts that aren't suitable for running
> rfcomm_process_sessions(). And this wouldn't address the
> USB polling latency at all.

I have no idea what is causing this at all. However it could be well that this comes from the time when we moved HCI processing from tasklets into workqueues. And now we have things messing with each other.

The long term goal (which nobody seems to be working on) was to replace the L2CAP socket that RFCOMM uses and thus why the rfcommd thread is needed with a direct L2CAP ops handling.

Regards

Marcel