2009-05-20 01:20:25

by Nick Pelly

[permalink] [raw]
Subject: bug? kernel does not send HCI Create Connection Cancel Command on shutdown() or close() of a connecting rfcomm socket

I found that neither close() nor shutdown() on a RFCOMM socket that is
currently connecting will cause the kernel to send HCI Create
Connection Cancel Command. This seems like a problem, since this means
there is no way to cancel an outgoing connection - even in the single
threaded case.

Ideally for our use case both shutdown() and close() would cause the
Create Connection Cancel command to be sent.

It is easy to check this with a code snippet like:

fd = _socket(PF_BLUETOOTH, SOCK_STREAM, BTPROTO_RFCOMM);
flags = fcntl(fd, F_GETFL);
fcntl(fd, F_SETFL, flags | O_NONBLOCK);
connect(fd, (struct sockaddr *) &addr, sizeof(struct sockaddr));

sleep(1);
shutdown(fd, SHUT_RDWR);
sleep(1);
close(fd);

Following this with hcidump you can see the Create Connection command
sent, but it does not get canceled on close or shutdown.

2009-05-19 17:55:16.098103 < HCI Command: Create Connection
(0x01|0x0005) plen 13
bdaddr 00:11:22:33:44:55 ptype 0xcc18 rswitch 0x01 clkoffset 0x0000
Packet type: DM1 DM3 DM5 DH1 DH3 DH5
2009-05-19 17:55:16.118305 > HCI Event: Command Status (0x0f) plen 4
Create Connection (0x01|0x0005) status 0x00 ncmd 1

<--- shutdown()
<--- close()

2009-05-19 17:55:26.361744 > HCI Event: Connect Complete (0x03) plen 11
status 0x04 handle 1 bdaddr 00:11:22:33:44:55 type ACL encrypt 0x00
Error: Page Timeout
2009-05-19 17:55:33.119251 < HCI Command: Create Connection
(0x01|0x0005) plen 13
bdaddr 00:11:22:33:44:55 ptype 0xcc18 rswitch 0x01 clkoffset 0x0000
Packet type: DM1 DM3 DM5 DH1 DH3 DH5
2009-05-19 17:55:33.139240 > HCI Event: Command Status (0x0f) plen 4
Create Connection (0x01|0x0005) status 0x00 ncmd 1


Tested on 2.6.29.

Is this behavior intentional or is this a bug?

Nick


2009-05-20 21:57:51

by Nick Pelly

[permalink] [raw]
Subject: Re: bug? kernel does not send HCI Create Connection Cancel Command on shutdown() or close() of a connecting rfcomm socket

On Tue, May 19, 2009 at 7:49 PM, Marcel Holtmann <[email protected]> wrote:
> Hi Nick,
>
>> I found that neither close() nor shutdown() on a RFCOMM socket that is
>> currently connecting will cause the kernel to send HCI Create
>> Connection Cancel Command. This seems like a problem, since this means
>> there is no way to cancel an outgoing connection - even in the single
>> threaded case.
>>
>> Ideally for our use case both shutdown() and close() would cause the
>> Create Connection Cancel command to be sent.
>>
>> It is easy to check this with a code snippet like:
>>
>> ? ? fd = _socket(PF_BLUETOOTH, SOCK_STREAM, BTPROTO_RFCOMM);
>> ? ? flags = fcntl(fd, F_GETFL);
>> ? ? fcntl(fd, F_SETFL, flags | O_NONBLOCK);
>> ? ? connect(fd, (struct sockaddr *) &addr, sizeof(struct sockaddr));
>>
>> ? ? sleep(1);
>> ? ? shutdown(fd, SHUT_RDWR);
>> ? ? sleep(1);
>> ? ? close(fd);
>>
>> Following this with hcidump you can see the Create Connection command
>> sent, but it does not get canceled on close or shutdown.
>>
>> 2009-05-19 17:55:16.098103 < HCI Command: Create Connection
>> (0x01|0x0005) plen 13
>> ? ? bdaddr 00:11:22:33:44:55 ptype 0xcc18 rswitch 0x01 clkoffset 0x0000
>> ? ? Packet type: DM1 DM3 DM5 DH1 DH3 DH5
>> 2009-05-19 17:55:16.118305 > HCI Event: Command Status (0x0f) plen 4
>> ? ? Create Connection (0x01|0x0005) status 0x00 ncmd 1
>>
>> <--- shutdown()
>> <--- close()
>>
>> 2009-05-19 17:55:26.361744 > HCI Event: Connect Complete (0x03) plen 11
>> ? ? status 0x04 handle 1 bdaddr 00:11:22:33:44:55 type ACL encrypt 0x00
>> ? ? Error: Page Timeout
>> 2009-05-19 17:55:33.119251 < HCI Command: Create Connection
>> (0x01|0x0005) plen 13
>> ? ? bdaddr 00:11:22:33:44:55 ptype 0xcc18 rswitch 0x01 clkoffset 0x0000
>> ? ? Packet type: DM1 DM3 DM5 DH1 DH3 DH5
>> 2009-05-19 17:55:33.139240 > HCI Event: Command Status (0x0f) plen 4
>> ? ? Create Connection (0x01|0x0005) status 0x00 ncmd 1
>>
>>
>> Tested on 2.6.29.
>>
>> Is this behavior intentional or is this a bug?
>
> I know that I tested this massively with non-blocking sockets and there
> is works perfectly fine. No idea why shutdown() or close() is not
> catching this.
>
> Please check with 2.6.30-rc6 kernel since the Simple Pairing changes
> might have broken something here.

I tested on the following kernels:

Android 2.6.27 (2.6.27 + several bluetooth-next.git patches)
Android 2.6.29 (2.6.29 + several bluetooth-next.git patches)
2.6.28 (Ubuntu 9.10)

All with the same results.

I don't have easy access to a 2.6.30 kernel right now.

Thanks for letting us know this is not intentional behavior.

Nick

2009-05-20 02:49:17

by Marcel Holtmann

[permalink] [raw]
Subject: Re: bug? kernel does not send HCI Create Connection Cancel Command on shutdown() or close() of a connecting rfcomm socket

Hi Nick,

> I found that neither close() nor shutdown() on a RFCOMM socket that is
> currently connecting will cause the kernel to send HCI Create
> Connection Cancel Command. This seems like a problem, since this means
> there is no way to cancel an outgoing connection - even in the single
> threaded case.
>
> Ideally for our use case both shutdown() and close() would cause the
> Create Connection Cancel command to be sent.
>
> It is easy to check this with a code snippet like:
>
> fd = _socket(PF_BLUETOOTH, SOCK_STREAM, BTPROTO_RFCOMM);
> flags = fcntl(fd, F_GETFL);
> fcntl(fd, F_SETFL, flags | O_NONBLOCK);
> connect(fd, (struct sockaddr *) &addr, sizeof(struct sockaddr));
>
> sleep(1);
> shutdown(fd, SHUT_RDWR);
> sleep(1);
> close(fd);
>
> Following this with hcidump you can see the Create Connection command
> sent, but it does not get canceled on close or shutdown.
>
> 2009-05-19 17:55:16.098103 < HCI Command: Create Connection
> (0x01|0x0005) plen 13
> bdaddr 00:11:22:33:44:55 ptype 0xcc18 rswitch 0x01 clkoffset 0x0000
> Packet type: DM1 DM3 DM5 DH1 DH3 DH5
> 2009-05-19 17:55:16.118305 > HCI Event: Command Status (0x0f) plen 4
> Create Connection (0x01|0x0005) status 0x00 ncmd 1
>
> <--- shutdown()
> <--- close()
>
> 2009-05-19 17:55:26.361744 > HCI Event: Connect Complete (0x03) plen 11
> status 0x04 handle 1 bdaddr 00:11:22:33:44:55 type ACL encrypt 0x00
> Error: Page Timeout
> 2009-05-19 17:55:33.119251 < HCI Command: Create Connection
> (0x01|0x0005) plen 13
> bdaddr 00:11:22:33:44:55 ptype 0xcc18 rswitch 0x01 clkoffset 0x0000
> Packet type: DM1 DM3 DM5 DH1 DH3 DH5
> 2009-05-19 17:55:33.139240 > HCI Event: Command Status (0x0f) plen 4
> Create Connection (0x01|0x0005) status 0x00 ncmd 1
>
>
> Tested on 2.6.29.
>
> Is this behavior intentional or is this a bug?

I know that I tested this massively with non-blocking sockets and there
is works perfectly fine. No idea why shutdown() or close() is not
catching this.

Please check with 2.6.30-rc6 kernel since the Simple Pairing changes
might have broken something here.

Regards

Marcel



2009-07-15 22:11:23

by Nick Pelly

[permalink] [raw]
Subject: Re: bug? kernel does not send HCI Create Connection Cancel Command on shutdown() or close() of a connecting rfcomm socket

On Tue, Jul 14, 2009 at 9:15 AM, Nick Pelly<[email protected]> wrote:
> On Mon, Jul 13, 2009 at 2:27 PM, Luiz Augusto von
> Dentz<[email protected]> wrote:
>> Hi Nick,
>>
>> On Mon, Jul 13, 2009 at 12:46 PM, Nick Pelly<[email protected]> wrote:
>>> Any comments on this patch?
>>>
>>> It works for me, but my understanding of the RFCOMM state machine is naive.
>>>
>>
>> iirc BT_CONFIG(PN frame) means the DLC is being configured than we got
>> into connecting phase (BT_CONNECT) and send SABM frame. Only when
>> receiving UA frame DLC is consider connected (BT_CONNECTED), so your
>> patch seems good by assuming that we don't need to send a DISC for a
>> DLC not connected. But there is still a good use for it to cancel the
>> DLC connection attempt, so perhaps a better alternative would be to
>> use a much shorter timeout in those cases.
>
> Thanks for the advice.
>
> I have prepared a new patch which uses a very short timeout (10ms) on
> the DLC disconnect when in BT_CONFIG. I have tested this patch and it
> also resolves the issue. Patch attached.

ping

I will be offline for 2 weeks from tomorrow, so if there is further
testing or patches you would like me to try then I won't be able to
help after tomorrow.

Nick

2009-07-14 16:15:15

by Nick Pelly

[permalink] [raw]
Subject: Re: bug? kernel does not send HCI Create Connection Cancel Command on shutdown() or close() of a connecting rfcomm socket

On Mon, Jul 13, 2009 at 2:27 PM, Luiz Augusto von
Dentz<[email protected]> wrote:
> Hi Nick,
>
> On Mon, Jul 13, 2009 at 12:46 PM, Nick Pelly<[email protected]> wrote:
>> Any comments on this patch?
>>
>> It works for me, but my understanding of the RFCOMM state machine is naive.
>>
>
> iirc BT_CONFIG(PN frame) means the DLC is being configured than we got
> into connecting phase (BT_CONNECT) and send SABM frame. Only when
> receiving UA frame DLC is consider connected (BT_CONNECTED), so your
> patch seems good by assuming that we don't need to send a DISC for a
> DLC not connected. But there is still a good use for it to cancel the
> DLC connection attempt, so perhaps a better alternative would be to
> use a much shorter timeout in those cases.

Thanks for the advice.

I have prepared a new patch which uses a very short timeout (10ms) on
the DLC disconnect when in BT_CONFIG. I have tested this patch and it
also resolves the issue. Patch attached.

Nick


Attachments:
0001-Bluetooth-Use-a-very-short-timeout-for-dlci-disconne.patch (2.38 kB)

2009-07-13 21:27:51

by Luiz Augusto von Dentz

[permalink] [raw]
Subject: Re: bug? kernel does not send HCI Create Connection Cancel Command on shutdown() or close() of a connecting rfcomm socket

Hi Nick,

On Mon, Jul 13, 2009 at 12:46 PM, Nick Pelly<[email protected]> wrote:
> Any comments on this patch?
>
> It works for me, but my understanding of the RFCOMM state machine is naiv=
e.
>

iirc BT_CONFIG(PN frame) means the DLC is being configured than we got
into connecting phase (BT_CONNECT) and send SABM frame. Only when
receiving UA frame DLC is consider connected (BT_CONNECTED), so your
patch seems good by assuming that we don't need to send a DISC for a
DLC not connected. But there is still a good use for it to cancel the
DLC connection attempt, so perhaps a better alternative would be to
use a much shorter timeout in those cases.


--=20
Luiz Augusto von Dentz
Engenheiro de Computa=E7=E3o

2009-07-13 15:46:13

by Nick Pelly

[permalink] [raw]
Subject: Re: bug? kernel does not send HCI Create Connection Cancel Command on shutdown() or close() of a connecting rfcomm socket

On Thu, Jul 9, 2009 at 12:37 PM, Nick Pelly<[email protected]> wrote:
> For the RFCOMM issue.
>
> This is the log when shutdown() is called on the rfcomm socket while
> the ACL link is connecting:
>
> [? 132.856414] rfcomm:rfcomm_sock_shutdown: sock c5cb3a20, sk c63fca00
> [? 132.856933] rfcomm:__rfcomm_sock_close: sk c63fca00 state 5 socket c5cb3a20
> [? 132.857788] rfcomm:__rfcomm_dlc_close: dlc c61ea240 state 7 dlci 38
> err 0 session c63d4ba0
> [? 132.858612] rfcomm:rfcomm_send_disc: c63d4ba0 dlci 38
> [? 132.859069] rfcomm:rfcomm_send_frame: session c63d4ba0 len 4
> [? 132.859893] l2cap:l2cap_sock_sendmsg: sock c5cb38c0, sk c63fc800
> [? 132.860351] rfcomm:rfcomm_dlc_set_timer: dlc c61ea240 state 8 timeout 2000
> [ ?133.863739] rfcomm:rfcomm_sock_release: sock c5cb3a20, sk c63fca00
> [ ?133.864257] rfcomm:rfcomm_sock_shutdown: sock c5cb3a20, sk c63fca00
> [ ?133.865081] rfcomm:rfcomm_sock_kill: sk c63fca00 state 5 refcnt 2
> [ ?133.865539] rfcomm:rfcomm_sock_destruct: sk c63fca00 dlc c61ea240
>
>
>
> I'm surprised to see d->state for the rfcomm_dlc is BT_CONFIG at
> __rfcomm_dlc_close(), but looking at __rfcomm_dlc_open() this appears
> to be intentional.
>
> We do not hit __l2cap_sock_close()
>
> We attempt a graceful rfcomm disconnected by sending the dlci
> disconnected frame - but this does not make sense - since there is no
> rfcomm connection yet.
>
>
> Assuming that d->state == BT_CONFIG during this phase is correct, then
> the attached patch will fix this issue.
>
> However - I don't know the rfcomm state machine - so this patch may
> having side effects. Requesting comments.

Any comments on this patch?

It works for me, but my understanding of the RFCOMM state machine is naive.

Nick

2009-07-09 19:37:49

by Nick Pelly

[permalink] [raw]
Subject: Re: bug? kernel does not send HCI Create Connection Cancel Command on shutdown() or close() of a connecting rfcomm socket

For the RFCOMM issue.

This is the log when shutdown() is called on the rfcomm socket while
the ACL link is connecting:

[? 132.856414] rfcomm:rfcomm_sock_shutdown: sock c5cb3a20, sk c63fca00
[? 132.856933] rfcomm:__rfcomm_sock_close: sk c63fca00 state 5 socket c5cb3a20
[? 132.857788] rfcomm:__rfcomm_dlc_close: dlc c61ea240 state 7 dlci 38
err 0 session c63d4ba0
[? 132.858612] rfcomm:rfcomm_send_disc: c63d4ba0 dlci 38
[? 132.859069] rfcomm:rfcomm_send_frame: session c63d4ba0 len 4
[? 132.859893] l2cap:l2cap_sock_sendmsg: sock c5cb38c0, sk c63fc800
[? 132.860351] rfcomm:rfcomm_dlc_set_timer: dlc c61ea240 state 8 timeout 2000
[ 133.863739] rfcomm:rfcomm_sock_release: sock c5cb3a20, sk c63fca00
[ 133.864257] rfcomm:rfcomm_sock_shutdown: sock c5cb3a20, sk c63fca00
[ 133.865081] rfcomm:rfcomm_sock_kill: sk c63fca00 state 5 refcnt 2
[ 133.865539] rfcomm:rfcomm_sock_destruct: sk c63fca00 dlc c61ea240



I'm surprised to see d->state for the rfcomm_dlc is BT_CONFIG at
__rfcomm_dlc_close(), but looking at __rfcomm_dlc_open() this appears
to be intentional.

We do not hit __l2cap_sock_close()

We attempt a graceful rfcomm disconnected by sending the dlci
disconnected frame - but this does not make sense - since there is no
rfcomm connection yet.


Assuming that d->state == BT_CONFIG during this phase is correct, then
the attached patch will fix this issue.

However - I don't know the rfcomm state machine - so this patch may
having side effects. Requesting comments.

Nick


Attachments:
0001-Bluetooth-Do-not-attempt-to-send-dlci-disconnect-whe.patch (973.00 B)

2009-07-06 23:31:16

by Marcel Holtmann

[permalink] [raw]
Subject: Re: bug? kernel does not send HCI Create Connection Cancel Command on shutdown() or close() of a connecting rfcomm socket

Hi Nick,

> >>> I found that neither close() nor shutdown() on a RFCOMM socket that is
> >>> currently connecting will cause the kernel to send HCI Create
> >>> Connection Cancel Command. This seems like a problem, since this means
> >>> there is no way to cancel an outgoing connection - even in the single
> >>> threaded case.
> >>>
> >>> Ideally for our use case both shutdown() and close() would cause the
> >>> Create Connection Cancel command to be sent.
> >>>
> >>> It is easy to check this with a code snippet like:
> >>>
> >>> fd = _socket(PF_BLUETOOTH, SOCK_STREAM, BTPROTO_RFCOMM);
> >>> flags = fcntl(fd, F_GETFL);
> >>> fcntl(fd, F_SETFL, flags | O_NONBLOCK);
> >>> connect(fd, (struct sockaddr *) &addr, sizeof(struct sockaddr));
> >>>
> >>> sleep(1);
> >>> shutdown(fd, SHUT_RDWR);
> >>> sleep(1);
> >>> close(fd);
> >>>
> >>> Following this with hcidump you can see the Create Connection command
> >>> sent, but it does not get canceled on close or shutdown.
> >>>
> >>> 2009-05-19 17:55:16.098103 < HCI Command: Create Connection
> >>> (0x01|0x0005) plen 13
> >>> bdaddr 00:11:22:33:44:55 ptype 0xcc18 rswitch 0x01 clkoffset 0x0000
> >>> Packet type: DM1 DM3 DM5 DH1 DH3 DH5
> >>> 2009-05-19 17:55:16.118305 > HCI Event: Command Status (0x0f) plen 4
> >>> Create Connection (0x01|0x0005) status 0x00 ncmd 1
> >>>
> >>> <--- shutdown()
> >>> <--- close()
> >>>
> >>> 2009-05-19 17:55:26.361744 > HCI Event: Connect Complete (0x03) plen 11
> >>> status 0x04 handle 1 bdaddr 00:11:22:33:44:55 type ACL encrypt 0x00
> >>> Error: Page Timeout
> >>> 2009-05-19 17:55:33.119251 < HCI Command: Create Connection
> >>> (0x01|0x0005) plen 13
> >>> bdaddr 00:11:22:33:44:55 ptype 0xcc18 rswitch 0x01 clkoffset 0x0000
> >>> Packet type: DM1 DM3 DM5 DH1 DH3 DH5
> >>> 2009-05-19 17:55:33.139240 > HCI Event: Command Status (0x0f) plen 4
> >>> Create Connection (0x01|0x0005) status 0x00 ncmd 1
> >>>
> >>>
> >>> Tested on 2.6.29.
> >>>
> >>> Is this behavior intentional or is this a bug?
> >>
> >> I know that I tested this massively with non-blocking sockets and there
> >> is works perfectly fine. No idea why shutdown() or close() is not
> >> catching this.
> >>
> >> Please check with 2.6.30-rc6 kernel since the Simple Pairing changes
> >> might have broken something here.
> >
> > I tested on the following kernels:
> >
> > Android 2.6.27 (2.6.27 + several bluetooth-next.git patches)
> > Android 2.6.29 (2.6.29 + several bluetooth-next.git patches)
> > 2.6.28 (Ubuntu 9.10)
> >
> > All with the same results.
> >
> > I don't have easy access to a 2.6.30 kernel right now.
> >
> > Thanks for letting us know this is not intentional behavior.
>
> FYI: I still see this issue on 2.6.31-rc1
>
> Any suggestions of where to begin debugging this? I will look at it today.
>
> $ uname -a
> Linux npelly-kernel 2.6.31-020631rc1gc0d1117-generic
> #020631rc1gc0d1117 SMP Wed Jul 1 09:04:38 UTC 2009 i686 GNU/Linux
>
> $ hcidump -XVt
> 2009-07-06 11:46:53.792864 < HCI Command: Create Connection
> (0x01|0x0005) plen 13
> bdaddr 00:01:02:0A:0B:0C ptype 0xcc18 rswitch 0x01 clkoffset 0x0000
> Packet type: DM1 DM3 DM5 DH1 DH3 DH5
> 2009-07-06 11:46:53.800563 > HCI Event: Command Status (0x0f) plen 4
> Create Connection (0x01|0x0005) status 0x00 ncmd 1
> 2009-07-06 11:46:58.923825 > HCI Event: Connect Complete (0x03) plen 11
> status 0x04 handle 0 bdaddr 00:01:02:0A:0B:0C type ACL encrypt 0x00
> Error: Page Timeout
>
> $ strace ./a.out
> [...]
> socket(PF_BLUETOOTH, SOCK_STREAM, 3) = 3
> fcntl64(3, F_GETFL) = 0x2 (flags O_RDWR)
> fcntl64(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0
> connect(3, {sa_family=AF_BLUETOOTH,
> sa_data="\f\v\n\2\1\0\4\10\2\0\0\0\t\207"...}, 10) = -1 EINPROGRESS
> (Operation now in progress)
> rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
> rt_sigaction(SIGCHLD, NULL, {SIG_DFL}, 8) = 0
> rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
> nanosleep({1, 0}, {1, 0}) = 0
> shutdown(3, 2 /* send and receive */) = 0
> rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
> rt_sigaction(SIGCHLD, NULL, {SIG_DFL}, 8) = 0
> rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
> nanosleep({1, 0}, {1, 0}) = 0
> close(3) = 0
> exit_group(0) = ?
>
> $ cat sock_shutdown_bug.c
> #include <stdio.h>
> #include <stdlib.h>
> #include <unistd.h>
> #include <sys/socket.h>
> #include <fcntl.h>
>
> #include <bluetooth/bluetooth.h>
> #include <bluetooth/rfcomm.h>
>
> int main(int argc, char **argv) {
> int fd;
> int ret;
> long flags;
> struct sockaddr_rc addr;
>
> addr.rc_family = AF_BLUETOOTH;
> str2ba("00:01:02:0A:0B:0C", &addr.rc_bdaddr);
> addr.rc_channel = 4;
>
> fd = socket(PF_BLUETOOTH, SOCK_STREAM, BTPROTO_RFCOMM);
> flags = fcntl(fd, F_GETFL);
> fcntl(fd, F_SETFL, flags | O_NONBLOCK);
>
> connect(fd, (struct sockaddr *)&addr, sizeof(struct sockaddr_rc));
>
> sleep(1);
> shutdown(fd, SHUT_RDWR);
> sleep(1);
> close(fd);
> }

this would be the program to test L2CAP:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/socket.h>
#include <fcntl.h>

#include <bluetooth/bluetooth.h>
#include <bluetooth/l2cap.h>

int main(int argc, char **argv) {
int fd;
int ret;
long flags;
struct sockaddr_l2 addr;

addr.l2_family = AF_BLUETOOTH;
str2ba("00:01:02:0A:0B:0C", &addr.l2_bdaddr);
addr.l2_psm = htobs(1);
addr.l2_cid = 0;

fd = socket(PF_BLUETOOTH, SOCK_SEQPACKET, BTPROTO_L2CAP);
flags = fcntl(fd, F_GETFL);
fcntl(fd, F_SETFL, flags | O_NONBLOCK);

connect(fd, (struct sockaddr *)&addr, sizeof(addr));

sleep(1);
shutdown(fd, SHUT_RDWR);
sleep(1);
close(fd);
}

Regards

Marcel



2009-07-06 18:55:01

by Nick Pelly

[permalink] [raw]
Subject: Re: bug? kernel does not send HCI Create Connection Cancel Command on shutdown() or close() of a connecting rfcomm socket

On Wed, May 20, 2009 at 2:57 PM, Nick Pelly<[email protected]> wrote:
> On Tue, May 19, 2009 at 7:49 PM, Marcel Holtmann <[email protected]> wr=
ote:
>> Hi Nick,
>>
>>> I found that neither close() nor shutdown() on a RFCOMM socket that is
>>> currently connecting will cause the kernel to send HCI Create
>>> Connection Cancel Command. This seems like a problem, since this means
>>> there is no way to cancel an outgoing connection - even in the single
>>> threaded case.
>>>
>>> Ideally for our use case both shutdown() and close() would cause the
>>> Create Connection Cancel command to be sent.
>>>
>>> It is easy to check this with a code snippet like:
>>>
>>> =A0 =A0 fd =3D _socket(PF_BLUETOOTH, SOCK_STREAM, BTPROTO_RFCOMM);
>>> =A0 =A0 flags =3D fcntl(fd, F_GETFL);
>>> =A0 =A0 fcntl(fd, F_SETFL, flags | O_NONBLOCK);
>>> =A0 =A0 connect(fd, (struct sockaddr *) &addr, sizeof(struct sockaddr))=
;
>>>
>>> =A0 =A0 sleep(1);
>>> =A0 =A0 shutdown(fd, SHUT_RDWR);
>>> =A0 =A0 sleep(1);
>>> =A0 =A0 close(fd);
>>>
>>> Following this with hcidump you can see the Create Connection command
>>> sent, but it does not get canceled on close or shutdown.
>>>
>>> 2009-05-19 17:55:16.098103 < HCI Command: Create Connection
>>> (0x01|0x0005) plen 13
>>> =A0 =A0 bdaddr 00:11:22:33:44:55 ptype 0xcc18 rswitch 0x01 clkoffset 0x=
0000
>>> =A0 =A0 Packet type: DM1 DM3 DM5 DH1 DH3 DH5
>>> 2009-05-19 17:55:16.118305 > HCI Event: Command Status (0x0f) plen 4
>>> =A0 =A0 Create Connection (0x01|0x0005) status 0x00 ncmd 1
>>>
>>> <--- shutdown()
>>> <--- close()
>>>
>>> 2009-05-19 17:55:26.361744 > HCI Event: Connect Complete (0x03) plen 11
>>> =A0 =A0 status 0x04 handle 1 bdaddr 00:11:22:33:44:55 type ACL encrypt =
0x00
>>> =A0 =A0 Error: Page Timeout
>>> 2009-05-19 17:55:33.119251 < HCI Command: Create Connection
>>> (0x01|0x0005) plen 13
>>> =A0 =A0 bdaddr 00:11:22:33:44:55 ptype 0xcc18 rswitch 0x01 clkoffset 0x=
0000
>>> =A0 =A0 Packet type: DM1 DM3 DM5 DH1 DH3 DH5
>>> 2009-05-19 17:55:33.139240 > HCI Event: Command Status (0x0f) plen 4
>>> =A0 =A0 Create Connection (0x01|0x0005) status 0x00 ncmd 1
>>>
>>>
>>> Tested on 2.6.29.
>>>
>>> Is this behavior intentional or is this a bug?
>>
>> I know that I tested this massively with non-blocking sockets and there
>> is works perfectly fine. No idea why shutdown() or close() is not
>> catching this.
>>
>> Please check with 2.6.30-rc6 kernel since the Simple Pairing changes
>> might have broken something here.
>
> I tested on the following kernels:
>
> Android 2.6.27 (2.6.27 + several bluetooth-next.git patches)
> Android 2.6.29 (2.6.29 + several bluetooth-next.git patches)
> 2.6.28 (Ubuntu 9.10)
>
> All with the same results.
>
> I don't have easy access to a 2.6.30 kernel right now.
>
> Thanks for letting us know this is not intentional behavior.

FYI: I still see this issue on 2.6.31-rc1

Any suggestions of where to begin debugging this? I will look at it today.

$ uname -a
Linux npelly-kernel 2.6.31-020631rc1gc0d1117-generic
#020631rc1gc0d1117 SMP Wed Jul 1 09:04:38 UTC 2009 i686 GNU/Linux

$ hcidump -XVt
2009-07-06 11:46:53.792864 < HCI Command: Create Connection
(0x01|0x0005) plen 13
bdaddr 00:01:02:0A:0B:0C ptype 0xcc18 rswitch 0x01 clkoffset 0x0000
Packet type: DM1 DM3 DM5 DH1 DH3 DH5
2009-07-06 11:46:53.800563 > HCI Event: Command Status (0x0f) plen 4
Create Connection (0x01|0x0005) status 0x00 ncmd 1
2009-07-06 11:46:58.923825 > HCI Event: Connect Complete (0x03) plen 11
status 0x04 handle 0 bdaddr 00:01:02:0A:0B:0C type ACL encrypt 0x00
Error: Page Timeout

$ strace ./a.out
[...]
socket(PF_BLUETOOTH, SOCK_STREAM, 3) =3D 3
fcntl64(3, F_GETFL) =3D 0x2 (flags O_RDWR)
fcntl64(3, F_SETFL, O_RDWR|O_NONBLOCK) =3D 0
connect(3, {sa_family=3DAF_BLUETOOTH,
sa_data=3D"\f\v\n\2\1\0\4\10\2\0\0\0\t\207"...}, 10) =3D -1 EINPROGRESS
(Operation now in progress)
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) =3D 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL}, 8) =3D 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) =3D 0
nanosleep({1, 0}, {1, 0}) =3D 0
shutdown(3, 2 /* send and receive */) =3D 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) =3D 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL}, 8) =3D 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) =3D 0
nanosleep({1, 0}, {1, 0}) =3D 0
close(3) =3D 0
exit_group(0) =3D ?

$ cat sock_shutdown_bug.c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/socket.h>
#include <fcntl.h>

#include <bluetooth/bluetooth.h>
#include <bluetooth/rfcomm.h>

int main(int argc, char **argv) {
int fd;
int ret;
long flags;
struct sockaddr_rc addr;

addr.rc_family =3D AF_BLUETOOTH;
str2ba("00:01:02:0A:0B:0C", &addr.rc_bdaddr);
addr.rc_channel =3D 4;

fd =3D socket(PF_BLUETOOTH, SOCK_STREAM, BTPROTO_RFCOMM);
flags =3D fcntl(fd, F_GETFL);
fcntl(fd, F_SETFL, flags | O_NONBLOCK);

connect(fd, (struct sockaddr *)&addr, sizeof(struct sockaddr_rc));

sleep(1);
shutdown(fd, SHUT_RDWR);
sleep(1);
close(fd);
}

2009-08-06 18:03:24

by Nick Pelly

[permalink] [raw]
Subject: Re: bug? kernel does not send HCI Create Connection Cancel Command on shutdown() or close() of a connecting rfcomm socket

On Wed, Jul 15, 2009 at 3:11 PM, Nick Pelly<[email protected]> wrote:
> On Tue, Jul 14, 2009 at 9:15 AM, Nick Pelly<[email protected]> wrote:
>> On Mon, Jul 13, 2009 at 2:27 PM, Luiz Augusto von
>> Dentz<[email protected]> wrote:
>>> Hi Nick,
>>>
>>> On Mon, Jul 13, 2009 at 12:46 PM, Nick Pelly<[email protected]> wrote:
>>>> Any comments on this patch?
>>>>
>>>> It works for me, but my understanding of the RFCOMM state machine is naive.
>>>>
>>>
>>> iirc BT_CONFIG(PN frame) means the DLC is being configured than we got
>>> into connecting phase (BT_CONNECT) and send SABM frame. Only when
>>> receiving UA frame DLC is consider connected (BT_CONNECTED), so your
>>> patch seems good by assuming that we don't need to send a DISC for a
>>> DLC not connected. But there is still a good use for it to cancel the
>>> DLC connection attempt, so perhaps a better alternative would be to
>>> use a much shorter timeout in those cases.
>>
>> Thanks for the advice.
>>
>> I have prepared a new patch which uses a very short timeout (10ms) on
>> the DLC disconnect when in BT_CONFIG. I have tested this patch and it
>> also resolves the issue. Patch attached.
>
> ping
>
> I will be offline for 2 weeks from tomorrow, so if there is further
> testing or patches you would like me to try then I won't be able to
> help after tomorrow.

ping.

Looking for 'yes this is ok, patch merged'

or

'no this is not ok, because....'

Nick

2010-02-03 01:56:07

by Nick Pelly

[permalink] [raw]
Subject: Re: bug? kernel does not send HCI Create Connection Cancel Command on shutdown() or close() of a connecting rfcomm socket

On Thu, Aug 6, 2009 at 10:03 AM, Nick Pelly <[email protected]> wrote:
> On Wed, Jul 15, 2009 at 3:11 PM, Nick Pelly<[email protected]> wrote:
>> On Tue, Jul 14, 2009 at 9:15 AM, Nick Pelly<[email protected]> wrote:
>>> On Mon, Jul 13, 2009 at 2:27 PM, Luiz Augusto von
>>> Dentz<[email protected]> wrote:
>>>> Hi Nick,
>>>>
>>>> On Mon, Jul 13, 2009 at 12:46 PM, Nick Pelly<[email protected]> wrote:
>>>>> Any comments on this patch?
>>>>>
>>>>> It works for me, but my understanding of the RFCOMM state machine is naive.
>>>>>
>>>>
>>>> iirc BT_CONFIG(PN frame) means the DLC is being configured than we got
>>>> into connecting phase (BT_CONNECT) and send SABM frame. Only when
>>>> receiving UA frame DLC is consider connected (BT_CONNECTED), so your
>>>> patch seems good by assuming that we don't need to send a DISC for a
>>>> DLC not connected. But there is still a good use for it to cancel the
>>>> DLC connection attempt, so perhaps a better alternative would be to
>>>> use a much shorter timeout in those cases.
>>>
>>> Thanks for the advice.
>>>
>>> I have prepared a new patch which uses a very short timeout (10ms) on
>>> the DLC disconnect when in BT_CONFIG. I have tested this patch and it
>>> also resolves the issue. Patch attached.
>>
>> ping
>>
>> I will be offline for 2 weeks from tomorrow, so if there is further
>> testing or patches you would like me to try then I won't be able to
>> help after tomorrow.
>
> ping.
>
> Looking for 'yes this is ok, patch merged'
>
> or
>
> 'no this is not ok, because....'
>
> Nick
>

Ping.

We're still running with this patch.

Nick

2010-02-03 01:54:40

by Nick Pelly

[permalink] [raw]
Subject: Re: bug? kernel does not send HCI Create Connection Cancel Command on shutdown() or close() of a connecting rfcomm socket

On Thu, Aug 6, 2009 at 10:03 AM, Nick Pelly <[email protected]> wrote:

> On Wed, Jul 15, 2009 at 3:11 PM, Nick Pelly<[email protected]> wrote:
> > On Tue, Jul 14, 2009 at 9:15 AM, Nick Pelly<[email protected]> wrote:
> >> On Mon, Jul 13, 2009 at 2:27 PM, Luiz Augusto von
> >> Dentz<[email protected]> wrote:
> >>> Hi Nick,
> >>>
> >>> On Mon, Jul 13, 2009 at 12:46 PM, Nick Pelly<[email protected]> wrote:
> >>>> Any comments on this patch?
> >>>>
> >>>> It works for me, but my understanding of the RFCOMM state machine is
> naive.
> >>>>
> >>>
> >>> iirc BT_CONFIG(PN frame) means the DLC is being configured than we got
> >>> into connecting phase (BT_CONNECT) and send SABM frame. Only when
> >>> receiving UA frame DLC is consider connected (BT_CONNECTED), so your
> >>> patch seems good by assuming that we don't need to send a DISC for a
> >>> DLC not connected. But there is still a good use for it to cancel the
> >>> DLC connection attempt, so perhaps a better alternative would be to
> >>> use a much shorter timeout in those cases.
> >>
> >> Thanks for the advice.
> >>
> >> I have prepared a new patch which uses a very short timeout (10ms) on
> >> the DLC disconnect when in BT_CONFIG. I have tested this patch and it
> >> also resolves the issue. Patch attached.
> >
> > ping
> >
> > I will be offline for 2 weeks from tomorrow, so if there is further
> > testing or patches you would like me to try then I won't be able to
> > help after tomorrow.
>
> ping.
>
> Looking for 'yes this is ok, patch merged'
>
> or
>
> 'no this is not ok, because....'


Ping.

we've been running with this patch a while now.

Nick