2013-11-12 21:56:23

by Marcel Holtmann

[permalink] [raw]
Subject: Re: shutdown(3) and bluetooth.

Hi Dave,

> Is shutdown() allowed to block indefinitely ? The man page doesn't say either way,
> and I've noticed that my fuzz tester occasionally hangs for days spinning in bt_sock_wait_state()
>
> Is there something I should be doing to guarantee that this operation
> will either time out, or return instantly ?
>
> In this specific case, I doubt anything is on the "sender" end of the socket, so
> it's going to be waiting forever for a state change that won't arrive.

can you give us some extra information here. What kind of Bluetooth socket is this actually. From the top of my head, I have no idea why we would even wait forever. Normally when all low-level links are gone, the socket will shut down anyway.

Regards

Marcel



2013-11-13 01:58:43

by Marcel Holtmann

[permalink] [raw]
Subject: Re: shutdown(3) and bluetooth.

Hi Dave,

>>> So it seems it affects both SCO and RFCOMM.
>>>
>>>> What kernel did you run this against? It is a shot in the dark, but can you try linux-next quickly.
>>>> There was a socket related fix for the socket options where we confused RFCOMM vs L2CAP struct sock.
>>>
>>> first noticed it on Linus' latest HEAD, and then reproduced it on 3.11.6
>>> I'll look at linux-next tomorrow.
>>
>> I looked through the code and only call bt_sock_wait_state when SOCK_LINGER and sk_lingertime is set. In that case we actually block until the socket state changes to BT_CLOSED.
>>
>> The only way I see this could happen is if you have a huge linger timeout and confused the socket state before. What is actually the list of system calls that you are throwing at this socket.
>
> Ah. I recently changed some code that's now doing this on every socket at shutdown..
> (simplified cut-n-paste)
>
> struct linger ling = { .l_onoff = FALSE, };
>
> for (i = 0; i < nr_sockets; i++) {
> fd = shm->sockets[i].fd;
> shm->sockets[i].fd = 0;
>
> setsockopt(fd, SOL_SOCKET, SO_LINGER, &ling, sizeof(struct linger));
> shutdown(fd, SHUT_RDWR);
> close(fd);
> }
>
> I could just rip out that linger code completely and just hope that sockets staying in
> TIME_WAIT is good enough. iirc, I added it when after multiple runs, some of the
> weirder protocols would fail to open a socket once a certain number of existing
> sockets had opened, even if they were in SOCK_WAIT
>
> two remaining questions though. That code is setting linger to false. Why would
> that cause the sk_lingertime to be taken into consideration ? And why is this
> only a problem for bluetooth (apparently) ?

we are not touching that part of setsockopt. That is handled by net/core/sock.c and we just check if SOCK_LINGER flag is set and if we have a positive sk_lingertime. So this is a bit suspicious on why this is happening, but I don?t think it is our mistake.

Regards

Marcel


2013-11-13 00:28:19

by Dave Jones

[permalink] [raw]
Subject: Re: shutdown(3) and bluetooth.

On Wed, Nov 13, 2013 at 08:37:15AM +0900, Marcel Holtmann wrote:

> > So it seems it affects both SCO and RFCOMM.
> >
> >> What kernel did you run this against? It is a shot in the dark, but can you try linux-next quickly.
> >> There was a socket related fix for the socket options where we confused RFCOMM vs L2CAP struct sock.
> >
> > first noticed it on Linus' latest HEAD, and then reproduced it on 3.11.6
> > I'll look at linux-next tomorrow.
>
> I looked through the code and only call bt_sock_wait_state when SOCK_LINGER and sk_lingertime is set. In that case we actually block until the socket state changes to BT_CLOSED.
>
> The only way I see this could happen is if you have a huge linger timeout and confused the socket state before. What is actually the list of system calls that you are throwing at this socket.

Ah. I recently changed some code that's now doing this on every socket at shutdown..
(simplified cut-n-paste)

struct linger ling = { .l_onoff = FALSE, };

for (i = 0; i < nr_sockets; i++) {
fd = shm->sockets[i].fd;
shm->sockets[i].fd = 0;

setsockopt(fd, SOL_SOCKET, SO_LINGER, &ling, sizeof(struct linger));
shutdown(fd, SHUT_RDWR);
close(fd);
}

I could just rip out that linger code completely and just hope that sockets staying in
TIME_WAIT is good enough. iirc, I added it when after multiple runs, some of the
weirder protocols would fail to open a socket once a certain number of existing
sockets had opened, even if they were in SOCK_WAIT

two remaining questions though. That code is setting linger to false. Why would
that cause the sk_lingertime to be taken into consideration ? And why is this
only a problem for bluetooth (apparently) ?

Dave

2013-11-12 23:37:15

by Marcel Holtmann

[permalink] [raw]
Subject: Re: shutdown(3) and bluetooth.

Hi Dave,

>>> Here's the info I found in the logs, it looks like this was the only bluetooth socket.
>>>
>>> fd[195] = domain:31 (PF_BLUETOOTH) type:0x5 protocol:2
>>> Setsockopt(1 d 2134000 8) on fd 195
>>
>> this is a bit confusing. Protocol 2 is actually SCO, but the stack trace shows RFCOMM.
>
> Sorry, mixed up two separate runs. In the log above, the stack trace is actually..
>
> [<ffffffffa0492dca>] bt_sock_wait_state+0xda/0x240 [bluetooth]
> [<ffffffffa04c86d8>] sco_sock_release+0xb8/0xf0 [bluetooth]
> [<ffffffff815cb1ff>] sock_release+0x1f/0x90
> [<ffffffff815cb282>] sock_close+0x12/0x20
>
>
>>> ./trinity -P PF_BLUETOOTH -l off -c setsockopt
>>>
>>> let it run a few seconds, and then ctrl-c. The main process will never exit.
>>>
>>> 5814 pts/6 Ss 0:00 | \_ bash
>>> 5876 pts/6 S+ 0:00 | | \_ ./trinity -P PF_BLUETOOTH -l off -c setsockopt
>>> 5877 pts/6 Z+ 0:00 | | \_ [trinity] <defunct>
>>> 5878 pts/6 S+ 0:01 | | \_ [trinity-main]
>>>
>>> $ sudo cat /proc/5878/stack
>>> [<ffffffffa04397a2>] bt_sock_wait_state+0xc2/0x190 [bluetooth]
>>> [<ffffffffa0847a75>] rfcomm_sock_shutdown+0x85/0xb0 [rfcomm]
>>> [<ffffffffa0847ad9>] rfcomm_sock_release+0x39/0xb0 [rfcomm]
>
> So it seems it affects both SCO and RFCOMM.
>
>> What kernel did you run this against? It is a shot in the dark, but can you try linux-next quickly.
>> There was a socket related fix for the socket options where we confused RFCOMM vs L2CAP struct sock.
>
> first noticed it on Linus' latest HEAD, and then reproduced it on 3.11.6
> I'll look at linux-next tomorrow.

I looked through the code and only call bt_sock_wait_state when SOCK_LINGER and sk_lingertime is set. In that case we actually block until the socket state changes to BT_CLOSED.

The only way I see this could happen is if you have a huge linger timeout and confused the socket state before. What is actually the list of system calls that you are throwing at this socket.

Regards

Marcel


2013-11-12 22:48:19

by Dave Jones

[permalink] [raw]
Subject: Re: shutdown(3) and bluetooth.

On Wed, Nov 13, 2013 at 07:32:09AM +0900, Marcel Holtmann wrote:

> > Here's the info I found in the logs, it looks like this was the only bluetooth socket.
> >
> > fd[195] = domain:31 (PF_BLUETOOTH) type:0x5 protocol:2
> > Setsockopt(1 d 2134000 8) on fd 195
>
> this is a bit confusing. Protocol 2 is actually SCO, but the stack trace shows RFCOMM.

Sorry, mixed up two separate runs. In the log above, the stack trace is actually..

[<ffffffffa0492dca>] bt_sock_wait_state+0xda/0x240 [bluetooth]
[<ffffffffa04c86d8>] sco_sock_release+0xb8/0xf0 [bluetooth]
[<ffffffff815cb1ff>] sock_release+0x1f/0x90
[<ffffffff815cb282>] sock_close+0x12/0x20


> > ./trinity -P PF_BLUETOOTH -l off -c setsockopt
> >
> > let it run a few seconds, and then ctrl-c. The main process will never exit.
> >
> > 5814 pts/6 Ss 0:00 | \_ bash
> > 5876 pts/6 S+ 0:00 | | \_ ./trinity -P PF_BLUETOOTH -l off -c setsockopt
> > 5877 pts/6 Z+ 0:00 | | \_ [trinity] <defunct>
> > 5878 pts/6 S+ 0:01 | | \_ [trinity-main]
> >
> > $ sudo cat /proc/5878/stack
> > [<ffffffffa04397a2>] bt_sock_wait_state+0xc2/0x190 [bluetooth]
> > [<ffffffffa0847a75>] rfcomm_sock_shutdown+0x85/0xb0 [rfcomm]
> > [<ffffffffa0847ad9>] rfcomm_sock_release+0x39/0xb0 [rfcomm]

So it seems it affects both SCO and RFCOMM.

> What kernel did you run this against? It is a shot in the dark, but can you try linux-next quickly.
> There was a socket related fix for the socket options where we confused RFCOMM vs L2CAP struct sock.

first noticed it on Linus' latest HEAD, and then reproduced it on 3.11.6
I'll look at linux-next tomorrow.

thanks,

Dave

2013-11-12 22:32:09

by Marcel Holtmann

[permalink] [raw]
Subject: Re: shutdown(3) and bluetooth.

Hi Dave,

>>> Is shutdown() allowed to block indefinitely ? The man page doesn't say either way,
>>> and I've noticed that my fuzz tester occasionally hangs for days spinning in bt_sock_wait_state()
>>>
>>> Is there something I should be doing to guarantee that this operation
>>> will either time out, or return instantly ?
>>>
>>> In this specific case, I doubt anything is on the "sender" end of the socket, so
>>> it's going to be waiting forever for a state change that won't arrive.
>>
>> can you give us some extra information here. What kind of Bluetooth socket is this actually. From the top of my head, I have no idea why we would even wait forever. Normally when all low-level links are gone, the socket will shut down anyway.
>
> Here's the info I found in the logs, it looks like this was the only bluetooth socket.
>
> fd[195] = domain:31 (PF_BLUETOOTH) type:0x5 protocol:2
> Setsockopt(1 d 2134000 8) on fd 195

this is a bit confusing. Protocol 2 is actually SCO, but the stack trace shows RFCOMM.

> it doesn't look like any further operations were done on this fd during the fuzzers runtime.
>
> Quick way to reproduce:
>
> ./trinity -P PF_BLUETOOTH -l off -c setsockopt
>
> let it run a few seconds, and then ctrl-c. The main process will never exit.
>
> 5814 pts/6 Ss 0:00 | \_ bash
> 5876 pts/6 S+ 0:00 | | \_ ./trinity -P PF_BLUETOOTH -l off -c setsockopt
> 5877 pts/6 Z+ 0:00 | | \_ [trinity] <defunct>
> 5878 pts/6 S+ 0:01 | | \_ [trinity-main]
>
> $ sudo cat /proc/5878/stack
> [<ffffffffa04397a2>] bt_sock_wait_state+0xc2/0x190 [bluetooth]
> [<ffffffffa0847a75>] rfcomm_sock_shutdown+0x85/0xb0 [rfcomm]
> [<ffffffffa0847ad9>] rfcomm_sock_release+0x39/0xb0 [rfcomm]
> [<ffffffff81532fcf>] sock_release+0x1f/0x80
> [<ffffffff81533042>] sock_close+0x12/0x20
> [<ffffffff811a9ac1>] __fput+0xe1/0x230
> [<ffffffff811a9c5e>] ____fput+0xe/0x10
> [<ffffffff8108534c>] task_work_run+0xbc/0xe0
> [<ffffffff8106944c>] do_exit+0x2bc/0xa20
> [<ffffffff81069c2f>] do_group_exit+0x3f/0xa0
> [<ffffffff81069ca4>] SyS_exit_group+0x14/0x20
> [<ffffffff81656b27>] tracesys+0xdd/0xe2
> [<ffffffffffffffff>] 0xffffffffffffffff

What kernel did you run this against? It is a shot in the dark, but can you try linux-next quickly. There was a socket related fix for the socket options where we confused RFCOMM vs L2CAP struct sock.

Regards

Marcel


2013-11-12 22:10:38

by Dave Jones

[permalink] [raw]
Subject: Re: shutdown(3) and bluetooth.

On Wed, Nov 13, 2013 at 06:56:23AM +0900, Marcel Holtmann wrote:
> Hi Dave,
>
> > Is shutdown() allowed to block indefinitely ? The man page doesn't say either way,
> > and I've noticed that my fuzz tester occasionally hangs for days spinning in bt_sock_wait_state()
> >
> > Is there something I should be doing to guarantee that this operation
> > will either time out, or return instantly ?
> >
> > In this specific case, I doubt anything is on the "sender" end of the socket, so
> > it's going to be waiting forever for a state change that won't arrive.
>
> can you give us some extra information here. What kind of Bluetooth socket is this actually. From the top of my head, I have no idea why we would even wait forever. Normally when all low-level links are gone, the socket will shut down anyway.

Here's the info I found in the logs, it looks like this was the only bluetooth socket.

fd[195] = domain:31 (PF_BLUETOOTH) type:0x5 protocol:2
Setsockopt(1 d 2134000 8) on fd 195

it doesn't look like any further operations were done on this fd during the fuzzers runtime.

Quick way to reproduce:

./trinity -P PF_BLUETOOTH -l off -c setsockopt

let it run a few seconds, and then ctrl-c. The main process will never exit.

5814 pts/6 Ss 0:00 | \_ bash
5876 pts/6 S+ 0:00 | | \_ ./trinity -P PF_BLUETOOTH -l off -c setsockopt
5877 pts/6 Z+ 0:00 | | \_ [trinity] <defunct>
5878 pts/6 S+ 0:01 | | \_ [trinity-main]

$ sudo cat /proc/5878/stack
[<ffffffffa04397a2>] bt_sock_wait_state+0xc2/0x190 [bluetooth]
[<ffffffffa0847a75>] rfcomm_sock_shutdown+0x85/0xb0 [rfcomm]
[<ffffffffa0847ad9>] rfcomm_sock_release+0x39/0xb0 [rfcomm]
[<ffffffff81532fcf>] sock_release+0x1f/0x80
[<ffffffff81533042>] sock_close+0x12/0x20
[<ffffffff811a9ac1>] __fput+0xe1/0x230
[<ffffffff811a9c5e>] ____fput+0xe/0x10
[<ffffffff8108534c>] task_work_run+0xbc/0xe0
[<ffffffff8106944c>] do_exit+0x2bc/0xa20
[<ffffffff81069c2f>] do_group_exit+0x3f/0xa0
[<ffffffff81069ca4>] SyS_exit_group+0x14/0x20
[<ffffffff81656b27>] tracesys+0xdd/0xe2
[<ffffffffffffffff>] 0xffffffffffffffff


Dave