2004-09-17 00:10:50

by Daryl Van Vorst

[permalink] [raw]
Subject: [Bluez-devel] Rfcomm Use Count

Marcel,

I have a simple way to reproduce at least part of this bug. I don't have =
an
up-to-date x86 machine to try this on, but I suspect you'll see the same
behaviour:

1. Compile and run the attached code on one machine
2. Connect to it from another machine using: rctest -n -P1 <bd_addr>
3. Hit ctrl-c on rctest
4. Hit ctrl-c on bzt (or whatever you called the compiled code)
5. lsmod and look at the rfcomm use count.

I think the problem stems from rfcomm_cleanup_listen() and
bluez_accept_dequeue(). Bluez_accept_dequeue() won't return the socket =
if it
is in the closed state, and so rfcomm_cleanup_listen() can't fully =
cleanup.

And if accept is called before rfcomm_cleanup_listen(), then (I think) =
the
socket will be unlinked from the accept queue (by accept) but not =
killed,
and so also will not get cleaned up.

Things appear to work if you reverse the order of steps 3 and 4.

I'd send you a patch if I had a simple one, but I don't know what the =
best
approach is. On solution may be to make bluez_accept_dequeue() always =
return
the socket regardless of state and then fix anything that calls
bluez_accept_dequeue() to handle the possibility of a closed socket =
being
returned.

-Daryl.

--- test code below ---
// bzt.c

#include <stdio.h>
#include <errno.h>
#include <sys/socket.h>

#include <bluetooth/bluetooth.h>
#include <bluetooth/rfcomm.h>

int main(int argc, char *argv[])
{
struct sockaddr_rc loc_addr;
int s;
=09
if((s =3D socket(PF_BLUETOOTH, SOCK_STREAM, BTPROTO_RFCOMM)) < 0) {
printf("Can't create server socket:
%s(%d)\n",strerror(errno), errno);
return -1;
}
=09
loc_addr.rc_family =3D AF_BLUETOOTH;
bacpy(&loc_addr.rc_bdaddr, BDADDR_ANY);
loc_addr.rc_channel =3D 1;
if(bind(s,(struct sockaddr *) &loc_addr, sizeof(loc_addr)) < 0) {
printf("Can't bind %s(%d)\n",strerror(errno), errno);
return -1;
}
=09
if(listen(s,10)) {
printf("Can't listen %s(%d)\n",strerror(errno),errno);
return -1;
}
=09
printf("Listening...\n");
while(1) sleep(999);
=09
return 0;
}



-------------------------------------------------------
This SF.Net email is sponsored by: YOU BE THE JUDGE. Be one of 170
Project Admins to receive an Apple iPod Mini FREE for your judgement on
who ports your project to Linux PPC the best. Sponsored by IBM.
Deadline: Sept. 24. Go here: http://sf.net/ppc_contest.php
_______________________________________________
Bluez-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bluez-devel


2004-09-22 19:57:17

by Marcel Holtmann

[permalink] [raw]
Subject: RE: [Bluez-devel] Rfcomm Use Count

Hi Daryl,

> > I understand what you mean and it should be impossible to
> > have an socket
> > on the accept_q with state BT_CLOSED. May you wanna remove the extra
> > code in ...accept_dequeue() and only leave a warning message
> > there when
> > you do your tests. However this extra code should not harm and I don't
> > see an easy solution for getting the module reference
> > decrementing right
> > when we have to do it in ...accept_dequeue() for all socket types.
>
> I think I'll leave the code as is, unless you're considering removing the
> code in accept_dequeue(). If there's a problem, we'll see the use count go
> up.

replace it with a BT_ERR message (or only add one), so that you see if
it gets called in your tests. I think the code is useless, but I am not
100% sure.

> Thanks for working on this... Looks like you spent some late nights on it.

You are welcome. Yes, I spent some time to understand what is really
going on. Btw someone must look at the SCO code and check if we have the
same problem there.

Regards

Marcel




-------------------------------------------------------
This SF.Net email is sponsored by: YOU BE THE JUDGE. Be one of 170
Project Admins to receive an Apple iPod Mini FREE for your judgement on
who ports your project to Linux PPC the best. Sponsored by IBM.
Deadline: Sept. 24. Go here: http://sf.net/ppc_contest.php
_______________________________________________
Bluez-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bluez-devel

2004-09-22 20:05:49

by Daryl Van Vorst

[permalink] [raw]
Subject: RE: [Bluez-devel] Rfcomm Use Count

Marcel,

> replace it with a BT_ERR message (or only add one), so that you see if
> it gets called in your tests. I think the code is useless,
> but I am not
> 100% sure.

Ok, will do.

> > Thanks for working on this... Looks like you spent some
> late nights on it.
>
> You are welcome. Yes, I spent some time to understand what is really
> going on. Btw someone must look at the SCO code and check if
> we have the
> same problem there.

I'll keep that in mind if I ever do any SCO stuff... But that's pretty
unlikely at least for now.

-Daryl.

2004-09-22 19:52:26

by Daryl Van Vorst

[permalink] [raw]
Subject: RE: [Bluez-devel] Rfcomm Use Count

Hi Marcel,

> I understand what you mean and it should be impossible to
> have an socket
> on the accept_q with state BT_CLOSED. May you wanna remove the extra
> code in ...accept_dequeue() and only leave a warning message
> there when
> you do your tests. However this extra code should not harm and I don't
> see an easy solution for getting the module reference
> decrementing right
> when we have to do it in ...accept_dequeue() for all socket types.

I think I'll leave the code as is, unless you're considering removing the
code in accept_dequeue(). If there's a problem, we'll see the use count go
up.

Thanks for working on this... Looks like you spent some late nights on it.

-Daryl.

2004-09-22 19:33:56

by Marcel Holtmann

[permalink] [raw]
Subject: RE: [Bluez-devel] Rfcomm Use Count

Hi Daryl,

> > > Does this change make the "if (sk->state == BT_CLOSED)" statement in
> > > bluez_accept_dequeue() redundant?
> >
> > I think so. Haven't thought about it so far.
> >
> > > This is important because if somehow a socket is in
> > BT_CLOSED state and is
> > > in the accept queue during shutdown, then
> > rfcomm_cleanup_listen() can't
> > > clean it up (because bluez_accept_dequeue() won't return
> > it). I think your
> > > changes make it impossible for there to ever be an rfcomm
> > socket (or l2cap
> > > socket) in the BT_CLOSED state in the accept queue. If
> > that's true, then
> > > this isn't an issue. Thoughts?
> >
> > Sorry, but I can't follow your point here. Please explain it again.
>
> Bluez_accept_dequeue() won't return a socket if it is in the closed state.
> So rfcomm_cleanup_listen() will never call close and kill on the socket
> because ...accept_dequeue() never returns it (it just unlinks it). And then
> we would have the use count problem. But if there is no possibility of a
> socket being in the closed state on the accept queue (for rfcomm at least),
> then this isn't an issue.
>
> I think this is fixed - so if I'm still talking nonsense, just ignore it. ;)

I understand what you mean and it should be impossible to have an socket
on the accept_q with state BT_CLOSED. May you wanna remove the extra
code in ...accept_dequeue() and only leave a warning message there when
you do your tests. However this extra code should not harm and I don't
see an easy solution for getting the module reference decrementing right
when we have to do it in ...accept_dequeue() for all socket types.

Regards

Marcel




-------------------------------------------------------
This SF.Net email is sponsored by: YOU BE THE JUDGE. Be one of 170
Project Admins to receive an Apple iPod Mini FREE for your judgement on
who ports your project to Linux PPC the best. Sponsored by IBM.
Deadline: Sept. 24. Go here: http://sf.net/ppc_contest.php
_______________________________________________
Bluez-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bluez-devel

2004-09-22 19:05:17

by Daryl Van Vorst

[permalink] [raw]
Subject: RE: [Bluez-devel] Rfcomm Use Count

Hi Marcel,

> try to test your normal stuff to make sure that it doesn't break
> anything else.

So far so good. I'm setting up some automated testers to pound on it for =
a
while.

> > Does this change make the "if (sk->state =3D=3D BT_CLOSED)" =
statement in
> > bluez_accept_dequeue() redundant?
>=20
> I think so. Haven't thought about it so far.
>=20
> > This is important because if somehow a socket is in=20
> BT_CLOSED state and is
> > in the accept queue during shutdown, then=20
> rfcomm_cleanup_listen() can't
> > clean it up (because bluez_accept_dequeue() won't return=20
> it). I think your
> > changes make it impossible for there to ever be an rfcomm=20
> socket (or l2cap
> > socket) in the BT_CLOSED state in the accept queue. If=20
> that's true, then
> > this isn't an issue. Thoughts?
>=20
> Sorry, but I can't follow your point here. Please explain it again.

Bluez_accept_dequeue() won't return a socket if it is in the closed =
state.
So rfcomm_cleanup_listen() will never call close and kill on the socket
because ...accept_dequeue() never returns it (it just unlinks it). And =
then
we would have the use count problem. But if there is no possibility of a
socket being in the closed state on the accept queue (for rfcomm at =
least),
then this isn't an issue.

I think this is fixed - so if I'm still talking nonsense, just ignore =
it. ;)

> > Do you think we've covered the case where there are=20
> incomming connections
> > while things are at various stages of being shut down? I=20
> think we're ok, but
> > I wanted to bring it up.
>=20
> Actually I don't know. In general we should be protected against that
> with our locking.

As far as I could see the locking protects against it. But I did not =
spend
enough time on it to fully understand how it works. If there is a =
problem,
my testing should eventually uncover it.

-Daryl.

2004-09-22 18:12:45

by Marcel Holtmann

[permalink] [raw]
Subject: RE: [Bluez-devel] Rfcomm Use Count

Hi Daryl,

> I manually applied your patch to my 2.4 kernel, and it seems to work fine. I
> haven't been able to break it yet. Looks good. I will keep trying.

try to test your normal stuff to make sure that it doesn't break
anything else.

> Does this change make the "if (sk->state == BT_CLOSED)" statement in
> bluez_accept_dequeue() redundant?

I think so. Haven't thought about it so far.

> This is important because if somehow a socket is in BT_CLOSED state and is
> in the accept queue during shutdown, then rfcomm_cleanup_listen() can't
> clean it up (because bluez_accept_dequeue() won't return it). I think your
> changes make it impossible for there to ever be an rfcomm socket (or l2cap
> socket) in the BT_CLOSED state in the accept queue. If that's true, then
> this isn't an issue. Thoughts?

Sorry, but I can't follow your point here. Please explain it again.

> Do you think we've covered the case where there are incomming connections
> while things are at various stages of being shut down? I think we're ok, but
> I wanted to bring it up.

Actually I don't know. In general we should be protected against that
with our locking.

Regards

Marcel




-------------------------------------------------------
This SF.Net email is sponsored by: YOU BE THE JUDGE. Be one of 170
Project Admins to receive an Apple iPod Mini FREE for your judgement on
who ports your project to Linux PPC the best. Sponsored by IBM.
Deadline: Sept. 24. Go here: http://sf.net/ppc_contest.php
_______________________________________________
Bluez-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bluez-devel

2004-09-22 17:57:40

by Daryl Van Vorst

[permalink] [raw]
Subject: RE: [Bluez-devel] Rfcomm Use Count

Hi Marcel,

> after more and more thinking about that problem I am almost=20
> sure that it
> is right to call rfcomm_sock_kill() in the state change=20
> function. Anyway
> we must do this on an unlocked socket. Here is my proposal=20
> for the final
> patch, but we need real testing on this so that I can be sure=20
> that there
> are no side effects. Can anyone test it on SMP or HT systems?

I manually applied your patch to my 2.4 kernel, and it seems to work =
fine. I
haven't been able to break it yet. Looks good. I will keep trying.

Does this change make the "if (sk->state =3D=3D BT_CLOSED)" statement in
bluez_accept_dequeue() redundant?

This is important because if somehow a socket is in BT_CLOSED state and =
is
in the accept queue during shutdown, then rfcomm_cleanup_listen() can't
clean it up (because bluez_accept_dequeue() won't return it). I think =
your
changes make it impossible for there to ever be an rfcomm socket (or =
l2cap
socket) in the BT_CLOSED state in the accept queue. If that's true, then
this isn't an issue. Thoughts?

Do you think we've covered the case where there are incomming =
connections
while things are at various stages of being shut down? I think we're ok, =
but
I wanted to bring it up.

-Daryl.

2004-09-22 13:53:49

by Marcel Holtmann

[permalink] [raw]
Subject: RE: [Bluez-devel] Rfcomm Use Count

Hi Daryl,

> the correct approach seems to be:
>
> sk->sk_zapped = 1;
> rfcomm_sock_kill(sk);
>
> The problem is that rfcomm_sock_kill() must be called on an unlocked
> socket and I think that we will deadlock on a SMP machine or get some
> NULL pointer dereferences.

after more and more thinking about that problem I am almost sure that it
is right to call rfcomm_sock_kill() in the state change function. Anyway
we must do this on an unlocked socket. Here is my proposal for the final
patch, but we need real testing on this so that I can be sure that there
are no side effects. Can anyone test it on SMP or HT systems?

Regards

Marcel


Attachments:
patch (2.15 kB)

2004-09-22 11:08:41

by Marcel Holtmann

[permalink] [raw]
Subject: RE: [Bluez-devel] Rfcomm Use Count

Hi Daryl,

> > maybe this one helps:
> >
> > --- 1.29/net/bluetooth/rfcomm/sock.c 2004-06-04 02:41:47 +02:00
> > +++ edited/net/bluetooth/rfcomm/sock.c 2004-09-22 00:12:34 +02:00
> > @@ -104,8 +104,13 @@
> > if (d->state == BT_CONNECTED)
> > rfcomm_session_getaddr(d->session,
> > &bt_sk(sk)->src, NULL);
> > sk->sk_state_change(sk);
> > - } else
> > + } else {
> > + if (d->state == BT_CLOSED) {
> > + bt_accept_unlink(sk);
> > + sk->sk_zapped = 1;
> > + }
> > parent->sk_data_ready(parent, 0);
> > + }
> >
> > bh_unlock_sock(sk);
> > }
>
> No luck.
>
> Same behaviour as without the close and kill calls.

the correct approach seems to be:

sk->sk_zapped = 1;
rfcomm_sock_kill(sk);

The problem is that rfcomm_sock_kill() must be called on an unlocked
socket and I think that we will deadlock on a SMP machine or get some
NULL pointer dereferences.

> I noticed in l2cap.c that after l2cap_chan_del() is called (which now does
> the accept_unlink()), l2cap_sock_kill() is often called which decrements its
> use count which was incremented in l2cap_sock_alloc() (much like in rfcomm).
> But in rfcomm, there don't appear to be any calls to ...sock_kill() for
> similar circumstances.

That is right and I agree that a ...sock_kill() is missing that calls
sock_put() to decrease the reference count.

Regards

Marcel




-------------------------------------------------------
This SF.Net email is sponsored by: YOU BE THE JUDGE. Be one of 170
Project Admins to receive an Apple iPod Mini FREE for your judgement on
who ports your project to Linux PPC the best. Sponsored by IBM.
Deadline: Sept. 24. Go here: http://sf.net/ppc_contest.php
_______________________________________________
Bluez-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bluez-devel

2004-09-21 22:44:50

by Daryl Van Vorst

[permalink] [raw]
Subject: RE: [Bluez-devel] Rfcomm Use Count

Hi Marcel,

> maybe this one helps:
>=20
> --- 1.29/net/bluetooth/rfcomm/sock.c 2004-06-04 02:41:47 +02:00
> +++ edited/net/bluetooth/rfcomm/sock.c 2004-09-22 00:12:34 +02:00
> @@ -104,8 +104,13 @@
> if (d->state =3D=3D BT_CONNECTED)
> rfcomm_session_getaddr(d->session,=20
> &bt_sk(sk)->src, NULL);
> sk->sk_state_change(sk);
> - } else
> + } else {
> + if (d->state =3D=3D BT_CLOSED) {
> + bt_accept_unlink(sk);
> + sk->sk_zapped =3D 1;
> + }
> parent->sk_data_ready(parent, 0);
> + }
> =20
> bh_unlock_sock(sk);
> }

No luck.

Same behaviour as without the close and kill calls.

I noticed in l2cap.c that after l2cap_chan_del() is called (which now =
does
the accept_unlink()), l2cap_sock_kill() is often called which decrements =
its
use count which was incremented in l2cap_sock_alloc() (much like in =
rfcomm).
But in rfcomm, there don't appear to be any calls to ...sock_kill() for
similar circumstances.

-Daryl.

2004-09-21 22:26:45

by Marcel Holtmann

[permalink] [raw]
Subject: RE: [Bluez-devel] Rfcomm Use Count

Hi Daryl,

> At the moment I think calling ...sock_close() and ...sock_kill() in that
> function is wrong. However I am not 100% sure.

maybe this one helps:

--- 1.29/net/bluetooth/rfcomm/sock.c 2004-06-04 02:41:47 +02:00
+++ edited/net/bluetooth/rfcomm/sock.c 2004-09-22 00:12:34 +02:00
@@ -104,8 +104,13 @@
if (d->state == BT_CONNECTED)
rfcomm_session_getaddr(d->session, &bt_sk(sk)->src, NULL);
sk->sk_state_change(sk);
- } else
+ } else {
+ if (d->state == BT_CLOSED) {
+ bt_accept_unlink(sk);
+ sk->sk_zapped = 1;
+ }
parent->sk_data_ready(parent, 0);
+ }

bh_unlock_sock(sk);
}

Regards

Marcel




-------------------------------------------------------
This SF.Net email is sponsored by: YOU BE THE JUDGE. Be one of 170
Project Admins to receive an Apple iPod Mini FREE for your judgement on
who ports your project to Linux PPC the best. Sponsored by IBM.
Deadline: Sept. 24. Go here: http://sf.net/ppc_contest.php
_______________________________________________
Bluez-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bluez-devel

2004-09-21 22:07:24

by Marcel Holtmann

[permalink] [raw]
Subject: RE: [Bluez-devel] Rfcomm Use Count

Hi Daryl,

> Looks like the rfcomm part isn't quite there. The accept queue doesn't fill
> up, but the use count still increases. And /proc/bluetooth/rfcomm still
> lists all of the closed sockets.
>
> I added rfcomm_sock_close() and rfcomm_sock_kill() right after your addition
> to rfcomm_sk_state_change() and it seems to work. The affects of
> rfcomm_sock_alloc() (called by ...connect_ind()) needed to be dealt with. I
> changed it as follows:
>
> --- mh_sock.c 2004-09-21 14:21:36.000000000 -0700
> +++ sock.c 2004-09-21 14:21:47.000000000 -0700
> @@ -104,8 +104,11 @@
> rfcomm_session_getaddr(d->session,
> &bluez_pi(sk)->src, NULL);
> sk->state_change(sk);
> } else {
> - if (d->state == BT_CLOSED)
> + if (d->state == BT_CLOSED) {
> bluez_accept_unlink(sk);
> + rfcomm_sock_close(sk);
> + rfcomm_sock_kill(sk);
> + }
> parent->data_ready(parent, 0);
> }
>
> I am not sure about the possible negative side effects of this. What do you
> think?

what I think is that we have two different bugs here. We may solved the
first one now, which wasn't a module reference count bug. The second now
increases the module reference count and don't decreases it correctly.
Since the RFCOMM implementation is much more complex, because of its
support for TTY and sockets as end user interfaces, we need to find out
if the socket functions are involved or the core/TTY functions.

At the moment I think calling ...sock_close() and ...sock_kill() in that
function is wrong. However I am not 100% sure.

Regards

Marcel




-------------------------------------------------------
This SF.Net email is sponsored by: YOU BE THE JUDGE. Be one of 170
Project Admins to receive an Apple iPod Mini FREE for your judgement on
who ports your project to Linux PPC the best. Sponsored by IBM.
Deadline: Sept. 24. Go here: http://sf.net/ppc_contest.php
_______________________________________________
Bluez-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bluez-devel

2004-09-21 21:26:45

by Daryl Van Vorst

[permalink] [raw]
Subject: RE: [Bluez-devel] Rfcomm Use Count

Hi Marcel,

> > The L2CAP solution looks very sane to me, but I am not sure=20
> about the
> > RFCOMM part and we may will have a locking problem. This=20
> really needs
> > intensive testing.
>=20
> Excellent. Thank you very much.
>=20
> I'll let you know how it works for me.

Looks like the rfcomm part isn't quite there. The accept queue doesn't =
fill
up, but the use count still increases. And /proc/bluetooth/rfcomm still
lists all of the closed sockets.

I added rfcomm_sock_close() and rfcomm_sock_kill() right after your =
addition
to rfcomm_sk_state_change() and it seems to work. The affects of
rfcomm_sock_alloc() (called by ...connect_ind()) needed to be dealt =
with. I
changed it as follows:

--- mh_sock.c 2004-09-21 14:21:36.000000000 -0700
+++ sock.c 2004-09-21 14:21:47.000000000 -0700
@@ -104,8 +104,11 @@
rfcomm_session_getaddr(d->session,
&bluez_pi(sk)->src, NULL);
sk->state_change(sk);
} else {
- if (d->state =3D=3D BT_CLOSED)
+ if (d->state =3D=3D BT_CLOSED) {
bluez_accept_unlink(sk);
+ rfcomm_sock_close(sk);
+ rfcomm_sock_kill(sk);
+ }
parent->data_ready(parent, 0);
}
=20
I am not sure about the possible negative side effects of this. What do =
you
think?

-Daryl.

2004-09-21 20:32:24

by Marcel Holtmann

[permalink] [raw]
Subject: RE: [Bluez-devel] Rfcomm Use Count

Hi Daryl,

> > for the L2CAP part the attached patch may fixes the problem
> > and let the
> > module reference count decrease after the disconnect. The
> > patch is only
> > compile tested. May you wanna give it a chance and run your
> > L2CAP tests
> > again.
>
> I gave it a try and it works fine as far as I can tell. I had to apply the
> patch manually though (I'm assuming this is because you're working with a
> 2.6 kernel and I'm not).

I will fix 2.6 first, before I will backport it to 2.4.

> I don't see the use count incrementing badly any more, and the listening
> socket will continue to accept connections after the number of
> connects/disconnects have exceeded the the accept queue.

This sounds great and hopefully this patch has no negative side effects
on normal listen/accept calls and outgoing connections.

> Does this give you any ideas for rfcomm?

Attached is the full patch that should also fix the RFCOMM part. This is
again only compile tested, because I had no time to setup a test system.
The L2CAP solution looks very sane to me, but I am not sure about the
RFCOMM part and we may will have a locking problem. This really needs
intensive testing.

Regards

Marcel


Attachments:
patch (1.87 kB)

2004-09-21 20:39:19

by Daryl Van Vorst

[permalink] [raw]
Subject: RE: [Bluez-devel] Rfcomm Use Count

Hi Marcel,

> I will fix 2.6 first, before I will backport it to 2.4.

Sounds good.

> Attached is the full patch that should also fix the RFCOMM
> part. This is
> again only compile tested, because I had no time to setup a
> test system.
> The L2CAP solution looks very sane to me, but I am not sure about the
> RFCOMM part and we may will have a locking problem. This really needs
> intensive testing.

Excellent. Thank you very much.

I'll let you know how it works for me.

-Daryl.

2004-09-21 20:14:43

by Daryl Van Vorst

[permalink] [raw]
Subject: RE: [Bluez-devel] Rfcomm Use Count

Hi Marcel,

> for the L2CAP part the attached patch may fixes the problem
> and let the
> module reference count decrease after the disconnect. The
> patch is only
> compile tested. May you wanna give it a chance and run your
> L2CAP tests
> again.

I gave it a try and it works fine as far as I can tell. I had to apply the
patch manually though (I'm assuming this is because you're working with a
2.6 kernel and I'm not).

I don't see the use count incrementing badly any more, and the listening
socket will continue to accept connections after the number of
connects/disconnects have exceeded the the accept queue.

Does this give you any ideas for rfcomm?

-Daryl.

2004-09-20 23:33:03

by Marcel Holtmann

[permalink] [raw]
Subject: RE: [Bluez-devel] Rfcomm Use Count

Hi Daryl,

> > so this means when we disconnect we must ensure that we also
> > remove this
> > connection from the accept queue if it is pending there.
> >
> > What I know think is that calling the bt_accept_unlink() function only
> > from bt_accept_dequeue() is wrong. If bt_sk(sk)->parent is set and we
> > ran into a disconnect then we must unlink it by ourself. Does
> > this make
> > sense?
>
> This makes sense to me, but I'm not convinced that it's enough of a problem
> to warrant major changes. And I'm probably not the best person to ask. ;)

for the L2CAP part the attached patch may fixes the problem and let the
module reference count decrease after the disconnect. The patch is only
compile tested. May you wanna give it a chance and run your L2CAP tests
again.

> Do we know how the tcp stack handles this kind of thing? (I had a quick look
> at a text on sockets and it didn't specifically cover the case of
> connections getting closed which are in the queue. But it was clear that the
> precise behaviour of the queue varies for tcp-ip from unix to unix.)
>
> Most servers (I think) would sit with accept() blocking and then spend a
> very brief time handing off the new connection before blocking on accept()
> again. So this wouldn't cause much trouble.
>
> I suppose this could affect a very simple server which is single-threaded
> (and so could spend significant time between accept() calls). Or a very busy
> server. I'm not sure that this is really a major issue for bluetooth (the
> bandwith and number of real connections is quite limited compared to
> tcp-ip).

The queueing for the accept case is different and so you can't really
compare Bluetooth with TCP/IP. Maybe we made a design mistake in the
early days of BlueZ that will now hit us very badly.

Regards

Marcel


Attachments:
patch (1.40 kB)

2004-09-20 22:38:01

by Daryl Van Vorst

[permalink] [raw]
Subject: RE: [Bluez-devel] Rfcomm Use Count

Hi Marcel,

> > I just tried an experiment where I listen for a while before calling
> > accept(). I can raise the use cound by connecting,=20
> disconnecting, etc (as
> > before). As soon as accept() is called, the excess use=20
> count goes away.
> >=20
> > Before accept() is called, the server will start refusing=20
> connections after
> > you fill up the accept queue. The queue is emptied with a=20
> single call to
> > accept().
>=20
> so this means when we disconnect we must ensure that we also=20
> remove this
> connection from the accept queue if it is pending there.
>=20
> What I know think is that calling the bt_accept_unlink() function only
> from bt_accept_dequeue() is wrong. If bt_sk(sk)->parent is set and we
> ran into a disconnect then we must unlink it by ourself. Does=20
> this make
> sense?

This makes sense to me, but I'm not convinced that it's enough of a =
problem
to warrant major changes. And I'm probably not the best person to ask. =
;)

Do we know how the tcp stack handles this kind of thing? (I had a quick =
look
at a text on sockets and it didn't specifically cover the case of
connections getting closed which are in the queue. But it was clear that =
the
precise behaviour of the queue varies for tcp-ip from unix to unix.)

Most servers (I think) would sit with accept() blocking and then spend a
very brief time handing off the new connection before blocking on =
accept()
again. So this wouldn't cause much trouble.

I suppose this could affect a very simple server which is =
single-threaded
(and so could spend significant time between accept() calls). Or a very =
busy
server. I'm not sure that this is really a major issue for bluetooth =
(the
bandwith and number of real connections is quite limited compared to
tcp-ip).

-Daryl.

2004-09-20 21:28:57

by Marcel Holtmann

[permalink] [raw]
Subject: RE: [Bluez-devel] Rfcomm Use Count

Hi Daryl,

> I just tried an experiment where I listen for a while before calling
> accept(). I can raise the use cound by connecting, disconnecting, etc (as
> before). As soon as accept() is called, the excess use count goes away.
>
> Before accept() is called, the server will start refusing connections after
> you fill up the accept queue. The queue is emptied with a single call to
> accept().

so this means when we disconnect we must ensure that we also remove this
connection from the accept queue if it is pending there.

What I know think is that calling the bt_accept_unlink() function only
from bt_accept_dequeue() is wrong. If bt_sk(sk)->parent is set and we
ran into a disconnect then we must unlink it by ourself. Does this make
sense?

Regards

Marcel




-------------------------------------------------------
This SF.Net email is sponsored by: YOU BE THE JUDGE. Be one of 170
Project Admins to receive an Apple iPod Mini FREE for your judgement on
who ports your project to Linux PPC the best. Sponsored by IBM.
Deadline: Sept. 24. Go here: http://sf.net/ppc_contest.php
_______________________________________________
Bluez-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bluez-devel

2004-09-20 21:03:33

by Daryl Van Vorst

[permalink] [raw]
Subject: RE: [Bluez-devel] Rfcomm Use Count

Marcel,

> > Each time I connect with l2test -n, I see the use count go
> up by 2. When I
> > hit ctrl-c on l2test, the use count goes down by 1 after a
> few seconds.
> > These excess increments (one for each connection) don't go
> away until the
> > test program is killed.
> >
> > So there is a bit of a problem.
>
> then we use some kind of garbage collection already. This is
> bad and we
> shouldn't do so, but right now I don't see the correct fix for this
> behaviour.

I just tried an experiment where I listen for a while before calling
accept(). I can raise the use cound by connecting, disconnecting, etc (as
before). As soon as accept() is called, the excess use count goes away.

Before accept() is called, the server will start refusing connections after
you fill up the accept queue. The queue is emptied with a single call to
accept().

-Daryl.

2004-09-20 20:52:48

by Daryl Van Vorst

[permalink] [raw]
Subject: RE: [Bluez-devel] Rfcomm Use Count

Marcel,

> > With my suggestion above the condition would be handled by=20
> either accept()
> > or cleanup_listen(), whichever is called first. Though I=20
> guess you could
> > still consider that to be garbage collection.
>=20
> when it is handled in ..cleanup_listen() then it gets handled when you
> kill the server process. I think this is far too late, because you can
> have more than one connection attempts in between. We must handle this
> when we close the connection.

Agreed, but if bluez_accept_dequeue() returns closed connections then it =
can
also be handled in accept(). So this would only be an issue in =
situations
where a socket is listening for a long time without there ever being a =
call
to accept().

The topic of cleanup_listen() came up just because the only way it has
appeared for me in normal use was with incomming connections while the
server was being shut down. While the server is running, incomming
connections are returned by accept() before they get closed by the =
remote
side.

It is still unclear to me why I could not bind to the socket (error was
address already in use, 98) when I tried to restart the server. This
behaviour is not seen with the simple test program I sent last week (and =
so
might be unrelated).

> This is correct and so I don't really like this approach. I=20
> think there
> is a simply logic mistake in the closing path of the RFCOMM code.

Could be. Then the closing path of the rfcomm code would need to be able =
to
remove connections from the accept queue. This might be a difficult =
change.

-Daryl.

2004-09-20 20:34:48

by Marcel Holtmann

[permalink] [raw]
Subject: RE: [Bluez-devel] Rfcomm Use Count

Hi Daryl,

> > > I can't get it to happen with l2cap. But I did notice that
> > it takes a few
> > > seconds for the use count to go down after the listening
> > program exits.
> >
> > what do you see when you keep the listing program running and then do
> > some connects? Do the module reference count increases every time?
>
> Each time I connect with l2test -n, I see the use count go up by 2. When I
> hit ctrl-c on l2test, the use count goes down by 1 after a few seconds.
> These excess increments (one for each connection) don't go away until the
> test program is killed.
>
> So there is a bit of a problem.

then we use some kind of garbage collection already. This is bad and we
shouldn't do so, but right now I don't see the correct fix for this
behaviour.

Regards

Marcel




-------------------------------------------------------
This SF.Net email is sponsored by: YOU BE THE JUDGE. Be one of 170
Project Admins to receive an Apple iPod Mini FREE for your judgement on
who ports your project to Linux PPC the best. Sponsored by IBM.
Deadline: Sept. 24. Go here: http://sf.net/ppc_contest.php
_______________________________________________
Bluez-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bluez-devel

2004-09-20 20:11:38

by Daryl Van Vorst

[permalink] [raw]
Subject: RE: [Bluez-devel] Rfcomm Use Count

Marcel,

> > I can't get it to happen with l2cap. But I did notice that
> it takes a few
> > seconds for the use count to go down after the listening
> program exits.
>
> what do you see when you keep the listing program running and then do
> some connects? Do the module reference count increases every time?

Each time I connect with l2test -n, I see the use count go up by 2. When I
hit ctrl-c on l2test, the use count goes down by 1 after a few seconds.
These excess increments (one for each connection) don't go away until the
test program is killed.

So there is a bit of a problem.

-Daryl.

2004-09-20 19:50:21

by Marcel Holtmann

[permalink] [raw]
Subject: RE: [Bluez-devel] Rfcomm Use Count

Hi Daryl,

> I can't get it to happen with l2cap. But I did notice that it takes a few
> seconds for the use count to go down after the listening program exits.

what do you see when you keep the listing program running and then do
some connects? Do the module reference count increases every time?

Regards

Marcel




-------------------------------------------------------
This SF.Net email is sponsored by: YOU BE THE JUDGE. Be one of 170
Project Admins to receive an Apple iPod Mini FREE for your judgement on
who ports your project to Linux PPC the best. Sponsored by IBM.
Deadline: Sept. 24. Go here: http://sf.net/ppc_contest.php
_______________________________________________
Bluez-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bluez-devel

2004-09-20 19:48:46

by Marcel Holtmann

[permalink] [raw]
Subject: RE: [Bluez-devel] Rfcomm Use Count

Hi Daryl,

> > > I may not have been clear about my thoughts on
> > rfcomm_sock_cleanup_listen().
> > > If bluez_accept_dequeue() did return sockets regardless of
> > state, then
> > > rfcomm_sock_cleanup_listen() should work (unless calling
> > close on an already
> > > closed socket causes trouble). When it calls
> > rfcomm_sock_kill(), sock_put()
> > > gets called which calls destruct() which should decrement
> > the use count.
> >
> > The ...cleanup_listen() kicks in when you stop the listening
> > server, but
> > that is far too late. The problematic part already happened.
> > We close a
> > connection that wasn't accepted. We must fix it there and not
> > afterwards
> > like somekind of garbage collection. This is the kernel and not some
> > weird Jave/C++ code.
>
> With my suggestion above the condition would be handled by either accept()
> or cleanup_listen(), whichever is called first. Though I guess you could
> still consider that to be garbage collection.

when it is handled in ..cleanup_listen() then it gets handled when you
kill the server process. I think this is far too late, because you can
have more than one connection attempts in between. We must handle this
when we close the connection.

> > > What if bluez_accept_dequeue() called sk->shutdown() on
> > sockets which are
> > > already closed in the accept queue?
> >
> > I am not sure about the side effects, but this can be a
> > solution. What I
> > think is that setting the socket to BT_CLOSED is wrong. Maybe using
> > BT_DISCONN is better, but this involves changing the state machine.
>
> Calling sk->shutdown() is bluez_accept_dequeue() would be virtually the same
> timing (i.e. garbage collecting likeness) as the suggestion above (returning
> the closed socket in ...accept_dequeue). It would happen only when accept()
> or cleanup_listen() are called.

This is correct and so I don't really like this approach. I think there
is a simply logic mistake in the closing path of the RFCOMM code.

Regards

Marcel




-------------------------------------------------------
This SF.Net email is sponsored by: YOU BE THE JUDGE. Be one of 170
Project Admins to receive an Apple iPod Mini FREE for your judgement on
who ports your project to Linux PPC the best. Sponsored by IBM.
Deadline: Sept. 24. Go here: http://sf.net/ppc_contest.php
_______________________________________________
Bluez-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bluez-devel

2004-09-20 18:52:43

by Daryl Van Vorst

[permalink] [raw]
Subject: RE: [Bluez-devel] Rfcomm Use Count

Hi Marcel,

> > Making bluez_accept_dequeue() return sockets regardless of=20
> state is a
> > potential solution. Accept() for rfcomm and l2cap would=20
> then need to be
> > modified to kill already closed sockets. The existing loop=20
> in the accept()'s
> > would need to be modified or a new one added to handle
> > bluez_accept_dequeue() not always returning an open socket.
>=20
> I don't like that idea and the problem is not in=20
> ...cleanup_listen() and
> so I am not sure if it will really help us.

Fair enough. Though I'm talking about changing bluez_accept_dequeue(), =
not
necessarily cleanup_listen().

> > I may not have been clear about my thoughts on=20
> rfcomm_sock_cleanup_listen().
> > If bluez_accept_dequeue() did return sockets regardless of=20
> state, then
> > rfcomm_sock_cleanup_listen() should work (unless calling=20
> close on an already
> > closed socket causes trouble). When it calls=20
> rfcomm_sock_kill(), sock_put()
> > gets called which calls destruct() which should decrement=20
> the use count.
>=20
> The ...cleanup_listen() kicks in when you stop the listening=20
> server, but
> that is far too late. The problematic part already happened.=20
> We close a
> connection that wasn't accepted. We must fix it there and not=20
> afterwards
> like somekind of garbage collection. This is the kernel and not some
> weird Jave/C++ code.

With my suggestion above the condition would be handled by either =
accept()
or cleanup_listen(), whichever is called first. Though I guess you could
still consider that to be garbage collection.

> > What if bluez_accept_dequeue() called sk->shutdown() on=20
> sockets which are
> > already closed in the accept queue?
>=20
> I am not sure about the side effects, but this can be a=20
> solution. What I
> think is that setting the socket to BT_CLOSED is wrong. Maybe using
> BT_DISCONN is better, but this involves changing the state machine.

Calling sk->shutdown() is bluez_accept_dequeue() would be virtually the =
same
timing (i.e. garbage collecting likeness) as the suggestion above =
(returning
the closed socket in ...accept_dequeue). It would happen only when =
accept()
or cleanup_listen() are called.

> > I'll try out l2cap later for you. We should see the same thing.
>=20
> I am not sure anymore. Maybe this is only RFCOMM related.

I sent another e-mail a moment ago... It doesn't happen with l2cap (or =
at
least I wasn't able to get it to happen).

-Daryl.

2004-09-20 18:32:57

by Marcel Holtmann

[permalink] [raw]
Subject: RE: [Bluez-devel] Rfcomm Use Count

Hi Daryl,

> > At the moment I must admit that I have no idea how to fix
> > this in a sane
> > way. It seems that this bug is in there from the beginning and a wrong
> > fix can cause unexpected side effects.
> >
> > I don't think that the problem is in rfcomm_sock_cleanup_listen(),
> > because the wrong use count is already present after step 3.
> > So when we
> > close a connected DLC that is not accepted yet, we still have
> > it on the
> > accept queue then we have a problem. Maybe there is a bug in our state
> > machine and this is not socket related.
>
> Incoming connections must be added to the accept queue (unless I'm really
> missing something). So the issue is just what to do when the remote side
> closes them before accept() gets to them.

that is correct.

> Making bluez_accept_dequeue() return sockets regardless of state is a
> potential solution. Accept() for rfcomm and l2cap would then need to be
> modified to kill already closed sockets. The existing loop in the accept()'s
> would need to be modified or a new one added to handle
> bluez_accept_dequeue() not always returning an open socket.

I don't like that idea and the problem is not in ...cleanup_listen() and
so I am not sure if it will really help us.

> I may not have been clear about my thoughts on rfcomm_sock_cleanup_listen().
> If bluez_accept_dequeue() did return sockets regardless of state, then
> rfcomm_sock_cleanup_listen() should work (unless calling close on an already
> closed socket causes trouble). When it calls rfcomm_sock_kill(), sock_put()
> gets called which calls destruct() which should decrement the use count.

The ...cleanup_listen() kicks in when you stop the listening server, but
that is far too late. The problematic part already happened. We close a
connection that wasn't accepted. We must fix it there and not afterwards
like somekind of garbage collection. This is the kernel and not some
weird Jave/C++ code.

> What if bluez_accept_dequeue() called sk->shutdown() on sockets which are
> already closed in the accept queue?

I am not sure about the side effects, but this can be a solution. What I
think is that setting the socket to BT_CLOSED is wrong. Maybe using
BT_DISCONN is better, but this involves changing the state machine.

> I'll try out l2cap later for you. We should see the same thing.

I am not sure anymore. Maybe this is only RFCOMM related.

Regards

Marcel




-------------------------------------------------------
This SF.Net email is sponsored by: YOU BE THE JUDGE. Be one of 170
Project Admins to receive an Apple iPod Mini FREE for your judgement on
who ports your project to Linux PPC the best. Sponsored by IBM.
Deadline: Sept. 24. Go here: http://sf.net/ppc_contest.php
_______________________________________________
Bluez-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bluez-devel

2004-09-20 18:37:52

by Daryl Van Vorst

[permalink] [raw]
Subject: RE: [Bluez-devel] Rfcomm Use Count

Marcel,

I can't get it to happen with l2cap. But I did notice that it takes a few
seconds for the use count to go down after the listening program exits.

-Daryl.

> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of
> Daryl Van Vorst
> Sent: September 20, 2004 10:59 AM
> To: 'Marcel Holtmann'
> Cc: 'BlueZ Mailing List'
> Subject: RE: [Bluez-devel] Rfcomm Use Count
>
>
> Hi Marcel,
>
> > At the moment I must admit that I have no idea how to fix
> > this in a sane
> > way. It seems that this bug is in there from the beginning
> and a wrong
> > fix can cause unexpected side effects.
> >
> > I don't think that the problem is in rfcomm_sock_cleanup_listen(),
> > because the wrong use count is already present after step 3.
> > So when we
> > close a connected DLC that is not accepted yet, we still have
> > it on the
> > accept queue then we have a problem. Maybe there is a bug
> in our state
> > machine and this is not socket related.
>
> Incoming connections must be added to the accept queue
> (unless I'm really
> missing something). So the issue is just what to do when the
> remote side
> closes them before accept() gets to them.
>
> Making bluez_accept_dequeue() return sockets regardless of state is a
> potential solution. Accept() for rfcomm and l2cap would then
> need to be
> modified to kill already closed sockets. The existing loop in
> the accept()'s
> would need to be modified or a new one added to handle
> bluez_accept_dequeue() not always returning an open socket.
>
> I may not have been clear about my thoughts on
> rfcomm_sock_cleanup_listen().
> If bluez_accept_dequeue() did return sockets regardless of state, then
> rfcomm_sock_cleanup_listen() should work (unless calling
> close on an already
> closed socket causes trouble). When it calls
> rfcomm_sock_kill(), sock_put()
> gets called which calls destruct() which should decrement the
> use count.
>
> What if bluez_accept_dequeue() called sk->shutdown() on
> sockets which are
> already closed in the accept queue?
>
> I'll try out l2cap later for you. We should see the same thing.
>
> -Daryl.
>
>
>
> -------------------------------------------------------
> This SF.Net email is sponsored by: YOU BE THE JUDGE. Be one of 170
> Project Admins to receive an Apple iPod Mini FREE for your
> judgement on
> who ports your project to Linux PPC the best. Sponsored by IBM.
> Deadline: Sept. 24. Go here: http://sf.net/ppc_contest.php
> _______________________________________________
> Bluez-devel mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/bluez-devel
>

2004-09-20 17:58:55

by Daryl Van Vorst

[permalink] [raw]
Subject: RE: [Bluez-devel] Rfcomm Use Count

Hi Marcel,

> At the moment I must admit that I have no idea how to fix=20
> this in a sane
> way. It seems that this bug is in there from the beginning and a wrong
> fix can cause unexpected side effects.
>=20
> I don't think that the problem is in rfcomm_sock_cleanup_listen(),
> because the wrong use count is already present after step 3.=20
> So when we
> close a connected DLC that is not accepted yet, we still have=20
> it on the
> accept queue then we have a problem. Maybe there is a bug in our state
> machine and this is not socket related.

Incoming connections must be added to the accept queue (unless I'm =
really
missing something). So the issue is just what to do when the remote side
closes them before accept() gets to them.

Making bluez_accept_dequeue() return sockets regardless of state is a
potential solution. Accept() for rfcomm and l2cap would then need to be
modified to kill already closed sockets. The existing loop in the =
accept()'s
would need to be modified or a new one added to handle
bluez_accept_dequeue() not always returning an open socket.

I may not have been clear about my thoughts on =
rfcomm_sock_cleanup_listen().
If bluez_accept_dequeue() did return sockets regardless of state, then
rfcomm_sock_cleanup_listen() should work (unless calling close on an =
already
closed socket causes trouble). When it calls rfcomm_sock_kill(), =
sock_put()
gets called which calls destruct() which should decrement the use count.

What if bluez_accept_dequeue() called sk->shutdown() on sockets which =
are
already closed in the accept queue?

I'll try out l2cap later for you. We should see the same thing.

-Daryl.

2004-09-17 08:58:54

by Marcel Holtmann

[permalink] [raw]
Subject: Re: [Bluez-devel] Rfcomm Use Count

Hi Daryl,

> I have a simple way to reproduce at least part of this bug. I don't have an
> up-to-date x86 machine to try this on, but I suspect you'll see the same
> behaviour:
>
> 1. Compile and run the attached code on one machine
> 2. Connect to it from another machine using: rctest -n -P1 <bd_addr>
> 3. Hit ctrl-c on rctest
> 4. Hit ctrl-c on bzt (or whatever you called the compiled code)
> 5. lsmod and look at the rfcomm use count.
>
> I think the problem stems from rfcomm_cleanup_listen() and
> bluez_accept_dequeue(). Bluez_accept_dequeue() won't return the socket if it
> is in the closed state, and so rfcomm_cleanup_listen() can't fully cleanup.
>
> And if accept is called before rfcomm_cleanup_listen(), then (I think) the
> socket will be unlinked from the accept queue (by accept) but not killed,
> and so also will not get cleaned up.
>
> Things appear to work if you reverse the order of steps 3 and 4.

I can verify that this bug exists even in a 2.6 kernel. There is no need
to execute step 4, because the missing decrementing of the use count is
already there after step 3. And I think the same problem will exists for
the L2CAP module. Please verify this for me.

> I'd send you a patch if I had a simple one, but I don't know what the best
> approach is. On solution may be to make bluez_accept_dequeue() always return
> the socket regardless of state and then fix anything that calls
> bluez_accept_dequeue() to handle the possibility of a closed socket being
> returned.

At the moment I must admit that I have no idea how to fix this in a sane
way. It seems that this bug is in there from the beginning and a wrong
fix can cause unexpected side effects.

I don't think that the problem is in rfcomm_sock_cleanup_listen(),
because the wrong use count is already present after step 3. So when we
close a connected DLC that is not accepted yet, we still have it on the
accept queue then we have a problem. Maybe there is a bug in our state
machine and this is not socket related.

Regards

Marcel




-------------------------------------------------------
This SF.Net email is sponsored by: YOU BE THE JUDGE. Be one of 170
Project Admins to receive an Apple iPod Mini FREE for your judgement on
who ports your project to Linux PPC the best. Sponsored by IBM.
Deadline: Sept. 24. Go here: http://sf.net/ppc_contest.php
_______________________________________________
Bluez-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bluez-devel

2004-10-02 09:26:12

by Marcel Holtmann

[permalink] [raw]
Subject: RE: [Bluez-devel] Rfcomm Use Count

Hi Diego,

> I had the same problem here.
>
> The patch seems to solve it.
>
> Tested rfcomm on linux 2.6 single processor.
>
> I tried to apply the fix also to 2.4.x with
> the following patch that compiles.
>
> I hope it is correct.
>
> Thanks for your work.

I will do a patch for 2.4 when I resynced every pending 2.6 stuff. At
the moment I am not quite sure that even my 2.6 patch is the best
solution. However it is better then nothing.

Regards

Marcel




-------------------------------------------------------
This SF.net email is sponsored by: IT Product Guide on ITManagersJournal
Use IT products in your business? Tell us what you think of them. Give us
Your Opinions, Get Free ThinkGeek Gift Certificates! Click to find out more
http://productguide.itmanagersjournal.com/guidepromo.tmpl
_______________________________________________
Bluez-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bluez-devel