Hi,
It seems there is a regression introduced on v2.6.30-rc1 while using
DEFER_SETUP on a RFCOMM socket. It happens when the user reject the
authorization:
2009-05-27 13:38:14.048925 < ACL data: handle 42 flags 0x02 dlen 12
L2CAP(s): Disconn req: dcid 0x0042 scid 0x0040
2009-05-27 13:38:14.052864 > HCI Event: Number of Completed Packets (0x13) plen
5
handle 42 packets 1
2009-05-27 13:38:14.174975 > ACL data: handle 42 flags 0x02 dlen 12
L2CAP(s): Disconn rsp: dcid 0x0042 scid 0x0040
No hci disconnect command is sent so the ACL link will stay up
forever, I suspect some timeout got removed in rc1 since with v2.6.29
after some period the ACL is disconnected, but for my surprise it
seems HCI_IDLE_TIMEOUT is not being used.
--
Luiz Augusto von Dentz
Engenheiro de Computa??o
Another update now fixing oopses by calling rfcomm_clear_session_timer
on rfcomm_session_del.
On Thu, Jun 25, 2009 at 10:14 AM, Luiz Augusto von
Dentz<[email protected]> wrote:
> Patch now against updated bluetooth-testing
>
> --
> Luiz Augusto von Dentz
> Engenheiro de Computa??o
>
--
Luiz Augusto von Dentz
Engenheiro de Computa??o
Patch now against updated bluetooth-testing
--
Luiz Augusto von Dentz
Engenheiro de Computa??o
Lastest version which now triggers DISC 0 instead of
rfcomm_session_put and introduce RFCOMM_IDLE_TIMEOUT (2 sec), there is
also a fix to use test_and_clear_bit instead of just test_bit since
now the session would not go away until the remote stack reply with
UA.
--
Luiz Augusto von Dentz
Engenheiro de Computa??o
Hi Stefan,
On Tue, Jun 23, 2009 at 3:04 PM, Stefan Seyfried<[email protected]> wrote:
> What if the remote stack went away completely (machine crashed, out of
> battery...)?
>
iirc the link manager on chip detects this and send us the proper disconnec=
t.
--=20
Luiz Augusto von Dentz
Engenheiro de Computa=E7=E3o
Hi Marcel and Luiz
(me commenting without any real knowledge about the issue, but being
curious and having seen similar stuff on other protocols... :)
On Tue, 23 Jun 2009 17:06:12 +0200
Marcel Holtmann <[email protected]> wrote:
> > Moving ahead, what about the timeout, 20 seconds seems too much
> > doesn't it? Are you fine with the places where I clear the timer? =20
>=20
> the timeout should be 2 seconds. At most 5 seconds.
>=20
> I still prefer if we just send DISC and then wait for the remote stack
> to do the right thing. If it doesn't then that stack is so broken that
> whatever we try to do is wrong anyway at that point. =20
What if the remote stack went away completely (machine crashed, out of
battery...)?
Best regards,
Stefan
--=20
Stefan Seyfried
R&D Team Mobile Devices | "Any ideas, John?"
SUSE LINUX Products GmbH, N=C3=BCrnberg | "Well, surrounding them's out."
This footer brought to you by insane German lawmakers:
SUSE Linux Products GmbH, GF: Markus Rex, HRB 16746 (AG N=C3=BCrnberg)
Hi Luiz,
> > what about just triggering the timer and then sending DISC for DLCI 0. I
> > don't see a big benefit for this reference counting overhead.
> >
> > When we send the DISC, we will receive UA and thus get the required
> > rfcomm_session_put() that then leads to the ACL disconnect.
>
> Hmm, that seems to work too and is what we are doing in case of
> receiving a DM, but there is a chance that the remote stack doesn't
> respond the DISC in that case we can also use the session timer to
> timeout. Anyway I don't think such broken stack exist, although we do
> set timeout when sending DISC to a specific DLCI, so lets leave this
> for latter when he actually have a real offender which doesn't respond
> with UA.
>
> Moving ahead, what about the timeout, 20 seconds seems too much
> doesn't it? Are you fine with the places where I clear the timer?
the timeout should be 2 seconds. At most 5 seconds.
I still prefer if we just send DISC and then wait for the remote stack
to do the right thing. If it doesn't then that stack is so broken that
whatever we try to do is wrong anyway at that point.
Regards
Marcel
Hi Marcel,
On Tue, Jun 23, 2009 at 11:40 AM, Marcel Holtmann<[email protected]> wrot=
e:
> what about just triggering the timer and then sending DISC for DLCI 0. I
> don't see a big benefit for this reference counting overhead.
>
> When we send the DISC, we will receive UA and thus get the required
> rfcomm_session_put() that then leads to the ACL disconnect.
Hmm, that seems to work too and is what we are doing in case of
receiving a DM, but there is a chance that the remote stack doesn't
respond the DISC in that case we can also use the session timer to
timeout. Anyway I don't think such broken stack exist, although we do
set timeout when sending DISC to a specific DLCI, so lets leave this
for latter when he actually have a real offender which doesn't respond
with UA.
Moving ahead, what about the timeout, 20 seconds seems too much
doesn't it? Are you fine with the places where I clear the timer?
--=20
Luiz Augusto von Dentz
Engenheiro de Computa=E7=E3o
Hi Luiz,
> Cleanup version, which include code style fixing suggested by Johan on irc.
>
> There are 2 other things that comes to my mind that need discussing:
>
> - 20 sec timeout is perhaps too big
> - clear the timer on rfcomm_session_close is perhaps not save then we
> can move it to rfcomm_session_del
what about just triggering the timer and then sending DISC for DLCI 0. I
don't see a big benefit for this reference counting overhead.
When we send the DISC, we will receive UA and thus get the required
rfcomm_session_put() that then leads to the ACL disconnect.
Regards
Marcel
Cleanup version, which include code style fixing suggested by Johan on irc.
There are 2 other things that comes to my mind that need discussing:
- 20 sec timeout is perhaps too big
- clear the timer on rfcomm_session_close is perhaps not save then we
can move it to rfcomm_session_del
--
Luiz Augusto von Dentz
Engenheiro de Computa??o
Proper git format-patch
--
Luiz Augusto von Dentz
Engenheiro de Computa??o
Hi Marcel,
On Sun, Jun 21, 2009 at 4:04 PM, Marcel Holtmann<[email protected]> wrote:
> stupid specification. It is just bloody stupid that we have to cleanup
> someone else's stuff that we haven't initiated in the first place :(
>
> diff --git a/net/bluetooth/rfcomm/core.c b/net/bluetooth/rfcomm/core.c
> index 374536e..864c3c4 100644
> --- a/net/bluetooth/rfcomm/core.c
> +++ b/net/bluetooth/rfcomm/core.c
> @@ -466,6 +466,11 @@ static int __rfcomm_dlc_close(struct rfcomm_dlc *d, int err)
>
> ? ? ? ? ? ? ? ?skb_queue_purge(&d->tx_queue);
> ? ? ? ? ? ? ? ?rfcomm_dlc_unlink(d);
> +
> + ? ? ? ? ? ? ? /* Specification demands to cleanup after remote
> + ? ? ? ? ? ? ? ?* initiated session when closing last DLC */
> + ? ? ? ? ? ? ? if (list_empty(&s->dlcs))
> + ? ? ? ? ? ? ? ? ? ? ? rfcomm_session_put(s);
> ? ? ? ?}
As your patch seems to trigger DISC 0 in both sides when the remote
stack cope with DM I would suggest introducing a timer_list to
rfcomm_session so we can give some time to remote stack to take down
dlci 0 clean it up otherwise.
@@ -244,6 +246,33 @@ static inline int rfcomm_check_security(struct
rfcomm_dlc *d)
auth_type);
}
+static void rfcomm_session_timeout(unsigned long arg)
+{
+ struct rfcomm_session *s = (void *) arg;
+
+ BT_DBG("session %p state %ld", s, s->state);
+
+ set_bit(RFCOMM_TIMED_OUT, &s->flags);
+ rfcomm_session_put(s);
+ rfcomm_schedule(RFCOMM_SCHED_TIMEO);
+}
+
+static void rfcomm_session_set_timer(struct rfcomm_session *s, long timeout)
+{
+ BT_DBG("session %p state %ld timeout %ld", s, s->state, timeout);
+
+ if (!mod_timer(&s->timer, jiffies + timeout))
+ rfcomm_session_hold(s);
+}
+
+static void rfcomm_session_clear_timer(struct rfcomm_session *s)
+{
+ BT_DBG("session %p state %ld", s, s->state);
+
+ if (timer_pending(&s->timer) && del_timer(&s->timer))
+ rfcomm_session_put(s);
+}
+
/* ---- RFCOMM DLCs ---- */
static void rfcomm_dlc_timeout(unsigned long arg)
{
@@ -320,6 +349,7 @@ static void rfcomm_dlc_link(struct rfcomm_session
*s, struct rfcomm_dlc *d)
rfcomm_session_hold(s);
+ rfcomm_session_clear_timer(s);
rfcomm_dlc_hold(d);
list_add(&d->list, &s->dlcs);
d->session = s;
@@ -335,6 +365,9 @@ static void rfcomm_dlc_unlink(struct rfcomm_dlc *d)
d->session = NULL;
rfcomm_dlc_put(d);
+ if (list_empty(&s->dlcs))
+ rfcomm_session_set_timer(s, RFCOMM_DISC_TIMEOUT);
+
rfcomm_session_put(s);
}
@@ -454,6 +487,7 @@ static int __rfcomm_dlc_close(struct rfcomm_dlc *d, int err)
rfcomm_schedule(RFCOMM_SCHED_AUTH);
break;
}
+
/* Fall through */
default:
@@ -567,6 +601,8 @@ static struct rfcomm_session
*rfcomm_session_add(struct socket *sock, int state)
BT_DBG("session %p sock %p", s, sock);
+ setup_timer(&s->timer, rfcomm_session_timeout, (unsigned long)s);
+
INIT_LIST_HEAD(&s->dlcs);
s->state = state;
s->sock = sock;
@@ -639,6 +675,7 @@ static void rfcomm_session_close(struct
rfcomm_session *s, int err)
__rfcomm_dlc_close(d, err);
}
+ rfcomm_session_clear_timer(s);
rfcomm_session_put(s);
}
@@ -1774,6 +1811,7 @@ static inline void rfcomm_process_dlcs(struct
rfcomm_session *s)
rfcomm_send_dm(s, d->dlci);
else
d->state = BT_CLOSED;
+
__rfcomm_dlc_close(d, ECONNREFUSED);
continue;
}
@@ -1879,6 +1917,11 @@ static inline void rfcomm_process_sessions(void)
struct rfcomm_session *s;
s = list_entry(p, struct rfcomm_session, list);
+ if (test_bit(RFCOMM_TIMED_OUT, &s->flags)) {
+ rfcomm_session_put(s);
+ continue;
+ }
+
if (s->state == BT_LISTEN) {
rfcomm_accept_connection(s);
continue;
--
Luiz Augusto von Dentz
Engenheiro de Computa??o
Hi Luiz,
> > did you actually test this change? And understand it?
> >
>
> Yep, this was actually one of my first attempts to fix the problem and
> it make no difference, but the real problem is not rfcomm_dlc
> reference being hold it is currently rfcomm_session reference which
> are not released until the remote device respond with DISC dlci 0, but
> in case where the remote never respond this reference will be held
> forever which cause the ACL to never be disconnected.
>
> There is 2 session reference being hold, one by rfcomm_dlc_link
> (core.c:321) which rfcomm_dlc_unlink should takes care and another one
> created on rfcomm_accept_connection (core.c:1837) which afaik won't go
> away if the remote device doesn't respond with a proper DISC to dlci
> 0.
stupid specification. It is just bloody stupid that we have to cleanup
someone else's stuff that we haven't initiated in the first place :(
diff --git a/net/bluetooth/rfcomm/core.c b/net/bluetooth/rfcomm/core.c
index 374536e..864c3c4 100644
--- a/net/bluetooth/rfcomm/core.c
+++ b/net/bluetooth/rfcomm/core.c
@@ -466,6 +466,11 @@ static int __rfcomm_dlc_close(struct rfcomm_dlc *d, int err)
skb_queue_purge(&d->tx_queue);
rfcomm_dlc_unlink(d);
+
+ /* Specification demands to cleanup after remote
+ * initiated session when closing last DLC */
+ if (list_empty(&s->dlcs))
+ rfcomm_session_put(s);
}
The patch above should actually fix this, but it is neither compile nor
runtime tested.
If it actually break outgoing connections, which it might, then we have
to add a !d->out to the if statement here and move the whole statement
before rfcomm_dlc_unlink and skb_queue_purge. That is fine anyway since
the rfcomm_dlc_link will always hold at least one session reference
count.
Regards
Marcel
Hi Marcel,
On Sun, Jun 21, 2009 at 1:58 PM, Marcel Holtmann<[email protected]> wrote:
> Hi Luiz,
>
> did you actually test this change? And understand it?
>
Yep, this was actually one of my first attempts to fix the problem and
it make no difference, but the real problem is not rfcomm_dlc
reference being hold it is currently rfcomm_session reference which
are not released until the remote device respond with DISC dlci 0, but
in case where the remote never respond this reference will be held
forever which cause the ACL to never be disconnected.
There is 2 session reference being hold, one by rfcomm_dlc_link
(core.c:321) which rfcomm_dlc_unlink should takes care and another one
created on rfcomm_accept_connection (core.c:1837) which afaik won't go
away if the remote device doesn't respond with a proper DISC to dlci
0.
--
Luiz Augusto von Dentz
Engenheiro de Computa??o
Hi Luiz,
> > so does this fixes it:
> >
> > diff --git a/net/bluetooth/rfcomm/core.c b/net/bluetooth/rfcomm/core.c
> > index 374536e..266c3b7 100644
> > --- a/net/bluetooth/rfcomm/core.c
> > +++ b/net/bluetooth/rfcomm/core.c
> > @@ -1772,8 +1772,7 @@ static inline void rfcomm_process_dlcs(struct
> > rfcomm_sessi
> > rfcomm_dlc_clear_timer(d);
> > if (!d->out)
> > rfcomm_send_dm(s, d->dlci);
> > - else
> > - d->state = BT_CLOSED;
> > + d->state = BT_CLOSED;
> > __rfcomm_dlc_close(d, ECONNREFUSED);
> > continue;
> > }
> >
>
> Not really, rfcomm_dlc is being closed and freed properly, BT_CONNECT2
> or BT_CLOSED doesn't make much difference to __rfcomm_dlc_close as
> they both trigger default case. As I said the code works fine with
> stacks that cope with DM response, when it doesn't you have to
> manually trigger rfcomm_session_put to take care of the reference
> created on rfcomm_accept_connection.
did you actually test this change? And understand it?
Regards
Marcel
Hi Marcel,
On Sun, Jun 21, 2009 at 11:48 AM, Marcel Holtmann<[email protected]> wrot=
e:
> Hi Luiz,
>
> so does this fixes it:
>
> diff --git a/net/bluetooth/rfcomm/core.c b/net/bluetooth/rfcomm/core.c
> index 374536e..266c3b7 100644
> --- a/net/bluetooth/rfcomm/core.c
> +++ b/net/bluetooth/rfcomm/core.c
> @@ -1772,8 +1772,7 @@ static inline void rfcomm_process_dlcs(struct
> rfcomm_sessi
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0rfcomm_dlc_clear_timer(d);
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0if (!d->out)
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0rfcomm_sen=
d_dm(s, d->dlci);
> - =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 else
> - =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 d->state =
=3D BT_CLOSED;
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 d->state =3D BT_CLOSED;
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0__rfcomm_dlc_close(d, ECON=
NREFUSED);
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0continue;
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0}
>
Not really, rfcomm_dlc is being closed and freed properly, BT_CONNECT2
or BT_CLOSED doesn't make much difference to __rfcomm_dlc_close as
they both trigger default case. As I said the code works fine with
stacks that cope with DM response, when it doesn't you have to
manually trigger rfcomm_session_put to take care of the reference
created on rfcomm_accept_connection.
--=20
Luiz Augusto von Dentz
Engenheiro de Computa=E7=E3o
Hi Luiz,
> Finally find out what is the problem, we are currently sending DM to
> reject the connection when using DEFER_SETUP which is fine according
> to RFCOMM spec:
>
> "the responding implementation may replace the "proper" response on
> the Multiplexer Control channel with a DM frame, sent on the referenced DLCI
> to indicate that the DLCI is not open, and that the responder would not grant a
> request to open it later either."
>
> What it doesn't mention is which part is supposed to take down DLCI 0
> in case that there is no other DLC configured, from what I could find
> out the initiator is supposed to take down by sending DISC 0. This
> seems to work fine when using rctest, but some headset may not take
> any action when receiving the DM frame, so what can be done in this
> case?
>
> What about a timeout to trigger rfcomm_session_put and send DISC 0?
> This might solve the problem and we avoid the double DISC 0 in case
> the initiator stack implementation cope with DM.
so does this fixes it:
diff --git a/net/bluetooth/rfcomm/core.c b/net/bluetooth/rfcomm/core.c
index 374536e..266c3b7 100644
--- a/net/bluetooth/rfcomm/core.c
+++ b/net/bluetooth/rfcomm/core.c
@@ -1772,8 +1772,7 @@ static inline void rfcomm_process_dlcs(struct
rfcomm_sessi
rfcomm_dlc_clear_timer(d);
if (!d->out)
rfcomm_send_dm(s, d->dlci);
- else
- d->state = BT_CLOSED;
+ d->state = BT_CLOSED;
__rfcomm_dlc_close(d, ECONNREFUSED);
continue;
}
Regards
Marcel
Hi,
Finally find out what is the problem, we are currently sending DM to
reject the connection when using DEFER_SETUP which is fine according
to RFCOMM spec:
"the responding implementation may replace the "proper" response on
the Multiplexer Control channel with a DM frame, sent on the referenced DLCI
to indicate that the DLCI is not open, and that the responder would not grant a
request to open it later either."
What it doesn't mention is which part is supposed to take down DLCI 0
in case that there is no other DLC configured, from what I could find
out the initiator is supposed to take down by sending DISC 0. This
seems to work fine when using rctest, but some headset may not take
any action when receiving the DM frame, so what can be done in this
case?
What about a timeout to trigger rfcomm_session_put and send DISC 0?
This might solve the problem and we avoid the double DISC 0 in case
the initiator stack implementation cope with DM.
--
Luiz Augusto von Dentz
Engenheiro de Computa??o