2015-10-14 10:18:44

by Dean Jenkins

[permalink] [raw]
Subject: [PATCH v1 0/3] Fix L2CAP ERTM shutdown locking

Details of the issue are described in the thread
[RFC] Bluetooth ERTM L2CAP deadlocks (hung tasks) due to l2cap_sock_shutdown()
and kernel.org Bugzilla https://bugzilla.kernel.org/show_bug.cgi?id=99301

Summary
=======

This is a continuation with the discussion in
http://marc.info/?l=linux-bluetooth&m=143507884817418&w=2

The latest discussion can be found in the thread
[PATCH v1 0/7] Avoid L2CAP ERTM shutdown hung tasks
http://www.spinics.net/lists/linux-bluetooth/msg64576.html

Note some of the patches from the orginal patchset had been applied to
bluetooth-next by the maintainers. This new patchset applies to what is
currently in bluetooth-next.

The solution is for preventing various flavours of hung task relating to
l2cap_sock_shutdown() when the connection type is ERTM. The solution improves
the locking to allow __l2cap_wait_ack() to wait with no locks held.

Locking Solution
================

This solution reorganises the l2cap_sock_shutdown() function which is designed
to only action shutdown of the channel when shutdown is not already in progress.

This solution also reorganizes the mutex lock and is now only protecting
l2cap_chan_close(). This is now consistent with other places where
l2cap_chan_close() is called.

If a conn connection exists, call mutex_lock(&conn->chan_lock)
before calling l2cap_chan_close() to ensure other L2CAP protocol operations
do not interfere.

Note that the conn structure has to be protected from being freed as it is
possible for the connection to be disconnected whilst the locks are not held.
This solution allows the mutex lock to be used even when the connection has
just been disconnected.

In addition this solution reduces the scope of chan locking.

The only place where chan locking is needed is the call to
l2cap_chan_close(chan, 0) which if necessary closes the channel. Therefore, move
the l2cap_chan_lock(chan) and l2cap_chan_unlock(chan) locking calls to be around
l2cap_chan_close(chan, 0).

This allows __l2cap_wait_ack() to be called with no mutex lock and no chan lock
being held so L2CAP messaging over the ACL link can be done unimpaired. Note
that the sk lock is released whilst __l2cap_wait_ack() waits meaning that no
locks are held whilst waiting for ACKs.

Disconnection Request race condition modification
=================================================

There is a L2CAP protocol race between the local peer and the remote peer
demanding disconnection of the L2CAP link.

When L2CAP ERTM is used, l2cap_sock_shutdown() can be called from userland to
disconnect L2CAP. However, there can be a delay introduced by waiting for ACKs.
During this waiting period, the remote peer may have sent a Disconnection
Request.

This means that during this waiting period, the remote peer may have sent a
Disconnection Request which is actioned by the kernel. Therefore, recheck the
shutdown status of the socket after waiting for ACKs because there is no need to
do further processing if the connection has gone.

Testing
=======

We do not use x86 based systems so we are unable to check whether bluetooth-next
is working with this patchset.

SO_LINGER time was not formally tested. Our crash logs suggested that SO_LINGER
time was not used.

This has been tested in kernels 3.8 and 3.14 on an ARM based board for a
commercial product. Results show that the hung tasks no longer occur and
the revised locking seems to work OK.

Some limited testing on ARM 3.14 kernel using l2cap_tester and hci_vhci module
was done but this should be repeated by someone on bluetooth-next before
accepting the changes.

Note that the failure scenario was Media player AVRCP browsing. Waiting for
ACK's is unimportant in this scenario because it does not matter whether
a media browsing request such as "get item" was successfully transferred at the
point of shutting down the channel.

These patches have not been tested on a bluetooth-next based build. However,
bluetooth-next with the patches applied has been built using x86_64_defconfig
with Bluetooth enabled.

v1 changes
==========

0001-Bluetooth-Unwind-l2cap_sock_shutdown.patch was introduced
to handle l2cap_sock_shutdown() function which is designed to only
action shutdown of the channel when shutdown is not already in progress.
This patch also reorganise the code flow by adding a goto to jump to the
end of function handling when shutdown is already being actioned.

0002-Bluetooth-Reorganize-mutex-lock-in-l2cap_sock_shutdo.patch was introduced
to reorganize the mutex lock and also reduce the scope of chan locking.

0003-Bluetooth-l2cap_disconnection_req-priority-over-shut.patch was introduced
to handle a L2CAP protocol race between the local peer and the remote peer
demanding disconnection of the L2CAP link, when L2CAP ERTM is used.

TODO list
=========

Check if __l2cap_wait_ack() is really needed because the L2CAP ACKs only ensure
that the upper layer messsages got to the remote peer. It does not ensure that
an upper layer response message will come back before the shutdown completes.

Feedback
========

Please review and provide comments on this solution. Thanks.

Patchset
========

Patches are based on bluetooth-next master branch (14 October 2015):

Dean Jenkins (3):
Bluetooth: Unwind l2cap_sock_shutdown()
Bluetooth: Reorganize mutex lock in l2cap_sock_shutdown()
Bluetooth: l2cap_disconnection_req priority over shutdown

net/bluetooth/l2cap_sock.c | 71 ++++++++++++++++++++++++++++++----------------
1 file changed, 47 insertions(+), 24 deletions(-)

--
1.8.5.6


2015-10-20 13:50:29

by Marcel Holtmann

[permalink] [raw]
Subject: Re: [PATCH v1 0/3] Fix L2CAP ERTM shutdown locking

Hi Dean,

> Details of the issue are described in the thread
> [RFC] Bluetooth ERTM L2CAP deadlocks (hung tasks) due to l2cap_sock_shutdown()
> and kernel.org Bugzilla https://bugzilla.kernel.org/show_bug.cgi?id=99301
>
> Summary
> =======
>
> This is a continuation with the discussion in
> http://marc.info/?l=linux-bluetooth&m=143507884817418&w=2
>
> The latest discussion can be found in the thread
> [PATCH v1 0/7] Avoid L2CAP ERTM shutdown hung tasks
> http://www.spinics.net/lists/linux-bluetooth/msg64576.html
>
> Note some of the patches from the orginal patchset had been applied to
> bluetooth-next by the maintainers. This new patchset applies to what is
> currently in bluetooth-next.
>
> The solution is for preventing various flavours of hung task relating to
> l2cap_sock_shutdown() when the connection type is ERTM. The solution improves
> the locking to allow __l2cap_wait_ack() to wait with no locks held.
>
> Locking Solution
> ================
>
> This solution reorganises the l2cap_sock_shutdown() function which is designed
> to only action shutdown of the channel when shutdown is not already in progress.
>
> This solution also reorganizes the mutex lock and is now only protecting
> l2cap_chan_close(). This is now consistent with other places where
> l2cap_chan_close() is called.
>
> If a conn connection exists, call mutex_lock(&conn->chan_lock)
> before calling l2cap_chan_close() to ensure other L2CAP protocol operations
> do not interfere.
>
> Note that the conn structure has to be protected from being freed as it is
> possible for the connection to be disconnected whilst the locks are not held.
> This solution allows the mutex lock to be used even when the connection has
> just been disconnected.
>
> In addition this solution reduces the scope of chan locking.
>
> The only place where chan locking is needed is the call to
> l2cap_chan_close(chan, 0) which if necessary closes the channel. Therefore, move
> the l2cap_chan_lock(chan) and l2cap_chan_unlock(chan) locking calls to be around
> l2cap_chan_close(chan, 0).
>
> This allows __l2cap_wait_ack() to be called with no mutex lock and no chan lock
> being held so L2CAP messaging over the ACL link can be done unimpaired. Note
> that the sk lock is released whilst __l2cap_wait_ack() waits meaning that no
> locks are held whilst waiting for ACKs.
>
> Disconnection Request race condition modification
> =================================================
>
> There is a L2CAP protocol race between the local peer and the remote peer
> demanding disconnection of the L2CAP link.
>
> When L2CAP ERTM is used, l2cap_sock_shutdown() can be called from userland to
> disconnect L2CAP. However, there can be a delay introduced by waiting for ACKs.
> During this waiting period, the remote peer may have sent a Disconnection
> Request.
>
> This means that during this waiting period, the remote peer may have sent a
> Disconnection Request which is actioned by the kernel. Therefore, recheck the
> shutdown status of the socket after waiting for ACKs because there is no need to
> do further processing if the connection has gone.
>
> Testing
> =======
>
> We do not use x86 based systems so we are unable to check whether bluetooth-next
> is working with this patchset.
>
> SO_LINGER time was not formally tested. Our crash logs suggested that SO_LINGER
> time was not used.
>
> This has been tested in kernels 3.8 and 3.14 on an ARM based board for a
> commercial product. Results show that the hung tasks no longer occur and
> the revised locking seems to work OK.
>
> Some limited testing on ARM 3.14 kernel using l2cap_tester and hci_vhci module
> was done but this should be repeated by someone on bluetooth-next before
> accepting the changes.
>
> Note that the failure scenario was Media player AVRCP browsing. Waiting for
> ACK's is unimportant in this scenario because it does not matter whether
> a media browsing request such as "get item" was successfully transferred at the
> point of shutting down the channel.
>
> These patches have not been tested on a bluetooth-next based build. However,
> bluetooth-next with the patches applied has been built using x86_64_defconfig
> with Bluetooth enabled.
>
> v1 changes
> ==========
>
> 0001-Bluetooth-Unwind-l2cap_sock_shutdown.patch was introduced
> to handle l2cap_sock_shutdown() function which is designed to only
> action shutdown of the channel when shutdown is not already in progress.
> This patch also reorganise the code flow by adding a goto to jump to the
> end of function handling when shutdown is already being actioned.
>
> 0002-Bluetooth-Reorganize-mutex-lock-in-l2cap_sock_shutdo.patch was introduced
> to reorganize the mutex lock and also reduce the scope of chan locking.
>
> 0003-Bluetooth-l2cap_disconnection_req-priority-over-shut.patch was introduced
> to handle a L2CAP protocol race between the local peer and the remote peer
> demanding disconnection of the L2CAP link, when L2CAP ERTM is used.
>
> TODO list
> =========
>
> Check if __l2cap_wait_ack() is really needed because the L2CAP ACKs only ensure
> that the upper layer messsages got to the remote peer. It does not ensure that
> an upper layer response message will come back before the shutdown completes.
>
> Feedback
> ========
>
> Please review and provide comments on this solution. Thanks.
>
> Patchset
> ========
>
> Patches are based on bluetooth-next master branch (14 October 2015):
>
> Dean Jenkins (3):
> Bluetooth: Unwind l2cap_sock_shutdown()
> Bluetooth: Reorganize mutex lock in l2cap_sock_shutdown()
> Bluetooth: l2cap_disconnection_req priority over shutdown
>
> net/bluetooth/l2cap_sock.c | 71 ++++++++++++++++++++++++++++++----------------
> 1 file changed, 47 insertions(+), 24 deletions(-)

all 3 patches have been applied to bluetooth-next tree.

Regards

Marcel


2015-10-14 10:56:50

by Luiz Augusto von Dentz

[permalink] [raw]
Subject: Re: [PATCH v1 3/3] Bluetooth: l2cap_disconnection_req priority over shutdown

Hi Dean,

On Wed, Oct 14, 2015 at 1:18 PM, Dean Jenkins <[email protected]> wrote:
> There is a L2CAP protocol race between the local peer and
> the remote peer demanding disconnection of the L2CAP link.
>
> When L2CAP ERTM is used, l2cap_sock_shutdown() can be called
> from userland to disconnect L2CAP. However, there can be a
> delay introduced by waiting for ACKs. During this waiting
> period, the remote peer may have sent a Disconnection Request.
> Therefore, recheck the shutdown status of the socket
> after waiting for ACKs because there is no need to do
> further processing if the connection has gone.
>
> Signed-off-by: Dean Jenkins <[email protected]>
> Signed-off-by: Harish Jenny K N <[email protected]>
> ---
> net/bluetooth/l2cap_sock.c | 11 ++++++++++-
> 1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/net/bluetooth/l2cap_sock.c b/net/bluetooth/l2cap_sock.c
> index d06fb54..1bb5515 100644
> --- a/net/bluetooth/l2cap_sock.c
> +++ b/net/bluetooth/l2cap_sock.c
> @@ -1129,9 +1129,17 @@ static int l2cap_sock_shutdown(struct socket *sock, int how)
>
> if (chan->mode == L2CAP_MODE_ERTM &&
> chan->unacked_frames > 0 &&
> - chan->state == BT_CONNECTED)
> + chan->state == BT_CONNECTED) {
> err = __l2cap_wait_ack(sk, chan);
>
> + /* After waiting for ACKs, check whether shutdown
> + * has already been actioned to close the L2CAP
> + * link such as by l2cap_disconnection_req().
> + */
> + if (sk->sk_shutdown)
> + goto has_shutdown;

This still looks like blocking would still be possible, maybe it won't
block indefinitely but it still could block until ack timeout, which
should only happen when SOL_LINGER is set. But there is the more
fundamental question here which is why do we need to wait unacked
packets? Isn't it better to let the remote know we are disconnecting
by sending the disconnect request then it can flush any pending acks
or drop then instead of waiting like this? This also blocks receiving
data because the sk would be locked, anyway userspace would be
blocking on close at this point, so this means we cannot ack packets
either so I don't think the unacked packets could make any difference
if the idea was to prevent loosing data.

> + }
> +
> sk->sk_shutdown = SHUTDOWN_MASK;
> release_sock(sk);
>
> @@ -1162,6 +1170,7 @@ static int l2cap_sock_shutdown(struct socket *sock, int how)
> err = bt_sock_wait_state(sk, BT_CLOSED,
> sk->sk_lingertime);
>
> +has_shutdown:
> l2cap_chan_put(chan);
> sock_put(sk);
>
> --
> 1.8.5.6
>



--
Luiz Augusto von Dentz

2015-10-14 10:18:47

by Dean Jenkins

[permalink] [raw]
Subject: [PATCH v1 3/3] Bluetooth: l2cap_disconnection_req priority over shutdown

There is a L2CAP protocol race between the local peer and
the remote peer demanding disconnection of the L2CAP link.

When L2CAP ERTM is used, l2cap_sock_shutdown() can be called
from userland to disconnect L2CAP. However, there can be a
delay introduced by waiting for ACKs. During this waiting
period, the remote peer may have sent a Disconnection Request.
Therefore, recheck the shutdown status of the socket
after waiting for ACKs because there is no need to do
further processing if the connection has gone.

Signed-off-by: Dean Jenkins <[email protected]>
Signed-off-by: Harish Jenny K N <[email protected]>
---
net/bluetooth/l2cap_sock.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/net/bluetooth/l2cap_sock.c b/net/bluetooth/l2cap_sock.c
index d06fb54..1bb5515 100644
--- a/net/bluetooth/l2cap_sock.c
+++ b/net/bluetooth/l2cap_sock.c
@@ -1129,9 +1129,17 @@ static int l2cap_sock_shutdown(struct socket *sock, int how)

if (chan->mode == L2CAP_MODE_ERTM &&
chan->unacked_frames > 0 &&
- chan->state == BT_CONNECTED)
+ chan->state == BT_CONNECTED) {
err = __l2cap_wait_ack(sk, chan);

+ /* After waiting for ACKs, check whether shutdown
+ * has already been actioned to close the L2CAP
+ * link such as by l2cap_disconnection_req().
+ */
+ if (sk->sk_shutdown)
+ goto has_shutdown;
+ }
+
sk->sk_shutdown = SHUTDOWN_MASK;
release_sock(sk);

@@ -1162,6 +1170,7 @@ static int l2cap_sock_shutdown(struct socket *sock, int how)
err = bt_sock_wait_state(sk, BT_CLOSED,
sk->sk_lingertime);

+has_shutdown:
l2cap_chan_put(chan);
sock_put(sk);

--
1.8.5.6

2015-10-14 10:18:46

by Dean Jenkins

[permalink] [raw]
Subject: [PATCH v1 2/3] Bluetooth: Reorganize mutex lock in l2cap_sock_shutdown()

This commit reorganizes the mutex lock and is now
only protecting l2cap_chan_close(). This is now consistent
with other places where l2cap_chan_close() is called.

If a conn connection exists, call
mutex_lock(&conn->chan_lock) before calling l2cap_chan_close()
to ensure other L2CAP protocol operations do not interfere.

Note that the conn structure has to be protected from being
freed as it is possible for the connection to be disconnected
whilst the locks are not held. This solution allows the mutex
lock to be used even when the connection has just been
disconnected.

This commit also reduces the scope of chan locking.

The only place where chan locking is needed is the call to
l2cap_chan_close(chan, 0) which if necessary closes the channel.
Therefore, move the l2cap_chan_lock(chan) and
l2cap_chan_lock(chan) locking calls to around
l2cap_chan_close(chan, 0).

This allows __l2cap_wait_ack(sk, chan) to be called with no
chan locks being held so L2CAP messaging over the ACL link
can be done unimpaired.

Signed-off-by: Dean Jenkins <[email protected]>
Signed-off-by: Harish Jenny K N <[email protected]>
---
net/bluetooth/l2cap_sock.c | 44 +++++++++++++++++++++++++++-----------------
1 file changed, 27 insertions(+), 17 deletions(-)

diff --git a/net/bluetooth/l2cap_sock.c b/net/bluetooth/l2cap_sock.c
index ca5598d..d06fb54 100644
--- a/net/bluetooth/l2cap_sock.c
+++ b/net/bluetooth/l2cap_sock.c
@@ -1111,6 +1111,8 @@ static int l2cap_sock_shutdown(struct socket *sock, int how)
if (!sk)
return 0;

+ lock_sock(sk);
+
if (sk->sk_shutdown)
goto shutdown_already;

@@ -1122,25 +1124,37 @@ static int l2cap_sock_shutdown(struct socket *sock, int how)
chan = l2cap_pi(sk)->chan;
/* prevent chan structure from being freed whilst unlocked */
l2cap_chan_hold(chan);
- conn = chan->conn;

BT_DBG("chan %p state %s", chan, state_to_string(chan->state));

- if (conn)
- mutex_lock(&conn->chan_lock);
-
- l2cap_chan_lock(chan);
- lock_sock(sk);
-
if (chan->mode == L2CAP_MODE_ERTM &&
chan->unacked_frames > 0 &&
chan->state == BT_CONNECTED)
err = __l2cap_wait_ack(sk, chan);

sk->sk_shutdown = SHUTDOWN_MASK;
-
release_sock(sk);
+
+ l2cap_chan_lock(chan);
+ conn = chan->conn;
+ if (conn)
+ /* prevent conn structure from being freed */
+ l2cap_conn_get(conn);
+ l2cap_chan_unlock(chan);
+
+ if (conn)
+ /* mutex lock must be taken before l2cap_chan_lock() */
+ mutex_lock(&conn->chan_lock);
+
+ l2cap_chan_lock(chan);
l2cap_chan_close(chan, 0);
+ l2cap_chan_unlock(chan);
+
+ if (conn) {
+ mutex_unlock(&conn->chan_lock);
+ l2cap_conn_put(conn);
+ }
+
lock_sock(sk);

if (sock_flag(sk, SOCK_LINGER) && sk->sk_lingertime &&
@@ -1148,20 +1162,16 @@ static int l2cap_sock_shutdown(struct socket *sock, int how)
err = bt_sock_wait_state(sk, BT_CLOSED,
sk->sk_lingertime);

+ l2cap_chan_put(chan);
+ sock_put(sk);
+
+shutdown_already:
if (!err && sk->sk_err)
err = -sk->sk_err;

release_sock(sk);
- l2cap_chan_unlock(chan);
-
- if (conn)
- mutex_unlock(&conn->chan_lock);

- l2cap_chan_put(chan);
- sock_put(sk);
-
-shutdown_already:
- BT_DBG("err: %d", err);
+ BT_DBG("Sock shutdown complete err: %d", err);

return err;
}
--
1.8.5.6

2015-10-14 10:18:45

by Dean Jenkins

[permalink] [raw]
Subject: [PATCH v1 1/3] Bluetooth: Unwind l2cap_sock_shutdown()

l2cap_sock_shutdown() is designed to only action shutdown
of the channel when shutdown is not already in progress.
Therefore, reorganise the code flow by adding a goto
to jump to the end of function handling when shutdown is
already being actioned. This removes one level of code
indentation and make the code more readable.

Signed-off-by: Dean Jenkins <[email protected]>
Signed-off-by: Harish Jenny K N <[email protected]>
---
net/bluetooth/l2cap_sock.c | 32 ++++++++++++++++++--------------
1 file changed, 18 insertions(+), 14 deletions(-)

diff --git a/net/bluetooth/l2cap_sock.c b/net/bluetooth/l2cap_sock.c
index 586b3d5..ca5598d 100644
--- a/net/bluetooth/l2cap_sock.c
+++ b/net/bluetooth/l2cap_sock.c
@@ -1111,6 +1111,11 @@ static int l2cap_sock_shutdown(struct socket *sock, int how)
if (!sk)
return 0;

+ if (sk->sk_shutdown)
+ goto shutdown_already;
+
+ BT_DBG("Handling sock shutdown");
+
/* prevent sk structure from being freed whilst unlocked */
sock_hold(sk);

@@ -1127,23 +1132,21 @@ static int l2cap_sock_shutdown(struct socket *sock, int how)
l2cap_chan_lock(chan);
lock_sock(sk);

- if (!sk->sk_shutdown) {
- if (chan->mode == L2CAP_MODE_ERTM &&
- chan->unacked_frames > 0 &&
- chan->state == BT_CONNECTED)
- err = __l2cap_wait_ack(sk, chan);
+ if (chan->mode == L2CAP_MODE_ERTM &&
+ chan->unacked_frames > 0 &&
+ chan->state == BT_CONNECTED)
+ err = __l2cap_wait_ack(sk, chan);

- sk->sk_shutdown = SHUTDOWN_MASK;
+ sk->sk_shutdown = SHUTDOWN_MASK;

- release_sock(sk);
- l2cap_chan_close(chan, 0);
- lock_sock(sk);
+ release_sock(sk);
+ l2cap_chan_close(chan, 0);
+ lock_sock(sk);

- if (sock_flag(sk, SOCK_LINGER) && sk->sk_lingertime &&
- !(current->flags & PF_EXITING))
- err = bt_sock_wait_state(sk, BT_CLOSED,
- sk->sk_lingertime);
- }
+ if (sock_flag(sk, SOCK_LINGER) && sk->sk_lingertime &&
+ !(current->flags & PF_EXITING))
+ err = bt_sock_wait_state(sk, BT_CLOSED,
+ sk->sk_lingertime);

if (!err && sk->sk_err)
err = -sk->sk_err;
@@ -1157,6 +1160,7 @@ static int l2cap_sock_shutdown(struct socket *sock, int how)
l2cap_chan_put(chan);
sock_put(sk);

+shutdown_already:
BT_DBG("err: %d", err);

return err;
--
1.8.5.6