Return-Path: Message-ID: <55715F47.7070900@mentor.com> Date: Fri, 5 Jun 2015 09:35:19 +0100 From: Dean Jenkins MIME-Version: 1.0 To: BlueZ development CC: Marcel Holtmann , Gustavo Padovan , Johan Hedberg , "Frkuska, Joshua" Subject: Re: [RFC] Bluetooth ERTM L2CAP deadlocks (hung tasks) due to l2cap_sock_shutdown() References: <554B8439.9020409@mentor.com> <55683B1E.7090001@mentor.com> In-Reply-To: <55683B1E.7090001@mentor.com> Content-Type: text/plain; charset="utf-8"; format=flowed List-ID: Hi Marcel, Gustavo and Johan, We are working on a fix for the Bluetooth ERTM L2CAP deadlock (hung task) issue. In particular to fix kernel 3.8, however, I have forward-ported the patches to bluetooth-next and I believe the fix is appropriate for bluetooth-next as sjanc on IRC was able to reproduce the deadlock on bluetooth-next. I have uploaded our set of proposed patches in l2cap_ertm_shutdown_deadlock_patches.tgz to https://bugzilla.kernel.org/show_bug.cgi?id=99301 We are still testing so the patches may not be the final versions but the signs are good that the solution works. Feel free to comment and collaborate. I use the nick "djenkins" in the Bluetooth IRC chat room, please connect me if you wish to discuss the issue. Thanks for your time. Regards, Dean On 29/05/15 11:10, Dean Jenkins wrote: > Hi Marcel, Gustavo and Johan, > > My original E-mail is below for completeness so that you understand > the context of this new E-mail. > > We have been successful in demonstrating a L2CAP ERTM shutdown > deadlock by using the Bluez l2test utility. We think the latest 4.x > kernel can also deadlock (not tested). > > Details of the simple test case are as follows: > > Apparatus > ====== > > Raspberry Pi (RPi) original B variant (single ARM core) running Linux > Raspian with kernel 3.18.11+ and a USB BT v4.0 dongle. bluez is v5.15 > (locally built) > Dell laptop running Linux Mageia 4 with kernel 3.14.32 and an on-board > USB BT dongle. bluez is v5.11 > > Method > ===== > > INITIAL SETTINGS: > > Ensure bluetoothd is running on the laptop and RPi so the D-Bus > services are available for the Bluez utilities to use > Run bluetoothctl on the laptop and RPi and ensure that an adaptor is > shown > > Type in the following commands on both systems: > agent on > default-agent > power on > discoverable on > scan on > > Both systems should now detect each other > > Make a note of each device's BT address. Call them and > > > On the RPi pair with the laptop (visa versa will also work): > > On the RPi do command > pair > Accept the pin code > > On the laptop do > Accept the pin code > > Pairing should now be successful. bluetoothd must continue to run so > that the linkkeys can be serviced by the agent. > > bluetoothctl can be quit on both systems > > USING L2TEST > > l2test can be used in various modes of operation controlled via > command line options. l2test can be used in a client-server > arrangement whereby 1 BT device listens for a connection (server) and > the other BT device connects (client) to the listening device. > > For this testcase: > > RPi is the l2test client (unit under test) > laptop is the l2test server > > On the laptop use the command: > l2test -i hci0 -X ertm -d > This command listens for connections with optional L2CAP ERTM > supported and dumps out the received data. > > TEST 1: NON-ERTM CONNECTION > > On the RPi use the command: > l2test -s -b 1024 -B /dev/urandom -P 4113 > This command sends 1024 bytes to the laptop via PSM 4113 used by > l2test over a L2CAP basic mode channel. > > TEST 2: ERTM CONNECTION > > On the RPi use the command: > l2test -s -b 1024 -B /dev/urandom -X ertm -P 4113 > This command sends 1024 bytes to the laptop via PSM 4113 used by > l2test over an ERTM channel > > Results > ===== > > TEST 1 > > root@raspberrypi:~# l2test -s -b 1024 -B /dev/urandom -P 4113 > 5C:F3:70:68:BA:DD > l2test[10346]: Connected to 5C:F3:70:68:BA:DD (bredr, psm 4113, scid 64) > l2test[10346]: Local device 5C:F3:70:68:BA:DA (bredr, psm 4113, scid 64) > l2test[10346]: Options [imtu 672, omtu 672, flush_to 65535, mode 0, > handle 11, class 0x000000, priority 0, rcvbuf 163840] > l2test[10346]: Sending ... > l2test[10346]: Closing channel ... > l2test[10346]: Done > l2test[10346]: Exit > > Using basic L2CAP mode, the channel sends and closes OK. > l2test terminates as expected. > > TEST 2 > > root@raspberrypi:~# l2test -s -b 1024 -B /dev/urandom -X ertm -P 4113 > 5C:F3:70:68:BA:DD > l2test[10366]: Connected to 5C:F3:70:68:BA:DD (bredr, psm 4113, scid 64) > l2test[10366]: Local device 5C:F3:70:68:BA:DA (bredr, psm 4113, scid 64) > l2test[10366]: Options [imtu 672, omtu 672, flush_to 65535, mode 3, > handle 11, class 0x000000, priority 0, rcvbuf 163840] > l2test[10366]: Sending ... > l2test[10366]: Closing channel ... > > Using L2CAP ERTM, the channel is instructed to close but hangs. > l2test fails to naturally terminate. > > From Bluez 5.15 tools/l2test.c: > static void send_mode(int sk) > { > do_send(sk); > > syslog(LOG_INFO, "Closing channel ..."); > if (shutdown(sk, SHUT_RDWR) < 0) > syslog(LOG_INFO, "Close failed: %m"); > else > syslog(LOG_INFO, "Done"); > } > > The -s send l2test command line option calls send_mode() to send the > data via do_send() and then closes the channel with the call to > shutdown(). > do_send() is probably not checking that the data was successfully sent > before returning meaning potentially not all the I frames were ACK'ed. > > The results shows that shutdown() fails to return as no "Done" textual > message is seen. > > When the RPi is left in this condition for up to 10 minutes (need to > wait longer than 2 times the hung task detector period plus L2CAP > timer expiry time), dmesg shows: > > [ 601.116119] INFO: task kworker/0:1:19 blocked for more than 120 > seconds. > [ 601.116156] Not tainted 3.18.11+ #781 > [ 601.116169] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > disables this message. > [ 601.116183] kworker/0:1 D c054cbcc 0 19 2 0x00000000 > [ 601.116537] Workqueue: events l2cap_retrans_timeout [bluetooth] > [ 601.116685] [] (__schedule) from [] > (schedule+0x40/0x8c) > [ 601.116727] [] (schedule) from [] > (schedule_preempt_disabled+0x30/0x40) > [ 601.116812] [] (schedule_preempt_disabled) from > [] (__mutex_lock_slowpath+0xb8/0x18c) > [ 601.116856] [] (__mutex_lock_slowpath) from [] > (mutex_lock+0x38/0x3c) > [ 601.117085] [] (mutex_lock) from [] > (l2cap_retrans_timeout+0x24/0x5c [bluetooth]) > [ 601.117371] [] (l2cap_retrans_timeout [bluetooth]) from > [] (process_one_work+0x13c/0x444) > [ 601.117410] [] (process_one_work) from [] > (worker_thread+0x190/0x580) > [ 601.117445] [] (worker_thread) from [] > (kthread+0xd8/0xf4) > [ 601.117488] [] (kthread) from [] > (ret_from_fork+0x14/0x20) > [ 601.117569] INFO: task kworker/u3:2:327 blocked for more than 120 > seconds. > [ 601.117588] Not tainted 3.18.11+ #781 > [ 601.117601] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > disables this message. > [ 601.117614] kworker/u3:2 D c054cbcc 0 327 2 0x00000000 > [ 601.117806] Workqueue: hci0 hci_rx_work [bluetooth] > [ 601.117941] [] (__schedule) from [] > (schedule+0x40/0x8c) > [ 601.117981] [] (schedule) from [] > (schedule_preempt_disabled+0x30/0x40) > [ 601.118056] [] (schedule_preempt_disabled) from > [] (__mutex_lock_slowpath+0xb8/0x18c) > [ 601.118092] [] (__mutex_lock_slowpath) from [] > (mutex_lock+0x38/0x3c) > [ 601.118307] [] (mutex_lock) from [] > (l2cap_get_chan_by_scid+0x28/0x98 [bluetooth]) > [ 601.118696] [] (l2cap_get_chan_by_scid [bluetooth]) from > [] (l2cap_recv_frame+0xa4/0x2cec [bluetooth]) > [ 601.119146] [] (l2cap_recv_frame [bluetooth]) from > [] (l2cap_recv_acldata+0x214/0x22c [bluetooth]) > [ 601.119559] [] (l2cap_recv_acldata [bluetooth]) from > [] (hci_rx_work+0x180/0x288 [bluetooth]) > [ 601.119801] [] (hci_rx_work [bluetooth]) from > [] (process_one_work+0x13c/0x444) > [ 601.119839] [] (process_one_work) from [] > (worker_thread+0x190/0x580) > [ 601.119883] [] (worker_thread) from [] > (kthread+0xd8/0xf4) > [ 601.119919] [] (kthread) from [] > (ret_from_fork+0x14/0x20) > > Conclusions > ======== > > The hung task failure is easy to reproduce with the l2test utility. It > could be argued that the do_send() function should only return when > all the data has been successfully acknowledged but on the other hand, > it is ideal to generate the right conditions for the failure to be > triggered. > > Probably, the low resources on the RPi help to make the failure more > likely than using a PC for example. > > hcidump shows that the S RR frame from the laptop comes 320ms after > the last I frame from the RPi which generates a window of opportunity > for the shutdown() call to execute l2cap_sock_shutdown() before the S > RR frame has been processed. > > It can be seen that the root cause function __l2cap_wait_ack() is not > reported by the hung task detector because the function is running in > an infinite loop but periodically waiting (sleeping) for 200ms so is > not detectable by the hung task detector. The hung task detector is > triggered when the L2CAP recovery timers expire > (l2cap_retrans_timeout) and gets blocked, or when a new frame is > received (l2cap_get_chan_by_scid) and processing gets blocked. > > We looked at a range of kernels from 3.8 to 4.0 which we think can > suffer the deadlock. > > We managed to get a deadlock on Fedora using kernel 4.0.4 but it was a > different test case with an iPhone. > > Please can we work together to find a solution to the deadlock ? > > Note that our interest is in fixing kernels 3.8 and 3.14 so hopefully > any fix to the latest 4.x kernel can be back-ported to the kernels > that we are using. > > Many thanks for your time. > > Regards, > Dean > > On 07/05/15 16:26, Dean Jenkins wrote: >> Hi Marcel, Gustavo and Johan, >> >> Please see the end of the E-mail for a workaround, do you agree with >> the workaround, any comments ? >> >> Description of L2CAP ERTM deadlock >> ====================== >> >> Using kernel 3.8 on an ARM i.MX6 based platform, we have identified >> some deadlock situations involving L2CAP ERTM. We believe kernel >> v4.0.1 can deadlock as well (not tested it) as the affected code is >> similar to kernel 3.8. >> >> With a L2CAP ERTM channel connected, and l2cap_sock_shutdown() is >> called, then __l2cap_wait_ack() is called which is supposed to wait >> for all sent I frames to be acknowledged. >> >> l2cap_sock_shutdown() takes all 3 locks namely: >> >> mutex_lock(&conn->chan_lock); >> l2cap_chan_lock(chan); >> lock_sock(sk); >> >> __l2cap_wait_ack() is implemented as follows: >> >> int __l2cap_wait_ack(struct sock *sk) >> { >> struct l2cap_chan *chan = l2cap_pi(sk)->chan; >> DECLARE_WAITQUEUE(wait, current); >> int err = 0; >> int timeo = HZ/5; >> >> add_wait_queue(sk_sleep(sk), &wait); >> set_current_state(TASK_INTERRUPTIBLE); >> while (chan->unacked_frames > 0 && chan->conn) { >> if (!timeo) >> timeo = HZ/5; >> >> if (signal_pending(current)) { >> err = sock_intr_errno(timeo); >> break; >> } >> >> release_sock(sk); >> timeo = schedule_timeout(timeo); >> lock_sock(sk); >> set_current_state(TASK_INTERRUPTIBLE); >> >> err = sock_error(sk); >> if (err) >> break; >> } >> set_current_state(TASK_RUNNING); >> remove_wait_queue(sk_sleep(sk), &wait); >> return err; >> } >> >> Analysis shows that the while loop checks chan->unacked_frames > 0 >> and chan->conn is !NULL. So when both are true the loop waits >> (sleeps) for 200ms using schedule_timeout() and then rechecks the >> while loop conditions before waiting again. >> >> Unfortunately, during this waiting (sleeping) period the following >> locks are still held: >> mutex_lock(&conn->chan_lock); >> l2cap_chan_lock(chan); >> >> These held locks prevent l2cap_recv_frame() from processing the >> received ACKs for the sent I frames in ERTM. This means >> chan->unacked_frames > 0 remains true. >> >> chan->conn cannot be set to NULL by the channel being closed by other >> means as the held locks prevent the state being updated. >> >> Therefore a deadlock occurs because __l2cap_wait_ack() gets stuck in >> an infinite loop which periodically waits (sleeps). The kernel's hung >> task detector is not able to detect this "hung" thread because the >> thread periodically runs. This means l2cap_sock_shutdown() does not >> return. With debug, we have proven that the infinite loop occurs. >> >> Here is an example hung task log and LOCKDEP output of a L2CAP ERTM >> deadlock scenario: (we do have some other different traces as well so >> more than 1 failure scenario exists) >> >> [ 360.497912] INFO: task kworker/u:0H:7 blocked for more than 180 >> seconds. >> [ 360.497931] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" >> disables this message. >> [ 360.497944] kworker/u:0H D 8046bb28 0 7 2 0x00000000 >> [ 360.498051] Backtrace: >> [ 360.498145] [<8046b624>] (__schedule+0x0/0x790) from [<8046be3c>] >> (schedule+0x88/0x8c) >> [ 360.498231] [<8046bdb4>] (schedule+0x0/0x8c) from [<8046c1c8>] >> (schedule_preempt_disabled+0x18/0x24) >> [ 360.498318] [<8046c1b0>] (schedule_preempt_disabled+0x0/0x24) from >> [<80469b70>] (mutex_lock_nested+0x250/0x3e8) >> [ 360.498495] [<80469920>] (mutex_lock_nested+0x0/0x3e8) from >> [<7f44d2a8>] (l2cap_get_chan_by_scid+0x24/0x54 [bluetooth]) >> [ 360.498696] [<7f44d284>] (l2cap_get_chan_by_scid+0x0/0x54 >> [bluetooth]) from [<7f453f68>] (l2cap_recv_frame+0x1918/0x1cc8 >> [bluetooth]) >> [ 360.498779] r6:0000006b r5:9773ef00 r4:96bfdc00 r3:0000003f >> [ 360.498919] [<7f452650>] (l2cap_recv_frame+0x0/0x1cc8 [bluetooth]) >> from [<7f454e3c>] (l2cap_recv_acldata+0xc0/0x1f8 [bluetooth]) >> [ 360.499088] [<7f454d7c>] (l2cap_recv_acldata+0x0/0x1f8 >> [bluetooth]) from [<7f437c74>] (hci_rx_work+0x1b4/0x2ec [bluetooth]) >> [ 360.499176] r6:9759a000 r5:9759a77c r4:9773ef00 r3:00000000 >> [ 360.499259] [<7f437ac0>] (hci_rx_work+0x0/0x2ec [bluetooth]) from >> [<8003ee1c>] (process_one_work+0x344/0x540) >> [ 360.499352] [<8003ead8>] (process_one_work+0x0/0x540) from >> [<8003f2e4>] (worker_thread+0x298/0x3ec) >> [ 360.499449] [<8003f04c>] (worker_thread+0x0/0x3ec) from >> [<80044a40>] (kthread+0xb4/0xc0) >> [ 360.499543] [<8004498c>] (kthread+0x0/0xc0) from [<8000da30>] >> (ret_from_fork+0x14/0x24) >> [ 360.499622] r7:00000000 r6:00000000 r5:8004498c r4:a40a3e34 >> [ 360.499880] >> Showing all locks held in the system: >> [ 360.499909] 3 locks held by kworker/u:0H/7: >> [ 360.499921] #0: (hdev->name){++++..}, at: [<8003ed28>] >> process_one_work+0x250/0x540 >> [ 360.499968] #1: ((&hdev->rx_work)){+.+...}, at: [<8003ed28>] >> process_one_work+0x250/0x540 >> [ 360.500009] #2: (&conn->chan_lock){+.+...}, at: [<7f44d2a8>] >> l2cap_get_chan_by_scid+0x24/0x54 [bluetooth] >> >> [ 360.500256] 3 locks held by kworker/1:2/64: >> [ 360.500266] #0: (events){.+.+..}, at: [<8003ed28>] >> process_one_work+0x250/0x540 >> [ 360.500307] #1: ((&(&chan->retrans_timer)->work)){+.+...}, at: >> [<8003ed28>] process_one_work+0x250/0x540 >> [ 360.500346] #2: (&chan->lock){+.+...}, at: [<7f450414>] >> l2cap_retrans_timeout+0x24/0x58 [bluetooth] >> >> [ 360.502066] 2 locks held by /1433: >> [ 360.502076] #0: (&conn->chan_lock){+.+...}, at: [<7f455378>] >> l2cap_sock_shutdown+0x34/0x130 [bluetooth] >> [ 360.502179] #1: (&chan->lock){+.+...}, at: [<7f455388>] >> l2cap_sock_shutdown+0x44/0x130 [bluetooth] >> >> >> It can be seen that l2cap_get_chan_by_scid() is blocked so that the >> ACK cannot be processed and l2cap_retrans_timeout() has timed out in >> an attempt to retransmit the I frame but is also blocked. The >> Bluetooth stack is now paralysed. >> >> This scenario will fail every time. The locking seems to be flawed in >> l2cap_sock_shutdown() for L2CAP ERTM. Did it ever work ? >> >> >> Workaround >> ======== >> >> Here is our workaround patch (based on kernel 3.8) which releases all >> the locks before waiting (sleeping): >> >> diff --git a/net/bluetooth/l2cap_core.c b/net/bluetooth/l2cap_core.c >> index a53f47a..bf49066 100644 >> --- a/net/bluetooth/l2cap_core.c >> +++ b/net/bluetooth/l2cap_core.c >> @@ -1766,7 +1766,13 @@ int __l2cap_wait_ack(struct sock *sk) >> } >> >> release_sock(sk); >> + l2cap_chan_unlock(chan); >> + if (chan->conn) >> + mutex_unlock(&chan->conn->chan_lock); >> timeo = schedule_timeout(timeo); >> + if (chan->conn) >> + mutex_lock(&chan->conn->chan_lock); >> + l2cap_chan_lock(chan); >> lock_sock(sk); >> set_current_state(TASK_INTERRUPTIBLE); >> >> diff --git a/net/bluetooth/l2cap_sock.c b/net/bluetooth/l2cap_sock.c >> index 657b3a4..bd0d1ae 100644 >> --- a/net/bluetooth/l2cap_sock.c >> +++ b/net/bluetooth/l2cap_sock.c >> @@ -867,8 +867,10 @@ static int l2cap_sock_shutdown(struct socket >> *sock, int how) >> lock_sock(sk); >> >> if (!sk->sk_shutdown) { >> - if (chan->mode == L2CAP_MODE_ERTM) >> + if (chan->mode == L2CAP_MODE_ERTM) { >> err = __l2cap_wait_ack(sk); >> + conn = chan->conn; >> + } >> >> sk->sk_shutdown = SHUTDOWN_MASK; >> >> If you think this is the right approach then we will formally create >> a patch and send it to the mailing list for review. >> >> Our initial tests show that the deadlock no longer occurs with the >> patch applied, but we are worried about side effects. >> >> We have some doubts: >> >> a) What are the locks protecting ? Do all 3 locks need to be taken ? >> >> b) Can a better architectural solution be created that avoids the >> locks ? >> >> Perhaps arm the channel to disconnect, then after all ACKs have been >> received, call a callback to wake up l2cap_sock_shutdown ? >> >> c) What stops new I frames being sent during the waiting period so >> chan->unacked_frames increases ? >> >> Thanks for your time. >> >> Regards, >> Dean >> > -- Dean Jenkins Embedded Software Engineer Linux Transportation Solutions Mentor Embedded Software Division Mentor Graphics (UK) Ltd.