2010-01-22 10:12:58

by Liang Bao

[permalink] [raw]
Subject: Kernel panic when handing Motorola S305 headset

Hello,

Recently we found an almost 100% reproducible issue when using
Motorola S305 stereo headset and Ubuntu 9.10. The Ubuntu system has
auto-update turned on and keeps up with the latest stable release. At
this point, the system is using:

bash$dpkg -l | grep bluez
ii bluez 4.51-0ubuntu2
Bluetooth tools and daemons
ii bluez-alsa 4.51-0ubuntu2
Bluetooth audio support
ii bluez-compat 4.51-0ubuntu2
BlueZ 3.x compatibility binaries
ii bluez-cups 4.51-0ubuntu2
Bluetooth printer driver for CUPS
rc bluez-gnome 1.8-0ubuntu5
Bluetooth utilities for GNOME
ii bluez-gstreamer 4.51-0ubuntu2
Bluetooth GStreamer support
ii bluez-hcidump 1.42-1build1
Analyses Bluetooth HCI packets
ii bluez-utils 4.51-0ubuntu2
Transitional package

bash$uname -a
Linux wahoo 2.6.31-17-generic #54-Ubuntu SMP Thu Dec 10 16:20:31 UTC
2009 i686 GNU/Linux

The pattern to reproduce the issue is:
1. Pair the S305 headset from Ubuntu using the bluetooth icon in the
task bar, "Set up new device..."
2. In the "Bluetooth Preferences" screen, hit "Remove" button to
remove the paired S305.
3. Power off and then power on S305.
4. The S305 will try to connect and get a negative link key reply.
You'll see the PIN input dialog flash away quickly because after the
negative reply, the LMP link is detached.
5. After step 4, the S305 willl request again and this time you'll be
given the dialog to input the PIN. Put '0000' and you'll see the
desktop frozen. Keyboard, mouse, network ... everything is down.

This phenomenon is quite similar with what Lan Zhu described in last
September(see the end of this mail). Although Ubuntu does't record the
panic dump by default, we did some analysis based on the hcidump log
and the sniffer log. Here's our thought.

The S305 is a stereo headset which supports both A2DP and HFP. When I
power cycle it in step 3, it will try to connect but as the Ubuntu
system has removed it from paired list, connection will fail as
mentioned in step 4 above(see the packets at line 74-115 in the
attached log).

After that, the S305 will try again to initiate a pair, which can be
reflected by messages from line 116 to 164. You can find pair succeeds
and encryption starts.

Then comes our weird thing. The S305 seems remember the capability of
the Ubuntu system and directly try to connect AVDTP and SDP almost at
the same time(line 165,166,172,173). For both L2CAP connection
request, the Ubuntu system responds with "connection pending"(167-169
and 174-176). Shortly the channel for SDP is ready(line 190-192) and
an "authentication pending" response is sent to the headset (line
193-195) for the pending AVDTP connection request. After that the
desktop freezes and network interfaces(I mean ssh here) is down, which
I believe kernel panic happens at this point. The only thing I can do
is to press the power button on the PC case.

With the Droid+S305 problemthat Lan Zhu mentioned in his mail, I
checked into the kernel code carefully. I admit that I can't
understand the kernel part well and therefore below is just my guess.
I'd like to share to see if it can help. The two L2CAP connection
arrive in a quite short interval. These two requests are put into a
link list by bt_accept_enqueue() and l2cap_do_start() will send out
L2CAP_INFO_REQ for both. If the SDP connection is accepted first,
bt_accept_dequeue() will remove it from the link list. This probably
caused the AVDTP entry has its parent as NULL. When AVDTP connection
is accepted in a while, l2cap_conn_start() will run into problem here
becuse the AVDTP is deferred too:

if (bt_sk(sk)->defer_setup) {
struct sock *parent = bt_sk(sk)->parent;
rsp.result = cpu_to_le16(L2CAP_CR_PEND);
rsp.status = cpu_to_le16(L2CAP_CS_AUTHOR_PEND);
parent->sk_data_ready(parent, 0);

}

Again, this is still my guess and could be in wrong direction.
However, the issue is quite easy to be reproduced with the combination
of S305 and Ubuntu 9.10(at least the versions I mentioned above). Is
this also a kind of race condition happening with those weird devices
try to make L2CAP connections simultaneously instead of one by one?
Could someone suggest a fix to prevent the kernel panic in this
scenario?

BTW, the SDP record of my Ubuntu system is listed here.

bash$sdptool records local
Service Name: BlueZ PANU service
Service Description: BlueZ PAN service
Service RecHandle: 0x10000
Service Class ID List:
"PAN User" (0x1115)
Protocol Descriptor List:
"L2CAP" (0x0100)
PSM: 15
"BNEP" (0x000f)
Version: 0x0100
SEQ16: 800 806
Language Base Attr List:
code_ISO639: 0x656e
encoding: 0x6a
base_offset: 0x100
Profile Descriptor List:
"PAN User" (0x1115)
Version: 0x0100

Service Name: BlueZ GN service
Service Description: BlueZ PAN service
Service RecHandle: 0x10001
Service Class ID List:
"PAN Group Network" (0x1117)
Protocol Descriptor List:
"L2CAP" (0x0100)
PSM: 15
"BNEP" (0x000f)
Version: 0x0100
SEQ16: 800 806
Language Base Attr List:
code_ISO639: 0x656e
encoding: 0x6a
base_offset: 0x100
Profile Descriptor List:
"PAN Group Network" (0x1117)
Version: 0x0100

Service Name: BlueZ NAP service
Service Description: BlueZ PAN service
Service RecHandle: 0x10002
Service Class ID List:
"Network Access Point" (0x1116)
Protocol Descriptor List:
"L2CAP" (0x0100)
PSM: 15
"BNEP" (0x000f)
Version: 0x0100
SEQ16: 800 806
Language Base Attr List:
code_ISO639: 0x656e
encoding: 0x6a
base_offset: 0x100
Profile Descriptor List:
"Network Access Point" (0x1116)
Version: 0x0100

Service Name: Headset Audio Gateway
Service RecHandle: 0x10003
Service Class ID List:
"Headset Audio Gateway" (0x1112)
"Generic Audio" (0x1203)
Protocol Descriptor List:
"L2CAP" (0x0100)
"RFCOMM" (0x0003)
Channel: 12
Profile Descriptor List:
"Headset" (0x1108)
Version: 0x0102

Service Name: Hands-Free Audio Gateway
Service RecHandle: 0x10004
Service Class ID List:
"Handsfree Audio Gateway" (0x111f)
"Generic Audio" (0x1203)
Protocol Descriptor List:
"L2CAP" (0x0100)
"RFCOMM" (0x0003)
Channel: 13
Profile Descriptor List:
"Handsfree" (0x111e)
Version: 0x0105

Service Name: Audio Source
Service RecHandle: 0x10005
Service Class ID List:
"Audio Source" (0x110a)
Protocol Descriptor List:
"L2CAP" (0x0100)
PSM: 25
"AVDTP" (0x0019)
uint16: 0x100
Profile Descriptor List:
"Advanced Audio" (0x110d)
Version: 0x0100

Service Name: AVRCP TG
Service RecHandle: 0x10006
Service Class ID List:
"AV Remote Target" (0x110c)
Protocol Descriptor List:
"L2CAP" (0x0100)
PSM: 23
"AVCTP" (0x0017)
uint16: 0x100
Profile Descriptor List:
"AV Remote" (0x110e)
Version: 0x0100

Service Name: AVRCP CT
Service RecHandle: 0x10007
Service Class ID List:
"AV Remote" (0x110e)
Protocol Descriptor List:
"L2CAP" (0x0100)
PSM: 23
"AVCTP" (0x0017)
uint16: 0x100
Profile Descriptor List:
"AV Remote" (0x110e)
Version: 0x0100

From: Lan Zhu <zhu.lan.cn@...>
Subject: Re: null pointer error in bluez kernel
Newsgroups: gmane.linux.bluez.kernel
Date: 2009-10-13 02:13:18 GMT (14 weeks, 3 days, 7 hours and 50 minutes ago)

Hi Marcel,

>
> Reproduce steps:
> 1. Pair and connect with Motorola S305 headset.
> 2. Disconnect and unpair with the headset.
> 3. Turn off and then turn on the headset. The headset will auto pair with phone.
> 4. Input PIN code "0000" on the phone to complete the incoming pairing.
>
> Repeat step 2-4 for many times, then kernel panic may happen right
> after step 4.
>
> From the kernel log, I found if the bt_accept_unlink() is called
> before l2cap_conn_start(), then panic will happen because in the
> bt_accept_unlink() function it set parent to NULL.
>
> Below is the call order when the result is successful. We can see the
> parent is not NULL.
>
> [ 190.162475] bt_accept_enqueue: parent ccda5298, sk cdb68920
> [ 190.170104] bt_accept_enqueue: parent ccda5d10, sk cdf5cd90
> [ 190.191223] l2cap_conn_start: conn cd14a320
> [ 190.218719] l2cap_conn_start: conn cd14a320
> [ 190.223480] l2cap_conn_start: @@@ in l2cap_conn_start --- sk =
> cdb68920, parent = ccda5298
> [ 190.235565] bt_accept_unlink: sk cdb68920 state 6
>
> Below is the call order when the result is kernel panic.
> bt_accept_unlink is called first, then we can see the parent is NULL.
>
> [ 238.188812] bt_accept_enqueue: parent ccda5298, sk ccf60040
> [ 238.196350] bt_accept_enqueue: parent ccda5d10, sk cdf5c960
> [ 238.217590] l2cap_conn_start: conn cd14a848
> [ 238.223449] bt_accept_unlink: sk ccf60040 state 6
> [ 238.229400] l2cap_sock_accept: new socket ccf60040
> [ 238.245086] l2cap_conn_start: conn cd14a848
> [ 238.249725] l2cap_conn_start: @@@ in l2cap_conn_start --- sk =
> ccf60040, parent = (null)
> [ 238.258636] Unable to handle kernel NULL pointer dereference at
> virtual address 00000120
> [ 238.267456] pgd = cdb34000
> [ 238.270446] [00000120] *pgd=8db32031, *pte=00000000, *ppte=00000000
> [ 238.277740] Internal error: Oops: 17 [#1] PREEMPT
>


Attachments:
s305-panics-with-ubuntu.hcidump.log (10.61 kB)

2010-03-15 17:27:29

by Jaikumar Ganesh

[permalink] [raw]
Subject: Re: Kernel panic when handing Motorola S305 headset

Hi Marcel / Gustavo:

On Mon, Mar 1, 2010 at 2:54 AM, Liang Bao <[email protected]> wrote:

> I'd like to continue the previous thread on that Motorola S305 causes
> kernel panic because I did find some clue here. Sorry for misleading
> guess one month ago if any.
>
> Recap the problem here so that you don't to read the first long post.
> The pattern to reproduce the issue is:
> 1. Pair the S305 headset from the phone or the PC( I am using a Ubuntu)
> 2. Remove pairing on the phone or PC
> 3. Power off and then power on S305.
> 4. The S305 will try to connect and since link key removed on this
> side it will try to pair. Input 0000.
> 5. Kernel panic happens. This can be observed on kernel version
> 2.6.29(on the Droid phone, yes, it's a modified version),
> 2.6.31-19-generic on a Ubuntu and a pretty latest 2.6.33-020633rc8
> from Ubuntu official RC release.
>
> The exact kernel crash point is
> if (l2cap_check_security(sk)) {
> if (bt_sk(sk)->defer_setup) {
> struct sock *parent = bt_sk(sk)->parent;
> rsp.result = cpu_to_le16(L2CAP_CR_PEND);
> rsp.status = cpu_to_le16(L2CAP_CS_AUTHOR_PEND);
> >>> parent->sk_data_ready(parent, 0)
> } else {
>
> After tracing the issue for a couple of weeks, I find the difference
> between a normal flow and the panic one. If the user space process
> accepts the L2CAP connection request before L2CAP_INFO_RSP received,
> the following calls will be carried out:
>
> l2cap_sock_accept-> bt_accept_dequeue->bt_accept_unlink(in the branch
> bt_sk(parent)->defer_setup)-> set bt_sk(sk)->parent = NULL. Later when
> L2CAP_INFO_RSP arrives, the l2cap_conn_start() will try to call the
> marked line above and de-referring NULL happen.
>
> To fix this, shall we consider checking if a pending socket can be
> accepted in bt_accept_dequeue() prior to a pending L2CAP_INFO_REQ
> responded? For example, adding a check to BT_CONNECT2 in
> af_bluetooth.c.
>
> 215 if (sk->sk_state == BT_CONNECTED || !newsock ||
> 216 ( bt_sk(parent)->defer_setup &&
> (sk->sk_state != BT_CONNECT2))) {
>
> Again, I am not sure if this will bring a side-effect. Please advise
> the most appropriate way. Thanks.
>
> p.s: I attached partial trace files for those who're interested to the
> traces.
>


Can you ACK or NACK the analysis ?

Thanks
Jaikumar


Attachments:
s305-good-bad-trace.txt (1.85 kB)

2010-03-01 09:54:15

by Liang Bao

[permalink] [raw]
Subject: Re: Kernel panic when handing Motorola S305 headset

I'd like to continue the previous thread on that Motorola S305 causes
kernel panic because I did find some clue here. Sorry for misleading
guess one month ago if any.

Recap the problem here so that you don't to read the first long post.
The pattern to reproduce the issue is:
1. Pair the S305 headset from the phone or the PC( I am using a Ubuntu)
2. Remove pairing on the phone or PC
3. Power off and then power on S305.
4. The S305 will try to connect and since link key removed on this
side it will try to pair. Input 0000.
5. Kernel panic happens. This can be observed on kernel version
2.6.29(on the Droid phone, yes, it's a modified version),
2.6.31-19-generic on a Ubuntu and a pretty latest 2.6.33-020633rc8
from Ubuntu official RC release.

The exact kernel crash point is
if (l2cap_check_security(sk)) {
if (bt_sk(sk)->defer_setup) {
struct sock *parent = bt_sk(sk)->parent;
rsp.result = cpu_to_le16(L2CAP_CR_PEND);
rsp.status = cpu_to_le16(L2CAP_CS_AUTHOR_PEND);
>>> parent->sk_data_ready(parent, 0)
} else {

After tracing the issue for a couple of weeks, I find the difference
between a normal flow and the panic one. If the user space process
accepts the L2CAP connection request before L2CAP_INFO_RSP received,
the following calls will be carried out:

l2cap_sock_accept-> bt_accept_dequeue->bt_accept_unlink(in the branch
bt_sk(parent)->defer_setup)-> set bt_sk(sk)->parent = NULL. Later when
L2CAP_INFO_RSP arrives, the l2cap_conn_start() will try to call the
marked line above and de-referring NULL happen.

To fix this, shall we consider checking if a pending socket can be
accepted in bt_accept_dequeue() prior to a pending L2CAP_INFO_REQ
responded? For example, adding a check to BT_CONNECT2 in
af_bluetooth.c.

215 if (sk->sk_state == BT_CONNECTED || !newsock ||
216 ( bt_sk(parent)->defer_setup &&
(sk->sk_state != BT_CONNECT2))) {

Again, I am not sure if this will bring a side-effect. Please advise
the most appropriate way. Thanks.

p.s: I attached partial trace files for those who're interested to the traces.


Attachments:
s305-good-bad-trace.txt (1.85 kB)

2010-04-08 00:15:56

by Nick Pelly

[permalink] [raw]
Subject: Re: Kernel panic when handing Motorola S305 headset

On Mon, Mar 1, 2010 at 1:54 AM, Liang Bao <[email protected]> wrote:
> I'd like to continue the previous thread on that Motorola S305 causes
> kernel panic because I did find some clue here. Sorry for misleading
> guess one month ago if any.
>
> Recap the problem here so that you don't to read the first long post.
> The pattern to reproduce the issue is:
> 1. Pair the S305 headset from the phone or the PC( I am using a Ubuntu)
> 2. Remove pairing on the phone or PC
> 3. Power off and then power on S305.
> 4. The S305 will try to connect and since link key removed on this
> side it will try to pair. Input 0000.
> 5. Kernel panic happens. This can be observed on kernel version
> 2.6.29(on the Droid phone, yes, it's a modified version),
> 2.6.31-19-generic on a Ubuntu and a pretty latest 2.6.33-020633rc8
> from Ubuntu official RC release.
>
> The exact kernel crash point is
> =A0 =A0 =A0 =A0 =A0 =A0 if (l2cap_check_security(sk)) {
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0if (bt_sk(sk)->defer_setup) {
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0struct sock *parent =3D bt_sk(=
sk)->parent;
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0rsp.result =3D cpu_to_le16(L2C=
AP_CR_PEND);
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0rsp.status =3D cpu_to_le16(L2C=
AP_CS_AUTHOR_PEND);
>>>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 parent->sk_data_ready(parent, 0)
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0} else {
>
> After tracing the issue for a couple of weeks, I find the difference
> between a normal flow and the panic one. If the user space process
> accepts the L2CAP connection request before L2CAP_INFO_RSP received,
> the following calls will be carried out:
>
> l2cap_sock_accept-> bt_accept_dequeue->bt_accept_unlink(in the branch
> bt_sk(parent)->defer_setup)-> set bt_sk(sk)->parent =3D NULL. Later when
> L2CAP_INFO_RSP arrives, the l2cap_conn_start() will try to call the
> marked line above and de-referring NULL happen.
>
> To fix this, shall we consider checking if a pending socket can be
> accepted in bt_accept_dequeue() prior to a pending L2CAP_INFO_REQ
> responded? For example, =A0adding a check to BT_CONNECT2 in
> af_bluetooth.c.
>
> 215 =A0 =A0 =A0 =A0 if (sk->sk_state =3D=3D BT_CONNECTED || !newsock ||
> 216 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 ( bt_sk(parent)->defe=
r_setup &&
> (sk->sk_state !=3D BT_CONNECT2))) {
>
> Again, I am not sure if this will bring a side-effect. Please advise
> the most appropriate way. Thanks.
>
> p.s: I attached partial trace files for those who're interested to the tr=
aces.
>

We can reproduce this issue.

There is nothing preventing an l2cap socket with deferred setup from
accepting an l2cap connection before the info response packet has come
in. This causes the null pointer panic when the info response
eventually arrives.

I'm not sure the best way to fix this. Ideally we'd check
L2CAP_INFO_FEAT_MASK_REQ_DONE, but that is not available in
af_bluetooth.c:bt_accept_dequeue(). I think the problem is that
BT_CONNECT2 - which is available in af_bluetooth.c is used for both
deferred setup and for the case where we are waiting for the info
response.

Marcel, some advice on the best way to proceed here would be helpful.

Nick