From: Doron Keren <[email protected]>
The patch fixes kernel panic which is due to race condition
between the setup of incomming connection and clean-up of the
dead one. Observed in the following case: attached HID device
disconnects unexpectedly (without performing ACL disconnect ),
the device tries to connect again before the ACL link time-out
fires, this translates to the HCI_DISCONNECT, HCI_CONNECT_REQ
events on the same handle, since HCI_DISCONNECT trigers the clean
up of the HID device and handled in different context, the
linking/unlinking connection object to sysfs, may mess up.
Signed-off-by: Ilia Kolominsky <[email protected]>
---
net/bluetooth/hci_sysfs.c | 4 +++-
1 files changed, 3 insertions(+), 1 deletions(-)
diff --git a/net/bluetooth/hci_sysfs.c b/net/bluetooth/hci_sysfs.c
index a6c3aa8..5967d63 100644
--- a/net/bluetooth/hci_sysfs.c
+++ b/net/bluetooth/hci_sysfs.c
@@ -9,6 +9,7 @@
#include <net/bluetooth/bluetooth.h>
#include <net/bluetooth/hci_core.h>
+static int acl_conn_index = 0;
static struct class *bt_class;
struct dentry *bt_debugfs;
@@ -91,7 +92,8 @@ static void add_conn(struct work_struct *work)
struct hci_conn *conn = container_of(work, struct hci_conn, work_add);
struct hci_dev *hdev = conn->hdev;
- dev_set_name(&conn->dev, "%s:%d", hdev->name, conn->handle);
+ acl_conn_index++;
+ dev_set_name(&conn->dev, "%s:%d:%d", hdev->name, conn->handle, acl_conn_index);
dev_set_drvdata(&conn->dev, conn);
--
1.7.4.1
On Wed, Aug 24, 2011 at 11:51 PM, Peter Hurley <[email protected]> wrote:
> Hi David,
>
>> What are possible reasons why an l2cap connection can be available
>> without an underlying ACL connection?
>
> Well, if the ACL connection is gone, the l2cap channels should be gone
> as well, but the matching l2cap sockets stay around because of the
> references on them. But the sockets should be in state BT_CLOSED, which
> is tested just prior to calling hidp_add_connection.
If I disconnect my HID device, the l2cap sockets stay alive for 20
seconds! Even though the baseband connection is closed. I think this
bug is not related to the hid/sysfs bug here, but may be also
interesting.
> Possible reasons why the apparent contradiction:
> 1. Neither socket is locked by the hidp driver AFAICT. If true, then the
> sock state (and by extension, the l2cap channels and hci connection list
> could change at any time). l2cap_conn_del acquires the socket lock prior
> to l2cap_chan_del.
I guess the problem is that the l2cap channels are BT_CONNECTED when
HIDP checks them but while hidp_add_conenction is called the l2cap
channels close and the ACL connection is removed. hidp_add_connection
then fails because it can't find the ACL connection. This is fixed by
my patch but there is still a race condition. The ACL lookup does not
increase reference count of the connection and hence it may disappear
while we hold a pointer to it.
> 2. HCI connection list (conn_hash) is corrupted. hidp_get_device looks
> smp-unsafe to me. I think it needs to be acquiring device lock via
> hci_dev_lock_bh.
> 3. Some other unknown race.
>
> That's why a debug log would help. It would help establish when
> l2cap_conn_del and hci_conn_del are called relative to
> hidp_add_connection, and what the other pre-conditions are.
It is quite hard to reproduce this bug, but I tried again a couple of
times and got a full debug log now. However, after looking at it I
noticed that I did not apply the patch from Doron Keren so this log is
triggered by the sysfs-duplicate bug and not by the one I discovered.
Last few lines with OOPS message from the kernel log:
https://gist.github.com/1171138
Full kernel.log for the session (including the log above):
http://dl.dropbox.com/u/1475019/kernel.txt
>
> Regards,
> Peter Hurley
>
I will try again with this patch applied, but I think both bugs are
related to each other.
Regards
David
Hi David,
On Wed, 2011-08-24 at 15:27 -0400, David Herrmann wrote:
> Hi Peter
>=20
> On Tue, Aug 23, 2011 at 5:47 PM, Peter Hurley <[email protected]> =
wrote:
> > David,
> > I know you already posted a patch for this (which looks ok for fixing
> > the oops specifically), but it'd still be helpful to see a debug kernel
> > log that led up to that. There's only a couple of reasons -- all bad --
> > why the hci connection could not be found in the connection list while
> > the ctrl & intr sockets are connected (which are tested just prior to
> > hidp_add_connection).
>=20
> Sorry for the delay. I currently have no other machine to run a debug
> kernel and I really don't like provoking oopses on my working machine.
> However, the next days I will setup a machine and try to send a full
> trace.
> Are there any useful config symbols for bluetooth debug? Despite the
> standard kernel-wide debug symbols.
I understand about trying to invoke oops. A one-line
BT_DBG("");
in the NULL parent error pathway of hidp_setup_hid with your patch would
be just as good (because it's the condition I'm trying to catch).
Don't need (or want) kernel-wide output. If your kernel is configured
for dynamic debug output, this is as easy as:
$ sudo su
$ echo -n 'module bluetooth +p'
> /sys/kernel/debug/dynamic_debug/control
$ echo -n 'module hidp +p'
> /sys/kernel/debug/dynamic_debug/control
Run test, then:
$ echo -n 'module bluetooth -p'
> /sys/kernel/debug/dynamic_debug/control
$ echo -n 'module hidp -p'
> /sys/kernel/debug/dynamic_debug/control
$ exit
Dynamic debug output is enabled in the Kconfig here:
Kernel hacking / Enable dynamic printk() support =3D> Y
> What are possible reasons why an l2cap connection can be available
> without an underlying ACL connection?
Well, if the ACL connection is gone, the l2cap channels should be gone
as well, but the matching l2cap sockets stay around because of the
references on them. But the sockets should be in state BT_CLOSED, which
is tested just prior to calling hidp_add_connection.
Possible reasons why the apparent contradiction:
1. Neither socket is locked by the hidp driver AFAICT. If true, then the
sock state (and by extension, the l2cap channels and hci connection list
could change at any time). l2cap_conn_del acquires the socket lock prior
to l2cap_chan_del.
2. HCI connection list (conn_hash) is corrupted. hidp_get_device looks
smp-unsafe to me. I think it needs to be acquiring device lock via
hci_dev_lock_bh.
3. Some other unknown race.
That's why a debug log would help. It would help establish when
l2cap_conn_del and hci_conn_del are called relative to
hidp_add_connection, and what the other pre-conditions are.
> My fix eliminates the oops but I
> still think that it is not fixing the real problem (anyway, I'd really
> like to see it upstream).
Not my place to say re: the patch. Gustavo Padovan and Marcel Holtmann
are the gatekeepers here. (Looks ok to me though, at least as far as
preventing NULL dereferences and handling the error exit properly).
Regards,
Peter Hurley
Hi Peter
On Tue, Aug 23, 2011 at 5:47 PM, Peter Hurley <[email protected]> wrote:
> David,
> I know you already posted a patch for this (which looks ok for fixing
> the oops specifically), but it'd still be helpful to see a debug kernel
> log that led up to that. There's only a couple of reasons -- all bad --
> why the hci connection could not be found in the connection list while
> the ctrl & intr sockets are connected (which are tested just prior to
> hidp_add_connection).
Sorry for the delay. I currently have no other machine to run a debug
kernel and I really don't like provoking oopses on my working machine.
However, the next days I will setup a machine and try to send a full
trace.
Are there any useful config symbols for bluetooth debug? Despite the
standard kernel-wide debug symbols.
What are possible reasons why an l2cap connection can be available
without an underlying ACL connection? My fix eliminates the oops but I
still think that it is not fixing the real problem (anyway, I'd really
like to see it upstream).
I will try to send a debug log if I can reproduce it.
> Regards,
> Peter Hurley
Cheers
David
SGkgRGF2aWQsIERvcm9uICYgTWFyY2VsLA0KDQpPbiBUaHUsIDIwMTEtMDgtMTggYXQgMDc6NDQg
LTA0MDAsIERhdmlkIEhlcnJtYW5uIHdyb3RlOg0KPiBPbiBUaHUsIEF1ZyAxOCwgMjAxMSBhdCAx
OjIxIFBNLCBLZXJlbiwgRG9yb24gPGRvcm9ua2VyZW5AdGkuY29tPiB3cm90ZToNCj4gPiBIaSBN
YXJjZWwNCj4gPg0KPiA+PiAtLS0tLU9yaWdpbmFsIE1lc3NhZ2UtLS0tLQ0KPiA+PiBGcm9tOiBN
YXJjZWwgSG9sdG1hbm4gW21haWx0bzptYXJjZWxAaG9sdG1hbm4ub3JnXQ0KPiA+PiBTZW50OiBU
aHVyc2RheSwgQXVndXN0IDE4LCAyMDExIDEyOjQyIEFNDQo+ID4+IFRvOiBkb3Jvbi5rZXJlbi5i
bHVlekBnbWFpbC5jb20NCj4gPj4gQ2M6IGxpbnV4LWJsdWV0b290aEB2Z2VyLmtlcm5lbC5vcmc7
IEtlcmVuLCBEb3JvbjsgSWxpYSwgS29sb21pbnNreQ0KPiA+PiBTdWJqZWN0OiBSZTogW1BBVENI
XSBCbHVldG9vdGgtbmV4dDogQWRkIGluY3JlbWVudGFsIGluZGV4aW5nIGluIHN5c2ZzIEhDSQ0K
PiA+PiBjb25uZWN0aW9uIG5hbWUuDQo+ID4+DQo+ID4+IEhpIERvcm9uLA0KPiA+Pg0KPiA+PiA+
IFRoZSBwYXRjaCBmaXhlcyBrZXJuZWwgcGFuaWMgd2hpY2ggaXMgZHVlIHRvIHJhY2UgY29uZGl0
aW9uDQo+ID4+ID4gYmV0d2VlbiB0aGUgc2V0dXAgb2YgaW5jb21taW5nIGNvbm5lY3Rpb24gYW5k
IGNsZWFuLXVwIG9mIHRoZQ0KPiA+PiA+IGRlYWQgb25lLiBPYnNlcnZlZCBpbiB0aGUgZm9sbG93
aW5nIGNhc2U6IGF0dGFjaGVkIEhJRCBkZXZpY2UNCj4gPj4gPiBkaXNjb25uZWN0cyB1bmV4cGVj
dGVkbHkgKHdpdGhvdXQgcGVyZm9ybWluZyBBQ0wgZGlzY29ubmVjdCApLA0KPiA+PiA+IHRoZSBk
ZXZpY2UgdHJpZXMgdG8gY29ubmVjdCBhZ2FpbiBiZWZvcmUgdGhlIEFDTCBsaW5rIHRpbWUtb3V0
DQo+ID4+ID4gZmlyZXMsIHRoaXMgdHJhbnNsYXRlcyB0byB0aGUgSENJX0RJU0NPTk5FQ1QsIEhD
SV9DT05ORUNUX1JFUQ0KPiA+PiA+IGV2ZW50cyBvbiB0aGUgc2FtZSBoYW5kbGUsIHNpbmNlIEhD
SV9ESVNDT05ORUNUIHRyaWdlcnMgdGhlIGNsZWFuDQo+ID4+ID4gdXAgb2YgdGhlIEhJRCBkZXZp
Y2UgYW5kIGhhbmRsZWQgaW4gZGlmZmVyZW50IGNvbnRleHQsIHRoZQ0KPiA+PiA+IGxpbmtpbmcv
dW5saW5raW5nIGNvbm5lY3Rpb24gb2JqZWN0IHRvIHN5c2ZzLCBtYXkgbWVzcyB1cC4NCj4gPj4g
Pg0KPiA+PiA+IFNpZ25lZC1vZmYtYnk6IElsaWEgS29sb21pbnNreSA8aWxpYWtAdGkuY29tPg0K
PiA+PiA+IC0tLQ0KPiA+PiA+ICBuZXQvYmx1ZXRvb3RoL2hjaV9zeXNmcy5jIHwgICAgNCArKyst
DQo+ID4+ID4gIDEgZmlsZXMgY2hhbmdlZCwgMyBpbnNlcnRpb25zKCspLCAxIGRlbGV0aW9ucygt
KQ0KPiA+PiA+DQo+ID4+ID4gZGlmZiAtLWdpdCBhL25ldC9ibHVldG9vdGgvaGNpX3N5c2ZzLmMg
Yi9uZXQvYmx1ZXRvb3RoL2hjaV9zeXNmcy5jDQo+ID4+ID4gaW5kZXggYTZjM2FhOC4uNTk2N2Q2
MyAxMDA2NDQNCj4gPj4gPiAtLS0gYS9uZXQvYmx1ZXRvb3RoL2hjaV9zeXNmcy5jDQo+ID4+ID4g
KysrIGIvbmV0L2JsdWV0b290aC9oY2lfc3lzZnMuYw0KPiA+PiA+IEBAIC05LDYgKzksNyBAQA0K
PiA+PiA+ICAjaW5jbHVkZSA8bmV0L2JsdWV0b290aC9ibHVldG9vdGguaD4NCj4gPj4gPiAgI2lu
Y2x1ZGUgPG5ldC9ibHVldG9vdGgvaGNpX2NvcmUuaD4NCj4gPj4gPg0KPiA+PiA+ICtzdGF0aWMg
aW50IGFjbF9jb25uX2luZGV4ID0gMDsNCj4gPj4gPiAgc3RhdGljIHN0cnVjdCBjbGFzcyAqYnRf
Y2xhc3M7DQo+ID4+ID4NCj4gPj4gPiAgc3RydWN0IGRlbnRyeSAqYnRfZGVidWdmczsNCj4gPj4g
PiBAQCAtOTEsNyArOTIsOCBAQCBzdGF0aWMgdm9pZCBhZGRfY29ubihzdHJ1Y3Qgd29ya19zdHJ1
Y3QgKndvcmspDQo+ID4+ID4gICAgIHN0cnVjdCBoY2lfY29ubiAqY29ubiA9IGNvbnRhaW5lcl9v
Zih3b3JrLCBzdHJ1Y3QgaGNpX2Nvbm4sDQo+ID4+IHdvcmtfYWRkKTsNCj4gPj4gPiAgICAgc3Ry
dWN0IGhjaV9kZXYgKmhkZXYgPSBjb25uLT5oZGV2Ow0KPiA+PiA+DQo+ID4+ID4gLSAgIGRldl9z
ZXRfbmFtZSgmY29ubi0+ZGV2LCAiJXM6JWQiLCBoZGV2LT5uYW1lLCBjb25uLT5oYW5kbGUpOw0K
PiA+PiA+ICsgICBhY2xfY29ubl9pbmRleCsrOw0KPiA+PiA+ICsgICBkZXZfc2V0X25hbWUoJmNv
bm4tPmRldiwgIiVzOiVkOiVkIiwgaGRldi0+bmFtZSwgY29ubi0+aGFuZGxlLA0KPiA+PiBhY2xf
Y29ubl9pbmRleCk7DQo+ID4+ID4NCj4gPj4gPiAgICAgZGV2X3NldF9kcnZkYXRhKCZjb25uLT5k
ZXYsIGNvbm4pOw0KPiA+Pg0KPiA+PiBjYW4gd2UgZ2V0IGEgYml0IG1vcmUgb2YgZGV0YWlscyBv
biB3aGF0IHRoaXMgaXMgYWN0dWFsbHkgdHJ5aW5nIHRvDQo+ID4+IHNvbHZlLiBJIGRvIG5vdCBs
aWtlIHRoaXMgd2F5IG9mIHNvbHZpbmcgaXQgYXQgYWxsLiBJIHRoaW5rIGl0IGlzIHRyeWluZw0K
PiA+PiB0byBjb3ZlciB1cCBzeW1wdG9tcyBhbmQgbm90IGZpeGluZyB0aGUgcmVhbCBpc3N1ZS4N
Cj4gPj4NCj4gPj4gUmVnYXJkcw0KPiA+Pg0KPiA+PiBNYXJjZWwNCj4gPj4NCj4gPg0KPiA+IFRo
ZSBzY2VuYXJpbyB0aGF0IGNhdXNlcyB0aGUga2VybmVsIHBhbmljIEhhcHBlbnMgd2hlbiBISUQg
ZGV2aWNlIGRpc2Nvbm5lY3QgYW5kIGNvbm5lY3QgZmFzdC4gV2hlbiBISUQgZGV2aWNlIGRpc2Nv
bm5lY3RzIHRoZSBuYW1lIGNsZWFuLXVwIGFwcGVhcnMgMTAwLTMwMG1zZWMgYWZ0ZXIgdGhlICJo
Y2lfZGlzY29ubl9jb21wbGV0ZV9ldnQoKSIsIGJlY2F1c2UgdGhlIHR3byBMMkNBUCBjaGFubmVs
cyBuZWVkIHRvIGJlIGNsb3NlZCBmaXJzdC4NCg0KRG9yb24sDQpJdCdzIG5vdCBiZWNhdXNlIHRo
ZSBMMkNBUCBjaGFubmVscyBuZWVkIGNsb3NpbmcgLSB0aGUgY2hhbm5lbHMgYXJlDQpjbG9zZWQg
d2l0aGluIHRoZSBoY2lfZGlzY29ubl9jb21wbGV0ZV9ldnQoKSBjYWxsIHRyZWU6DQogICBoY2lf
ZGlzY29ubl9jb21wbGV0ZV9ldnQNCiAgICAgIGhjaV9wcm90b19kaXNjb25uX2NmbQ0KICAgICAg
ICAgbDJjYXBfZGlzY29ubl9jZm0NCiAgICAgICAgICAgIGwyY2FwX2Nvbm5fZGVsDQogICAgICAg
ICAgICAgICBsMmNhcF9jaGFuX2RlbA0KVGhlIHNrLT5za19zdGF0ZV9jaGFuZ2UoKSBpbiBsMmNh
cF9jaGFuX2RlbCB3YWtlcyB1cCB0aGUga2hpZCB0aHJlYWQNCndoaWNoIHRlc3RzIHRoZSBza19z
dGF0ZSBhbmQgZXhpdHMgaXRzIHByb2Nlc3NpbmcgbG9vcCwgZXZlbnR1YWxseQ0KcmVhY2hpbmcg
dGhlIGhjaV9jb25uX3B1dF9kZXZpY2UgYXMgdGhlIHRocmVhZCBzaHV0cyBkb3duLg0KDQpUaGUg
cmVhc29uIGZvciB0aGUgZGVsYXkgaW4gZGVsZXRpbmcgdGhlIHN5c2ZzIGRldmljZSBpcyB0aGF0
IHRoZSBjaGlsZA0KSElEIGRldmljZSBuZWVkcyB0byBiZSBkZXN0cm95ZWQgZmlyc3QgdG8gcHJl
c2VydmUgdGhlIHVldmVudCBvcmRlciB0bw0KdXNlci1zcGFjZSAod2hpY2ggaXMgdGhlIHB1cnBv
c2Ugb2YgaGNpX2Nvbm5faG9sZF9kZXZpY2UvX3B1dF9kZXZpY2UpLg0KDQo+ID4gVGhlIHByb2Js
ZW0gaXMgdGhhdCB0aGUgYmFzZS1iYW5kIGhhcyBhbHJlYWR5IGNsZWFuZWQgdGhlIGhhbmRsZSBu
dW1iZXIgd2hlbiB0aGUgImhjaV9kaXNjb25uX2NvbXBsZXRlX2V2dCgpIiBzZW50LiBJZiBhbm90
aGVyIGNvbm5lY3Rpb24gaXMgaW5pdGlhdGluZyByaWdodCBhZnRlciwgdGhlIGJhc2UtYmFuZCB3
aWxsIHNlbmQgImhjaV9jb25uX3JlcXVlc3RfZXZ0KCkiIHdpdGggdGhlIHNhbWUgaGFuZGxlIG51
bWJlci4gRHVyaW5nIHRoaXMgdGltZSB3ZSBoYXZlIHNpdHVhdGlvbiB0aGF0IHR3byBIQ0kgY29u
bmVjdGlvbnMgaGFzIHRoZSBzYW1lIG5hbWUsIGJlY2F1c2UgdGhlIGhhbmRsZSBudW1iZXIgZnJv
bSB0aGUgYmFzZS1iYW5kIGlzIHRoZSBzYW1lLiBUaGVyZSBpcyBubyByZWFzb24gZm9yIHRoZSB0
d28gSENJIGNvbm5lY3Rpb25zIHRvIHNoYXJlIHRoZSBzYW1lIHJlc291cmNlLCBuYW1lLiBUaGlz
IGR1cGxpY2F0ZSBuYW1lIHNpdHVhdGlvbiB3aWxsIGNhdXNlIEtlcm5lbCBwYW5pYy4NCj4gPg0K
PiA+IFRoZSByZWFsIGlzc3VlIGlzIHRoYXQgdGhlIEhDSSBkZXZpY2UgY29ubmVjdGlvbiBuYW1l
IGlzIHNhdmVkIGluIHRoZSBTWVNGUw0KPiA+IEluIHRoZSBmb3JtYXQ6ICIvZGV2aWNlcy92aXJ0
dWFsL2JsdWV0b290aC9oY2kwL2hjaTA6MSINCj4gPiBUaGUgbmFtZSBpbiB0aGlzIGZvcm1hdCBk
ZXBlbmRzIGp1c3Qgb24gdGhlIGJhc2UtYmFuZCBoYW5kbGUgdGhhdCByZWNlaXZlZCBpbiB0aGUg
ImhjaV9jb25uX2NvbXBsZXRlX2V2dCgpIi4gVGhlIGRldmljZSBuYW1lIGlzIGNsZWFuZWQganVz
dCB3aGVuIHRoZQ0KPiA+IFZhcmlhYmxlIGNvbm4tPmRldnJlZiBiZWNvbWVzIDAuDQo+ID4gSW4g
dGhlIHNvdXJjZSBjb2RlOg0KPiA+ICAgIGlmIChhdG9taWNfZGVjX2FuZF90ZXN0KCZjb25uLT5k
ZXZyZWYpKQ0KPiA+ICAgICAgICAgICAgICAgIGhjaV9jb25uX2RlbF9zeXNmcyhjb25uKTsNCj4g
Pg0KPiA+IFRoZSBpbmNyZW1lbnRhbCBpbmRleCBpbiB0aGUgbmFtZSBmb3JtYXQ6ICIvZGV2aWNl
cy92aXJ0dWFsL2JsdWV0b290aC9oY2kwL2hjaTA6MTo8SW5kZXg+Ig0KPiA+IFNvbHZlcyB0aGUg
cHJvYmxlbSBvZiB0d28gSENJIGNvbm5lY3Rpb25zIHdpdGggdGhlIHNhbWUgbmFtZS4NCg0KVGhp
cyBpcyB0aGUgc2ltcGxlc3Qgc29sdXRpb24uIFRoZSBvdGhlciB3YXkgLSB3aGljaCBpcyBtb3Jl
IGNvbXBsaWNhdGVkDQpidXQgcHJlc2VydmVzIG5hbWluZyAtIGlzIHRvIGRvIGF3YXkgd2l0aCB0
aGUNCmhjaV9jb25uX2hvbGRfZGV2aWNlL19wdXRfZGV2aWNlLCBhbmQgaW5zdGVhZCwgY29ycmVj
dGx5IHNlcXVlbmNlIEhJRA0KY2hpbGQgZGV2aWNlIGRlc3RydWN0aW9uIHdpdGhpbiB0aGUgZGVs
X2Nvbm4oKSB3b3JrIHF1ZXVlIGhhbmRsZXIgLQ0KYmFzaWNhbGx5IGltcG9zZSB3aGF0IHRoZSBk
ZXZpY2UgbW9kZWwgc2hvdWxkIGFscmVhZHkgZG8uDQoNCj4gSSBjYW4gY29uZmlybSB0aGF0IHRo
ZXJlIGlzIGEgcmVjb25uZWN0aW9uIGlzc3VlIGFuZCBJIGdldCB0aGUgc2FtZQ0KPiBvb3BzIChk
ZXZpY2UgImhjaTA6MSIgaXMgdHJpZWQgdG8gYmUgYWRkZWQgdHdpY2UpIG9uIGZhc3QNCj4gcmVj
b25uZWN0aW9ucy4NCj4gDQo+IEhvd2V2ZXIsIEkgdGVzdGVkIHRoaXMgZml4IGFuZCBNYXJjZWwg
c2VlbXMgcmlnaHQuIEkgbm8gbG9uZ2VyIGdldCB0aGUNCj4gc3lzZnMtb29wcywgYnV0IEkgbm93
IGdldCBhbm90aGVyIG9vcHMgaW4gaGlkcF9hZGRfY29ubmVjdGlvbigpIHNvDQo+IHRoaXMgZG9l
c24ndCBmaXggdGhlIHByb2JsZW0uDQo+IEFmdGVyIHJlc3RhcnRpbmcgdGhlIG1hY2hpbmUgYWZ0
ZXIgdGhpcyBvb3BzLCBteSBsb2cgY29udGFpbnMgb25seQ0KPiBoYWxmIG9mIHRoZSBvb3BzLW1l
c3NhZ2Ugc28gSSBjYW4ndCBwb3N0IHRoaXMgaGVyZS4gSSB3aWxsIHRyeSB0byBnZXQNCj4gYSBm
dWxsIGxvZyBhbmQgcmVwb3J0IGl0IHRoZW4uDQoNCkRhdmlkLA0KSSBrbm93IHlvdSBhbHJlYWR5
IHBvc3RlZCBhIHBhdGNoIGZvciB0aGlzICh3aGljaCBsb29rcyBvayBmb3IgZml4aW5nDQp0aGUg
b29wcyBzcGVjaWZpY2FsbHkpLCBidXQgaXQnZCBzdGlsbCBiZSBoZWxwZnVsIHRvIHNlZSBhIGRl
YnVnIGtlcm5lbA0KbG9nIHRoYXQgbGVkIHVwIHRvIHRoYXQuIFRoZXJlJ3Mgb25seSBhIGNvdXBs
ZSBvZiByZWFzb25zIC0tIGFsbCBiYWQgLS0NCndoeSB0aGUgaGNpIGNvbm5lY3Rpb24gY291bGQg
bm90IGJlIGZvdW5kIGluIHRoZSBjb25uZWN0aW9uIGxpc3Qgd2hpbGUNCnRoZSBjdHJsICYgaW50
ciBzb2NrZXRzIGFyZSBjb25uZWN0ZWQgKHdoaWNoIGFyZSB0ZXN0ZWQganVzdCBwcmlvciB0bw0K
aGlkcF9hZGRfY29ubmVjdGlvbikuDQoNCk1hcmNlbCwNCkl0IHNlZW1zIHRvIG1lIHRoYXQgaGlk
cF9nZXRfZGV2aWNlLCByZmNvbW1fZ2V0X2RldmljZSAmIGJuZXBfZ2V0X2RldmljZQ0KYXJlIHVu
c2FmZS4gRWxzZXdoZXJlLCBjb25jdXJyZW50IGFjY2VzcyB0byB0aGUgY29ubl9oYXNoIGxpc3Qg
aXMNCnByb3RlY3RlZCBieSBhY3F1aXJpbmcgdGhlIGRldmljZSBsb2NrIChoY2lfZGV2X2xvY2sv
X3VubG9jayBhbmQgX2JoDQp2YXJpYW50cykuIENhbiB5b3UgY29uZmlybSBteSB1bmRlcnN0YW5k
aW5nIGhlcmU/DQoNClJlZ2FyZHMsDQpQZXRlciBIdXJsZXkNCg==
On Thu, Aug 18, 2011 at 1:21 PM, Keren, Doron <[email protected]> wrote:
> Hi Marcel
>
>> -----Original Message-----
>> From: Marcel Holtmann [mailto:[email protected]]
>> Sent: Thursday, August 18, 2011 12:42 AM
>> To: [email protected]
>> Cc: [email protected]; Keren, Doron; Ilia, Kolominsky
>> Subject: Re: [PATCH] Bluetooth-next: Add incremental indexing in sysfs H=
CI
>> connection name.
>>
>> Hi Doron,
>>
>> > The patch fixes kernel panic which is due to race condition
>> > between the setup of incomming connection and clean-up of the
>> > dead one. Observed in the following case: attached HID device
>> > disconnects unexpectedly (without performing ACL disconnect ),
>> > the device tries to connect again before the ACL link time-out
>> > fires, this translates to the HCI_DISCONNECT, HCI_CONNECT_REQ
>> > events on the same handle, since HCI_DISCONNECT trigers the clean
>> > up of the HID device and handled in different context, the
>> > linking/unlinking connection object to sysfs, may mess up.
>> >
>> > Signed-off-by: Ilia Kolominsky <[email protected]>
>> > ---
>> > =A0net/bluetooth/hci_sysfs.c | =A0 =A04 +++-
>> > =A01 files changed, 3 insertions(+), 1 deletions(-)
>> >
>> > diff --git a/net/bluetooth/hci_sysfs.c b/net/bluetooth/hci_sysfs.c
>> > index a6c3aa8..5967d63 100644
>> > --- a/net/bluetooth/hci_sysfs.c
>> > +++ b/net/bluetooth/hci_sysfs.c
>> > @@ -9,6 +9,7 @@
>> > =A0#include <net/bluetooth/bluetooth.h>
>> > =A0#include <net/bluetooth/hci_core.h>
>> >
>> > +static int acl_conn_index =3D 0;
>> > =A0static struct class *bt_class;
>> >
>> > =A0struct dentry *bt_debugfs;
>> > @@ -91,7 +92,8 @@ static void add_conn(struct work_struct *work)
>> > =A0 =A0 struct hci_conn *conn =3D container_of(work, struct hci_conn,
>> work_add);
>> > =A0 =A0 struct hci_dev *hdev =3D conn->hdev;
>> >
>> > - =A0 dev_set_name(&conn->dev, "%s:%d", hdev->name, conn->handle);
>> > + =A0 acl_conn_index++;
>> > + =A0 dev_set_name(&conn->dev, "%s:%d:%d", hdev->name, conn->handle,
>> acl_conn_index);
>> >
>> > =A0 =A0 dev_set_drvdata(&conn->dev, conn);
>>
>> can we get a bit more of details on what this is actually trying to
>> solve. I do not like this way of solving it at all. I think it is trying
>> to cover up symptoms and not fixing the real issue.
>>
>> Regards
>>
>> Marcel
>>
>
> The scenario that causes the kernel panic Happens when HID device disconn=
ect and connect fast. When HID device disconnects the name clean-up appears=
100-300msec after the "hci_disconn_complete_evt()", because the two L2CAP =
channels need to be closed first.
> The problem is that the base-band has already cleaned the handle number w=
hen the "hci_disconn_complete_evt()" sent. If another connection is initiat=
ing right after, the base-band will send "hci_conn_request_evt()" with the =
same handle number. During this time we have situation that two HCI connect=
ions has the same name, because the handle number from the base-band is the=
same. There is no reason for the two HCI connections to share the same res=
ource, name. This duplicate name situation will cause Kernel panic.
>
> The real issue is that the HCI device connection name is saved in the SYS=
FS
> In the format: "/devices/virtual/bluetooth/hci0/hci0:1"
> The name in this format depends just on the base-band handle that receive=
d in the "hci_conn_complete_evt()". The device name is cleaned just when th=
e
> Variable conn->devref becomes 0.
> In the source code:
> =A0 =A0if (atomic_dec_and_test(&conn->devref))
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0hci_conn_del_sysfs(conn);
>
> The incremental index in the name format: "/devices/virtual/bluetooth/hci=
0/hci0:1:<Index>"
> Solves the problem of two HCI connections with the same name.
>
> Regards,
> Doron
> --
> To unsubscribe from this list: send the line "unsubscribe linux-bluetooth=
" in
> the body of a message to [email protected]
> More majordomo info at =A0http://vger.kernel.org/majordomo-info.html
>
I can confirm that there is a reconnection issue and I get the same
oops (device "hci0:1" is tried to be added twice) on fast
reconnections.
However, I tested this fix and Marcel seems right. I no longer get the
sysfs-oops, but I now get another oops in hidp_add_connection() so
this doesn't fix the problem.
After restarting the machine after this oops, my log contains only
half of the oops-message so I can't post this here. I will try to get
a full log and report it then.
Regards
David
Hi Marcel
> -----Original Message-----
> From: Marcel Holtmann [mailto:[email protected]]
> Sent: Thursday, August 18, 2011 12:42 AM
> To: [email protected]
> Cc: [email protected]; Keren, Doron; Ilia, Kolominsky
> Subject: Re: [PATCH] Bluetooth-next: Add incremental indexing in sysfs HC=
I
> connection name.
>=20
> Hi Doron,
>=20
> > The patch fixes kernel panic which is due to race condition
> > between the setup of incomming connection and clean-up of the
> > dead one. Observed in the following case: attached HID device
> > disconnects unexpectedly (without performing ACL disconnect ),
> > the device tries to connect again before the ACL link time-out
> > fires, this translates to the HCI_DISCONNECT, HCI_CONNECT_REQ
> > events on the same handle, since HCI_DISCONNECT trigers the clean
> > up of the HID device and handled in different context, the
> > linking/unlinking connection object to sysfs, may mess up.
> >
> > Signed-off-by: Ilia Kolominsky <[email protected]>
> > ---
> > net/bluetooth/hci_sysfs.c | 4 +++-
> > 1 files changed, 3 insertions(+), 1 deletions(-)
> >
> > diff --git a/net/bluetooth/hci_sysfs.c b/net/bluetooth/hci_sysfs.c
> > index a6c3aa8..5967d63 100644
> > --- a/net/bluetooth/hci_sysfs.c
> > +++ b/net/bluetooth/hci_sysfs.c
> > @@ -9,6 +9,7 @@
> > #include <net/bluetooth/bluetooth.h>
> > #include <net/bluetooth/hci_core.h>
> >
> > +static int acl_conn_index =3D 0;
> > static struct class *bt_class;
> >
> > struct dentry *bt_debugfs;
> > @@ -91,7 +92,8 @@ static void add_conn(struct work_struct *work)
> > struct hci_conn *conn =3D container_of(work, struct hci_conn,
> work_add);
> > struct hci_dev *hdev =3D conn->hdev;
> >
> > - dev_set_name(&conn->dev, "%s:%d", hdev->name, conn->handle);
> > + acl_conn_index++;
> > + dev_set_name(&conn->dev, "%s:%d:%d", hdev->name, conn->handle,
> acl_conn_index);
> >
> > dev_set_drvdata(&conn->dev, conn);
>=20
> can we get a bit more of details on what this is actually trying to
> solve. I do not like this way of solving it at all. I think it is trying
> to cover up symptoms and not fixing the real issue.
>=20
> Regards
>=20
> Marcel
>=20
The scenario that causes the kernel panic Happens when HID device disconnec=
t and connect fast. When HID device disconnects the name clean-up appears 1=
00-300msec after the "hci_disconn_complete_evt()", because the two L2CAP ch=
annels need to be closed first.
The problem is that the base-band has already cleaned the handle number whe=
n the "hci_disconn_complete_evt()" sent. If another connection is initiatin=
g right after, the base-band will send "hci_conn_request_evt()" with the sa=
me handle number. During this time we have situation that two HCI connectio=
ns has the same name, because the handle number from the base-band is the s=
ame. There is no reason for the two HCI connections to share the same resou=
rce, name. This duplicate name situation will cause Kernel panic.
The real issue is that the HCI device connection name is saved in the SYSFS=
=20
In the format: "/devices/virtual/bluetooth/hci0/hci0:1"
The name in this format depends just on the base-band handle that received =
in the "hci_conn_complete_evt()". The device name is cleaned just when the
Variable conn->devref becomes 0.
In the source code:
if (atomic_dec_and_test(&conn->devref))
hci_conn_del_sysfs(conn);
The incremental index in the name format: "/devices/virtual/bluetooth/hci0/=
hci0:1:<Index>"
Solves the problem of two HCI connections with the same name.
Regards,
Doron
Hi Doron,
> The patch fixes kernel panic which is due to race condition
> between the setup of incomming connection and clean-up of the
> dead one. Observed in the following case: attached HID device
> disconnects unexpectedly (without performing ACL disconnect ),
> the device tries to connect again before the ACL link time-out
> fires, this translates to the HCI_DISCONNECT, HCI_CONNECT_REQ
> events on the same handle, since HCI_DISCONNECT trigers the clean
> up of the HID device and handled in different context, the
> linking/unlinking connection object to sysfs, may mess up.
>
> Signed-off-by: Ilia Kolominsky <[email protected]>
> ---
> net/bluetooth/hci_sysfs.c | 4 +++-
> 1 files changed, 3 insertions(+), 1 deletions(-)
>
> diff --git a/net/bluetooth/hci_sysfs.c b/net/bluetooth/hci_sysfs.c
> index a6c3aa8..5967d63 100644
> --- a/net/bluetooth/hci_sysfs.c
> +++ b/net/bluetooth/hci_sysfs.c
> @@ -9,6 +9,7 @@
> #include <net/bluetooth/bluetooth.h>
> #include <net/bluetooth/hci_core.h>
>
> +static int acl_conn_index = 0;
> static struct class *bt_class;
>
> struct dentry *bt_debugfs;
> @@ -91,7 +92,8 @@ static void add_conn(struct work_struct *work)
> struct hci_conn *conn = container_of(work, struct hci_conn, work_add);
> struct hci_dev *hdev = conn->hdev;
>
> - dev_set_name(&conn->dev, "%s:%d", hdev->name, conn->handle);
> + acl_conn_index++;
> + dev_set_name(&conn->dev, "%s:%d:%d", hdev->name, conn->handle, acl_conn_index);
>
> dev_set_drvdata(&conn->dev, conn);
can we get a bit more of details on what this is actually trying to
solve. I do not like this way of solving it at all. I think it is trying
to cover up symptoms and not fixing the real issue.
Regards
Marcel
Hi Doron,
> > > The patch fixes kernel panic which is due to race condition
> > > between the setup of incomming connection and clean-up of the
> > > dead one. Observed in the following case: attached HID device
> > > disconnects unexpectedly (without performing ACL disconnect ),
> > > the device tries to connect again before the ACL link time-out
> > > fires, this translates to the HCI_DISCONNECT, HCI_CONNECT_REQ
> > > events on the same handle, since HCI_DISCONNECT trigers the clean
> > > up of the HID device and handled in different context, the
> > > linking/unlinking connection object to sysfs, may mess up.
> > >
> > > Signed-off-by: Ilia Kolominsky <[email protected]>
> > > ---
> > > net/bluetooth/hci_sysfs.c | 4 +++-
> > > 1 files changed, 3 insertions(+), 1 deletions(-)
> > >
> > > diff --git a/net/bluetooth/hci_sysfs.c b/net/bluetooth/hci_sysfs.c
> > > index a6c3aa8..5967d63 100644
> > > --- a/net/bluetooth/hci_sysfs.c
> > > +++ b/net/bluetooth/hci_sysfs.c
> > > @@ -9,6 +9,7 @@
> > > #include <net/bluetooth/bluetooth.h>
> > > #include <net/bluetooth/hci_core.h>
> > >
> > > +static int acl_conn_index = 0;
> > > static struct class *bt_class;
> > >
> > > struct dentry *bt_debugfs;
> > > @@ -91,7 +92,8 @@ static void add_conn(struct work_struct *work)
> > > struct hci_conn *conn = container_of(work, struct hci_conn,
> > work_add);
> > > struct hci_dev *hdev = conn->hdev;
> > >
> > > - dev_set_name(&conn->dev, "%s:%d", hdev->name, conn->handle);
> > > + acl_conn_index++;
> > > + dev_set_name(&conn->dev, "%s:%d:%d", hdev->name, conn->handle,
> > acl_conn_index);
> > >
> > > dev_set_drvdata(&conn->dev, conn);
> >
> > can we get a bit more of details on what this is actually trying to
> > solve. I do not like this way of solving it at all. I think it is trying
> > to cover up symptoms and not fixing the real issue.
> >
> > Regards
> >
> > Marcel
> >
>
> The scenario that causes the kernel panic Happens when HID device disconnect and connect fast. When HID device disconnects the name clean-up appears 100-300msec after the "hci_disconn_complete_evt()", because the two L2CAP channels need to be closed first.
> The problem is that the base-band has already cleaned the handle number when the "hci_disconn_complete_evt()" sent. If another connection is initiating right after, the base-band will send "hci_conn_request_evt()" with the same handle number. During this time we have situation that two HCI connections has the same name, because the handle number from the base-band is the same. There is no reason for the two HCI connections to share the same resource, name. This duplicate name situation will cause Kernel panic.
>
> The real issue is that the HCI device connection name is saved in the SYSFS
> In the format: "/devices/virtual/bluetooth/hci0/hci0:1"
> The name in this format depends just on the base-band handle that received in the "hci_conn_complete_evt()". The device name is cleaned just when the
> Variable conn->devref becomes 0.
> In the source code:
> if (atomic_dec_and_test(&conn->devref))
> hci_conn_del_sysfs(conn);
>
> The incremental index in the name format: "/devices/virtual/bluetooth/hci0/hci0:1:<Index>"
> Solves the problem of two HCI connections with the same name.
so I had another look at this and we are still trying to fix the
symptoms and not the real bug.
The real bug here is that we have basically sysfs file creation racing
with the HCI event handling. One done inside a tasklet and the other
done inside a workqueue.
We have this problem since a long time and we really have to move HCI
processing into a workqueue. That is the only way to finally be able to
remove all races that might happen here.
Regards
Marcel