MIME-Version: 1.0
Date: Sun, 27 Feb 2011 05:30:57 -0600
Subject: Re: HCI core error recovery.
From: Andrei Warkentin
To: linux-bluetooth@vger.kernel.org
Content-Type: text/plain; charset=ISO-8859-1
Sender: linux-bluetooth-owner@vger.kernel.org

On Fri, Feb 18, 2011 at 2:21 PM, Andrei Warkentin wrote:
> On Mon, Feb 14, 2011 at 4:23 PM, Andrei Warkentin wrote:
>> On Sat, Feb 12, 2011 at 12:47 AM, Andrei Warkentin wrote:
>>> On Fri, Feb 11, 2011 at 5:07 PM, Andrei Warkentin wrote:
>>>> Dear List,
>>>>
>>>> I've run into an interesting problem. Excuse me in advance if this was
>>>> already covered here, or for my explanations, since I'm not too
>>>> familiar with overall flow within BlueZ or Bluetooth specifics...
>>>> We've had some hardware config issues that resulted in garbage/malformed
>>>> messages arriving via H4 into the HCI layer. We've since resolved
>>>> these, but it got me thinking. The issues would result in certain HCI
>>>> messages being missed, including occasionally disconnect events being
>>>> missed, and a subsequent connect event would result in a double add.
>>>>
>>>> I was thinking about how to fix, at the very least, the crash. The sysfs
>>>> object is created as a last step after getting a "connection
>>>> completed" HCI message, I think. What I am unsure about is if it's
>>>> safe to just ignore the add if there is already a sysfs entry...
>>>>
>>>> So I would think the HCI core needs some resiliency against
>>>> bad/malicious bluetooth controllers, and should perform error
>>>> recovery/resynchronization. Perhaps there is room for a virtual
>>>> hci controller that just injects various message types to see how well
>>>> the core can cope?
>>>>
>>>> Thanks in advance,
>>>> A
>>>
>>> To further explain the issue, here is what was happening -
>>>
>>> 0) A BT device is paired.
>>> 1) Host goes into sleep mode.
>>> 2) BT device turns off.
>>> 3) Host wakes up due to BT waking the host. Due to UART resume issues,
>>> the HCI message is corrupted. hci_disconn_complete_evt never gets called.
>>> 4) BT device turns on.
>>> 5) devref gets incremented in hci_conn_complete_evt, and is now 2.
>>> 6) BT device turns off. hci_disconn_complete_evt is called, conn hash
>>> is deleted, but sysfs entry not cleaned up since
>>> atomic_dec_and_test(&conn->devref) returns false (devref is still 1).
>>> 7) BT device turns on. sysfs add fails since it never was cleaned up.
>>>
>>> The attached patch takes care of that. I'm not too familiar with BlueZ
>>> (or bluetooth :-(), so I would like your feedback. In particular, I am
>>> unsure about sync connections.
>>> The primary issue overall is that HCI core doesn't handle HCI issues
>>> (whether caused by transport issues, or bad/malicious BT controller).
>>> I am curious if there are other ways to break the core.
>>>
>>> Thanks,
>>> A
>>>
>>
>> Anyone?
>>
>
> Anyone? Who should I talk to about HCI?
>

Anyone pretty please :)? I'm positive what I'm doing isn't necessarily
right, but I do think this is a real issue in current BlueZ code that
needs work. HCI core should be more resilient to HCI transport issues;
after all, the BT HCI spec does mandate specific behavior.

A
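
To make the devref sequence in steps 0-7 easier to follow, here is a rough user-space sketch of it. This is not kernel code and not the attached patch: struct model_conn and the model_* functions are hypothetical names made up for illustration, a plain int stands in for the atomic devref, and a bool stands in for the sysfs entry. It also collapses everything into a single object, which the real core may not do.

```c
#include <stdbool.h>

/* Hypothetical user-space model of one connection; not kernel code. */
struct model_conn {
    int  devref;        /* stands in for conn->devref (atomic in the kernel) */
    bool in_hash;       /* connection is present in the conn hash */
    bool sysfs_present; /* a sysfs entry is currently registered */
};

/* Models a "connection complete" event: take a reference, then try to
 * register the sysfs entry. Returns false when an entry that was never
 * removed is still there (the double add). */
static bool model_conn_complete(struct model_conn *c)
{
    c->in_hash = true;
    c->devref++;
    if (c->sysfs_present)
        return false;
    c->sysfs_present = true;
    return true;
}

/* Models a "disconnection complete" event: drop one reference and remove
 * the sysfs entry only when the count reaches zero. */
static void model_disconn_complete(struct model_conn *c)
{
    c->in_hash = false;
    if (--c->devref == 0)
        c->sysfs_present = false;
}

/* Replays steps 0-7 with the disconnect of step 3 lost.
 * Returns the result of the final sysfs add. */
static bool model_replay_lost_disconnect(struct model_conn *c)
{
    model_conn_complete(c);        /* 0) paired and connected: devref = 1 */
    /* 1-3) device turns off, but the event is corrupted: nothing runs */
    model_conn_complete(c);        /* 4-5) reconnect: devref = 2 */
    model_disconn_complete(c);     /* 6) devref = 1, sysfs entry stays */
    return model_conn_complete(c); /* 7) add collides with the stale entry */
}
```

Calling model_replay_lost_disconnect() on a zero-initialized struct model_conn returns false and leaves the stale entry in place, while a balanced connect/disconnect pair brings devref back to 0 and lets the next add succeed. The point is only that a single lost disconnect leaves the count permanently off by one, so the imbalance never heals on its own.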