MIME-Version: 1.0
Date: Fri, 18 Feb 2011 14:21:28 -0600
Subject: Re: HCI core error recovery.
From: Andrei Warkentin
To: linux-bluetooth@vger.kernel.org
Content-Type: text/plain; charset=ISO-8859-1
Sender: linux-bluetooth-owner@vger.kernel.org

On Mon, Feb 14, 2011 at 4:23 PM, Andrei Warkentin wrote:
> On Sat, Feb 12, 2011 at 12:47 AM, Andrei Warkentin wrote:
>> On Fri, Feb 11, 2011 at 5:07 PM, Andrei Warkentin wrote:
>>> Dear List,
>>>
>>> I've run into an interesting problem. Excuse me in advance if this was
>>> already covered here, or for my explanations, since I'm not too
>>> familiar with overall flow within BlueZ or Bluetooth specifics...
>>> We've had some hardware config issues that resulted in garbage/malformed
>>> messages arriving via H4 into the HCI layer. We've since resolved
>>> these, but it got me thinking. The issues would result in certain HCI
>>> messages being missed, including occasionally disconnect events being
>>> missed, and a subsequent connect event would result in a double add.
>>>
>>> I was thinking about how to fix at the very least the crash. The sysfs
>>> object is created as a last step after getting a "connection
>>> completed" HCI message, I think. What I am unsure about is if it's
>>> safe to just ignore the add if there is already a sysfs entry...
>>>
>>> So I would think the HCI core needs some resiliency against
>>> bad/malignant bluetooth controllers, and perform error
>>> recovery/resynchronization. Perhaps maybe there is room for a virtual
>>> hci controller that just injects various message types to see how well
>>> the core can cope?
>>>
>>> Thanks in advance,
>>> A
>>
>> To further explain the issue, here is what was happening -
>>
>> 0) A BT device is paired.
>> 1) Host goes into sleep mode.
>> 2) BT device turns off.
>> 3) Host wakes up due to BT waking the host. Due to UART resume issues,
>> HCI message corrupted. hci_disconn_complete_evt never gets called.
>> 4) BT device turns on.
>> 5) devref gets incremented in hci_conn_complete_evt, and is now 2.
>> 6) BT device turns off. hci_disconn_complete_evt is called, conn hash
>> is deleted, but sysfs entry not cleaned up since
>> atomic_dec_and_test(&conn->devref) != 0.
>> 7) BT device turns on. sysfs add fails since it never was cleaned up.
>>
>> The attached patch takes care of that. I'm not too familiar with BlueZ
>> (or bluetooth :-(), so I would like your feedback. In particular, I am
>> unsure about sync connections.
>> The primary issue overall is that HCI core doesn't handle HCI issues
>> (whether caused by transport issues, or bad/malicious BT controller).
>> I am curious if there are other ways to break the core.
>>
>> Thanks,
>> A
>>
>
> Anyone?
>

Anyone? Who should I talk to about HCI?
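
For anyone who wants to follow steps 0-7 without the hardware, here is a rough userspace model of the reference counting as I understand it. It is only a sketch: struct model_conn and the model_* functions are made-up names, not the kernel's struct hci_conn or any real HCI core function, and it is not the attached patch. It assumes one reference is taken per "connection complete", one is dropped per "disconnect complete", and the sysfs entry is removed only when the count reaches zero.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace model, not the kernel's struct hci_conn. */
struct model_conn {
    atomic_int devref;   /* stands in for conn->devref */
    bool sysfs_present;  /* stands in for the connection's sysfs entry */
};

/* Take a device reference, as the connect-complete path does in step 5. */
void model_hold_device(struct model_conn *c)
{
    atomic_fetch_add(&c->devref, 1);
}

/* Drop a reference. The sysfs entry is removed only when the count
 * reaches zero. Returns true if the entry was removed. */
bool model_put_device(struct model_conn *c)
{
    int prev = atomic_fetch_sub(&c->devref, 1);

    assert(prev > 0);           /* the model never drops below zero */
    if (prev - 1 != 0)
        return false;           /* step 6: still referenced, entry stays */
    c->sysfs_present = false;
    return true;
}

/* Adding fails when a stale entry is still present (step 7). */
bool model_add_sysfs(struct model_conn *c)
{
    if (c->sysfs_present)
        return false;
    c->sysfs_present = true;
    return true;
}

/* Replays steps 0-7. Returns true if the final add in step 7 succeeds. */
bool model_replay(bool disconnect_event_lost)
{
    struct model_conn c;

    atomic_init(&c.devref, 0);
    c.sysfs_present = false;

    model_hold_device(&c);            /* step 0: connected, devref == 1 */
    (void)model_add_sysfs(&c);

    if (!disconnect_event_lost)       /* steps 2-3: event may be corrupted */
        (void)model_put_device(&c);

    model_hold_device(&c);            /* steps 4-5: devref == 2 if lost */
    (void)model_add_sysfs(&c);        /* result ignored in this model */

    (void)model_put_device(&c);       /* step 6: one disconnect arrives */

    model_hold_device(&c);            /* step 7: device turns on again */
    return model_add_sysfs(&c);
}
```

Calling model_replay(false) models the healthy sequence, and model_replay(true) models the case where the disconnect event in step 3 is lost, so the two results can be compared directly. The single leaked reference is enough to keep the entry alive through step 6, which is why the add in step 7 fails.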