Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932295AbcLGHpb (ORCPT ); Wed, 7 Dec 2016 02:45:31 -0500 Received: from mx0a-0016f401.pphosted.com ([67.231.148.174]:53438 "EHLO mx0b-0016f401.pphosted.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1753020AbcLGHp2 (ORCPT ); Wed, 7 Dec 2016 02:45:28 -0500 From: Amitkumar Karwar To: Marcel Holtmann CC: "linux-bluetooth@vger.kernel.org" , "linux-kernel@vger.kernel.org" , Cathy Luo , Nishant Sarmukadam , Ganapathi Bhat Subject: RE: [PATCH] btusb: fix zero BD address problem during stress test Thread-Topic: [PATCH] btusb: fix zero BD address problem during stress test Thread-Index: AQHSKT9FW6HZ2jYE4Ua/U9hcW6HDhKDbTRhggAr8tgCAAA+fkIAAndsA//+K83CAA0YGsIASm1LQ Date: Wed, 7 Dec 2016 07:44:34 +0000 Message-ID: References: <1476795449-20592-1-git-send-email-akarwar@marvell.com> <077FBE27-DA84-4F11-BC69-F38184CE6B40@holtmann.org> <3926ae98e84248eaa53999e4d466b189@SC-EXCH04.marvell.com> <0175091F-6B2E-49CE-B099-FC6D35C9C1BF@holtmann.org> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-ms-exchange-transport-fromentityheader: Hosted x-originating-ip: [10.93.176.43] Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:,, definitions=2016-12-07_02:,, signatures=0 X-Proofpoint-Details: rule=outbound_notspam policy=outbound score=0 priorityscore=1501 malwarescore=0 suspectscore=0 phishscore=0 bulkscore=0 spamscore=0 clxscore=1011 lowpriorityscore=0 impostorscore=0 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1609300000 definitions=main-1612070137 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from base64 to 8bit by mail.home.local id uB77jb9i015308 Content-Length: 7659 Lines: 207 Hi Marcel, > From: Amitkumar Karwar > Sent: Friday, November 25, 2016 4:51 PM > To: 'Marcel Holtmann' > Cc: 'linux-bluetooth@vger.kernel.org'; 'linux-kernel@vger.kernel.org'; > Cathy Luo; Nishant Sarmukadam; Ganapathi Bhat > Subject: RE: [PATCH] btusb: fix zero BD address problem during stress > test > > Hi Marcel, > > > From: Amitkumar Karwar > > Sent: Wednesday, November 23, 2016 3:14 PM > > To: 'Marcel Holtmann' > > Cc: linux-bluetooth@vger.kernel.org; linux-kernel@vger.kernel.org; > > Cathy Luo; Nishant Sarmukadam; Ganapathi Bhat > > Subject: RE: [PATCH] btusb: fix zero BD address problem during stress > > test > > > > Hi Marcel, > > > > > From: Marcel Holtmann [mailto:marcel@holtmann.org] > > > Sent: Wednesday, November 23, 2016 1:46 PM > > > To: Amitkumar Karwar > > > Cc: linux-bluetooth@vger.kernel.org; linux-kernel@vger.kernel.org; > > > Cathy Luo; Nishant Sarmukadam; Ganapathi Bhat > > > Subject: Re: [PATCH] btusb: fix zero BD address problem during > > > stress test > > > > > > Hi Amitkumar, > > > > > > >>>> From: Amitkumar Karwar [mailto:akarwar@marvell.com] > > > >>>> Sent: Tuesday, October 18, 2016 6:27 PM > > > >>>> To: linux-bluetooth@vger.kernel.org > > > >>>> Cc: marcel@holtmann.org; linux-kernel@vger.kernel.org; Cathy > > Luo; > > > >>>> Nishant Sarmukadam; Ganapathi Bhat; Amitkumar Karwar > > > >>>> Subject: [PATCH] btusb: fix zero BD address problem during > > stress > > > >>>> test > > > >>>> > > > >>>> From: Ganapathi Bhat > > > >>>> > > > >>>> We came across a corner case issue during reboot stress test > in > > > >> which > > > >>>> hciconfig shows BD address is all zero. Reason is we don't get > > > >>>> response for HCI RESET command during initialization > > > >>>> > > > >>>> The issue is tracked to a race where USB subsystem calls > > > >>>> btusb_intr_complete() to deliver a data(NOOP frame) received > on > > > >>>> interrupt endpoint. HCI_RUNNING flag is not yet set by > > > >>>> bluetooth subsystem. So we ignore that frame and return. > > > >>>> > > > >>>> As we missed to resubmit the buffer to interrupt endpoint in > > this > > > >>>> case, we don't get response for BT reset command downloaded > > after > > > >> this. > > > >>>> > > > >>>> This patch handles the corner case to resolve zero BD address > > > >> problem. > > > >>>> > > > >>>> Signed-off-by: Ganapathi Bhat > > > >>>> Signed-off-by: Amitkumar Karwar > > > >>>> --- > > > >>>> drivers/bluetooth/btusb.c | 5 +---- > > > >>>> 1 file changed, 1 insertion(+), 4 deletions(-) > > > >>>> > > > >>>> diff --git a/drivers/bluetooth/btusb.c > > > >>>> b/drivers/bluetooth/btusb.c index 811f9b9..b5596ac 100644 > > > >>>> --- a/drivers/bluetooth/btusb.c > > > >>>> +++ b/drivers/bluetooth/btusb.c > > > >>>> @@ -607,10 +607,7 @@ static void btusb_intr_complete(struct > urb > > > >> *urb) > > > >>>> BT_DBG("%s urb %p status %d count %d", hdev->name, urb, > > urb- > > > >>>>> status, > > > >>>> urb->actual_length); > > > >>>> > > > >>>> - if (!test_bit(HCI_RUNNING, &hdev->flags)) > > > >>>> - return; > > > >>>> - > > > >>>> - if (urb->status == 0) { > > > >>>> + if (urb->status == 0 && test_bit(HCI_RUNNING, &hdev- > > > >flags)) { > > > >>>> hdev->stat.byte_rx += urb->actual_length; > > > >>>> > > > >>>> if (btusb_recv_intr(data, urb->transfer_buffer, > > > >>> > > > >>> Did you get a chance to check this? > > > >>> Please let us know if you have any review comments. > > > >> > > > >> can you explain how this is correct and show me the HCI traces > > > >> for this. > > > >> > > > > > > > > I suppose HCI trace means hcidump logs here. As device hasn't yet > > > initialized, hcidump won't show anything. > > > > We had added debug info in btusb driver to trace the data > received > > > > on > > > all USB endpoints and also checked usbmon logs. > > > > > > use btmon and it will show it. > > > > > > > Here is the sequence of events we observed in a corner case while > > > running stress test. > > > > 1) Inside btusb_open() call ------ Thread 1 > > > > 2) btusb_submit_intr_urb() submits the URB for receiving data on > > > > interrupt endpoint ---- Thread 1 > > > > 3) btusb_intr_complete() gets called to deliver NOP frame from > HCI > > > > controller ---- Thread 2 > > > > 4) HCI_RUNNING isn't set yet. So we return from > > > > btusb_intr_complete() without resubmitting the buffer --- Thread > 2 > > > > 5) Exit btusb_open() ---- Thread 1 > > > > 6) "set_bit(HCI_RUNNING, &hdev->flags)" done by bluetooth core > > > > ---- Thread 1 > > > > > > > > Later HCI_RESET command gets timedout, as we haven't re-submitted > > > buffer for interrupt endpoint in step (4) above. > > > > > > > > Please find attached logs. usbmon log shows first frame received > > > > on > > > interrupt endpoint is NOP(Search for Marvell in log). > > > > > > > > Here is what bluetooth spec says about NOP frame. > > > > > > > > "To indicate to the Host that the Controller is ready to receive > > HCI > > > > command packets, the Controller generates a Command Status event > > > > with Status 0x00 and Command_Opcode 0x0000, and the > > > Num_HCI_Command_Packets > > > > event parameter is set to 1 or more. Command_Opcode, 0x0000 is a > > NOP > > > (No Operation)” > > > > > > So I wonder if we need to remove the HCI_RUNNING logic from the > > > drivers. It is only left in a few USB drivers and I removed the > rest > > > and moved it into the core. I am not in favour of papering over > this > > > issue. I need to understand what is wrong. And actually HCI_RUNNING > > > needs to be taken away from the drivers. So the question is if > > btusb.c > > > still needs it or if that is just a leftover. Meaning is it > > protecting > > > anything? > > > > > > > I can see HCI_RUNNING check being used at below places in btusb.c > > > > 1) btusb_intr_complete > > 2) btusb_bulk_complete > > 3) btusb_isoc_complete > > 4) btusb_tx_complete > > 5) btusb_isoc_tx_complete > > 6) btusb_bcm_set_diag > > 7) btusb_resume > > > > I think, it can be removed from (1), (2) and (3), as we already have > > below check inside hci_recv_frame() > > > > if (!hdev || (!test_bit(HCI_UP, &hdev->flags) > > && !test_bit(HCI_INIT, &hdev->flags))) { > > kfree_skb(skb); > > return -ENXIO; > > } > > > > For (4) and (5), it's just used to decide if we should update "hdev- > > >stat" or not. > > > > Please guide on how should we fix this issue. We observe timeout for > HCI_RESET command sometimes when system is booted. This happens due to > missing urb_submitt() when first frame(NOP) received on INT endpoint is > ignored. Below change would also help. > > --- a/net/bluetooth/hci_core.c > +++ b/net/bluetooth/hci_core.c > @@ -1318,12 +1318,13 @@ static int hci_dev_do_open(struct hci_dev > *hdev) > goto done; > } > > + set_bit(HCI_RUNNING, &hdev->flags); > if (hdev->open(hdev)) { > + clear_bit(HCI_RUNNING, &hdev->flags); > ret = -EIO; > goto done; > } > > - set_bit(HCI_RUNNING, &hdev->flags); > hci_sock_dev_event(hdev, HCI_DEV_OPEN); How can we proceed on this? I'm not much familiar with btusb driver, so not sure if HCI_RUNNING logic can be completely removed. Meanwhile could you accept proposed fix in this patch for the potential issue we spotted during stress tests. The change doesn't have any side-effects. During teardown, race between btusb_close() and btusb_intr_complete() is appropriately taken care of with the help of BTUSB_INTR_RUNNING flag. Regards, Amitkumar Karwar