Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-3.8 required=3.0 tests=DKIM_INVALID,DKIM_SIGNED, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SIGNED_OFF_BY,SPF_PASS, URIBL_BLOCKED autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4E893C4360F for ; Thu, 4 Apr 2019 14:52:25 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 0667520855 for ; Thu, 4 Apr 2019 14:52:25 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="key not found in DNS" (0-bit key) header.d=codeaurora.org header.i=@codeaurora.org header.b="NLvAhuGP"; dkim=fail reason="key not found in DNS" (0-bit key) header.d=codeaurora.org header.i=@codeaurora.org header.b="DBgpnxye" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729325AbfDDOwU (ORCPT ); Thu, 4 Apr 2019 10:52:20 -0400 Received: from smtp.codeaurora.org ([198.145.29.96]:49748 "EHLO smtp.codeaurora.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728647AbfDDOwU (ORCPT ); Thu, 4 Apr 2019 10:52:20 -0400 Received: by smtp.codeaurora.org (Postfix, from userid 1000) id 4439861231; Thu, 4 Apr 2019 14:52:18 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=codeaurora.org; s=default; t=1554389538; bh=MOa6M+P0/LHESP0j4YbKVYUU6hJ3aNlwasBS+PK5v4g=; h=Date:From:To:Cc:Subject:In-Reply-To:References:From; b=NLvAhuGPAr+1wP3hVkBfvNGgkJs1CHbdLvcE+NDt73oXGbJd0UAhn5R1kV9q6AXP4 LKmSxt9URLhK5UnXAFTda2G+C05GNCmxCkTix89u4q4+UbMxEiaGWonFvr3nqh9rzi 3fSKdXFkpntDO6Td9d6+vto9bvhQQbnvGA6wzHa4= Received: from mail.codeaurora.org (localhost.localdomain [127.0.0.1]) by smtp.codeaurora.org (Postfix) with ESMTP id 259BE6110F; Thu, 4 Apr 2019 14:52:16 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=codeaurora.org; s=default; t=1554389536; bh=MOa6M+P0/LHESP0j4YbKVYUU6hJ3aNlwasBS+PK5v4g=; h=Date:From:To:Cc:Subject:In-Reply-To:References:From; b=DBgpnxye683RDJ9EAsVOBWg1+56kD3LfpPm72YCD6umowFKeR+ROUoH2MlMYp9unK bcOfJVFKnfQ+wWn84uQ/+q0wuIJnKr76ObrWotkHosgfdnBcSwSIKeWf7NyNVnBZcq 1SlD5fy/5cheximu4jzanzM+bKHfQInaj4UkgYDc= MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII; format=flowed Content-Transfer-Encoding: 7bit Date: Thu, 04 Apr 2019 20:22:15 +0530 From: Balakrishna Godavarthi To: Matthias Kaehlcke Cc: c-hbandi@codeaurora.org, Marcel Holtmann , Johan Hedberg , linux-bluetooth@vger.kernel.org, linux-kernel@vger.kernel.org, Hemantg Subject: Re: [PATCH 2/2] Bluetooth: hci_qca: wcn3990: Drop baudrate change vendor event In-Reply-To: <20190403161438.GN112750@google.com> References: <20190307004041.38059-3-mka@chromium.org> <20190307182009.GB138592@google.com> <20190307233039.GA69116@google.com> <20190401171202.GH112750@google.com> <1bebaf0766988a9f27cbbcb9bb35e26f@codeaurora.org> <20190402180501.GK112750@google.com> <1200f5c32e029c603667d66a4c62a634@codeaurora.org> <20190403161438.GN112750@google.com> Message-ID: X-Sender: bgodavar@codeaurora.org User-Agent: Roundcube Webmail/1.2.5 Sender: linux-bluetooth-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-bluetooth@vger.kernel.org Hi Matthias, On 2019-04-03 21:44, Matthias Kaehlcke wrote: > On Wed, Apr 03, 2019 at 11:53:26AM +0530, Balakrishna Godavarthi wrote: >> + Harish to update his findings on wcn3998. >> Mean time i will perform a regression to detect the failure. > > On my system it typically reproduces within a few dozen > iterations. Make sure your tree doesn't contain any custom BT patches, > even if they just add logging or fix the timeout during > initialization. Since this problem is timing sensitive it might get > masked. My tree is based on 4.19 LTS with all QCA BT related changes > on top: > https://chromium.googlesource.com/chromiumos/third_party/kernel/+/refs/heads/chromeos-4.19 > > I found some problems during initialization easier to reproduce > when binding and unbinding the device through sysfs, instead of > doing hciconfig up/down, this resembles more the initialization at > boot time. > [Bala]: I am able to replicate this issue. are you seeing the below error message getting logged on the console in the issue case or an different error message. Bluetooth: hci0: QCA TLV response size mismatch >> On 2019-04-02 23:35, Matthias Kaehlcke wrote: >> > On Tue, Apr 02, 2019 at 05:32:54PM +0530, Balakrishna Godavarthi wrote: >> > > Hi Matthias, >> > > >> > > On 2019-04-01 22:42, Matthias Kaehlcke wrote: >> > > > On Mon, Apr 01, 2019 at 01:48:23PM +0530, Balakrishna Godavarthi wrote: >> > > > > Hi Matthias, >> > > > > >> > > > > On 2019-04-01 13:29, Balakrishna Godavarthi wrote: >> > > > > > Hi Matthias, >> > > > > > >> > > > > > Sorry for the late reply i was on vacation. >> > > > > > >> > > > > > On 2019-03-08 05:00, Matthias Kaehlcke wrote: >> > > > > > > On Thu, Mar 07, 2019 at 10:20:09AM -0800, Matthias Kaehlcke wrote: >> > > > > > > > Hi Balakrishna, >> > > > > > > > >> > > > > > > > On Thu, Mar 07, 2019 at 10:35:08AM +0530, Balakrishna Godavarthi >> > > > > > > > wrote: >> > > > > > > > > hi Matthias, >> > > > > > > > > >> > > > > > > > > On 2019-03-07 06:10, Matthias Kaehlcke wrote: >> > > > > > > > > > Firmware download to the WCN3990 often fails with a 'TLV response size >> > > > > > > > > > mismatch' error: >> > > > > > > > > > >> > > > > > > > > > [ 133.064659] Bluetooth: hci0: setting up wcn3990 >> > > > > > > > > > [ 133.489150] Bluetooth: hci0: QCA controller version 0x02140201 >> > > > > > > > > > [ 133.495245] Bluetooth: hci0: QCA Downloading qca/crbtfw21.tlv >> > > > > > > > > > [ 133.507214] Bluetooth: hci0: QCA TLV response size mismatch >> > > > > > > > > > [ 133.513265] Bluetooth: hci0: QCA Failed to download patch (-84) >> > > > > > > > > > >> > > > > > > > > > This is caused by a vendor event that corresponds to an earlier command >> > > > > > > > > > to change the baudrate. The event is not processed in the context of the >> > > > > > > > > > baudrate change and later interpreted as response to the firmware >> > > > > > > > > > download command (which is also a vendor command), but the driver >> > > > > > > > > > detects >> > > > > > > > > > that the event doesn't have the expected amount of associated data. >> > > > > > > > > > >> > > > > > > > > > More details: >> > > > > > > > > > >> > > > > > > > > > For the WCN3990 the vendor command for a baudrate change isn't sent as >> > > > > > > > > > synchronous HCI command, because the controller sends the corresponding >> > > > > > > > > > vendor event with the new baudrate. The event is received and decoded >> > > > > > > > > > after the baudrate change of the host port. >> > > > > > > > > > >> > > > > > > > > > Identify the 'unused' event when it is received and don't add it to >> > > > > > > > > > the queue of RX frames. >> > > > > > > > > > >> > > > > > > > > > Signed-off-by: Matthias Kaehlcke >> > > > > > > > > > --- >> > > > > > > > > >> > > > > > > > > ... >> > > > > > > > > >> > > > > > > > > Can you test by reverting this change "94d6671473924". >> > > > > > > > >> > > > > > > > The issue is still reproducible. >> > > > > > > > >> > > > > > > > > We need at least 15ms minimum delay for the soc to change its baud rate and >> > > > > > > > > respond to the with command complete event. >> > > > > > > > >> > > > > > > > The baudrate change has clearly been successful when the problem is >> > > > > > > > observed, since the host receives the vendor event with the new >> > > > > > > > baudrate. >> > > > > > > >> > > > > > > I forgot to mention this earlier: the controller doesn't send a >> > > > > > > command complete event for the command, or at least not a correct >> > > > > > > one. >> > > > > > > >> > > > > > > That's the data that is received: >> > > > > > > >> > > > > > > 04 0e 04 01 00 00 00 >> > > > > > > ~~ ~~ >> > > > > > > >> > > > > > [Bala]: can you share me the command sent and event recevied. >> > > > > > I see that we receive a command complete event for the baud rate >> > > > > > change command. >> > > > > > >> > > > > > command sent: 01 48 fc 01 11 >> > > > > > vendor specific event: 04 ff 02 92 01 >> > > > > > command complete event: 04 0e 04 01 00 00 00. >> > > > > > >> > > > > > >> > > > > > >> > > > > > > This is *a* command complete event, but the opcode is 0x0000 instead >> > > > > > > of the earlier command. The same happens for the firmware >> > > > > > > download/read version command, which is the reason why the command >> > > > > > > complete injection mess >> > > > > > > (https://lore.kernel.org/patchwork/patch/1027955/) is needed in one >> > > > > > > way or another. >> > > > > > > >> > > > > > [Bala]: fw download approach is different where we use >> > > > > > __hci_cmd_sync() where as here we use hci_uart_tx_wakeup() >> > > > > > which directly calls the hci_uart_write_work(). so even we >> > > > > > send an valid opcode or not for baudrate change will bot matter. >> > > > > > >> > > > > [Bala]: i miss understood the comment. Yes your true. in the all >> > > > > vendor >> > > > > commands SoC responds with an 0x0000 opcode. >> > > > >> > > > And IIUC this is not compliant with the spec, or at least the BT core >> > > > expects the actual opcode to consider the command to be completed. >> > > >> > > [Bala]: Did you try increasing the the baud rate change timeout to >> > > 50ms >> > > instead of 10ms. >> > >> > It is even reproducible with the ROME timeout of 300ms. >> > >> > The timeout doesn't help here. With 78e8fa2972e5 ("Bluetooth: hci_qca: >> > Deassert RTS while baudrate change command") RTS is deasserted during >> > the baudrate change, hence the controller only sends the response when >> > RTS is asserted again. Before the event resulted in a frame reassembly >> > error and the data was discarded. >> > >> > > i suspect it is an timing issue. >> > >> > Timing is certainly also a factor here (the problem isn't seen >> > always), but I don't think 'timing issue' is a proper description of >> > this issue. The problem is an event that the Bluetooth core doesn't >> > expect due to the hack of sending a raw command behind the core's back >> > to work around the firmware 'feature' of sending the command response >> > with the new baudrate. >> > >> > Maybe a delay after re-asserting RTS can address is reliably, I don't >> > remember if I already experimented with that in the past. If a delay >> > can 'fix' the issue reliably I'm open to consider it for the sake of >> > simplicity, but only with a detailed comment that describes the >> > problem. >> -- Regards Balakrishna.