2019-03-07 00:41:17

by Matthias Kaehlcke

[permalink] [raw]
Subject: [PATCH 0/2] Bluetooth: hci_qca: Fix WCN3990 firmware download

Firmware download to the WCN3990 often fails with a 'TLV response size
mismatch' error:

[ 133.064659] Bluetooth: hci0: setting up wcn3990
[ 133.489150] Bluetooth: hci0: QCA controller version 0x02140201
[ 133.495245] Bluetooth: hci0: QCA Downloading qca/crbtfw21.tlv
[ 133.507214] Bluetooth: hci0: QCA TLV response size mismatch
[ 133.513265] Bluetooth: hci0: QCA Failed to download patch (-84)

This is caused by a vendor event 'without command' that is misinterpreted.
Drop the event.

Matthias Kaehlcke (2):
Bluetooth: hci_qca: Rename STATE_<flags> to QCA_<flags>
Bluetooth: hci_qca: wcn3990: Drop baudrate change vendor event

drivers/bluetooth/hci_qca.c | 65 ++++++++++++++++++++++++++++++++-----
1 file changed, 56 insertions(+), 9 deletions(-)

--
2.21.0.352.gf09ad66450-goog



2019-03-07 00:41:06

by Matthias Kaehlcke

[permalink] [raw]
Subject: [PATCH 2/2] Bluetooth: hci_qca: wcn3990: Drop baudrate change vendor event

Firmware download to the WCN3990 often fails with a 'TLV response size
mismatch' error:

[ 133.064659] Bluetooth: hci0: setting up wcn3990
[ 133.489150] Bluetooth: hci0: QCA controller version 0x02140201
[ 133.495245] Bluetooth: hci0: QCA Downloading qca/crbtfw21.tlv
[ 133.507214] Bluetooth: hci0: QCA TLV response size mismatch
[ 133.513265] Bluetooth: hci0: QCA Failed to download patch (-84)

This is caused by a vendor event that corresponds to an earlier command
to change the baudrate. The event is not processed in the context of the
baudrate change and later interpreted as response to the firmware
download command (which is also a vendor command), but the driver detects
that the event doesn't have the expected amount of associated data.

More details:

For the WCN3990 the vendor command for a baudrate change isn't sent as
synchronous HCI command, because the controller sends the corresponding
vendor event with the new baudrate. The event is received and decoded
after the baudrate change of the host port.

Identify the 'unused' event when it is received and don't add it to
the queue of RX frames.

Signed-off-by: Matthias Kaehlcke <[email protected]>
---
drivers/bluetooth/hci_qca.c | 54 ++++++++++++++++++++++++++++++++++---
1 file changed, 51 insertions(+), 3 deletions(-)

diff --git a/drivers/bluetooth/hci_qca.c b/drivers/bluetooth/hci_qca.c
index ab8e59419dbc4..565681a6a1167 100644
--- a/drivers/bluetooth/hci_qca.c
+++ b/drivers/bluetooth/hci_qca.c
@@ -30,6 +30,7 @@

#include <linux/kernel.h>
#include <linux/clk.h>
+#include <linux/completion.h>
#include <linux/debugfs.h>
#include <linux/delay.h>
#include <linux/device.h>
@@ -55,6 +56,7 @@
#define HCI_MAX_IBS_SIZE 10

#define QCA_IN_BAND_SLEEP_ENABLED BIT(0)
+#define QCA_DROP_VENDOR_EVENT BIT(1)

#define IBS_WAKE_RETRANS_TIMEOUT_MS 100
#define IBS_TX_IDLE_TIMEOUT_MS 2000
@@ -108,6 +110,7 @@ struct qca_data {
struct work_struct ws_rx_vote_off;
struct work_struct ws_tx_vote_off;
unsigned long flags;
+ struct completion drop_ev_comp;

/* For debugging purpose */
u64 ibs_sent_wacks;
@@ -474,6 +477,7 @@ static int qca_open(struct hci_uart *hu)
INIT_WORK(&qca->ws_tx_vote_off, qca_wq_serial_tx_clock_vote_off);

qca->hu = hu;
+ init_completion(&qca->drop_ev_comp);

/* Assume we start with both sides asleep -- extra wakes OK */
qca->tx_ibs_state = HCI_IBS_TX_ASLEEP;
@@ -866,6 +870,33 @@ static int qca_recv_acl_data(struct hci_dev *hdev, struct sk_buff *skb)
return hci_recv_frame(hdev, skb);
}

+static int qca_recv_event(struct hci_dev *hdev, struct sk_buff *skb)
+{
+ struct hci_uart *hu = hci_get_drvdata(hdev);
+ struct qca_data *qca = hu->priv;
+
+ if (test_bit(QCA_DROP_VENDOR_EVENT, &qca->flags)) {
+ struct hci_event_hdr *hdr = (void *)skb->data;
+
+ /* For the WCN3990 the vendor command for a baudrate change
+ * isn't sent as synchronous HCI command, because the
+ * controller sends the corresponding vendor event with the
+ * new baudrate. The event is received and properly decoded
+ * after changing the baudrate of the host port. It needs to
+ * be dropped, otherwise it can be mis-interpreted as
+ * response to a later firmware download command (also a
+ * vendor command).
+ */
+
+ if (hdr->evt == HCI_EV_VENDOR)
+ complete(&qca->drop_ev_comp);
+
+ return 0;
+ }
+
+ return hci_recv_frame(hdev, skb);
+}
+
#define QCA_IBS_SLEEP_IND_EVENT \
.type = HCI_IBS_SLEEP_IND, \
.hlen = 0, \
@@ -890,7 +921,7 @@ static int qca_recv_acl_data(struct hci_dev *hdev, struct sk_buff *skb)
static const struct h4_recv_pkt qca_recv_pkts[] = {
{ H4_RECV_ACL, .recv = qca_recv_acl_data },
{ H4_RECV_SCO, .recv = hci_recv_frame },
- { H4_RECV_EVENT, .recv = hci_recv_frame },
+ { H4_RECV_EVENT, .recv = qca_recv_event },
{ QCA_IBS_WAKE_IND_EVENT, .recv = qca_ibs_wake_ind },
{ QCA_IBS_WAKE_ACK_EVENT, .recv = qca_ibs_wake_ack },
{ QCA_IBS_SLEEP_IND_EVENT, .recv = qca_ibs_sleep_ind },
@@ -1091,6 +1122,7 @@ static int qca_set_speed(struct hci_uart *hu, enum qca_speed_type speed_type)
{
unsigned int speed, qca_baudrate;
struct qca_serdev *qcadev;
+ struct qca_data *qca = hu->priv;
int ret = 0;

if (speed_type == QCA_INIT_SPEED) {
@@ -1106,8 +1138,11 @@ static int qca_set_speed(struct hci_uart *hu, enum qca_speed_type speed_type)
* changing the baudrate of chip and host.
*/
qcadev = serdev_device_get_drvdata(hu->serdev);
- if (qcadev->btsoc_type == QCA_WCN3990)
+ if (qcadev->btsoc_type == QCA_WCN3990) {
hci_uart_set_flow_control(hu, true);
+ reinit_completion(&qca->drop_ev_comp);
+ set_bit(QCA_DROP_VENDOR_EVENT, &qca->flags);
+ }

qca_baudrate = qca_get_baudrate_value(speed);
bt_dev_dbg(hu->hdev, "Set UART speed to %d", speed);
@@ -1118,8 +1153,21 @@ static int qca_set_speed(struct hci_uart *hu, enum qca_speed_type speed_type)
host_set_baudrate(hu, speed);

error:
- if (qcadev->btsoc_type == QCA_WCN3990)
+ if (qcadev->btsoc_type == QCA_WCN3990) {
hci_uart_set_flow_control(hu, false);
+
+ /* Wait for the controller to send the vendor event
+ * for the baudrate change command.
+ */
+ if (!wait_for_completion_timeout(&qca->drop_ev_comp,
+ msecs_to_jiffies(100))) {
+ bt_dev_err(hu->hdev,
+ "Failed to change controller baudrate\n");
+ ret = -EPROTO;
+ }
+
+ clear_bit(QCA_DROP_VENDOR_EVENT, &qca->flags);
+ }
}

return ret;
--
2.21.0.352.gf09ad66450-goog


2019-03-07 00:41:14

by Matthias Kaehlcke

[permalink] [raw]
Subject: [PATCH 1/2] Bluetooth: hci_qca: Rename STATE_<flags> to QCA_<flags>

Rename STATE_IN_BAND_SLEEP_ENABLED to QCA_IN_BAND_SLEEP_ENABLED.
The constant represents a flag (multiple flags can be set at once),
not a unique state of the controller or driver. Also use the BIT()
macro to specify the position of the flag instead of an integer
literal.

Signed-off-by: Matthias Kaehlcke <[email protected]>
---
drivers/bluetooth/hci_qca.c | 11 +++++------
1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/drivers/bluetooth/hci_qca.c b/drivers/bluetooth/hci_qca.c
index 237aea34b69f1..ab8e59419dbc4 100644
--- a/drivers/bluetooth/hci_qca.c
+++ b/drivers/bluetooth/hci_qca.c
@@ -54,8 +54,7 @@
#define HCI_IBS_WAKE_ACK 0xFC
#define HCI_MAX_IBS_SIZE 10

-/* Controller states */
-#define STATE_IN_BAND_SLEEP_ENABLED 1
+#define QCA_IN_BAND_SLEEP_ENABLED BIT(0)

#define IBS_WAKE_RETRANS_TIMEOUT_MS 100
#define IBS_TX_IDLE_TIMEOUT_MS 2000
@@ -775,7 +774,7 @@ static int qca_enqueue(struct hci_uart *hu, struct sk_buff *skb)
/* Don't go to sleep in middle of patch download or
* Out-Of-Band(GPIOs control) sleep is selected.
*/
- if (!test_bit(STATE_IN_BAND_SLEEP_ENABLED, &qca->flags)) {
+ if (!test_bit(QCA_IN_BAND_SLEEP_ENABLED, &qca->flags)) {
skb_queue_tail(&qca->txq, skb);
spin_unlock_irqrestore(&qca->hci_ibs_lock, flags);
return 0;
@@ -1192,7 +1191,7 @@ static int qca_setup(struct hci_uart *hu)
return ret;

/* Patch downloading has to be done without IBS mode */
- clear_bit(STATE_IN_BAND_SLEEP_ENABLED, &qca->flags);
+ clear_bit(QCA_IN_BAND_SLEEP_ENABLED, &qca->flags);

if (qcadev->btsoc_type == QCA_WCN3990) {
bt_dev_info(hdev, "setting up wcn3990");
@@ -1236,7 +1235,7 @@ static int qca_setup(struct hci_uart *hu)
/* Setup patch / NVM configurations */
ret = qca_uart_setup(hdev, qca_baudrate, qcadev->btsoc_type, soc_ver);
if (!ret) {
- set_bit(STATE_IN_BAND_SLEEP_ENABLED, &qca->flags);
+ set_bit(QCA_IN_BAND_SLEEP_ENABLED, &qca->flags);
qca_debugfs_init(hdev);
} else if (ret == -ENOENT) {
/* No patch/nvm-config found, run with original fw/config */
@@ -1294,7 +1293,7 @@ static void qca_power_shutdown(struct hci_uart *hu)
* data in skb's.
*/
spin_lock_irqsave(&qca->hci_ibs_lock, flags);
- clear_bit(STATE_IN_BAND_SLEEP_ENABLED, &qca->flags);
+ clear_bit(QCA_IN_BAND_SLEEP_ENABLED, &qca->flags);
qca_flush(hu);
spin_unlock_irqrestore(&qca->hci_ibs_lock, flags);

--
2.21.0.352.gf09ad66450-goog


2019-03-07 02:21:40

by Balakrishna Godavarthi

[permalink] [raw]
Subject: Re: [PATCH 1/2] Bluetooth: hci_qca: Rename STATE_<flags> to QCA_<flags>

On 2019-03-07 06:10, Matthias Kaehlcke wrote:
> Rename STATE_IN_BAND_SLEEP_ENABLED to QCA_IN_BAND_SLEEP_ENABLED.
> The constant represents a flag (multiple flags can be set at once),
> not a unique state of the controller or driver. Also use the BIT()
> macro to specify the position of the flag instead of an integer
> literal.
>
> Signed-off-by: Matthias Kaehlcke <[email protected]>
> ---
> drivers/bluetooth/hci_qca.c | 11 +++++------
> 1 file changed, 5 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/bluetooth/hci_qca.c b/drivers/bluetooth/hci_qca.c
> index 237aea34b69f1..ab8e59419dbc4 100644
> --- a/drivers/bluetooth/hci_qca.c
> +++ b/drivers/bluetooth/hci_qca.c
> @@ -54,8 +54,7 @@
> #define HCI_IBS_WAKE_ACK 0xFC
> #define HCI_MAX_IBS_SIZE 10
>
> -/* Controller states */
> -#define STATE_IN_BAND_SLEEP_ENABLED 1
> +#define QCA_IN_BAND_SLEEP_ENABLED BIT(0)
>
> #define IBS_WAKE_RETRANS_TIMEOUT_MS 100
> #define IBS_TX_IDLE_TIMEOUT_MS 2000
> @@ -775,7 +774,7 @@ static int qca_enqueue(struct hci_uart *hu, struct
> sk_buff *skb)
> /* Don't go to sleep in middle of patch download or
> * Out-Of-Band(GPIOs control) sleep is selected.
> */
> - if (!test_bit(STATE_IN_BAND_SLEEP_ENABLED, &qca->flags)) {
> + if (!test_bit(QCA_IN_BAND_SLEEP_ENABLED, &qca->flags)) {
> skb_queue_tail(&qca->txq, skb);
> spin_unlock_irqrestore(&qca->hci_ibs_lock, flags);
> return 0;
> @@ -1192,7 +1191,7 @@ static int qca_setup(struct hci_uart *hu)
> return ret;
>
> /* Patch downloading has to be done without IBS mode */
> - clear_bit(STATE_IN_BAND_SLEEP_ENABLED, &qca->flags);
> + clear_bit(QCA_IN_BAND_SLEEP_ENABLED, &qca->flags);
>
> if (qcadev->btsoc_type == QCA_WCN3990) {
> bt_dev_info(hdev, "setting up wcn3990");
> @@ -1236,7 +1235,7 @@ static int qca_setup(struct hci_uart *hu)
> /* Setup patch / NVM configurations */
> ret = qca_uart_setup(hdev, qca_baudrate, qcadev->btsoc_type,
> soc_ver);
> if (!ret) {
> - set_bit(STATE_IN_BAND_SLEEP_ENABLED, &qca->flags);
> + set_bit(QCA_IN_BAND_SLEEP_ENABLED, &qca->flags);
> qca_debugfs_init(hdev);
> } else if (ret == -ENOENT) {
> /* No patch/nvm-config found, run with original fw/config */
> @@ -1294,7 +1293,7 @@ static void qca_power_shutdown(struct hci_uart
> *hu)
> * data in skb's.
> */
> spin_lock_irqsave(&qca->hci_ibs_lock, flags);
> - clear_bit(STATE_IN_BAND_SLEEP_ENABLED, &qca->flags);
> + clear_bit(QCA_IN_BAND_SLEEP_ENABLED, &qca->flags);
> qca_flush(hu);
> spin_unlock_irqrestore(&qca->hci_ibs_lock, flags);


Reviewed-by: Balakrishna Godavarthi <[email protected]>
--
Regards
Balakrishna.

2019-03-07 05:05:18

by Balakrishna Godavarthi

[permalink] [raw]
Subject: Re: [PATCH 2/2] Bluetooth: hci_qca: wcn3990: Drop baudrate change vendor event

hi Matthias,

On 2019-03-07 06:10, Matthias Kaehlcke wrote:
> Firmware download to the WCN3990 often fails with a 'TLV response size
> mismatch' error:
>
> [ 133.064659] Bluetooth: hci0: setting up wcn3990
> [ 133.489150] Bluetooth: hci0: QCA controller version 0x02140201
> [ 133.495245] Bluetooth: hci0: QCA Downloading qca/crbtfw21.tlv
> [ 133.507214] Bluetooth: hci0: QCA TLV response size mismatch
> [ 133.513265] Bluetooth: hci0: QCA Failed to download patch (-84)
>
> This is caused by a vendor event that corresponds to an earlier command
> to change the baudrate. The event is not processed in the context of
> the
> baudrate change and later interpreted as response to the firmware
> download command (which is also a vendor command), but the driver
> detects
> that the event doesn't have the expected amount of associated data.
>
> More details:
>
> For the WCN3990 the vendor command for a baudrate change isn't sent as
> synchronous HCI command, because the controller sends the corresponding
> vendor event with the new baudrate. The event is received and decoded
> after the baudrate change of the host port.
>
> Identify the 'unused' event when it is received and don't add it to
> the queue of RX frames.
>
> Signed-off-by: Matthias Kaehlcke <[email protected]>
> ---
> drivers/bluetooth/hci_qca.c | 54 ++++++++++++++++++++++++++++++++++---
> 1 file changed, 51 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/bluetooth/hci_qca.c b/drivers/bluetooth/hci_qca.c
> index ab8e59419dbc4..565681a6a1167 100644
> --- a/drivers/bluetooth/hci_qca.c
> +++ b/drivers/bluetooth/hci_qca.c
> @@ -30,6 +30,7 @@
>
> #include <linux/kernel.h>
> #include <linux/clk.h>
> +#include <linux/completion.h>
> #include <linux/debugfs.h>
> #include <linux/delay.h>
> #include <linux/device.h>
> @@ -55,6 +56,7 @@
> #define HCI_MAX_IBS_SIZE 10
>
> #define QCA_IN_BAND_SLEEP_ENABLED BIT(0)
> +#define QCA_DROP_VENDOR_EVENT BIT(1)
>
> #define IBS_WAKE_RETRANS_TIMEOUT_MS 100
> #define IBS_TX_IDLE_TIMEOUT_MS 2000
> @@ -108,6 +110,7 @@ struct qca_data {
> struct work_struct ws_rx_vote_off;
> struct work_struct ws_tx_vote_off;
> unsigned long flags;
> + struct completion drop_ev_comp;
>
> /* For debugging purpose */
> u64 ibs_sent_wacks;
> @@ -474,6 +477,7 @@ static int qca_open(struct hci_uart *hu)
> INIT_WORK(&qca->ws_tx_vote_off, qca_wq_serial_tx_clock_vote_off);
>
> qca->hu = hu;
> + init_completion(&qca->drop_ev_comp);
>
> /* Assume we start with both sides asleep -- extra wakes OK */
> qca->tx_ibs_state = HCI_IBS_TX_ASLEEP;
> @@ -866,6 +870,33 @@ static int qca_recv_acl_data(struct hci_dev
> *hdev, struct sk_buff *skb)
> return hci_recv_frame(hdev, skb);
> }
>
> +static int qca_recv_event(struct hci_dev *hdev, struct sk_buff *skb)
> +{
> + struct hci_uart *hu = hci_get_drvdata(hdev);
> + struct qca_data *qca = hu->priv;
> +
> + if (test_bit(QCA_DROP_VENDOR_EVENT, &qca->flags)) {
> + struct hci_event_hdr *hdr = (void *)skb->data;
> +
> + /* For the WCN3990 the vendor command for a baudrate change
> + * isn't sent as synchronous HCI command, because the
> + * controller sends the corresponding vendor event with the
> + * new baudrate. The event is received and properly decoded
> + * after changing the baudrate of the host port. It needs to
> + * be dropped, otherwise it can be mis-interpreted as
> + * response to a later firmware download command (also a
> + * vendor command).
> + */
> +
> + if (hdr->evt == HCI_EV_VENDOR)
> + complete(&qca->drop_ev_comp);
> +
> + return 0;
> + }
> +
> + return hci_recv_frame(hdev, skb);
> +}
> +
> #define QCA_IBS_SLEEP_IND_EVENT \
> .type = HCI_IBS_SLEEP_IND, \
> .hlen = 0, \
> @@ -890,7 +921,7 @@ static int qca_recv_acl_data(struct hci_dev *hdev,
> struct sk_buff *skb)
> static const struct h4_recv_pkt qca_recv_pkts[] = {
> { H4_RECV_ACL, .recv = qca_recv_acl_data },
> { H4_RECV_SCO, .recv = hci_recv_frame },
> - { H4_RECV_EVENT, .recv = hci_recv_frame },
> + { H4_RECV_EVENT, .recv = qca_recv_event },
> { QCA_IBS_WAKE_IND_EVENT, .recv = qca_ibs_wake_ind },
> { QCA_IBS_WAKE_ACK_EVENT, .recv = qca_ibs_wake_ack },
> { QCA_IBS_SLEEP_IND_EVENT, .recv = qca_ibs_sleep_ind },
> @@ -1091,6 +1122,7 @@ static int qca_set_speed(struct hci_uart *hu,
> enum qca_speed_type speed_type)
> {
> unsigned int speed, qca_baudrate;
> struct qca_serdev *qcadev;
> + struct qca_data *qca = hu->priv;
> int ret = 0;
>
> if (speed_type == QCA_INIT_SPEED) {
> @@ -1106,8 +1138,11 @@ static int qca_set_speed(struct hci_uart *hu,
> enum qca_speed_type speed_type)
> * changing the baudrate of chip and host.
> */
> qcadev = serdev_device_get_drvdata(hu->serdev);
> - if (qcadev->btsoc_type == QCA_WCN3990)
> + if (qcadev->btsoc_type == QCA_WCN3990) {
> hci_uart_set_flow_control(hu, true);
> + reinit_completion(&qca->drop_ev_comp);
> + set_bit(QCA_DROP_VENDOR_EVENT, &qca->flags);
> + }
>
> qca_baudrate = qca_get_baudrate_value(speed);
> bt_dev_dbg(hu->hdev, "Set UART speed to %d", speed);
> @@ -1118,8 +1153,21 @@ static int qca_set_speed(struct hci_uart *hu,
> enum qca_speed_type speed_type)
> host_set_baudrate(hu, speed);
>
> error:
> - if (qcadev->btsoc_type == QCA_WCN3990)
> + if (qcadev->btsoc_type == QCA_WCN3990) {
> hci_uart_set_flow_control(hu, false);
> +
> + /* Wait for the controller to send the vendor event
> + * for the baudrate change command.
> + */
> + if (!wait_for_completion_timeout(&qca->drop_ev_comp,
> + msecs_to_jiffies(100))) {
> + bt_dev_err(hu->hdev,
> + "Failed to change controller baudrate\n");
> + ret = -EPROTO;
> + }
> +
> + clear_bit(QCA_DROP_VENDOR_EVENT, &qca->flags);
> + }
> }
>
> return ret;

Can you test by reverting this change "94d6671473924".
We need at least 15ms minimum delay for the soc to change its baud rate
and respond to the with command complete event.
In my previous stress test we have not observed any issues.

--
Regards
Balakrishna.

2019-03-07 16:35:39

by Matthias Kaehlcke

[permalink] [raw]
Subject: Re: [PATCH 1/2] Bluetooth: hci_qca: Rename STATE_<flags> to QCA_<flags>

On Wed, Mar 06, 2019 at 04:40:40PM -0800, Matthias Kaehlcke wrote:
> Rename STATE_IN_BAND_SLEEP_ENABLED to QCA_IN_BAND_SLEEP_ENABLED.
> The constant represents a flag (multiple flags can be set at once),
> not a unique state of the controller or driver. Also use the BIT()
> macro to specify the position of the flag instead of an integer
> literal.
>
> Signed-off-by: Matthias Kaehlcke <[email protected]>
> ---
> drivers/bluetooth/hci_qca.c | 11 +++++------
> 1 file changed, 5 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/bluetooth/hci_qca.c b/drivers/bluetooth/hci_qca.c
> index 237aea34b69f1..ab8e59419dbc4 100644
> --- a/drivers/bluetooth/hci_qca.c
> +++ b/drivers/bluetooth/hci_qca.c
> @@ -54,8 +54,7 @@
> #define HCI_IBS_WAKE_ACK 0xFC
> #define HCI_MAX_IBS_SIZE 10
>
> -/* Controller states */
> -#define STATE_IN_BAND_SLEEP_ENABLED 1
> +#define QCA_IN_BAND_SLEEP_ENABLED BIT(0)

I realized that the use of BIT() isn't correct here, since the value
is used with set_bit(), ..., which expect a bit *number*. It should
probably be an emum starting with 0.

I'll update it in the next revision.

>
> #define IBS_WAKE_RETRANS_TIMEOUT_MS 100
> #define IBS_TX_IDLE_TIMEOUT_MS 2000
> @@ -775,7 +774,7 @@ static int qca_enqueue(struct hci_uart *hu, struct sk_buff *skb)
> /* Don't go to sleep in middle of patch download or
> * Out-Of-Band(GPIOs control) sleep is selected.
> */
> - if (!test_bit(STATE_IN_BAND_SLEEP_ENABLED, &qca->flags)) {
> + if (!test_bit(QCA_IN_BAND_SLEEP_ENABLED, &qca->flags)) {
> skb_queue_tail(&qca->txq, skb);
> spin_unlock_irqrestore(&qca->hci_ibs_lock, flags);
> return 0;
> @@ -1192,7 +1191,7 @@ static int qca_setup(struct hci_uart *hu)
> return ret;
>
> /* Patch downloading has to be done without IBS mode */
> - clear_bit(STATE_IN_BAND_SLEEP_ENABLED, &qca->flags);
> + clear_bit(QCA_IN_BAND_SLEEP_ENABLED, &qca->flags);
>
> if (qcadev->btsoc_type == QCA_WCN3990) {
> bt_dev_info(hdev, "setting up wcn3990");
> @@ -1236,7 +1235,7 @@ static int qca_setup(struct hci_uart *hu)
> /* Setup patch / NVM configurations */
> ret = qca_uart_setup(hdev, qca_baudrate, qcadev->btsoc_type, soc_ver);
> if (!ret) {
> - set_bit(STATE_IN_BAND_SLEEP_ENABLED, &qca->flags);
> + set_bit(QCA_IN_BAND_SLEEP_ENABLED, &qca->flags);
> qca_debugfs_init(hdev);
> } else if (ret == -ENOENT) {
> /* No patch/nvm-config found, run with original fw/config */
> @@ -1294,7 +1293,7 @@ static void qca_power_shutdown(struct hci_uart *hu)
> * data in skb's.
> */
> spin_lock_irqsave(&qca->hci_ibs_lock, flags);
> - clear_bit(STATE_IN_BAND_SLEEP_ENABLED, &qca->flags);
> + clear_bit(QCA_IN_BAND_SLEEP_ENABLED, &qca->flags);
> qca_flush(hu);
> spin_unlock_irqrestore(&qca->hci_ibs_lock, flags);
>

2019-03-07 18:20:14

by Matthias Kaehlcke

[permalink] [raw]
Subject: Re: [PATCH 2/2] Bluetooth: hci_qca: wcn3990: Drop baudrate change vendor event

Hi Balakrishna,

On Thu, Mar 07, 2019 at 10:35:08AM +0530, Balakrishna Godavarthi wrote:
> hi Matthias,
>
> On 2019-03-07 06:10, Matthias Kaehlcke wrote:
> > Firmware download to the WCN3990 often fails with a 'TLV response size
> > mismatch' error:
> >
> > [ 133.064659] Bluetooth: hci0: setting up wcn3990
> > [ 133.489150] Bluetooth: hci0: QCA controller version 0x02140201
> > [ 133.495245] Bluetooth: hci0: QCA Downloading qca/crbtfw21.tlv
> > [ 133.507214] Bluetooth: hci0: QCA TLV response size mismatch
> > [ 133.513265] Bluetooth: hci0: QCA Failed to download patch (-84)
> >
> > This is caused by a vendor event that corresponds to an earlier command
> > to change the baudrate. The event is not processed in the context of the
> > baudrate change and later interpreted as response to the firmware
> > download command (which is also a vendor command), but the driver
> > detects
> > that the event doesn't have the expected amount of associated data.
> >
> > More details:
> >
> > For the WCN3990 the vendor command for a baudrate change isn't sent as
> > synchronous HCI command, because the controller sends the corresponding
> > vendor event with the new baudrate. The event is received and decoded
> > after the baudrate change of the host port.
> >
> > Identify the 'unused' event when it is received and don't add it to
> > the queue of RX frames.
> >
> > Signed-off-by: Matthias Kaehlcke <[email protected]>
> > ---
>
> ...
>
> Can you test by reverting this change "94d6671473924".

The issue is still reproducible.

> We need at least 15ms minimum delay for the soc to change its baud rate and
> respond to the with command complete event.

The baudrate change has clearly been successful when the problem is
observed, since the host receives the vendor event with the new
baudrate.

We could increase the delay to 15ms just in case, though in my stress
tests with 1000s of iterations I didn't observe any problem with the
baudrate change itself. However if this patch gets landed the delay
could be removed completely, since we are waiting until the vendor
event is received with the new baudrate.

> In my previous stress test we have not observed any issues.

Which patches did you use for the stress test?

Did your stack include '78e8fa2972e55 Bluetooth: hci_qca: Deassert RTS
while baudrate change command' (or equivalent)? It is mainly this
patch which *exposes* the problem, before it (and with the long delay)
the controller response was received with the old baudrate and could
not be decoded ("frame reassembly errors").

Also make sure to test without 'Bluetooth: btqca: inject command
complete event during fw download'
(https://lore.kernel.org/patchwork/patch/1027955/), since it affects
timing and injects extra events. It might seem that it fixes the
problem, but it just papers over it.

I have my suspicions that the mega-timeout of 300ms for ROME was
introduced to 'fix' a related issue ...

Cheers

Matthias

2019-03-07 23:30:45

by Matthias Kaehlcke

[permalink] [raw]
Subject: Re: [PATCH 2/2] Bluetooth: hci_qca: wcn3990: Drop baudrate change vendor event

On Thu, Mar 07, 2019 at 10:20:09AM -0800, Matthias Kaehlcke wrote:
> Hi Balakrishna,
>
> On Thu, Mar 07, 2019 at 10:35:08AM +0530, Balakrishna Godavarthi wrote:
> > hi Matthias,
> >
> > On 2019-03-07 06:10, Matthias Kaehlcke wrote:
> > > Firmware download to the WCN3990 often fails with a 'TLV response size
> > > mismatch' error:
> > >
> > > [ 133.064659] Bluetooth: hci0: setting up wcn3990
> > > [ 133.489150] Bluetooth: hci0: QCA controller version 0x02140201
> > > [ 133.495245] Bluetooth: hci0: QCA Downloading qca/crbtfw21.tlv
> > > [ 133.507214] Bluetooth: hci0: QCA TLV response size mismatch
> > > [ 133.513265] Bluetooth: hci0: QCA Failed to download patch (-84)
> > >
> > > This is caused by a vendor event that corresponds to an earlier command
> > > to change the baudrate. The event is not processed in the context of the
> > > baudrate change and later interpreted as response to the firmware
> > > download command (which is also a vendor command), but the driver
> > > detects
> > > that the event doesn't have the expected amount of associated data.
> > >
> > > More details:
> > >
> > > For the WCN3990 the vendor command for a baudrate change isn't sent as
> > > synchronous HCI command, because the controller sends the corresponding
> > > vendor event with the new baudrate. The event is received and decoded
> > > after the baudrate change of the host port.
> > >
> > > Identify the 'unused' event when it is received and don't add it to
> > > the queue of RX frames.
> > >
> > > Signed-off-by: Matthias Kaehlcke <[email protected]>
> > > ---
> >
> > ...
> >
> > Can you test by reverting this change "94d6671473924".
>
> The issue is still reproducible.
>
> > We need at least 15ms minimum delay for the soc to change its baud rate and
> > respond to the with command complete event.
>
> The baudrate change has clearly been successful when the problem is
> observed, since the host receives the vendor event with the new
> baudrate.

I forgot to mention this earlier: the controller doesn't send a
command complete event for the command, or at least not a correct
one.

That's the data that is received:

04 0e 04 01 00 00 00
~~ ~~

This is *a* command complete event, but the opcode is 0x0000 instead
of the earlier command. The same happens for the firmware
download/read version command, which is the reason why the command
complete injection mess
(https://lore.kernel.org/patchwork/patch/1027955/) is needed in one
way or another.

I wished Qualcomm FW developers would get their act together and:

- send actual command complete events
- acknowledge a baudrate change request using the current baudrate
like Broadcom and Intel chips apparently do

this would have saved countless hours of debugging and implementing
quirky workarounds ...

Maybe there is hope for future chips (hint, hint)?

2019-03-12 19:36:45

by Matthias Kaehlcke

[permalink] [raw]
Subject: Re: [PATCH 2/2] Bluetooth: hci_qca: wcn3990: Drop baudrate change vendor event

Hi,

I held back with sending an updated version because I observe baudrate
changes to fail with a 'frame reassembly' error in about 1 out of 500
cases. After some debugging I'm fairly confident the error is not
caused by this change.

For the record:

This is the data that is received received in response to the read
version and the change baudrate commands when the baudrate change
is (completely) succesful:

04 ff 0e 00 02 0a 00 00 00 01 00 01 02 14 02 01
40 04 0e 04 01 00 00 00 04 ff 02 92 01 04 0e 04
01 00 00 00

Which translates to the following frames:

# vendor event with the version info
04 ff 0e 00 02 0a 00 00 00 01 00 01 02 14 02 01 40

# command complete event (with opcode 0)
04 0e 04 01 00 00 00

# vendor event confirming the baudrate change
04 ff 02 92 01

# command complete event (with opcode 0)
04 0e 04 01 00 00 00


When the frame reassembly error occurs we receive:

04 ff 0e 00 02 0a 00 00 00 01 00 01 02 14 02 01
40 04 0e 04 04 ff 02 92 01 04 0e 04 01 00 00 00

or:

# vendor event with the version info
04 ff 0e 00 02 0a 00 00 00 01 00 01 02 14 02 01 40

# first bytes (!) of a command complete event
04 0e 04

# vendor event confirming the baudrate change
04 ff 02 92 01

# command complete event (with opcode 0)
04 0e 04 01 00 00 00


The data was logged at UART level, hence the HCI driver is not at
fault here. Since the issue is hard to reproduce I couldn't determine
(yet) whether the controller only sends a partial command complete
event or if data is lost by the UART.

This problem can be investigated separately, I'll send an updated
version of this patch soon.

Thanks

Matthias

On Wed, Mar 06, 2019 at 04:40:41PM -0800, Matthias Kaehlcke wrote:
> Firmware download to the WCN3990 often fails with a 'TLV response size
> mismatch' error:
>
> [ 133.064659] Bluetooth: hci0: setting up wcn3990
> [ 133.489150] Bluetooth: hci0: QCA controller version 0x02140201
> [ 133.495245] Bluetooth: hci0: QCA Downloading qca/crbtfw21.tlv
> [ 133.507214] Bluetooth: hci0: QCA TLV response size mismatch
> [ 133.513265] Bluetooth: hci0: QCA Failed to download patch (-84)
>
> This is caused by a vendor event that corresponds to an earlier command
> to change the baudrate. The event is not processed in the context of the
> baudrate change and later interpreted as response to the firmware
> download command (which is also a vendor command), but the driver detects
> that the event doesn't have the expected amount of associated data.
>
> More details:
>
> For the WCN3990 the vendor command for a baudrate change isn't sent as
> synchronous HCI command, because the controller sends the corresponding
> vendor event with the new baudrate. The event is received and decoded
> after the baudrate change of the host port.
>
> Identify the 'unused' event when it is received and don't add it to
> the queue of RX frames.
>
> Signed-off-by: Matthias Kaehlcke <[email protected]>
> ---
> drivers/bluetooth/hci_qca.c | 54 ++++++++++++++++++++++++++++++++++---
> 1 file changed, 51 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/bluetooth/hci_qca.c b/drivers/bluetooth/hci_qca.c
> index ab8e59419dbc4..565681a6a1167 100644
> --- a/drivers/bluetooth/hci_qca.c
> +++ b/drivers/bluetooth/hci_qca.c
> @@ -30,6 +30,7 @@
>
> #include <linux/kernel.h>
> #include <linux/clk.h>
> +#include <linux/completion.h>
> #include <linux/debugfs.h>
> #include <linux/delay.h>
> #include <linux/device.h>
> @@ -55,6 +56,7 @@
> #define HCI_MAX_IBS_SIZE 10
>
> #define QCA_IN_BAND_SLEEP_ENABLED BIT(0)
> +#define QCA_DROP_VENDOR_EVENT BIT(1)
>
> #define IBS_WAKE_RETRANS_TIMEOUT_MS 100
> #define IBS_TX_IDLE_TIMEOUT_MS 2000
> @@ -108,6 +110,7 @@ struct qca_data {
> struct work_struct ws_rx_vote_off;
> struct work_struct ws_tx_vote_off;
> unsigned long flags;
> + struct completion drop_ev_comp;
>
> /* For debugging purpose */
> u64 ibs_sent_wacks;
> @@ -474,6 +477,7 @@ static int qca_open(struct hci_uart *hu)
> INIT_WORK(&qca->ws_tx_vote_off, qca_wq_serial_tx_clock_vote_off);
>
> qca->hu = hu;
> + init_completion(&qca->drop_ev_comp);
>
> /* Assume we start with both sides asleep -- extra wakes OK */
> qca->tx_ibs_state = HCI_IBS_TX_ASLEEP;
> @@ -866,6 +870,33 @@ static int qca_recv_acl_data(struct hci_dev *hdev, struct sk_buff *skb)
> return hci_recv_frame(hdev, skb);
> }
>
> +static int qca_recv_event(struct hci_dev *hdev, struct sk_buff *skb)
> +{
> + struct hci_uart *hu = hci_get_drvdata(hdev);
> + struct qca_data *qca = hu->priv;
> +
> + if (test_bit(QCA_DROP_VENDOR_EVENT, &qca->flags)) {
> + struct hci_event_hdr *hdr = (void *)skb->data;
> +
> + /* For the WCN3990 the vendor command for a baudrate change
> + * isn't sent as synchronous HCI command, because the
> + * controller sends the corresponding vendor event with the
> + * new baudrate. The event is received and properly decoded
> + * after changing the baudrate of the host port. It needs to
> + * be dropped, otherwise it can be mis-interpreted as
> + * response to a later firmware download command (also a
> + * vendor command).
> + */
> +
> + if (hdr->evt == HCI_EV_VENDOR)
> + complete(&qca->drop_ev_comp);
> +
> + return 0;
> + }
> +
> + return hci_recv_frame(hdev, skb);
> +}
> +
> #define QCA_IBS_SLEEP_IND_EVENT \
> .type = HCI_IBS_SLEEP_IND, \
> .hlen = 0, \
> @@ -890,7 +921,7 @@ static int qca_recv_acl_data(struct hci_dev *hdev, struct sk_buff *skb)
> static const struct h4_recv_pkt qca_recv_pkts[] = {
> { H4_RECV_ACL, .recv = qca_recv_acl_data },
> { H4_RECV_SCO, .recv = hci_recv_frame },
> - { H4_RECV_EVENT, .recv = hci_recv_frame },
> + { H4_RECV_EVENT, .recv = qca_recv_event },
> { QCA_IBS_WAKE_IND_EVENT, .recv = qca_ibs_wake_ind },
> { QCA_IBS_WAKE_ACK_EVENT, .recv = qca_ibs_wake_ack },
> { QCA_IBS_SLEEP_IND_EVENT, .recv = qca_ibs_sleep_ind },
> @@ -1091,6 +1122,7 @@ static int qca_set_speed(struct hci_uart *hu, enum qca_speed_type speed_type)
> {
> unsigned int speed, qca_baudrate;
> struct qca_serdev *qcadev;
> + struct qca_data *qca = hu->priv;
> int ret = 0;
>
> if (speed_type == QCA_INIT_SPEED) {
> @@ -1106,8 +1138,11 @@ static int qca_set_speed(struct hci_uart *hu, enum qca_speed_type speed_type)
> * changing the baudrate of chip and host.
> */
> qcadev = serdev_device_get_drvdata(hu->serdev);
> - if (qcadev->btsoc_type == QCA_WCN3990)
> + if (qcadev->btsoc_type == QCA_WCN3990) {
> hci_uart_set_flow_control(hu, true);
> + reinit_completion(&qca->drop_ev_comp);
> + set_bit(QCA_DROP_VENDOR_EVENT, &qca->flags);
> + }
>
> qca_baudrate = qca_get_baudrate_value(speed);
> bt_dev_dbg(hu->hdev, "Set UART speed to %d", speed);
> @@ -1118,8 +1153,21 @@ static int qca_set_speed(struct hci_uart *hu, enum qca_speed_type speed_type)
> host_set_baudrate(hu, speed);
>
> error:
> - if (qcadev->btsoc_type == QCA_WCN3990)
> + if (qcadev->btsoc_type == QCA_WCN3990) {
> hci_uart_set_flow_control(hu, false);
> +
> + /* Wait for the controller to send the vendor event
> + * for the baudrate change command.
> + */
> + if (!wait_for_completion_timeout(&qca->drop_ev_comp,
> + msecs_to_jiffies(100))) {
> + bt_dev_err(hu->hdev,
> + "Failed to change controller baudrate\n");
> + ret = -EPROTO;
> + }
> +
> + clear_bit(QCA_DROP_VENDOR_EVENT, &qca->flags);
> + }
> }
>
> return ret;

2019-03-25 16:50:41

by Matthias Kaehlcke

[permalink] [raw]
Subject: Re: [PATCH 2/2] Bluetooth: hci_qca: wcn3990: Drop baudrate change vendor event

Hi Marcel,

do you have any comments on this change?

Without this firmware download on wcn3990 (and probably also wcn3998)
is broken, unfortunately fixing the ROM firmware is not an option :(

Thanks

Matthias

On Wed, Mar 06, 2019 at 04:40:41PM -0800, Matthias Kaehlcke wrote:
> Firmware download to the WCN3990 often fails with a 'TLV response size
> mismatch' error:
>
> [ 133.064659] Bluetooth: hci0: setting up wcn3990
> [ 133.489150] Bluetooth: hci0: QCA controller version 0x02140201
> [ 133.495245] Bluetooth: hci0: QCA Downloading qca/crbtfw21.tlv
> [ 133.507214] Bluetooth: hci0: QCA TLV response size mismatch
> [ 133.513265] Bluetooth: hci0: QCA Failed to download patch (-84)
>
> This is caused by a vendor event that corresponds to an earlier command
> to change the baudrate. The event is not processed in the context of the
> baudrate change and later interpreted as response to the firmware
> download command (which is also a vendor command), but the driver detects
> that the event doesn't have the expected amount of associated data.
>
> More details:
>
> For the WCN3990 the vendor command for a baudrate change isn't sent as
> synchronous HCI command, because the controller sends the corresponding
> vendor event with the new baudrate. The event is received and decoded
> after the baudrate change of the host port.
>
> Identify the 'unused' event when it is received and don't add it to
> the queue of RX frames.
>
> Signed-off-by: Matthias Kaehlcke <[email protected]>
> ---
> drivers/bluetooth/hci_qca.c | 54 ++++++++++++++++++++++++++++++++++---
> 1 file changed, 51 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/bluetooth/hci_qca.c b/drivers/bluetooth/hci_qca.c
> index ab8e59419dbc4..565681a6a1167 100644
> --- a/drivers/bluetooth/hci_qca.c
> +++ b/drivers/bluetooth/hci_qca.c
> @@ -30,6 +30,7 @@
>
> #include <linux/kernel.h>
> #include <linux/clk.h>
> +#include <linux/completion.h>
> #include <linux/debugfs.h>
> #include <linux/delay.h>
> #include <linux/device.h>
> @@ -55,6 +56,7 @@
> #define HCI_MAX_IBS_SIZE 10
>
> #define QCA_IN_BAND_SLEEP_ENABLED BIT(0)
> +#define QCA_DROP_VENDOR_EVENT BIT(1)
>
> #define IBS_WAKE_RETRANS_TIMEOUT_MS 100
> #define IBS_TX_IDLE_TIMEOUT_MS 2000
> @@ -108,6 +110,7 @@ struct qca_data {
> struct work_struct ws_rx_vote_off;
> struct work_struct ws_tx_vote_off;
> unsigned long flags;
> + struct completion drop_ev_comp;
>
> /* For debugging purpose */
> u64 ibs_sent_wacks;
> @@ -474,6 +477,7 @@ static int qca_open(struct hci_uart *hu)
> INIT_WORK(&qca->ws_tx_vote_off, qca_wq_serial_tx_clock_vote_off);
>
> qca->hu = hu;
> + init_completion(&qca->drop_ev_comp);
>
> /* Assume we start with both sides asleep -- extra wakes OK */
> qca->tx_ibs_state = HCI_IBS_TX_ASLEEP;
> @@ -866,6 +870,33 @@ static int qca_recv_acl_data(struct hci_dev *hdev, struct sk_buff *skb)
> return hci_recv_frame(hdev, skb);
> }
>
> +static int qca_recv_event(struct hci_dev *hdev, struct sk_buff *skb)
> +{
> + struct hci_uart *hu = hci_get_drvdata(hdev);
> + struct qca_data *qca = hu->priv;
> +
> + if (test_bit(QCA_DROP_VENDOR_EVENT, &qca->flags)) {
> + struct hci_event_hdr *hdr = (void *)skb->data;
> +
> + /* For the WCN3990 the vendor command for a baudrate change
> + * isn't sent as synchronous HCI command, because the
> + * controller sends the corresponding vendor event with the
> + * new baudrate. The event is received and properly decoded
> + * after changing the baudrate of the host port. It needs to
> + * be dropped, otherwise it can be mis-interpreted as
> + * response to a later firmware download command (also a
> + * vendor command).
> + */
> +
> + if (hdr->evt == HCI_EV_VENDOR)
> + complete(&qca->drop_ev_comp);
> +
> + return 0;
> + }
> +
> + return hci_recv_frame(hdev, skb);
> +}
> +
> #define QCA_IBS_SLEEP_IND_EVENT \
> .type = HCI_IBS_SLEEP_IND, \
> .hlen = 0, \
> @@ -890,7 +921,7 @@ static int qca_recv_acl_data(struct hci_dev *hdev, struct sk_buff *skb)
> static const struct h4_recv_pkt qca_recv_pkts[] = {
> { H4_RECV_ACL, .recv = qca_recv_acl_data },
> { H4_RECV_SCO, .recv = hci_recv_frame },
> - { H4_RECV_EVENT, .recv = hci_recv_frame },
> + { H4_RECV_EVENT, .recv = qca_recv_event },
> { QCA_IBS_WAKE_IND_EVENT, .recv = qca_ibs_wake_ind },
> { QCA_IBS_WAKE_ACK_EVENT, .recv = qca_ibs_wake_ack },
> { QCA_IBS_SLEEP_IND_EVENT, .recv = qca_ibs_sleep_ind },
> @@ -1091,6 +1122,7 @@ static int qca_set_speed(struct hci_uart *hu, enum qca_speed_type speed_type)
> {
> unsigned int speed, qca_baudrate;
> struct qca_serdev *qcadev;
> + struct qca_data *qca = hu->priv;
> int ret = 0;
>
> if (speed_type == QCA_INIT_SPEED) {
> @@ -1106,8 +1138,11 @@ static int qca_set_speed(struct hci_uart *hu, enum qca_speed_type speed_type)
> * changing the baudrate of chip and host.
> */
> qcadev = serdev_device_get_drvdata(hu->serdev);
> - if (qcadev->btsoc_type == QCA_WCN3990)
> + if (qcadev->btsoc_type == QCA_WCN3990) {
> hci_uart_set_flow_control(hu, true);
> + reinit_completion(&qca->drop_ev_comp);
> + set_bit(QCA_DROP_VENDOR_EVENT, &qca->flags);
> + }
>
> qca_baudrate = qca_get_baudrate_value(speed);
> bt_dev_dbg(hu->hdev, "Set UART speed to %d", speed);
> @@ -1118,8 +1153,21 @@ static int qca_set_speed(struct hci_uart *hu, enum qca_speed_type speed_type)
> host_set_baudrate(hu, speed);
>
> error:
> - if (qcadev->btsoc_type == QCA_WCN3990)
> + if (qcadev->btsoc_type == QCA_WCN3990) {
> hci_uart_set_flow_control(hu, false);
> +
> + /* Wait for the controller to send the vendor event
> + * for the baudrate change command.
> + */
> + if (!wait_for_completion_timeout(&qca->drop_ev_comp,
> + msecs_to_jiffies(100))) {
> + bt_dev_err(hu->hdev,
> + "Failed to change controller baudrate\n");
> + ret = -EPROTO;
> + }
> +
> + clear_bit(QCA_DROP_VENDOR_EVENT, &qca->flags);
> + }
> }
>
> return ret;

2019-04-01 07:59:55

by Balakrishna Godavarthi

[permalink] [raw]
Subject: Re: [PATCH 2/2] Bluetooth: hci_qca: wcn3990: Drop baudrate change vendor event

Hi Matthias,

Sorry for the late reply i was on vacation.

On 2019-03-08 05:00, Matthias Kaehlcke wrote:
> On Thu, Mar 07, 2019 at 10:20:09AM -0800, Matthias Kaehlcke wrote:
>> Hi Balakrishna,
>>
>> On Thu, Mar 07, 2019 at 10:35:08AM +0530, Balakrishna Godavarthi
>> wrote:
>> > hi Matthias,
>> >
>> > On 2019-03-07 06:10, Matthias Kaehlcke wrote:
>> > > Firmware download to the WCN3990 often fails with a 'TLV response size
>> > > mismatch' error:
>> > >
>> > > [ 133.064659] Bluetooth: hci0: setting up wcn3990
>> > > [ 133.489150] Bluetooth: hci0: QCA controller version 0x02140201
>> > > [ 133.495245] Bluetooth: hci0: QCA Downloading qca/crbtfw21.tlv
>> > > [ 133.507214] Bluetooth: hci0: QCA TLV response size mismatch
>> > > [ 133.513265] Bluetooth: hci0: QCA Failed to download patch (-84)
>> > >
>> > > This is caused by a vendor event that corresponds to an earlier command
>> > > to change the baudrate. The event is not processed in the context of the
>> > > baudrate change and later interpreted as response to the firmware
>> > > download command (which is also a vendor command), but the driver
>> > > detects
>> > > that the event doesn't have the expected amount of associated data.
>> > >
>> > > More details:
>> > >
>> > > For the WCN3990 the vendor command for a baudrate change isn't sent as
>> > > synchronous HCI command, because the controller sends the corresponding
>> > > vendor event with the new baudrate. The event is received and decoded
>> > > after the baudrate change of the host port.
>> > >
>> > > Identify the 'unused' event when it is received and don't add it to
>> > > the queue of RX frames.
>> > >
>> > > Signed-off-by: Matthias Kaehlcke <[email protected]>
>> > > ---
>> >
>> > ...
>> >
>> > Can you test by reverting this change "94d6671473924".
>>
>> The issue is still reproducible.
>>
>> > We need at least 15ms minimum delay for the soc to change its baud rate and
>> > respond to the with command complete event.
>>
>> The baudrate change has clearly been successful when the problem is
>> observed, since the host receives the vendor event with the new
>> baudrate.
>
> I forgot to mention this earlier: the controller doesn't send a
> command complete event for the command, or at least not a correct
> one.
>
> That's the data that is received:
>
> 04 0e 04 01 00 00 00
> ~~ ~~
>
[Bala]: can you share me the command sent and event recevied.
I see that we receive a command complete event for the baud rate change
command.

command sent: 01 48 fc 01 11
vendor specific event: 04 ff 02 92 01
command complete event: 04 0e 04 01 00 00 00.



> This is *a* command complete event, but the opcode is 0x0000 instead
> of the earlier command. The same happens for the firmware
> download/read version command, which is the reason why the command
> complete injection mess
> (https://lore.kernel.org/patchwork/patch/1027955/) is needed in one
> way or another.
>
[Bala]: fw download approach is different where we use __hci_cmd_sync()
where as here we use hci_uart_tx_wakeup()
which directly calls the hci_uart_write_work(). so even we send
an valid opcode or not for baudrate change will bot matter.

> I wished Qualcomm FW developers would get their act together and:
>
> - send actual command complete events :
> - acknowledge a baudrate change request using the current baudrate
> like Broadcom and Intel chips apparently do
>
> this would have saved countless hours of debugging and implementing
> quirky workarounds ...
>
> Maybe there is hope for future chips (hint, hint)?

[Bala]: will take this forward to the SoC teams.

--
Regards
Balakrishna.

2019-04-01 08:19:18

by Balakrishna Godavarthi

[permalink] [raw]
Subject: Re: [PATCH 2/2] Bluetooth: hci_qca: wcn3990: Drop baudrate change vendor event

Hi Matthias,

On 2019-04-01 13:29, Balakrishna Godavarthi wrote:
> Hi Matthias,
>
> Sorry for the late reply i was on vacation.
>
> On 2019-03-08 05:00, Matthias Kaehlcke wrote:
>> On Thu, Mar 07, 2019 at 10:20:09AM -0800, Matthias Kaehlcke wrote:
>>> Hi Balakrishna,
>>>
>>> On Thu, Mar 07, 2019 at 10:35:08AM +0530, Balakrishna Godavarthi
>>> wrote:
>>> > hi Matthias,
>>> >
>>> > On 2019-03-07 06:10, Matthias Kaehlcke wrote:
>>> > > Firmware download to the WCN3990 often fails with a 'TLV response size
>>> > > mismatch' error:
>>> > >
>>> > > [ 133.064659] Bluetooth: hci0: setting up wcn3990
>>> > > [ 133.489150] Bluetooth: hci0: QCA controller version 0x02140201
>>> > > [ 133.495245] Bluetooth: hci0: QCA Downloading qca/crbtfw21.tlv
>>> > > [ 133.507214] Bluetooth: hci0: QCA TLV response size mismatch
>>> > > [ 133.513265] Bluetooth: hci0: QCA Failed to download patch (-84)
>>> > >
>>> > > This is caused by a vendor event that corresponds to an earlier command
>>> > > to change the baudrate. The event is not processed in the context of the
>>> > > baudrate change and later interpreted as response to the firmware
>>> > > download command (which is also a vendor command), but the driver
>>> > > detects
>>> > > that the event doesn't have the expected amount of associated data.
>>> > >
>>> > > More details:
>>> > >
>>> > > For the WCN3990 the vendor command for a baudrate change isn't sent as
>>> > > synchronous HCI command, because the controller sends the corresponding
>>> > > vendor event with the new baudrate. The event is received and decoded
>>> > > after the baudrate change of the host port.
>>> > >
>>> > > Identify the 'unused' event when it is received and don't add it to
>>> > > the queue of RX frames.
>>> > >
>>> > > Signed-off-by: Matthias Kaehlcke <[email protected]>
>>> > > ---
>>> >
>>> > ...
>>> >
>>> > Can you test by reverting this change "94d6671473924".
>>>
>>> The issue is still reproducible.
>>>
>>> > We need at least 15ms minimum delay for the soc to change its baud rate and
>>> > respond to the with command complete event.
>>>
>>> The baudrate change has clearly been successful when the problem is
>>> observed, since the host receives the vendor event with the new
>>> baudrate.
>>
>> I forgot to mention this earlier: the controller doesn't send a
>> command complete event for the command, or at least not a correct
>> one.
>>
>> That's the data that is received:
>>
>> 04 0e 04 01 00 00 00
>> ~~ ~~
>>
> [Bala]: can you share me the command sent and event recevied.
> I see that we receive a command complete event for the baud rate
> change command.
>
> command sent: 01 48 fc 01 11
> vendor specific event: 04 ff 02 92 01
> command complete event: 04 0e 04 01 00 00 00.
>
>
>
>> This is *a* command complete event, but the opcode is 0x0000 instead
>> of the earlier command. The same happens for the firmware
>> download/read version command, which is the reason why the command
>> complete injection mess
>> (https://lore.kernel.org/patchwork/patch/1027955/) is needed in one
>> way or another.
>>
> [Bala]: fw download approach is different where we use
> __hci_cmd_sync() where as here we use hci_uart_tx_wakeup()
> which directly calls the hci_uart_write_work(). so even we
> send an valid opcode or not for baudrate change will bot matter.
>
[Bala]: i miss understood the comment. Yes your true. in the all vendor
commands SoC responds with an 0x0000 opcode.

>> I wished Qualcomm FW developers would get their act together and:
>>
>> - send actual command complete events :
>> - acknowledge a baudrate change request using the current baudrate
>> like Broadcom and Intel chips apparently do
>>
>> this would have saved countless hours of debugging and implementing
>> quirky workarounds ...
>>
>> Maybe there is hope for future chips (hint, hint)?
>
> [Bala]: will take this forward to the SoC teams.

--
Regards
Balakrishna.

2019-04-01 18:07:00

by Matthias Kaehlcke

[permalink] [raw]
Subject: Re: [PATCH 2/2] Bluetooth: hci_qca: wcn3990: Drop baudrate change vendor event

On Mon, Apr 01, 2019 at 01:48:23PM +0530, Balakrishna Godavarthi wrote:
> Hi Matthias,
>
> On 2019-04-01 13:29, Balakrishna Godavarthi wrote:
> > Hi Matthias,
> >
> > Sorry for the late reply i was on vacation.
> >
> > On 2019-03-08 05:00, Matthias Kaehlcke wrote:
> > > On Thu, Mar 07, 2019 at 10:20:09AM -0800, Matthias Kaehlcke wrote:
> > > > Hi Balakrishna,
> > > >
> > > > On Thu, Mar 07, 2019 at 10:35:08AM +0530, Balakrishna Godavarthi
> > > > wrote:
> > > > > hi Matthias,
> > > > >
> > > > > On 2019-03-07 06:10, Matthias Kaehlcke wrote:
> > > > > > Firmware download to the WCN3990 often fails with a 'TLV response size
> > > > > > mismatch' error:
> > > > > >
> > > > > > [ 133.064659] Bluetooth: hci0: setting up wcn3990
> > > > > > [ 133.489150] Bluetooth: hci0: QCA controller version 0x02140201
> > > > > > [ 133.495245] Bluetooth: hci0: QCA Downloading qca/crbtfw21.tlv
> > > > > > [ 133.507214] Bluetooth: hci0: QCA TLV response size mismatch
> > > > > > [ 133.513265] Bluetooth: hci0: QCA Failed to download patch (-84)
> > > > > >
> > > > > > This is caused by a vendor event that corresponds to an earlier command
> > > > > > to change the baudrate. The event is not processed in the context of the
> > > > > > baudrate change and later interpreted as response to the firmware
> > > > > > download command (which is also a vendor command), but the driver
> > > > > > detects
> > > > > > that the event doesn't have the expected amount of associated data.
> > > > > >
> > > > > > More details:
> > > > > >
> > > > > > For the WCN3990 the vendor command for a baudrate change isn't sent as
> > > > > > synchronous HCI command, because the controller sends the corresponding
> > > > > > vendor event with the new baudrate. The event is received and decoded
> > > > > > after the baudrate change of the host port.
> > > > > >
> > > > > > Identify the 'unused' event when it is received and don't add it to
> > > > > > the queue of RX frames.
> > > > > >
> > > > > > Signed-off-by: Matthias Kaehlcke <[email protected]>
> > > > > > ---
> > > > >
> > > > > ...
> > > > >
> > > > > Can you test by reverting this change "94d6671473924".
> > > >
> > > > The issue is still reproducible.
> > > >
> > > > > We need at least 15ms minimum delay for the soc to change its baud rate and
> > > > > respond to the with command complete event.
> > > >
> > > > The baudrate change has clearly been successful when the problem is
> > > > observed, since the host receives the vendor event with the new
> > > > baudrate.
> > >
> > > I forgot to mention this earlier: the controller doesn't send a
> > > command complete event for the command, or at least not a correct
> > > one.
> > >
> > > That's the data that is received:
> > >
> > > 04 0e 04 01 00 00 00
> > > ~~ ~~
> > >
> > [Bala]: can you share me the command sent and event recevied.
> > I see that we receive a command complete event for the baud rate
> > change command.
> >
> > command sent: 01 48 fc 01 11
> > vendor specific event: 04 ff 02 92 01
> > command complete event: 04 0e 04 01 00 00 00.
> >
> >
> >
> > > This is *a* command complete event, but the opcode is 0x0000 instead
> > > of the earlier command. The same happens for the firmware
> > > download/read version command, which is the reason why the command
> > > complete injection mess
> > > (https://lore.kernel.org/patchwork/patch/1027955/) is needed in one
> > > way or another.
> > >
> > [Bala]: fw download approach is different where we use
> > __hci_cmd_sync() where as here we use hci_uart_tx_wakeup()
> > which directly calls the hci_uart_write_work(). so even we
> > send an valid opcode or not for baudrate change will bot matter.
> >
> [Bala]: i miss understood the comment. Yes your true. in the all vendor
> commands SoC responds with an 0x0000 opcode.

And IIUC this is not compliant with the spec, or at least the BT core
expects the actual opcode to consider the command to be completed.

2019-04-02 12:03:02

by Balakrishna Godavarthi

[permalink] [raw]
Subject: Re: [PATCH 2/2] Bluetooth: hci_qca: wcn3990: Drop baudrate change vendor event

Hi Matthias,

On 2019-04-01 22:42, Matthias Kaehlcke wrote:
> On Mon, Apr 01, 2019 at 01:48:23PM +0530, Balakrishna Godavarthi wrote:
>> Hi Matthias,
>>
>> On 2019-04-01 13:29, Balakrishna Godavarthi wrote:
>> > Hi Matthias,
>> >
>> > Sorry for the late reply i was on vacation.
>> >
>> > On 2019-03-08 05:00, Matthias Kaehlcke wrote:
>> > > On Thu, Mar 07, 2019 at 10:20:09AM -0800, Matthias Kaehlcke wrote:
>> > > > Hi Balakrishna,
>> > > >
>> > > > On Thu, Mar 07, 2019 at 10:35:08AM +0530, Balakrishna Godavarthi
>> > > > wrote:
>> > > > > hi Matthias,
>> > > > >
>> > > > > On 2019-03-07 06:10, Matthias Kaehlcke wrote:
>> > > > > > Firmware download to the WCN3990 often fails with a 'TLV response size
>> > > > > > mismatch' error:
>> > > > > >
>> > > > > > [ 133.064659] Bluetooth: hci0: setting up wcn3990
>> > > > > > [ 133.489150] Bluetooth: hci0: QCA controller version 0x02140201
>> > > > > > [ 133.495245] Bluetooth: hci0: QCA Downloading qca/crbtfw21.tlv
>> > > > > > [ 133.507214] Bluetooth: hci0: QCA TLV response size mismatch
>> > > > > > [ 133.513265] Bluetooth: hci0: QCA Failed to download patch (-84)
>> > > > > >
>> > > > > > This is caused by a vendor event that corresponds to an earlier command
>> > > > > > to change the baudrate. The event is not processed in the context of the
>> > > > > > baudrate change and later interpreted as response to the firmware
>> > > > > > download command (which is also a vendor command), but the driver
>> > > > > > detects
>> > > > > > that the event doesn't have the expected amount of associated data.
>> > > > > >
>> > > > > > More details:
>> > > > > >
>> > > > > > For the WCN3990 the vendor command for a baudrate change isn't sent as
>> > > > > > synchronous HCI command, because the controller sends the corresponding
>> > > > > > vendor event with the new baudrate. The event is received and decoded
>> > > > > > after the baudrate change of the host port.
>> > > > > >
>> > > > > > Identify the 'unused' event when it is received and don't add it to
>> > > > > > the queue of RX frames.
>> > > > > >
>> > > > > > Signed-off-by: Matthias Kaehlcke <[email protected]>
>> > > > > > ---
>> > > > >
>> > > > > ...
>> > > > >
>> > > > > Can you test by reverting this change "94d6671473924".
>> > > >
>> > > > The issue is still reproducible.
>> > > >
>> > > > > We need at least 15ms minimum delay for the soc to change its baud rate and
>> > > > > respond to the with command complete event.
>> > > >
>> > > > The baudrate change has clearly been successful when the problem is
>> > > > observed, since the host receives the vendor event with the new
>> > > > baudrate.
>> > >
>> > > I forgot to mention this earlier: the controller doesn't send a
>> > > command complete event for the command, or at least not a correct
>> > > one.
>> > >
>> > > That's the data that is received:
>> > >
>> > > 04 0e 04 01 00 00 00
>> > > ~~ ~~
>> > >
>> > [Bala]: can you share me the command sent and event recevied.
>> > I see that we receive a command complete event for the baud rate
>> > change command.
>> >
>> > command sent: 01 48 fc 01 11
>> > vendor specific event: 04 ff 02 92 01
>> > command complete event: 04 0e 04 01 00 00 00.
>> >
>> >
>> >
>> > > This is *a* command complete event, but the opcode is 0x0000 instead
>> > > of the earlier command. The same happens for the firmware
>> > > download/read version command, which is the reason why the command
>> > > complete injection mess
>> > > (https://lore.kernel.org/patchwork/patch/1027955/) is needed in one
>> > > way or another.
>> > >
>> > [Bala]: fw download approach is different where we use
>> > __hci_cmd_sync() where as here we use hci_uart_tx_wakeup()
>> > which directly calls the hci_uart_write_work(). so even we
>> > send an valid opcode or not for baudrate change will bot matter.
>> >
>> [Bala]: i miss understood the comment. Yes your true. in the all
>> vendor
>> commands SoC responds with an 0x0000 opcode.
>
> And IIUC this is not compliant with the spec, or at least the BT core
> expects the actual opcode to consider the command to be completed.

[Bala]: Did you try increasing the the baud rate change timeout to 50ms
instead of 10ms.
i suspect it is an timing issue.
I have see on the hardware sniffer that the chip is responding
with command complete event with
the newer baud rate after 15ms.
--
Regards
Balakrishna.

2019-04-02 18:05:05

by Matthias Kaehlcke

[permalink] [raw]
Subject: Re: [PATCH 2/2] Bluetooth: hci_qca: wcn3990: Drop baudrate change vendor event

On Tue, Apr 02, 2019 at 05:32:54PM +0530, Balakrishna Godavarthi wrote:
> Hi Matthias,
>
> On 2019-04-01 22:42, Matthias Kaehlcke wrote:
> > On Mon, Apr 01, 2019 at 01:48:23PM +0530, Balakrishna Godavarthi wrote:
> > > Hi Matthias,
> > >
> > > On 2019-04-01 13:29, Balakrishna Godavarthi wrote:
> > > > Hi Matthias,
> > > >
> > > > Sorry for the late reply i was on vacation.
> > > >
> > > > On 2019-03-08 05:00, Matthias Kaehlcke wrote:
> > > > > On Thu, Mar 07, 2019 at 10:20:09AM -0800, Matthias Kaehlcke wrote:
> > > > > > Hi Balakrishna,
> > > > > >
> > > > > > On Thu, Mar 07, 2019 at 10:35:08AM +0530, Balakrishna Godavarthi
> > > > > > wrote:
> > > > > > > hi Matthias,
> > > > > > >
> > > > > > > On 2019-03-07 06:10, Matthias Kaehlcke wrote:
> > > > > > > > Firmware download to the WCN3990 often fails with a 'TLV response size
> > > > > > > > mismatch' error:
> > > > > > > >
> > > > > > > > [ 133.064659] Bluetooth: hci0: setting up wcn3990
> > > > > > > > [ 133.489150] Bluetooth: hci0: QCA controller version 0x02140201
> > > > > > > > [ 133.495245] Bluetooth: hci0: QCA Downloading qca/crbtfw21.tlv
> > > > > > > > [ 133.507214] Bluetooth: hci0: QCA TLV response size mismatch
> > > > > > > > [ 133.513265] Bluetooth: hci0: QCA Failed to download patch (-84)
> > > > > > > >
> > > > > > > > This is caused by a vendor event that corresponds to an earlier command
> > > > > > > > to change the baudrate. The event is not processed in the context of the
> > > > > > > > baudrate change and later interpreted as response to the firmware
> > > > > > > > download command (which is also a vendor command), but the driver
> > > > > > > > detects
> > > > > > > > that the event doesn't have the expected amount of associated data.
> > > > > > > >
> > > > > > > > More details:
> > > > > > > >
> > > > > > > > For the WCN3990 the vendor command for a baudrate change isn't sent as
> > > > > > > > synchronous HCI command, because the controller sends the corresponding
> > > > > > > > vendor event with the new baudrate. The event is received and decoded
> > > > > > > > after the baudrate change of the host port.
> > > > > > > >
> > > > > > > > Identify the 'unused' event when it is received and don't add it to
> > > > > > > > the queue of RX frames.
> > > > > > > >
> > > > > > > > Signed-off-by: Matthias Kaehlcke <[email protected]>
> > > > > > > > ---
> > > > > > >
> > > > > > > ...
> > > > > > >
> > > > > > > Can you test by reverting this change "94d6671473924".
> > > > > >
> > > > > > The issue is still reproducible.
> > > > > >
> > > > > > > We need at least 15ms minimum delay for the soc to change its baud rate and
> > > > > > > respond to the with command complete event.
> > > > > >
> > > > > > The baudrate change has clearly been successful when the problem is
> > > > > > observed, since the host receives the vendor event with the new
> > > > > > baudrate.
> > > > >
> > > > > I forgot to mention this earlier: the controller doesn't send a
> > > > > command complete event for the command, or at least not a correct
> > > > > one.
> > > > >
> > > > > That's the data that is received:
> > > > >
> > > > > 04 0e 04 01 00 00 00
> > > > > ~~ ~~
> > > > >
> > > > [Bala]: can you share me the command sent and event recevied.
> > > > I see that we receive a command complete event for the baud rate
> > > > change command.
> > > >
> > > > command sent: 01 48 fc 01 11
> > > > vendor specific event: 04 ff 02 92 01
> > > > command complete event: 04 0e 04 01 00 00 00.
> > > >
> > > >
> > > >
> > > > > This is *a* command complete event, but the opcode is 0x0000 instead
> > > > > of the earlier command. The same happens for the firmware
> > > > > download/read version command, which is the reason why the command
> > > > > complete injection mess
> > > > > (https://lore.kernel.org/patchwork/patch/1027955/) is needed in one
> > > > > way or another.
> > > > >
> > > > [Bala]: fw download approach is different where we use
> > > > __hci_cmd_sync() where as here we use hci_uart_tx_wakeup()
> > > > which directly calls the hci_uart_write_work(). so even we
> > > > send an valid opcode or not for baudrate change will bot matter.
> > > >
> > > [Bala]: i miss understood the comment. Yes your true. in the all
> > > vendor
> > > commands SoC responds with an 0x0000 opcode.
> >
> > And IIUC this is not compliant with the spec, or at least the BT core
> > expects the actual opcode to consider the command to be completed.
>
> [Bala]: Did you try increasing the the baud rate change timeout to 50ms
> instead of 10ms.

It is even reproducible with the ROME timeout of 300ms.

The timeout doesn't help here. With 78e8fa2972e5 ("Bluetooth: hci_qca:
Deassert RTS while baudrate change command") RTS is deasserted during
the baudrate change, hence the controller only sends the response when
RTS is asserted again. Before the event resulted in a frame reassembly
error and the data was discarded.

> i suspect it is an timing issue.

Timing is certainly also a factor here (the problem isn't seen
always), but I don't think 'timing issue' is a proper description of
this issue. The problem is an event that the Bluetooth core doesn't
expect due to the hack of sending a raw command behind the core's back
to work around the firmware 'feature' of sending the command response
with the new baudrate.

Maybe a delay after re-asserting RTS can address is reliably, I don't
remember if I already experimented with that in the past. If a delay
can 'fix' the issue reliably I'm open to consider it for the sake of
simplicity, but only with a detailed comment that describes the
problem.

2019-04-03 06:23:33

by Balakrishna Godavarthi

[permalink] [raw]
Subject: Re: [PATCH 2/2] Bluetooth: hci_qca: wcn3990: Drop baudrate change vendor event

+ Harish to update his findings on wcn3998.
Mean time i will perform a regression to detect the failure.

On 2019-04-02 23:35, Matthias Kaehlcke wrote:
> On Tue, Apr 02, 2019 at 05:32:54PM +0530, Balakrishna Godavarthi wrote:
>> Hi Matthias,
>>
>> On 2019-04-01 22:42, Matthias Kaehlcke wrote:
>> > On Mon, Apr 01, 2019 at 01:48:23PM +0530, Balakrishna Godavarthi wrote:
>> > > Hi Matthias,
>> > >
>> > > On 2019-04-01 13:29, Balakrishna Godavarthi wrote:
>> > > > Hi Matthias,
>> > > >
>> > > > Sorry for the late reply i was on vacation.
>> > > >
>> > > > On 2019-03-08 05:00, Matthias Kaehlcke wrote:
>> > > > > On Thu, Mar 07, 2019 at 10:20:09AM -0800, Matthias Kaehlcke wrote:
>> > > > > > Hi Balakrishna,
>> > > > > >
>> > > > > > On Thu, Mar 07, 2019 at 10:35:08AM +0530, Balakrishna Godavarthi
>> > > > > > wrote:
>> > > > > > > hi Matthias,
>> > > > > > >
>> > > > > > > On 2019-03-07 06:10, Matthias Kaehlcke wrote:
>> > > > > > > > Firmware download to the WCN3990 often fails with a 'TLV response size
>> > > > > > > > mismatch' error:
>> > > > > > > >
>> > > > > > > > [ 133.064659] Bluetooth: hci0: setting up wcn3990
>> > > > > > > > [ 133.489150] Bluetooth: hci0: QCA controller version 0x02140201
>> > > > > > > > [ 133.495245] Bluetooth: hci0: QCA Downloading qca/crbtfw21.tlv
>> > > > > > > > [ 133.507214] Bluetooth: hci0: QCA TLV response size mismatch
>> > > > > > > > [ 133.513265] Bluetooth: hci0: QCA Failed to download patch (-84)
>> > > > > > > >
>> > > > > > > > This is caused by a vendor event that corresponds to an earlier command
>> > > > > > > > to change the baudrate. The event is not processed in the context of the
>> > > > > > > > baudrate change and later interpreted as response to the firmware
>> > > > > > > > download command (which is also a vendor command), but the driver
>> > > > > > > > detects
>> > > > > > > > that the event doesn't have the expected amount of associated data.
>> > > > > > > >
>> > > > > > > > More details:
>> > > > > > > >
>> > > > > > > > For the WCN3990 the vendor command for a baudrate change isn't sent as
>> > > > > > > > synchronous HCI command, because the controller sends the corresponding
>> > > > > > > > vendor event with the new baudrate. The event is received and decoded
>> > > > > > > > after the baudrate change of the host port.
>> > > > > > > >
>> > > > > > > > Identify the 'unused' event when it is received and don't add it to
>> > > > > > > > the queue of RX frames.
>> > > > > > > >
>> > > > > > > > Signed-off-by: Matthias Kaehlcke <[email protected]>
>> > > > > > > > ---
>> > > > > > >
>> > > > > > > ...
>> > > > > > >
>> > > > > > > Can you test by reverting this change "94d6671473924".
>> > > > > >
>> > > > > > The issue is still reproducible.
>> > > > > >
>> > > > > > > We need at least 15ms minimum delay for the soc to change its baud rate and
>> > > > > > > respond to the with command complete event.
>> > > > > >
>> > > > > > The baudrate change has clearly been successful when the problem is
>> > > > > > observed, since the host receives the vendor event with the new
>> > > > > > baudrate.
>> > > > >
>> > > > > I forgot to mention this earlier: the controller doesn't send a
>> > > > > command complete event for the command, or at least not a correct
>> > > > > one.
>> > > > >
>> > > > > That's the data that is received:
>> > > > >
>> > > > > 04 0e 04 01 00 00 00
>> > > > > ~~ ~~
>> > > > >
>> > > > [Bala]: can you share me the command sent and event recevied.
>> > > > I see that we receive a command complete event for the baud rate
>> > > > change command.
>> > > >
>> > > > command sent: 01 48 fc 01 11
>> > > > vendor specific event: 04 ff 02 92 01
>> > > > command complete event: 04 0e 04 01 00 00 00.
>> > > >
>> > > >
>> > > >
>> > > > > This is *a* command complete event, but the opcode is 0x0000 instead
>> > > > > of the earlier command. The same happens for the firmware
>> > > > > download/read version command, which is the reason why the command
>> > > > > complete injection mess
>> > > > > (https://lore.kernel.org/patchwork/patch/1027955/) is needed in one
>> > > > > way or another.
>> > > > >
>> > > > [Bala]: fw download approach is different where we use
>> > > > __hci_cmd_sync() where as here we use hci_uart_tx_wakeup()
>> > > > which directly calls the hci_uart_write_work(). so even we
>> > > > send an valid opcode or not for baudrate change will bot matter.
>> > > >
>> > > [Bala]: i miss understood the comment. Yes your true. in the all
>> > > vendor
>> > > commands SoC responds with an 0x0000 opcode.
>> >
>> > And IIUC this is not compliant with the spec, or at least the BT core
>> > expects the actual opcode to consider the command to be completed.
>>
>> [Bala]: Did you try increasing the the baud rate change timeout to
>> 50ms
>> instead of 10ms.
>
> It is even reproducible with the ROME timeout of 300ms.
>
> The timeout doesn't help here. With 78e8fa2972e5 ("Bluetooth: hci_qca:
> Deassert RTS while baudrate change command") RTS is deasserted during
> the baudrate change, hence the controller only sends the response when
> RTS is asserted again. Before the event resulted in a frame reassembly
> error and the data was discarded.
>
>> i suspect it is an timing issue.
>
> Timing is certainly also a factor here (the problem isn't seen
> always), but I don't think 'timing issue' is a proper description of
> this issue. The problem is an event that the Bluetooth core doesn't
> expect due to the hack of sending a raw command behind the core's back
> to work around the firmware 'feature' of sending the command response
> with the new baudrate.
>
> Maybe a delay after re-asserting RTS can address is reliably, I don't
> remember if I already experimented with that in the past. If a delay
> can 'fix' the issue reliably I'm open to consider it for the sake of
> simplicity, but only with a detailed comment that describes the
> problem.

--
Regards
Balakrishna.

2019-04-03 16:14:46

by Matthias Kaehlcke

[permalink] [raw]
Subject: Re: [PATCH 2/2] Bluetooth: hci_qca: wcn3990: Drop baudrate change vendor event

On Wed, Apr 03, 2019 at 11:53:26AM +0530, Balakrishna Godavarthi wrote:
> + Harish to update his findings on wcn3998.
> Mean time i will perform a regression to detect the failure.

On my system it typically reproduces within a few dozen
iterations. Make sure your tree doesn't contain any custom BT patches,
even if they just add logging or fix the timeout during
initialization. Since this problem is timing sensitive it might get
masked. My tree is based on 4.19 LTS with all QCA BT related changes
on top: https://chromium.googlesource.com/chromiumos/third_party/kernel/+/refs/heads/chromeos-4.19

I found some problems during initialization easier to reproduce
when binding and unbinding the device through sysfs, instead of
doing hciconfig up/down, this resembles more the initialization at
boot time.

> On 2019-04-02 23:35, Matthias Kaehlcke wrote:
> > On Tue, Apr 02, 2019 at 05:32:54PM +0530, Balakrishna Godavarthi wrote:
> > > Hi Matthias,
> > >
> > > On 2019-04-01 22:42, Matthias Kaehlcke wrote:
> > > > On Mon, Apr 01, 2019 at 01:48:23PM +0530, Balakrishna Godavarthi wrote:
> > > > > Hi Matthias,
> > > > >
> > > > > On 2019-04-01 13:29, Balakrishna Godavarthi wrote:
> > > > > > Hi Matthias,
> > > > > >
> > > > > > Sorry for the late reply i was on vacation.
> > > > > >
> > > > > > On 2019-03-08 05:00, Matthias Kaehlcke wrote:
> > > > > > > On Thu, Mar 07, 2019 at 10:20:09AM -0800, Matthias Kaehlcke wrote:
> > > > > > > > Hi Balakrishna,
> > > > > > > >
> > > > > > > > On Thu, Mar 07, 2019 at 10:35:08AM +0530, Balakrishna Godavarthi
> > > > > > > > wrote:
> > > > > > > > > hi Matthias,
> > > > > > > > >
> > > > > > > > > On 2019-03-07 06:10, Matthias Kaehlcke wrote:
> > > > > > > > > > Firmware download to the WCN3990 often fails with a 'TLV response size
> > > > > > > > > > mismatch' error:
> > > > > > > > > >
> > > > > > > > > > [ 133.064659] Bluetooth: hci0: setting up wcn3990
> > > > > > > > > > [ 133.489150] Bluetooth: hci0: QCA controller version 0x02140201
> > > > > > > > > > [ 133.495245] Bluetooth: hci0: QCA Downloading qca/crbtfw21.tlv
> > > > > > > > > > [ 133.507214] Bluetooth: hci0: QCA TLV response size mismatch
> > > > > > > > > > [ 133.513265] Bluetooth: hci0: QCA Failed to download patch (-84)
> > > > > > > > > >
> > > > > > > > > > This is caused by a vendor event that corresponds to an earlier command
> > > > > > > > > > to change the baudrate. The event is not processed in the context of the
> > > > > > > > > > baudrate change and later interpreted as response to the firmware
> > > > > > > > > > download command (which is also a vendor command), but the driver
> > > > > > > > > > detects
> > > > > > > > > > that the event doesn't have the expected amount of associated data.
> > > > > > > > > >
> > > > > > > > > > More details:
> > > > > > > > > >
> > > > > > > > > > For the WCN3990 the vendor command for a baudrate change isn't sent as
> > > > > > > > > > synchronous HCI command, because the controller sends the corresponding
> > > > > > > > > > vendor event with the new baudrate. The event is received and decoded
> > > > > > > > > > after the baudrate change of the host port.
> > > > > > > > > >
> > > > > > > > > > Identify the 'unused' event when it is received and don't add it to
> > > > > > > > > > the queue of RX frames.
> > > > > > > > > >
> > > > > > > > > > Signed-off-by: Matthias Kaehlcke <[email protected]>
> > > > > > > > > > ---
> > > > > > > > >
> > > > > > > > > ...
> > > > > > > > >
> > > > > > > > > Can you test by reverting this change "94d6671473924".
> > > > > > > >
> > > > > > > > The issue is still reproducible.
> > > > > > > >
> > > > > > > > > We need at least 15ms minimum delay for the soc to change its baud rate and
> > > > > > > > > respond to the with command complete event.
> > > > > > > >
> > > > > > > > The baudrate change has clearly been successful when the problem is
> > > > > > > > observed, since the host receives the vendor event with the new
> > > > > > > > baudrate.
> > > > > > >
> > > > > > > I forgot to mention this earlier: the controller doesn't send a
> > > > > > > command complete event for the command, or at least not a correct
> > > > > > > one.
> > > > > > >
> > > > > > > That's the data that is received:
> > > > > > >
> > > > > > > 04 0e 04 01 00 00 00
> > > > > > > ~~ ~~
> > > > > > >
> > > > > > [Bala]: can you share me the command sent and event recevied.
> > > > > > I see that we receive a command complete event for the baud rate
> > > > > > change command.
> > > > > >
> > > > > > command sent: 01 48 fc 01 11
> > > > > > vendor specific event: 04 ff 02 92 01
> > > > > > command complete event: 04 0e 04 01 00 00 00.
> > > > > >
> > > > > >
> > > > > >
> > > > > > > This is *a* command complete event, but the opcode is 0x0000 instead
> > > > > > > of the earlier command. The same happens for the firmware
> > > > > > > download/read version command, which is the reason why the command
> > > > > > > complete injection mess
> > > > > > > (https://lore.kernel.org/patchwork/patch/1027955/) is needed in one
> > > > > > > way or another.
> > > > > > >
> > > > > > [Bala]: fw download approach is different where we use
> > > > > > __hci_cmd_sync() where as here we use hci_uart_tx_wakeup()
> > > > > > which directly calls the hci_uart_write_work(). so even we
> > > > > > send an valid opcode or not for baudrate change will bot matter.
> > > > > >
> > > > > [Bala]: i miss understood the comment. Yes your true. in the all
> > > > > vendor
> > > > > commands SoC responds with an 0x0000 opcode.
> > > >
> > > > And IIUC this is not compliant with the spec, or at least the BT core
> > > > expects the actual opcode to consider the command to be completed.
> > >
> > > [Bala]: Did you try increasing the the baud rate change timeout to
> > > 50ms
> > > instead of 10ms.
> >
> > It is even reproducible with the ROME timeout of 300ms.
> >
> > The timeout doesn't help here. With 78e8fa2972e5 ("Bluetooth: hci_qca:
> > Deassert RTS while baudrate change command") RTS is deasserted during
> > the baudrate change, hence the controller only sends the response when
> > RTS is asserted again. Before the event resulted in a frame reassembly
> > error and the data was discarded.
> >
> > > i suspect it is an timing issue.
> >
> > Timing is certainly also a factor here (the problem isn't seen
> > always), but I don't think 'timing issue' is a proper description of
> > this issue. The problem is an event that the Bluetooth core doesn't
> > expect due to the hack of sending a raw command behind the core's back
> > to work around the firmware 'feature' of sending the command response
> > with the new baudrate.
> >
> > Maybe a delay after re-asserting RTS can address is reliably, I don't
> > remember if I already experimented with that in the past. If a delay
> > can 'fix' the issue reliably I'm open to consider it for the sake of
> > simplicity, but only with a detailed comment that describes the
> > problem.
>

2019-04-03 16:33:11

by Matthias Kaehlcke

[permalink] [raw]
Subject: Re: [PATCH 2/2] Bluetooth: hci_qca: wcn3990: Drop baudrate change vendor event

On Tue, Apr 02, 2019 at 11:05:01AM -0700, Matthias Kaehlcke wrote:
> On Tue, Apr 02, 2019 at 05:32:54PM +0530, Balakrishna Godavarthi wrote:
> > Hi Matthias,
> >
> > On 2019-04-01 22:42, Matthias Kaehlcke wrote:
> > > On Mon, Apr 01, 2019 at 01:48:23PM +0530, Balakrishna Godavarthi wrote:
> > > > Hi Matthias,
> > > >
> > > > On 2019-04-01 13:29, Balakrishna Godavarthi wrote:
> > > > > Hi Matthias,
> > > > >
> > > > > Sorry for the late reply i was on vacation.
> > > > >
> > > > > On 2019-03-08 05:00, Matthias Kaehlcke wrote:
> > > > > > On Thu, Mar 07, 2019 at 10:20:09AM -0800, Matthias Kaehlcke wrote:
> > > > > > > Hi Balakrishna,
> > > > > > >
> > > > > > > On Thu, Mar 07, 2019 at 10:35:08AM +0530, Balakrishna Godavarthi
> > > > > > > wrote:
> > > > > > > > hi Matthias,
> > > > > > > >
> > > > > > > > On 2019-03-07 06:10, Matthias Kaehlcke wrote:
> > > > > > > > > Firmware download to the WCN3990 often fails with a 'TLV response size
> > > > > > > > > mismatch' error:
> > > > > > > > >
> > > > > > > > > [ 133.064659] Bluetooth: hci0: setting up wcn3990
> > > > > > > > > [ 133.489150] Bluetooth: hci0: QCA controller version 0x02140201
> > > > > > > > > [ 133.495245] Bluetooth: hci0: QCA Downloading qca/crbtfw21.tlv
> > > > > > > > > [ 133.507214] Bluetooth: hci0: QCA TLV response size mismatch
> > > > > > > > > [ 133.513265] Bluetooth: hci0: QCA Failed to download patch (-84)
> > > > > > > > >
> > > > > > > > > This is caused by a vendor event that corresponds to an earlier command
> > > > > > > > > to change the baudrate. The event is not processed in the context of the
> > > > > > > > > baudrate change and later interpreted as response to the firmware
> > > > > > > > > download command (which is also a vendor command), but the driver
> > > > > > > > > detects
> > > > > > > > > that the event doesn't have the expected amount of associated data.
> > > > > > > > >
> > > > > > > > > More details:
> > > > > > > > >
> > > > > > > > > For the WCN3990 the vendor command for a baudrate change isn't sent as
> > > > > > > > > synchronous HCI command, because the controller sends the corresponding
> > > > > > > > > vendor event with the new baudrate. The event is received and decoded
> > > > > > > > > after the baudrate change of the host port.
> > > > > > > > >
> > > > > > > > > Identify the 'unused' event when it is received and don't add it to
> > > > > > > > > the queue of RX frames.
> > > > > > > > >
> > > > > > > > > Signed-off-by: Matthias Kaehlcke <[email protected]>
> > > > > > > > > ---
> > > > > > > >
> > > > > > > > ...
> > > > > > > >
> > > > > > > > Can you test by reverting this change "94d6671473924".
> > > > > > >
> > > > > > > The issue is still reproducible.
> > > > > > >
> > > > > > > > We need at least 15ms minimum delay for the soc to change its baud rate and
> > > > > > > > respond to the with command complete event.
> > > > > > >
> > > > > > > The baudrate change has clearly been successful when the problem is
> > > > > > > observed, since the host receives the vendor event with the new
> > > > > > > baudrate.
> > > > > >
> > > > > > I forgot to mention this earlier: the controller doesn't send a
> > > > > > command complete event for the command, or at least not a correct
> > > > > > one.
> > > > > >
> > > > > > That's the data that is received:
> > > > > >
> > > > > > 04 0e 04 01 00 00 00
> > > > > > ~~ ~~
> > > > > >
> > > > > [Bala]: can you share me the command sent and event recevied.
> > > > > I see that we receive a command complete event for the baud rate
> > > > > change command.
> > > > >
> > > > > command sent: 01 48 fc 01 11
> > > > > vendor specific event: 04 ff 02 92 01
> > > > > command complete event: 04 0e 04 01 00 00 00.
> > > > >
> > > > >
> > > > >
> > > > > > This is *a* command complete event, but the opcode is 0x0000 instead
> > > > > > of the earlier command. The same happens for the firmware
> > > > > > download/read version command, which is the reason why the command
> > > > > > complete injection mess
> > > > > > (https://lore.kernel.org/patchwork/patch/1027955/) is needed in one
> > > > > > way or another.
> > > > > >
> > > > > [Bala]: fw download approach is different where we use
> > > > > __hci_cmd_sync() where as here we use hci_uart_tx_wakeup()
> > > > > which directly calls the hci_uart_write_work(). so even we
> > > > > send an valid opcode or not for baudrate change will bot matter.
> > > > >
> > > > [Bala]: i miss understood the comment. Yes your true. in the all
> > > > vendor
> > > > commands SoC responds with an 0x0000 opcode.
> > >
> > > And IIUC this is not compliant with the spec, or at least the BT core
> > > expects the actual opcode to consider the command to be completed.
> >
> > [Bala]: Did you try increasing the the baud rate change timeout to 50ms
> > instead of 10ms.
>
> It is even reproducible with the ROME timeout of 300ms.
>
> The timeout doesn't help here. With 78e8fa2972e5 ("Bluetooth: hci_qca:
> Deassert RTS while baudrate change command") RTS is deasserted during
> the baudrate change, hence the controller only sends the response when
> RTS is asserted again. Before the event resulted in a frame reassembly
> error and the data was discarded.
>
> > i suspect it is an timing issue.
>
> Timing is certainly also a factor here (the problem isn't seen
> always), but I don't think 'timing issue' is a proper description of
> this issue. The problem is an event that the Bluetooth core doesn't
> expect due to the hack of sending a raw command behind the core's back
> to work around the firmware 'feature' of sending the command response
> with the new baudrate.
>
> Maybe a delay after re-asserting RTS can address is reliably, I don't
> remember if I already experimented with that in the past. If a delay
> can 'fix' the issue reliably I'm open to consider it for the sake of
> simplicity, but only with a detailed comment that describes the
> problem.

A short delay after re-asserting RTS indeed seems to 'fix' this, the
problem didn't reproduce in 5000+ iterations of binding/unbinding the
HCI.

I'm fine with either solution, the delay is slightly less code, then
again this patch (or v2:
https://lore.kernel.org/patchwork/patch/1050247/) also isn't overly
complex. Just noticed though that my post of v2 is messed up and I
would have to re-send it with proper format.

Marcel, please let me know if you prefer an msleep + comment or
actively dropping the event.

Thanks

Matthias

2019-04-04 14:52:25

by Balakrishna Godavarthi

[permalink] [raw]
Subject: Re: [PATCH 2/2] Bluetooth: hci_qca: wcn3990: Drop baudrate change vendor event

Hi Matthias,

On 2019-04-03 21:44, Matthias Kaehlcke wrote:
> On Wed, Apr 03, 2019 at 11:53:26AM +0530, Balakrishna Godavarthi wrote:
>> + Harish to update his findings on wcn3998.
>> Mean time i will perform a regression to detect the failure.
>
> On my system it typically reproduces within a few dozen
> iterations. Make sure your tree doesn't contain any custom BT patches,
> even if they just add logging or fix the timeout during
> initialization. Since this problem is timing sensitive it might get
> masked. My tree is based on 4.19 LTS with all QCA BT related changes
> on top:
> https://chromium.googlesource.com/chromiumos/third_party/kernel/+/refs/heads/chromeos-4.19
>
> I found some problems during initialization easier to reproduce
> when binding and unbinding the device through sysfs, instead of
> doing hciconfig up/down, this resembles more the initialization at
> boot time.
>
[Bala]: I am able to replicate this issue.
are you seeing the below error message getting logged on the
console in the issue case or an different error message.
Bluetooth: hci0: QCA TLV response size mismatch

>> On 2019-04-02 23:35, Matthias Kaehlcke wrote:
>> > On Tue, Apr 02, 2019 at 05:32:54PM +0530, Balakrishna Godavarthi wrote:
>> > > Hi Matthias,
>> > >
>> > > On 2019-04-01 22:42, Matthias Kaehlcke wrote:
>> > > > On Mon, Apr 01, 2019 at 01:48:23PM +0530, Balakrishna Godavarthi wrote:
>> > > > > Hi Matthias,
>> > > > >
>> > > > > On 2019-04-01 13:29, Balakrishna Godavarthi wrote:
>> > > > > > Hi Matthias,
>> > > > > >
>> > > > > > Sorry for the late reply i was on vacation.
>> > > > > >
>> > > > > > On 2019-03-08 05:00, Matthias Kaehlcke wrote:
>> > > > > > > On Thu, Mar 07, 2019 at 10:20:09AM -0800, Matthias Kaehlcke wrote:
>> > > > > > > > Hi Balakrishna,
>> > > > > > > >
>> > > > > > > > On Thu, Mar 07, 2019 at 10:35:08AM +0530, Balakrishna Godavarthi
>> > > > > > > > wrote:
>> > > > > > > > > hi Matthias,
>> > > > > > > > >
>> > > > > > > > > On 2019-03-07 06:10, Matthias Kaehlcke wrote:
>> > > > > > > > > > Firmware download to the WCN3990 often fails with a 'TLV response size
>> > > > > > > > > > mismatch' error:
>> > > > > > > > > >
>> > > > > > > > > > [ 133.064659] Bluetooth: hci0: setting up wcn3990
>> > > > > > > > > > [ 133.489150] Bluetooth: hci0: QCA controller version 0x02140201
>> > > > > > > > > > [ 133.495245] Bluetooth: hci0: QCA Downloading qca/crbtfw21.tlv
>> > > > > > > > > > [ 133.507214] Bluetooth: hci0: QCA TLV response size mismatch
>> > > > > > > > > > [ 133.513265] Bluetooth: hci0: QCA Failed to download patch (-84)
>> > > > > > > > > >
>> > > > > > > > > > This is caused by a vendor event that corresponds to an earlier command
>> > > > > > > > > > to change the baudrate. The event is not processed in the context of the
>> > > > > > > > > > baudrate change and later interpreted as response to the firmware
>> > > > > > > > > > download command (which is also a vendor command), but the driver
>> > > > > > > > > > detects
>> > > > > > > > > > that the event doesn't have the expected amount of associated data.
>> > > > > > > > > >
>> > > > > > > > > > More details:
>> > > > > > > > > >
>> > > > > > > > > > For the WCN3990 the vendor command for a baudrate change isn't sent as
>> > > > > > > > > > synchronous HCI command, because the controller sends the corresponding
>> > > > > > > > > > vendor event with the new baudrate. The event is received and decoded
>> > > > > > > > > > after the baudrate change of the host port.
>> > > > > > > > > >
>> > > > > > > > > > Identify the 'unused' event when it is received and don't add it to
>> > > > > > > > > > the queue of RX frames.
>> > > > > > > > > >
>> > > > > > > > > > Signed-off-by: Matthias Kaehlcke <[email protected]>
>> > > > > > > > > > ---
>> > > > > > > > >
>> > > > > > > > > ...
>> > > > > > > > >
>> > > > > > > > > Can you test by reverting this change "94d6671473924".
>> > > > > > > >
>> > > > > > > > The issue is still reproducible.
>> > > > > > > >
>> > > > > > > > > We need at least 15ms minimum delay for the soc to change its baud rate and
>> > > > > > > > > respond to the with command complete event.
>> > > > > > > >
>> > > > > > > > The baudrate change has clearly been successful when the problem is
>> > > > > > > > observed, since the host receives the vendor event with the new
>> > > > > > > > baudrate.
>> > > > > > >
>> > > > > > > I forgot to mention this earlier: the controller doesn't send a
>> > > > > > > command complete event for the command, or at least not a correct
>> > > > > > > one.
>> > > > > > >
>> > > > > > > That's the data that is received:
>> > > > > > >
>> > > > > > > 04 0e 04 01 00 00 00
>> > > > > > > ~~ ~~
>> > > > > > >
>> > > > > > [Bala]: can you share me the command sent and event recevied.
>> > > > > > I see that we receive a command complete event for the baud rate
>> > > > > > change command.
>> > > > > >
>> > > > > > command sent: 01 48 fc 01 11
>> > > > > > vendor specific event: 04 ff 02 92 01
>> > > > > > command complete event: 04 0e 04 01 00 00 00.
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > > > This is *a* command complete event, but the opcode is 0x0000 instead
>> > > > > > > of the earlier command. The same happens for the firmware
>> > > > > > > download/read version command, which is the reason why the command
>> > > > > > > complete injection mess
>> > > > > > > (https://lore.kernel.org/patchwork/patch/1027955/) is needed in one
>> > > > > > > way or another.
>> > > > > > >
>> > > > > > [Bala]: fw download approach is different where we use
>> > > > > > __hci_cmd_sync() where as here we use hci_uart_tx_wakeup()
>> > > > > > which directly calls the hci_uart_write_work(). so even we
>> > > > > > send an valid opcode or not for baudrate change will bot matter.
>> > > > > >
>> > > > > [Bala]: i miss understood the comment. Yes your true. in the all
>> > > > > vendor
>> > > > > commands SoC responds with an 0x0000 opcode.
>> > > >
>> > > > And IIUC this is not compliant with the spec, or at least the BT core
>> > > > expects the actual opcode to consider the command to be completed.
>> > >
>> > > [Bala]: Did you try increasing the the baud rate change timeout to
>> > > 50ms
>> > > instead of 10ms.
>> >
>> > It is even reproducible with the ROME timeout of 300ms.
>> >
>> > The timeout doesn't help here. With 78e8fa2972e5 ("Bluetooth: hci_qca:
>> > Deassert RTS while baudrate change command") RTS is deasserted during
>> > the baudrate change, hence the controller only sends the response when
>> > RTS is asserted again. Before the event resulted in a frame reassembly
>> > error and the data was discarded.
>> >
>> > > i suspect it is an timing issue.
>> >
>> > Timing is certainly also a factor here (the problem isn't seen
>> > always), but I don't think 'timing issue' is a proper description of
>> > this issue. The problem is an event that the Bluetooth core doesn't
>> > expect due to the hack of sending a raw command behind the core's back
>> > to work around the firmware 'feature' of sending the command response
>> > with the new baudrate.
>> >
>> > Maybe a delay after re-asserting RTS can address is reliably, I don't
>> > remember if I already experimented with that in the past. If a delay
>> > can 'fix' the issue reliably I'm open to consider it for the sake of
>> > simplicity, but only with a detailed comment that describes the
>> > problem.
>>

--
Regards
Balakrishna.

2019-04-09 16:36:38

by Matthias Kaehlcke

[permalink] [raw]
Subject: Re: [PATCH 2/2] Bluetooth: hci_qca: wcn3990: Drop baudrate change vendor event

On Thu, Apr 04, 2019 at 08:22:15PM +0530, Balakrishna Godavarthi wrote:
> Hi Matthias,
>
> On 2019-04-03 21:44, Matthias Kaehlcke wrote:
> > On Wed, Apr 03, 2019 at 11:53:26AM +0530, Balakrishna Godavarthi wrote:
> > > + Harish to update his findings on wcn3998.
> > > Mean time i will perform a regression to detect the failure.
> >
> > On my system it typically reproduces within a few dozen
> > iterations. Make sure your tree doesn't contain any custom BT patches,
> > even if they just add logging or fix the timeout during
> > initialization. Since this problem is timing sensitive it might get
> > masked. My tree is based on 4.19 LTS with all QCA BT related changes
> > on top:
> > https://chromium.googlesource.com/chromiumos/third_party/kernel/+/refs/heads/chromeos-4.19
> >
> > I found some problems during initialization easier to reproduce
> > when binding and unbinding the device through sysfs, instead of
> > doing hciconfig up/down, this resembles more the initialization at
> > boot time.
> >
> [Bala]: I am able to replicate this issue.
> are you seeing the below error message getting logged on the console
> in the issue case or an different error message.
> Bluetooth: hci0: QCA TLV response size mismatch

Yes, that's the error message. The size doesn't match because the
driver expects a TLV response but gets a baud rate response.