2019-02-15 20:37:36

by João Paulo Rechi Vita

Subject: Potential QCA9377 firmware bug, spurious command complete event

Hello all,

At Endless we are working on enabling a machine with a Qualcomm
Atheros QCA9377 adapter, and are experiencing problems when scanning
for BLE devices. btmon shows the ADV_IND and SCAN_RSP events, but no
device found events are emitted by the kernel, so the user is not able
to discover any BLE devices. Digging in the code and tracking HCI
commands being sent to the adapter and their respective command
complete events, I see the following:

1. All three HCI commands involved in starting an LE active scanning
procedure (I'm using "btmgmt find -l"), LE_SET_RANDOM_ADDR,
LE_SET_SCAN_PARAM and LE_SET_SCAN_ENABLE, are queued in hci_dev's skb
queue. hci_cmd_work is then scheduled and runs, sending
LE_SET_RANDOM_ADDR to the controller.

2. A command complete (CC) event arrives for LE_SET_RANDOM_ADDR, is
processed by hci_cc_le_set_random_addr, hci_sent_cmd_data runs with no
errors and hci_cmd_work runs again, sending LE_SET_SCAN_PARAM to the
controller. All good until this point.

3. A second CC event arrives for LE_SET_RANDOM_ADDR, but this time the
CC opcode does not match the opcode from hdev->sent_cmd (the last HCI
command sent to the controller, now LE_SET_SCAN_PARAM) in
hci_sent_cmd_data. The handler does nothing, so there is no functional
problem yet. However, the flow continues: hci_cmd_work runs again and
sends LE_SET_SCAN_ENABLE to the controller.

4. Now the CC event for LE_SET_SCAN_PARAM arrives, but since we
already sent LE_SET_SCAN_ENABLE, the CC opcode again does not match
the opcode from hdev->sent_cmd in hci_sent_cmd_data, so
hci_cc_le_set_scan_param never gets to set hdev->le_scan_type. At this
point the controller is set to do an active scan (the CC reports
success) but the kernel believes it is a passive scan.

5. The CC event for LE_SET_SCAN_ENABLE arrives and is processed
successfully, as no other HCI commands were sent in between. The scan
procedure starts and ADV_IND and ADV_SCAN_RSP PDUs start to arrive,
but no device found events are sent to userspace since the kernel
believes this is a passive scanning procedure.
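
To make the sequence above easier to follow, here is a small userspace
model of it. This is only a sketch, not kernel code: struct toy_hdev
and the toy_* functions are made-up names, and the model assumes that
a CC handler updates state only when the CC opcode matches the last
command sent, while any CC event lets the next queued command go out.
The opcode values are the standard HCI ones.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Standard HCI opcodes for the three LE commands involved. */
#define OP_LE_SET_RANDOM_ADDR 0x2005
#define OP_LE_SET_SCAN_PARAM  0x200b
#define OP_LE_SET_SCAN_ENABLE 0x200c

#define LE_SCAN_PASSIVE 0x00
#define LE_SCAN_ACTIVE  0x01

/* Hypothetical, heavily reduced stand-in for struct hci_dev. */
struct toy_hdev {
    uint16_t queue[3];    /* commands waiting to be sent */
    size_t   queued;      /* number of entries in queue */
    size_t   next;        /* index of the next command to send */
    uint16_t sent_cmd;    /* opcode of the last command sent */
    uint8_t  le_scan_type;
    bool     scanning;
};

/* Models hci_cmd_work: send the next queued command, if any. */
void toy_cmd_work(struct toy_hdev *h)
{
    if (h->next < h->queued)
        h->sent_cmd = h->queue[h->next++];
}

/* Models command complete handling as described in steps 2 to 5. */
void toy_cmd_complete(struct toy_hdev *h, uint16_t opcode)
{
    /* State is updated only if the opcode matches the last sent command. */
    if (opcode == h->sent_cmd) {
        if (opcode == OP_LE_SET_SCAN_PARAM)
            h->le_scan_type = LE_SCAN_ACTIVE;
        else if (opcode == OP_LE_SET_SCAN_ENABLE)
            h->scanning = true;
    }
    /* Assumption: every CC event allows the next command to be sent. */
    toy_cmd_work(h);
}

/* Replays the scan start, optionally with the duplicate CC from step 3. */
struct toy_hdev toy_start_scan(bool spurious_cc)
{
    struct toy_hdev h = {
        .queue = { OP_LE_SET_RANDOM_ADDR, OP_LE_SET_SCAN_PARAM,
                   OP_LE_SET_SCAN_ENABLE },
        .queued = 3,
        .le_scan_type = LE_SCAN_PASSIVE,
    };

    toy_cmd_work(&h);                                /* step 1 */
    toy_cmd_complete(&h, OP_LE_SET_RANDOM_ADDR);     /* step 2 */
    if (spurious_cc)
        toy_cmd_complete(&h, OP_LE_SET_RANDOM_ADDR); /* step 3 */
    toy_cmd_complete(&h, OP_LE_SET_SCAN_PARAM);      /* step 4 */
    toy_cmd_complete(&h, OP_LE_SET_SCAN_ENABLE);     /* step 5 */
    return h;
}
```

In this model, toy_start_scan(false) ends with le_scan_type set to
LE_SCAN_ACTIVE, while toy_start_scan(true) ends with scanning enabled
but le_scan_type still LE_SCAN_PASSIVE, which is the mismatch
described in steps 4 and 5.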

This looks like a controller bug to me, but I may be missing
something. Has this kind of problem been seen before? If it is indeed
a controller bug, does anyone have contacts at Qualcomm-Atheros to
report it to? Feel free to put me in contact with them if that would
help. Finally, if this is not fixable through a firmware update, does
anyone have suggestions for working around it in the kernel? This was
reproduced on a 4.20 kernel.

Best regards,

--
João Paulo Rechi Vita
http://about.me/jprvita