Hello,
I?m trying to scan for Estimote iBeacons, using Bluez 5.15, a raspbian Linux with Kernel 3.10.26 (so on a Raspberry Pi), and a TP-LINK TL-WN725N dongle.
To test receiving of the advertisements, I?m using hcidump and hcitool lescan.
As I want to continuously receive the advertisements, I?m allowing duplicates (--duplicates). Also, I don?t want to send scan responses, so I?m running a passive scan (--passive; running an active scan quickly results in a hardware error).
So the commands I?m running:
1st terminal: hcidump
2nd terminal: hcitool lescan --duplicates --passive
Receiving the advertisements works fine for some time (30 seconds-2 minutes), resulting in advertisement events printed by hcidump:
> HCI Event: LE Meta Event (0x3e) plen 42
LE Advertising Report
ADV_IND - Connectable undirected advertising (0)
bdaddr FC:94:A8:6F:A8:10 (Random)
Flags: 0x06
Unknown type 0xff with 25 bytes data
RSSI: -65
These events come at a rate of up to a couple per second. The RSSI values oscillate between -70 and -55 as the beacon is close.
But then I get:
> HCI Event: LE Meta Event (0x3e) plen 42
LE Advertising Report
ADV_IND - Connectable undirected advertising (0)
bdaddr FC:94:A8:6F:A8:10 (Random)
Flags: 0x06
Unknown type 0xff with 25 bytes data
RSSI: -4
> HCI Event: QoS Violation (0x1e) plen 2
handle 1537
> HCI Event: Data Buffer Overflow (0x1a) plen 255
type Unknown
> HCI Event: code 0xa8 plen 148
> HCI Event: code 0xf8 plen 70
> HCI Event: Physical Link Complete (0x40) plen 127
status 0x30 phy handle 0xf5
Error: Parameter out of Mandatory Range
The first weird thing is the abnormally large RSSI (sometimes even positive), but that may be the result of the buffer overflow, which comes next. Of course later no more packets are received, just a lot of: http://pastie.org/private/ckvvdt3plhxv84fbydyxg
I tried decreasing the scan?s interval to 0x50 (keeping the window at 0x10), which results in fewer advertisements being received), but after some time I get the same buffer overflow.
Any ideas what can be wrong? Is it something in bluez? Something specific to the dongle?
Thanks,
Adam
--
Adam Warski
http://twitter.com/#!/adamwarski
http://www.softwaremill.com
http://www.warski.org
Hi Adam,
On Fri, Feb 28, 2014 at 8:41 AM, Adam Warski <[email protected]> wrote:
> But then I get:
>
>> HCI Event: LE Meta Event (0x3e) plen 42
> LE Advertising Report
> ADV_IND - Connectable undirected advertising (0)
> bdaddr FC:94:A8:6F:A8:10 (Random)
> Flags: 0x06
> Unknown type 0xff with 25 bytes data
> RSSI: -4
>> HCI Event: QoS Violation (0x1e) plen 2
> handle 1537
>> HCI Event: Data Buffer Overflow (0x1a) plen 255
> type Unknown
>> HCI Event: code 0xa8 plen 148
>> HCI Event: code 0xf8 plen 70
>> HCI Event: Physical Link Complete (0x40) plen 127
> status 0x30 phy handle 0xf5
> Error: Parameter out of Mandatory Range
For me looks like something got confused on the HCI packet parsing:
either the kernel, hcidump, or something else. The HCI events after
the LE meta event make no sense (they seem garbage). Did you try with
btmon instead of hcidump ?
Can you save the raw dump using "hcidump -w output.dump" (or using
btmon -w) and send to the list? It is easier to analyze, as the parser
may be bogus.
Best Regards,
--
Anderson Lizardo
http://www.indt.org/?lang=en
INdT - Manaus - Brazil
On 2014-03-18 07:07, Anderson Lizardo wrote:
>> And the same as before, after a short amount of time I start getting
>> similar (not exactly the same [2]) errors. So it's not the dongle :)
>
> The logs seem to show the same behaviour as the other dongle:
> seemingly "random" packets with nonsense parameters (simply because
> the HCI packet was assembled incorrectly).
>
> I still blame the USB controller in RPi/Beaglebone for not feeding all
> USB fragments to the bluetooth subsystem... at least is my
> interpretation from Terry's logs (which are from USB sniffing).
>
> What I have no clue is why this is happenning: both of you tried
> powered USB hubs, and this was my only guess (insufficient power on
> the bus). If you could find a kernel version for your boards that are
> not causing any problems, then we can rule out HW related issues...
I agree that it looks like the USB controller's fault. It seems highly
unlikely, though. A few reasons why I think so:
1. The 2 boards use completely different USB controllers.
2. Both those platforms talk to other USB devices pushing much higher
data rates without any issue.
3. The problem doesn't show up when the bus is forced to USB 1.1 (Full
speed instead of high speed)
4. The problem doesn't only show up during non-duplicate-filtered scans.
I can reproduce it (although it takes a lot longer) with LE
Advertisement packets only coming in once per second.
Let me ask another question:
Are there any other USB devices that ever send a bunch of data with no
polling from the host? IE, if the host has no buffer space to accept an
INT transaction, it will just be lost, and all other USB devices wait
for the INT to be processed by getting an acknowledgement before
generating another INT? Apologies for my lack of in-depth knowledge for
the workings of USB bus transactions.
My suspicion is that if you follow the Bluetooth spec for HCI over USB
(which all of these vendors making these USB devices should) then if the
host USB controller is not fast enough, it can be overwhelmed. Forcing
the bus to full-speed mode, means the bluetooth receiver is forced to
buffer on its end, since it simply cannot transfer data to the host fast
enough to overwhelm it. If my suspicion is true, then that means any
packets of this type could overflow a host if it is busy enough,
although much less likely to happen if the host is more powerful (in
terms of processing power).
Regards,
Terry
Hi Adam,
On Sun, Mar 16, 2014 at 3:53 PM, Adam Warski <[email protected]> wrote:
> I didn't yet get to doing the wireshark investigations, but just for the record, I got another bluetooth dongle [1], and the --duplicates flag doesn't seem to have any effect, I always get all the packets.
Actually, the default in hcitool is to filter duplicates, and
"--duplicates" *disables* duplicate filtering. It's sometimes hard to
tell from the host side whether duplicate filtering is working or not,
as the logic is running inside the dongle. The only effect is to see
*more* advertising packets coming to the host when using --duplicates.
> And the same as before, after a short amount of time I start getting similar (not exactly the same [2]) errors. So it's not the dongle :)
The logs seem to show the same behaviour as the other dongle:
seemingly "random" packets with nonsense parameters (simply because
the HCI packet was assembled incorrectly).
I still blame the USB controller in RPi/Beaglebone for not feeding all
USB fragments to the bluetooth subsystem... at least is my
interpretation from Terry's logs (which are from USB sniffing).
What I have no clue is why this is happenning: both of you tried
powered USB hubs, and this was my only guess (insufficient power on
the bus). If you could find a kernel version for your boards that are
not causing any problems, then we can rule out HW related issues...
Best Regards,
--
Anderson Lizardo
http://www.indt.org/?lang=en
INdT - Manaus - Brazil
I didn?t yet get to doing the wireshark investigations, but just for the record, I got another bluetooth dongle [1], and the --duplicates flag doesn?t seem to have any effect, I always get all the packets.
And the same as before, after a short amount of time I start getting similar (not exactly the same [2]) errors. So it?s not the dongle :)
Adam
[1]
http://www.adafruit.com/products/1327, which identifies itself as:
Bus 001 Device 004: ID 0a12:0001 Cambridge Silicon Radio, Ltd Bluetooth Dongle (HCI mode)
[2]
http://pastie.org/8936306
On 10 Mar 2014, at 14:09, Anderson Lizardo <[email protected]> wrote:
> Hi Adam,
>
> Duplicate filtering is done inside the controller. If you disable
> filtering, the controller will receive a lot more packets (the
> quantity will depend on device's advertising parameters, but it is
> several times more packets).
>
> Even if your Gimbal device changes address on every packet (which is
> unusual, usually address changes are time based), what you actually
> see without using --duplicate is still a small fraction of what the
> controller receives over the air. Try comparing Gimbal on using
> --duplicates versus not using (instead of comparing different devices
> with different firmware and adv. parameters). Try counting number of
> adv. reports on both situations. I expect to see a lot more when
> --duplicate is enabled...
>
> Regarding the logs, it seems your dmesg.txt seems only to show the
> last lines of debugging... maybe the dmesg buffer is too short. Can
> you check if there is a more complete log generated in one of the
> /var/log/kern.log.* files ?
>
> Fortunately, It contains one snippet showing problems with HCI
> reassembly from USB packets:
>
> [ 878.184532] btusb_intr_complete: hci0 urb d9a8d300 status 0 count 16
> [ 878.185499] btusb_intr_complete: hci0 urb d9a8d300 status 0 count 16
> [ 878.186498] btusb_intr_complete: hci0 urb d9a8d300 status 0 count 12
> [ 878.386548] btusb_intr_complete: hci0 urb d9a8d300 status 0 count 16
> [ 878.387530] btusb_intr_complete: hci0 urb d9a8d300 status 0 count 16
> [ 878.387664] hci_rx_work: hci0
> [ 878.387693] hci_send_to_monitor: hdev d9a4e000 len 64
> [ 878.387783] hci_send_to_sock: hdev d9a4e000 len 64
> [ 878.387803] hci_rx_work: hci0 Event packet
> [ 878.387822] hci_event_packet: hci0 event 0xbc
> [ 878.387847] hci_send_to_monitor: hdev d9a4e000 len 4
> [ 878.387896] hci_send_to_sock: hdev d9a4e000 len 4
> [ 878.387913] hci_rx_work: hci0 Event packet
> [ 878.387930] hci_req_cmd_complete: opcode 0x200c status 0x00
> [ 878.387945] hci_sent_cmd_data: hci0 opcode 0x200c
> [ 878.387959] hci_event_packet: hci0 event 0x00
> [ 878.388058] hci_sock_recvmsg: sock da567840, sk daa9c800
>
> It contains 3 bogus events:
>
>> HCI Event: Unknown (0xbc) plen 62
>> HCI Event: Unknown (0x00) plen 2
>> HCI Event: Physical Link Complete (0x40) plen 127
>
> For me, it still points out to corrupted/truncated USB packets.
> Unfortunately, it is just indirect evidence as the logs don't show raw
> USB packets. But you can definitively confirm by using a USB sniffer
> tool. I recommend using tshark (wireshark's command line interface).
> It works just fine on raspberry. See this post for instructions:
> http://nagaraj-embedded.blogspot.com.br/2012/03/capturing-usb-data-through-wireshark.html
>
> Note that on raspberry there is just "usbmon1" (i.e. Bus 1), and
> everything is attached to it. So the logs will get other unrelated
> traffic (like ethernet packets), but wireshark GUI is powerful enough
> to filter only traffic related to a specific device.
>
> Best Regards,
> --
> Anderson Lizardo
> http://www.indt.org/?lang=en
> INdT - Manaus - Brazil
> --
> To unsubscribe from this list: send the line "unsubscribe linux-bluetooth" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
Adam Warski
http://twitter.com/#!/adamwarski
http://www.softwaremill.com
http://www.warski.org
On 2014-03-10 06:21, Anderson Lizardo wrote:
> Hi Terry,
>
> On Sat, Mar 8, 2014 at 3:19 AM, Terry Hardie <[email protected]> wrote:
>> The problem is when running a lescan (hcitool lescan) with a LE device
>> in
>> paring mode, which is transmitting a lot of LE Advertising report
>> packets,
>> the HCI drivers eventually loses sync. I've traced it down to a
>> duplicate
>> USB fragment.
>
> I think your logs show missing USB packets (not duplicated)... More
> details below.
Interesting that it always loses 2 packets, and never 1, that's why I
assumed duplication...
> In summary, it is expected that for devices that don't change address,
> the only byte to change is the RSSI (last byte of the last fragment).
> That's why the 2 first fragments are usually the same for all
> advertising packets.
You asked in another email if I tried a powered hub. Yes, I tried a
powered hub. I get the same results.
Also, although it's not common, I also see these lost frames outside of
a scan. A scan is just the easiest way to reproduce it. If I leave a tag
broadcasting once per second, 2 receivers on the same USB bus, after an
hour or so, the same problem happens (and this is only 1 broadcast per
second from the PROX tag, received by 2 receivers simultaneously,
resulting in each receiver sending 3 USB transfers to the host - 6 USB
transfer total), so this is not just limited to high packet rates. I
think there's some problem with USB bus contention, especially since
both receivers try to send data to the host at EXACTLY the same time.
Terry
Hi Terry,
On Sat, Mar 8, 2014 at 3:19 AM, Terry Hardie <[email protected]> wrote:
> The problem is when running a lescan (hcitool lescan) with a LE device in
> paring mode, which is transmitting a lot of LE Advertising report packets,
> the HCI drivers eventually loses sync. I've traced it down to a duplicate
> USB fragment.
I think your logs show missing USB packets (not duplicated)... More
details below.
> I added some code to btusb_intr_complete to print each urb as it comes up
> from the USB stack. Here's the output for the above problem. Note the extra
> "00 02 c7" -- Should not be there...
>
> Mar 8 06:32:12 arm kernel: [ 122.915094] hci1 urb df4a8540 status 0 count
> 16 flags 768
> Mar 8 06:32:12 arm kernel: [ 122.915176] hci1 urb contents: 3e 21 02 01
> 00 01 3c 30 05 7c f9 e1 15 02 01 05
> Mar 8 06:32:12 arm kernel: [ 122.916004] hci1 urb df4a8540 status 0 count
> 16 flags 768
> Mar 8 06:32:12 arm kernel: [ 122.916085] hci1 urb contents: 0d 09 53 63
> 6f 73 63 68 65 20 50 52 4f 58 03 19
> Mar 8 06:32:12 arm kernel: [ 122.916985] hci1 urb df4a8540 status 0 count
> 3 flags 768
> Mar 8 06:32:12 arm kernel: [ 122.917018] hci1 urb contents: 00 02 c2
> Mar 8 06:32:12 arm kernel: [ 122.942995] hci1 urb df4a8540 status 0 count
> 3 flags 768
There is a missing whole 16 byte USB packet right here.
> Mar 8 06:32:12 arm kernel: [ 122.943028] hci1 urb contents: 00 02 c7
The bytes above are part of the Advertising data, plus the RSSI (the
last byte) for the truncated packet.
> I used usbmon to do a "sniff" of the USB traffic. Here's a snippet of a
> correct HCI LE Advertising report event, followed by one where the middle
> fragment is repeated (frame 1300 is a repeat) This is a different capture
> from the above example. I can provide the pcap somewhere if needed
>
> No. Time Source Destination
> Protocol Length Info
> 1291 2014-03-07 02:40:16.942573 host 3.1
> USB 64 URB_INTERRUPT in
> 1292 2014-03-07 02:40:16.959480 3.1 host
> HCI_USB 80 Rcvd Fragment
> 3e21020100013c30057cf9e115020105
> 1293 2014-03-07 02:40:16.959624 host 3.1
> USB 64 URB_INTERRUPT in
> 1294 2014-03-07 02:40:16.960449 3.1 host
> HCI_USB 80 Rcvd Fragment
> 0d0953636f736368652050524f580319
> 1295 2014-03-07 02:40:16.960546 host 3.1
> USB 64 URB_INTERRUPT in
> 1296 2014-03-07 02:40:16.961455 3.1 host
> HCI_EVT 67 Rcvd LE Meta (LE Advertising Report)
> 0002ae
> 1297 2014-03-07 02:40:16.961560 host 3.1
> USB 64 URB_INTERRUPT in
> 1298 2014-03-07 02:40:16.981627 3.1 host
> HCI_USB 80 Rcvd Fragment
> 3e21020100013c30057cf9e115020105
> 1299 2014-03-07 02:40:16.981696 host 3.1
> USB 64 URB_INTERRUPT in
> 1300 2014-03-07 02:40:17.002651 3.1 host
> HCI_USB 80 Rcvd Fragment
> 3e21020100013c30057cf9e115020105
There are two missing fragments between right here.
> 1301 2014-03-07 02:40:17.002720 host 3.1
> USB 64 URB_INTERRUPT in
> 1302 2014-03-07 02:40:17.003560 3.1 host
> HCI_USB 80 Rcvd Fragment
> 0d0953636f736368652050524f580319
> 1303 2014-03-07 02:40:17.003649 host 3.1
> USB 64 URB_INTERRUPT in
> 1304 2014-03-07 02:40:17.004567 3.1 host
> HCI_USB 67 Rcvd Fragment
> 0002b5
In summary, it is expected that for devices that don't change address,
the only byte to change is the RSSI (last byte of the last fragment).
That's why the 2 first fragments are usually the same for all
advertising packets.
--
Anderson Lizardo
http://www.indt.org/?lang=en
INdT - Manaus - Brazil
Hi Adam,
Duplicate filtering is done inside the controller. If you disable
filtering, the controller will receive a lot more packets (the
quantity will depend on device's advertising parameters, but it is
several times more packets).
Even if your Gimbal device changes address on every packet (which is
unusual, usually address changes are time based), what you actually
see without using --duplicate is still a small fraction of what the
controller receives over the air. Try comparing Gimbal on using
--duplicates versus not using (instead of comparing different devices
with different firmware and adv. parameters). Try counting number of
adv. reports on both situations. I expect to see a lot more when
--duplicate is enabled...
Regarding the logs, it seems your dmesg.txt seems only to show the
last lines of debugging... maybe the dmesg buffer is too short. Can
you check if there is a more complete log generated in one of the
/var/log/kern.log.* files ?
Fortunately, It contains one snippet showing problems with HCI
reassembly from USB packets:
[ 878.184532] btusb_intr_complete: hci0 urb d9a8d300 status 0 count 16
[ 878.185499] btusb_intr_complete: hci0 urb d9a8d300 status 0 count 16
[ 878.186498] btusb_intr_complete: hci0 urb d9a8d300 status 0 count 12
[ 878.386548] btusb_intr_complete: hci0 urb d9a8d300 status 0 count 16
[ 878.387530] btusb_intr_complete: hci0 urb d9a8d300 status 0 count 16
[ 878.387664] hci_rx_work: hci0
[ 878.387693] hci_send_to_monitor: hdev d9a4e000 len 64
[ 878.387783] hci_send_to_sock: hdev d9a4e000 len 64
[ 878.387803] hci_rx_work: hci0 Event packet
[ 878.387822] hci_event_packet: hci0 event 0xbc
[ 878.387847] hci_send_to_monitor: hdev d9a4e000 len 4
[ 878.387896] hci_send_to_sock: hdev d9a4e000 len 4
[ 878.387913] hci_rx_work: hci0 Event packet
[ 878.387930] hci_req_cmd_complete: opcode 0x200c status 0x00
[ 878.387945] hci_sent_cmd_data: hci0 opcode 0x200c
[ 878.387959] hci_event_packet: hci0 event 0x00
[ 878.388058] hci_sock_recvmsg: sock da567840, sk daa9c800
It contains 3 bogus events:
> HCI Event: Unknown (0xbc) plen 62
> HCI Event: Unknown (0x00) plen 2
> HCI Event: Physical Link Complete (0x40) plen 127
For me, it still points out to corrupted/truncated USB packets.
Unfortunately, it is just indirect evidence as the logs don't show raw
USB packets. But you can definitively confirm by using a USB sniffer
tool. I recommend using tshark (wireshark's command line interface).
It works just fine on raspberry. See this post for instructions:
http://nagaraj-embedded.blogspot.com.br/2012/03/capturing-usb-data-through-wireshark.html
Note that on raspberry there is just "usbmon1" (i.e. Bus 1), and
everything is attached to it. So the logs will get other unrelated
traffic (like ethernet packets), but wireshark GUI is powerful enough
to filter only traffic related to a specific device.
Best Regards,
--
Anderson Lizardo
http://www.indt.org/?lang=en
INdT - Manaus - Brazil
On Fri, 7 Mar 2014, Terry Hardie wrote:
> I'm having something similar to what Adam is reporting. I'm running on a
> BeagleBone Black, which uses a TI MUSB controller (I think it's different to
> the Raspberry Pi USB controller?).
I have just found switching the USB 2 hub for a USB 1.1 hub makes the
problem go away... Not sure what that proves though...
Hi guys,
I'm having something similar to what Adam is reporting. I'm running on a
BeagleBone Black, which uses a TI MUSB controller (I think it's different
to the Raspberry Pi USB controller?).
I have 2 vendor's USB bluetooth receivers.
One is a Cambridge Silicon Radio (0a12:0001) and the other is a Broadcom
BCM20702A1 (0a5c:21e8). The Broadcom needs a RAM patch to behave slightly
better, but it outside the scope of this problem. BOTH exhibit the same
problem (so I seriously doubt it's a problem with the Bluetooth receivers)
Both devices are operating in HCI mode. I'm running Ubuntu on my
BeagleBone Black.
I've also built the latest version of Bluez (5.15)
The problem is when running a lescan (hcitool lescan) with a LE device in
paring mode, which is transmitting a lot of LE Advertising report packets,
the HCI drivers eventually loses sync. I've traced it down to a duplicate
USB fragment.
I've tested these USB receivers under Ubuntu on an Intel machine with the
same Bluez 5.15, and the problem is not present.
hcidump shows when things go crazy:
2014-03-08 06:32:12.111970 > HCI Event: LE Meta Event (0x3e) plen 33
LE Advertising Report
ADV_IND - Connectable undirected advertising (0)
bdaddr E1:F9:7C:05:30:3C (Random)
Flags: 0x05
Complete local name: 'Scosche PROX'
Unknown type 0x19 with 2 bytes data
RSSI: -62
2014-03-08 06:32:12.163087 > HCI Event: Unknown (0x00) plen 2
C7 3E
2014-03-08 06:32:12.163136 > HCI Event: Flow Specification Complete (0x21)
plen 2
status 0x01 handle 0x0000 flags 1 incoming
Error: Unknown HCI Command
2014-03-08 06:32:12.192009 > HCI Event: Inquiry Complete (0x01) plen 60
status 0x30
Error: Parameter out of Mandatory Range
05 7C F9 E1 15 02 01 05 0D 09 53 63 6F 73 63 68 65 20 50 52
4F 58 03 19 00 02 C5 3E 21 02 01 00 01 3C 30 05 7C F9 E1 15
02 01 05 0D 09 53 63 6F 73 63 68 65 20 50 52 4F 58 03 19
2014-03-08 06:32:12.218055 > HCI Event: Unknown (0x00) plen 2
C5 3E
I added some code to btusb_intr_complete to print each urb as it comes up
from the USB stack. Here's the output for the above problem. Note the
extra "00 02 c7" -- Should not be there...
Mar 8 06:32:12 arm kernel: [ 122.915094] hci1 urb df4a8540 status 0 count 16 flags 768
Mar 8 06:32:12 arm kernel: [ 122.915176] hci1 urb contents: 3e 21 02 01 00 01 3c 30 05 7c f9 e1 15 02 01 05
Mar 8 06:32:12 arm kernel: [ 122.916004] hci1 urb df4a8540 status 0 count 16 flags 768
Mar 8 06:32:12 arm kernel: [ 122.916085] hci1 urb contents: 0d 09 53 63 6f 73 63 68 65 20 50 52 4f 58 03 19
Mar 8 06:32:12 arm kernel: [ 122.916985] hci1 urb df4a8540 status 0 count 3 flags 768
Mar 8 06:32:12 arm kernel: [ 122.917018] hci1 urb contents: 00 02 c2
Mar 8 06:32:12 arm kernel: [ 122.942995] hci1 urb df4a8540 status 0 count 3 flags 768
Mar 8 06:32:12 arm kernel: [ 122.943028] hci1 urb contents: 00 02 c7
I used usbmon to do a "sniff" of the USB traffic. Here's a snippet of a
correct HCI LE Advertising report event, followed by one where the middle
fragment is repeated (frame 1300 is a repeat) This is a different capture
from the above example. I can provide the pcap somewhere if needed
No. Time Source Destination Protocol Length Info
1291 2014-03-07 02:40:16.942573 host 3.1 USB 64 URB_INTERRUPT in
1292 2014-03-07 02:40:16.959480 3.1 host HCI_USB 80 Rcvd Fragment
3e21020100013c30057cf9e115020105
1293 2014-03-07 02:40:16.959624 host 3.1 USB 64 URB_INTERRUPT in
1294 2014-03-07 02:40:16.960449 3.1 host HCI_USB 80 Rcvd Fragment
0d0953636f736368652050524f580319
1295 2014-03-07 02:40:16.960546 host 3.1 USB 64 URB_INTERRUPT in
1296 2014-03-07 02:40:16.961455 3.1 host HCI_EVT 67 Rcvd LE Meta (LE Advertising Report)
0002ae
1297 2014-03-07 02:40:16.961560 host 3.1 USB 64 URB_INTERRUPT in
1298 2014-03-07 02:40:16.981627 3.1 host HCI_USB 80 Rcvd Fragment
3e21020100013c30057cf9e115020105
1299 2014-03-07 02:40:16.981696 host 3.1 USB 64 URB_INTERRUPT in
1300 2014-03-07 02:40:17.002651 3.1 host HCI_USB 80 Rcvd Fragment
3e21020100013c30057cf9e115020105
1301 2014-03-07 02:40:17.002720 host 3.1 USB 64 URB_INTERRUPT in
1302 2014-03-07 02:40:17.003560 3.1 host HCI_USB 80 Rcvd Fragment
0d0953636f736368652050524f580319
1303 2014-03-07 02:40:17.003649 host 3.1 USB 64 URB_INTERRUPT in
1304 2014-03-07 02:40:17.004567 3.1 host HCI_USB 67 Rcvd Fragment
0002b5
Hello,
I recompiled the kernel with dynamic debug, and here are the two files as per your instructions:
http://www.warski.org/ibeacons/dmesg.txt
http://www.warski.org/ibeacons/lescan.dump
I also tried another thing. I have two kinds of beacons: one coming from Estimote, the other from Qualcomm (Gimbal).
The Estimote beacons broadcast the advertisement packets with the same source address (well, it changes, but rarely). Hence to get all the packets the --duplicates flag is needed.
The Gimbal beacons broadcast with a different source address every time (random address). Hence to get all the packets the duplicate flag isn?t needed, as every packet is different.
Both transmit at a similar rate (maybe the Gimbals are a bit slower, but I also tried decreasing the scanning interval in hcitool), and with the Gimbal beacons the are no problems. So I guess this puts more doubt on possible USB cause? (and I checked the USB hub, it works fine as far as I can tell)
The root cause must be something with the usage of the --duplicates flag, no? Though I can?t find the code that it is handling the buffering - is it done on the dongle itself?
Adam
On 04 Mar 2014, at 00:07, Anderson Lizardo <[email protected]> wrote:
> Hi Adam,
>
> On Mon, Mar 3, 2014 at 5:25 PM, Adam Warski <[email protected]> wrote:
>> But I guess even if it was a USB issue, it would be caught earlier? The advertisement packets have a certain preamble, and a CRC, so the shifted data would be discarded? Anyway, that doesn't seem to be the cause.
>
> There are are a few things that point out to USB issue (in my opinion):
>
> 1) You said that having ethernet cable connected or not modifies the
> behavior. on RPI, the ethernet chip is attached to the USB. As I said,
> strange things happen to me as well when not using powered USB hub
> (specially with WiFi dongles and USB cameras) :)
>
> 2) As you noticed, the "truncated" adv. packet is not being discarded
> by the controller as we would expect. This means that the BT
> controller is actually receiving a correct packet, but it is being
> "truncated" when being sent to the host. The CRC/preamble, after
> parsed and verified by the BT controller, is removed before sending
> the packet to the host, so there is no way for the host do detect
> corruptions between the BT controller and the kernel.
>
> The dynamic debug messages may help identify whether there are
> missing/truncated USB packets (there is a debug message in
> btusb_intr_complete() from drivers/bluetooth/btusb.c to help identify
> HCI reassembly issues).
>
> Did you try using only the BT dongle and removing the Wifi one (and
> connecting through ethernet cable)? Wifi dongles tend to be very power
> hungry.
>
> Also check whether the power source for the USB hub is working
> properly... unfortunately I don't know how to check from the RPi side,
> so you have to rely on first attaching it to the outlet, checking some
> LED, and then attaching to the RPi. Also make sure the RPi uses its
> own power adapter (instead of powering from the hub itself).
>
> Well, at least you found a workaround (attaching an ethernet cable) :)
>
> Best Regards,
> --
> Anderson Lizardo
> http://www.indt.org/?lang=en
> INdT - Manaus - Brazil
> --
> To unsubscribe from this list: send the line "unsubscribe linux-bluetooth" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
Adam Warski
http://twitter.com/#!/adamwarski
http://www.softwaremill.com
http://www.warski.org
Hi Adam,
On Mon, Mar 3, 2014 at 5:25 PM, Adam Warski <[email protected]> wrote:
> But I guess even if it was a USB issue, it would be caught earlier? The advertisement packets have a certain preamble, and a CRC, so the shifted data would be discarded? Anyway, that doesn't seem to be the cause.
There are are a few things that point out to USB issue (in my opinion):
1) You said that having ethernet cable connected or not modifies the
behavior. on RPI, the ethernet chip is attached to the USB. As I said,
strange things happen to me as well when not using powered USB hub
(specially with WiFi dongles and USB cameras) :)
2) As you noticed, the "truncated" adv. packet is not being discarded
by the controller as we would expect. This means that the BT
controller is actually receiving a correct packet, but it is being
"truncated" when being sent to the host. The CRC/preamble, after
parsed and verified by the BT controller, is removed before sending
the packet to the host, so there is no way for the host do detect
corruptions between the BT controller and the kernel.
The dynamic debug messages may help identify whether there are
missing/truncated USB packets (there is a debug message in
btusb_intr_complete() from drivers/bluetooth/btusb.c to help identify
HCI reassembly issues).
Did you try using only the BT dongle and removing the Wifi one (and
connecting through ethernet cable)? Wifi dongles tend to be very power
hungry.
Also check whether the power source for the USB hub is working
properly... unfortunately I don't know how to check from the RPi side,
so you have to rely on first attaching it to the outlet, checking some
LED, and then attaching to the RPi. Also make sure the RPi uses its
own power adapter (instead of powering from the hub itself).
Well, at least you found a workaround (attaching an ethernet cable) :)
Best Regards,
--
Anderson Lizardo
http://www.indt.org/?lang=en
INdT - Manaus - Brazil
> I didn't notice that you were using Raspberry PI... To rule out USB
> power issues, can you try attaching a powered USB hub on the RPi, and
> then attach your BT dongle on it?
>
> I always had strange/random problems on RPi with USB HW plugged
> directly on the USB port (except for pen drives, which work fine),
> even for devices which supposedly draw very little current.
>
> Given that you are disabling duplicate filtering, every single
> Advertising packet is arriving at the host, so it is certainly using
> more power. Just not sure if that is the cause.
Same thing happens with a powered hub (with both WiFi and BT dongles plugged into the hub).
But I guess even if it was a USB issue, it would be caught earlier? The advertisement packets have a certain preamble, and a CRC, so the shifted data would be discarded? Anyway, that doesn?t seem to be the cause.
Adam
--
Adam Warski
http://twitter.com/#!/adamwarski
http://www.softwaremill.com
http://www.warski.org
On 03 Mar 2014, at 19:02, Anderson Lizardo <[email protected]> wrote:
> Hi Adam,
>
> On Mon, Mar 3, 2014 at 11:26 AM, Adam Warski <[email protected]> wrote:
>> I'm wondering what this might mean exactly :) I guess it could point to the
>> specific USB/bluetooth drivers present in the raspbian 3.10 and ubuntu 3.11
>> kernels, right?
>
> I didn't notice that you were using Raspberry PI... To rule out USB
> power issues, can you try attaching a powered USB hub on the RPi, and
> then attach your BT dongle on it?
During these experiments I found something very weird :)
The RPi has two dongles: WiFi and BT
Things *do* work, if the LAN cable is plugged in - even if the interface is down in the system.
The moment I pull the cable, I soon get the familiar buffer overflow and a flood of errors.
How is that possible? :)
That?s without a powered hub so far.
> I always had strange/random problems on RPi with USB HW plugged
> directly on the USB port (except for pen drives, which work fine),
> even for devices which supposedly draw very little current.
>
> Given that you are disabling duplicate filtering, every single
> Advertising packet is arriving at the host, so it is certainly using
> more power. Just not sure if that is the cause.
So where is the filtering done? In the kernel driver?
Adam
--
Adam Warski
http://twitter.com/#!/adamwarski
http://www.softwaremill.com
http://www.warski.org
Hi Adam,
On Mon, Mar 3, 2014 at 11:26 AM, Adam Warski <[email protected]> wrote:
> I'm wondering what this might mean exactly :) I guess it could point to the
> specific USB/bluetooth drivers present in the raspbian 3.10 and ubuntu 3.11
> kernels, right?
I didn't notice that you were using Raspberry PI... To rule out USB
power issues, can you try attaching a powered USB hub on the RPi, and
then attach your BT dongle on it?
I always had strange/random problems on RPi with USB HW plugged
directly on the USB port (except for pen drives, which work fine),
even for devices which supposedly draw very little current.
Given that you are disabling duplicate filtering, every single
Advertising packet is arriving at the host, so it is certainly using
more power. Just not sure if that is the cause.
Best Regards,
--
Anderson Lizardo
http://www.indt.org/?lang=en
INdT - Manaus - Brazil
> I messed up the model number, looked at the wrong dongle, TP-Link is the WiFi one ;)
> The bluetooth one is an IOGear GBU521 (http://www.iogear.com/product/GBU521/).
>
> I don?t have another dongle right now (I tried a different IOGear, but same thing), but I?ll try setting a VM on my laptop/getting a different USB dongle.
I?ve setup an Ubuntu 13.10 VM (kernel 3.11.0-17), same dongle (IOGear), bluez 5.15, and it works without problems - at least for the past 15 minutes ;) (usually it work at most 2 minutes).
I?m wondering what this might mean exactly :) I guess it could point to the specific USB/bluetooth drivers present in the raspbian 3.10 and ubuntu 3.11 kernels, right?
Adam
--
Adam Warski
http://twitter.com/#!/adamwarski
http://www.softwaremill.com
http://www.warski.org
Thanks for the analysis :) I guess I?m more of a beginner in the area and moving slower.
I also tried with bluez 5.13 (as .15 contains some LE changes), but same thing happens.
> Maybe you already did your own investigation, but here goes my findings:
>
> When opening the dump file on a hex editor, we can see that this first
> bogus packet starts (skipping BTSnoop's own header) at offset 0x45a4
> (with bytes "3E 2A 02 01 ..."). At offset 0x45b4 we can see that the
> packet becomes truncated by missing these bytes (based on previous
> correct packets):
>
> 1A FF 4C 00 02 15 B9 40 7F 30 F5 F8 46 6E AF F9
>
> From now on, the packets become garbage because they are "shifted" due
> to these missing bytes.
>
> I think we can rule out any problems on the hcidump/btmon tools (as
> the raw logs come directly from the kernel), so I see two
> possibilities here:
>
> 1) There might be a bug in HCI packet reassembly logic (either in
> hci_core.c or btusb.c).
>
> 2) (most likely) your BT dongle may be broken (i.e. generating
> truncated USB packets from time to time)
The root cause seem to be the ?--duplicates --passive? flags. According to the BLE spec the duplicates are filtered out in the Link Layer part. Do I understand correctly that that?s on the hardware side?
If I just run ?hcitool lescan --passive?, I see the advertisement only once, and nothing else happens - which includes no truncated packets. So the --duplicates flag seems to be the culprit.
> So my suggestion would be to try with a different dongle, preferably
> of a different chipset vendor. Note that "TP-Link" is just the brand
> name, you can confirm the chipset using lsusb or (even better)
> "hciconfig hci0 -a" (check "Manufacturer" field).
I messed up the model number, looked at the wrong dongle, TP-Link is the WiFi one ;)
The bluetooth one is an IOGear GBU521 (http://www.iogear.com/product/GBU521/).
I don?t have another dongle right now (I tried a different IOGear, but same thing), but I?ll try setting a VM on my laptop/getting a different USB dongle.
> Can you also provide the "lsusb" and "hciconfig hci0 -a" output (feel
> free to modify the BT address for privacy reasons)? I think it is
> important to have this archived so similar reports can be traced to
> this hardware and firmware.
Sure:
$ sudo hciconfig hci0 -a
hci0: Type: BR/EDR Bus: USB
BD Address: 00:01:81:C6:E8:82 ACL MTU: 1021:8 SCO MTU: 64:1
UP RUNNING
RX bytes:547 acl:0 sco:0 events:27 errors:0
TX bytes:384 acl:0 sco:0 commands:27 errors:0
Features: 0xbf 0xfe 0xcf 0xfe 0xdb 0xff 0x7b 0x87
Packet type: DM1 DM3 DM5 DH1 DH3 DH5 HV1 HV2 HV3
Link policy: RSWITCH SNIFF
Link mode: SLAVE ACCEPT
Name: 'BCM20702A'
Class: 0x000000
Service Classes: Unspecified
Device Class: Miscellaneous,
HCI Version: 4.0 (0x6) Revision: 0x1000
LMP Version: 4.0 (0x6) Subversion: 0x220e
Manufacturer: Broadcom Corporation (15)
$ lsusb
Bus 001 Device 002: ID 0424:9512 Standard Microsystems Corp.
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 001 Device 003: ID 0424:ec00 Standard Microsystems Corp.
Bus 001 Device 004: ID 0bda:8179 Realtek Semiconductor Corp.
Bus 001 Device 005: ID 0a5c:21e8 Broadcom Corp.
> Another good source of information is the kernel debug logs. Collect
> it by enabling dynamic debugging:
>
> # dmesg -c # just to clear the kernel log buffer
> # echo 'module bluetooth +pf' | cat > /sys/kernel/debug/dynamic_debug/control
> # echo 'module btusb +pf' | cat > /sys/kernel/debug/dynamic_debug/control
> # btmon -w lescan.dump # So we can correlate the kernel log with the
> actual packets
> # hcitool lescan --duplicates # Run on another terminal, until packets
> become corrupt
> # dmesg > dmesg.txt
> # echo 'module bluetooth -pf' | cat > /sys/kernel/debug/dynamic_debug/control
> # echo 'module btusb -pf' | cat > /sys/kernel/debug/dynamic_debug/control
> # gzip dmesg.txt (if it is too big)
>
> Then send dmesg.txt.gz + lescan.dump to the list. You may want to mask
> out your BT address on the log as well.
I mounted debugfs but as I don?t have dynamic_debug I guess I have to recompile the kernel with the appropriate flag. This will take a while ;)
Adam
--
Adam Warski
http://twitter.com/#!/adamwarski
http://www.softwaremill.com
http://www.warski.org
Hi Adam,
On Sat, Mar 1, 2014 at 5:34 AM, Adam Warski <[email protected]> wrote:
>> Can you save the raw dump using "hcidump -w output.dump" (or using
>> btmon -w) and send to the list? It is easier to analyze, as the parser
>> may be bogus.
>
> Sure. The problems start at 63.120548.
> The dump is here: http://www.warski.org/btmon_ibeacons.dat
Maybe you already did your own investigation, but here goes my findings:
When opening the dump file on a hex editor, we can see that this first
bogus packet starts (skipping BTSnoop's own header) at offset 0x45a4
(with bytes "3E 2A 02 01 ..."). At offset 0x45b4 we can see that the
packet becomes truncated by missing these bytes (based on previous
correct packets):
1A FF 4C 00 02 15 B9 40 7F 30 F5 F8 46 6E AF F9
>From now on, the packets become garbage because they are "shifted" due
to these missing bytes.
I think we can rule out any problems on the hcidump/btmon tools (as
the raw logs come directly from the kernel), so I see two
possibilities here:
1) There might be a bug in HCI packet reassembly logic (either in
hci_core.c or btusb.c).
2) (most likely) your BT dongle may be broken (i.e. generating
truncated USB packets from time to time)
So my suggestion would be to try with a different dongle, preferably
of a different chipset vendor. Note that "TP-Link" is just the brand
name, you can confirm the chipset using lsusb or (even better)
"hciconfig hci0 -a" (check "Manufacturer" field).
Can you also provide the "lsusb" and "hciconfig hci0 -a" output (feel
free to modify the BT address for privacy reasons)? I think it is
important to have this archived so similar reports can be traced to
this hardware and firmware.
Another good source of information is the kernel debug logs. Collect
it by enabling dynamic debugging:
# dmesg -c # just to clear the kernel log buffer
# echo 'module bluetooth +pf' | cat > /sys/kernel/debug/dynamic_debug/control
# echo 'module btusb +pf' | cat > /sys/kernel/debug/dynamic_debug/control
# btmon -w lescan.dump # So we can correlate the kernel log with the
actual packets
# hcitool lescan --duplicates # Run on another terminal, until packets
become corrupt
# dmesg > dmesg.txt
# echo 'module bluetooth -pf' | cat > /sys/kernel/debug/dynamic_debug/control
# echo 'module btusb -pf' | cat > /sys/kernel/debug/dynamic_debug/control
# gzip dmesg.txt (if it is too big)
Then send dmesg.txt.gz + lescan.dump to the list. You may want to mask
out your BT address on the log as well.
>
> Thanks,
> Adam
Best Regards,
--
Anderson Lizardo
http://www.indt.org/?lang=en
INdT - Manaus - Brazil
> For me looks like something got confused on the HCI packet parsing:
> either the kernel, hcidump, or something else. The HCI events after
> the LE meta event make no sense (they seem garbage). Did you try with
> btmon instead of hcidump ?
Yes, with btmon the same is happening. Also when just running ?hcitool lescan --passive ?duplicates? it stops getting advertisements after a while (without hcidump or btmon running in the background).
> Can you save the raw dump using "hcidump -w output.dump" (or using
> btmon -w) and send to the list? It is easier to analyze, as the parser
> may be bogus.
Sure. The problems start at 63.120548.
The dump is here: http://www.warski.org/btmon_ibeacons.dat
Thanks,
Adam
--
Adam Warski
http://twitter.com/#!/adamwarski
http://www.softwaremill.com
http://www.warski.org
Finally came back to this problem, and setting USB to 1.1 (dwc_otg.speed=1 in /boot/cmdline.txt) fixes the problem.
So indeed that is a USB problem.
Thanks for the help! :)
Adam
On 20 Mar 2014, at 00:47, Terry Hardie <[email protected]> wrote:
> On 2014-03-18 07:07, Anderson Lizardo wrote:
>>> And the same as before, after a short amount of time I start getting similar (not exactly the same [2]) errors. So it's not the dongle :)
>> The logs seem to show the same behaviour as the other dongle:
>> seemingly "random" packets with nonsense parameters (simply because
>> the HCI packet was assembled incorrectly).
>> I still blame the USB controller in RPi/Beaglebone for not feeding all
>> USB fragments to the bluetooth subsystem... at least is my
>> interpretation from Terry's logs (which are from USB sniffing).
>> What I have no clue is why this is happenning: both of you tried
>> powered USB hubs, and this was my only guess (insufficient power on
>> the bus). If you could find a kernel version for your boards that are
>> not causing any problems, then we can rule out HW related issues...
>
> I agree that it looks like the USB controller's fault. It seems highly unlikely, though. A few reasons why I think so:
>
> 1. The 2 boards use completely different USB controllers.
> 2. Both those platforms talk to other USB devices pushing much higher data rates without any issue.
> 3. The problem doesn't show up when the bus is forced to USB 1.1 (Full speed instead of high speed)
> 4. The problem doesn't only show up during non-duplicate-filtered scans. I can reproduce it (although it takes a lot longer) with LE Advertisement packets only coming in once per second.
>
> Let me ask another question:
>
> Are there any other USB devices that ever send a bunch of data with no polling from the host? IE, if the host has no buffer space to accept an INT transaction, it will just be lost, and all other USB devices wait for the INT to be processed by getting an acknowledgement before generating another INT? Apologies for my lack of in-depth knowledge for the workings of USB bus transactions.
>
> My suspicion is that if you follow the Bluetooth spec for HCI over USB (which all of these vendors making these USB devices should) then if the host USB controller is not fast enough, it can be overwhelmed. Forcing the bus to full-speed mode, means the bluetooth receiver is forced to buffer on its end, since it simply cannot transfer data to the host fast enough to overwhelm it. If my suspicion is true, then that means any packets of this type could overflow a host if it is busy enough, although much less likely to happen if the host is more powerful (in terms of processing power).
>
> Regards,
>
> Terry
>
--
Adam Warski
http://twitter.com/#!/adamwarski
http://www.softwaremill.com
http://www.warski.org