2007-08-21 22:25:04

by Jeffrey Cuenco

Subject: [Bluez-users] Simultaneous hci_read_remote_name() requests on multiple dongles lead to btnames getting mapped to the wrong btaddr's

Hi,

I posted a thread about this in bluez-devel, but realized this is more
of a user-related question, so I've posted it here. I've searched through
the archives and, with the exception of one thread that came close to my
problem, didn't find anything else related to hci_read_remote_name()
sometimes reporting back incorrect names.

As some background for my setup and why I am doing this, I am part of a
Bluetooth-related research project at UCSD where we have built Bluetooth
scanners using 4 dongles connected to an embedded system box running Debian
with the latest "Debian-stable" BlueZ release in the libbluetooth2 package.

In my program I spawn 4 threads, where each thread is dedicated to a
dongle. The first thread concentrates solely on hci_inquiry and grabs
addresses and stores them, the clock_offset, and pscan_rep_mode into a
queue. All other threads grab from the queue and perform an
hci_read_remote_name_with_clock_offset() call on each address.
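
To make the thread layout above concrete, here is a minimal sketch of the
kind of shared queue involved, assuming a fixed-size ring buffer guarded by a
pthread mutex and condition variable. The names scan_item, scan_queue and
QUEUE_CAP are hypothetical and only for illustration; this is not the actual
scanner code, and it uses a plain byte array in place of bdaddr_t so that it
builds without the BlueZ headers.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <string.h>

#define QUEUE_CAP 64  /* hypothetical capacity */

/* Local stand-in for one inquiry result; field names are illustrative. */
struct scan_item {
    uint8_t  bdaddr[6];       /* device address from the inquiry */
    uint8_t  pscan_rep_mode;  /* page scan repetition mode */
    uint16_t clock_offset;    /* clock offset from the inquiry */
};

struct scan_queue {
    struct scan_item items[QUEUE_CAP];
    size_t head, count;
    pthread_mutex_t lock;
    pthread_cond_t  not_empty;
};

void scan_queue_init(struct scan_queue *q)
{
    memset(q, 0, sizeof(*q));
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->not_empty, NULL);
}

/* Called by the inquiry thread. Returns 0 on success, -1 if the queue is full. */
int scan_queue_push(struct scan_queue *q, const struct scan_item *it)
{
    int rc = -1;

    assert(q && it);
    pthread_mutex_lock(&q->lock);
    if (q->count < QUEUE_CAP) {
        q->items[(q->head + q->count) % QUEUE_CAP] = *it;
        q->count++;
        rc = 0;
        pthread_cond_signal(&q->not_empty);
    }
    pthread_mutex_unlock(&q->lock);
    return rc;
}

/* Called by each name-request thread. Blocks until an item is available. */
void scan_queue_pop(struct scan_queue *q, struct scan_item *out)
{
    assert(q && out);
    pthread_mutex_lock(&q->lock);
    while (q->count == 0)
        pthread_cond_wait(&q->not_empty, &q->lock);
    *out = q->items[q->head];
    q->head = (q->head + 1) % QUEUE_CAP;
    q->count--;
    pthread_mutex_unlock(&q->lock);
}
```

Each item is copied out whole while the lock is held, so an address always
travels together with the clock_offset and pscan_rep_mode from the same
inquiry result, and a name-request thread never reads a half-written entry.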

I used the same timeout of 25000 (milliseconds) as hcitool.c, because I
found from experience that for some reason it isn't possible to set it any
lower without it doing weird things. Why is that? Is there a natural
hardware timeout that can't be changed?

The main problem, however, is that sometimes I get back wrong names. There
is a LOT of Bluetooth traffic in this building (particularly from tons of
Mac minis) that frequently respond to the scanner, and sometimes a Mac mini
name comes back in place of the real name of the device I am trying to
query. This doesn't happen all the time, but it tends to happen when a lot
of name queries are running simultaneously.

I had seen a similar thread on here written by Avaited regarding a patch
that he had prepared to fix this problem; he claimed to have found the
problem in hci.c. As the patch hasn't surfaced yet, and assuming that this
was my problem, I decided to try the following modifications to hci.c: I
made the bdaddr pointer non-const in all functions used by
hci_read_remote_name() and made the small change below, assuming that if the
wrong name came into rn.name, the wrong bdaddr must be stored in rn.bdaddr as
well:

// hands back the bdaddr that the received name actually belongs to
bacpy(bdaddr, &rn.bdaddr);

I placed this line right above the

strncpy(name, (char *) rn.name, len);


line near the end of the function. After doing this, things seemed to
improve. However, I noticed that after a few hours of running the scanner
software, hcid would usually have accumulated a lot of CPU time:

USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1838 0.8 0.3 1960 784 ? Ss 04:01 2:55 /usr/sbin/hcid

and when this happened the name errors came back even with my patch, and
eventually the dongles would stop scanning altogether.
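
As a variation on the hci.c change described above, one could compare the
address carried in the reply against the one that was requested and discard
the reply on a mismatch, instead of copying the reply's address back to the
caller. Below is a minimal sketch of that check, under the assumption that
the reply carries a status, an address and a 248-byte name, as the
remote-name-complete event does. The struct and function names here are
hypothetical stand-ins so that the sketch builds without the BlueZ headers;
inside hci.c the comparison would presumably use bacmp() on rn.bdaddr.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NAME_MAX_LEN 248  /* maximum Bluetooth device name length in bytes */

/* Local stand-ins mirroring bdaddr_t and the remote-name-complete event. */
struct addr { uint8_t b[6]; };

struct name_event {
    uint8_t     status;              /* 0 means success */
    struct addr bdaddr;              /* device the name belongs to */
    uint8_t     name[NAME_MAX_LEN];  /* may not be NUL-terminated */
};

/*
 * Copy the name only if the event is for the address that was requested.
 * Returns 0 on success, -1 on a failed request or an address mismatch.
 */
int accept_name(const struct addr *wanted, const struct name_event *ev,
                char *name, size_t len)
{
    size_t n;

    assert(wanted && ev && name);
    if (len == 0 || ev->status != 0)
        return -1;
    if (memcmp(wanted, &ev->bdaddr, sizeof(*wanted)) != 0)
        return -1;  /* reply belongs to some other device: discard it */

    /* Leave room for the terminator and never read past the event's name. */
    n = len - 1 < NAME_MAX_LEN ? len - 1 : NAME_MAX_LEN;
    memcpy(name, ev->name, n);
    name[n] = '\0';
    return 0;
}
```

With this approach a mismatched reply is reported as a failure, so the caller
can retry the request rather than store a name against an address it did not
ask about. It would not explain or fix the dongles freezing up.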

We've tried everything from resetting the dongles right after a name request
times out, to having the sniffer boxes reboot after a few hours, which is
something that we really don't want to do as it prevents us from gathering
data at certain intervals during the day.

As far as I can see the code shouldn't cause any problems, though I find it
strange that we have to reset the dongles at all, yet even with the reset
they eventually freeze up anyway; it's just a matter of time. The dongles
that we are using are Belkin F8T013. We are trying to get some CSR dongles
instead, but at this point the Belkins are what we have available to us.

Is there a problem with using multiple dongles the way our software uses
them? We would like to use the multiple-dongle approach so that the names
come in faster, but if the names keep coming in incorrectly then we can't
rely on the name data. Is there a bug at the kernel level where the name
requests could get mixed up if too many name requests are happening at the
same time with multiple dongles? Is it a Broadcom chipset-specific problem?

As an update, today there were a lot of new people in the building, which
resulted in lots of new data. Unfortunately the name problem happened
again, and even worse: almost every name that came in went to a different
btaddr, so our database ended up storing names against the wrong btaddr's.
With a ps aux at the terminal I found that this time hcid wasn't even high
in CPU time, which tells me that Avaited's suspicion may still be correct.
Could multiple simultaneous hci_read_remote_name() requests in general be
the source of the names getting mixed up?

I apologize that this message has gotten so long, but I've been working on
this problem for weeks and am hoping that someone could clue me in as to
whether I'm doing something wrong, and/or whether there is indeed a BlueZ
bug in this area.

Thanks in advance,

--
-Jeff

