2007-08-21 17:32:24

by Jeffrey Cuenco

Subject: [Bluez-devel] multiple dongles and hci_read_remote_name() bug?

Hi,

I've searched through the archives and couldn't find anything about
hci_read_remote_name() sometimes reporting back incorrect names.

As some background for my setup and why I am doing this, I am part of a
Bluetooth-related research project at UCSD where we have built Bluetooth
scanners using 4 dongles connected to an embedded system box running Debian
with the latest "Debian-stable" BlueZ release in the libbluetooth2 package.

In my program I spawn 4 threads, where each thread is dedicated to a
dongle. The first thread concentrates solely on hci_inquiry() and stores
each discovered address, along with its clock_offset and pscan_rep_mode,
in a queue. The other three threads take entries from the queue and call
hci_read_remote_name_with_clock_offset() on each address.
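
The producer/consumer arrangement described above can be sketched roughly as
follows. This is only an illustration, not the actual scanner code: the names
scan_item, scan_queue, and QUEUE_CAP are hypothetical, the fixed capacity is
an assumption, and a plain byte array stands in for bdaddr_t so the sketch
compiles without libbluetooth.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <string.h>

#define QUEUE_CAP 64  /* hypothetical capacity, not from the original program */

/* One inquiry result: address plus the paging hints used by the name request. */
struct scan_item {
    uint8_t  bdaddr[6];
    uint8_t  pscan_rep_mode;
    uint16_t clock_offset;
};

/* Fixed-size ring buffer guarded by a mutex and a condition variable. */
struct scan_queue {
    struct scan_item items[QUEUE_CAP];
    size_t head, count;
    pthread_mutex_t lock;
    pthread_cond_t  not_empty;
};

void scan_queue_init(struct scan_queue *q)
{
    assert(q != NULL);
    memset(q, 0, sizeof(*q));
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->not_empty, NULL);
}

/* Called by the inquiry thread. Returns 0 on success, -1 if the queue is full. */
int scan_queue_push(struct scan_queue *q, const struct scan_item *it)
{
    int rc = -1;

    assert(q != NULL && it != NULL);
    pthread_mutex_lock(&q->lock);
    if (q->count < QUEUE_CAP) {
        q->items[(q->head + q->count) % QUEUE_CAP] = *it;
        q->count++;
        pthread_cond_signal(&q->not_empty);
        rc = 0;
    }
    pthread_mutex_unlock(&q->lock);
    return rc;
}

/* Called by each name-request thread. Blocks until an item is available. */
void scan_queue_pop(struct scan_queue *q, struct scan_item *out)
{
    assert(q != NULL && out != NULL);
    pthread_mutex_lock(&q->lock);
    while (q->count == 0)
        pthread_cond_wait(&q->not_empty, &q->lock);
    *out = q->items[q->head];
    q->head = (q->head + 1) % QUEUE_CAP;
    q->count--;
    pthread_mutex_unlock(&q->lock);
}
```

Each item is copied out of the queue under the lock, so a name-request thread
works on its own private copy of the address and never shares a buffer with
the inquiry thread.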

I used the same timeout of 25000 (milliseconds) as in hcitool.c, because I
found from experience that setting it any lower makes things behave
strangely. (Why is that? Is there a natural hardware timeout that can't be
changed?)

The main problem, however, is that I sometimes get back wrong names. There
is a LOT of Bluetooth traffic in this building (in particular, tons of Mac
minis) that frequently responds to the scanner, and sometimes a Mac mini's
name comes back in place of the real name of the device I am trying to
query. This doesn't happen all the time, but it tends to happen when a lot
of name queries are running simultaneously.

I had seen a similar thread on here by Avaited about a patch that he had
prepared to fix this problem, in which he said he had found the cause in
hci.c. As that patch hasn't surfaced yet, and assuming that this was my
problem too, I prepared a patch of my own to try to fix it. I made the
bdaddr pointer non-const and made the small change below, on the assumption
that if the wrong name came into rn.name, the wrong bdaddr must be stored in
rn.bdaddr as well:

int hci_read_remote_name_with_clock_offset(int dd, /*const*/ bdaddr_t *bdaddr,
        uint8_t pscan_rep_mode, uint16_t clkoffset,
        int len, char *name, int to)
{
    evt_remote_name_req_complete rn;
    remote_name_req_cp cp;
    struct hci_request rq;

    memset(&cp, 0, sizeof(cp));
    bacpy(&cp.bdaddr, bdaddr);
    cp.pscan_rep_mode = pscan_rep_mode;
    cp.clock_offset = clkoffset;

    memset(&rq, 0, sizeof(rq));
    rq.ogf = OGF_LINK_CTL;
    rq.ocf = OCF_REMOTE_NAME_REQ;
    rq.cparam = &cp;
    rq.clen = REMOTE_NAME_REQ_CP_SIZE;
    rq.event = EVT_REMOTE_NAME_REQ_COMPLETE;
    rq.rparam = &rn;
    rq.rlen = EVT_REMOTE_NAME_REQ_COMPLETE_SIZE;

    if (hci_send_req(dd, &rq, to) < 0)
        return -1;

    if (rn.status) {
        errno = EIO;
        return -1;
    }

    rn.name[247] = '\0';

    // makes sure received name's bdaddr matches what we're expecting
    bacpy(bdaddr, &rn.bdaddr);

    strncpy(name, (char *) rn.name, len);

    return 0;
}

After I applied this patch, things seemed to be alright for a while.
However, I noticed that after a few hours of running the scanner software,
hcid would usually have accumulated a noticeable amount of CPU time:

USER  PID %CPU %MEM  VSZ RSS TTY STAT START TIME COMMAND
root 1838  0.8  0.3 1960 784 ?   Ss   04:01 2:55 /usr/sbin/hcid

When this happened, the name errors came back even with the patch, and
eventually the dongles would stop scanning altogether.
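
A stricter variant of the check in the patch would compare the address in the
reply against the requested one and reject mismatches, instead of copying the
reply's address back to the caller. The following is a minimal sketch under
stated assumptions, not the BlueZ implementation: addr6_t is a stand-in for
bdaddr_t so that the code compiles without libbluetooth, and
name_reply_matches() and accept_name() are hypothetical helpers. With the
real headers, bacmp() from <bluetooth/bluetooth.h> could do the comparison.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for BlueZ's bdaddr_t (6 address bytes); an assumption for this sketch. */
typedef struct { uint8_t b[6]; } addr6_t;

/* Returns 1 if the reply carries the address that was requested, else 0. */
int name_reply_matches(const addr6_t *requested, const addr6_t *received)
{
    assert(requested != NULL && received != NULL);
    return memcmp(requested->b, received->b, sizeof(requested->b)) == 0;
}

/*
 * Copies reply_name into name only when the reply matches the request.
 * Returns 0 on success, -1 on a mismatch or a zero-length buffer.
 * The output is always NUL-terminated and truncated to len - 1 bytes.
 */
int accept_name(const addr6_t *requested, const addr6_t *received,
                const char *reply_name, char *name, size_t len)
{
    if (len == 0 || !name_reply_matches(requested, received))
        return -1;

    strncpy(name, reply_name, len - 1);
    name[len - 1] = '\0';
    return 0;
}
```

With a check like this, a reply that belongs to some other outstanding name
request shows up as an error that the caller can retry, rather than as a
wrong name silently attached to the address being queried.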

As this problem had also occurred before the patch, we tried resetting the
dongle whenever a remote name request failed:

if (hci_read_remote_name_with_clock_offset(sock, item_addr,
        pscan_repmode, time_offset, sizeof(name), name,
        (int)(m_timeout*1000)) < 0)
{
    close(sock);
    sock = scan_reset_device(dev_id);
}

where the scan_reset_device() function is below:

int scan_reset_device(int devid)
{
    int ctl;

    ctl = socket(AF_BLUETOOTH, SOCK_RAW, BTPROTO_HCI);
    ioctl(ctl, HCIDEVDOWN, devid);
    ioctl(ctl, HCIDEVUP, devid);
    shutdown(ctl, 2);
    close(ctl);
    return hci_open_dev(devid);
}

As far as I can see, the code shouldn't cause any problems, though I find it
strange that we have to reset the dongles at all, and even with the reset
they eventually freeze up anyway. The dongles that we are using are Belkin
F8T013s. We are trying to get some CSR dongles instead, but for now the
Belkins are what we have available to us.

Is there a problem with using multiple dongles the way our software does?
We would like to keep the multiple-dongle approach so that the names come
in faster, but if the names keep coming in incorrectly then we can't rely
on the name data. Is there a bug at the kernel level where name requests
could get mixed up if too many of them are in progress at the same time on
multiple dongles?

I apologize for the length of this message, but I've been working on this
problem for weeks and am hoping that one of the developers can tell me
whether I'm doing something wrong, or whether there is indeed a BlueZ bug
in this area.

Thanks in advance,
--
-Jeff
