2006-09-28 17:19:02

by Marsette Vona

Subject: [Bluez-devel] possible regression under rf interference

Hello -

[while the issue described below does relate to some fedora core versioning, we feel that it is also appropriate to ask for advice here in the bluez community as the core problem, or at least a better understanding of what's going on, is very likely of a technical bluetooth nature]

[if you prefer to skip the lengthy problem description, you can grep directly for the QUESTIONS section]

We've got a robot here (*) which runs a bluetooth interface. The robot has a little bluetooth v2 module on it (a BlueRadios C40). It communicates with a PC running Linux and a bluetooth USB dongle (**). We use a basic "rfcomm connect" to set up a /dev/rfcommX port, and talk to the robot serially over that. Nice. Well, it was nice.

Up until around early July of this year, everything was working well. The PC had fedora core 4. Then we upgraded the pc to FC5 and we started to get a very challenging problem. It goes like this:

1) we power up the robot and do the "rfcomm connect" as usual. Everything is happy, the robot is communicating

2) we can talk to the robot over the bluetooth link apparently normally, until

3) we tell the robot to turn on one of its motors. Based on some oscilloscope measurements, we have strong evidence that this can cause some amount of RF noise reaching up into the GHz range. Note carefully: before fc5, and even now when we substitute an RS232 bluetooth interface for the usb dongle (see below), the bluetooth protocol error correction is apparently sufficient to tolerate this noise without significant delays.

4) the motor goes on and the robot continues to communicate, usually, for about 1 to 10 seconds more. It then "hiccups", i.e., there is a delay in communication for greater than 1.0 second.

5) for safety, we really can't tolerate comm delays that long, so the robot's on-board software shuts the motor down

6) it appears that the bluetooth link does not actually die, as we can always continue communicating with the robot (i.e. using the same previous rfcomm connection) after the motor stops
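
One way to make the hiccup in step 4 measurable on the PC side is to timestamp every message that arrives from the robot and flag any silence longer than the 1.0 s limit. The following is only a minimal Python sketch under that assumption; find_gaps and its parameters are hypothetical names for illustration, not part of the robot software or of bluez.

```python
from typing import Iterable, List, Tuple


def find_gaps(timestamps: Iterable[float],
              limit_s: float = 1.0) -> List[Tuple[float, float]]:
    """Return (start_time, duration) for every silence longer than limit_s.

    timestamps are arrival times in seconds, in increasing order, e.g. one
    time.monotonic() reading per message read from /dev/rfcommX.
    """
    gaps = []
    previous = None
    for t in timestamps:
        # A gap is the time between two consecutive arrivals.
        if previous is not None and t - previous > limit_s:
            gaps.append((previous, t - previous))
        previous = t
    return gaps


if __name__ == "__main__":
    # Made-up arrival times: one silence of 1.5 s starting at t = 0.5 s.
    arrivals = [0.0, 0.5, 2.0, 2.25]
    for start, duration in find_gaps(arrivals):
        print(f"gap of {duration:.2f} s starting at t = {start:.2f} s")
```

Logging the gaps this way would let the same test be repeated with the USB dongle and with the RS232 module and the two runs compared directly.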

Obviously, we are trying to isolate the particular versions of things in which the above fault occurs and those in which it doesn't. This is proving more difficult than expected (we unfortunately do not have a simple version snapshot of the setup which was working pre-fc5, doh). We tried reverting to the kernel which we believe was in effect at the time everything was working (2.6.16-1.2069_FC4-i686), and to the most recent FC4 bluetooth rpms (which *should* have been in effect on our machine at that time...):

bluez-pin-0.24-2.i386.rpm
bluez-libs-2.15-1.i386.rpm
bluez-utils-2.15-7.i386.rpm
bluez-hcidump-1.18-1.i386.rpm

Frustratingly, the problem seemed to persist. Perhaps our original (working) fc4 setup was not fully up-to-date, and so was using even older bluez rpms. Most likely the kernel version was as above though. Or perhaps reverting the RPMs did not actually revert some /etc/ config files which had been updated by the FC5 rpms. Or maybe we made a mistake in attempting the reversion (which took the better part of a day for reasons we're sure you don't want to even hear about).

For completeness, here are the versions of things under FC5 where the timeouts definitely do occur:

kernel 2.6.17-1.2187_FC5-i686

bluez-pin-0.30-2.i386.rpm
bluez-libs-2.25-1.i386.rpm
bluez-utils-2.25-4.i386.rpm
bluez-hcidump-1.30-1.i386.rpm
gnome-bluetooth-0.7.0-2.i386.rpm
gnome-bluetooth-libs-0.7.0-2.i386.rpm
libbtctl-0.6.0-5.i386.rpm

We have entertained the possibility that the issue is being caused by some other lossage which just happened to coincide with the fc5 update. However, that does not seem to be the case:

a) if, instead of the USB dongle, we use an RS232 bluetooth interface on the pc (also based on the BlueRadios C40), everything works fine, even now

b) we have tried different USB dongles from different manufacturers with the same effects

c) we have even tried different linux workstations entirely, and still gotten the comm timeout when using a bluetooth USB dongle and bluez

Finally, we realize that delays over 1.0s may actually be within-spec for bluetooth comms (we have not read the specs). However note that the delays were never observed before the fc5 update, and also note that they are not observed even now if we avoid using the bluez software stack and instead use an RS232 bluetooth module.

QUESTIONS

1) are long timeouts under RF interference possibly a known current regression? does anyone regularly test bluez under significant RF interference?

2) does bluez (and if so, in what codepath) even deal with things like error detection and correction, packet retransmission, tx power management, rf interference, link quality monitoring, etc? Or are things like that handled by firmware in the usb dongle?

3) are we possibly barking up the wrong tree? Can anyone think of any other possible cause which fits the above symptoms?

4) is the issue likely in the bluez kernel code, the bluez user space code, or something else entirely?

5) what else could we do to try to debug this? we did an hcidump of a "bad" session, but our untrained eye didn't see anything suspicious in it. Can we somehow tell the bluez {kernel,userspace} code to be more verbose about what's going on?

6) what should we read to educate ourselves about what is going on here so we can better diagnose an issue like this?

Any help is appreciated.

Marsette (Marty) Vona
Distributed Robotics Lab
MIT CSAIL


(*) http://www.mit.edu/~vona/publications/Vona_Detweiler_Rus__2006__Shady_Robust_Truss_Climbing_With_Mechanical_Compliances.pdf

(**) we have tried several dongles, with apparently identical behavior. One is a D-Link DBT-120. This particular one definitely worked pre-fc5, and fails as described above post-fc5.


-------------------------------------------------------------------------
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT & business topics through brief surveys -- and earn cash
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
_______________________________________________
Bluez-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bluez-devel


2006-09-30 08:01:16

by Marcel Holtmann

Subject: Re: [Bluez-devel] hcid pairing bug when security is auto

Hi Jean-Jacques,

> > This is for debugging purposes and really special use cases.
> Which ones (use cases)?

I think the debugging purpose is clear: I needed it. The special use
cases are fixed pairs of devices with a fixed PIN or a static random PIN
that people can remember. It is possible to implement all this using the
passkey agent, but some embedded devices need a static PIN and
therefore it is there. This feature is undocumented for a reason: it is
not for the ordinary desktop user. The desktop user also has no write
access to this directory.

> I've tried the bluez passkey-agent, and it is not very practical to have 3
> "daemons" (hcid, dbus-daemon and the passkey agent) just to pair new devices.

The hcid is running as root, the system D-Bus daemon as the message bus
user and the passkey agent as a normal unprivileged user. It is not black
and white, and Linux (including BlueZ) scales from very small systems to
big ones.

> At least explain to me why the auto mode makes a difference between outgoing
> and incoming connections?
> Or make the auto mode use the default passkey in both cases! (or remove the
> auto mode to really force users to use dbus ...).

The auto mode is no longer the default. That was a mistake in the default
config. What you really want is the user mode, which is also more secure
than a default PIN for all incoming connections. The reason why auto
exists is historical. We wrote it when the first chips supported
authentication and encryption. Sounded like a nice idea back then.

Regards

Marcel




2006-09-30 00:18:36

by Jean-Jacques Brucker

Subject: Re: hcid pairing bug when security is auto

On Saturday 30 September 2006 01:09, Marcel Holtmann wrote:
> Hi Jean-Jacques,
>
> > There was a bug in hcid when pairing and security is in auto mode.
>
> that is not a bug. It is a feature.
>
> > The code checked whether the connection was outgoing or incoming, but
> > in either case the remote device may or may not ask for authentication
> > (and encryption).
> >
> > For example, Sagem and Ericsson mobile phones ask for
> > authentication/encryption with new devices, without checking who is
> > trying to connect first...!
> >
> > Then the code looked for PIN codes in the file
> > "/var/lib/<local_bdaddr>/pincodes", but that file isn't written by any
> > known application (and it is not a place for files that users are meant
> > to write...).
>
> This is for debugging purposes and really special use cases.
Which ones (use cases)?

>
> > Then the code tried to hand over to dbus applications ("hey, what
> > the f... I asked for auto mode!!").
>
> Read the hcid.conf and then you will understand what auto means.
>
> > As I didn't know whether any software uses the
> > "/var/lib/<local_bdaddr>/pincodes" file, I kept this file as the first
> > place to search and added a config file with the same name (but not
> > the same syntax) in the bluez config directory.
>
> PIN codes are not configuration. They are state files and thus they are
> placed under /var/lib.
>
> > Note: Using dbus is a good idea, but it would be better to be able to
> > enable or disable it with a flag, because dbus is very big to embed on
> > small (embedded) systems...
>
> No. See the other discussion about this topic. I made my decision. People
> can still use the 2.x generation or fork or whatever. The upstream BlueZ
> goes with D-Bus support. And once you have used the D-Bus based API you
> are not going back. You can trust me on this. It solves a lot of problems.
>
> > Note2: I saw in CVS that bluez used a file named pin in confdir...
> > What I have done is not really a regression. In fact we could insert the
> > content of the pincodes file inside the hcid.conf file... but I don't
> > really know how to do it with bison (and I dislike bisons !-). In the end
> > the syntax of my pincodes file is simple; it is read on each HCI "PIN
> > code request" command (when security is set to auto) and could be more
> > easily managed by external software (that doesn't use dbus).
> >
> > PS: I have made the hcid.conf more explicit but I haven't patched the man
> > pages for now; if my patch is used, I'll obviously update them (and
> > with or without my patch, they already need some updates...).
>
> The default mode for the security manager is now user and it will stay
> this way. No additional hacks around PIN codes are needed. The passkey
> agent interface is the way to go. There exists no argument that can
> convince me otherwise. Try using the passkey agent interface and you
> will see what I mean. We spent a lot of time getting this right and it
> is really nice and handy.

I've tried the bluez passkey-agent, and it is not very practical to have 3
"daemons" (hcid, dbus-daemon and the passkey agent) just to pair new devices.

At least explain to me why the auto mode makes a difference between outgoing
and incoming connections?
Or make the auto mode use the default passkey in both cases! (or remove the
auto mode to really force users to use dbus ...).

(How can I connect and pair to new phones quickly, without using the
agent?)

> Regards
>
> Marcel

2006-09-29 23:09:22

by Marcel Holtmann

Subject: Re: [Bluez-devel] hcid pairing bug when security is auto

Hi Jean-Jacques,

> There was a bug in hcid when pairing and security is in auto mode.

that is not a bug. It is a feature.

> The code checked whether the connection was outgoing or incoming, but in
> either case the remote device may or may not ask for authentication (and
> encryption).
>
> For example, Sagem and Ericsson mobile phones ask for authentication/encryption
> with new devices, without checking who is trying to connect first...!
>
> Then the code looked for PIN codes in the file
> "/var/lib/<local_bdaddr>/pincodes", but that file isn't written by any known
> application (and it is not a place for files that users are meant to write...).

This is for debugging purposes and really special use cases.

> Then the code tried to hand over to dbus applications ("hey, what
> the f... I asked for auto mode!!").

Read the hcid.conf and then you will understand what auto means.

> As I didn't know whether any software uses the
> "/var/lib/<local_bdaddr>/pincodes" file, I kept this file as the first place
> to search and added a config file with the same name (but not the same
> syntax) in the bluez config directory.

PIN codes are not configuration. They are state files and thus they are
placed under /var/lib.

> Note: Using dbus is a good idea, but it would be better to be able to enable
> or disable it with a flag, because dbus is very big to embed on small
> (embedded) systems...

No. See the other discussion about this topic. I made my decision. People
can still use the 2.x generation or fork or whatever. The upstream BlueZ
goes with D-Bus support. And once you have used the D-Bus based API you
are not going back. You can trust me on this. It solves a lot of problems.

> Note2: I saw in CVS that bluez used a file named pin in confdir... What
> I have done is not really a regression. In fact we could insert the content
> of the pincodes file inside the hcid.conf file... but I don't really know
> how to do it with bison (and I dislike bisons !-). In the end the syntax of
> my pincodes file is simple; it is read on each HCI "PIN code request" command
> (when security is set to auto) and could be more easily managed by external
> software (that doesn't use dbus).
>
> PS: I have made the hcid.conf more explicit but I haven't patched the man
> pages for now; if my patch is used, I'll obviously update them (and with or
> without my patch, they already need some updates...).

The default mode for the security manager is now user and it will stay
this way. No additional hacks around PIN codes are needed. The passkey
agent interface is the way to go. There exists no argument that can
convince me otherwise. Try using the passkey agent interface and you
will see what I mean. We spent a lot of time getting this right and it
is really nice and handy.

Regards

Marcel




2006-09-29 20:08:33

by Jean-Jacques Brucker

Subject: hcid pairing bug when security is auto

There was a bug in hcid when pairing and security is in auto mode.

The code checked whether the connection was outgoing or incoming, but in
either case the remote device may or may not ask for authentication (and
encryption).

For example, Sagem and Ericsson mobile phones ask for authentication/encryption
with new devices, without checking who is trying to connect first...!

Then the code looked for PIN codes in the file
"/var/lib/<local_bdaddr>/pincodes", but that file isn't written by any known
application (and it is not a place for files that users are meant to write...).

Then the code tried to hand over to dbus applications ("hey, what
the f... I asked for auto mode!!").

As I didn't know whether any software uses the
"/var/lib/<local_bdaddr>/pincodes" file, I kept this file as the first place
to search and added a config file with the same name (but not the same
syntax) in the bluez config directory.

Note: Using dbus is a good idea, but it would be better to be able to enable
or disable it with a flag, because dbus is very big to embed on small
(embedded) systems...

Note2: I saw in CVS that bluez used a file named pin in confdir... What
I have done is not really a regression. In fact we could insert the content
of the pincodes file inside the hcid.conf file... but I don't really know
how to do it with bison (and I dislike bisons !-). In the end the syntax of
my pincodes file is simple; it is read on each HCI "PIN code request" command
(when security is set to auto) and could be more easily managed by external
software (that doesn't use dbus).

PS: I have made the hcid.conf more explicit but I haven't patched the man
pages for now; if my patch is used, I'll obviously update them (and with or
without my patch, they already need some updates...).


Attachments:
(No filename) (1.76 kB)
hcid_autopairing.patch.gz (2.71 kB)

2006-09-29 11:50:40

by Marcel Holtmann

Subject: Re: [Bluez-devel] possible regression under rf interference

Hi Marsette,

> [while the issue described below does relate to some fedora core versioning, we feel that it is also appropriate to ask for advice here in the bluez community as the core problem, or at least a better understanding of what's going on, is very likely of a technical bluetooth nature]
>
> [if you prefer to skip the lengthy problem description, you can grep directly for the QUESTIONS section]
>
> We've got a robot here (*) which runs a bluetooth interface. The robot has a little bluetooth v2 module on it (a BlueRadios C40). It communicates with a PC running Linux and a bluetooth USB dongle (**). We use a basic "rfcomm connect" to set up a /dev/rfcommX port, and talk to the robot serially over that. Nice. Well, it was nice.
>
> Up until around early July of this year, everything was working well. The PC had fedora core 4. Then we upgraded the pc to FC5 and we started to get a very challenging problem. It goes like this:
>
> 1) we power up the robot and do the "rfcomm connect" as usual. Everything is happy, the robot is communicating
>
> 2) we can talk to the robot over the bluetooth link apparently normally, until
>
> 3) we tell the robot to turn on one of its motors. Based on some oscilloscope measurements, we have strong evidence that this can cause some amount of RF noise reaching up into the GHz range. Note carefully: before fc5, and even now when we substitute an RS232 bluetooth interface for the usb dongle (see below), the bluetooth protocol error correction is apparently sufficient to tolerate this noise without significant delays.
>
> 4) the motor goes on and the robot continues to communicate, usually, for about 1 to 10 seconds more. It then "hiccups", i.e., there is a delay in communication for greater than 1.0 second.
>
> 5) for safety, we really can't tolerate comm delays that long, so the robot's on-board software shuts the motor down
>
> 6) it appears that the bluetooth link does not actually die, as we can always continue communicating with the robot (i.e. using the same previous rfcomm connection) after the motor stops
>
> Obviously, we are trying to isolate the particular versions of things in which the above fault occurs and those in which it doesn't. This is proving more difficult than expected (we unfortunately do not have a simple version snapshot of the setup which was working pre-fc5, doh). We tried reverting to the kernel which we believe was in effect at the time everything was working (2.6.16-1.2069_FC4-i686), and to the most recent FC4 bluetooth rpms (which *should* have been in effect on our machine at that time...):
>
> bluez-pin-0.24-2.i386.rpm
> bluez-libs-2.15-1.i386.rpm
> bluez-utils-2.15-7.i386.rpm
> bluez-hcidump-1.18-1.i386.rpm
>
> Frustratingly, the problem seemed to persist. Perhaps our original (working) fc4 setup was not fully up-to-date, and so was using even older bluez rpms. Most likely the kernel version was as above though. Or perhaps reverting the RPMs did not actually revert some /etc/ config files which had been updated by the FC5 rpms. Or maybe we made a mistake in attempting the reversion (which took the better part of a day for reasons we're sure you don't want to even hear about).
>
> For completeness, here are the versions of things under FC5 where the timeouts definitely do occur:
>
> kernel 2.6.17-1.2187_FC5-i686
>
> bluez-pin-0.30-2.i386.rpm
> bluez-libs-2.25-1.i386.rpm
> bluez-utils-2.25-4.i386.rpm
> bluez-hcidump-1.30-1.i386.rpm
> gnome-bluetooth-0.7.0-2.i386.rpm
> gnome-bluetooth-libs-0.7.0-2.i386.rpm
> libbtctl-0.6.0-5.i386.rpm
>
> We have entertained the possibility that the issue is being caused by some other lossage which just happened to coincide with the fc5 update. However, that does not seem to be the case:
>
> a) if, instead of the USB dongle, we use an RS232 bluetooth interface on the pc (also based on the BlueRadios C40), everything works fine, even now
>
> b) we have tried different USB dongles from different manufacturers with the same effects
>
> c) we have even tried different linux workstations entirely, and still gotten the comm timeout when using a bluetooth USB dongle and bluez
>
> Finally, we realize that delays over 1.0s may actually be within-spec for bluetooth comms (we have not read the specs). However note that the delays were never observed before the fc5 update, and also note that they are not observed even now if we avoid using the bluez software stack and instead use an RS232 bluetooth module.
>
> QUESTIONS
>
> 1) are long timeouts under RF interference possibly a known current regression? does anyone regularly test bluez under significant RF interference?

We don't have to. The dongles are tested and BlueZ only accesses them
over HCI. There is no way to interfere with the RF in a wrong way without
using nasty vendor specific tricks (which we don't).

> 2) does bluez (and if so, in what codepath) even deal with things like error detection and correction, packet retransmission, tx power management, rf interference, link quality monitoring, etc? Or are things like that handled by firmware in the usb dongle?

This is all handled in the firmware of the chip. BlueZ doesn't have to
worry about it.

> 3) are we possibly barking up the wrong tree? Can anyone think of any other possible cause which fits the above symptoms?

The USB bus can be different. The USB subsystem changes a lot from
kernel version to kernel version.

> 4) is the issue likely in the bluez kernel code, the bluez user space code, or something else entirely?

I don't expect this to be BlueZ's fault at all.

> 5) what else could we do to try to debug this? we did an hcidump of a "bad" session, but our untrained eye didn't see anything suspicious in it. Can we somehow tell the bluez {kernel,userspace} code to be more verbose about what's going on?

You can enable the *_DEBUG options in the kernel code and then recompile,
but you won't see anything different than you see with hcidump. The
kernel is not doing any magic behind your back.

> 6) what should we read to educate ourselves about what is going on here so we can better diagnose an issue like this?

Try checking your USB hardware and the kernel support for USB. If you
don't use SCO channels you can load the hci_usb module with the isoc=0
parameter to avoid isoc transfers on the USB bus.
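
As a rough illustration of the isoc=0 suggestion, the module could be reloaded as sketched below in Python. This assumes hci_usb is built as a loadable module, that modprobe is on the PATH, and that the script runs as root when dry_run is False; the helper names are made up for this example.

```python
import subprocess
from typing import List


def hci_usb_reload_commands(isoc: int = 0) -> List[List[str]]:
    """Build the commands that unload hci_usb and load it again with isoc=N."""
    return [
        ["modprobe", "-r", "hci_usb"],            # unload the running module
        ["modprobe", "hci_usb", f"isoc={isoc}"],  # reload with the parameter
    ]


def reload_hci_usb(dry_run: bool = True) -> List[List[str]]:
    """Print the commands (dry run) or execute them (needs root)."""
    commands = hci_usb_reload_commands()
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            # check=True raises CalledProcessError if modprobe fails.
            subprocess.run(command, check=True)
    return commands


if __name__ == "__main__":
    reload_hci_usb(dry_run=True)
```

The dry run only prints the two modprobe invocations, so they can be checked before anything is changed. Any existing rfcomm connection would have to be set up again after the reload.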

Regards

Marcel


