2009-03-20 22:57:44

by Luis R. Rodriguez

Subject: Google Summer of Code 2009 -- Linux wireless roaming project

This is what we have for roaming on our todo list:

http://wireless.kernel.org/en/developers/todo-list#Roaming

This is what we have for our GSoC roaming project:

http://wireless.kernel.org/en/developers/GSoC/2009/Improve_wireless_roaming

We need to extend this now, and since things are shifting towards nl80211
(the MLME SAP stuff) I think we may need to rethink this a bit. Ideas and
wishlists for how to improve our roaming with the help of GSoC students
would be appreciated.

Luis


2009-03-24 09:07:01

by Holger Schurig

Subject: Re: Google Summer of Code 2009 -- Linux wireless roaming project

> Basically roaming can be divided into 3 steps:
> 1) detect if it is time to roam
> 2) scan for better APs
> 3) associate with the new AP
>
> Step 1 is the most difficult one here, 2 only needs some
> tweaking and 3 should work as it is currently.


For one of the disadvantages you mentioned I have some ideas:

> - The signal strength values on different cards are not
> comparable. So the threshold value has to be different for all
> cards.

The first thing is to make the drivers report similar values in
similar situations. That would be helpful for user-space as well,
so I think this should eventually be tackled independently of
roaming anyway.


But consider for now the case where two different cards provide
values that are quite different in absolute terms. If we look at
signal differences, this wouldn't harm us that much. Consider
these BSS entries:

bss 1: 45
bss 2: 42 <- also the current one
bss 3: 10

In this case the difference from our BSS to the best matching one
is only 3. If we have some heuristic that says "Only roam if you
find an AP which is 6 points better" we wouldn't roam. If
another card would report for the same situation

bss 1: 54
bss 2: 50 <- also the current one
bss 3: 12

(that's the same as above, but multiplied by 1.2) the outcome
would be the same.
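
To illustrate the difference-based rule with the two tables above, here is a minimal Python sketch. The function name `pick_roam_target` and the `margin` parameter are made up for this example; only the numbers come from the mail.

```python
from typing import Dict, Optional


def pick_roam_target(signals: Dict[str, int], current: str,
                     margin: int = 6) -> Optional[str]:
    """Return the BSS to roam to, or None to stay on the current one."""
    best = max(signals, key=lambda bss: signals[bss])
    # Roam only if the best candidate beats the current BSS by the margin.
    if best != current and signals[best] - signals[current] >= margin:
        return best
    return None


# The two tables from the mail: same situation, two different cards.
card_a = {"bss 1": 45, "bss 2": 42, "bss 3": 10}
card_b = {"bss 1": 54, "bss 2": 50, "bss 3": 12}

print(pick_roam_target(card_a, "bss 2"))  # difference is 3, below the margin
print(pick_roam_target(card_b, "bss 2"))  # difference is 4, below the margin
```

Note that a fixed margin is only roughly independent of the card's scale: if a card reported the first table multiplied by 2 instead of 1.2, the difference would become 6 and the same rule would roam.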

> - Ping-pong effect if you sit between two APs both are in
> range but with a signal strength around the threshold.

It's my understanding that a good threshold would also prevent
ping-pongs ?!?


> - Unnecessary scanning if the signal strength is below the
> threshold but no better AP is in range will further reduce the
> connection quality and increase power consumption.

Either you know that your device is moving, then you want this
scanning, because soon the scanning is no longer unnecessary.
Even if at the current position (corner of the
street/warehouse/whatever) you "scan" in vain, some minutes
later the situation has changed.

Or you know that you're in hot-spot mode and then you attach to
an AP and stay with it. Then you don't need the whole roaming
sermon at all --- this is BTW the reason why mac80211 is,
despite its awful roaming, such a success so far.

What I meant is that this is a policy decision (or trade-off
decision), that the user should be able to influence.


For one of my devices, I made lots of local changes:

- I provide a list of channels for the driver to scan. In most
warehouses only channels 1, 6, 11 are used. Then there's no
reason for the driver to scan channels 2, 3, 4 etc. If
user-space didn't provide such a list, the driver has to scan
all frequencies, so this is merely an optimization. But an
important one, it helps tremendously.

- I let the driver scan one frequency every n time units, e.g.
one channel every second. This makes the driver visit all
three channels within 3 seconds.

- If I would get all beacons of the current channel, AND if the
ESSID is not hidden, I would only scan the channels I'm
not on. Because for my current channel I already have the signal
strengths of all APs on it and also know the ESSID and IEs
to decide if I can roam or not, should the need arise.



> b) Number of consecutively missed beacons below threshold

This tends to roam only when it is too late, e.g. when the
connection is nearly breaking. But you wrote this by
yourself :-)

In my case, I'm doing a full scan if this happens, to protect
against bad channel list provisioning.




> c) Only scan for new APs if the environment changes (e.g. we
> are moving or the AP is moving etc.)

You very seldom know about this, e.g. GPS is mostly useless
inside big buildings.

You can however record "Okay, when I associated to the AP my
signal strength was 56. If it drops below 50, I'll look if I
find something better".
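
The "remember the signal at association time" idea can be sketched in a few lines of Python. The class name `AssociationBaseline` and the `allowed_drop` parameter are hypothetical; the values 56 and 50 are the ones from the example above.

```python
class AssociationBaseline:
    """Remember the signal at association time and flag a later drop."""

    def __init__(self, signal_at_assoc: int, allowed_drop: int = 6) -> None:
        # Scanning is triggered once the signal falls below this level.
        self.trigger_level = signal_at_assoc - allowed_drop

    def should_scan(self, current_signal: int) -> bool:
        return current_signal < self.trigger_level


baseline = AssociationBaseline(signal_at_assoc=56)  # trigger level is 50
print(baseline.should_scan(52))  # still above the trigger level
print(baseline.should_scan(49))  # dropped below 50
```

Because the trigger level is relative to what the card itself reported at association, this avoids having to pick one absolute threshold for all cards.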

> I already did some research on c) and it looks very promising
> but the topic is quite complex and needs more theoretical
> research first.

If you're serious about that then mac80211 should only get the
infrastructure necessary so that we can write different roaming
implementations, like we now have different rate selection
implementations.


> Scanning for new APs should not be started from within the
> driver or mac80211. Instead wpa_supplicant should care about
> that. Why? Just because the supplicant might have more
> information (maybe provided by NM) about the used network. For
> example a typical multi-AP network won't use all channels from
> within the bg-band due to signal interferences. Instead, all
> APs will be located on non-overlapping channels. Let's say 1,6
> and 11. Hence, if the supplicant triggers a scan it will just
> leave all channels != 1,6,11 out of the scan request and the
> scan will take a shorter amount of time, which in turn
> shortens the handoff delay.

That's similar to my local, debugfs-based channel list hack, but
better :-)

But please make this able to run from wpa_supplicant alone,
don't force NM into the picture. Many embedded developers will
say "thank you" for this. :-)


> extend wpa_supplicant's network blocks to allow the
> specification of preferred channels ("channels=1,6,11"). This
> value could be provided by NM which gathered that information
> either from the user or through monitoring.

Or the value should simply be recorded in wpa_supplicant's config
file. No need no stinkin' NM ! :-)


> In order to lower the negative influence a scan has to the
> ongoing traffic the software scan implementation would have to
> be reworked. The scan should simply switch back to the
> operating channel every once in a while to allow queued
> packets to be delivered (in both directions).

Yeah, my local scan hack also does this. Scanning there is
actually a state machine that scans a maximum of 3 channels at
once. If the user did not provide a channel list, I still scan
only 1,6,11 first. If I now find a better AP, I'm using that
one. Otherwise, I scan 3,8,13. Then 2,5,8 etc. The numbers are a
bit arbitrary and hardcoded, but you get the picture. Actually,
the reason for doing this is that this fullmac driver isn't able
to send null packets (for power save indication) to the AP, so
I can leave the current channel only for very short times. That
it helps maintain a smooth connection was a welcome side
effect :-)
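
A rough Python sketch of such a batched scan, assuming the driver returns to the operating channel after every batch of at most three channels. The names `scan_schedule` and `found_better` are invented for illustration and the function only produces a list of steps, it does not talk to any driver.

```python
from typing import Callable, List, Optional, Sequence


def scan_schedule(channels: Sequence[int], operating_channel: int,
                  batch_size: int = 3,
                  found_better: Optional[Callable[[List[int]], bool]] = None
                  ) -> List[str]:
    """List the steps of a scan that visits channels in small batches."""
    steps: List[str] = []
    for start in range(0, len(channels), batch_size):
        batch = list(channels[start:start + batch_size])
        steps.extend(f"scan {ch}" for ch in batch)
        # Go back to the operating channel so queued packets can flow.
        steps.append(f"return to {operating_channel}")
        # Stop early if this batch already produced a better AP.
        if found_better is not None and found_better(batch):
            break
    return steps


for step in scan_schedule([1, 6, 11, 3, 8, 13], operating_channel=6):
    print(step)
```

Passing a `found_better` callback models the "if I find a better AP, I'm using that one" shortcut: the remaining batches are simply skipped.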

2009-03-24 10:52:08

by Dan Williams

Subject: Re: Google Summer of Code 2009 -- Linux wireless roaming project

On Tue, 2009-03-24 at 11:46 +0100, Holger Schurig wrote:
> > Yes, that's a good idea. There are some more scenarios where
> > different roaming algorithms might make sense. However, I'm
> > still not sure where the roaming decision should be made (and
> > thus where the algorithm should be implemented). In user space
> > (wpa_supplicant) or in mac80211. Having it in user space would
> > allow non-mac80211-drivers to benefit too but the driver would
> > have to provide the necessary information.
>
> User-space is often easier to change *)
>
>
> However, if you want to use beacon-information, then user-space
> doesn't spring to my mind so quickly. Beacons come in quite
> fast, and I'd if I have to transport all of this to
> user-space ...
>
>
> Or we divide it: if user-space tells the kernel, then the kernel
> maintains a bss-list and whenever the kernel modifies the
> list "considerably", kernel pushes a nl80211 message to
> user-space, notifying it about the new state "once in a while".
> Then data gathering is in kernel, but decision making is in
> user-space. However, we need a good definition for "considerably"
> and "once in a while", though :-)

Right now the kernel will only keep the list for 10 seconds or so.
That's not enough to determine "considerably" really. Since of course
not every AP shows up every scan, if you don't keep some scan history
there's no good way to know what "considerably" really means... So
yeah, it would mean keeping a larger BSS list around in the kernel, or
even the last few BSS lists and a timestamp for each one, and figuring
out an algorithm to 'diff' the scan lists and come up with a threshold
for when that 'diff' is large enough to warrant a signal.
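
As a sketch of what keeping scan history and diffing it could look like (in Python rather than kernel C, purely for illustration): `ScanHistory`, `diff_views`, the 60 second retention and the 6 point delta are all assumptions, not existing kernel or nl80211 interfaces.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple

Snapshot = Tuple[float, Dict[str, int]]  # (timestamp, {bssid: signal})


class ScanHistory:
    """Keep the scan results of the last max_age seconds."""

    def __init__(self, max_age: float = 60.0) -> None:
        self.max_age = max_age
        self.snapshots: Deque[Snapshot] = deque()

    def add(self, timestamp: float, results: Dict[str, int]) -> None:
        self.snapshots.append((timestamp, dict(results)))
        # Drop snapshots that are older than max_age.
        while timestamp - self.snapshots[0][0] > self.max_age:
            self.snapshots.popleft()

    def merged(self) -> Dict[str, int]:
        """Latest known signal per BSS across all retained snapshots."""
        view: Dict[str, int] = {}
        for _, results in self.snapshots:  # oldest first, newer overwrite
            view.update(results)
        return view


def diff_views(old: Dict[str, int], new: Dict[str, int],
               min_delta: int = 6) -> List[str]:
    """Describe what changed between two merged views."""
    events = [f"appeared {b}" for b in sorted(new.keys() - old.keys())]
    events += [f"vanished {b}" for b in sorted(old.keys() - new.keys())]
    events += [f"changed {b}" for b in sorted(old.keys() & new.keys())
               if abs(new[b] - old[b]) >= min_delta]
    return events


history = ScanHistory(max_age=60.0)
history.add(0.0, {"ap1": 40, "ap2": 30})
before = history.merged()
history.add(10.0, {"ap1": 48})  # ap2 missed this scan but is not forgotten
print(diff_views(before, history.merged()))
```

A notification to user space would then be sent only when the list of events is non-empty or longer than some limit; choosing that limit is exactly the open "considerably" question.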

Dan

>
>
>
>
> *) EXCEPT if we're talking wpa_supplicant. Not everyone finds
> wpa_supplicant easy to modify, because of the number of
> interwoven state-machines and handled corner-cases.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html


2009-03-24 11:05:18

by Helmut Schaa

Subject: Re: Google Summer of Code 2009 -- Linux wireless roaming project

On Tuesday, 24 March 2009, Holger Schurig wrote:
> > Hmm, quick example: AP1 - STA - AP2
> >
> > We cannot consider the signal strength as constant as it
> > varies over time even when neither the STA nor the AP are
> > moving. Assume a threshold value of t=40. Furthermore, the
> > signal strength of AP1 and AP2 might vary between 35-50 which
> > means we have an average signal strength of 42.5 > t.
> > Nevertheless that would result in ping-pongs between AP1 and
> > AP2 because the signal might drop below t on both APs, while
> > it would be better to stick to one AP as the signal is already
> > quite bad (but still good enough to do some communication).
>
> Yeah, but if the client is moving, you have to live with that,
> more or less.
>
> And if the client is not moving (and roaming is a loadable kernel
> module), then simply don't load it :-)

Ah, ok. I was more referring to an ordinary laptop user who sits at his
desk and once in a while starts moving (for example to a conference room).
While he sits at his desk the optimal solution shouldn't trigger a scan as
the chance that a better AP pops up is relatively low. Once he starts
moving scanning is desired.

> My ad-hoc approach that I already implemented (for non-mac80211)
> shows quite a number of scans. But that is ok for my
> use-case (e.g. telnet connection via WLAN). "Connection lost" is
> way worse than one scanning/reassociation too much, especially
> if the scanning/association is done intelligently.
>
> So for now I wouldn't optimize here, but make non-sucking roaming
> possible in the first place. We can build upon this anyway.

Fine with me. But too aggressive scanning might lead to unstable or
intermittent connections, especially when WPA-EAP without PMKSA-caching
is used where roaming from one AP to another can take up to several
seconds ;)

And additionally if the connection is idle, repeated scanning will
increase the power consumption which is not desired on battery driven
devices.

Helmut

2009-03-24 10:39:57

by Dan Williams

Subject: Re: Google Summer of Code 2009 -- Linux wireless roaming project

On Tue, 2009-03-24 at 11:15 +0100, Helmut Schaa wrote:
> On Tuesday, 24 March 2009, Holger Schurig wrote:
> > > Basically roaming can be divided into 3 steps:
> > > 1) detect if it is time to roam
> > > 2) scan for better APs
> > > 3) associate with the new AP
> > >
> > > Step 1 is the most difficult one here, 2 only needs some
> > > tweaking and 3 should work as it is currently.
> >
> > For one of the disadvantages you mentioned I have some ideas:
> >
> > > - The signal strength values on different cards are not
> > > comparable. So the threshold value has to be different for all
> > > cards.
> >
> > The first thing is to make the drivers report similar values in
> > similar situations. That would be helpful for user-space as well,
> > so I think this should eventually be tackled independently of
> > roaming anyway.
>
> Yep, that would help a lot (if it is achievable).
>
> > But consider for now the case where two different cards provide
> > values that are quite different in absolute terms. If we look at
> > signal differences, this wouldn't harm us that much. Consider
> > these BSS entries:
> >
> > bss 1: 45
> > bss 2: 42 <- also the current one
> > bss 3: 10
> >
> > In this case the difference from our BSS to the best matching one
> > is only 3. If we have some heuristic that says "Only roam if you
> > find an AP which is 6 points better" we wouldn't roam. If
> > another card would report for the same situation
> >
> > bss 1: 54
> > bss 2: 50 <- also the current one
> > bss 3: 12
> >
> > (that's the same as above, but multiplied by 1.2) the outcome
> > would be the same.
>
> Got your point. My concern was not about the AP selection ;). More that if
> the values are hardly comparable (like card A returns a value of 40 while
> card B returns a value of 50 for the same AP) it is difficult to find a good
> threshold value when to trigger a scan.
>
> > > - Ping-pong effect if you sit between two APs both are in
> > > range but with a signal strength around the threshold.
> >
> > It's my understanding that a good threshold would also prevent
> > ping-pongs ?!?
>
> Hmm, quick example: AP1 - STA - AP2
>
> We cannot consider the signal strength as constant as it varies over time
> even when neither the STA nor the AP are moving. Assume a threshold value
> of t=40. Furthermore, the signal strength of AP1 and AP2 might vary between
> 35-50 which means we have an average signal strength of 42.5 > t. Nevertheless
> that would result in ping-pongs between AP1 and AP2 because the signal might
> drop below t on both APs, while it would be better to stick to one AP as the
> signal is already quite bad (but still good enough to do some communication).
>
> > > - Unnecessary scanning if the signal strength is below the
> > > threshold but no better AP is in range will further reduce the
> > > connection quality and increase power consumption.
> >
> > Either you know that your device is moving, then you want this
> > scanning, because soon the scanning is no longer unnecessary.
> > Even if at the current position (corner of the
> > street/warehouse/whatever) you "scan" in vain, some minutes
> > later the situation has changed.
> >
> > Or you know that you're in hot-spot mode and then you attach to
> > an AP and stay with it. Then you don't need the whole roaming
> > sermon at all --- this is BTW the reason why mac80211 is,
> > despite its awful roaming, such a success so far.
> >
> > What I meant is that this is a policy decision (or trade-off
> > decision), that the user should be able to influence.
>
> Fully agreed. It would really make sense to turn on roaming on a
> per-network basis (maybe in wpa_supplicant's network blocks for example).
>
> > For one of my devices, I made lots of local changes:
> >
> > - I provide a list of channels for the driver to scan. In most
> > warehouses only channels 1, 6, 11 are used. Then there's no
> > reason for the driver to scan channels 2, 3, 4 etc. If
> > user-space didn't provide such a list, the driver has to scan
> > all frequencies, so this is merely an optimization. But an
> > important one, it helps tremendously.
> >
> > - I let the driver scan one frequency every n time units, e.g.
> > one channel every second. This makes the driver visit all
> > three channels within 3 seconds.
> >
> > - If I would get all beacons of the current channel, AND if the
> > ESSID is not hidden, I would only scan the channels I'm
> > not on. Because for my current channel I already have the signal
> > strengths of all APs on it and also know the ESSID and IEs
> > to decide if I can roam or not, should the need arise.
> >
> > > b) Number of consecutively missed beacons below threshold
> >
> > This tends to roam only when it is too late, e.g. when the
> > connection is nearly breaking. But you wrote this by
> > yourself :-)
> >
> > In my case, I'm doing a full scan if this happens, to protect
> > against bad channel list provisioning.
> >
> > > c) Only scan for new APs if the environment changes (e.g. we
> > > are moving or the AP is moving etc.)
> >
> > You very seldom know about this, e.g. GPS is mostly useless
> > inside big buildings.
> >
> > You can however record "Okay, when I associated to the AP my
> > signal strength was 56. If it drops below 50, I'll look if I
> > find something better".
> >
> > > I already did some research on c) and it looks very promising
> > > but the topic is quite complex and needs more theoretical
> > > research first.
> >
> > If you're serious about that then mac80211 should only get the
> > infrastructure necessary so that we can write different roaming
> > implementations, like we now have different rate selection
> > implementations.
>
> Yes, that's a good idea. There are some more scenarios where different
> roaming algorithms might make sense. However, I'm still not sure where
> the roaming decision should be made (and thus where the algorithm should
> be implemented). In user space (wpa_supplicant) or in mac80211. Having
> it in user space would allow non-mac80211-drivers to benefit too but the
> driver would have to provide the necessary information.
>
> > > Scanning for new APs should not be started from within the
> > > driver or mac80211. Instead wpa_supplicant should care about
> > > that. Why? Just because the supplicant might have more
> > > information (maybe provided by NM) about the used network. For
> > > example a typical multi-AP network won't use all channels from
> > > within the bg-band due to signal interferences. Instead, all
> > > APs will be located on non-overlapping channels. Let's say 1,6
> > > and 11. Hence, if the supplicant triggers a scan it will just
> > > leave all channels != 1,6,11 out of the scan request and the
> > > scan will take a shorter amount of time, which in turn
> > > shortens the handoff delay.
> >
> > That's similar to my local, debugfs-based channel list hack, but
> > better :-)
> >
> > But please make this able to run from wpa_supplicant alone,
> > don't force NM into the picture. Many embedded developers will
> > say "thank you" for this. :-)
>
> Also agreed. The solution should be usable without NM too but I thought
> more about the degree of automatism here. If plain wpa_supplicant is used
> the wpa_supplicant config should simply contain the channel list statically
> configured while in the NM case the channel list could be created by NM
> based on historical data.

Most wifi-related stuff should be implemented in the supplicant anyway.
Shouldn't need NM for that. Think of NM more of an overall network
policy manager and configuration storage manager instead of a wifi
control daemon. NM pushes configuration to the supplicant, and tells
the supplicant "go!". I don't have any particular objection to letting
the supplicant make the roaming decisions (or even pushing multiple
network blocks down to the supplicant from NM) as long as the desired
behavior is achieved.

Dan

> > > extend wpa_supplicant's network blocks to allow the
> > > specification of preferred channels ("channels=1,6,11"). This
> > > value could be provided by NM which gathered that information
> > > either from the user or through monitoring.
> >
> > Or the value should simply be recorded in wpa_supplicant's config
> > file. No need no stinkin' NM ! :-)
>
> Adding a network block through dbus or through the config file is pretty
> much the same. So, yes, this should of course also work with a plain
> supplicant config.
>
> Helmut

2009-03-24 12:52:19

by Mats Karlsson

Subject: Re: Google Summer of Code 2009 -- Linux wireless roaming project

Hi,

I'm not a developer but I have a degree in automation and I think that
some of that knowledge could be used here.

I would propose that PID control is considered as the formula for deciding
when to switch from one AP to another.

This formula is widely used in automation processes and a quite good
explanation can be found here
http://en.wikipedia.org/wiki/PID_controller
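
For readers who have not met PID before, here is a minimal discrete PID sketch in Python. How its output would map onto a roaming decision is not specified in this mail; treating "target signal minus measured signal" as the error, and the gains and sample values used below, are assumptions made only to have something runnable.

```python
from typing import Optional


class PID:
    """Textbook discrete PID: output = Kp*e + Ki*integral(e) + Kd*de/dt."""

    def __init__(self, kp: float, ki: float, kd: float) -> None:
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error: Optional[float] = None

    def update(self, error: float, dt: float) -> float:
        self.integral += error * dt
        # No derivative term on the first sample, there is no history yet.
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


# Hypothetical use: the error grows as the signal falls below a target of 50.
pid = PID(kp=1.0, ki=0.5, kd=0.2)
for signal in [48, 45, 41, 38]:
    print(signal, pid.update(error=50 - signal, dt=1.0))
```

The integral term makes a signal that stays slightly bad for a long time count as much as a short deep drop, and the derivative term reacts to a quickly falling signal. A roaming trigger could compare the output against a limit, but that limit would need the same kind of tuning as a plain threshold.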


Regards
Mats

On Tue, Mar 24, 2009 at 12:05 PM, Helmut Schaa
<[email protected]> wrote:
> On Tuesday, 24 March 2009, Holger Schurig wrote:
>> > Hmm, quick example: AP1 - STA - AP2
>> >
>> > We cannot consider the signal strength as constant as it
>> > varies over time even when neither the STA nor the AP are
>> > moving. Assume a threshold value of t=40. Furthermore, the
>> > signal strength of AP1 and AP2 might vary between 35-50 which
>> > means we have an average signal strength of 42.5 > t.
>> > Nevertheless that would result in ping-pongs between AP1 and
>> > AP2 because the signal might drop below t on both APs, while
>> > it would be better to stick to one AP as the signal is already
>> > quite bad (but still good enough to do some communication).
>>
>> Yeah, but if the client is moving, you have to live with that,
>> more or less.
>>
>> And if the client is not moving (and roaming is a loadable kernel
>> module), then simply don't load it :-)
>
> Ah, ok. I was more referring to an ordinary laptop user who sits at his
> desk and once in a while starts moving (for example to a conference room).
> While he sits at his desk the optimal solution shouldn't trigger a scan as
> the chance that a better AP pops up is relatively low. Once he starts
> moving scanning is desired.
>
>> My ad-hoc approach that I already implemented (for non-mac80211)
>> shows quite a number of scans. But that is ok for my
>> use-case (e.g. telnet connection via WLAN). "Connection lost" is
>> way worse than one scanning/reassociation too much, especially
>> if the scanning/association is done intelligently.
>>
>> So for now I wouldn't optimize here, but make non-sucking roaming
>> possible in the first place. We can build upon this anyway.
>
> Fine with me. But too aggressive scanning might lead to unstable or
> intermittent connections, especially when WPA-EAP without PMKSA-caching
> is used where roaming from one AP to another can take up to several
> seconds ;)
>
> And additionally if the connection is idle, repeated scanning will
> increase the power consumption which is not desired on battery driven
> devices.
>
> Helmut

2009-03-21 11:43:38

by Helmut Schaa

Subject: Re: Google Summer of Code 2009 -- Linux wireless roaming project

On Friday, 20 March 2009, Luis R. Rodriguez wrote:
> We need to extend this now and since things are shifting towards nl80211
> (the MLME SAP stuff) I think we may need to rethink this a bit. Ideas,
> wishlists for how to improve our roaming with the help of GSoC students
> would be appreciated.

Ok, here are some considerations regarding roaming:

Basically roaming can be divided into 3 steps:
1) detect if it is time to roam
2) scan for better APs
3) associate with the new AP

Step 1 is the most difficult one here, 2 only needs some tweaking and 3
should work as it is currently.

So, how can we detect if it is beneficial to roam?

a) Just check whether the signal strength drops below a certain threshold
and start a scan if that happens

Advantages:
- Easy to implement

Disadvantages:
- The signal strength values on different cards are not comparable. So the
threshold value has to be different for all cards.
- Ping-pong effect if you sit between two APs both are in range but with a
signal strength around the threshold.
- Unnecessary scanning if the signal strength is below the threshold but no
better AP is in range will further reduce the connection quality and
increase power consumption.

b) Number of consecutively missed beacons below threshold

Advantages:
- Also easy to implement
- Comparable value on all cards

Disadvantages:
- e.g. 10 consecutively missed beacons means that the connection is already
quite bad but we need around 1 second (beacon interval=100ms) to detect
that => Handoff delay is strictly greater than 1 second (+scan and
association)
- Using smaller values might result in a similar ping-pong effect as
described above
- Unnecessary scanning if the number of missed beacons crosses the
threshold every now and then but no better AP is in range will further
reduce the connection quality and increase power consumption.
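
The delay arithmetic from the first point, written out as a small Python sketch. The 100 ms beacon interval and the 10 missed beacons are the values used above; the scan and association times in the example call are made-up placeholders, not measurements.

```python
def detection_delay_ms(missed_beacons: int, beacon_interval_ms: int = 100) -> int:
    """Time needed to notice that this many beacons in a row were missed."""
    return missed_beacons * beacon_interval_ms


def handoff_delay_ms(missed_beacons: int, scan_ms: int, assoc_ms: int,
                     beacon_interval_ms: int = 100) -> int:
    """Detection time plus the time to scan and to associate."""
    return detection_delay_ms(missed_beacons, beacon_interval_ms) + scan_ms + assoc_ms


print(detection_delay_ms(10))                          # 10 * 100 ms
print(handoff_delay_ms(10, scan_ms=300, assoc_ms=50))  # placeholder scan/assoc times
```

Lowering the beacon count shortens detection linearly, which is why smaller values look attractive despite the ping-pong risk named in the second point.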

c) Only scan for new APs if the environment changes (e.g. we are moving or
the AP is moving etc.)

This could for example be done by computing the sample variance of the signal
strength (maybe measured over the last second) which will change significantly
if the client or the AP is moving (see [1] for details).
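
A minimal Python sketch of such a variance trigger over a sliding window. The class name `VarianceTrigger`, the window of 5 samples, the threshold of 4.0 and the sample values are assumptions for illustration; finding a sensible threshold is exactly the open research question mentioned below.

```python
from collections import deque
from statistics import variance
from typing import Deque


class VarianceTrigger:
    """Flag movement when the sample variance of recent signals is high."""

    def __init__(self, window: int, threshold: float) -> None:
        self.samples: Deque[float] = deque(maxlen=window)
        self.window = window
        self.threshold = threshold

    def add(self, signal: float) -> bool:
        self.samples.append(signal)
        if len(self.samples) < self.window:
            return False  # not enough history yet
        return variance(list(self.samples)) > self.threshold


stationary = [42, 43, 42, 43, 42]  # small fluctuation around a fixed level
moving = [50, 46, 42, 38, 34]      # steadily falling signal

trigger = VarianceTrigger(window=5, threshold=4.0)
print([trigger.add(s) for s in stationary])
print([trigger.add(s) for s in moving])
```

The stationary samples have a sample variance of 0.3 and the falling ones 40, so the two cases separate clearly here; real measurements will be much noisier.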

If the client device has a GPS it would be beneficial to only scan for better
APs if the client is moving.

Advantages:
- Number of unnecessary scans is lower than for a) and b)

Disadvantages:
- Needs some more theoretical research on how to automatically find the
threshold values for the signal strength variance.
- Complex implementation

I already did some research on c) and it looks very promising but the topic
is quite complex and needs more theoretical research first.



Now, some considerations regarding the implementation:

Independently of which trigger is used to start the roaming process I would
suggest the following:

Scanning for new APs should not be started from within the driver or
mac80211. Instead wpa_supplicant should care about that. Why? Just because
the supplicant might have more information (maybe provided by NM) about the
used network. For example a typical multi-AP network won't use all channels
from within the bg-band due to signal interferences. Instead, all APs will be
located on non-overlapping channels. Let's say 1,6 and 11. Hence, if the
supplicant triggers a scan it will just leave all channels != 1,6,11 out of
the scan request and the scan will take a shorter amount of time, which
in turn shortens the handoff delay.

However, that infrastructure is not there yet but something like the following
would be worth considering: extend wpa_supplicant's network blocks to allow
the specification of preferred channels ("channels=1,6,11"). This value could
be provided by NM which gathered that information either from the user or
through monitoring. If the client was already connected to several APs in the
same ESS it would just pass these channels to wpa_supplicant. Of course if no
AP is found on the preferred channels a full scan might have to run.
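
The "preferred channels first, full scan as fallback" logic could look roughly like this in Python. The `channels=1,6,11` value format is the proposal from this mail, not an existing wpa_supplicant option; `parse_channels`, `scan_with_fallback` and the `scan` callback are hypothetical names, and channels 1 to 13 stand in for "all channels".

```python
from typing import Callable, List, Sequence

ALL_BG_CHANNELS = list(range(1, 14))  # assumption: 2.4 GHz channels 1..13


def parse_channels(value: str) -> List[int]:
    """Parse a proposed 'channels=' value such as '1,6,11'."""
    return [int(part) for part in value.split(",") if part.strip()]


def scan_with_fallback(preferred: Sequence[int],
                       scan: Callable[[Sequence[int]], List[str]]) -> List[str]:
    """Scan the preferred channels first, then everything if nothing is found."""
    results = scan(preferred) if preferred else []
    if not results:
        results = scan(ALL_BG_CHANNELS)
    return results


# Stand-in for a real scan: which APs would answer on which channel.
aps_on_air = {6: ["ap-a"], 11: ["ap-b"]}


def fake_scan(channels: Sequence[int]) -> List[str]:
    return [ap for ch in channels for ap in aps_on_air.get(ch, [])]


print(scan_with_fallback(parse_channels("1,6,11"), fake_scan))
```

With an empty or missing channel list the function goes straight to the full scan, which matches the behaviour without any configuration.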



Ok, regarding the second part "scanning":

In order to lower the negative influence a scan has on the ongoing traffic
the software scan implementation would have to be reworked. The scan should
simply switch back to the operating channel every once in a while to allow
queued packets to be delivered (in both directions).

Phew! I'm pretty sure I've missed several ideas/considerations here but that
has to suffice for now.

Helmut

[1] http://www.informatik.uni-mannheim.de/pi4/publications/King2008c.pdf

2009-03-24 10:15:40

by Helmut Schaa

Subject: Re: Google Summer of Code 2009 -- Linux wireless roaming project

On Tuesday, 24 March 2009, Holger Schurig wrote:
> > Basically roaming can be divided into 3 steps:
> > 1) detect if it is time to roam
> > 2) scan for better APs
> > 3) associate with the new AP
> >
> > Step 1 is the most difficult one here, 2 only needs some
> > tweaking and 3 should work as it is currently.
>
> For one of the disadvantages you mentioned I have some ideas:
>
> > - The signal strength values on different cards are not
> > comparable. So the threshold value has to be different for all
> > cards.
>
> The first thing is to make the drivers report similar values in
> similar situations. That would be helpful for user-space as well,
> so I think this should eventually be tackled independently of
> roaming anyway.

Yep, that would help a lot (if it is achievable).

> But consider for now the case where two different cards provide
> values that are quite different in absolute terms. If we look at
> signal differences, this wouldn't harm us that much. Consider
> these BSS entries:
>
> bss 1: 45
> bss 2: 42 <- also the current one
> bss 3: 10
>
> In this case the difference from our BSS to the best matching one
> is only 3. If we have some heuristic that says "Only roam if you
> find an AP which is 6 points better" we wouldn't roam. If
> another card would report for the same situation
>
> bss 1: 54
> bss 2: 50 <- also the current one
> bss 3: 12
>
> (that's the same as above, but multiplied by 1.2) the outcome
> would be the same.

Got your point. My concern was not about the AP selection ;). More that if
the values are hardly comparable (like card A returns a value of 40 while
card B returns a value of 50 for the same AP) it is difficult to find a good
threshold value when to trigger a scan.

> > - Ping-pong effect if you sit between two APs both are in
> > range but with a signal strength around the threshold.
>
> It's my understanding that a good threshold would also prevent
> ping-pongs ?!?

Hmm, quick example: AP1 - STA - AP2

We cannot consider the signal strength as constant as it varies over time
even when neither the STA nor the AP are moving. Assume a threshold value
of t=40. Furthermore, the signal strength of AP1 and AP2 might vary between
35-50 which means we have an average signal strength of 42.5 > t. Nevertheless
that would result in ping-pongs between AP1 and AP2 because the signal might
drop below t on both APs, while it would be better to stick to one AP as the
signal is already quite bad (but still good enough to do some communication).
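
The numbers in this example can be played through in a short Python sketch. The alternating sample sequence is a simplification made up for illustration, and the moving average is one common mitigation that is swapped in here for comparison, not something proposed in this mail.

```python
from typing import List, Sequence


def count_triggers(samples: Sequence[float], threshold: float) -> int:
    """Count how many samples fall below the threshold."""
    return sum(1 for s in samples if s < threshold)


def moving_average(samples: Sequence[float], window: int) -> List[float]:
    """Average over the last `window` samples, once enough samples exist."""
    return [sum(samples[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(samples))]


t = 40
samples = [35, 50] * 5  # signal swinging between 35 and 50, mean 42.5

print(count_triggers(samples, t))                     # raw samples
print(count_triggers(moving_average(samples, 2), t))  # smoothed samples
```

On the raw samples every second value is below t, so a plain threshold check fires repeatedly even though the mean stays above t. The smoothed series never drops below t, at the price of reacting later to a real drop.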

> > - Unnecessary scanning if the signal strength is below the
> > threshold but no better AP is in range will further reduce the
> > connection quality and increase power consumption.
>
> Either you know that your device is moving, then you want this
> scanning, because soon the scanning is no longer unnecessary.
> Even if at the current position (corner of the
> street/warehouse/whatever) you "scan" in vain, some minutes
> later the situation has changed.
>
> Or you know that you're in hot-spot mode and then you attach to
> an AP and stay with it. Then you don't need the whole roaming
> sermon at all --- this is BTW the reason why mac80211 is,
> despite its awful roaming, such a success so far.
>
> What I meant is that this is a policy decision (or trade-off
> decision), that the user should be able to influence.

Fully agreed. It would really make sense to turn on roaming on a
per-network basis (maybe in wpa_supplicant's network blocks for example).

> For one of my devices, I made lots of local changes:
>
> - I provide a list of channels for the driver to scan. In most
> warehouses only channels 1, 6, 11 are used. Then there's no
> reason for the driver to scan on channels 2, 3, 4 etc. If user
> space didn't provide such a list, the driver has to scan on
> all frequencies, so this is merely an optimization. But an
> important one, it helps tremendously.
>
> - I let the driver scan one frequency every n time units, e.g.
> one channel every second. This makes the driver visit all
> three channels within 3 seconds.
>
> - If I would get all beacons of the current channel, AND if the
> ESSID is not hidden, I would only scan the channels I'm
> not on. Because for my current channel I already have the
> signal strengths of all APs anyway and also know the ESSID and
> IEs to decide if I can roam or not, should the need arise.
>
>
>
> > b) Number of consecutively missed beacons below threshold
>
> This tends to roam only when it is too late, e.g. when the
> connection is nearly breaking. But you wrote this
> yourself :-)
>
> In my case, I'm doing a full scan if this happens, to protect
> against bad channel list provisioning.
>
>
>
>
> > c) Only scan for new APs if the environment changes (e.g. we
> > are moving or the AP is moving etc.)
>
> You very seldom know about this, e.g. GPS is mostly useless
> inside big buildings.
>
> You can however record "Okay, when I associated to the AP my
> signal strength was 56. If it drops below 50, I'll look if I
> find something better".
>
> > I already did some research on c) and it looks very promising
> > but the topic is quite complex and needs more theoretical
> > research first.
>
> If you're serious about that then mac80211 should only get the
> infrastructure necessary so that we can write different roaming
> implementations, like we now have different rate selection
> implementations.

Yes, that's a good idea. There are some more scenarios where different
roaming algorithms might make sense. However, I'm still not sure where
the roaming decision should be made (and thus where the algorithm should
be implemented). In user space (wpa_supplicant) or in mac80211. Having
it in user space would allow non-mac80211-drivers to benefit too but the
driver would have to provide the necessary information.

> > Scanning for new APs should not be started from within the
> > driver or mac80211. Instead wpa_supplicant should care about
> > that. Why? Just because the supplicant might have more
> > information (maybe provided by NM) about the used network. For
> > example a typical multi-AP network won't use all channels from
> > within the bg-band due to signal interferences. Instead, all
> > APs will be located on non-overlapping channels. Let's say 1,6
> > and 11. Hence, if the supplicant triggers a scan it will just
> > leave all channels != 1,6,11 out of the scan request and the
> > scan will take a shorter amount of time, which in turn reduces
> > the handoff delay.
>
> That's similar to my local, debugfs-based channel list hack, but
> better :-)
>
> But please make this usable from wpa_supplicant alone,
> don't force NM into the picture. Many embedded developers will
> say "thank you" for this. :-)

Also agreed. The solution should be usable without NM too, but I thought
more about the degree of automation here. If plain wpa_supplicant is used,
the wpa_supplicant config should simply contain the channel list statically
configured, while in the NM case the channel list could be created by NM
based on historical data.

> > extend wpa_supplicant's network blocks to allow the
> > specification of preferred channels ("channels=1,6,11"). This
> > value could be provided by NM which gathered that information
> > either from the user or through monitoring.
>
> Or the value should simply be recorded in wpa_supplicant's config
> file. No need no stinkin' NM ! :-)

Adding a network block through dbus or through the config file is pretty
much the same. So, yes, this should of course also work with a plain
supplicant config.
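
As a rough illustration of a statically configured channel list: the "channels=" option above is only a proposal in this thread, but wpa_supplicant network blocks do know a scan_freq parameter that takes frequencies in MHz (please check the documentation of your version). The sketch below converts 2.4 GHz channel numbers to frequencies and renders such a block; `channel_to_mhz`, `network_block` and the SSID "warehouse" are made-up names for this example.

```python
def channel_to_mhz(channel):
    """Map a 2.4 GHz (b/g band) channel number to its center frequency in MHz."""
    if 1 <= channel <= 13:
        return 2407 + 5 * channel
    if channel == 14:
        return 2484
    raise ValueError("not a 2.4 GHz channel: %r" % (channel,))


def network_block(ssid, channels):
    """Render a minimal network block that limits scanning to the given channels.

    Only ssid and scan_freq are emitted; a real block would also need
    the security settings of the network.
    """
    freqs = " ".join(str(channel_to_mhz(c)) for c in channels)
    return 'network={\n\tssid="%s"\n\tscan_freq=%s\n}\n' % (ssid, freqs)


block = network_block("warehouse", [1, 6, 11])
```

Whether NM generates this block over dbus or a human writes it into the config file, the resulting scan restriction would be the same.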

Helmut

2009-03-24 10:36:20

by Holger Schurig

[permalink] [raw]
Subject: Re: Google Summer of Code 2009 -- Linux wireless roaming project

> Got your point. My concern was not about the AP selection ;).

Yeah, I now realized that we talked about different thresholds:

a) when to do scanning
b) what AP to select

and on top of this b) with absolute and relative meaning :-)
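
The two thresholds can be kept apart in code as well. This is only a sketch using the numbers mentioned in this thread (associated at 56, look around below 50; only roam if another AP is 6 points better); `should_scan` and `select_ap` are hypothetical helpers, and both work on relative values so that cards with different signal scales behave alike.

```python
def should_scan(signal_at_assoc, signal_now, max_drop=6):
    """Threshold a): start scanning once the signal has dropped by more
    than max_drop points since we associated."""
    return signal_at_assoc - signal_now > max_drop


def select_ap(current_bssid, scan_results, min_gain=6):
    """Threshold b): return the BSS to use, roaming only if the best
    candidate beats the current one by at least min_gain points.

    scan_results maps a BSS name to its signal value.
    """
    best = max(scan_results, key=scan_results.get)
    current_signal = scan_results.get(current_bssid, float("-inf"))
    if best != current_bssid and scan_results[best] - current_signal >= min_gain:
        return best
    return current_bssid


# The example from earlier in the thread: best AP is only 3 points better.
choice = select_ap("bss 2", {"bss 1": 45, "bss 2": 42, "bss 3": 10})
```

With these defaults the example stays on "bss 2", which is the behaviour described for the 6-point heuristic.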


> Hmm, quick example: AP1 - STA - AP2
>
> We cannot consider the signal strength as constant as it
> varies over time even when neither the STA nor the AP are
> moving. Assume a threshold value of t=40. Furthermore, the
> signal strength of AP1 and AP2 might vary between 35-50 which
> means we have an average signal strength of 42.5 > t.
> Nevertheless that would result in ping-pongs between AP1 and
> AP2 because the signal might drop below t on both APs, while
> it would be better to stick to one AP as the signal is already
> quite bad (but still good enough to do some communication).

Yeah, but if the client is moving, you have to live with that,
more or less.

And if the client is not moving (and roaming is a loadable kernel
module), then simply don't load it :-)


My ad-hoc approach that I already implemented (for non-mac80211)
shows quite a number of scans. But that is ok for my
use-case (e.g. telnet connection via WLAN). "Connection lost" is
way worse than one scan/reassociation too many, especially
if the scanning/association is done intelligently.

So for now I wouldn't optimize here, but make non-sucking roaming
possible in the first place. We can build upon this anyway.



BTW, I actually looked once into mac80211, to see if I can get
roaming similar to what madwifi with a little patch does now.
However, I quickly was lost in the jungle of cfg80211, nl80211,
mac80211. I found it overly complex just to add one additional
netlink message, e.g. to specify the threshold via "iw". Maybe
I've done something wrong (e.g. not stared long enough at the
source code), but I think it was necessary to change 80-100
lines just to get one value from "iw" into mac80211.

2009-03-24 10:47:06

by Holger Schurig

[permalink] [raw]
Subject: Re: Google Summer of Code 2009 -- Linux wireless roaming project

> Yes, that's a good idea. There are some more scenarios where
> different roaming algorithms might make sense. However, I'm
> still not sure where the roaming decision should be made (and
> thus where the algorithm should be implemented). In user space
> (wpa_supplicant) or in mac80211. Having it in user space would
> allow non-mac80211-drivers to benefit too but the driver would
> have to provide the necessary information.

User-space is often easier to change *)


However, if you want to use beacon-information, then user-space
doesn't spring to my mind so quickly. Beacons come in quite
fast, and I'd rather not have to transport all of this to
user-space ...


Or we divide it: if user-space tells the kernel to, then the kernel
maintains a bss-list and whenever the kernel modifies the
list "considerably", the kernel pushes a nl80211 message to
user-space, notifying it about the new state "once in a while".
Then data gathering is in kernel, but decision making is in
user-space. However, we need a good definition for "considerably"
and "once in a while", though :-)
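
One possible reading of the two terms, purely as a sketch: "considerably" means some BSS is new or its signal moved by at least min_delta points since the last report, and "once in a while" means at most one report per min_interval seconds. The class and parameter names below are invented for illustration; this is not an existing nl80211 event.

```python
class BssChangeNotifier:
    """Collects per-BSS signal values and decides when a report to
    user space would be due."""

    def __init__(self, min_delta=5, min_interval=2.0):
        self.min_delta = min_delta        # "considerably", in signal points
        self.min_interval = min_interval  # "once in a while", in seconds
        self.current = {}    # bssid -> latest signal seen
        self.reported = {}   # bssid -> signal in the last report
        self.last_sent = None

    def _changed_considerably(self):
        for bssid, signal in self.current.items():
            old = self.reported.get(bssid)
            if old is None or abs(signal - old) >= self.min_delta:
                return True
        return False

    def update(self, bssid, signal, now):
        """Record a beacon; return the BSS list to report, or None."""
        self.current[bssid] = signal
        if not self._changed_considerably():
            return None
        if self.last_sent is not None and now - self.last_sent < self.min_interval:
            return None  # rate limit, the change is reported later
        self.reported = dict(self.current)
        self.last_sent = now
        return dict(self.current)


notifier = BssChangeNotifier(min_delta=5, min_interval=2.0)
first = notifier.update("ap1", 45, now=0.0)     # new BSS, reported
small = notifier.update("ap1", 46, now=1.0)     # change too small
limited = notifier.update("ap1", 38, now=1.5)   # big change, but too soon
later = notifier.update("ap1", 38, now=3.0)     # reported now
```

Data gathering stays on the kernel side of such a split, while user space only sees the occasional summary and makes the roaming decision.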





*) EXCEPT if we're talking wpa_supplicant. Not everyone finds
wpa_supplicant easy to modify, because of the number of
interwoven state machines and handled corner cases.