2010-10-08 20:02:39

by Luis R. Rodriguez

Subject: Roaming / offchannel enhancements for broadcast / multicast frames

We spoke about how to handle broadcast / multicast frames when going
offchannel at the Wireless Summit [1]. A lot of these talks were
prompted by an open Chrome-side bug [2]. Chrome has dealt with the
critical issues by preventing a scan while we are doing DHCP, but we
need a resolution in the long term. At
the summit I do not believe we made any solid conclusions but we did
throw out a lot of ideas. After the summit a few of us at Atheros met
and reviewed strategies to consider for doing this the best way
possible. I'd like to summarize our latest conclusions and our plan for
addressing this, and see if we can reach some consensus on priorities.

First, we need a force scan command API. This can be used by userspace
when it knows it wants to roam and it can allow mac80211 to drop
broadcast / multicast frames as there is a priority to roam / scan
over keeping these frames. Then we can implement a deadzone event that
userspace can pick up which will let userspace know we can RX from an
AP but cannot TX to it. We can send this event once the connection
monitor hits a trigger but the beacon monitor is still OK. Note that
some hardware has its own beacon monitor, so we'll also need these
drivers to send these events to mac80211. Userspace may want to force
a roam when this deadzone event hits. Once we have these two in place
we can then ignore bgscan requests (when associated) unless a force
scan command has been issued by userspace, or unless we are idle. To
determine if we are idle we can use the existing dynamic power save
timers.
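
The gating rule above can be sketched roughly as follows. This is only
an illustration: struct bgscan_ctx and scan_request_allowed() are
hypothetical names, not existing mac80211 code, and "idle" is assumed to
be derived from the dynamic power save timers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical inputs; none of these names exist in mac80211. */
struct bgscan_ctx {
	bool associated;  /* currently associated to an AP */
	bool force_scan;  /* userspace issued the proposed force scan command */
	bool idle;        /* assumed: dynamic power save timer saw no traffic */
};

/* Returns true if a scan request should be allowed to run now. */
static bool scan_request_allowed(const struct bgscan_ctx *ctx)
{
	assert(ctx != NULL);

	/* Not associated: this is not a background scan, nothing to protect. */
	if (!ctx->associated)
		return true;

	/* Associated: only scan when forced by userspace or when idle. */
	return ctx->force_scan || ctx->idle;
}
```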

Then we need to move to mac80211 the code that checks we have RX'd all
multicast frames / broadcast frames prior to going to power save or
going offchannel. In the worst case scenario we will have missed the
last broadcast / multicast frame, so we can rely on the next beacon as
an indication the AP is done with all buffered broadcast / multicast
frames. ath9k already has code for all of this, so this just need to
be shifted to mac80211 for drivers that do software scan. In the worst
case scenario and unfortunately this seems to be the most common one,
a DTIM of 1 is used and we will have to be on channel and awake every
beacon interval. In this case we may want to optimize scan time by not
scanning passive scan channels. I still believe we will drop some
broadcast / multicast frames in this case, but avoiding a bgscan when
we are not idle should cure most critical frame drops.
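
A minimal sketch of the "have we RX'd all buffered broadcast / multicast
frames" check described above. It is not the ath9k code; the struct and
function names are made up for illustration. It assumes the driver can
see the DTIM beacon's TIM multicast bit and the More Data bit of each
group-addressed frame.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-interface state; names are illustrative only. */
struct mcast_drain {
	bool expecting_mcast; /* AP announced buffered bcast/mcast frames */
};

/* Call for every beacon received from our AP. */
static void drain_on_beacon(struct mcast_drain *st, bool is_dtim,
			    bool tim_mcast_bit)
{
	assert(st != NULL);
	/*
	 * A new beacon ends the previous delivery window, even if we
	 * missed the last frame. A DTIM beacon with the multicast bit
	 * set opens a new one.
	 */
	st->expecting_mcast = is_dtim && tim_mcast_bit;
}

/* Call for every bcast/mcast data frame received from our AP. */
static void drain_on_mcast_frame(struct mcast_drain *st, bool more_data)
{
	assert(st != NULL);
	/* More Data cleared: the AP has no further buffered frames. */
	if (!more_data)
		st->expecting_mcast = false;
}

/* True when it is safe to go offchannel or enter power save. */
static bool drain_may_leave_channel(const struct mcast_drain *st)
{
	assert(st != NULL);
	return !st->expecting_mcast;
}
```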

I've documented all this on our respective TODO wiki [3]. If you have
further ideas or tweaks to this approach please chime in and feel free
to edit the wiki as you see fit. This is a good time to invite people
to also subscribe to the wiki for edits.

Luis

[1] http://wireless.kernel.org/en/developers/Summits/SanFranciscoBayArea-2010
[2] http://code.google.com/p/chromium-os/issues/detail?id=5713&q=ath9k&colspec=ID%20Stars%20Pri%20Area%20Type%20Status%20Summary%20Modified%20Owner%20Mstone
[3] http://wireless.kernel.org/en/developers/todo-list


2010-10-09 00:43:33

by Luis R. Rodriguez

Subject: Re: Roaming / offchannel enhancements for broadcast / multicast frames

On Fri, Oct 8, 2010 at 5:08 PM, Paul Stewart <[email protected]> wrote:
> On Fri, Oct 8, 2010 at 12:54 PM, Luis R. Rodriguez <[email protected]> wrote:
>> We spoke about how to handle broadcast / multicast frames when going
>> offchannel at the Wireless Summit [1]. A lot of these talks were
>> prompted by an open Chrome-side bug [2].
>
> Thanks for getting the ball rolling, Luis.  Technically the bug is in
> ChromiumOS ("Chrome" is a web browser).

Heh, yeah sorry.

>> Userspace may want to force a roam when this deadzone event hits.
>
> Why not just disassociate at this point?  I'm not sure what the
> difference is between a "dead zone" situation and a reason to
> completely disconnect.

Not sure what is best, but one reason to consider just roaming is that
we are at least still receiving data while we roam. I hope we can iron
out the best algorithm through this thread.

>> Once we have these two in place we can then ignore bgscan requests
>> (when associated) unless a force scan command has been issued by
>> userspace, or unless we are idle.
>
> By "ignore" do you mean "postpone" or "return an appropriate error
> to userspace"?  Either of those are acceptable.  Not doing anything at
> all wouldn't be good.

Good questions. If we postpone it means we end up queuing up all these
scan requests, unless we only let a few queue up, or just one?
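
One possible shape for the "just one" option, as a rough sketch only:
keep a single pending slot and let a newer request displace the older
one. All names here are hypothetical, and what to report to userspace
for the displaced request is left open.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scan request handle. */
struct scan_req {
	int id;
};

/* Holds at most one postponed scan request. */
struct scan_deferral {
	const struct scan_req *pending;
};

/*
 * Postpone a request. Returns the request it displaced (which the
 * caller could complete with an error), or NULL if the slot was empty.
 */
static const struct scan_req *scan_postpone(struct scan_deferral *d,
					    const struct scan_req *req)
{
	const struct scan_req *old;

	assert(d != NULL && req != NULL);
	old = d->pending;
	d->pending = req;
	return old;
}

/* Called once the link is idle: returns the request to run, or NULL. */
static const struct scan_req *scan_take_pending(struct scan_deferral *d)
{
	const struct scan_req *req;

	assert(d != NULL);
	req = d->pending;
	d->pending = NULL;
	return req;
}
```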

> There's an additional issue about what happens
> when we are in the middle of a bgscan and new tx traffic appears.

We can force going back on channel in this case I think.

>> In the worst case scenario and unfortunately this seems to be the most
>> common one, a DTIM of 1 is used and we will have to be on channel and
>> awake every beacon interval. In this case we may want to optimize scan
>> time by not scanning passive scan channels.
>
> A compromise would be to go off-channel for less than a full beacon
> interval when doing background passive channel scans in DTIM=1
> networks.  It's certainly better than (a) not scanning at all and (b)
> arguably better than intentionally dropping mcast.  An 80% beacon-time
> passive listen will get you 80% of the beacons, assuming linear
> probability, and even more over time if you account for beacon skew
> between networks.

Sure, it just also means our bgscans can take ages. If they take ages
and we queue up a few of them, we can get a backlog of requests from
userspace agents. I suppose we need to figure out this fine line. But
yeah -- you're right, we can scan for less than a beacon interval if
DTIM is 1. This can mean 1024 TUs, not sure how long it takes for us to
do a passive scan. This is likely radio/chipset specific, so we may
have to add those values to the wiphy characteristics. Worth trying and
seeing if it's possible.

Luis

2010-10-09 07:34:04

by Helmut Schaa

Subject: Re: Roaming / offchannel enhancements for broadcast / multicast frames

On Saturday 09 October 2010, Luis R. Rodriguez wrote:
> > There's an additional issue about what happens
> > when we are in the middle of a bgscan and new tx traffic appears.
>
> We can force going back on channel in this case I think.

This already happens (scan.c):

    if (associated && (!tx_empty || bad_latency ||
        listen_int_exceeded))
            local->next_scan_state = SCAN_ENTER_OPER_CHANNEL;
    else
            local->next_scan_state = SCAN_SET_CHANNEL;

Once the currently scanned channel is finished and tx traffic has
arrived, mac80211 will switch back to the operating channel. However, it
doesn't switch back immediately; it waits for the scan of the current
channel to complete.

2010-10-11 22:28:17

by Paul Stewart

Subject: Re: Roaming / offchannel enhancements for broadcast / multicast frames

[Whoops. I didn't reply-all here and only sent to Johannes]

On Sat, Oct 9, 2010 at 5:53 AM, Johannes Berg <[email protected]> wrote:
> And in any case, there's no way to achieve perfect multicast reliability
> _anyway_, so I don't see why you're even trying? Can somebody actually
> come up with a problem statement? All I've seen so far is DHCP, but for
> just that, what's wrong with doing what you already do now?

I think this brings up a fairly fundamental difference in outlook
here, but I think it is completely resolvable. The argument "this is
an unreliable medium / class of traffic -- why spend any extra effort
on receiving it" doesn't personally wash with me. I think the term
used for traffic of this sort is "best effort". Especially in the
situation where the background scan was triggered just for
informational purposes ("what other channels does this ESS exist on?
I may want to look there first when things start getting nasty on this
channel"), I see no reason to either hurry the scan or knowingly
interrupt any form of traffic (no matter how low on the totem pole
they may appear to be).

I spent a lot of effort in an earlier life in supporting multicast
and writing various multicast protocols, and I have to admit that
being in a position to have to defend "best effort" as such is a little
distressing. :-) There are tons of different uses for multicast, and
while the use cases all have to consider a situation where frames are
lost on the medium, there is always a cost to their loss. In many
situations it might be better from the standpoint of ultimate channel
utilization to have received all multicast ("best effort") than to
have left it all on the floor. As I said above, there are classes of
background scan where loss of fidelity in the scan is much more
tolerable.

Even in the 80% passive scan case, I consider a passive-only scan
channel with an AP with a beacon interval synchronized such that we
will never hear from it unless we choose not to listen to our "home"
beacon to be marginalized enough that I'd prefer to never see it while
I'm successfully connected to a separate AP.

I propose to resolve this difference in outlook by proposing another
flag (orthogonal to passive/active) which describes whether the scan
should be "seamless", which is to say "prioritize all traffic on the
home channel above scanning". This could include the TX enhancements
(abort/postpone scan immediately if transmit traffic appears) as well
as receiving all multicast traffic regardless of the possible
detriment to the efficacy of the scan.
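
To make the proposal concrete, here is a rough sketch of how such a flag
might gate leaving the home channel. The flag names, struct home_state
and scan_may_go_offchannel() are all hypothetical; nothing like this
exists in nl80211 or mac80211 today.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scan request flags, orthogonal to each other. */
#define SCAN_FLAG_PASSIVE  (1u << 0)
#define SCAN_FLAG_SEAMLESS (1u << 1)

/* Hypothetical snapshot of activity on the home channel. */
struct home_state {
	bool tx_pending;     /* transmit traffic is queued */
	bool mcast_expected; /* AP still has buffered mcast for us */
};

/* Decide whether the scan may leave the home channel right now. */
static bool scan_may_go_offchannel(uint32_t flags,
				   const struct home_state *home)
{
	assert(home != NULL);

	/* Non-seamless scans keep today's behaviour and just go. */
	if (!(flags & SCAN_FLAG_SEAMLESS))
		return true;

	/* Seamless scans yield to any home channel traffic. */
	return !home->tx_pending && !home->mcast_expected;
}
```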

This may be paired with one additional feature -- progressive return
of scan results. Since the scan may take a while (in some senses it
already does), it would be nice to get nl80211 messages with each BSS
as it is acquired, much in the same way as wpa_supplicant now does in
its new DBus API.

--
Paul

2010-10-09 00:08:30

by Paul Stewart

Subject: Re: Roaming / offchannel enhancements for broadcast / multicast frames

On Fri, Oct 8, 2010 at 12:54 PM, Luis R. Rodriguez <[email protected]> wrote:
> We spoke about how to handle broadcast / multicast frames when going
> offchannel at the Wireless Summit [1]. A lot of these talks were
> prompted by an open Chrome-side bug [2].

Thanks for getting the ball rolling, Luis. Technically the bug is in
ChromiumOS ("Chrome" is a web browser).

> Userspace may want to force a roam when this deadzone event hits.

Why not just disassociate at this point? I'm not sure what the
difference is between a "dead zone" situation and a reason to
completely disconnect.

> Once we have these two in place we can then ignore bgscan requests
> (when associated) unless a force scan command has been issued by
> userspace, or unless we are idle.

By "ignore" do you mean "postpone" or "return an appropriate error
to userspace"? Either of those are acceptable. Not doing anything at
all wouldn't be good. There's an additional issue about what happens
when we are in the middle of a bgscan and new tx traffic appears.

> In the worst case scenario and unfortunately this seems to be the most
> common one, a DTIM of 1 is used and we will have to be on channel and
> awake every beacon interval. In this case we may want to optimize scan
> time by not scanning passive scan channels.

A compromise would be to go off-channel for less than a full beacon
interval when doing background passive channel scans in DTIM=1
networks. It's certainly better than (a) not scanning at all and (b)
arguably better than intentionally dropping mcast. An 80% beacon-time
passive listen will get you 80% of the beacons, assuming linear
probability, and even more over time if you account for beacon skew
between networks.
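
As a worked example of the arithmetic, the sketch below computes the
off-channel dwell for a given fraction of the beacon interval. The
helper name is hypothetical, and the 100 TU beacon interval used in the
note after it is only an assumed common default; 1 TU is 1024
microseconds.

```c
#include <assert.h>
#include <stdint.h>

#define TU_USEC 1024u /* 1 TU = 1024 microseconds */

/*
 * Off-channel dwell time in microseconds when listening for
 * `percent` of a beacon interval given in TUs. Hypothetical helper.
 */
static uint32_t offchannel_dwell_usec(uint32_t beacon_int_tu,
				      uint32_t percent)
{
	assert(percent <= 100);
	return (uint32_t)((uint64_t)beacon_int_tu * TU_USEC * percent / 100);
}
```

With an assumed beacon interval of 100 TU, offchannel_dwell_usec(100, 80)
gives 81920 microseconds off channel, leaving 20480 microseconds per
interval to get back for the home beacon and any multicast that follows.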

--
Paul

2010-10-09 12:53:27

by Johannes Berg

Subject: Re: Roaming / offchannel enhancements for broadcast / multicast frames

On Fri, 2010-10-08 at 17:08 -0700, Paul Stewart wrote:

> > Userspace may want to force a roam when this deadzone event hits.
>
> Why not just disassociate at this point? I'm not sure what the
> difference is between a "dead zone" situation and a reason to
> completely disconnect.

I'd tend to agree -- what point is there in receiving data when you
can't even ACK it any more.. you'll just hurt everybody else by using
huge amounts of airtime, and if the situation persists hopefully the AP
will kick you off quickly anyway.

> > Once we have these two in place we can then ignore bgscan requests
> > (when associated) unless a force scan command has been issued by
> > userspace, or unless we are idle.
>
> By "ignore" do you mean "postpone" or "return an appropriate error
> to userspace"? Either of those are acceptable. Not doing anything at
> all wouldn't be good. There's an additional issue about what happens
> when we are in the middle of a bgscan and new tx traffic appears.

I really don't want to add API for "force" to userspace -- it's entirely
pointless. If userspace wants its scan, it will always set the force
flag anyway. Postponing it for a bit seems much saner. I think
you're trying to solve a problem with say an existing NM that tries to
scan every two minutes -- but that problem need not be solved at this
layer.

> > In the worst case scenario and unfortunately this seems to be the most
> > common one, a DTIM of 1 is used and we will have to be on channel and
> > awake every beacon interval. In this case we may want to optimize scan
> > time by not scanning passive scan channels.
>
> A compromise would be to go off-channel for less than a full beacon
> interval when doing background passive channel scans in DTIM=1
> networks. It's certainly better than (a) not scanning at all and (b)
> arguably better than intentionally dropping mcast. An 80% beacon-time
> passive listen will get you 80% of the beacons, assuming linear
> probability, and even more over time if you account for beacon skew
> between networks.

Skew between APs is minimal -- the clocks are required to be accurate to
5ppm IIRC.

And in any case, there's no way to achieve perfect multicast reliability
_anyway_, so I don't see why you're even trying? Can somebody actually
come up with a problem statement? All I've seen so far is DHCP, but for
just that, what's wrong with doing what you already do now?

johannes