2010-09-28 22:29:48

by Luis R. Rodriguez

[permalink] [raw]
Subject: [PATCH] mac80211: fix rate_control_send_low warnings for delbas

The following two patches:

9c87ba6 - mac80211: Fix reassociation processing (within ESS roaming)
e1dd33f - cfg80211: Allow reassociation in associated state

$ git describe --contains 9c87ba6
v2.6.34-rc2~48^2~77^2~6
$ git describe --contains 9c87ba6
v2.6.34-rc2~48^2~77^2~6

Added support for cfg80211/mac80211 to cleanly roam between two BSSes
on an ESS by allowing the station to reassociate to an old AP and
only when that is done drop the old association. What we forgot to
take into consideration is that when we disassociate with the older
AP we may need to transmit frames to that AP and those frames may
actually be intended to go under a different channel and even sometimes
a completely separate band than the new APs. When we TX a frame we
assume the frame we want to TX however will be on the current hardware
configured channel. The channel we would try to send a frame on can
be different than the channel we prepared the bitrates for the peer
on though. What this meant is that upon tearing down a BA agreement
we would try to send a frame to a peer but not find a valid rate for
that peer and generate warnings like the following:

WARNING: at include/net/mac80211.h:2677 rate_control_send_low+0xd3/0x140 [mac80211]()
Hardware name: 6460DWU
Modules linked in: ath9k mac80211 ath9k_common ath9k_hw ath cfg80211 <etc>
Pid: 898, comm: wpa_supplicant Tainted: G W 2.6.36-rc5-wl+ #254
Call Trace:
[<ffffffff8105fddf>] warn_slowpath_common+0x7f/0xc0
[<ffffffff8105fe3a>] warn_slowpath_null+0x1a/0x20
[<ffffffffa03b8573>] rate_control_send_low+0xd3/0x140 [mac80211]
[<ffffffffa0192fa8>] ath_get_rate+0x48/0x570 [ath9k]
[<ffffffff812b8d19>] ? put_dec+0x59/0x60
[<ffffffffa03b839e>] rate_control_get_rate+0x8e/0x190 [mac80211]
[<ffffffffa03c27e8>] ieee80211_tx_h_rate_ctrl+0x1a8/0x4e0 [mac80211]
[<ffffffffa03c2ec0>] invoke_tx_handlers+0x100/0x140 [mac80211]
[<ffffffffa03c2f85>] ieee80211_tx+0x85/0x240 [mac80211]
[<ffffffff81479c70>] ? skb_release_data+0xd0/0xe0
[<ffffffff8147bb0d>] ? pskb_expand_head+0x10d/0x1a0
[<ffffffffa03c31f6>] ieee80211_xmit+0xb6/0x1d0 [mac80211]
[<ffffffff81479db3>] ? __alloc_skb+0x83/0x170
[<ffffffffa03c3364>] ieee80211_tx_skb+0x54/0x70 [mac80211]
[<ffffffffa03ac0dd>] ieee80211_send_delba+0x11d/0x190 [mac80211]
[<ffffffffa03adcf0>] ___ieee80211_stop_rx_ba_session+0xf0/0x110 [mac80211]
[<ffffffffa03add60>] __ieee80211_stop_rx_ba_session+0x50/0x70 [mac80211]
[<ffffffffa03ac3f3>] ieee80211_sta_tear_down_BA_sessions+0x43/0x50 [mac80211]
[<ffffffffa03b23c3>] ieee80211_set_disassoc+0x103/0x240 [mac80211]
[<ffffffffa03b3b2d>] ieee80211_mgd_assoc+0x8d/0x3a0 [mac80211]
[<ffffffffa03ba66f>] ieee80211_assoc+0x4f/0x80 [mac80211]
[<ffffffffa011e5b6>] __cfg80211_mlme_assoc+0x1f6/0x240 [cfg80211]
[<ffffffffa011e68f>] cfg80211_mlme_assoc+0x8f/0xc0 [cfg80211]
[<ffffffffa010afd0>] ? cfg80211_get_dev_from_ifindex+0x70/0x80 [cfg80211]
[<ffffffffa011489a>] nl80211_associate+0x23a/0x260 [cfg80211]
[<ffffffff812c6c6f>] ? nla_parse+0xef/0x110
[<ffffffff814ad738>] genl_rcv_msg+0x1d8/0x210
[<ffffffff81475cf4>] ? sock_alloc_send_pskb+0x1d4/0x330
[<ffffffff814ad560>] ? genl_rcv_msg+0x0/0x210
[<ffffffff814ac179>] netlink_rcv_skb+0xa9/0xd0
[<ffffffff814ad545>] genl_rcv+0x25/0x40
[<ffffffff814abdd8>] netlink_unicast+0x2c8/0x2e0
[<ffffffff814acc30>] netlink_sendmsg+0x250/0x360
[<ffffffff81472643>] sock_sendmsg+0xf3/0x120
[<ffffffff81562dbe>] ? _raw_spin_lock+0xe/0x20
[<ffffffff81471105>] ? move_addr_to_kernel+0x65/0x70
[<ffffffff8147d168>] ? verify_iovec+0x88/0xe0
[<ffffffff81472d70>] sys_sendmsg+0x240/0x3a0
[<ffffffff81151b0a>] ? do_readv_writev+0x1aa/0x1f0
[<ffffffff815604b0>] ? schedule+0x3c0/0xa00
[<ffffffff81151b98>] ? vfs_writev+0x48/0x60
[<ffffffff81151cc1>] ? sys_writev+0x51/0xb0
[<ffffffff8100b072>] system_call_fastpath+0x16/0x1b

This can be easily reproduced with the test-roam script [1] and
statically defining the ESS variable with the BSSes of an AP
in 2.4 GHz band and another one on the 5 GHz band.

Since we end up associated to two different APs at the same
time to transmit to either AP consists of an offchannel
operation but today we just lack the state machine to be able
to do this properly so for now just drop those frames. To be
able to detect what frames to drop we keep track of the
peer's channel upon creation and whenever that differs from
the currently operating channel we simply drop the frame
and increment a new debugfs counter:

/sys/kernel/debug/ieee80211/phy0/statistics/tx_handlers_drop_wrong_channel

This should have no impact on existing functionality for frames
we were already dropping but it also does mean we'll drop some more
frames upon disassociation if the AP we are moving away from is on
a different channel on the same band. We now suppress the warning
and acknowledge it, understand the the issue, and understand what
we need to fix in the long run to properly address this.

For more information refer to:

http://code.google.com/p/chromium-os/issues/detail?id=6348
http://marc.info/?l=linux-wireless&m=128401139730423

This patch has a fix for kernels >= v2.6.34. The fix is to supress
the warnings from spewing all over the logs and prevent trying to
send frames on a separate channel on the same band right after
re-association.

[1] http://www.kernel.org/pub/linux/kernel/people/mcgrof/scripts/test-roam

Cc: [email protected]
Cc: Jouni Malinen <[email protected]>
Cc: Paul Stewart <[email protected]>
Cc: Amod Bodas <[email protected]>
Cc: Johannes Berg <[email protected]>
Cc: Vasanthakumar Thiagarajan <[email protected]>
Signed-off-by: Luis R. Rodriguez <[email protected]>
---
include/net/mac80211.h | 2 ++
net/mac80211/cfg.c | 7 +++++++
net/mac80211/debugfs.c | 2 ++
net/mac80211/ieee80211_i.h | 1 +
net/mac80211/tx.c | 10 ++++++++++
5 files changed, 22 insertions(+), 0 deletions(-)

diff --git a/include/net/mac80211.h b/include/net/mac80211.h
index fe8b9da..bfe4344 100644
--- a/include/net/mac80211.h
+++ b/include/net/mac80211.h
@@ -888,6 +888,7 @@ enum set_key_cmd {
* or you must take good care to not use such a pointer after a
* call to your sta_remove callback that removed it.
*
+ * @channel: the channel this sta belongs to
* @addr: MAC address
* @aid: AID we assigned to the station if we're an AP
* @supp_rates: Bitmap of supported rates (per band)
@@ -896,6 +897,7 @@ enum set_key_cmd {
* sizeof(void *), size is determined in hw information.
*/
struct ieee80211_sta {
+ struct ieee80211_channel *channel;
u32 supp_rates[IEEE80211_NUM_BANDS];
u8 addr[ETH_ALEN];
u16 aid;
diff --git a/net/mac80211/cfg.c b/net/mac80211/cfg.c
index c981604..bfe91a5 100644
--- a/net/mac80211/cfg.c
+++ b/net/mac80211/cfg.c
@@ -584,7 +584,14 @@ static void sta_apply_parameters(struct ieee80211_local *local,
struct ieee80211_sub_if_data *sdata = sta->sdata;
u32 mask, set;

+ /*
+ * TODO: userspace should pass the band, there is no
+ * guarantee userspace will have sent this in synch with
+ * the STA's own same band. For now we assume the currently
+ * operating channel is on the STA's own band and channel.
+ */
sband = local->hw.wiphy->bands[local->oper_channel->band];
+ sta->sta.channel = local->oper_channel;

spin_lock_irqsave(&sta->flaglock, flags);
mask = params->sta_flags_mask;
diff --git a/net/mac80211/debugfs.c b/net/mac80211/debugfs.c
index ebd5b69..6f05921 100644
--- a/net/mac80211/debugfs.c
+++ b/net/mac80211/debugfs.c
@@ -414,6 +414,8 @@ void debugfs_hw_add(struct ieee80211_local *local)
local->tx_handlers_drop_wep);
DEBUGFS_STATS_ADD(tx_handlers_drop_not_assoc,
local->tx_handlers_drop_not_assoc);
+ DEBUGFS_STATS_ADD(tx_handlers_drop_wrong_channel,
+ local->tx_handlers_drop_wrong_channel);
DEBUGFS_STATS_ADD(tx_handlers_drop_unauth_port,
local->tx_handlers_drop_unauth_port);
DEBUGFS_STATS_ADD(rx_handlers_drop, local->rx_handlers_drop);
diff --git a/net/mac80211/ieee80211_i.h b/net/mac80211/ieee80211_i.h
index 945fbf2..36a2e80 100644
--- a/net/mac80211/ieee80211_i.h
+++ b/net/mac80211/ieee80211_i.h
@@ -854,6 +854,7 @@ struct ieee80211_local {
unsigned int tx_handlers_drop_fragment;
unsigned int tx_handlers_drop_wep;
unsigned int tx_handlers_drop_not_assoc;
+ unsigned int tx_handlers_drop_wrong_channel;
unsigned int tx_handlers_drop_unauth_port;
unsigned int rx_handlers_drop;
unsigned int rx_handlers_queued;
diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
index e1733dc..f3ac6dd 100644
--- a/net/mac80211/tx.c
+++ b/net/mac80211/tx.c
@@ -1174,6 +1174,10 @@ ieee80211_tx_prepare(struct ieee80211_sub_if_data *sdata,
tx->skb = skb;
tx->local = local;
tx->sdata = sdata;
+ /*
+ * TODO: This should use tx->sta->sta.channel once we can
+ * actually rely on the information being accurate.
+ */
tx->channel = local->hw.conf.channel;
/*
* Set this flag (used below to indicate "automatic fragmentation"),
@@ -1213,6 +1217,12 @@ ieee80211_tx_prepare(struct ieee80211_sub_if_data *sdata,
if (!tx->sta)
tx->sta = sta_info_get(sdata, hdr->addr1);

+ if (tx->sta && tx->sta->sta.channel &&
+ tx->sta->sta.channel->center_freq != tx->channel->center_freq) {
+ I802_DEBUG_INC(local->tx_handlers_drop_wrong_channel);
+ return TX_DROP;
+ }
+
if (tx->sta && ieee80211_is_data_qos(hdr->frame_control) &&
(local->hw.flags & IEEE80211_HW_AMPDU_AGGREGATION)) {
struct tid_ampdu_tx *tid_tx;
--
1.7.0.4



2010-09-30 17:53:40

by Luis R. Rodriguez

[permalink] [raw]
Subject: Re: [PATCH] mac80211: fix rate_control_send_low warnings for delbas

On Wed, Sep 29, 2010 at 10:25 AM, Johannes Berg
<[email protected]> wrote:
> On Wed, 2010-09-29 at 10:10 -0700, Luis R. Rodriguez wrote:
>
>> > I really think that can be done more easily though. For one, clearly we
>> > already have the warning, so we don't need more infrastructure to catch
>> > such errors?!
>>
>> This is true, I just added that to ensure I hit these when testing
>> really, I can nuke the counter, but it can help if eventually believe
>> we have a proper fix to allow these frames through somehow too.
>>
>> > Also, this may end up hiding issues that we don't yet
>> > understand, like the nullfunc one you were talking about.
>>
>> Yeah, good point, although I am under the impression this is a similar
>> situation, we probably try to send a nullfunc to notify the old AP we
>> are going to go awake if we are transmitting data while roaming. But
>> yeah, its not easily triggerable yet.
>>
>> > The delBA one
>> > we now understand fully, so it makes more sense to simply suppress
>> > sending delBA when we are going to disassociate by way of associating
>> > with a new AP, no?
>>
>> That's reasonable but we will still need the channel, otherwise how
>> would we know its this issue?
>
> Why do we care? We can just always suppress sending the delba when we
> get into _set_disassoc() from mgd_assoc(), no?

Yeah, and I actually think I found another race with this code, I'm
going to try something a bit different now.

Luis

Subject: Re: [PATCH] mac80211: fix rate_control_send_low warnings for delbas

On Wed, Sep 29, 2010 at 12:54:27PM +0530, Luis R. Rodriguez wrote:
> > (where we just authenticated with another AP)
>
> Not sure I get this part.

I mean we are trying to tx frames on the channel where we got
newly authenticated to another AP.
>
> > just before starting association with the new AP.
>
> Right, but when we are already switched to the new channel, and the
> reason is that when we try to send a frame we never kept track of the
> peer's original channel. Now, I also see the teardown happen right
> before mac80211 tells us its associated to the new AP as well but it
> not sure what cfg80211 thinks at that point. At least the cfg80211
> patch's intention seemed to be to allow association to both, and
> tearing down the association to the other once you completed
> association to the other. I do see, what you describe though too --
> only authentication to the new AP, and right before association, we
> send the teardown. The problem is though that when we send the
> teardown we are already set on the new channel.

Ok. From my understanding, the sequence of operations during roaming
are

1. NL80211_CMD_AUTHENTICATE to an (better) AP from user space
- Change to New Ap's channel and send auth req (done by
ieee80211_work_work()).
- Notify auth resp to userspace (ieee80211_probe_auth_done())
2. NL80211_CMD_ASSOCIATE to newly authenticated AP from user space
- Clean up any existing BA session with an AP, if any.
Here the assumption is, ieee80211_work_work() has already
configured the device back to it's operating channel (the
old one) after the authentication is complete with the new
AP. This should have happened part of 1.
- Start the association process with the new AP.


Vasanth

2010-09-29 08:18:30

by Johannes Berg

[permalink] [raw]
Subject: Re: [PATCH] mac80211: fix rate_control_send_low warnings for delbas

On Tue, 2010-09-28 at 15:29 -0700, Luis R. Rodriguez wrote:

> Since we end up associated to two different APs at the same
> time to transmit to either AP consists of an offchannel
> operation but today we just lack the state machine to be able
> to do this properly so for now just drop those frames.

I really think that can be done more easily though. For one, clearly we
already have the warning, so we don't need more infrastructure to catch
such errors?! Also, this may end up hiding issues that we don't yet
understand, like the nullfunc one you were talking about. The delBA one
we now understand fully, so it makes more sense to simply suppress
sending delBA when we are going to disassociate by way of associating
with a new AP, no?

johannes


2010-09-29 17:24:10

by Johannes Berg

[permalink] [raw]
Subject: Re: [PATCH] mac80211: fix rate_control_send_low warnings for delbas

On Wed, 2010-09-29 at 10:16 -0700, Luis R. Rodriguez wrote:

> > 2. NL80211_CMD_ASSOCIATE to newly authenticated AP from user space
> >
> > - Clean up any existing BA session with an AP, if any.
> > Here the assumption is, ieee80211_work_work() has already
> > configured the device back to it's operating channel (the
> > old one) after the authentication is complete with the new
> > AP. This should have happened part of 1.
> > - Start the association process with the new AP.
>
> No see, I believe we actually get the association response and *then*
> we teardown the BA agreements. Let me review this some more today.

No, Vasanth's right, but there's a race:

* we go off-channel
* we send auth
* we get auth
* we notify wpa_s
* wpa_s asks for assoc | these aren't ordered
* we disassoc | quite right but it's
* we return from off-channel | hard to avoid ...
* we assoc to new AP

johannes


Subject: Re: [PATCH] mac80211: fix rate_control_send_low warnings for delbas

On Wed, Sep 29, 2010 at 05:28:35AM +0530, Luis R. Rodriguez wrote:
> On Tue, Sep 28, 2010 at 4:02 PM, Jouni Malinen <[email protected]> wrote:
> > On Tue, Sep 28, 2010 at 03:29:45PM -0700, Luis R. Rodriguez wrote:
> >
> >> Added support for cfg80211/mac80211 to cleanly roam between two BSSes
> >> on an ESS by allowing the station to reassociate to an old AP and
> >> only when that is done drop the old association. What we forgot to
> >> take into consideration is that when we disassociate with the older
> >> AP we may need to transmit frames to that AP and those frames may
> >> actually be intended to go under a different channel and even sometimes
> >> a completely separate band than the new APs.
> >
> > which frames are you talking about here?
>
> ieee80211_send_delba()
>
> > When reassociating to a new BSS
> > in an ESS, there should be no need to transmit frames to the old AP
> > anymore, i.e., we are never associated with more than one BSS (when
> > talking about a single vif). We could just drop any potential TX frame
> > to the old AP after the moment the new AP is marked associated.
>
> That's the goal of this patch.

>From my debugging, this happens when cleaning up BA session for old
AP but on a different channel (where we just authenticated with
another AP) just before starting association with the new AP.
Though local->work_work takes care of moving to right channel,
it is possible that cfg80211 receives NL80211_CMD_ASSOCIATE before
local->work_work configures the hw back to old AP's channel (that
is what happening in our case).


Vasanth

2010-09-29 17:16:51

by Luis R. Rodriguez

[permalink] [raw]
Subject: Re: [PATCH] mac80211: fix rate_control_send_low warnings for delbas

On Wed, Sep 29, 2010 at 1:56 AM, Vasanthakumar Thiagarajan
<[email protected]> wrote:
> On Wed, Sep 29, 2010 at 12:54:27PM +0530, Luis R. Rodriguez wrote:
>> > (where we just authenticated with another AP)
>>
>> Not sure I get this part.
>
> I mean we are trying to tx frames on the channel where we got
> newly authenticated to another AP.

Ah yes, agred.

>> > just before starting association with the new AP.
>>
>> Right, but when we are already switched to the new channel, and the
>> reason is that when we try to send a frame we never kept track of the
>> peer's original channel. Now, I also see the teardown happen right
>> before mac80211 tells us its associated to the new AP as well but it
>> not sure what cfg80211 thinks at that point. At least the cfg80211
>> patch's intention seemed to be to allow association to both, and
>> tearing down the association to the other once you completed
>> association to the other. I do see, what you describe though too --
>> only authentication to the new AP, and right before association, we
>> send the teardown. The problem is though that when we send the
>> teardown we are already set on the new channel.
>
> Ok. From my understanding, the sequence of operations during roaming
> are
>
> 1. NL80211_CMD_AUTHENTICATE to an (better) AP from user space
>        - Change to New Ap's channel and send auth req (done by
>          ieee80211_work_work()).
>        - Notify auth resp to userspace (ieee80211_probe_auth_done())
> 2. NL80211_CMD_ASSOCIATE to newly authenticated AP from user space
>
>        - Clean up any existing BA session with an AP, if any.
>        Here the assumption is, ieee80211_work_work() has already
>        configured the device back to it's operating channel (the
>        old one) after the authentication is complete with the new
>        AP. This should have happened part of 1.
>        - Start the association process with the new AP.

No see, I believe we actually get the association response and *then*
we teardown the BA agreements. Let me review this some more today.

Luis

2010-09-29 17:11:00

by Luis R. Rodriguez

[permalink] [raw]
Subject: Re: [PATCH] mac80211: fix rate_control_send_low warnings for delbas

On Wed, Sep 29, 2010 at 1:18 AM, Johannes Berg
<[email protected]> wrote:
> On Tue, 2010-09-28 at 15:29 -0700, Luis R. Rodriguez wrote:
>
>> Since we end up associated to two different APs at the same
>> time to transmit to either AP consists of an offchannel
>> operation but today we just lack the state machine to be able
>> to do this properly so for now just drop those frames.
>
> I really think that can be done more easily though. For one, clearly we
> already have the warning, so we don't need more infrastructure to catch
> such errors?!

This is true, I just added that to ensure I hit these when testing
really, I can nuke the counter, but it can help if eventually believe
we have a proper fix to allow these frames through somehow too.

> Also, this may end up hiding issues that we don't yet
> understand, like the nullfunc one you were talking about.

Yeah, good point, although I am under the impression this is a similar
situation, we probably try to send a nullfunc to notify the old AP we
are going to go awake if we are transmitting data while roaming. But
yeah, its not easily triggerable yet.

> The delBA one
> we now understand fully, so it makes more sense to simply suppress
> sending delBA when we are going to disassociate by way of associating
> with a new AP, no?

That's reasonable but we will still need the channel, otherwise how
would we know its this issue?

Luis

2010-09-28 23:58:56

by Luis R. Rodriguez

[permalink] [raw]
Subject: Re: [PATCH] mac80211: fix rate_control_send_low warnings for delbas

On Tue, Sep 28, 2010 at 4:02 PM, Jouni Malinen <[email protected]> wrote:
> On Tue, Sep 28, 2010 at 03:29:45PM -0700, Luis R. Rodriguez wrote:
>
>> Added support for cfg80211/mac80211 to cleanly roam between two BSSes
>> on an ESS by allowing the station to reassociate to an old AP and
>> only when that is done drop the old association. What we forgot to
>> take into consideration is that when we disassociate with the older
>> AP we may need to transmit frames to that AP and those frames may
>> actually be intended to go under a different channel and even sometimes
>> a completely separate band than the new APs.
>
> which frames are you talking about here?

ieee80211_send_delba()

> When reassociating to a new BSS
> in an ESS, there should be no need to transmit frames to the old AP
> anymore, i.e., we are never associated with more than one BSS (when
> talking about a single vif). We could just drop any potential TX frame
> to the old AP after the moment the new AP is marked associated.

That's the goal of this patch.

>> + * @channel: the channel this sta belongs to
>
>>  struct ieee80211_sta {
>> +     struct ieee80211_channel *channel;
>>       u32 supp_rates[IEEE80211_NUM_BANDS];
>>       u8 addr[ETH_ALEN];
>>       u16 aid;
>
>> diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
>
>> @@ -1213,6 +1217,12 @@ ieee80211_tx_prepare(struct ieee80211_sub_if_data *sdata,
>>       if (!tx->sta)
>>               tx->sta = sta_info_get(sdata, hdr->addr1);
>>
>> +     if (tx->sta && tx->sta->sta.channel &&
>> +         tx->sta->sta.channel->center_freq != tx->channel->center_freq) {
>> +             I802_DEBUG_INC(local->tx_handlers_drop_wrong_channel);
>> +             return TX_DROP;
>> +     }
>
> Could this drop some off-channel frames should we happen to have a STA
> entry for the destination?

This would depend on where we set the tx->channel and I highlighted
that part of the code on my patch context diffs. The channel comes
from local->hw.conf.channel so if the peer is defined then the
assumption being made here is that we will only transmit to it on the
channel that the peer was on. During offchannel operation the
currently configured channel should be same:

offchannel_flag = local->hw.conf.flags & IEEE80211_CONF_OFFCHANNEL;
if (scan_chan) {
chan = scan_chan;
channel_type = NL80211_CHAN_NO_HT;
local->hw.conf.flags |= IEEE80211_CONF_OFFCHANNEL;
} else if (local->tmp_channel &&
local->oper_channel != local->tmp_channel) {
chan = scan_chan = local->tmp_channel;
channel_type = local->tmp_channel_type;
local->hw.conf.flags |= IEEE80211_CONF_OFFCHANNEL;
} else {
chan = local->oper_channel;
channel_type = local->_oper_channel_type;
local->hw.conf.flags &= ~IEEE80211_CONF_OFFCHANNEL;
}
offchannel_flag ^= local->hw.conf.flags & IEEE80211_CONF_OFFCHANNEL;

if (offchannel_flag || chan != local->hw.conf.channel ||
channel_type != local->hw.conf.channel_type) {
local->hw.conf.channel = chan;
local->hw.conf.channel_type = channel_type;
changed |= IEEE80211_CONF_CHANGE_CHANNEL;
}

> I'm not sure about all implications of this,
> but I could not convince myself that this would be safe for all P2P use
> cases (e.g., frame exchange on non-operating  channel with a device that
> happens to be in group with us).

When going off channel the local->hw.conf.channel would still be
respected and only if the peer exists would we make the check for it.
If this assumption would break something for P2P then let me know.

Luis

2010-09-29 07:24:49

by Luis R. Rodriguez

[permalink] [raw]
Subject: Re: [PATCH] mac80211: fix rate_control_send_low warnings for delbas

On Tue, Sep 28, 2010 at 10:20 PM, Vasanthakumar Thiagarajan
<[email protected]> wrote:
> On Wed, Sep 29, 2010 at 05:28:35AM +0530, Luis R. Rodriguez wrote:
>> On Tue, Sep 28, 2010 at 4:02 PM, Jouni Malinen <[email protected]> wrote:
>> > On Tue, Sep 28, 2010 at 03:29:45PM -0700, Luis R. Rodriguez wrote:
>> >
>> >> Added support for cfg80211/mac80211 to cleanly roam between two BSSes
>> >> on an ESS by allowing the station to reassociate to an old AP and
>> >> only when that is done drop the old association. What we forgot to
>> >> take into consideration is that when we disassociate with the older
>> >> AP we may need to transmit frames to that AP and those frames may
>> >> actually be intended to go under a different channel and even sometimes
>> >> a completely separate band than the new APs.
>> >
>> > which frames are you talking about here?
>>
>> ieee80211_send_delba()
>>
>> > When reassociating to a new BSS
>> > in an ESS, there should be no need to transmit frames to the old AP
>> > anymore, i.e., we are never associated with more than one BSS (when
>> > talking about a single vif). We could just drop any potential TX frame
>> > to the old AP after the moment the new AP is marked associated.
>>
>> That's the goal of this patch.
>
> From my debugging, this happens when cleaning up BA session for old
> AP but on a different channel

Right.

> (where we just authenticated with another AP)

Not sure I get this part.

> just before starting association with the new AP.

Right, but when we are already switched to the new channel, and the
reason is that when we try to send a frame we never kept track of the
peer's original channel. Now, I also see the teardown happen right
before mac80211 tells us its associated to the new AP as well but it
not sure what cfg80211 thinks at that point. At least the cfg80211
patch's intention seemed to be to allow association to both, and
tearing down the association to the other once you completed
association to the other. I do see, what you describe though too --
only authentication to the new AP, and right before association, we
send the teardown. The problem is though that when we send the
teardown we are already set on the new channel.

> Though local->work_work takes care of moving to right channel,

When we TX a frame we use local->hw.conf.channel which would be the
new AP's channel, this is wrong, but we never kept track of the peer's
original channel so there no way to know the original channel.

> it is possible that cfg80211 receives NL80211_CMD_ASSOCIATE before
> local->work_work configures the hw back to old AP's channel (that
> is what happening in our case).

And how would mac80211 or cfg80211 know what channel that was?

Luis

2010-09-28 23:04:15

by Jouni Malinen

[permalink] [raw]
Subject: Re: [PATCH] mac80211: fix rate_control_send_low warnings for delbas

On Tue, Sep 28, 2010 at 03:29:45PM -0700, Luis R. Rodriguez wrote:

> Added support for cfg80211/mac80211 to cleanly roam between two BSSes
> on an ESS by allowing the station to reassociate to an old AP and
> only when that is done drop the old association. What we forgot to
> take into consideration is that when we disassociate with the older
> AP we may need to transmit frames to that AP and those frames may
> actually be intended to go under a different channel and even sometimes
> a completely separate band than the new APs.

which frames are you talking about here? When reassociating to a new BSS
in an ESS, there should be no need to transmit frames to the old AP
anymore, i.e., we are never associated with more than one BSS (when
talking about a single vif). We could just drop any potential TX frame
to the old AP after the moment the new AP is marked associated.


> + * @channel: the channel this sta belongs to

> struct ieee80211_sta {
> + struct ieee80211_channel *channel;
> u32 supp_rates[IEEE80211_NUM_BANDS];
> u8 addr[ETH_ALEN];
> u16 aid;

> diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c

> @@ -1213,6 +1217,12 @@ ieee80211_tx_prepare(struct ieee80211_sub_if_data *sdata,
> if (!tx->sta)
> tx->sta = sta_info_get(sdata, hdr->addr1);
>
> + if (tx->sta && tx->sta->sta.channel &&
> + tx->sta->sta.channel->center_freq != tx->channel->center_freq) {
> + I802_DEBUG_INC(local->tx_handlers_drop_wrong_channel);
> + return TX_DROP;
> + }

Could this drop some off-channel frames should we happen to have a STA
entry for the destination? I'm not sure about all implications of this,
but I could not convince myself that this would be safe for all P2P use
cases (e.g., frame exchange on non-operating channel with a device that
happens to be in group with us).


--
Jouni Malinen PGP id EFC895FA

2010-09-29 17:25:40

by Johannes Berg

[permalink] [raw]
Subject: Re: [PATCH] mac80211: fix rate_control_send_low warnings for delbas

On Wed, 2010-09-29 at 10:10 -0700, Luis R. Rodriguez wrote:

> > I really think that can be done more easily though. For one, clearly we
> > already have the warning, so we don't need more infrastructure to catch
> > such errors?!
>
> This is true, I just added that to ensure I hit these when testing
> really, I can nuke the counter, but it can help if eventually believe
> we have a proper fix to allow these frames through somehow too.
>
> > Also, this may end up hiding issues that we don't yet
> > understand, like the nullfunc one you were talking about.
>
> Yeah, good point, although I am under the impression this is a similar
> situation, we probably try to send a nullfunc to notify the old AP we
> are going to go awake if we are transmitting data while roaming. But
> yeah, its not easily triggerable yet.
>
> > The delBA one
> > we now understand fully, so it makes more sense to simply suppress
> > sending delBA when we are going to disassociate by way of associating
> > with a new AP, no?
>
> That's reasonable but we will still need the channel, otherwise how
> would we know its this issue?

Why do we care? We can just always suppress sending the delba when we
get into _set_disassoc() from mgd_assoc(), no?

johannes


2010-10-01 20:07:35

by Luis R. Rodriguez

[permalink] [raw]
Subject: Re: [PATCH] mac80211: fix rate_control_send_low warnings for delbas

On Thu, Sep 30, 2010 at 10:53 AM, Luis R. Rodriguez
<[email protected]> wrote:
> On Wed, Sep 29, 2010 at 10:25 AM, Johannes Berg
> <[email protected]> wrote:
>> On Wed, 2010-09-29 at 10:10 -0700, Luis R. Rodriguez wrote:
>>
>>> > I really think that can be done more easily though. For one, clearly we
>>> > already have the warning, so we don't need more infrastructure to catch
>>> > such errors?!
>>>
>>> This is true, I just added that to ensure I hit these when testing
>>> really, I can nuke the counter, but it can help if eventually believe
>>> we have a proper fix to allow these frames through somehow too.
>>>
>>> > Also, this may end up hiding issues that we don't yet
>>> > understand, like the nullfunc one you were talking about.
>>>
>>> Yeah, good point, although I am under the impression this is a similar
>>> situation, we probably try to send a nullfunc to notify the old AP we
>>> are going to go awake if we are transmitting data while roaming. But
>>> yeah, its not easily triggerable yet.
>>>
>>> > The delBA one
>>> > we now understand fully, so it makes more sense to simply suppress
>>> > sending delBA when we are going to disassociate by way of associating
>>> > with a new AP, no?
>>>
>>> That's reasonable but we will still need the channel, otherwise how
>>> would we know its this issue?
>>
>> Why do we care? We can just always suppress sending the delba when we
>> get into _set_disassoc() from mgd_assoc(), no?
>
> Yeah, and I actually think I found another race with this code, I'm
> going to try something a bit different now.

OK yay, took a while but I think I found a better and proper fix for
this. Will post next.

Luis