The following two patches:
9c87ba6 - mac80211: Fix reassociation processing (within ESS roaming)
e1dd33f - cfg80211: Allow reassociation in associated state
$ git describe --contains 9c87ba6
v2.6.34-rc2~48^2~77^2~6
$ git describe --contains 9c87ba6
v2.6.34-rc2~48^2~77^2~6
Added support for cfg80211/mac80211 to cleanly roam between two BSSes
on an ESS by allowing the station to authenticate to two APS at the
same time, and when an association comes in for the new AP we first
disassociate from the old AP and then associate with the new one.
What we forgot to take into consideration is that when we disassociate
with the older AP we may need to transmit frames to that AP and those
frames may actually be intended to go under a different channel and
even sometimes a completely separate band than the new APs. When we
TX a frame we assume the frame we want to TX however will be on the
current hardware configured channel. The channel we would try to
send a frame on can be different than the channel we prepared the
bitrates for the peer on though. What this meant is that upon tearing
down a BA agreement we would try to send a frame to a peer but not
find a valid rate for that peer and generate warnings like the
following:
WARNING: at include/net/mac80211.h:2677 rate_control_send_low+0xd3/0x140 [mac80211]()
Hardware name: 6460DWU
Modules linked in: ath9k mac80211 ath9k_common ath9k_hw ath cfg80211 <etc>
Pid: 898, comm: wpa_supplicant Tainted: G W 2.6.36-rc5-wl+ #254
Call Trace:
[<ffffffff8105fddf>] warn_slowpath_common+0x7f/0xc0
[<ffffffff8105fe3a>] warn_slowpath_null+0x1a/0x20
[<ffffffffa03b8573>] rate_control_send_low+0xd3/0x140 [mac80211]
[<ffffffffa0192fa8>] ath_get_rate+0x48/0x570 [ath9k]
[<ffffffff812b8d19>] ? put_dec+0x59/0x60
[<ffffffffa03b839e>] rate_control_get_rate+0x8e/0x190 [mac80211]
[<ffffffffa03c27e8>] ieee80211_tx_h_rate_ctrl+0x1a8/0x4e0 [mac80211]
[<ffffffffa03c2ec0>] invoke_tx_handlers+0x100/0x140 [mac80211]
[<ffffffffa03c2f85>] ieee80211_tx+0x85/0x240 [mac80211]
[<ffffffff81479c70>] ? skb_release_data+0xd0/0xe0
[<ffffffff8147bb0d>] ? pskb_expand_head+0x10d/0x1a0
[<ffffffffa03c31f6>] ieee80211_xmit+0xb6/0x1d0 [mac80211]
[<ffffffff81479db3>] ? __alloc_skb+0x83/0x170
[<ffffffffa03c3364>] ieee80211_tx_skb+0x54/0x70 [mac80211]
[<ffffffffa03ac0dd>] ieee80211_send_delba+0x11d/0x190 [mac80211]
[<ffffffffa03adcf0>] ___ieee80211_stop_rx_ba_session+0xf0/0x110 [mac80211]
[<ffffffffa03add60>] __ieee80211_stop_rx_ba_session+0x50/0x70 [mac80211]
[<ffffffffa03ac3f3>] ieee80211_sta_tear_down_BA_sessions+0x43/0x50 [mac80211]
[<ffffffffa03b23c3>] ieee80211_set_disassoc+0x103/0x240 [mac80211]
[<ffffffffa03b3b2d>] ieee80211_mgd_assoc+0x8d/0x3a0 [mac80211]
[<ffffffffa03ba66f>] ieee80211_assoc+0x4f/0x80 [mac80211]
[<ffffffffa011e5b6>] __cfg80211_mlme_assoc+0x1f6/0x240 [cfg80211]
[<ffffffffa011e68f>] cfg80211_mlme_assoc+0x8f/0xc0 [cfg80211]
[<ffffffffa010afd0>] ? cfg80211_get_dev_from_ifindex+0x70/0x80 [cfg80211]
[<ffffffffa011489a>] nl80211_associate+0x23a/0x260 [cfg80211]
[<ffffffff812c6c6f>] ? nla_parse+0xef/0x110
[<ffffffff814ad738>] genl_rcv_msg+0x1d8/0x210
[<ffffffff81475cf4>] ? sock_alloc_send_pskb+0x1d4/0x330
[<ffffffff814ad560>] ? genl_rcv_msg+0x0/0x210
[<ffffffff814ac179>] netlink_rcv_skb+0xa9/0xd0
[<ffffffff814ad545>] genl_rcv+0x25/0x40
[<ffffffff814abdd8>] netlink_unicast+0x2c8/0x2e0
[<ffffffff814acc30>] netlink_sendmsg+0x250/0x360
[<ffffffff81472643>] sock_sendmsg+0xf3/0x120
[<ffffffff81562dbe>] ? _raw_spin_lock+0xe/0x20
[<ffffffff81471105>] ? move_addr_to_kernel+0x65/0x70
[<ffffffff8147d168>] ? verify_iovec+0x88/0xe0
[<ffffffff81472d70>] sys_sendmsg+0x240/0x3a0
[<ffffffff81151b0a>] ? do_readv_writev+0x1aa/0x1f0
[<ffffffff815604b0>] ? schedule+0x3c0/0xa00
[<ffffffff81151b98>] ? vfs_writev+0x48/0x60
[<ffffffff81151cc1>] ? sys_writev+0x51/0xb0
[<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
This can be easily reproduced with the test-roam script [1] and
statically defining the ESS variable with the BSSes of an AP
in 2.4 GHz band and another one on the 5 GHz band.
Since we end up associated to two different APs at the same
time to transmit to either AP consists of an offchannel
operation and mac80211 does have a state machine for this
but assumes we don't transmit on our operating channel within
each work item. If we do need to transmit to on the operating
channel each work item needs to ensure this. This series
addresses this by addressing a series of possible races on
the frame's set channel, prevening us from associating to a
new AP if we were previously associated and haven't yet
associated, and ensuring we transmit the disassocation on
the operating channel.
For more information refer to:
http://code.google.com/p/chromium-os/issues/detail?id=6348
http://marc.info/?l=linux-wireless&m=128401139730423
This patch series has a fix for kernels >= v2.6.34. The fix will
actually cure the warnings correctly, it ensures we tear down our
BA agreements with our old APs prior to moving away from them and
ensures we transmit those frames on the intended channel.
[1] http://www.kernel.org/pub/linux/kernel/people/mcgrof/scripts/test-roam
Cc: [email protected]
Cc: Jouni Malinen <[email protected]>
Cc: Paul Stewart <[email protected]>
Cc: Amod Bodas <[email protected]>
Cc: Johannes Berg <[email protected]>
Cc: Vasanthakumar Thiagarajan <[email protected]>
Signed-off-by: Luis R. Rodriguez <[email protected]>
Luis R. Rodriguez (3):
mac80211: fix channel assumption for association done work
mac80211: wait until completely disassociated before new association
mac80211: move to the home channel for disassociation when roaming
net/mac80211/mlme.c | 18 ++++++++++++++----
net/mac80211/work.c | 14 ++++++++++++++
2 files changed, 28 insertions(+), 4 deletions(-)
On Mon, 2010-10-04 at 10:23 -0700, Luis R. Rodriguez wrote:
> On Mon, Oct 04, 2010 at 09:41:56AM -0700, Johannes Berg wrote:
> > On Mon, 2010-10-04 at 09:38 -0700, Luis R. Rodriguez wrote:
> > > On Mon, Oct 04, 2010 at 06:15:52AM -0700, Johannes Berg wrote:
> > > > On Fri, 2010-10-01 at 16:33 -0400, Luis R. Rodriguez wrote:
> > > >
> > > > > @@ -2203,6 +2204,14 @@ int ieee80211_mgd_assoc(struct ieee80211_sub_if_data *sdata,
> > > > > return -EALREADY;
> > > > > }
> > > > >
> > > > > + /*
> > > > > + * right before this was authentication, that was on
> > > > > + * the same the wk->chan so we need to ensure we temporarily
> > > > > + * go back to the operating channel to send the disassocation.
> > > > > + */
> > > > > + local->tmp_channel = NULL;
> > > > > + ieee80211_hw_config(local, 0);
> > > > > +
> > > >
> > > > Yikes, no, you can't do this. You don't know what has been merged with
> > > > the authentication work item, etc.
> > >
> > > What do you mean? Auth/Assoc will be handled separately and the
> > > target channel will also be handled for each work item run, what
> > > issues can arise from sending us back to the home channel towards
> > > the end of this call here?
> >
> > Another work item, like a remain-on-channel, might have been started on
> > the same channel ("merged").
>
> But how do this break that other work from moving forward in its respective
> channel? The work_work loop makes sure to check if we require a channel
> change on the work item.
But we already activated that other work item.
johannes
On Mon, Oct 4, 2010 at 3:47 PM, Jouni Malinen <[email protected]> wrote:
> On Mon, Oct 04, 2010 at 02:07:22PM -0700, Luis R. Rodriguez wrote:
>> Jouni, just to be clear so you are fine with dropping explicitly the
>> tear down of the BA agreement to the old AP?
>
> If you are talking about not transmitting a frame, then yes, but
> obviously, we would still need to locally tear down whatever we have set
> up..
Sure.
> I think it would be a good default policy to not send any frames to the
> old AP after we have decided to roam to a new one. If this does not work
> well in some networks, I would like to here more detailed description on
> what exactly is happening.
No that's fine, I just was curious how the bridge stuff you described
was handled in enterprise networks, you would know better :)
OK -- so lets go with Johanne's patch instead of patch 2 and 3 as he
suggested :)
Luis
How about this instead of your patches 2 and 3?
We're disassociating from the old AP since we associate with the new AP,
so there's not a whole lot of sense in explicitly tearing down
aggregation sessions either, right?
---
net/mac80211/agg-rx.c | 8 ++++----
net/mac80211/agg-tx.c | 14 +++++++++-----
net/mac80211/debugfs_sta.c | 3 ++-
net/mac80211/ht.c | 17 ++++++++++-------
net/mac80211/ieee80211_i.h | 12 +++++++-----
net/mac80211/iface.c | 3 ++-
net/mac80211/mlme.c | 18 +++++++++---------
net/mac80211/pm.c | 2 +-
net/mac80211/sta_info.c | 2 +-
net/mac80211/sta_info.h | 2 ++
net/mac80211/util.c | 2 +-
11 files changed, 48 insertions(+), 35 deletions(-)
--- wireless-testing.orig/net/mac80211/agg-tx.c 2010-10-04 15:18:07.000000000 +0200
+++ wireless-testing/net/mac80211/agg-tx.c 2010-10-04 15:23:38.000000000 +0200
@@ -145,7 +145,8 @@ static void kfree_tid_tx(struct rcu_head
}
int ___ieee80211_stop_tx_ba_session(struct sta_info *sta, u16 tid,
- enum ieee80211_back_parties initiator)
+ enum ieee80211_back_parties initiator,
+ bool tx)
{
struct ieee80211_local *local = sta->local;
struct tid_ampdu_tx *tid_tx = sta->ampdu_mlme.tid_tx[tid];
@@ -183,6 +184,7 @@ int ___ieee80211_stop_tx_ba_session(stru
clear_bit(HT_AGG_STATE_OPERATIONAL, &tid_tx->state);
tid_tx->stop_initiator = initiator;
+ tid_tx->tx_stop = tx;
ret = drv_ampdu_action(local, sta->sdata,
IEEE80211_AMPDU_TX_STOP,
@@ -575,13 +577,14 @@ void ieee80211_start_tx_ba_cb_irqsafe(st
EXPORT_SYMBOL(ieee80211_start_tx_ba_cb_irqsafe);
int __ieee80211_stop_tx_ba_session(struct sta_info *sta, u16 tid,
- enum ieee80211_back_parties initiator)
+ enum ieee80211_back_parties initiator,
+ bool tx)
{
int ret;
mutex_lock(&sta->ampdu_mlme.mtx);
- ret = ___ieee80211_stop_tx_ba_session(sta, tid, initiator);
+ ret = ___ieee80211_stop_tx_ba_session(sta, tid, initiator, tx);
mutex_unlock(&sta->ampdu_mlme.mtx);
@@ -670,7 +673,7 @@ void ieee80211_stop_tx_ba_cb(struct ieee
goto unlock_sta;
}
- if (tid_tx->stop_initiator == WLAN_BACK_INITIATOR)
+ if (tid_tx->stop_initiator == WLAN_BACK_INITIATOR && tid_tx->tx_stop)
ieee80211_send_delba(sta->sdata, ra, tid,
WLAN_BACK_INITIATOR, WLAN_REASON_QSTA_NOT_USE);
@@ -770,7 +773,8 @@ void ieee80211_process_addba_resp(struct
sta->ampdu_mlme.addba_req_num[tid] = 0;
} else {
- ___ieee80211_stop_tx_ba_session(sta, tid, WLAN_BACK_INITIATOR);
+ ___ieee80211_stop_tx_ba_session(sta, tid, WLAN_BACK_INITIATOR,
+ true);
}
out:
--- wireless-testing.orig/net/mac80211/ht.c 2010-10-04 15:17:19.000000000 +0200
+++ wireless-testing/net/mac80211/ht.c 2010-10-04 15:19:04.000000000 +0200
@@ -101,16 +101,16 @@ void ieee80211_ht_cap_ie_to_sta_ht_cap(s
ht_cap->mcs.rx_mask[32/8] |= 1;
}
-void ieee80211_sta_tear_down_BA_sessions(struct sta_info *sta)
+void ieee80211_sta_tear_down_BA_sessions(struct sta_info *sta, bool tx)
{
int i;
cancel_work_sync(&sta->ampdu_mlme.work);
for (i = 0; i < STA_TID_NUM; i++) {
- __ieee80211_stop_tx_ba_session(sta, i, WLAN_BACK_INITIATOR);
+ __ieee80211_stop_tx_ba_session(sta, i, WLAN_BACK_INITIATOR, tx);
__ieee80211_stop_rx_ba_session(sta, i, WLAN_BACK_RECIPIENT,
- WLAN_REASON_QSTA_LEAVE_QBSS);
+ WLAN_REASON_QSTA_LEAVE_QBSS, tx);
}
}
@@ -135,7 +135,7 @@ void ieee80211_ba_session_work(struct wo
if (test_and_clear_bit(tid, sta->ampdu_mlme.tid_rx_timer_expired))
___ieee80211_stop_rx_ba_session(
sta, tid, WLAN_BACK_RECIPIENT,
- WLAN_REASON_QSTA_TIMEOUT);
+ WLAN_REASON_QSTA_TIMEOUT, true);
tid_tx = sta->ampdu_mlme.tid_tx[tid];
if (!tid_tx)
@@ -146,7 +146,8 @@ void ieee80211_ba_session_work(struct wo
else if (test_and_clear_bit(HT_AGG_STATE_WANT_STOP,
&tid_tx->state))
___ieee80211_stop_tx_ba_session(sta, tid,
- WLAN_BACK_INITIATOR);
+ WLAN_BACK_INITIATOR,
+ true);
}
mutex_unlock(&sta->ampdu_mlme.mtx);
}
@@ -214,9 +215,11 @@ void ieee80211_process_delba(struct ieee
#endif /* CONFIG_MAC80211_HT_DEBUG */
if (initiator == WLAN_BACK_INITIATOR)
- __ieee80211_stop_rx_ba_session(sta, tid, WLAN_BACK_INITIATOR, 0);
+ __ieee80211_stop_rx_ba_session(sta, tid, WLAN_BACK_INITIATOR, 0,
+ true);
else
- __ieee80211_stop_tx_ba_session(sta, tid, WLAN_BACK_RECIPIENT);
+ __ieee80211_stop_tx_ba_session(sta, tid, WLAN_BACK_RECIPIENT,
+ true);
}
int ieee80211_send_smps_action(struct ieee80211_sub_if_data *sdata,
--- wireless-testing.orig/net/mac80211/ieee80211_i.h 2010-10-04 15:17:19.000000000 +0200
+++ wireless-testing/net/mac80211/ieee80211_i.h 2010-10-04 15:22:24.000000000 +0200
@@ -1170,10 +1170,10 @@ int ieee80211_send_smps_action(struct ie
void ieee80211_request_smps_work(struct work_struct *work);
void ___ieee80211_stop_rx_ba_session(struct sta_info *sta, u16 tid,
- u16 initiator, u16 reason);
+ u16 initiator, u16 reason, bool stop);
void __ieee80211_stop_rx_ba_session(struct sta_info *sta, u16 tid,
- u16 initiator, u16 reason);
-void ieee80211_sta_tear_down_BA_sessions(struct sta_info *sta);
+ u16 initiator, u16 reason, bool stop);
+void ieee80211_sta_tear_down_BA_sessions(struct sta_info *sta, bool tx);
void ieee80211_process_delba(struct ieee80211_sub_if_data *sdata,
struct sta_info *sta,
struct ieee80211_mgmt *mgmt, size_t len);
@@ -1187,9 +1187,11 @@ void ieee80211_process_addba_request(str
size_t len);
int __ieee80211_stop_tx_ba_session(struct sta_info *sta, u16 tid,
- enum ieee80211_back_parties initiator);
+ enum ieee80211_back_parties initiator,
+ bool tx);
int ___ieee80211_stop_tx_ba_session(struct sta_info *sta, u16 tid,
- enum ieee80211_back_parties initiator);
+ enum ieee80211_back_parties initiator,
+ bool tx);
void ieee80211_start_tx_ba_cb(struct ieee80211_vif *vif, u8 *ra, u16 tid);
void ieee80211_stop_tx_ba_cb(struct ieee80211_vif *vif, u8 *ra, u8 tid);
void ieee80211_ba_session_work(struct work_struct *work);
--- wireless-testing.orig/net/mac80211/mlme.c 2010-10-04 15:16:03.000000000 +0200
+++ wireless-testing/net/mac80211/mlme.c 2010-10-04 15:17:05.000000000 +0200
@@ -921,7 +921,7 @@ static void ieee80211_set_associated(str
}
static void ieee80211_set_disassoc(struct ieee80211_sub_if_data *sdata,
- bool remove_sta)
+ bool remove_sta, bool tx)
{
struct ieee80211_if_managed *ifmgd = &sdata->u.mgd;
struct ieee80211_local *local = sdata->local;
@@ -960,7 +960,7 @@ static void ieee80211_set_disassoc(struc
sta = sta_info_get(sdata, bssid);
if (sta) {
set_sta_flags(sta, WLAN_STA_BLOCK_BA);
- ieee80211_sta_tear_down_BA_sessions(sta);
+ ieee80211_sta_tear_down_BA_sessions(sta, tx);
}
mutex_unlock(&local->sta_mtx);
@@ -1124,7 +1124,7 @@ static void __ieee80211_connection_loss(
printk(KERN_DEBUG "Connection to AP %pM lost.\n", bssid);
- ieee80211_set_disassoc(sdata, true);
+ ieee80211_set_disassoc(sdata, true, true);
mutex_unlock(&ifmgd->mtx);
mutex_lock(&local->mtx);
@@ -1197,7 +1197,7 @@ ieee80211_rx_mgmt_deauth(struct ieee8021
printk(KERN_DEBUG "%s: deauthenticated from %pM (Reason: %u)\n",
sdata->name, bssid, reason_code);
- ieee80211_set_disassoc(sdata, true);
+ ieee80211_set_disassoc(sdata, true, false);
mutex_lock(&sdata->local->mtx);
ieee80211_recalc_idle(sdata->local);
mutex_unlock(&sdata->local->mtx);
@@ -1229,7 +1229,7 @@ ieee80211_rx_mgmt_disassoc(struct ieee80
printk(KERN_DEBUG "%s: disassociated from %pM (Reason: %u)\n",
sdata->name, mgmt->sa, reason_code);
- ieee80211_set_disassoc(sdata, true);
+ ieee80211_set_disassoc(sdata, true, false);
mutex_lock(&sdata->local->mtx);
ieee80211_recalc_idle(sdata->local);
mutex_unlock(&sdata->local->mtx);
@@ -1879,7 +1879,7 @@ void ieee80211_sta_work(struct ieee80211
printk(KERN_DEBUG "No probe response from AP %pM"
" after %dms, disconnecting.\n",
bssid, (1000 * IEEE80211_PROBE_WAIT)/HZ);
- ieee80211_set_disassoc(sdata, true);
+ ieee80211_set_disassoc(sdata, true, true);
mutex_unlock(&ifmgd->mtx);
mutex_lock(&local->mtx);
ieee80211_recalc_idle(local);
@@ -2203,7 +2203,7 @@ int ieee80211_mgd_assoc(struct ieee80211
}
/* Trying to reassociate - clear previous association state */
- ieee80211_set_disassoc(sdata, true);
+ ieee80211_set_disassoc(sdata, true, false);
}
mutex_unlock(&ifmgd->mtx);
@@ -2315,7 +2315,7 @@ int ieee80211_mgd_deauth(struct ieee8021
memcpy(bssid, req->bss->bssid, ETH_ALEN);
if (ifmgd->associated == req->bss) {
- ieee80211_set_disassoc(sdata, false);
+ ieee80211_set_disassoc(sdata, false, true);
mutex_unlock(&ifmgd->mtx);
assoc_bss = true;
} else {
@@ -2398,7 +2398,7 @@ int ieee80211_mgd_disassoc(struct ieee80
sdata->name, req->bss->bssid, req->reason_code);
memcpy(bssid, req->bss->bssid, ETH_ALEN);
- ieee80211_set_disassoc(sdata, false);
+ ieee80211_set_disassoc(sdata, false, true);
mutex_unlock(&ifmgd->mtx);
--- wireless-testing.orig/net/mac80211/pm.c 2010-10-04 15:17:18.000000000 +0200
+++ wireless-testing/net/mac80211/pm.c 2010-10-04 15:17:37.000000000 +0200
@@ -46,7 +46,7 @@ int __ieee80211_suspend(struct ieee80211
list_for_each_entry(sta, &local->sta_list, list) {
if (hw->flags & IEEE80211_HW_AMPDU_AGGREGATION) {
set_sta_flags(sta, WLAN_STA_BLOCK_BA);
- ieee80211_sta_tear_down_BA_sessions(sta);
+ ieee80211_sta_tear_down_BA_sessions(sta, true);
}
if (sta->uploaded) {
--- wireless-testing.orig/net/mac80211/sta_info.c 2010-10-04 15:17:19.000000000 +0200
+++ wireless-testing/net/mac80211/sta_info.c 2010-10-04 15:17:42.000000000 +0200
@@ -633,7 +633,7 @@ static int __must_check __sta_info_destr
* will be sufficient.
*/
set_sta_flags(sta, WLAN_STA_BLOCK_BA);
- ieee80211_sta_tear_down_BA_sessions(sta);
+ ieee80211_sta_tear_down_BA_sessions(sta, true);
spin_lock_irqsave(&local->sta_lock, flags);
ret = sta_info_hash_del(local, sta);
--- wireless-testing.orig/net/mac80211/sta_info.h 2010-10-04 15:21:03.000000000 +0200
+++ wireless-testing/net/mac80211/sta_info.h 2010-10-04 15:24:14.000000000 +0200
@@ -79,6 +79,7 @@ enum ieee80211_sta_info_flags {
* @dialog_token: dialog token for aggregation session
* @state: session state (see above)
* @stop_initiator: initiator of a session stop
+ * @tx_stop: TX DelBA frame when stopping
*
* This structure is protected by RCU and the per-station
* spinlock. Assignments to the array holding it must hold
@@ -95,6 +96,7 @@ struct tid_ampdu_tx {
unsigned long state;
u8 dialog_token;
u8 stop_initiator;
+ bool tx_stop;
};
/**
--- wireless-testing.orig/net/mac80211/util.c 2010-10-04 15:17:18.000000000 +0200
+++ wireless-testing/net/mac80211/util.c 2010-10-04 15:17:32.000000000 +0200
@@ -1221,7 +1221,7 @@ int ieee80211_reconfig(struct ieee80211_
mutex_lock(&local->sta_mtx);
list_for_each_entry(sta, &local->sta_list, list) {
- ieee80211_sta_tear_down_BA_sessions(sta);
+ ieee80211_sta_tear_down_BA_sessions(sta, true);
clear_sta_flags(sta, WLAN_STA_BLOCK_BA);
}
--- wireless-testing.orig/net/mac80211/agg-rx.c 2010-10-04 15:18:07.000000000 +0200
+++ wireless-testing/net/mac80211/agg-rx.c 2010-10-04 15:24:39.000000000 +0200
@@ -56,7 +56,7 @@ static void ieee80211_free_tid_rx(struct
}
void ___ieee80211_stop_rx_ba_session(struct sta_info *sta, u16 tid,
- u16 initiator, u16 reason)
+ u16 initiator, u16 reason, bool tx)
{
struct ieee80211_local *local = sta->local;
struct tid_ampdu_rx *tid_rx;
@@ -81,7 +81,7 @@ void ___ieee80211_stop_rx_ba_session(str
"aggregation for tid %d\n", tid);
/* check if this is a self generated aggregation halt */
- if (initiator == WLAN_BACK_RECIPIENT)
+ if (initiator == WLAN_BACK_RECIPIENT && tx)
ieee80211_send_delba(sta->sdata, sta->sta.addr,
tid, 0, reason);
@@ -92,10 +92,10 @@ void ___ieee80211_stop_rx_ba_session(str
}
void __ieee80211_stop_rx_ba_session(struct sta_info *sta, u16 tid,
- u16 initiator, u16 reason)
+ u16 initiator, u16 reason, bool tx)
{
mutex_lock(&sta->ampdu_mlme.mtx);
- ___ieee80211_stop_rx_ba_session(sta, tid, initiator, reason);
+ ___ieee80211_stop_rx_ba_session(sta, tid, initiator, reason, tx);
mutex_unlock(&sta->ampdu_mlme.mtx);
}
--- wireless-testing.orig/net/mac80211/iface.c 2010-10-04 15:24:46.000000000 +0200
+++ wireless-testing/net/mac80211/iface.c 2010-10-04 15:24:50.000000000 +0200
@@ -793,7 +793,8 @@ static void ieee80211_iface_work(struct
__ieee80211_stop_rx_ba_session(
sta, tid, WLAN_BACK_RECIPIENT,
- WLAN_REASON_QSTA_REQUIRE_SETUP);
+ WLAN_REASON_QSTA_REQUIRE_SETUP,
+ true);
}
mutex_unlock(&local->sta_mtx);
} else switch (sdata->vif.type) {
--- wireless-testing.orig/net/mac80211/debugfs_sta.c 2010-10-04 15:25:14.000000000 +0200
+++ wireless-testing/net/mac80211/debugfs_sta.c 2010-10-04 15:25:23.000000000 +0200
@@ -196,7 +196,8 @@ static ssize_t sta_agg_status_write(stru
else
ret = ieee80211_stop_tx_ba_session(&sta->sta, tid);
} else {
- __ieee80211_stop_rx_ba_session(sta, tid, WLAN_BACK_RECIPIENT, 3);
+ __ieee80211_stop_rx_ba_session(sta, tid, WLAN_BACK_RECIPIENT,
+ 3, true);
ret = 0;
}
On Mon, 2010-10-04 at 16:50 -0700, Luis R. Rodriguez wrote:
> On Mon, Oct 4, 2010 at 3:47 PM, Jouni Malinen <[email protected]> wrote:
> > On Mon, Oct 04, 2010 at 02:07:22PM -0700, Luis R. Rodriguez wrote:
> >> Jouni, just to be clear so you are fine with dropping explicitly the
> >> tear down of the BA agreement to the old AP?
> >
> > If you are talking about not transmitting a frame, then yes, but
> > obviously, we would still need to locally tear down whatever we have set
> > up..
>
> Sure.
Right -- exactly what my (untested!!) patch did.
> > I think it would be a good default policy to not send any frames to the
> > old AP after we have decided to roam to a new one. If this does not work
> > well in some networks, I would like to here more detailed description on
> > what exactly is happening.
>
> No that's fine, I just was curious how the bridge stuff you described
> was handled in enterprise networks, you would know better :)
Look at net/mac80211/cfg.c - ieee80211_send_layer2_update.
> OK -- so lets go with Johanne's patch instead of patch 2 and 3 as he
> suggested :)
Have you tested it?
johannes
On Mon, Oct 04, 2010 at 09:42:46AM -0700, Johannes Berg wrote:
> On Mon, 2010-10-04 at 09:41 -0700, Luis R. Rodriguez wrote:
> > On Mon, Oct 04, 2010 at 06:27:25AM -0700, Johannes Berg wrote:
> > > How about this instead of your patches 2 and 3?
> > >
> > > We're disassociating from the old AP since we associate with the new AP,
> > > so there's not a whole lot of sense in explicitly tearing down
> > > aggregation sessions either, right?
> >
> > Good question, but why do we not get a warning when transmitting
> > the disassoc frame?
>
> We aren't transmitting a disassoc frame!
OK, then the question is if we should even at least do that...
Luis
On Tue, Oct 05, 2010 at 12:59:42AM -0700, Johannes Berg wrote:
> On Mon, 2010-10-04 at 13:55 -0700, Luis R. Rodriguez wrote:
>
> > > We don't need to go out of PS state to just TX, but we'd need to be
> > > careful to TX with asleep bit.
> >
> > I got what you meant here.
> >
> > > That said, we don't TX data frames then.
> >
> > But not here. Right now I am going to assume that we actually are
> > transmitting some frames for the delba when we try to tear down
> > the BA agreements with the old AP and the new AP are on the
> > same band, we just likely transmit it on the wrong channel.
>
> Yes, but delBA aren't data frames. The queue stop etc. doesn't pertain
> to them.
Got it.
> > Ah and also we did call ieee80211_offchannel_stop_beaconing() prior
> > to processing work_work stuff so that should take care of stopping
> > beaconing but that also turns off all TX queues... so yeah you're
> > right. The race here was just within work items assuming they can
> > transmit on other channels than the wk->chan.
>
> Did any work item actually assume that? Except maybe for your patch 1 in
> this set?
So far I just see the assoc request doing that.
> > > > Also this seems buggy, we do not take into consideration how much offchannel
> > > > work we are doing in consideration against the current AP's DTIM interval as
> > > > we do when going offchannel for scan work. We should merge that code for
> > > > this offchannel work_work loop.
> > >
> > > True, we don't do _any_ timing here.
> >
> > We can resolve that later, I'll add that to the TODO list.
>
> Come to think of it -- some p2p stuff in the supplicant might time out
> if we do this,
Its no different than the case of a DTIM 1 case when we do bgscan
and need to scan through some passive scan channels.
> I think we need to cut the drivers/mac80211 some more
> slack in the supplicant and not assume the remain-on-channel will start
> quickly.
Not sure I follow, do you mean that as per API we want to document
offchannel operation may loose broadcast/multicast data frames?
Luis
On Mon, Oct 4, 2010 at 1:39 PM, Jouni Malinen <[email protected]> wrote:
> On Mon, Oct 04, 2010 at 10:35:07AM -0700, Luis R. Rodriguez wrote:
>> I was wondering if actively disassociating might help with a smoother
>> transition. I was under this why we were doing this in the first place.
>> I frankly do not know, but if it does not help then I do agree with
>> your patch replacement.
>
> In many cases, it may end up harming more than helping.. At minimum, it
> takes some time to transmit the frame (and do the channel changes, if
> needed). Furthermore, this makes it more difficult for centrally managed
> networks to optimize roaming since we would be disassociating and
> associating as a new association instead of doing proper re-association.
> In such networks, the APs (or well, likely some sort of central manager)
> takes care of clearing the old association when the reassociation is
> being processed. In addition, this could potentially tunnel some frames
> through the new AP or at least make sure that bridge tables gets
> updated.
>
Jouni, just to be clear so you are fine with dropping explicitly the
tear down of the BA agreement to the old AP?
Luis
On Mon, 2010-10-04 at 09:41 -0700, Luis R. Rodriguez wrote:
> On Mon, Oct 04, 2010 at 06:27:25AM -0700, Johannes Berg wrote:
> > How about this instead of your patches 2 and 3?
> >
> > We're disassociating from the old AP since we associate with the new AP,
> > so there's not a whole lot of sense in explicitly tearing down
> > aggregation sessions either, right?
>
> Good question, but why do we not get a warning when transmitting
> the disassoc frame?
We aren't transmitting a disassoc frame!
johannes
On Mon, 2010-10-04 at 13:55 -0700, Luis R. Rodriguez wrote:
> > We don't need to go out of PS state to just TX, but we'd need to be
> > careful to TX with asleep bit.
>
> I got what you meant here.
>
> > That said, we don't TX data frames then.
>
> But not here. Right now I am going to assume that we actually are
> transmitting some frames for the delba when we try to tear down
> the BA agreements with the old AP and the new AP are on the
> same band, we just likely transmit it on the wrong channel.
Yes, but delBA aren't data frames. The queue stop etc. doesn't pertain
to them.
> Ah and also we did call ieee80211_offchannel_stop_beaconing() prior
> to processing work_work stuff so that should take care of stopping
> beaconing but that also turns off all TX queues... so yeah you're
> right. The race here was just within work items assuming they can
> transmit on other channels than the wk->chan.
Did any work item actually assume that? Except maybe for your patch 1 in
this set?
> > > Also this seems buggy, we do not take into consideration how much offchannel
> > > work we are doing in consideration against the current AP's DTIM interval as
> > > we do when going offchannel for scan work. We should merge that code for
> > > this offchannel work_work loop.
> >
> > True, we don't do _any_ timing here.
>
> We can resolve that later, I'll add that to the TODO list.
Come to think of it -- some p2p stuff in the supplicant might time out
if we do this, I think we need to cut the drivers/mac80211 some more
slack in the supplicant and not assume the remain-on-channel will start
quickly.
johannes
On Mon, Oct 04, 2010 at 02:07:22PM -0700, Luis R. Rodriguez wrote:
> Jouni, just to be clear so you are fine with dropping explicitly the
> tear down of the BA agreement to the old AP?
If you are talking about not transmitting a frame, then yes, but
obviously, we would still need to locally tear down whatever we have set
up..
I think it would be a good default policy to not send any frames to the
old AP after we have decided to roam to a new one. If this does not work
well in some networks, I would like to here more detailed description on
what exactly is happening.
--
Jouni Malinen PGP id EFC895FA
On Mon, Oct 04, 2010 at 09:41:56AM -0700, Johannes Berg wrote:
> On Mon, 2010-10-04 at 09:38 -0700, Luis R. Rodriguez wrote:
> > On Mon, Oct 04, 2010 at 06:15:52AM -0700, Johannes Berg wrote:
> > > On Fri, 2010-10-01 at 16:33 -0400, Luis R. Rodriguez wrote:
> > >
> > > > @@ -2203,6 +2204,14 @@ int ieee80211_mgd_assoc(struct ieee80211_sub_if_data *sdata,
> > > > return -EALREADY;
> > > > }
> > > >
> > > > + /*
> > > > + * right before this was authentication, that was on
> > > > + * the same the wk->chan so we need to ensure we temporarily
> > > > + * go back to the operating channel to send the disassocation.
> > > > + */
> > > > + local->tmp_channel = NULL;
> > > > + ieee80211_hw_config(local, 0);
> > > > +
> > >
> > > Yikes, no, you can't do this. You don't know what has been merged with
> > > the authentication work item, etc.
> >
> > What do you mean? Auth/Assoc will be handled separately and the
> > target channel will also be handled for each work item run, what
> > issues can arise from sending us back to the home channel towards
> > the end of this call here?
>
> Another work item, like a remain-on-channel, might have been started on
> the same channel ("merged").
But how do this break that other work from moving forward in its respective
channel? The work_work loop makes sure to check if we require a channel
change on the work item.
Luis
On Mon, Oct 04, 2010 at 11:39:29AM -0700, Johannes Berg wrote:
> On Mon, 2010-10-04 at 10:23 -0700, Luis R. Rodriguez wrote:
> > On Mon, Oct 04, 2010 at 09:41:56AM -0700, Johannes Berg wrote:
> > > On Mon, 2010-10-04 at 09:38 -0700, Luis R. Rodriguez wrote:
> > > > On Mon, Oct 04, 2010 at 06:15:52AM -0700, Johannes Berg wrote:
> > > > > On Fri, 2010-10-01 at 16:33 -0400, Luis R. Rodriguez wrote:
> > > > >
> > > > > > @@ -2203,6 +2204,14 @@ int ieee80211_mgd_assoc(struct ieee80211_sub_if_data *sdata,
> > > > > > return -EALREADY;
> > > > > > }
> > > > > >
> > > > > > + /*
> > > > > > + * right before this was authentication, that was on
> > > > > > + * the same the wk->chan so we need to ensure we temporarily
> > > > > > + * go back to the operating channel to send the disassocation.
> > > > > > + */
> > > > > > + local->tmp_channel = NULL;
> > > > > > + ieee80211_hw_config(local, 0);
> > > > > > +
> > > > >
> > > > > Yikes, no, you can't do this. You don't know what has been merged with
> > > > > the authentication work item, etc.
> > > >
> > > > What do you mean? Auth/Assoc will be handled separately and the
> > > > target channel will also be handled for each work item run, what
> > > > issues can arise from sending us back to the home channel towards
> > > > the end of this call here?
> > >
> > > Another work item, like a remain-on-channel, might have been started on
> > > the same channel ("merged").
> >
> > But how do this break that other work from moving forward in its respective
> > channel? The work_work loop makes sure to check if we require a channel
> > change on the work item.
>
> But we already activated that other work item.
Ah yes, I see that now, thanks..
Luis
On Fri, 2010-10-01 at 16:33 -0400, Luis R. Rodriguez wrote:
> Be consistent and use the wk->chan instead of the
> local->hw.conf.channel for the association done work.
> This prevents any possible races against channel changes
> while we run this work.
>
> In the case that the race did happen we would be initializing
> the bit rates for the new AP under the assumption of a wrong
> channel and in the worst case, wrong band. This could lead
> to trying to assuming we could use CCK frames on 5 GHz, for
> example.
Acked-by: Johannes Berg <[email protected]>
johannes
On Mon, 2010-10-04 at 11:04 -0700, Luis R. Rodriguez wrote:
> On Mon, Oct 04, 2010 at 09:39:52AM -0700, Johannes Berg wrote:
> > On Mon, 2010-10-04 at 09:36 -0700, Luis R. Rodriguez wrote:
> >
> > > > > +wait:
> > > > > wk->timeout = jiffies + IEEE80211_ASSOC_TIMEOUT;
> > > > > run_again(local, wk->timeout);
> > > >
> > > > But you'll be staying off-channel for the wait period, so what does this
> > > > really help?
> > >
> > > I totally missed this what locks us offchannel here, I though we just re-arm
> > > the timer, and come back offchannel at a later time. What is it that locks
> > > us offchannel until the timer runs again?
> >
> > I believe we stay off-channel as long as the work item is active, after
> > it has been activated, no?
>
> Well I don't see that, the problem here was the assumption that within a work
> item we can try to transmit a frame for our home channel without changing it.
Yes, that's true, but we do try to stop most things ... we just miss
some :-)
> If that is desired we must move back to the home channel as I did, but I can
> see how we'd need more work than what I did, we'd need to start the queues,
> get out of PS state with the AP and then TX... unless TX already handles
> that for us.
We don't need to go out of PS state to just TX, but we'd need to be
careful to TX with asleep bit. That said, we don't TX data frames then.
> ieee80211_work_work() just iterates over all work items, and then bails out.
> The work loop is protected against local->mtx, and if we call work_work
> when we either add new work, purge work, or hit a timer. We *try* to prevent
> frames from being sent on the home channel by calling
> ieee80211_offchannel_stop_station() but notice how we only stop the queues
> for NL80211_IFTYPE_STATION interfaces.
No, we just stop the station ones differently from all the others.
> Also this seems buggy, we do not take into consideration how much offchannel
> work we are doing in consideration against the current AP's DTIM interval as
> we do when going offchannel for scan work. We should merge that code for
> this offchannel work_work loop.
True, we don't do _any_ timing here.
johannes
On Mon, Oct 04, 2010 at 10:30:26AM -0700, Johannes Berg wrote:
> On Mon, 2010-10-04 at 10:25 -0700, Luis R. Rodriguez wrote:
> > On Mon, Oct 04, 2010 at 09:42:46AM -0700, Johannes Berg wrote:
> > > On Mon, 2010-10-04 at 09:41 -0700, Luis R. Rodriguez wrote:
> > > > On Mon, Oct 04, 2010 at 06:27:25AM -0700, Johannes Berg wrote:
> > > > > How about this instead of your patches 2 and 3?
> > > > >
> > > > > We're disassociating from the old AP since we associate with the new AP,
> > > > > so there's not a whole lot of sense in explicitly tearing down
> > > > > aggregation sessions either, right?
> > > >
> > > > Good question, but why do we not get a warning when transmitting
> > > > the disassoc frame?
> > >
> > > We aren't transmitting a disassoc frame!
> >
> > OK, then the question is if we should even at least do that...
>
> No, why should we? We're sending a re-assoc frame to the new AP.
I was wondering if actively disassociating might help with a smoother
transition. I was under this why we were doing this in the first place.
I frankly do not know, but if it does not help then I do agree with
your patch replacement.
Luis
On Fri, 2010-10-01 at 16:33 -0400, Luis R. Rodriguez wrote:
> @@ -936,7 +936,6 @@ static void ieee80211_set_disassoc(struct ieee80211_sub_if_data *sdata,
>
> memcpy(bssid, ifmgd->associated->bssid, ETH_ALEN);
>
> - ifmgd->associated = NULL;
> memset(ifmgd->bssid, 0, ETH_ALEN);
>
> /*
You need to read this comment and the one below it :-)
> @@ -513,6 +513,19 @@ ieee80211_associate(struct ieee80211_work *wk)
> {
> struct ieee80211_sub_if_data *sdata = wk->sdata;
> struct ieee80211_local *local = sdata->local;
> + struct ieee80211_if_managed *ifmgd = &sdata->u.mgd;
> + bool associated;
> +
> + mutex_lock(&ifmgd->mtx);
> + associated = !!ifmgd->associated;
> + mutex_unlock(&ifmgd->mtx);
> +
> + if (associated) {
> + printk(KERN_DEBUG "%s: delaying association with %pM as "
> + "we are still associated",
> + sdata->name, wk->filter_ta);
> + goto wait;
> + }
>
> wk->assoc.tries++;
> if (wk->assoc.tries > IEEE80211_ASSOC_MAX_TRIES) {
> @@ -534,6 +547,7 @@ ieee80211_associate(struct ieee80211_work *wk)
> sdata->name, wk->filter_ta, wk->assoc.tries);
> ieee80211_send_assoc(sdata, wk);
>
> +wait:
> wk->timeout = jiffies + IEEE80211_ASSOC_TIMEOUT;
> run_again(local, wk->timeout);
But you'll be staying off-channel for the wait period, so what does this
really help?
johannes
On Mon, 2010-10-04 at 09:38 -0700, Luis R. Rodriguez wrote:
> On Mon, Oct 04, 2010 at 06:15:52AM -0700, Johannes Berg wrote:
> > On Fri, 2010-10-01 at 16:33 -0400, Luis R. Rodriguez wrote:
> >
> > > @@ -2203,6 +2204,14 @@ int ieee80211_mgd_assoc(struct ieee80211_sub_if_data *sdata,
> > > return -EALREADY;
> > > }
> > >
> > > + /*
> > > + * right before this was authentication, that was on
> > > + * the same the wk->chan so we need to ensure we temporarily
> > > + * go back to the operating channel to send the disassocation.
> > > + */
> > > + local->tmp_channel = NULL;
> > > + ieee80211_hw_config(local, 0);
> > > +
> >
> > Yikes, no, you can't do this. You don't know what has been merged with
> > the authentication work item, etc.
>
> What do you mean? Auth/Assoc will be handled separately and the
> target channel will also be handled for each work item run, what
> issues can arise from sending us back to the home channel towards
> the end of this call here?
Another work item, like a remain-on-channel, might have been started on
the same channel ("merged").
johannes
When roaming we try to disassociate with our old AP prior to
associating. The assumption though is that we will send our
disassociation first prior to sending our association request
to the new AP. This should in theory work given that the
association is put in through a workqueue but to avoid any
possible races just ensure we wait for complete disassociation
prior to trying to associate to the new AP.
In the worst case scenerio this will delay our association
request by one iteration on the work item.
This patch has a fix for kernels >= v2.6.34
Cc: [email protected]
Cc: Jouni Malinen <[email protected]>
Cc: Paul Stewart <[email protected]>
Cc: Amod Bodas <[email protected]>
Cc: Johannes Berg <[email protected]>
Cc: Vasanthakumar Thiagarajan <[email protected]>
Signed-off-by: Luis R. Rodriguez <[email protected]>
---
net/mac80211/mlme.c | 3 ++-
net/mac80211/work.c | 14 ++++++++++++++
2 files changed, 16 insertions(+), 1 deletions(-)
diff --git a/net/mac80211/mlme.c b/net/mac80211/mlme.c
index daab0c6..7eff124 100644
--- a/net/mac80211/mlme.c
+++ b/net/mac80211/mlme.c
@@ -936,7 +936,6 @@ static void ieee80211_set_disassoc(struct ieee80211_sub_if_data *sdata,
memcpy(bssid, ifmgd->associated->bssid, ETH_ALEN);
- ifmgd->associated = NULL;
memset(ifmgd->bssid, 0, ETH_ALEN);
/*
@@ -1007,6 +1006,8 @@ static void ieee80211_set_disassoc(struct ieee80211_sub_if_data *sdata,
del_timer_sync(&sdata->u.mgd.bcn_mon_timer);
del_timer_sync(&sdata->u.mgd.timer);
del_timer_sync(&sdata->u.mgd.chswitch_timer);
+
+ ifmgd->associated = NULL;
}
void ieee80211_sta_rx_notify(struct ieee80211_sub_if_data *sdata,
diff --git a/net/mac80211/work.c b/net/mac80211/work.c
index ae344d1..268ac1d 100644
--- a/net/mac80211/work.c
+++ b/net/mac80211/work.c
@@ -513,6 +513,19 @@ ieee80211_associate(struct ieee80211_work *wk)
{
struct ieee80211_sub_if_data *sdata = wk->sdata;
struct ieee80211_local *local = sdata->local;
+ struct ieee80211_if_managed *ifmgd = &sdata->u.mgd;
+ bool associated;
+
+ mutex_lock(&ifmgd->mtx);
+ associated = !!ifmgd->associated;
+ mutex_unlock(&ifmgd->mtx);
+
+ if (associated) {
+ printk(KERN_DEBUG "%s: delaying association with %pM as "
+ "we are still associated",
+ sdata->name, wk->filter_ta);
+ goto wait;
+ }
wk->assoc.tries++;
if (wk->assoc.tries > IEEE80211_ASSOC_MAX_TRIES) {
@@ -534,6 +547,7 @@ ieee80211_associate(struct ieee80211_work *wk)
sdata->name, wk->filter_ta, wk->assoc.tries);
ieee80211_send_assoc(sdata, wk);
+wait:
wk->timeout = jiffies + IEEE80211_ASSOC_TIMEOUT;
run_again(local, wk->timeout);
--
1.7.0.4
Be consistent and use the wk->chan instead of the
local->hw.conf.channel for the association done work.
This prevents any possible races against channel changes
while we run this work.
In the case that the race did happen we would be initializing
the bit rates for the new AP under the assumption of a wrong
channel and in the worst case, wrong band. This could lead
to trying to assuming we could use CCK frames on 5 GHz, for
example.
This patch has a fix for kernels >= v2.6.34
Cc: [email protected]
Cc: Jouni Malinen <[email protected]>
Cc: Paul Stewart <[email protected]>
Cc: Amod Bodas <[email protected]>
Cc: Johannes Berg <[email protected]>
Cc: Vasanthakumar Thiagarajan <[email protected]>
Signed-off-by: Luis R. Rodriguez <[email protected]>
---
net/mac80211/mlme.c | 6 +++---
1 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/net/mac80211/mlme.c b/net/mac80211/mlme.c
index 77913a1..daab0c6 100644
--- a/net/mac80211/mlme.c
+++ b/net/mac80211/mlme.c
@@ -1291,7 +1291,7 @@ static bool ieee80211_assoc_success(struct ieee80211_work *wk,
rates = 0;
basic_rates = 0;
- sband = local->hw.wiphy->bands[local->hw.conf.channel->band];
+ sband = local->hw.wiphy->bands[wk->chan->band];
for (i = 0; i < elems.supp_rates_len; i++) {
int rate = (elems.supp_rates[i] & 0x7f) * 5;
@@ -1327,11 +1327,11 @@ static bool ieee80211_assoc_success(struct ieee80211_work *wk,
}
}
- sta->sta.supp_rates[local->hw.conf.channel->band] = rates;
+ sta->sta.supp_rates[wk->chan->band] = rates;
sdata->vif.bss_conf.basic_rates = basic_rates;
/* cf. IEEE 802.11 9.2.12 */
- if (local->hw.conf.channel->band == IEEE80211_BAND_2GHZ &&
+ if (wk->chan->band == IEEE80211_BAND_2GHZ &&
have_higher_than_11mbit)
sdata->flags |= IEEE80211_SDATA_OPERATING_GMODE;
else
--
1.7.0.4
On Mon, Oct 04, 2010 at 06:15:52AM -0700, Johannes Berg wrote:
> On Fri, 2010-10-01 at 16:33 -0400, Luis R. Rodriguez wrote:
>
> > @@ -2203,6 +2204,14 @@ int ieee80211_mgd_assoc(struct ieee80211_sub_if_data *sdata,
> > return -EALREADY;
> > }
> >
> > + /*
> > + * right before this was authentication, that was on
> > + * the same the wk->chan so we need to ensure we temporarily
> > + * go back to the operating channel to send the disassocation.
> > + */
> > + local->tmp_channel = NULL;
> > + ieee80211_hw_config(local, 0);
> > +
>
> Yikes, no, you can't do this. You don't know what has been merged with
> the authentication work item, etc.
What do you mean? Auth/Assoc will be handled separately and the
target channel will also be handled for each work item run, what
issues can arise from sending us back to the home channel towards
the end of this call here?
Luis
On Fri, 2010-10-01 at 16:33 -0400, Luis R. Rodriguez wrote:
> @@ -2203,6 +2204,14 @@ int ieee80211_mgd_assoc(struct ieee80211_sub_if_data *sdata,
> return -EALREADY;
> }
>
> + /*
> + * right before this was authentication, that was on
> + * the same the wk->chan so we need to ensure we temporarily
> + * go back to the operating channel to send the disassocation.
> + */
> + local->tmp_channel = NULL;
> + ieee80211_hw_config(local, 0);
> +
Yikes, no, you can't do this. You don't know what has been merged with
the authentication work item, etc.
johannes
On Mon, 2010-10-04 at 10:25 -0700, Luis R. Rodriguez wrote:
> On Mon, Oct 04, 2010 at 09:42:46AM -0700, Johannes Berg wrote:
> > On Mon, 2010-10-04 at 09:41 -0700, Luis R. Rodriguez wrote:
> > > On Mon, Oct 04, 2010 at 06:27:25AM -0700, Johannes Berg wrote:
> > > > How about this instead of your patches 2 and 3?
> > > >
> > > > We're disassociating from the old AP since we associate with the new AP,
> > > > so there's not a whole lot of sense in explicitly tearing down
> > > > aggregation sessions either, right?
> > >
> > > Good question, but why do we not get a warning when transmitting
> > > the disassoc frame?
> >
> > We aren't transmitting a disassoc frame!
>
> OK, then the question is if we should even at least do that...
No, why should we? We're sending a re-assoc frame to the new AP.
johannes
On Mon, 2010-10-04 at 09:36 -0700, Luis R. Rodriguez wrote:
> > > +wait:
> > > wk->timeout = jiffies + IEEE80211_ASSOC_TIMEOUT;
> > > run_again(local, wk->timeout);
> >
> > But you'll be staying off-channel for the wait period, so what does this
> > really help?
>
> I totally missed this what locks us offchannel here, I though we just re-arm
> the timer, and come back offchannel at a later time. What is it that locks
> us offchannel until the timer runs again?
I believe we stay off-channel as long as the work item is active, after
it has been activated, no?
johannes
On Tue, Oct 5, 2010 at 12:57 AM, Johannes Berg
<[email protected]> wrote:
> On Mon, 2010-10-04 at 16:50 -0700, Luis R. Rodriguez wrote:
>> On Mon, Oct 4, 2010 at 3:47 PM, Jouni Malinen <[email protected]> wrote:
>> > On Mon, Oct 04, 2010 at 02:07:22PM -0700, Luis R. Rodriguez wrote:
>> >> Jouni, just to be clear so you are fine with dropping explicitly the
>> >> tear down of the BA agreement to the old AP?
>> >
>> > If you are talking about not transmitting a frame, then yes, but
>> > obviously, we would still need to locally tear down whatever we have set
>> > up..
>>
>> Sure.
>
> Right -- exactly what my (untested!!) patch did.
>
>> > I think it would be a good default policy to not send any frames to the
>> > old AP after we have decided to roam to a new one. If this does not work
>> > well in some networks, I would like to here more detailed description on
>> > what exactly is happening.
>>
>> No that's fine, I just was curious how the bridge stuff you described
>> was handled in enterprise networks, you would know better :)
>
> Look at net/mac80211/cfg.c - ieee80211_send_layer2_update.
>
>> OK -- so lets go with Johanne's patch instead of patch 2 and 3 as he
>> suggested :)
>
> Have you tested it?
Tested, it works and resolves the issue if I drop my patch 2 and 3 and
instead use yours.
Luis
On Fri, Oct 01, 2010 at 01:33:50PM -0700, Luis R. Rodriguez wrote:
> The following two patches:
>
> 9c87ba6 - mac80211: Fix reassociation processing (within ESS roaming)
> e1dd33f - cfg80211: Allow reassociation in associated state
>
> $ git describe --contains 9c87ba6
> v2.6.34-rc2~48^2~77^2~6
> $ git describe --contains 9c87ba6
> v2.6.34-rc2~48^2~77^2~6
>
> Added support for cfg80211/mac80211 to cleanly roam between two BSSes
> on an ESS by allowing the station to authenticate to two APS at the
> same time, and when an association comes in for the new AP we first
> disassociate from the old AP and then associate with the new one.
>
> What we forgot to take into consideration is that when we disassociate
> with the older AP we may need to transmit frames to that AP and those
> frames may actually be intended to go under a different channel and
> even sometimes a completely separate band than the new APs. When we
> TX a frame we assume the frame we want to TX however will be on the
> current hardware configured channel. The channel we would try to
> send a frame on can be different than the channel we prepared the
> bitrates for the peer on though. What this meant is that upon tearing
> down a BA agreement we would try to send a frame to a peer but not
> find a valid rate for that peer and generate warnings like the
> following:
>
> WARNING: at include/net/mac80211.h:2677 rate_control_send_low+0xd3/0x140 [mac80211]()
> Hardware name: 6460DWU
> Modules linked in: ath9k mac80211 ath9k_common ath9k_hw ath cfg80211 <etc>
> Pid: 898, comm: wpa_supplicant Tainted: G W 2.6.36-rc5-wl+ #254
> Call Trace:
> [<ffffffff8105fddf>] warn_slowpath_common+0x7f/0xc0
> [<ffffffff8105fe3a>] warn_slowpath_null+0x1a/0x20
> [<ffffffffa03b8573>] rate_control_send_low+0xd3/0x140 [mac80211]
> [<ffffffffa0192fa8>] ath_get_rate+0x48/0x570 [ath9k]
> [<ffffffff812b8d19>] ? put_dec+0x59/0x60
> [<ffffffffa03b839e>] rate_control_get_rate+0x8e/0x190 [mac80211]
> [<ffffffffa03c27e8>] ieee80211_tx_h_rate_ctrl+0x1a8/0x4e0 [mac80211]
> [<ffffffffa03c2ec0>] invoke_tx_handlers+0x100/0x140 [mac80211]
> [<ffffffffa03c2f85>] ieee80211_tx+0x85/0x240 [mac80211]
> [<ffffffff81479c70>] ? skb_release_data+0xd0/0xe0
> [<ffffffff8147bb0d>] ? pskb_expand_head+0x10d/0x1a0
> [<ffffffffa03c31f6>] ieee80211_xmit+0xb6/0x1d0 [mac80211]
> [<ffffffff81479db3>] ? __alloc_skb+0x83/0x170
> [<ffffffffa03c3364>] ieee80211_tx_skb+0x54/0x70 [mac80211]
> [<ffffffffa03ac0dd>] ieee80211_send_delba+0x11d/0x190 [mac80211]
> [<ffffffffa03adcf0>] ___ieee80211_stop_rx_ba_session+0xf0/0x110 [mac80211]
> [<ffffffffa03add60>] __ieee80211_stop_rx_ba_session+0x50/0x70 [mac80211]
> [<ffffffffa03ac3f3>] ieee80211_sta_tear_down_BA_sessions+0x43/0x50 [mac80211]
> [<ffffffffa03b23c3>] ieee80211_set_disassoc+0x103/0x240 [mac80211]
> [<ffffffffa03b3b2d>] ieee80211_mgd_assoc+0x8d/0x3a0 [mac80211]
> [<ffffffffa03ba66f>] ieee80211_assoc+0x4f/0x80 [mac80211]
> [<ffffffffa011e5b6>] __cfg80211_mlme_assoc+0x1f6/0x240 [cfg80211]
> [<ffffffffa011e68f>] cfg80211_mlme_assoc+0x8f/0xc0 [cfg80211]
> [<ffffffffa010afd0>] ? cfg80211_get_dev_from_ifindex+0x70/0x80 [cfg80211]
> [<ffffffffa011489a>] nl80211_associate+0x23a/0x260 [cfg80211]
> [<ffffffff812c6c6f>] ? nla_parse+0xef/0x110
> [<ffffffff814ad738>] genl_rcv_msg+0x1d8/0x210
> [<ffffffff81475cf4>] ? sock_alloc_send_pskb+0x1d4/0x330
> [<ffffffff814ad560>] ? genl_rcv_msg+0x0/0x210
> [<ffffffff814ac179>] netlink_rcv_skb+0xa9/0xd0
> [<ffffffff814ad545>] genl_rcv+0x25/0x40
> [<ffffffff814abdd8>] netlink_unicast+0x2c8/0x2e0
> [<ffffffff814acc30>] netlink_sendmsg+0x250/0x360
> [<ffffffff81472643>] sock_sendmsg+0xf3/0x120
> [<ffffffff81562dbe>] ? _raw_spin_lock+0xe/0x20
> [<ffffffff81471105>] ? move_addr_to_kernel+0x65/0x70
> [<ffffffff8147d168>] ? verify_iovec+0x88/0xe0
> [<ffffffff81472d70>] sys_sendmsg+0x240/0x3a0
> [<ffffffff81151b0a>] ? do_readv_writev+0x1aa/0x1f0
> [<ffffffff815604b0>] ? schedule+0x3c0/0xa00
> [<ffffffff81151b98>] ? vfs_writev+0x48/0x60
> [<ffffffff81151cc1>] ? sys_writev+0x51/0xb0
> [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
>
> This can be easily reproduced with the test-roam script [1] and
> statically defining the ESS variable with the BSSes of an AP
> in 2.4 GHz band and another one on the 5 GHz band.
>
> Since we end up associated
typo, I meant authenticate
> to two different APs at the same
> time to transmit to either AP consists of an offchannel
and this should read "to transmit to the new AP consinst
of an offchannel"
> operation and mac80211 does have a state machine for this
> but assumes we don't transmit on our operating channel within
> each work item. If we do need to transmit to on the operating
> channel each work item needs to ensure this. This series
> addresses this by addressing a series of possible races on
> the frame's set channel, prevening us from associating to a
> new AP if we were previously associated and haven't yet
> associated, and ensuring we transmit the disassocation on
> the operating channel.
On Mon, Oct 04, 2010 at 06:14:40AM -0700, Johannes Berg wrote:
> On Fri, 2010-10-01 at 16:33 -0400, Luis R. Rodriguez wrote:
>
> > @@ -936,7 +936,6 @@ static void ieee80211_set_disassoc(struct ieee80211_sub_if_data *sdata,
> >
> > memcpy(bssid, ifmgd->associated->bssid, ETH_ALEN);
> >
> > - ifmgd->associated = NULL;
> > memset(ifmgd->bssid, 0, ETH_ALEN);
> >
> > /*
>
> You need to read this comment and the one below it :-)
Missed that.. thanks.
> > @@ -513,6 +513,19 @@ ieee80211_associate(struct ieee80211_work *wk)
> > {
> > struct ieee80211_sub_if_data *sdata = wk->sdata;
> > struct ieee80211_local *local = sdata->local;
> > + struct ieee80211_if_managed *ifmgd = &sdata->u.mgd;
> > + bool associated;
> > +
> > + mutex_lock(&ifmgd->mtx);
> > + associated = !!ifmgd->associated;
> > + mutex_unlock(&ifmgd->mtx);
> > +
> > + if (associated) {
> > + printk(KERN_DEBUG "%s: delaying association with %pM as "
> > + "we are still associated",
> > + sdata->name, wk->filter_ta);
> > + goto wait;
> > + }
> >
> > wk->assoc.tries++;
> > if (wk->assoc.tries > IEEE80211_ASSOC_MAX_TRIES) {
> > @@ -534,6 +547,7 @@ ieee80211_associate(struct ieee80211_work *wk)
> > sdata->name, wk->filter_ta, wk->assoc.tries);
> > ieee80211_send_assoc(sdata, wk);
> >
> > +wait:
> > wk->timeout = jiffies + IEEE80211_ASSOC_TIMEOUT;
> > run_again(local, wk->timeout);
>
> But you'll be staying off-channel for the wait period, so what does this
> really help?
I totally missed this what locks us offchannel here, I though we just re-arm
the timer, and come back offchannel at a later time. What is it that locks
us offchannel until the timer runs again?
Luis
On Mon, Oct 04, 2010 at 11:44:58AM -0700, Johannes Berg wrote:
> On Mon, 2010-10-04 at 11:04 -0700, Luis R. Rodriguez wrote:
> > On Mon, Oct 04, 2010 at 09:39:52AM -0700, Johannes Berg wrote:
> > > On Mon, 2010-10-04 at 09:36 -0700, Luis R. Rodriguez wrote:
> > >
> > > > > > +wait:
> > > > > > wk->timeout = jiffies + IEEE80211_ASSOC_TIMEOUT;
> > > > > > run_again(local, wk->timeout);
> > > > >
> > > > > But you'll be staying off-channel for the wait period, so what does this
> > > > > really help?
> > > >
> > > > I totally missed this what locks us offchannel here, I though we just re-arm
> > > > the timer, and come back offchannel at a later time. What is it that locks
> > > > us offchannel until the timer runs again?
> > >
> > > I believe we stay off-channel as long as the work item is active, after
> > > it has been activated, no?
> >
> > Well I don't see that, the problem here was the assumption that within a work
> > item we can try to transmit a frame for our home channel without changing it.
>
> Yes, that's true, but we do try to stop most things ... we just miss
> some :-)
That's one way of putting it, that's fine, I was under the impression
we did want to send the data though, but if the goal is to *stop* this
that's fine too.
> > If that is desired we must move back to the home channel as I did, but I can
> > see how we'd need more work than what I did, we'd need to start the queues,
> > get out of PS state with the AP and then TX... unless TX already handles
> > that for us.
>
> We don't need to go out of PS state to just TX, but we'd need to be
> careful to TX with asleep bit.
I got what you meant here.
> That said, we don't TX data frames then.
But not here. Right now I am going to assume that we actually are
transmitting some frames for the delba when we try to tear down
the BA agreements with the old AP and the new AP are on the
same band, we just likely transmit it on the wrong channel.
> > ieee80211_work_work() just iterates over all work items, and then bails out.
> > The work loop is protected against local->mtx, and if we call work_work
> > when we either add new work, purge work, or hit a timer. We *try* to prevent
> > frames from being sent on the home channel by calling
> > ieee80211_offchannel_stop_station() but notice how we only stop the queues
> > for NL80211_IFTYPE_STATION interfaces.
>
> No, we just stop the station ones differently from all the others.
Ah and also we did call ieee80211_offchannel_stop_beaconing() prior
to processing work_work stuff so that should take care of stopping
beaconing but that also turns off all TX queues... so yeah you're
right. The race here was just within work items assuming they can
transmit on other channels than the wk->chan.
> > Also this seems buggy, we do not take into consideration how much offchannel
> > work we are doing in consideration against the current AP's DTIM interval as
> > we do when going offchannel for scan work. We should merge that code for
> > this offchannel work_work loop.
>
> True, we don't do _any_ timing here.
We can resolve that later, I'll add that to the TODO list.
Luis
On Mon, 2010-10-04 at 10:35 -0700, Luis R. Rodriguez wrote:
> On Mon, Oct 04, 2010 at 10:30:26AM -0700, Johannes Berg wrote:
> > On Mon, 2010-10-04 at 10:25 -0700, Luis R. Rodriguez wrote:
> > > On Mon, Oct 04, 2010 at 09:42:46AM -0700, Johannes Berg wrote:
> > > > On Mon, 2010-10-04 at 09:41 -0700, Luis R. Rodriguez wrote:
> > > > > On Mon, Oct 04, 2010 at 06:27:25AM -0700, Johannes Berg wrote:
> > > > > > How about this instead of your patches 2 and 3?
> > > > > >
> > > > > > We're disassociating from the old AP since we associate with the new AP,
> > > > > > so there's not a whole lot of sense in explicitly tearing down
> > > > > > aggregation sessions either, right?
> > > > >
> > > > > Good question, but why do we not get a warning when transmitting
> > > > > the disassoc frame?
> > > >
> > > > We aren't transmitting a disassoc frame!
> > >
> > > OK, then the question is if we should even at least do that...
> >
> > No, why should we? We're sending a re-assoc frame to the new AP.
>
> I was wondering if actively disassociating might help with a smoother
> transition. I was under this why we were doing this in the first place.
> I frankly do not know, but if it does not help then I do agree with
> your patch replacement.
Nope, I don't think how it would help at all.
johannes
On Mon, Oct 04, 2010 at 10:35:07AM -0700, Luis R. Rodriguez wrote:
> I was wondering if actively disassociating might help with a smoother
> transition. I was under this why we were doing this in the first place.
> I frankly do not know, but if it does not help then I do agree with
> your patch replacement.
In many cases, it may end up harming more than helping.. At minimum, it
takes some time to transmit the frame (and do the channel changes, if
needed). Furthermore, this makes it more difficult for centrally managed
networks to optimize roaming since we would be disassociating and
associating as a new association instead of doing proper re-association.
In such networks, the APs (or well, likely some sort of central manager)
takes care of clearing the old association when the reassociation is
being processed. In addition, this could potentially tunnel some frames
through the new AP or at least make sure that bridge tables gets
updated.
--
Jouni Malinen PGP id EFC895FA
When we roam from one AP to another within an ESS we keep
our association alive with the old AP while we we authenticate
with the new AP. Once we have been authenticated we then proceed
to associate with the new AP but prior to that we disassociate
with the AP, which includes tearing down our BA agreements if
we had any.
mac80211 uses a queue for authentication and association frames
in case we have to go offchannel for them. In case two work items
happen to be offchannel on the same channel we remain offchannel.
This works well unless we need to send some frames to the old
AP when doing some of this work. In the case of roaming we will
try to disassociate from the old AP right before we associate
but need to ensure we go back to the operating channel prior to
trying to disassociating from the old AP.
This patch has a fix for kernels >= v2.6.34
Cc: [email protected]
Cc: Jouni Malinen <[email protected]>
Cc: Paul Stewart <[email protected]>
Cc: Amod Bodas <[email protected]>
Cc: Johannes Berg <[email protected]>
Cc: Vasanthakumar Thiagarajan <[email protected]>
Signed-off-by: Luis R. Rodriguez <[email protected]>
---
net/mac80211/mlme.c | 9 +++++++++
1 files changed, 9 insertions(+), 0 deletions(-)
diff --git a/net/mac80211/mlme.c b/net/mac80211/mlme.c
index 7eff124..a833255 100644
--- a/net/mac80211/mlme.c
+++ b/net/mac80211/mlme.c
@@ -2185,6 +2185,7 @@ int ieee80211_mgd_assoc(struct ieee80211_sub_if_data *sdata,
{
struct ieee80211_if_managed *ifmgd = &sdata->u.mgd;
struct ieee80211_bss *bss = (void *)req->bss->priv;
+ struct ieee80211_local *local = sdata->local;
struct ieee80211_work *wk;
const u8 *ssid;
int i;
@@ -2203,6 +2204,14 @@ int ieee80211_mgd_assoc(struct ieee80211_sub_if_data *sdata,
return -EALREADY;
}
+ /*
+ * right before this was authentication, that was on
+ * the same the wk->chan so we need to ensure we temporarily
+ * go back to the operating channel to send the disassocation.
+ */
+ local->tmp_channel = NULL;
+ ieee80211_hw_config(local, 0);
+
/* Trying to reassociate - clear previous association state */
ieee80211_set_disassoc(sdata, true);
}
--
1.7.0.4
On Mon, Oct 04, 2010 at 06:27:25AM -0700, Johannes Berg wrote:
> How about this instead of your patches 2 and 3?
>
> We're disassociating from the old AP since we associate with the new AP,
> so there's not a whole lot of sense in explicitly tearing down
> aggregation sessions either, right?
Good question, but why do we not get a warning when transmitting
the disassoc frame? The race is hit when we are on the wrong
channel, are you sure we are not sending a disassoc on a wrong
channel? I would expect a smoother transition if we actually
disassoc fromt the old AP in the ESS and I am not sure if we
do that right now.
Luis
On Mon, Oct 04, 2010 at 09:39:52AM -0700, Johannes Berg wrote:
> On Mon, 2010-10-04 at 09:36 -0700, Luis R. Rodriguez wrote:
>
> > > > +wait:
> > > > wk->timeout = jiffies + IEEE80211_ASSOC_TIMEOUT;
> > > > run_again(local, wk->timeout);
> > >
> > > But you'll be staying off-channel for the wait period, so what does this
> > > really help?
> >
> > I totally missed this what locks us offchannel here, I though we just re-arm
> > the timer, and come back offchannel at a later time. What is it that locks
> > us offchannel until the timer runs again?
>
> I believe we stay off-channel as long as the work item is active, after
> it has been activated, no?
Well I don't see that, the problem here was the assumption that within a work
item we can try to transmit a frame for our home channel without changing it.
If that is desired we must move back to the home channel as I did, but I can
see how we'd need more work than what I did, we'd need to start the queues,
get out of PS state with the AP and then TX... unless TX already handles
that for us.
ieee80211_work_work() just iterates over all work items, and then bails out.
The work loop is protected against local->mtx, and if we call work_work
when we either add new work, purge work, or hit a timer. We *try* to prevent
frames from being sent on the home channel by calling
ieee80211_offchannel_stop_station() but notice how we only stop the queues
for NL80211_IFTYPE_STATION interfaces.
Also this seems buggy, we do not take into consideration how much offchannel
work we are doing in consideration against the current AP's DTIM interval as
we do when going offchannel for scan work. We should merge that code for
this offchannel work_work loop.
Luis
On Tue, 2010-10-05 at 12:36 -0700, Luis R. Rodriguez wrote:
> > Come to think of it -- some p2p stuff in the supplicant might time out
> > if we do this,
>
> Its no different than the case of a DTIM 1 case when we do bgscan
> and need to scan through some passive scan channels.
>
> > I think we need to cut the drivers/mac80211 some more
> > slack in the supplicant and not assume the remain-on-channel will start
> > quickly.
>
> Not sure I follow, do you mean that as per API we want to document
> offchannel operation may loose broadcast/multicast data frames?
Never mind ... it's not really related to this issue much. (but no, I
meant that the offchannel operation might not start quickly)
johannes