Hi Sujith and list(s),
we have found a regression in the IBSS creation/joining part of mac80211 which
is appearently connected to the TSF-syncing patches introduced last year[1].
It prevents beaconing of an adhoc member after rejoining a cell when this cell
is currently empty. The problem is present in at least 3.10 and 3.13.
To reproduce, use two adhoc peers and let them join/leave in the following
order:
station 1: join
station 2: join
station 2: leave
station 1: leave
station 1: join
now we would expect that station 1 sends beacons, but it doesn't. After
inspecting the code, station 1 actually selected the "old" ibss network and
waits for a beacon to sync the tsf which is never received, as all members
already left the network. An easy workaround is to set the IBSS creator always
to true.
Since this kind of "race condition" could be solved in various ways, e.g. a
timeout in ibss code, timeout in ath9k, ... i'd like to hear your opinions or
ideas how to fix it.
Thanks,
Simon
[1] mac80211: Notify new IBSS network creation
(c13a765bd96f4e2f52d218ee6e5c0715380eeeb8)
ath9k: Fix IBSS joiner mode (1a6404a1d8497692f31808319d662c739033c491)
[2] actual commands I used:
iw dev wlan0 ibss join "rejointest" 5180 HT40+ fixed-freq 02:de:ad:be:ee:ef
iw dev wlan0 ibss leave
Simon Wunderlich wrote:
> Yeah, this patch works for my case. I'm not completely sure why we only unlink
> for this special case (no stations & bssid = zero), I don't see why it would
> hurt to always throw away that BSS and rescan on the next join?
>
> I'm CCing Teemu, who introduced this roughly 3.5 years ago ("mac80211: remove
> BSS from cfg80211 list when leaving IBSS",
> 5ea096c0c85e80335889539899af9a4717976e0b) , maybe he can explain it more. I
> couldn't understand that from the commit message and the corresponding mail
> thread.
I think we can drop the BSS when we leave an IBSS network. The only advantage
of not doing so is a slightly faster join command that is given after
leaving - if there are more than 2 nodes in the cell, that is. But it would
be prudent to scan and check if the network exists before trying to join it,
so I'll send the patch for review.
Sujith
On 28/01/14 02:41, Sujith Manoharan wrote:
> Simon Wunderlich wrote:
>> Yeah, this patch works for my case. I'm not completely sure why we only unlink
>> for this special case (no stations & bssid = zero), I don't see why it would
>> hurt to always throw away that BSS and rescan on the next join?
>>
>> I'm CCing Teemu, who introduced this roughly 3.5 years ago ("mac80211: remove
>> BSS from cfg80211 list when leaving IBSS",
>> 5ea096c0c85e80335889539899af9a4717976e0b) , maybe he can explain it more. I
>> couldn't understand that from the commit message and the corresponding mail
>> thread.
>
> I think we can drop the BSS when we leave an IBSS network. The only advantage
> of not doing so is a slightly faster join command that is given after
> leaving - if there are more than 2 nodes in the cell, that is. But it would
> be prudent to scan and check if the network exists before trying to join it,
> so I'll send the patch for review.
Right now no scan takes place if you specify bssid and freq (+
fixed_freq attr). Then no matter if the BSS is already the scan list:
the join command will be fast.
Cheers,
--
Antonio Quartulli
> I'm CCing Teemu, who introduced this roughly 3.5 years ago ("mac80211:
> remove BSS from cfg80211 list when leaving IBSS",
> 5ea096c0c85e80335889539899af9a4717976e0b) , maybe he can explain it more.
> I couldn't understand that from the commit message and the corresponding
> mail thread.
>
> If we don't hear anything or there aren't any further objections, I think
> we can clean this patch and merge it. :)
Nokias mailserver just told me that Teemus e-mail address isn't registered, so
I guess we shouldn't wait for him. :)
Hello Sujith,
> Simon Wunderlich wrote:
> > we have found a regression in the IBSS creation/joining part of mac80211
> > which is appearently connected to the TSF-syncing patches introduced
> > last year[1]. It prevents beaconing of an adhoc member after rejoining a
> > cell when this cell is currently empty. The problem is present in at
> > least 3.10 and 3.13.
> >
> > To reproduce, use two adhoc peers and let them join/leave in the
> > following order:
> >
> > station 1: join
> > station 2: join
> > station 2: leave
> > station 1: leave
> > station 1: join
> >
> > now we would expect that station 1 sends beacons, but it doesn't. After
> > inspecting the code, station 1 actually selected the "old" ibss network
> > and waits for a beacon to sync the tsf which is never received, as all
> > members already left the network. An easy workaround is to set the IBSS
> > creator always to true.
>
> The race condition is that station-1 (the creator) removes station-2 only
> after a while, based on the expiration/inactive timer.
>
> The small window that IEEE80211_IBSS_MERGE_INTERVAL introduces when
> ieee80211_ibss_disconnect() is called causes the race, since we assume that
> station-2 is still active and do not remove the BSS from cfg80211.
>
> I am not sure why we have to keep the BSS around when we are leaving the
> network.
>
> Is this patch the right approach ?
Thanks for the prompt answer!
Yeah, this patch works for my case. I'm not completely sure why we only unlink
for this special case (no stations & bssid = zero), I don't see why it would
hurt to always throw away that BSS and rescan on the next join?
I'm CCing Teemu, who introduced this roughly 3.5 years ago ("mac80211: remove
BSS from cfg80211 list when leaving IBSS",
5ea096c0c85e80335889539899af9a4717976e0b) , maybe he can explain it more. I
couldn't understand that from the commit message and the corresponding mail
thread.
If we don't hear anything or there aren't any further objections, I think we
can clean this patch and merge it. :)
Thanks a lot!
Simon
>
> diff --git a/net/mac80211/ibss.c b/net/mac80211/ibss.c
> index 771080e..e1688cd 100644
> --- a/net/mac80211/ibss.c
> +++ b/net/mac80211/ibss.c
> @@ -688,17 +688,18 @@ static int ieee80211_sta_active_ibss(struct
> ieee80211_sub_if_data *sdata) return active;
> }
>
> -static void ieee80211_ibss_disconnect(struct ieee80211_sub_if_data *sdata)
> +static void ieee80211_ibss_disconnect(struct ieee80211_sub_if_data *sdata,
> bool leave) {
> struct ieee80211_if_ibss *ifibss = &sdata->u.ibss;
> struct ieee80211_local *local = sdata->local;
> struct cfg80211_bss *cbss;
> struct beacon_data *presp;
> struct sta_info *sta;
> - int active_ibss;
> + int active_ibss = 0;
> u16 capability;
>
> - active_ibss = ieee80211_sta_active_ibss(sdata);
> + if (!leave)
> + active_ibss = ieee80211_sta_active_ibss(sdata);
>
> if (!active_ibss && !is_zero_ether_addr(ifibss->bssid)) {
> capability = WLAN_CAPABILITY_IBSS;
> @@ -765,7 +766,7 @@ static void ieee80211_csa_connection_drop_work(struct
> work_struct *work)
>
> sdata_lock(sdata);
>
> - ieee80211_ibss_disconnect(sdata);
> + ieee80211_ibss_disconnect(sdata, false);
> synchronize_rcu();
> skb_queue_purge(&sdata->skb_queue);
>
> @@ -1721,7 +1722,7 @@ int ieee80211_ibss_leave(struct ieee80211_sub_if_data
> *sdata) {
> struct ieee80211_if_ibss *ifibss = &sdata->u.ibss;
>
> - ieee80211_ibss_disconnect(sdata);
> + ieee80211_ibss_disconnect(sdata, true);
> ifibss->ssid_len = 0;
> memset(ifibss->bssid, 0, ETH_ALEN);
>
>
> Sujith
Simon Wunderlich wrote:
> we have found a regression in the IBSS creation/joining part of mac80211 which
> is appearently connected to the TSF-syncing patches introduced last year[1].
> It prevents beaconing of an adhoc member after rejoining a cell when this cell
> is currently empty. The problem is present in at least 3.10 and 3.13.
>
> To reproduce, use two adhoc peers and let them join/leave in the following
> order:
>
> station 1: join
> station 2: join
> station 2: leave
> station 1: leave
> station 1: join
>
> now we would expect that station 1 sends beacons, but it doesn't. After
> inspecting the code, station 1 actually selected the "old" ibss network and
> waits for a beacon to sync the tsf which is never received, as all members
> already left the network. An easy workaround is to set the IBSS creator always
> to true.
The race condition is that station-1 (the creator) removes station-2 only
after a while, based on the expiration/inactive timer.
The small window that IEEE80211_IBSS_MERGE_INTERVAL introduces when
ieee80211_ibss_disconnect() is called causes the race, since we assume that
station-2 is still active and do not remove the BSS from cfg80211.
I am not sure why we have to keep the BSS around when we are leaving the network.
Is this patch the right approach ?
diff --git a/net/mac80211/ibss.c b/net/mac80211/ibss.c
index 771080e..e1688cd 100644
--- a/net/mac80211/ibss.c
+++ b/net/mac80211/ibss.c
@@ -688,17 +688,18 @@ static int ieee80211_sta_active_ibss(struct ieee80211_sub_if_data *sdata)
return active;
}
-static void ieee80211_ibss_disconnect(struct ieee80211_sub_if_data *sdata)
+static void ieee80211_ibss_disconnect(struct ieee80211_sub_if_data *sdata, bool leave)
{
struct ieee80211_if_ibss *ifibss = &sdata->u.ibss;
struct ieee80211_local *local = sdata->local;
struct cfg80211_bss *cbss;
struct beacon_data *presp;
struct sta_info *sta;
- int active_ibss;
+ int active_ibss = 0;
u16 capability;
- active_ibss = ieee80211_sta_active_ibss(sdata);
+ if (!leave)
+ active_ibss = ieee80211_sta_active_ibss(sdata);
if (!active_ibss && !is_zero_ether_addr(ifibss->bssid)) {
capability = WLAN_CAPABILITY_IBSS;
@@ -765,7 +766,7 @@ static void ieee80211_csa_connection_drop_work(struct work_struct *work)
sdata_lock(sdata);
- ieee80211_ibss_disconnect(sdata);
+ ieee80211_ibss_disconnect(sdata, false);
synchronize_rcu();
skb_queue_purge(&sdata->skb_queue);
@@ -1721,7 +1722,7 @@ int ieee80211_ibss_leave(struct ieee80211_sub_if_data *sdata)
{
struct ieee80211_if_ibss *ifibss = &sdata->u.ibss;
- ieee80211_ibss_disconnect(sdata);
+ ieee80211_ibss_disconnect(sdata, true);
ifibss->ssid_len = 0;
memset(ifibss->bssid, 0, ETH_ALEN);
Sujith