2015-03-05 17:07:24

by Johannes Berg

[permalink] [raw]
Subject: [RFC v2] [RFC] mac80211: lock rate control

From: Johannes Berg <[email protected]>

Both minstrel (reported by Sven Eckelmann) and the iwlwifi rate
control aren't properly taking concurrency into account. It's
likely that the same is true for other rate control algorithms.

In the case of minstrel this manifests itself in crashes when an
update and other data access are run concurrently, for example
when the stations change bandwidth or similar. In iwlwifi, this
can cause firmware crashes.

Since fixing all rate control algorithms will be very difficult,
just provide locking for invocations. This protects the internal
data structures the algorithms maintain.

I've manipulated hostapd to test this, by having it change its
advertised bandwidth roughly ever 150ms. At the same time, I'm
running a flood ping between the client and the AP, which causes
this race of update vs. get_rate/status to easily happen on the
client. With this change, the system survives this test.

Change-Id: I77383401a67a1ec380cec65b34802ab879357e80
Reported-by: Sven Eckelmann <[email protected]>
Signed-off-by: Johannes Berg <[email protected]>
---
net/mac80211/rate.c | 8 +++++++-
net/mac80211/rate.h | 12 +++++++++---
net/mac80211/sta_info.c | 2 +-
net/mac80211/sta_info.h | 1 +
4 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/net/mac80211/rate.c b/net/mac80211/rate.c
index bc5270ed26bd..68bed2cad6f3 100644
--- a/net/mac80211/rate.c
+++ b/net/mac80211/rate.c
@@ -683,7 +683,13 @@ void rate_control_get_rate(struct ieee80211_sub_if_data *sdata,
if (sdata->local->hw.flags & IEEE80211_HW_HAS_RATE_CONTROL)
return;

- ref->ops->get_rate(ref->priv, ista, priv_sta, txrc);
+ if (ista) {
+ spin_lock_bh(&sta->rate_ctrl_lock);
+ ref->ops->get_rate(ref->priv, ista, priv_sta, txrc);
+ spin_unlock_bh(&sta->rate_ctrl_lock);
+ } else {
+ ref->ops->get_rate(ref->priv, NULL, NULL, txrc);
+ }

if (sdata->local->hw.flags & IEEE80211_HW_SUPPORTS_RC_TABLE)
return;
diff --git a/net/mac80211/rate.h b/net/mac80211/rate.h
index a7d5439322d7..d0ecc8952fa0 100644
--- a/net/mac80211/rate.h
+++ b/net/mac80211/rate.h
@@ -42,10 +42,12 @@ static inline void rate_control_tx_status(struct ieee80211_local *local,
if (!ref || !test_sta_flag(sta, WLAN_STA_RATE_CONTROL))
return;

+ spin_lock_bh(&sta->rate_ctrl_lock);
if (ref->ops->tx_status)
ref->ops->tx_status(ref->priv, sband, ista, priv_sta, skb);
else
ref->ops->tx_status_noskb(ref->priv, sband, ista, priv_sta, info);
+ spin_unlock_bh(&sta->rate_ctrl_lock);
}

static inline void
@@ -64,7 +66,9 @@ rate_control_tx_status_noskb(struct ieee80211_local *local,
if (WARN_ON_ONCE(!ref->ops->tx_status_noskb))
return;

+ spin_lock_bh(&sta->rate_ctrl_lock);
ref->ops->tx_status_noskb(ref->priv, sband, ista, priv_sta, info);
+ spin_unlock_bh(&sta->rate_ctrl_lock);
}

static inline void rate_control_rate_init(struct sta_info *sta)
@@ -115,18 +119,20 @@ static inline void rate_control_rate_update(struct ieee80211_local *local,
return;
}

+ spin_lock_bh(&sta->rate_ctrl_lock);
ref->ops->rate_update(ref->priv, sband, &chanctx_conf->def,
ista, priv_sta, changed);
+ spin_unlock_bh(&sta->rate_ctrl_lock);
rcu_read_unlock();
}
drv_sta_rc_update(local, sta->sdata, &sta->sta, changed);
}

static inline void *rate_control_alloc_sta(struct rate_control_ref *ref,
- struct ieee80211_sta *sta,
- gfp_t gfp)
+ struct sta_info *sta, gfp_t gfp)
{
- return ref->ops->alloc_sta(ref->priv, sta, gfp);
+ spin_lock_init(&sta->rate_ctrl_lock);
+ return ref->ops->alloc_sta(ref->priv, &sta->sta, gfp);
}

static inline void rate_control_free_sta(struct sta_info *sta)
diff --git a/net/mac80211/sta_info.c b/net/mac80211/sta_info.c
index 11c1e71c833d..eab8d8afab13 100644
--- a/net/mac80211/sta_info.c
+++ b/net/mac80211/sta_info.c
@@ -292,7 +292,7 @@ static int sta_prepare_rate_control(struct ieee80211_local *local,

sta->rate_ctrl = local->rate_ctrl;
sta->rate_ctrl_priv = rate_control_alloc_sta(sta->rate_ctrl,
- &sta->sta, gfp);
+ sta, gfp);
if (!sta->rate_ctrl_priv)
return -ENOMEM;

diff --git a/net/mac80211/sta_info.h b/net/mac80211/sta_info.h
index 36a14a0be6dd..b53c6ca3179a 100644
--- a/net/mac80211/sta_info.h
+++ b/net/mac80211/sta_info.h
@@ -396,6 +396,7 @@ struct sta_info {
u8 ptk_idx;
struct rate_control_ref *rate_ctrl;
void *rate_ctrl_priv;
+ spinlock_t rate_ctrl_lock;
spinlock_t lock;

struct work_struct drv_deliver_wk;
--
2.1.4



2015-03-06 10:55:51

by Johannes Berg

[permalink] [raw]
Subject: Re: [RFC v2] [RFC] mac80211: lock rate control

On Fri, 2015-03-06 at 11:47 +0100, Sven Eckelmann wrote:

> The function rate_control_rate_init is called by the IBSS code (in my OpenWrt
> compat-wireless 2014-11-04 snapshot) in ieee80211_ibss_finish_sta and
> ieee80211_rx_bss_info after an rates_updated. At least the latter one sounds
> suspicious.

Yeah, I forgot IBSS is doing strange things here.

> I will just add following modification to your patch and ask for a new test.

Let me know how that goes :)

johannes


2015-03-06 10:47:53

by Sven Eckelmann

[permalink] [raw]
Subject: Re: [RFC v2] [RFC] mac80211: lock rate control

On Thursday 05 March 2015 18:07:17 Johannes Berg wrote:
> From: Johannes Berg <[email protected]>
>
> Both minstrel (reported by Sven Eckelmann) and the iwlwifi rate
> control aren't properly taking concurrency into account. It's
> likely that the same is true for other rate control algorithms.
>
> In the case of minstrel this manifests itself in crashes when an
> update and other data access are run concurrently, for example
> when the stations change bandwidth or similar. In iwlwifi, this
> can cause firmware crashes.
>
> Since fixing all rate control algorithms will be very difficult,
> just provide locking for invocations. This protects the internal
> data structures the algorithms maintain.
>
> I've manipulated hostapd to test this, by having it change its
> advertised bandwidth roughly ever 150ms. At the same time, I'm
> running a flood ping between the client and the AP, which causes
> this race of update vs. get_rate/status to easily happen on the
> client. With this change, the system survives this test.
>
> Change-Id: I77383401a67a1ec380cec65b34802ab879357e80
> Reported-by: Sven Eckelmann <[email protected]>
> Signed-off-by: Johannes Berg <[email protected]>
> ---

Thanks for the second patch. I got a message today that the patched wifi
driver was tested at least on one network and it still crashes. You can see
the attached logs but they are mostly the same as before - just with some
addresses changed due to the patch.

I've looked through the code and didn't find a situation where the tx_status
function pointer is called without the lock. Same for the rate_update. But
there is something in minstrel_ht which you most likely don't have in the
iwlwifi code:

minstrel_ht_update_caps is called by rate_update and rate_init

My hypothesis would be that rate_init is called for a sta during tx_status and
this causes one of the crashes. This sounds a little bit weird but at least
one thing that can be checked. I am not sure in which state the device is but
it is most likely running AP vifs and one IBSS vif.

The function rate_control_rate_init is called by the IBSS code (in my OpenWrt
compat-wireless 2014-11-04 snapshot) in ieee80211_ibss_finish_sta and
ieee80211_rx_bss_info after an rates_updated. At least the latter one sounds
suspicious.

I will just add following modification to your patch and ask for a new test.
But feel free to point out any problem in my hypothesis.


@@ -69,8 +71,10 @@ static inline void rate_control_rate_ini

sband = local->hw.wiphy->bands[chanctx_conf->def.chan->band];

+ spin_lock_bh(&sta->rate_ctrl_lock);
ref->ops->rate_init(ref->priv, sband, &chanctx_conf->def, ista,
priv_sta);
+ spin_unlock_bh(&sta->rate_ctrl_lock);
rcu_read_unlock();
set_sta_flag(sta, WLAN_STA_RATE_CONTROL);
}

Kind regards,
Sven


Attachments:
minstel_crash.txt (8.86 kB)

2015-03-09 08:35:51

by Johannes Berg

[permalink] [raw]
Subject: Re: [RFC v2] [RFC] mac80211: lock rate control

On Mon, 2015-03-09 at 09:32 +0100, Sven Eckelmann wrote:
> On Friday 06 March 2015 11:55:48 Johannes Berg wrote:
> > > I will just add following modification to your patch and ask for a new
> > > test.
> >
> > Let me know how that goes :)
>
> I got a reply and they informed me that the test looked promising. The test
> setup didn't crash and was running stable for hours. It looks like the extra
> locking in rate_control_rate_init around ref->ops->rate_init was the missing
> part in your patch that was necessary fix the problem for them.

Great, thanks for testing.

johannes


2015-03-09 08:32:13

by Sven Eckelmann

[permalink] [raw]
Subject: Re: [RFC v2] [RFC] mac80211: lock rate control

On Friday 06 March 2015 11:55:48 Johannes Berg wrote:
> > I will just add following modification to your patch and ask for a new
> > test.
>
> Let me know how that goes :)

I got a reply and they informed me that the test looked promising. The test
setup didn't crash and was running stable for hours. It looks like the extra
locking in rate_control_rate_init around ref->ops->rate_init was the missing
part in your patch that was necessary fix the problem for them.

Kind regards,
Sven