When the mac80211 layer is used with ath10k, the following may happen:
a) radio card firmware receives ADDBA-REQ from peer
b) radio sends back ADDBA-RESP to peer and signals the ath10k driver
c) ath10k calls back ieee80211_manage_rx_ba_offl() in mac80211
to signal rx ba offloading
d) mac80211::agg-rx.c::ieee80211_manage_rx_ba_offl()
d1) sets a flag: sta->ampdu_mlme.tid_rx_manage_offl
d2) queues a call to ht.c::ieee80211_ba_session_work()
e) ...scheduler runs...
f) ht.c::ieee80211_ba_session_work() checks the flag, clears it
and sets up the rx ba session.
During (e), a fast peer may already have sent a BAREQ which is
propagated to rx.c::ieee80211_rx_h_ctrl(). Since the session is not
yet established, mac80211 sends back a DELBA to the peer, which can
hang the BA session.
The phenomenon can be observed between two QCA988X fw 10.2.4 radios,
using a loop of associate/arping from client to AP/disconnect. After
a few thousand loops, arping does not get a response and a sniffer
detects a DELBA action frame from the client, following an ADDBA.
Fix:
1) check the offload flag in addition to the check for a valid
aggregation session
2) protect the paired checks of (offload flag, valid aggregation session)
with a spinlock against interference from ieee80211_ba_session_work().
Note 1: there is another dubious DELBA generation in
ieee80211_rx_reorder_ampdu(), where the same kind of fix should fit,
but I did not fix it since I knew no easy way to test.
Note 2: this fix was tested against a wireless backport from 5.4-rc8.
Signed-off-by: Jean-Pierre Tosoni <[email protected]>
---
V2: remove debugging code leftovers, sorry for that
V3: use spin_lock_bh instead of a mutex
V4: spinlock must protect ___ieee80211_start_rx_ba_session instead of
___ieee80211_stop_rx_ba_session
Index: bp/net/mac80211/rx.c
===================================================================
--- bp.orig/net/mac80211/rx.c
+++ bp/net/mac80211/rx.c
@@ -3085,11 +3085,18 @@ ieee80211_rx_h_ctrl(struct ieee80211_rx_
tid = le16_to_cpu(bar_data.control) >> 12;
+ spin_lock_bh(&rx->sta->ampdu_mlme.rx_offl_lock);
if (!test_bit(tid, rx->sta->ampdu_mlme.agg_session_valid) &&
- !test_and_set_bit(tid, rx->sta->ampdu_mlme.unexpected_agg))
+ /* back_req is allowed if the fw just received addba */
+ !test_bit(tid, rx->sta->ampdu_mlme.tid_rx_manage_offl) &&
+ !test_and_set_bit(tid, rx->sta->ampdu_mlme.unexpected_agg)) {
+ spin_unlock_bh(&rx->sta->ampdu_mlme.rx_offl_lock);
ieee80211_send_delba(rx->sdata, rx->sta->sta.addr, tid,
WLAN_BACK_RECIPIENT,
WLAN_REASON_QSTA_REQUIRE_SETUP);
+ } else {
+ spin_unlock_bh(&rx->sta->ampdu_mlme.rx_offl_lock);
+ }
tid_agg_rx = rcu_dereference(rx->sta->ampdu_mlme.tid_rx[tid]);
if (!tid_agg_rx)
Index: bp/net/mac80211/ht.c
===================================================================
--- bp.orig/net/mac80211/ht.c
+++ bp/net/mac80211/ht.c
@@ -360,12 +360,14 @@ void ieee80211_ba_session_work(struct wo
sta, tid, WLAN_BACK_RECIPIENT,
WLAN_REASON_UNSPECIFIED, true);
+ spin_lock_bh(&sta->ampdu_mlme.rx_offl_lock);
if (!blocked &&
test_and_clear_bit(tid,
sta->ampdu_mlme.tid_rx_manage_offl))
___ieee80211_start_rx_ba_session(sta, 0, 0, 0, 1, tid,
IEEE80211_MAX_AMPDU_BUF_HT,
false, true, NULL);
+ spin_unlock_bh(&sta->ampdu_mlme.rx_offl_lock);
if (test_and_clear_bit(tid + IEEE80211_NUM_TIDS,
sta->ampdu_mlme.tid_rx_manage_offl))
Index: bp/net/mac80211/sta_info.c
===================================================================
--- bp.orig/net/mac80211/sta_info.c
+++ bp/net/mac80211/sta_info.c
@@ -354,6 +354,7 @@ struct sta_info *sta_info_alloc(struct i
spin_lock_init(&sta->lock);
spin_lock_init(&sta->ps_lock);
+ spin_lock_init(&sta->ampdu_mlme.rx_offl_lock);
INIT_WORK(&sta->drv_deliver_wk, sta_deliver_ps_frames);
INIT_WORK(&sta->ampdu_mlme.work, ieee80211_ba_session_work);
mutex_init(&sta->ampdu_mlme.mtx);
Index: bp/net/mac80211/sta_info.h
===================================================================
--- bp.orig/net/mac80211/sta_info.h
+++ bp/net/mac80211/sta_info.h
@@ -266,6 +266,7 @@ struct tid_ampdu_rx {
* @mtx: mutex to protect all TX data (except non-NULL assignments
* to tid_tx[idx], which are protected by the sta spinlock)
* tid_start_tx is also protected by sta->lock.
+ * @rx_offl_lock: protects transfer from tid_rx_manage_offl to agg_session_valid
* @tid_rx: aggregation info for Rx per TID -- RCU protected
* @tid_rx_token: dialog tokens for valid aggregation sessions
* @tid_rx_timer_expired: bitmap indicating on which TIDs the
@@ -287,6 +288,7 @@ struct tid_ampdu_rx {
struct sta_ampdu_mlme {
struct mutex mtx;
/* rx */
+ spinlock_t rx_offl_lock;
struct tid_ampdu_rx __rcu *tid_rx[IEEE80211_NUM_TIDS];
u8 tid_rx_token[IEEE80211_NUM_TIDS];
unsigned long tid_rx_timer_expired[BITS_TO_LONGS(IEEE80211_NUM_TIDS)];
--
quilt 0.63
On Thu, 2021-10-28 at 10:25 +0000, Jean-Pierre TOSONI wrote:
> + spin_lock_bh(&rx->sta->ampdu_mlme.rx_offl_lock);
> if (!test_bit(tid, rx->sta->ampdu_mlme.agg_session_valid) &&
> - !test_and_set_bit(tid, rx->sta->ampdu_mlme.unexpected_agg))
> + /* back_req is allowed if the fw just received addba */
>
If we're going to add an unconditional lock here, I see little reason to
have all the test_bit()/test_and_set_bit() etc.
I really don't think it's _right_ to add an unconditional lock here
though, if it can at all be avoided.
johannes
> -----Message d'origine-----
> De : Johannes Berg <[email protected]>
> Envoyé : vendredi 26 novembre 2021 13:05
> À : Jean-Pierre TOSONI <[email protected]>; 'linux-
> [email protected]' <[email protected]>
> Objet : Re: [RFC v4] mac80211: fix rx blockack session race condition
>
> On Thu, 2021-10-28 at 10:25 +0000, Jean-Pierre TOSONI wrote:
>
> > + spin_lock_bh(&rx->sta->ampdu_mlme.rx_offl_lock);
> > if (!test_bit(tid, rx->sta->ampdu_mlme.agg_session_valid)
> &&
> > - !test_and_set_bit(tid, rx->sta-
> >ampdu_mlme.unexpected_agg))
> > + /* back_req is allowed if the fw just received addba */
> >
>
> If we're going to add an unconditional lock here, I see little reason to
> have all the test_bit()/test_and_set_bit() etc.
The lock specifically protects the successive testing of agg_session_valid
and tid_rx_manage_offl. So I could just isolate the agg_session_valid
condition as major condition then take the spinlock then test&set
agg_session_valid and tid_rx_manage_offl as a minor condition, so the
spinlock would be conditional. But see below.
>
> I really don't think it's _right_ to add an unconditional lock here
> though, if it can at all be avoided.
>
> johannes
After extensive testing, adding this lock was effective but only removed
half of the DELBA frames wrongly sent. I suspect there is another race
condition somewhere else.
Thinking about it, here the DELBA is sent because the ath10k driver
passes a BAREQ before the signaling the establishment of the BA session.
either the firmware sent the two events in wrong order, or there was
actually an inversion in the frames sent by the host (in my testing,
Wireshark shows the first case only).
In the first case, we should not send a DELBA at all, because actually,
the session is correctly set up. The second case should be detected
and acted upon by ath10k itself since it was responsible for acking the
ADDBA; hence mac80211 should not send a -duplicate- DELBA either.
So, I completely removed this call to ieee80211_send_delba() when
working with ath10k, and my RX BA sessions do not break anymore.
(I added my own hardware flag IEEE80211_HW_RX_AMPDU_IN_HW
similar to TX)
With my hardware it works, but I have no idea how to identify
chipsets/drivers/firmwares that will handle this correctly. So I
cannot submit a generic patch.
Please close this request, I will stay with my proprietary fix. Hopefully
someone with more ath10k/firmware knowledge will find a complete
way to fix.
Thanks for the review,
Jean-Pierre