2023-08-01 18:06:48

by Sven Eckelmann

[permalink] [raw]
Subject: [PATCH RFC] ath11k: Don't drop tx_status when peer cannot be found

When a station idles for a long time, hostapd will try to send a QoS Null
frame to the station as "poll". NL80211_CMD_PROBE_CLIENT is used for this
purpose. And the skb will be added to ack_status_frame - waiting for a
tx_complete via ieee80211_tx_status*();

But when the peer was already removed before the tx_complete arrives, the
peer will be missing and thus the entry will not be removed from
ack_status_frame. This IDR will therefore run full after 8K clients which
disappeared this way - the access point will then just stall and not allow
any new clients because idr_alloc for ack_status_frame will fail.

Tested-on: IPQ6018 hw1.0 WLAN.HK.2.5.0.1-01100-QCAHKSWPL_SILICONZ-1

Fixes: 6257c702264c ("wifi: ath11k: fix tx status reporting in encap offload mode")
Fixes: 94739d45c388 ("ath11k: switch to using ieee80211_tx_status_ext()")
Signed-off-by: Sven Eckelmann <[email protected]>
---
This problem can be seen with QCA's ath11k fork as:

attach ack fail -28

when new clients try to connect - and connection attempt will obviously
fail. Most likely with an "deauthenticated due to inactivity (timer
DEAUTH/REMOVE)" by hostapd.

And the fix (required for both platches) would then be something like:

--- a/drivers/net/wireless/ath/ath11k/dp_tx.c
+++ b/drivers/net/wireless/ath/ath11k/dp_tx.c
@@ -629,8 +629,14 @@ static void ath11k_dp_tx_complete_msdu(struct ath11k *ar,
"dp_tx: failed to find the peer with peer_id %d\n",
ts->peer_id);
spin_unlock_bh(&ab->base_lock);
- dev_kfree_skb_any(msdu);
- goto exit;
+ rcu_read_unlock();
+
+ if (skb_cb->flags & ATH11K_SKB_HW_80211_ENCAP)
+ ieee80211_tx_status_8023(ar->hw, skb_cb->vif, msdu);
+ else
+ ieee80211_tx_status(ar->hw, msdu);
+
+ return;
}
arsta = (struct ath11k_sta *)peer->sta->drv_priv;
status.sta = peer->sta;

But this is not possible any longer because Felix Fietkau removed
ieee80211_tx_status_8023 in commit 9ae708f00161 ("wifi: mac80211: remove
ieee80211_tx_status_8023") - and the function ieee80211_lookup_ra_sta
(required for this task) is currently not exported. And the sta information
is required to reach the ieee80211_sta_tx_notify code section in
ieee80211_tx_status_ext()
---
drivers/net/wireless/ath/ath11k/dp_tx.c | 18 ++++++++++++++++--
1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/drivers/net/wireless/ath/ath11k/dp_tx.c b/drivers/net/wireless/ath/ath11k/dp_tx.c
index a34833de7c67..c25283a11d27 100644
--- a/drivers/net/wireless/ath/ath11k/dp_tx.c
+++ b/drivers/net/wireless/ath/ath11k/dp_tx.c
@@ -369,7 +369,14 @@ ath11k_dp_tx_htt_tx_complete_buf(struct ath11k_base *ab,
"dp_tx: failed to find the peer with peer_id %d\n",
ts->peer_id);
spin_unlock_bh(&ab->base_lock);
- dev_kfree_skb_any(msdu);
+
+ /* report back to clean up resources - even when the peer
+ * is not known by ath11k.
+ *
+ * TODO handle also for ATH11K_SKB_HW_80211_ENCAP
+ */
+ if (!(skb_cb->flags & ATH11K_SKB_HW_80211_ENCAP))
+ ieee80211_tx_status(ar->hw, msdu);
return;
}
spin_unlock_bh(&ab->base_lock);
@@ -624,7 +631,14 @@ static void ath11k_dp_tx_complete_msdu(struct ath11k *ar,
"dp_tx: failed to find the peer with peer_id %d\n",
ts->peer_id);
spin_unlock_bh(&ab->base_lock);
- dev_kfree_skb_any(msdu);
+
+ /* report back to clean up resources - even when the peer
+ * is not known by ath11k.
+ *
+ * TODO handle also for ATH11K_SKB_HW_80211_ENCAP
+ */
+ if (!(skb_cb->flags & ATH11K_SKB_HW_80211_ENCAP))
+ ieee80211_tx_status(ar->hw, msdu);
return;
}
arsta = (struct ath11k_sta *)peer->sta->drv_priv;

---
base-commit: 1d7dd5aa35474e553b8671b58579e0749b560779
change-id: 20230801-ath11k-ack_status_leak-2338eda5a721

Best regards,
--
Sven Eckelmann <[email protected]>



2023-08-01 18:39:38

by Felix Fietkau

[permalink] [raw]
Subject: Re: [PATCH RFC] ath11k: Don't drop tx_status when peer cannot be found

On 01.08.23 19:38, Sven Eckelmann wrote:
> When a station idles for a long time, hostapd will try to send a QoS Null
> frame to the station as "poll". NL80211_CMD_PROBE_CLIENT is used for this
> purpose. And the skb will be added to ack_status_frame - waiting for a
> tx_complete via ieee80211_tx_status*();
>
> But when the peer was already removed before the tx_complete arrives, the
> peer will be missing and thus the entry will not be removed from
> ack_status_frame. This IDR will therefore run full after 8K clients which
> disappeared this way - the access point will then just stall and not allow
> any new clients because idr_alloc for ack_status_frame will fail.
>
> Tested-on: IPQ6018 hw1.0 WLAN.HK.2.5.0.1-01100-QCAHKSWPL_SILICONZ-1
>
> Fixes: 6257c702264c ("wifi: ath11k: fix tx status reporting in encap offload mode")
> Fixes: 94739d45c388 ("ath11k: switch to using ieee80211_tx_status_ext()")
> Signed-off-by: Sven Eckelmann <[email protected]>
> ---
> This problem can be seen with QCA's ath11k fork as:
>
> attach ack fail -28
>
> when new clients try to connect - and connection attempt will obviously
> fail. Most likely with an "deauthenticated due to inactivity (timer
> DEAUTH/REMOVE)" by hostapd.
>
> And the fix (required for both platches) would then be something like:
>
> --- a/drivers/net/wireless/ath/ath11k/dp_tx.c
> +++ b/drivers/net/wireless/ath/ath11k/dp_tx.c
> @@ -629,8 +629,14 @@ static void ath11k_dp_tx_complete_msdu(struct ath11k *ar,
> "dp_tx: failed to find the peer with peer_id %d\n",
> ts->peer_id);
> spin_unlock_bh(&ab->base_lock);
> - dev_kfree_skb_any(msdu);
> - goto exit;
> + rcu_read_unlock();
> +
> + if (skb_cb->flags & ATH11K_SKB_HW_80211_ENCAP)
> + ieee80211_tx_status_8023(ar->hw, skb_cb->vif, msdu);
> + else
> + ieee80211_tx_status(ar->hw, msdu);
> +
> + return;
> }
> arsta = (struct ath11k_sta *)peer->sta->drv_priv;
> status.sta = peer->sta;
>
> But this is not possible any longer because Felix Fietkau removed
> ieee80211_tx_status_8023 in commit 9ae708f00161 ("wifi: mac80211: remove
> ieee80211_tx_status_8023") - and the function ieee80211_lookup_ra_sta
> (required for this task) is currently not exported. And the sta information
> is required to reach the ieee80211_sta_tx_notify code section in
> ieee80211_tx_status_ext()

This does not make much sense to me. ieee80211_sta_tx_notify is specific
to interfaces running in client mode, thus unrelated to anything hostapd
is doing. It's a different kind of probing than the one you're looking into.

If the status information is irrelevant to mac80211/hostapd, then there
really is no need to call ieee80211_tx_status* here.

The main bug is the fact that dev_kfree_skb* must not be called for tx
packets passed from mac80211. If you replace it with a call to
ieee80211_free_txskb, the bug goes away.

One more note regarding ieee80211_tx_status_8023 - I removed it not only
because it was unused, but because it should never be used at all. Its
call to ieee80211_lookup_ra_sta is guaranteed to be broken whenever
4-address mode AP_VLAN is being used (since the driver cannot pass the
correct vif).

- Felix

2023-08-01 22:11:55

by Sven Eckelmann

[permalink] [raw]
Subject: Re: [PATCH RFC] ath11k: Don't drop tx_status when peer cannot be found

On Tuesday, 1 August 2023 20:11:51 CEST Felix Fietkau wrote:
[...]
> > when new clients try to connect - and connection attempt will obviously
> > fail. Most likely with an "deauthenticated due to inactivity (timer
> > DEAUTH/REMOVE)" by hostapd.
> >
> > And the fix (required for both platches) would then be something like:
> >
> > --- a/drivers/net/wireless/ath/ath11k/dp_tx.c
> > +++ b/drivers/net/wireless/ath/ath11k/dp_tx.c
> > @@ -629,8 +629,14 @@ static void ath11k_dp_tx_complete_msdu(struct ath11k *ar,
> > "dp_tx: failed to find the peer with peer_id %d\n",
> > ts->peer_id);
> > spin_unlock_bh(&ab->base_lock);
> > - dev_kfree_skb_any(msdu);
> > - goto exit;
> > + rcu_read_unlock();
> > +
> > + if (skb_cb->flags & ATH11K_SKB_HW_80211_ENCAP)
> > + ieee80211_tx_status_8023(ar->hw, skb_cb->vif, msdu);
> > + else
> > + ieee80211_tx_status(ar->hw, msdu);
> > +
> > + return;
> > }
> > arsta = (struct ath11k_sta *)peer->sta->drv_priv;
> > status.sta = peer->sta;
> >
> > But this is not possible any longer because Felix Fietkau removed
> > ieee80211_tx_status_8023 in commit 9ae708f00161 ("wifi: mac80211: remove
> > ieee80211_tx_status_8023") - and the function ieee80211_lookup_ra_sta
> > (required for this task) is currently not exported. And the sta information
> > is required to reach the ieee80211_sta_tx_notify code section in
> > ieee80211_tx_status_ext()
>
> This does not make much sense to me. ieee80211_sta_tx_notify is specific
> to interfaces running in client mode, thus unrelated to anything hostapd
> is doing. It's a different kind of probing than the one you're looking into.

Sorry, copied something to my notes and then mixed up basically everything
after that. Interesting for the fix was only that it reaches
ieee80211_report_ack_skb via ieee80211_report_used_skb. This can either be
done via __ieee80211_tx_status/ieee80211_tx_status_ext (which doesn't have any
dependency to the sta - which I incorrectly said earlier) or via
ieee80211_free_txskb (which I missed earlier)

[...]
> The main bug is the fact that dev_kfree_skb* must not be called for tx
> packets passed from mac80211. If you replace it with a call to
> ieee80211_free_txskb, the bug goes away.

Thanks for the hint. Will submit an actual patch with your recommended
replacement.

Kind regards,
Sven


Attachments:
signature.asc (849.00 B)
This is a digitally signed message part.