2018-09-05 14:52:53

by Bob Copeland

Subject: [PATCH] mac80211: fix pending queue hang due to TX_DROP

In our environment running lots of mesh nodes, we are seeing the
pending queue hang periodically, with the debugfs queues file showing
lines such as:

00: 0x00000000/348

i.e. there are a large number of frames but no stop reason set.

One way this could happen is if queue processing from the pending
tasklet exited early without processing all frames, and without having
some future event (incoming frame, stop reason flag, ...) to reschedule
it.

Exactly this can occur today if ieee80211_tx() returns false due to
packet drops or power-save buffering in the tx handlers. In the
past, this function would return true in such cases, and the change
to false doesn't seem to be intentional. Fix this case by reverting
to the previous behavior.

Fixes: bb42f2d13ffc ("mac80211: Move reorder-sensitive TX handlers to after TXQ dequeue")
Signed-off-by: Bob Copeland <[email protected]>
---
 net/mac80211/tx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
index e88547842239..6b83dc397c3e 100644
--- a/net/mac80211/tx.c
+++ b/net/mac80211/tx.c
@@ -1907,7 +1907,7 @@ static bool ieee80211_tx(struct ieee80211_sub_if_data *sdata,
 			sdata->vif.hw_queue[skb_get_queue_mapping(skb)];

 	if (invoke_tx_handlers_early(&tx))
-		return false;
+		return true;

 	if (ieee80211_queue_skb(local, sdata, tx.sta, tx.skb))
 		return true;
--
2.11.0
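
The failure mode described in the commit message can be illustrated with a
small user-space model. This is only a sketch under stated assumptions, not
the mac80211 code: enum verdict, model_tx() and drain_pending() are
hypothetical names, and the loop simply assumes that a false return from the
transmit path makes the pending-queue run stop early, as the commit message
describes.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; these are not kernel types. */
enum verdict { SEND_OK, HANDLER_DROP };

/*
 * Models the return value of the transmit path for one frame.
 * A frame dropped (or buffered) by a handler is consumed either way;
 * drop_returns_true selects the behaviour after the fix (true) or
 * before it (false).
 */
bool model_tx(enum verdict v, bool drop_returns_true)
{
	if (v == HANDLER_DROP)
		return drop_returns_true;
	return true;
}

/*
 * Models a single run of the pending-queue loop over n frames.
 * Returns the number of frames still queued when the run ends.
 */
int drain_pending(const enum verdict *frames, int n, bool drop_returns_true)
{
	int i;

	for (i = 0; i < n; i++) {
		if (!model_tx(frames[i], drop_returns_true)) {
			i++;	/* the dropped frame itself is gone */
			break;	/* but the run stops, stranding the rest */
		}
	}
	return n - i;
}
```

In this model, a queue of { SEND_OK, HANDLER_DROP, SEND_OK, SEND_OK } leaves
2 frames behind when a drop returns false and 0 frames when it returns true.
Nothing in the first case sets a stop reason, so nothing reschedules the run,
which matches the "frames queued, no stop reason" symptom above.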


2018-09-05 15:21:53

by Toke Høiland-Jørgensen

Subject: Re: [PATCH] mac80211: fix pending queue hang due to TX_DROP

Bob Copeland <[email protected]> writes:

> In our environment running lots of mesh nodes, we are seeing the
> pending queue hang periodically, with the debugfs queues file showing
> lines such as:
>
> 00: 0x00000000/348
>
> i.e. there are a large number of frames but no stop reason set.
>
> One way this could happen is if queue processing from the pending
> tasklet exited early without processing all frames, and without having
> some future event (incoming frame, stop reason flag, ...) to reschedule
> it.
>
> Exactly this can occur today if ieee80211_tx() returns false due to
> packet drops or power-save buffering in the tx handlers. In the
> past, this function would return true in such cases, and the change
> to false doesn't seem to be intentional.

Can confirm that this was not intentional; nice catch! :)

Acked-by: Toke Høiland-Jørgensen <[email protected]>

-Toke