In the function ieee80211_tx_dequeue() there is a particular locking
sequence:
begin:
spin_lock(&local->queue_stop_reason_lock);
q_stopped = local->queue_stop_reasons[q];
spin_unlock(&local->queue_stop_reason_lock);
However small the chance (increased by ftracetest), an asynchronous
interrupt can occur in between of spin_lock() and spin_unlock(),
and the interrupt routine will attempt to lock the same
&local->queue_stop_reason_lock again.
This will cause a costly reset of the CPU and the wifi device or an
altogether hang in the single CPU and single core scenario.
This is the probable trace of the deadlock:
Apr 10 00:58:33 marvin-IdeaPad-3-15ITL6 kernel: Possible unsafe locking scenario:
Apr 10 00:58:33 marvin-IdeaPad-3-15ITL6 kernel: CPU0
Apr 10 00:58:33 marvin-IdeaPad-3-15ITL6 kernel: ----
Apr 10 00:58:33 marvin-IdeaPad-3-15ITL6 kernel: lock(&local->queue_stop_reason_lock);
Apr 10 00:58:33 marvin-IdeaPad-3-15ITL6 kernel: <Interrupt>
Apr 10 00:58:33 marvin-IdeaPad-3-15ITL6 kernel: lock(&local->queue_stop_reason_lock);
Apr 10 00:58:33 marvin-IdeaPad-3-15ITL6 kernel:
*** DEADLOCK ***
Fixes: 4444bc2116ae ("wifi: mac80211: Proper mark iTXQs for resumption")
Link: https://lore.kernel.org/all/[email protected]/
Reported-by: Mirsad Goran Todorovac <[email protected]>
Cc: Gregory Greenman <[email protected]>
Cc: Johannes Berg <[email protected]>
Link: https://lore.kernel.org/all/[email protected]/
Cc: David S. Miller <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Jakub Kicinski <[email protected]>
Cc: Paolo Abeni <[email protected]>
Cc: Alexander Wetzel <[email protected]>
Signed-off-by: Mirsad Goran Todorovac <[email protected]>
---
v2 -> v3:
- Fix the Fixes: tag as advised.
- change the net: to wifi: to comply with the original patch that
is being fixed.
v1 -> v2:
- Minor rewording and clarification.
- Cc:-ed people that replied to the original bug report (forgotten
in v1 by omission).
net/mac80211/tx.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
index 7699fb410670..45cb8e7bcc61 100644
--- a/net/mac80211/tx.c
+++ b/net/mac80211/tx.c
@@ -3781,6 +3781,7 @@ struct sk_buff *ieee80211_tx_dequeue(struct ieee80211_hw *hw,
ieee80211_tx_result r;
struct ieee80211_vif *vif = txq->vif;
int q = vif->hw_queue[txq->ac];
+ unsigned long flags;
bool q_stopped;
WARN_ON_ONCE(softirq_count() == 0);
@@ -3789,9 +3790,9 @@ struct sk_buff *ieee80211_tx_dequeue(struct ieee80211_hw *hw,
return NULL;
begin:
- spin_lock(&local->queue_stop_reason_lock);
+ spin_lock_irqsave(&local->queue_stop_reason_lock, flags);
q_stopped = local->queue_stop_reasons[q];
- spin_unlock(&local->queue_stop_reason_lock);
+ spin_unlock_irqrestore(&local->queue_stop_reason_lock, flags);
if (unlikely(q_stopped)) {
/* mark for waking later */
--
2.30.2
On Tue, Apr 25, 2023 at 11:35:48AM +0200, Mirsad Goran Todorovac wrote:
> In the function ieee80211_tx_dequeue() there is a particular locking
> sequence:
>
> begin:
> spin_lock(&local->queue_stop_reason_lock);
> q_stopped = local->queue_stop_reasons[q];
> spin_unlock(&local->queue_stop_reason_lock);
>
> However small the chance (increased by ftracetest), an asynchronous
> interrupt can occur in between of spin_lock() and spin_unlock(),
> and the interrupt routine will attempt to lock the same
> &local->queue_stop_reason_lock again.
>
> This will cause a costly reset of the CPU and the wifi device or an
> altogether hang in the single CPU and single core scenario.
>
> This is the probable trace of the deadlock:
>
> Apr 10 00:58:33 marvin-IdeaPad-3-15ITL6 kernel: Possible unsafe locking scenario:
> Apr 10 00:58:33 marvin-IdeaPad-3-15ITL6 kernel: CPU0
> Apr 10 00:58:33 marvin-IdeaPad-3-15ITL6 kernel: ----
> Apr 10 00:58:33 marvin-IdeaPad-3-15ITL6 kernel: lock(&local->queue_stop_reason_lock);
> Apr 10 00:58:33 marvin-IdeaPad-3-15ITL6 kernel: <Interrupt>
> Apr 10 00:58:33 marvin-IdeaPad-3-15ITL6 kernel: lock(&local->queue_stop_reason_lock);
> Apr 10 00:58:33 marvin-IdeaPad-3-15ITL6 kernel:
> *** DEADLOCK ***
Can you please add to the commit message whole lockdep trace?
And please trim "Apr 10 00:58:33 marvin-IdeaPad-3-15ITL6 kernel: " line prefix,
it doesn't add any value.
Thanks