2015-10-06 13:46:26

by Michael Braun

Subject: Deadlock in mac80211 running 3.18.11 + compat-wireless 2015-03-09

Dear Maintainer,

I'm running a custom version of OpenWRT based on Linux 3.18.11 with
compat-wireless 2015-03-09 and am sometimes experiencing deadlock
warnings on P1020WLAN (PPC).
Though I have reason to believe that my modifications do not affect
the deadlock, I currently do not have the opportunity to test an
unmodified kernel.
Please find a backtrace attached. It makes more sense if you read
minstrel_remove_sta_debugfs as minstrel_ht_get_rate ->
minstrel_aggr_check, and
ieee80211_proberesp_get, ieee80211_get_buffered_bc as ieee80211_xmit
and ieee80211_tx_h_rate_ctrl.

This occurs while running infrastructure (AP) mode and IBSS
simultaneously.

The important part (stack):

CPU 0
1. ieee80211_subif_start_xmit
2. ieee80211_get_buffered_bc
3. ieee80211_proberesp_get
4. rate_control_get_rate -> acquires sta->rate_lock
5. minstrel_ht_get_rate
6. minstrel_aggr_check
7. ieee80211_start_tx_ba_session -> wait for sta->lock

CPU 1
1. ieee80211_ibss_leave
2. ieee80211_stop_tx_ba_cb -> acquires sta->lock
3. ieee80211_send_delba
4. ieee80211_tx_skb
5. ieee80211_tx_skb_tid
6. __ieee80211_tx_skb_tid_band
7. ieee80211_xmit
8. ieee80211_tx
9. invoke_tx_handlers
10. ieee80211_tx_h_rate_ctrl
11. rate_control_get_rate -> wait for sta->rate_lock
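
The two stacks above amount to an AB-BA lock-order inversion: one path
takes sta->rate_lock and then wants sta->lock, the other takes sta->lock
and then wants sta->rate_lock. A minimal user-space sketch of that
interleaving follows. It assumes pthread mutexes as stand-ins for the
kernel spinlocks; struct sta_model and abba_would_deadlock() are
hypothetical names for illustration, not mac80211 code.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model of the two per-station locks named in the report. */
struct sta_model {
    pthread_mutex_t lock;      /* stands in for sta->lock */
    pthread_mutex_t rate_lock; /* stands in for sta->rate_lock */
};

/*
 * Replays the interleaving from the two stacks in a single thread, using
 * trylock so the program reports the inversion instead of hanging.
 * Returns true when both paths would be left waiting on each other.
 */
bool abba_would_deadlock(void)
{
    struct sta_model sta;
    int rc;

    rc = pthread_mutex_init(&sta.lock, NULL);
    assert(rc == 0);
    rc = pthread_mutex_init(&sta.rate_lock, NULL);
    assert(rc == 0);
    (void)rc;

    /* CPU 0, step 4: rate_control_get_rate acquires the rate lock. */
    pthread_mutex_lock(&sta.rate_lock);
    /* CPU 1, step 2: ieee80211_stop_tx_ba_cb acquires the station lock. */
    pthread_mutex_lock(&sta.lock);

    /* CPU 0, step 7: ieee80211_start_tx_ba_session wants the station lock. */
    bool cpu0_blocked = pthread_mutex_trylock(&sta.lock) == EBUSY;
    /* CPU 1, step 11: rate_control_get_rate wants the rate lock. */
    bool cpu1_blocked = pthread_mutex_trylock(&sta.rate_lock) == EBUSY;

    pthread_mutex_unlock(&sta.lock);
    pthread_mutex_unlock(&sta.rate_lock);
    pthread_mutex_destroy(&sta.lock);
    pthread_mutex_destroy(&sta.rate_lock);

    /* Each side holds the lock the other side is waiting for. */
    return cpu0_blocked && cpu1_blocked;
}
```

In the real kernel both acquisitions would spin rather than return
EBUSY, so neither CPU would make progress.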

I'm unsure how to address this. Replacing sta->rate_lock with sta->lock
breaks due to spinlock nesting and might be overkill.
If there are patches, I can test them.
If you believe that this is not a valid upstream issue, please let me
know.

Thanks,
M. Braun


Attachments:
deadlock-trace.txt (6.08 kB)

2015-10-06 13:42:56

by Johannes Berg

Subject: Re: Deadlock in mac80211 running 3.18.11 + compat-wireless 2015-03-09

commit 2c158887f1185e04b3763ae346da9f71fcbc4429
Author: Johannes Berg <[email protected]>
Date: Thu Mar 12 19:28:31 2015 +0100

mac80211: agg-tx: avoid sending DelBA with sta->lock held

The rate control locking caused a potential deadlock here due to the
locks being acquired in different orders, so that change cannot yet
be applied. However, there's no fundamental reason for this code to
hold the sta->lock while transmitting frames.

Clearly it's better not to hold the lock for longer periods of time,
which can happen here since we call all the way down to the driver.
Change the code a bit to not hold it while doing that.

Signed-off-by: Johannes Berg <[email protected]>
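
The approach the commit message describes (update state under sta->lock,
then transmit only after releasing it) can be sketched in user space as
follows. This is an illustration under stated assumptions, not the
actual patch: pthread mutexes stand in for the kernel spinlocks, and
struct sta_fix_model, sta_fix_model_init(), send_delba_model() and
stop_tx_ba_model() are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model of a station entry; not the mac80211 struct. */
struct sta_fix_model {
    pthread_mutex_t lock;      /* stands in for sta->lock */
    pthread_mutex_t rate_lock; /* stands in for sta->rate_lock */
    bool ba_active;            /* an aggregation session is running */
    int delba_sent;            /* number of DelBA frames "transmitted" */
    bool lock_free_during_tx;  /* was sta->lock free while transmitting? */
};

void sta_fix_model_init(struct sta_fix_model *sta)
{
    int rc;

    rc = pthread_mutex_init(&sta->lock, NULL);
    assert(rc == 0);
    rc = pthread_mutex_init(&sta->rate_lock, NULL);
    assert(rc == 0);
    (void)rc;
    sta->ba_active = false;
    sta->delba_sent = 0;
    sta->lock_free_during_tx = false;
}

/* Models the transmit path, which ends up in rate control. */
static void send_delba_model(struct sta_fix_model *sta)
{
    /* Record whether the station lock is free at transmit time. */
    if (pthread_mutex_trylock(&sta->lock) == 0) {
        sta->lock_free_during_tx = true;
        pthread_mutex_unlock(&sta->lock);
    }

    pthread_mutex_lock(&sta->rate_lock);
    sta->delba_sent++;
    pthread_mutex_unlock(&sta->rate_lock);
}

/* Decide under the station lock, transmit after dropping it. */
void stop_tx_ba_model(struct sta_fix_model *sta)
{
    bool send = false;

    pthread_mutex_lock(&sta->lock);
    if (sta->ba_active) {
        sta->ba_active = false;
        send = true;
    }
    pthread_mutex_unlock(&sta->lock);

    /* The rate lock is only taken here, with the station lock released. */
    if (send)
        send_delba_model(sta);
}
```

Because the station lock is never held while the rate lock is acquired
on this path, the two locks are no longer taken in opposite orders.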