2010-11-17 21:19:31

by Ben Greear

Subject: recursive locking on wireless-testing.

I found this while testing a wpa_supplicant that shares scan results.
The kernel has no scan-sharing hacks in it, just a few patches
I've been using for a while (and the deadlock prevention patch
previously mentioned in other threads).


Nov 17 13:16:25 ath9k kernel: ieee80211 wiphy0: Removed STA 00:14:d1:c6:d2:54
Nov 17 13:16:25 ath9k kernel: ieee80211 wiphy0: Destroyed STA 00:14:d1:c6:d2:54

=============================================
[ INFO: possible recursive locking detected ]
2.6.37-rc1-wl+ #48
---------------------------------------------
wpa_supplicant/12334 is trying to acquire lock:
(&(&txq->axq_lock)->rlock){+.-...}, at: [<f8fe90aa>] ath_tx_complete_buf+0x1d4/0x26c [ath9k]

but task is already holding lock:
(&(&txq->axq_lock)->rlock){+.-...}, at: [<f8fe9ce6>] ath_tx_flush_tid+0x41/0xb6 [ath9k]

other info that might help us debug this:
6 locks held by wpa_supplicant/12334:
#0: (rtnl_mutex){+.+.+.}, at: [<786ffe97>] rtnl_lock+0xf/0x11
#1: (&wdev->mtx){+.+.+.}, at: [<f8bbc5a9>] cfg80211_wext_siwmlme+0x41/0x85 [cfg80211]
#2: (&ifmgd->mtx){+.+.+.}, at: [<f8f1b4ce>] ieee80211_mgd_deauth+0x28/0x1af [mac80211]
#3: (&local->sta_mtx){+.+.+.}, at: [<f8f1b150>] ieee80211_set_disassoc+0xab/0x1bc [mac80211]
#4: (&sta->ampdu_mlme.mtx){+.+...}, at: [<f8f19096>] __ieee80211_stop_tx_ba_session+0x25/0x4c [mac80211]
#5: (&(&txq->axq_lock)->rlock){+.-...}, at: [<f8fe9ce6>] ath_tx_flush_tid+0x41/0xb6 [ath9k]

stack backtrace:
Pid: 12334, comm: wpa_supplicant Not tainted 2.6.37-rc1-wl+ #48
Call Trace:
[<7878bf56>] ? printk+0x18/0x1a
[<7845bb58>] __lock_acquire+0xb14/0xb8b
[<784593ff>] ? register_lock_class+0x17/0x297
[<7845bc41>] lock_acquire+0x72/0x8d
[<f8fe90aa>] ? ath_tx_complete_buf+0x1d4/0x26c [ath9k]
[<7878de3a>] _raw_spin_lock_bh+0x38/0x45
[<f8fe90aa>] ? ath_tx_complete_buf+0x1d4/0x26c [ath9k]
[<f8fe90aa>] ath_tx_complete_buf+0x1d4/0x26c [ath9k]
[<f8fe9d31>] ath_tx_flush_tid+0x8c/0xb6 [ath9k]
[<f8fea716>] ath_tx_aggr_stop+0x7e/0x86 [ath9k]
[<f8fe56bd>] ath9k_ampdu_action+0x93/0xf4 [ath9k]
[<f8fe562a>] ? ath9k_ampdu_action+0x0/0xf4 [ath9k]
[<f8f18708>] drv_ampdu_action+0x60/0x68 [mac80211]
[<f8f18faf>] ___ieee80211_stop_tx_ba_session+0xde/0xfd [mac80211]
[<f8f190aa>] __ieee80211_stop_tx_ba_session+0x39/0x4c [mac80211]
[<f8f18683>] ieee80211_sta_tear_down_BA_sessions+0x31/0x56 [mac80211]
[<f8f1b0a0>] ? set_sta_flags+0x23/0x28 [mac80211]
[<f8f1b175>] ieee80211_set_disassoc+0xd0/0x1bc [mac80211]
[<f8f1b4f5>] ieee80211_mgd_deauth+0x4f/0x1af [mac80211]
[<f8f22cf1>] ieee80211_deauth+0x14/0x16 [mac80211]
[<f8bb71e9>] __cfg80211_mlme_deauth+0x105/0x10d [cfg80211]
[<f8bb994e>] __cfg80211_disconnect+0x112/0x199 [cfg80211]
[<f8bbc5cc>] cfg80211_wext_siwmlme+0x64/0x85 [cfg80211]
[<7876e089>] ioctl_standard_call+0x1f0/0x28e
[<786f2b2b>] ? dev_name_hash+0x16/0x48
[<786f653c>] ? __dev_get_by_name+0x32/0x3d
[<7876e1b4>] wext_handle_ioctl+0x8d/0x18d
[<f8bbc568>] ? cfg80211_wext_siwmlme+0x0/0x85 [cfg80211]
[<786f7669>] dev_ioctl+0x520/0x53f
[<785977bb>] ? copy_to_user+0x2f/0x108
[<786e69dc>] ? sys_recvfrom+0xb8/0xc6
[<786e5d1f>] ? sock_ioctl+0x0/0x202
[<786e5f15>] sock_ioctl+0x1f6/0x202
[<786e5d1f>] ? sock_ioctl+0x0/0x202
[<784cc071>] do_vfs_ioctl+0x56d/0x5c3
[<784c130d>] ? fcheck_files+0x9b/0xca
[<784c1369>] ? fget_light+0x2d/0xb0
[<784cc10a>] sys_ioctl+0x43/0x62
[<784030dc>] sysenter_do_call+0x12/0x38


--
Ben Greear <[email protected]>
Candela Technologies Inc http://www.candelatech.com



2010-11-18 18:15:14

by Ben Greear

Subject: Re: recursive locking on wireless-testing.

On 11/18/2010 01:53 AM, Felix Fietkau wrote:
> On 2010-11-18 1:55 AM, Ben Greear wrote:
>> On 11/17/2010 04:37 PM, Felix Fietkau wrote:
>>> On 2010-11-17 10:19 PM, Ben Greear wrote:
>>> This should fix it. ath_tx_complete is already called with the txq locked.
>>>
>>> --- a/drivers/net/wireless/ath/ath9k/xmit.c
>>> +++ b/drivers/net/wireless/ath/ath9k/xmit.c
>>> @@ -1830,10 +1830,8 @@ static void ath_tx_complete(struct ath_s
>>> else {
>>> q = skb_get_queue_mapping(skb);
>>> if (txq == sc->tx.txq_map[q]) {
>>> - spin_lock_bh(&txq->axq_lock);
>>> if (WARN_ON(--txq->pending_frames < 0))
>>> txq->pending_frames = 0;
>>> - spin_unlock_bh(&txq->axq_lock);
>>> }
>>>
>>> ieee80211_tx_status(hw, skb);
>>
>>
> How about this instead of the other patch?
>
> --- a/drivers/net/wireless/ath/ath9k/xmit.c
> +++ b/drivers/net/wireless/ath/ath9k/xmit.c
> @@ -163,6 +163,7 @@ static void ath_tx_flush_tid(struct ath_
> bf = list_first_entry(&tid->buf_q, struct ath_buf, list);
> list_move_tail(&bf->list, &bf_head);
>
> + spin_unlock_bh(&txq->axq_lock);
> fi = get_frame_info(bf->bf_mpdu);
> if (fi->retries) {
> ath_tx_update_baw(sc, tid, fi->seqno);
> @@ -170,6 +171,7 @@ static void ath_tx_flush_tid(struct ath_
> } else {
> ath_tx_send_normal(sc, txq, tid, &bf_head);
> }
> + spin_lock_bh(&txq->axq_lock);
> }
>
> spin_unlock_bh(&txq->axq_lock);

I'll give this a try later. Overnight my ath9k box started spitting out endless ath9k TX DMA
errors, and it seems to have corrupted the root (/) filesystem or the disk again:

[drm] Initialized i915 1.6.0 20080730 for 0000:00:02.0 on minor 0
dracut: Starting plymouth daemon
dracut: rd_NO_DM: removing DM RAID activation
dracut: rd_NO_MD: removing MD RAID activation
input: ImExPS/2 Generic Explorer Mouse as /devices/platform/i8042/serio1/input/input2

No root device found

No root device found

Boot has failed, sleeping forever.



Thanks,
Ben



2010-11-18 00:37:59

by Felix Fietkau

Subject: Re: recursive locking on wireless-testing.

On 2010-11-17 10:19 PM, Ben Greear wrote:
> I found this while testing wpa_supplicant that shares scan results.
> The kernel has no scan-sharing hacks in it..just a few patches
> I've been using for a while (and the deadlock prevention patch
> previously mentioned in other threads).
>
>
> Nov 17 13:16:25 ath9k kernel: ieee80211 wiphy0: Removed STA 00:14:d1:c6:d2:54
> Nov 17 13:16:25 ath9k kernel: ieee80211 wiphy0: Destroyed STA 00:14:d1:c6:d2:54
>
> =============================================
> [ INFO: possible recursive locking detected ]
> 2.6.37-rc1-wl+ #48
> ---------------------------------------------
This should fix it. ath_tx_complete is already called with the txq locked.

--- a/drivers/net/wireless/ath/ath9k/xmit.c
+++ b/drivers/net/wireless/ath/ath9k/xmit.c
@@ -1830,10 +1830,8 @@ static void ath_tx_complete(struct ath_s
 	else {
 		q = skb_get_queue_mapping(skb);
 		if (txq == sc->tx.txq_map[q]) {
-			spin_lock_bh(&txq->axq_lock);
 			if (WARN_ON(--txq->pending_frames < 0))
 				txq->pending_frames = 0;
-			spin_unlock_bh(&txq->axq_lock);
 		}
 
 		ieee80211_tx_status(hw, skb);

2010-11-18 00:42:27

by Ben Greear

Subject: Re: recursive locking on wireless-testing.

On 11/17/2010 04:37 PM, Felix Fietkau wrote:
> On 2010-11-17 10:19 PM, Ben Greear wrote:
> This should fix it. ath_tx_complete is already called with the txq locked.

Thanks, I'll give it a try now.

Ben

>
> --- a/drivers/net/wireless/ath/ath9k/xmit.c
> +++ b/drivers/net/wireless/ath/ath9k/xmit.c
> @@ -1830,10 +1830,8 @@ static void ath_tx_complete(struct ath_s
> else {
> q = skb_get_queue_mapping(skb);
> if (txq == sc->tx.txq_map[q]) {
> - spin_lock_bh(&txq->axq_lock);
> if (WARN_ON(--txq->pending_frames < 0))
> txq->pending_frames = 0;
> - spin_unlock_bh(&txq->axq_lock);
> }
>
> ieee80211_tx_status(hw, skb);




2010-11-18 00:55:23

by Ben Greear

Subject: Re: recursive locking on wireless-testing.

On 11/17/2010 04:37 PM, Felix Fietkau wrote:
> On 2010-11-17 10:19 PM, Ben Greear wrote:
> This should fix it. ath_tx_complete is already called with the txq locked.
>
> --- a/drivers/net/wireless/ath/ath9k/xmit.c
> +++ b/drivers/net/wireless/ath/ath9k/xmit.c
> @@ -1830,10 +1830,8 @@ static void ath_tx_complete(struct ath_s
> else {
> q = skb_get_queue_mapping(skb);
> if (txq == sc->tx.txq_map[q]) {
> - spin_lock_bh(&txq->axq_lock);
> if (WARN_ON(--txq->pending_frames < 0))
> txq->pending_frames = 0;
> - spin_unlock_bh(&txq->axq_lock);
> }
>
> ieee80211_tx_status(hw, skb);


I restarted a few times and haven't seen any lockdep errors. I did, however,
hit the WARN_ON that was inside that lock:

------------[ cut here ]------------
WARNING: at /home/greearb/git/linux.wireless-testing/drivers/net/wireless/ath/ath9k/xmit.c:1833 ath_tx_complete_buf+0x1d5/0x240 [ath9k]()
Hardware name: PDSBM
Modules linked in: aes_i586 aes_generic 8021q garp stp llc michael_mic macvlan pktgen nfs lockd fscache nfs_acl auth_rpcgss sunrpc p4_clockmod ipv6 uinput arc4
ecb e1000e ath9k mac80211 ath9k_common ath9k_hw ath i2c_i801 cfg80211 iTCO_wdt iTCO_vendor_support pcspkr microcode i915 drm_kms_helper drm i2c_algo_bit
i2c_core video output [last unloaded: ipt_addrtype]
Pid: 0, comm: swapper Tainted: P 2.6.37-rc2-wl+ #50
Call Trace:
[<78436f25>] warn_slowpath_common+0x77/0x8c
[<f86e910b>] ? ath_tx_complete_buf+0x1d5/0x240 [ath9k]
[<f86e910b>] ? ath_tx_complete_buf+0x1d5/0x240 [ath9k]
[<78436f57>] warn_slowpath_null+0x1d/0x1f
[<f86e910b>] ath_tx_complete_buf+0x1d5/0x240 [ath9k]
[<7843c24b>] ? _local_bh_enable_ip+0x9d/0xa6
[<f86eb1bf>] ath_tx_tasklet+0x242/0x2b6 [ath9k]
[<f86e68bc>] ath9k_tasklet+0xb9/0x127 [ath9k]
[<7843bb0d>] tasklet_action+0x88/0xe3
[<7843c089>] __do_softirq+0x85/0x142
[<7843c004>] ? __do_softirq+0x0/0x142
<IRQ> [<7843beab>] ? irq_exit+0x35/0x69
[<78404245>] ? do_IRQ+0x8e/0xa2
[<7844e97c>] ? hrtimer_start+0x22/0x28
[<784036ae>] ? common_interrupt+0x2e/0x40
[<78408a12>] ? mwait_idle+0x59/0x69
[<78402417>] ? cpu_idle+0x4e/0x6b
[<78779419>] ? rest_init+0xa1/0xa7
[<78779378>] ? rest_init+0x0/0xa7
[<78992949>] ? start_kernel+0x334/0x33a
[<7899244f>] ? unknown_bootoption+0x0/0x190
[<789920e2>] ? i386_start_kernel+0xe2/0xea
---[ end trace a659d7b152ca5d4f ]---


Thanks,
Ben



2010-11-18 09:53:35

by Felix Fietkau

Subject: Re: recursive locking on wireless-testing.

On 2010-11-18 1:55 AM, Ben Greear wrote:
> On 11/17/2010 04:37 PM, Felix Fietkau wrote:
>> On 2010-11-17 10:19 PM, Ben Greear wrote:
>> This should fix it. ath_tx_complete is already called with the txq locked.
>>
>> --- a/drivers/net/wireless/ath/ath9k/xmit.c
>> +++ b/drivers/net/wireless/ath/ath9k/xmit.c
>> @@ -1830,10 +1830,8 @@ static void ath_tx_complete(struct ath_s
>> else {
>> q = skb_get_queue_mapping(skb);
>> if (txq == sc->tx.txq_map[q]) {
>> - spin_lock_bh(&txq->axq_lock);
>> if (WARN_ON(--txq->pending_frames < 0))
>> txq->pending_frames = 0;
>> - spin_unlock_bh(&txq->axq_lock);
>> }
>>
>> ieee80211_tx_status(hw, skb);
>
>
How about this instead of the other patch?

--- a/drivers/net/wireless/ath/ath9k/xmit.c
+++ b/drivers/net/wireless/ath/ath9k/xmit.c
@@ -163,6 +163,7 @@ static void ath_tx_flush_tid(struct ath_
 		bf = list_first_entry(&tid->buf_q, struct ath_buf, list);
 		list_move_tail(&bf->list, &bf_head);
 
+		spin_unlock_bh(&txq->axq_lock);
 		fi = get_frame_info(bf->bf_mpdu);
 		if (fi->retries) {
 			ath_tx_update_baw(sc, tid, fi->seqno);
@@ -170,6 +171,7 @@ static void ath_tx_flush_tid(struct ath_
 		} else {
 			ath_tx_send_normal(sc, txq, tid, &bf_head);
 		}
+		spin_lock_bh(&txq->axq_lock);
 	}
 
 	spin_unlock_bh(&txq->axq_lock);

2010-11-19 23:12:27

by Ben Greear

Subject: Re: recursive locking on wireless-testing.

On 11/18/2010 01:53 AM, Felix Fietkau wrote:
> On 2010-11-18 1:55 AM, Ben Greear wrote:
>> On 11/17/2010 04:37 PM, Felix Fietkau wrote:
>>> On 2010-11-17 10:19 PM, Ben Greear wrote:
>>> This should fix it. ath_tx_complete is already called with the txq locked.
>>>
>>> --- a/drivers/net/wireless/ath/ath9k/xmit.c
>>> +++ b/drivers/net/wireless/ath/ath9k/xmit.c
>>> @@ -1830,10 +1830,8 @@ static void ath_tx_complete(struct ath_s
>>> else {
>>> q = skb_get_queue_mapping(skb);
>>> if (txq == sc->tx.txq_map[q]) {
>>> - spin_lock_bh(&txq->axq_lock);
>>> if (WARN_ON(--txq->pending_frames < 0))
>>> txq->pending_frames = 0;
>>> - spin_unlock_bh(&txq->axq_lock);
>>> }
>>>
>>> ieee80211_tx_status(hw, skb);
>>
>>
> How about this instead of the other patch?
>
> --- a/drivers/net/wireless/ath/ath9k/xmit.c
> +++ b/drivers/net/wireless/ath/ath9k/xmit.c
> @@ -163,6 +163,7 @@ static void ath_tx_flush_tid(struct ath_
> bf = list_first_entry(&tid->buf_q, struct ath_buf, list);
> list_move_tail(&bf->list, &bf_head);
>
> + spin_unlock_bh(&txq->axq_lock);
> fi = get_frame_info(bf->bf_mpdu);
> if (fi->retries) {
> ath_tx_update_baw(sc, tid, fi->seqno);
> @@ -170,6 +171,7 @@ static void ath_tx_flush_tid(struct ath_
> } else {
> ath_tx_send_normal(sc, txq, tid, &bf_head);
> }
> + spin_lock_bh(&txq->axq_lock);
> }
>
> spin_unlock_bh(&txq->axq_lock);

I don't see any lockdep errors with this, but I did see this warning:


WARNING: at /home/greearb/git/linux.wireless-testing/drivers/net/wireless/ath/ath9k/recv.c:532 ath_stoprecv+0x90/0x9a [ath9k]()
Hardware name: PDSBM
Could not stop RX, we could be confusing the DMA engine when we start RX up
Modules linked in: bluetooth aes_i586 aes_generic 8021q garp stp llc michael_mic macvlan pktgen fuse nfs lockd fscache nfs_acl auth_rpcgss sunrpc ipv6]
Pid: 16092, comm: kworker/u:0 Tainted: P W 2.6.37-rc2-wl+ #50
Call Trace:
[<78436f25>] warn_slowpath_common+0x77/0x8c
[<f91f513e>] ? ath_stoprecv+0x90/0x9a [ath9k]
[<f91f513e>] ? ath_stoprecv+0x90/0x9a [ath9k]
[<78436fb6>] warn_slowpath_fmt+0x2e/0x30
[<f91f513e>] ath_stoprecv+0x90/0x9a [ath9k]
[<f91f40dc>] ath_set_channel+0x94/0x1e8 [ath9k]
[<f8c73767>] ? ath_hw_cycle_counters_update+0xc4/0x114 [ath]
[<f91f4574>] ath9k_config+0x344/0x423 [ath9k]
[<f9111aaa>] ieee80211_hw_config+0x11b/0x125 [mac80211]
[<f9115dea>] ieee80211_scan_work+0x29e/0x3f8 [mac80211]
[<7845a5e5>] ? trace_hardirqs_on+0xb/0xd
[<7878ea66>] ? _raw_spin_unlock_irq+0x22/0x2b
[<78446ecb>] ? process_one_work+0x13e/0x2bf
[<78446f3c>] process_one_work+0x1af/0x2bf
[<78446ecb>] ? process_one_work+0x13e/0x2bf
[<f9115b4c>] ? ieee80211_scan_work+0x0/0x3f8 [mac80211]
[<7844868a>] worker_thread+0xf9/0x1bf
[<78448591>] ? worker_thread+0x0/0x1bf
[<7844b1ba>] kthread+0x62/0x67
[<7844b158>] ? kthread+0x0/0x67
[<784036c6>] kernel_thread_helper+0x6/0x1a


That warning, or something similar, was showing up before this patch too, so your patch may still be fine.

Thanks,
Ben
