2010-11-20 02:08:52

by Felix Fietkau

Subject: [PATCH 1/3] ath9k: fix recursive locking in the tx flush path

Signed-off-by: Felix Fietkau <[email protected]>
Tested-by: Ben Greear <[email protected]>
---
drivers/net/wireless/ath/ath9k/xmit.c | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/drivers/net/wireless/ath/ath9k/xmit.c b/drivers/net/wireless/ath/ath9k/xmit.c
index c63e283..495432e 100644
--- a/drivers/net/wireless/ath/ath9k/xmit.c
+++ b/drivers/net/wireless/ath/ath9k/xmit.c
@@ -163,6 +163,7 @@ static void ath_tx_flush_tid(struct ath_softc *sc, struct ath_atx_tid *tid)
 		bf = list_first_entry(&tid->buf_q, struct ath_buf, list);
 		list_move_tail(&bf->list, &bf_head);

+		spin_unlock_bh(&txq->axq_lock);
 		fi = get_frame_info(bf->bf_mpdu);
 		if (fi->retries) {
 			ath_tx_update_baw(sc, tid, fi->seqno);
@@ -170,6 +171,7 @@ static void ath_tx_flush_tid(struct ath_softc *sc, struct ath_atx_tid *tid)
 		} else {
 			ath_tx_send_normal(sc, txq, tid, &bf_head);
 		}
+		spin_lock_bh(&txq->axq_lock);
 	}

 	spin_unlock_bh(&txq->axq_lock);
--
1.7.3.2
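
The patch above carries no description beyond its subject line. Reading the diff, the fix releases txq->axq_lock before the per-buffer calls and retakes it afterwards, presumably because those calls can end up taking the same lock. Below is a minimal user-space sketch of that unlock/call/relock pattern. It is not driver code: toy_lock, toy_queue, complete_one and flush_queue are hypothetical names, and the toy lock only counts a recursive acquire where a real spinlock would deadlock.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy non-recursive lock. A real spinlock would deadlock on a second
 * acquire by the same owner; this one only counts the attempt. */
struct toy_lock {
	bool held;
	int recursive_attempts;
};

static bool toy_lock_acquire(struct toy_lock *l)
{
	if (l->held) {
		l->recursive_attempts++;
		return false;
	}
	l->held = true;
	return true;
}

static void toy_lock_release(struct toy_lock *l)
{
	l->held = false;
}

/* Hypothetical queue guarded by a single lock. */
struct toy_queue {
	struct toy_lock lock;
	int pending;	/* items still waiting to be flushed */
	int completed;	/* items processed by complete_one() */
};

/* Stand-in for a callee that takes the queue lock on its own. */
static void complete_one(struct toy_queue *q)
{
	if (!toy_lock_acquire(&q->lock))
		return;	/* recursive attempt recorded, work not done */
	q->completed++;
	toy_lock_release(&q->lock);
}

/* Flush every pending item. With drop_lock set, the lock is released
 * around the callee and retaken afterwards, as in the diff above. */
static void flush_queue(struct toy_queue *q, bool drop_lock)
{
	toy_lock_acquire(&q->lock);
	while (q->pending > 0) {
		q->pending--;
		if (drop_lock)
			toy_lock_release(&q->lock);
		complete_one(q);
		if (drop_lock)
			toy_lock_acquire(&q->lock);
	}
	toy_lock_release(&q->lock);
}
```

Calling flush_queue(&q, true) completes every item with no recursive acquire, while flush_queue(&q, false) records one recursive attempt per item. The cost of the pattern is that the protected state can change while the lock is dropped, so the loop has to re-check the queue after retaking the lock instead of relying on anything it read earlier.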



2010-11-20 02:08:53

by Felix Fietkau

Subject: [PATCH 2/3] ath9k: fix timeout on stopping rx dma

It seems that using ath9k_hw_stoppcurecv to stop rx dma is not enough.
When it's time to stop DMA, the PCU is still busy, so the rx enable
bit never clears.
Using ath9k_hw_abortpcurecv helps with getting rx stopped much faster.
With this change, I cannot reproduce the rx stop related WARN_ON anymore.

Signed-off-by: Felix Fietkau <[email protected]>
Cc: [email protected]
---
drivers/net/wireless/ath/ath9k/recv.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/wireless/ath/ath9k/recv.c b/drivers/net/wireless/ath/ath9k/recv.c
index c5c8076..262c815 100644
--- a/drivers/net/wireless/ath/ath9k/recv.c
+++ b/drivers/net/wireless/ath/ath9k/recv.c
@@ -518,7 +518,7 @@ bool ath_stoprecv(struct ath_softc *sc)
 	bool stopped;

 	spin_lock_bh(&sc->rx.rxbuflock);
-	ath9k_hw_stoppcurecv(ah);
+	ath9k_hw_abortpcurecv(ah);
 	ath9k_hw_setrxfilter(ah, 0);
 	stopped = ath9k_hw_stopdmarecv(ah);

--
1.7.3.2


2010-11-22 20:18:39

by Ben Greear

Subject: Re: [PATCH 1/3] ath9k: fix recursive locking in the tx flush path

I compiled with these three patches, but it still crashed badly after I started
configuring and/or passing traffic on the 30 STA interfaces.

It is e1000e that first complains, but I still suspect ath9k because
my app was actively creating & configuring the STA interfaces when
this happened.

I can try a different system just in case. It might be a day or two, though.

The patches don't seem to make anything worse, and at least my
file-system wasn't corrupted this time, so maybe they helped, or
maybe I just got lucky.


PHY Status <796d>
PHY 1000BASE-T Status <7c00>
PHY Extended Status <3000>
PCI Status <4010>
e1000e 0000:06:00.0: eth0: Detected Hardware Unit Hang:
TDH <63>
TDT <78>
next_to_use <78>
next_to_clean <63>
buffer_info[next_to_clean]:
time_stamp <6b37e6>
next_to_watch <66>
jiffies <6b48dc>
next_to_watch.status <0>
MAC Status <80080f83>
PHY Status <796d>
PHY 1000BASE-T Status <7c00>
PHY Extended Status <3000>
PCI Status <4010>
------------[ cut here ]------------
WARNING: at /home/greearb/git/linux.wireless-testing/drivers/net/wireless/ath/ath9k/recv.c:532 ath_stoprecv+0x90/0x9a [ath9k]()
Hardware name: PDSBM
Could not stop RX, we could be confusing the DMA engine when we start RX up
Modules linked in: michael_mic aes_i586 aes_generic 8021q garp stp llc fuse macvlan pktgen nfs lockd fscache nfs_acl auth_rpcgss sunrpc ipv6 uinput ar]
Pid: 18650, comm: kworker/u:1 Not tainted 2.6.37-rc2-wl+ #51
Call Trace:
[<78436f25>] warn_slowpath_common+0x77/0x8c
[<f929f14e>] ? ath_stoprecv+0x90/0x9a [ath9k]
[<f929f14e>] ? ath_stoprecv+0x90/0x9a [ath9k]
[<78436fb6>] warn_slowpath_fmt+0x2e/0x30
[<f929f14e>] ath_stoprecv+0x90/0x9a [ath9k]
[<f929de42>] ath_radio_disable+0x78/0x140 [ath9k]
[<7845a5e5>] ? trace_hardirqs_on+0xb/0xd
[<7843c24b>] ? _local_bh_enable_ip+0x9d/0xa6
[<7843c25c>] ? local_bh_enable_ip+0x8/0xa
[<f929e651>] ath9k_config+0x411/0x423 [ath9k]
[<7845a35d>] ? mark_held_locks+0x47/0x5f
[<7845a500>] ? trace_hardirqs_on_caller+0x4b/0x125
[<f9193aaa>] ieee80211_hw_config+0x11b/0x125 [mac80211]
[<f919fa61>] ieee80211_recalc_idle+0x34/0x39 [mac80211]
[<f919c2c7>] ieee80211_sta_work+0x121/0x16a [mac80211]
[<f91a0cbd>] ieee80211_iface_work+0x268/0x282 [mac80211]
[<78446f3c>] process_one_work+0x1af/0x2bf
[<78446ecb>] ? process_one_work+0x13e/0x2bf
[<f91a0a55>] ? ieee80211_iface_work+0x0/0x282 [mac80211]
[<7844868a>] worker_thread+0xf9/0x1bf
[<78448591>] ? worker_thread+0x0/0x1bf
[<7844b1ba>] kthread+0x62/0x67
[<7844b158>] ? kthread+0x0/0x67
[<784036c6>] kernel_thread_helper+0x6/0x1a
---[ end trace f37a1506c7b4957c ]---
cfg80211: Calling CRDA to update world regulatory domain
cfg80211: Calling CRDA for country: US
cfg80211: Regulatory domain changed to country: US
(start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp)
(2402000 KHz - 2472000 KHz @ 40000 KHz), (300 mBi, 2700 mBm)
(5170000 KHz - 5250000 KHz @ 40000 KHz), (300 mBi, 1700 mBm)
(5250000 KHz - 5330000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
(5490000 KHz - 5600000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
(5650000 KHz - 5710000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
(5735000 KHz - 5835000 KHz @ 40000 KHz), (300 mBi, 3000 mBm)
e1000e 0000:06:00.0: eth0: Detected Hardware Unit Hang:
TDH <63>
TDT <78>
next_to_use <78>
next_to_clean <63>
buffer_info[next_to_clean]:
time_stamp <6b37e6>
next_to_watch <66>
jiffies <6b50ac>
next_to_watch.status <0>
MAC Status <80080f83>
PHY Status <796d>
PHY 1000BASE-T Status <7c00>
PHY Extended Status <3000>
PCI Status <4010>
e1000e 0000:06:00.0: eth0: Detected Hardware Unit Hang:
TDH <63>
TDT <78>
next_to_use <78>
next_to_clean <63>
buffer_info[next_to_clean]:
time_stamp <6b37e6>
next_to_watch <66>
jiffies <6b587c>
next_to_watch.status <0>
MAC Status <80080f83>
PHY Status <796d>
PHY 1000BASE-T Status <7c00>
PHY Extended Status <3000>
PCI Status <4010>
e1000e 0000:06:00.0: eth0: Detected Hardware Unit Hang:
TDH <63>
TDT <78>
next_to_use <78>
next_to_clean <63>
buffer_info[next_to_clean]:
time_stamp <6b37e6>
next_to_watch <66>
jiffies <6b604c>
next_to_watch.status <0>
MAC Status <80080f83>
PHY Status <796d>
PHY 1000BASE-T Status <7c00>
PHY Extended Status <3000>
PCI Status <4010>
------------[ cut here ]------------
WARNING: at /home/greearb/git/linux.wireless-testing/net/sched/sch_generic.c:258 dev_watchdog+0xd0/0x124()
Hardware name: PDSBM
NETDEV WATCHDOG: eth0 (e1000e): transmit queue 0 timed out
Modules linked in: michael_mic aes_i586 aes_generic 8021q garp stp llc fuse macvlan pktgen nfs lockd fscache nfs_acl auth_rpcgss sunrpc ipv6 uinput ar]
Pid: 0, comm: swapper Tainted: G W 2.6.37-rc2-wl+ #51
Call Trace:
[<78436f25>] warn_slowpath_common+0x77/0x8c
[<78707759>] ? dev_watchdog+0xd0/0x124
[<78707759>] ? dev_watchdog+0xd0/0x124
[<78436fb6>] warn_slowpath_fmt+0x2e/0x30
[<78707759>] dev_watchdog+0xd0/0x124
[<784403f8>] ? run_timer_softirq+0xec/0x207
[<7844046e>] run_timer_softirq+0x162/0x207
[<784403f8>] ? run_timer_softirq+0xec/0x207
[<7845a5e5>] ? trace_hardirqs_on+0xb/0xd
[<78707689>] ? dev_watchdog+0x0/0x124
[<7843c089>] __do_softirq+0x85/0x142
[<7843c004>] ? __do_softirq+0x0/0x142
<IRQ> [<7843beab>] ? irq_exit+0x35/0x69
[<7841a2e5>] ? smp_apic_timer_interrupt+0x74/0x81
[<785972d0>] ? trace_hardirqs_off_thunk+0xc/0x10
[<7878f2ef>] ? apic_timer_interrupt+0x2f/0x40
[<7845007b>] ? sched_clock_local+0xc5/0x155
[<78408a12>] ? mwait_idle+0x59/0x69
[<78402417>] ? cpu_idle+0x4e/0x6b
[<78779489>] ? rest_init+0xa1/0xa7
[<787793e8>] ? rest_init+0x0/0xa7
[<78992949>] ? start_kernel+0x334/0x33a
[<7899244f>] ? unknown_bootoption+0x0/0x190
[<789920e2>] ? i386_start_kernel+0xe2/0xea
---[ end trace f37a1506c7b4957d ]---
e1000e 0000:06:00.0: eth0: Reset adapter


[ similar stuff continues, including ATA errors for the hard-drive ]

Thanks,
Ben

--
Ben Greear <[email protected]>
Candela Technologies Inc http://www.candelatech.com


2010-11-22 23:42:53

by Ben Greear

Subject: Re: [PATCH 2/3] ath9k: fix timeout on stopping rx dma

On 11/19/2010 06:08 PM, Felix Fietkau wrote:
> It seems that using ath9k_hw_stoppcurecv to stop rx dma is not enough.
> When it's time to stop DMA, the PCU is still busy, so the rx enable
> bit never clears.
> Using ath9k_hw_abortpcurecv helps with getting rx stopped much faster.
> With this change, I cannot reproduce the rx stop related WARN_ON anymore.

I have done some more testing, and several times it rebooted and
properly configured the 30 STAs & passed traffic without error.

So, it appears your change is definitely an improvement.

I did see one time earlier today when it still failed to stop,
however. Would it be worth attempting the abort and/or stop
hardware calls several times if it fails the first time?
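
The retry idea raised above could be sketched as a bounded loop around whatever call reports that the receiver has stopped. This is only an illustration of the suggestion, not something the driver is known to do: stop_fn, stop_with_retries and fake_stop are hypothetical names, and fake_stop merely simulates hardware that fails a fixed number of times.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical callback: returns true once the receiver has stopped. */
typedef bool (*stop_fn)(void *ctx);

/* Call stop() up to max_attempts times and return true on the first
 * success. If attempts_used is non-NULL, it receives the call count. */
static bool stop_with_retries(stop_fn stop, void *ctx, int max_attempts,
			      int *attempts_used)
{
	bool stopped = false;
	int i = 0;

	while (i < max_attempts && !stopped) {
		stopped = stop(ctx);
		i++;
	}
	if (attempts_used)
		*attempts_used = i;
	return stopped;
}

/* Fake hardware for illustration: fails until the counter in ctx
 * reaches zero, then reports success. */
static bool fake_stop(void *ctx)
{
	int *failures_left = ctx;

	if (*failures_left > 0) {
		(*failures_left)--;
		return false;
	}
	return true;
}
```

With a fake that fails twice, stop_with_retries(fake_stop, &failures, 5, &used) succeeds on the third call. Keeping the attempt count bounded matters here because the caller in the patch holds a spinlock while stopping rx, so an open-ended loop would not be acceptable.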

Thanks,
Ben

--
Ben Greear <[email protected]>
Candela Technologies Inc http://www.candelatech.com


2010-11-20 02:08:53

by Felix Fietkau

Subject: [PATCH 3/3] ath9k_hw: remove ath9k_hw_stoppcurecv

It is no longer used anywhere.

Signed-off-by: Felix Fietkau <[email protected]>
---
drivers/net/wireless/ath/ath9k/mac.c | 8 --------
drivers/net/wireless/ath/ath9k/mac.h | 1 -
2 files changed, 0 insertions(+), 9 deletions(-)

diff --git a/drivers/net/wireless/ath/ath9k/mac.c b/drivers/net/wireless/ath/ath9k/mac.c
index 65b1ee2..b04b37b 100644
--- a/drivers/net/wireless/ath/ath9k/mac.c
+++ b/drivers/net/wireless/ath/ath9k/mac.c
@@ -766,14 +766,6 @@ void ath9k_hw_startpcureceive(struct ath_hw *ah, bool is_scanning)
 }
 EXPORT_SYMBOL(ath9k_hw_startpcureceive);

-void ath9k_hw_stoppcurecv(struct ath_hw *ah)
-{
-	REG_SET_BIT(ah, AR_DIAG_SW, AR_DIAG_RX_DIS);
-
-	ath9k_hw_disable_mib_counters(ah);
-}
-EXPORT_SYMBOL(ath9k_hw_stoppcurecv);
-
 void ath9k_hw_abortpcurecv(struct ath_hw *ah)
 {
 	REG_SET_BIT(ah, AR_DIAG_SW, AR_DIAG_RX_ABORT | AR_DIAG_RX_DIS);
diff --git a/drivers/net/wireless/ath/ath9k/mac.h b/drivers/net/wireless/ath/ath9k/mac.h
index 22907e2..7512f97 100644
--- a/drivers/net/wireless/ath/ath9k/mac.h
+++ b/drivers/net/wireless/ath/ath9k/mac.h
@@ -691,7 +691,6 @@ void ath9k_hw_setuprxdesc(struct ath_hw *ah, struct ath_desc *ds,
bool ath9k_hw_setrxabort(struct ath_hw *ah, bool set);
void ath9k_hw_putrxbuf(struct ath_hw *ah, u32 rxdp);
void ath9k_hw_startpcureceive(struct ath_hw *ah, bool is_scanning);
-void ath9k_hw_stoppcurecv(struct ath_hw *ah);
void ath9k_hw_abortpcurecv(struct ath_hw *ah);
bool ath9k_hw_stopdmarecv(struct ath_hw *ah);
int ath9k_hw_beaconq_setup(struct ath_hw *ah);
--
1.7.3.2
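
Comparing the removed ath9k_hw_stoppcurecv with the retained ath9k_hw_abortpcurecv in the diff above, the difference in the AR_DIAG_SW write is one extra bit: the abort variant sets AR_DIAG_RX_ABORT in addition to AR_DIAG_RX_DIS. The small sketch below models only that mask difference. The DEMO_* bit positions are placeholders and are not taken from the ath9k register headers, and reg_set_bit, demo_stop and demo_abort are hypothetical helpers operating on a plain integer, not on hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder bit positions, NOT the real ath9k register layout. */
#define DEMO_RX_DIS	(1u << 0)
#define DEMO_RX_ABORT	(1u << 1)

/* Minimal model of a set-bits helper acting on a register value. */
static uint32_t reg_set_bit(uint32_t reg, uint32_t mask)
{
	return reg | mask;
}

/* Models the mask used by the removed stop function. */
static uint32_t demo_stop(uint32_t diag_sw)
{
	return reg_set_bit(diag_sw, DEMO_RX_DIS);
}

/* Models the mask used by the retained abort function. */
static uint32_t demo_abort(uint32_t diag_sw)
{
	return reg_set_bit(diag_sw, DEMO_RX_ABORT | DEMO_RX_DIS);
}
```

Starting from a cleared register, demo_abort(0) differs from demo_stop(0) only by DEMO_RX_ABORT, which mirrors the one-bit difference between the two functions shown in the patch.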