2011-01-07 18:01:06

by Ben Greear

Subject: [PATCH 1/2] ath9k: More xmit queue debugfs information.

From: Ben Greear <[email protected]>

Add more transmit-queue details to the xmit debugfs file, to help
figure out why the xmit logic hangs.

Signed-off-by: Ben Greear <[email protected]>
---
:100644 100644 650f00f... 9e009cc... M drivers/net/wireless/ath/ath9k/debug.c
drivers/net/wireless/ath/ath9k/debug.c | 26 ++++++++++++++++++++++++++
1 files changed, 26 insertions(+), 0 deletions(-)

diff --git a/drivers/net/wireless/ath/ath9k/debug.c b/drivers/net/wireless/ath/ath9k/debug.c
index 650f00f..9e009cc 100644
--- a/drivers/net/wireless/ath/ath9k/debug.c
+++ b/drivers/net/wireless/ath/ath9k/debug.c
@@ -679,6 +679,32 @@ static ssize_t read_file_xmit(struct file *file, char __user *user_buf,
PRQLE(tmp, txq_fifo[i]);
}

+ /* Print out more detailed queue-info */
+ for (i = 0; i <= WME_AC_BK; i++) {
+ struct ath_txq *txq = &(sc->tx.txq[i]);
+ struct ath_atx_ac *ac;
+ struct ath_atx_tid *tid;
+ if (len >= size)
+ goto done;
+ spin_lock_bh(&txq->axq_lock);
+ if (!list_empty(&txq->axq_acq)) {
+ ac = list_first_entry(&txq->axq_acq, struct ath_atx_ac,
+ list);
+ len += snprintf(buf + len, size - len,
+ "txq[%i] first-ac: %p sched: %i\n",
+ i, ac, ac->sched);
+ if (list_empty(&ac->tid_q) || (len >= size))
+ goto done_for;
+ tid = list_first_entry(&ac->tid_q, struct ath_atx_tid,
+ list);
+ len += snprintf(buf + len, size - len,
+ " first-tid: %p sched: %i paused: %i\n",
+ tid, tid->sched, tid->paused);
+ }
+ done_for:
+ spin_unlock_bh(&txq->axq_lock);
+ }
+
done:
if (len > size)
len = size;
--
1.7.2.3
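Patch 1/2 builds the debugfs text with the usual `len += snprintf(buf + len, size - len, ...)` idiom. snprintf() returns the length the output would have had, not the number of bytes that fit. After a truncated write, `len` can therefore exceed `size`. That is why the new loop checks `len >= size` before each write and the existing `done:` label clamps `len`. The userspace sketch below is not the driver code. It uses a hypothetical `fill_buf()` helper and a simplified line format to show the effect:

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: append one line per queue the way read_file_xmit()
 * does, then clamp len before returning it. */
static size_t fill_buf(char *buf, size_t size, int nqueues)
{
	size_t len = 0;
	int i;

	for (i = 0; i < nqueues; i++) {
		/* size - len is unsigned and would wrap once len passes size */
		if (len >= size)
			break;
		/* snprintf() returns the untruncated length, so len may
		 * overshoot size after a partial write */
		len += snprintf(buf + len, size - len,
				"txq[%d] sched: %d\n", i, 0);
	}
	if (len > size)
		len = size;	/* same clamp as the driver's done: label */
	return len;
}
```

With a 40-byte buffer and four 16-byte lines, the third write is truncated to 7 characters plus the terminator, yet `len` jumps to 48. The fourth iteration then bails out and the clamp brings `len` back to 40. Inside the kernel, scnprintf() returns the number of characters actually stored, which avoids the overshoot altogether.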



2011-01-10 05:48:58

by Ben Greear

Subject: Re: [PATCH 2/2] ath9k: Fix incorrect tx-hang detection logic.

On 01/09/2011 09:38 PM, Vasanthakumar Thiagarajan wrote:
> On Fri, Jan 07, 2011 at 11:30:59PM +0530, [email protected] wrote:
>> From: Ben Greear<[email protected]>
>>
>> It is not guaranteed that the ath_tx_complete_poll_work runs
>> after some fixed duration because the channel-reset logic
>> removes the work and then re-adds it to run immediately.
>> Two channel-changes 1ms apart, while a transmit was being
>> attempted, could thus incorrectly trigger a reset by
>> the ath_tx_complete_poll_work method.
>
> I don't think so. axq_tx_inprogress is reset in ath_draintxq().

Ahhh, I see now.

I'll remove this patch from my queue and make sure things still
work without it.

Thanks,
Ben

>
> Vasanth


--
Ben Greear <[email protected]>
Candela Technologies Inc http://www.candelatech.com

2011-01-07 18:01:24

by Ben Greear

Subject: [PATCH 2/2] ath9k: Fix incorrect tx-hang detection logic.

From: Ben Greear <[email protected]>

It is not guaranteed that the ath_tx_complete_poll_work runs
after some fixed duration because the channel-reset logic
removes the work and then re-adds it to run immediately.
Two channel-changes 1ms apart, while a transmit was being
attempted, could thus incorrectly trigger a reset by
the ath_tx_complete_poll_work method.

Add a jiffies timestamp to ensure that at least 1 second
has elapsed before triggering the reset.

Signed-off-by: Ben Greear <[email protected]>
---
:100644 100644 3f5c513... 93209d6... M drivers/net/wireless/ath/ath9k/ath9k.h
:100644 100644 3aae523... e63de71... M drivers/net/wireless/ath/ath9k/xmit.c
drivers/net/wireless/ath/ath9k/ath9k.h | 1 +
drivers/net/wireless/ath/ath9k/xmit.c | 15 ++++++++++-----
2 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/drivers/net/wireless/ath/ath9k/ath9k.h b/drivers/net/wireless/ath/ath9k/ath9k.h
index 3f5c513..93209d6 100644
--- a/drivers/net/wireless/ath/ath9k/ath9k.h
+++ b/drivers/net/wireless/ath/ath9k/ath9k.h
@@ -190,6 +190,7 @@ struct ath_txq {
spinlock_t axq_lock;
u32 axq_depth;
u32 axq_ampdu_depth;
+ unsigned long start_tx_timer; /* jiffies */
bool stopped;
bool axq_tx_inprogress;
struct list_head axq_acq;
diff --git a/drivers/net/wireless/ath/ath9k/xmit.c b/drivers/net/wireless/ath/ath9k/xmit.c
index 3aae523..e63de71 100644
--- a/drivers/net/wireless/ath/ath9k/xmit.c
+++ b/drivers/net/wireless/ath/ath9k/xmit.c
@@ -2097,6 +2097,7 @@ static void ath_tx_complete_poll_work(struct work_struct *work)
struct ath_txq *txq;
int i;
bool needreset = false;
+ unsigned long timeout = msecs_to_jiffies(ATH_TX_COMPLETE_POLL_INT);

for (i = 0; i < ATH9K_NUM_TX_QUEUES; i++)
if (ATH_TXQ_SETUP(sc, i)) {
@@ -2104,11 +2105,16 @@ static void ath_tx_complete_poll_work(struct work_struct *work)
spin_lock_bh(&txq->axq_lock);
if (txq->axq_depth) {
if (txq->axq_tx_inprogress) {
- needreset = true;
- spin_unlock_bh(&txq->axq_lock);
- break;
+ if (time_after_eq(jiffies,
+ txq->start_tx_timer +
+ timeout)) {
+ needreset = true;
+ spin_unlock_bh(&txq->axq_lock);
+ break;
+ }
} else {
txq->axq_tx_inprogress = true;
+ txq->start_tx_timer = jiffies;
}
}
spin_unlock_bh(&txq->axq_lock);
@@ -2122,8 +2128,7 @@ static void ath_tx_complete_poll_work(struct work_struct *work)
ath9k_ps_restore(sc);
}

- ieee80211_queue_delayed_work(sc->hw, &sc->tx_complete_work,
- msecs_to_jiffies(ATH_TX_COMPLETE_POLL_INT));
+ ieee80211_queue_delayed_work(sc->hw, &sc->tx_complete_work, timeout);
}


--
1.7.2.3
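Patch 2/2 arms the check with a jiffies timestamp. It only asks for a reset once `time_after_eq(jiffies, start_tx_timer + timeout)` holds. The userspace sketch below models a single queue. `struct txq_state` and `poll_once()` are hypothetical stand-ins, and the current jiffies value is passed in explicitly. The `time_after_eq()` macro reproduces the kernel's signed-difference comparison, minus its `typecheck()`. That comparison is what keeps the test correct across a jiffies wraparound.

```c
#include <stdbool.h>

/* Same comparison as the kernel macro, without typecheck(): the signed
 * difference stays correct when the counter wraps. */
#define time_after_eq(a, b) ((long)((a) - (b)) >= 0)

struct txq_state {		/* hypothetical subset of struct ath_txq */
	unsigned int axq_depth;
	bool axq_tx_inprogress;
	unsigned long start_tx_timer;	/* "jiffies" when the check was armed */
};

/* One pass of the patched check for a single queue; `now` stands in for
 * jiffies. Returns true when a reset would be requested. */
static bool poll_once(struct txq_state *txq, unsigned long now,
		      unsigned long timeout)
{
	if (!txq->axq_depth)
		return false;
	if (txq->axq_tx_inprogress)
		return time_after_eq(now, txq->start_tx_timer + timeout);
	/* first pass with frames pending: arm the check and stamp it */
	txq->axq_tx_inprogress = true;
	txq->start_tx_timer = now;
	return false;
}
```

A call at `t` followed by one at `t + 1` models the back-to-back run described in the commit message. The second call returns false because the interval has not elapsed. A call at `t + timeout` or later returns true.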


Subject: Re: [PATCH 2/2] ath9k: Fix incorrect tx-hang detection logic.

On Fri, Jan 07, 2011 at 11:30:59PM +0530, [email protected] wrote:
> From: Ben Greear <[email protected]>
>
> It is not guaranteed that the ath_tx_complete_poll_work runs
> after some fixed duration because the channel-reset logic
> removes the work and then re-adds it to run immediately.
> Two channel-changes 1ms apart, while a transmit was being
> attempted, could thus incorrectly trigger a reset by
> the ath_tx_complete_poll_work method.

I don't think so. axq_tx_inprogress is reset in ath_draintxq().

Vasanth
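Vasanth's objection can be modelled the same way. In the unpatched logic, the reset is a two-strike check on `axq_tx_inprogress`. He says `ath_draintxq()` clears that flag when the queue is drained during each channel change. If so, the immediately re-queued poll can only re-arm the flag, never trigger a reset. This is a hypothetical userspace sketch, not driver code. `struct txq_flags`, `poll_unpatched()` and `drain_txq()` are illustrative names:

```c
#include <stdbool.h>

struct txq_flags {		/* hypothetical subset of struct ath_txq */
	unsigned int axq_depth;
	bool axq_tx_inprogress;
};

/* Unpatched check: frames still pending and the flag from the previous
 * pass still set means a reset is requested. */
static bool poll_unpatched(struct txq_flags *txq)
{
	if (!txq->axq_depth)
		return false;
	if (txq->axq_tx_inprogress)
		return true;
	txq->axq_tx_inprogress = true;	/* first strike: arm */
	return false;
}

/* Models only the effect Vasanth points out: draining the queue on a
 * channel change clears axq_tx_inprogress. */
static void drain_txq(struct txq_flags *txq)
{
	txq->axq_tx_inprogress = false;
}
```

Two closely spaced channel changes alternate `drain_txq()` and `poll_unpatched()`, and that sequence never returns true. A reset is requested only by two consecutive polls with no drain between them (or, in the driver, no tx completion).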