2011-01-07 00:46:13

by Ben Greear

[permalink] [raw]
Subject: [PATCH 1/3] ath9k: Decrease skb size to fit into one page.

From: Ben Greear <[email protected]>

Patch is from Eric Dumazet, as described here:
https://patchwork.kernel.org/patch/104271/

Reported-by: Michael Guntsche <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: Ben Greear <[email protected]>
---

NOTE: This needs review by ath9k and/or other informed
people.

:100644 100644 b2497b8... 270661d... M drivers/net/wireless/ath/ath9k/recv.c
drivers/net/wireless/ath/ath9k/recv.c | 10 +++++-----
1 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/net/wireless/ath/ath9k/recv.c b/drivers/net/wireless/ath/ath9k/recv.c
index b2497b8..270661d 100644
--- a/drivers/net/wireless/ath/ath9k/recv.c
+++ b/drivers/net/wireless/ath/ath9k/recv.c
@@ -230,11 +230,11 @@ static int ath_rx_edma_init(struct ath_softc *sc, int nbufs)
int error = 0, i;
u32 size;

-
- common->rx_bufsize = roundup(IEEE80211_MAX_MPDU_LEN +
- ah->caps.rx_status_len,
- min(common->cachelsz, (u16)64));
-
+ size = roundup(IEEE80211_MAX_MPDU_LEN + ah->caps.rx_status_len,
+ min(common->cachelsz, (u16)64));
+ common->rx_bufsize = max_t(u32, size,
+ SKB_MAX_ORDER(NET_SKB_PAD
+ + common->cachelsz, 0));
ath9k_hw_set_rx_bufsize(ah, common->rx_bufsize -
ah->caps.rx_status_len);

--
1.7.2.3



2011-01-07 01:23:53

by Eric Dumazet

[permalink] [raw]
Subject: Re: [PATCH 1/3] ath9k: Decrease skb size to fit into one page.

Le vendredi 07 janvier 2011 à 02:04 +0100, Christian Lamparter a écrit :
> On Friday 07 January 2011 01:46:03 [email protected] wrote:
> > From: Ben Greear <[email protected]>
> >
> > Patch is from Eric Dumazet, as described here:
> > https://patchwork.kernel.org/patch/104271/
> >
> > Reported-by: Michael Guntsche <[email protected]>
> > Signed-off-by: Eric Dumazet <[email protected]>
> > Signed-off-by: Ben Greear <[email protected]>
> > ---
> >
> > NOTE: This needs review by ath9k and/or other informed
> > people.
>
> Does the hardware support vector-i/o for rx (like for instance iwlagn)?
> Else, this change would break A-MSDU rx - which is a mandatory feature
> (although, not very popular) of 802.11n -
>
> See for example 802.11n-2009 9.7c:
>
> "Support for the reception of an A-MSDU, where [...], is mandatory for
> an HT STA"
>
> And 7.1.2 "The maximum frame body size is determined by the maximum
> MSDU size (2304 octets) OR the maximum A-MSDU (3839 or 7935 octets,
> depending upon the STA's capability), plus any overhead from security
> encapsulation.

Then, only solution is to mark this broken, and perform a copy of each
received frame, to keep a order-1 buffer(s) allocated for hardware.

Its too easy to have memory allocation failures for high order pages and
freeze the card.

A copy is time consuming, but at least works.




2011-01-07 15:53:20

by Ben Greear

[permalink] [raw]
Subject: Re: [ath9k-devel] [PATCH 3/3] ath9k: Keep track of stations for debugfs.

On 01/07/2011 07:36 AM, Peter Stuge wrote:
> Ben Greear wrote:
>>> but at least other clueless users would not go over stable limit
>>> you have found.
>>
>> I think it's very likely that the problems I find are general
>> issues that are just much easier to hit with lots of stations.
>
> I strongly support this. I recognize several of your problems from my
> attempts at using ath9k with only a single STA which was all but stable.

Please do (re)post the info against the latest wireless-testing. It might
help me and others see something we're missing when trying to debug
problems we see ourselves.

Thanks,
Ben


--
Ben Greear <[email protected]>
Candela Technologies Inc http://www.candelatech.com

2011-01-07 22:20:56

by Ben Greear

[permalink] [raw]
Subject: Re: [ath9k-devel] [PATCH 1/3] ath9k: Decrease skb size to fit into one page.

On 01/07/2011 12:26 PM, Eric Dumazet wrote:
> Le vendredi 07 janvier 2011 à 12:09 -0800, Luis R. Rodriguez a écrit :
>> On Fri, Jan 07, 2011 at 10:34:52AM -0800, Ben Greear wrote:
>>> On 01/07/2011 02:58 AM, Johannes Berg wrote:
>>>> On Thu, 2011-01-06 at 16:46 -0800, [email protected] wrote:
>>>>> From: Ben Greear<[email protected]>
>>>>>
>>>>> Patch is from Eric Dumazet, as described here:
>>>>> https://patchwork.kernel.org/patch/104271/
>>>>>
>>>>> Reported-by: Michael Guntsche<[email protected]>
>>>>> Signed-off-by: Eric Dumazet<[email protected]>
>>>>> Signed-off-by: Ben Greear<[email protected]>
>>>>> ---
>>>>>
>>>>> NOTE: This needs review by ath9k and/or other informed
>>>>> people.
>>>>
>>>> This doesn't make sense. It might help, but it'll probably lead to not
>>>> being able to receive all frames off the air.
>>>>
>>>> If this is an issue, ath9k should do paged RX like iwlwifi.
>>>
>>> Ok, I backed this out..but now I'm back to getting buffer allocation
>>> failures (and this is on a system with 2GB RAM).
>>>
>>> Seems it's coming from mac80211 instead of ath9k, at least most of
>>> the time (I'm using 60 stations, so it probably needs to make lots of
>>> copies in the rx path). The traffic I'm generating/receiving is 1024 byte UDP
>>> payloads.
>>>
>>> Does this mean I really received a packet that was 3872 bytes long,
>>> or is the skb_copy allocating/copying empty data?
>>
>> Good question. The buffer we setup for DMA should be large since we
>> need to support AMSDU RX up to a certain bytes of RX data for the frame.
>> Harwdware should tell us the right size for the RX'd data and the skb
>> should be set with that size, respectively. Following this logic,
>> skb_copy() should only allocate on the order of the required skb->len.
>>
>> Remember that trick we did to force the older memory leak issues by
>> forcing an skb_copy() on every RX'd frame and then just discarding that
>> buffer immediately? You can try to do the same and print the skb->len
>> there, just to check what's going on.
>
>
> Using skb_copy() is wrong then, since it makes a copy (order-1
> allocations)
>
> It should use :
> skb_alloc( actual_size_of_frame not the 3840 thing ...)
> copy(data)

We need the extra stuff copied too I think (like skb->cb).

If you could provide a bit more complete example code, I'll be happy
to test it...

Thanks,
Ben

>
>


--
Ben Greear <[email protected]>
Candela Technologies Inc http://www.candelatech.com


2011-01-07 00:46:22

by Ben Greear

[permalink] [raw]
Subject: [PATCH 2/3] ath9k: Re-start xmit logic in xmit watchdog timer.

From: Ben Greear <[email protected]>

We should not get to this state, but we do. What is
worse, many times the xmit logic still will not start,
probably due to tids being paused when they shouldn't be.

Signed-off-by: Ben Greear <[email protected]>
---

NOTE: This needs review. It might be too much of a hack
for upstream code, and at best it works around a small part
of the problem.

:100644 100644 3aae523... 547fb44... M drivers/net/wireless/ath/ath9k/xmit.c
drivers/net/wireless/ath/ath9k/xmit.c | 21 +++++++++++++++++++++
1 files changed, 21 insertions(+), 0 deletions(-)

diff --git a/drivers/net/wireless/ath/ath9k/xmit.c b/drivers/net/wireless/ath/ath9k/xmit.c
index 3aae523..547fb44 100644
--- a/drivers/net/wireless/ath/ath9k/xmit.c
+++ b/drivers/net/wireless/ath/ath9k/xmit.c
@@ -2110,6 +2110,27 @@ static void ath_tx_complete_poll_work(struct work_struct *work)
} else {
txq->axq_tx_inprogress = true;
}
+ } else {
+ /* If the queue has pending buffers, then it
+ * should be doing tx work (and have axq_depth).
+ * Shouldn't get to this state I think..but
+ * perhaps we do.
+ */
+ if (!list_empty(&txq->axq_acq)) {
+ ath_err(ath9k_hw_common(sc->sc_ah),
+ "txq: %p axq_qnum: %i,"
+ " axq_link: %p"
+ " pending frames: %i"
+ " axq_acq is not empty, but"
+ " axq_depth is zero. Calling"
+ " ath_txq_schedule to restart"
+ " tx logic.\n",
+ txq, txq->axq_qnum,
+ txq->axq_link,
+ txq->pending_frames);
+ ATH_DBG_WARN_ON_ONCE(1);
+ ath_txq_schedule(sc, txq);
+ }
}
spin_unlock_bh(&txq->axq_lock);
}
--
1.7.2.3


2011-01-07 20:27:04

by Eric Dumazet

[permalink] [raw]
Subject: Re: [ath9k-devel] [PATCH 1/3] ath9k: Decrease skb size to fit into one page.

Le vendredi 07 janvier 2011 à 12:09 -0800, Luis R. Rodriguez a écrit :
> On Fri, Jan 07, 2011 at 10:34:52AM -0800, Ben Greear wrote:
> > On 01/07/2011 02:58 AM, Johannes Berg wrote:
> > > On Thu, 2011-01-06 at 16:46 -0800, [email protected] wrote:
> > >> From: Ben Greear<[email protected]>
> > >>
> > >> Patch is from Eric Dumazet, as described here:
> > >> https://patchwork.kernel.org/patch/104271/
> > >>
> > >> Reported-by: Michael Guntsche<[email protected]>
> > >> Signed-off-by: Eric Dumazet<[email protected]>
> > >> Signed-off-by: Ben Greear<[email protected]>
> > >> ---
> > >>
> > >> NOTE: This needs review by ath9k and/or other informed
> > >> people.
> > >
> > > This doesn't make sense. It might help, but it'll probably lead to not
> > > being able to receive all frames off the air.
> > >
> > > If this is an issue, ath9k should do paged RX like iwlwifi.
> >
> > Ok, I backed this out..but now I'm back to getting buffer allocation
> > failures (and this is on a system with 2GB RAM).
> >
> > Seems it's coming from mac80211 instead of ath9k, at least most of
> > the time (I'm using 60 stations, so it probably needs to make lots of
> > copies in the rx path). The traffic I'm generating/receiving is 1024 byte UDP
> > payloads.
> >
> > Does this mean I really received a packet that was 3872 bytes long,
> > or is the skb_copy allocating/copying empty data?
>
> Good question. The buffer we setup for DMA should be large since we
> need to support AMSDU RX up to a certain bytes of RX data for the frame.
> Harwdware should tell us the right size for the RX'd data and the skb
> should be set with that size, respectively. Following this logic,
> skb_copy() should only allocate on the order of the required skb->len.
>
> Remember that trick we did to force the older memory leak issues by
> forcing an skb_copy() on every RX'd frame and then just discarding that
> buffer immediately? You can try to do the same and print the skb->len
> there, just to check what's going on.


Using skb_copy() is wrong then, since it makes a copy (order-1
allocations)

It should use :
skb_alloc( actual_size_of_frame not the 3840 thing ...)
copy(data)




2011-01-07 07:16:11

by Ben Greear

[permalink] [raw]
Subject: Re: [PATCH 2/3] ath9k: Re-start xmit logic in xmit watchdog timer.

On 01/06/2011 10:51 PM, Vasanthakumar Thiagarajan wrote:
> On Fri, Jan 07, 2011 at 06:16:04AM +0530, [email protected] wrote:
>> From: Ben Greear<[email protected]>
>>
>> We should not get to this state, but we do. What is
>> worse, many times the xmit logic still will not start,
>> probably due to tids being paused when they shouldn't be.
>>
>> Signed-off-by: Ben Greear<[email protected]>
>> ---
>>
>> NOTE: This needs review. It might be too much of a hack
>> for upstream code, and at best it works around a small part
>> of the problem.
>>
>> :100644 100644 3aae523... 547fb44... M drivers/net/wireless/ath/ath9k/xmit.c
>> drivers/net/wireless/ath/ath9k/xmit.c | 21 +++++++++++++++++++++
>> 1 files changed, 21 insertions(+), 0 deletions(-)
>>
>> diff --git a/drivers/net/wireless/ath/ath9k/xmit.c b/drivers/net/wireless/ath/ath9k/xmit.c
>> index 3aae523..547fb44 100644
>> --- a/drivers/net/wireless/ath/ath9k/xmit.c
>> +++ b/drivers/net/wireless/ath/ath9k/xmit.c
>> @@ -2110,6 +2110,27 @@ static void ath_tx_complete_poll_work(struct work_struct *work)
>> } else {
>> txq->axq_tx_inprogress = true;
>> }
>> + } else {
>> + /* If the queue has pending buffers, then it
>> + * should be doing tx work (and have axq_depth).
>> + * Shouldn't get to this state I think..but
>> + * perhaps we do.
>> + */
>> + if (!list_empty(&txq->axq_acq)) {
>> + ath_err(ath9k_hw_common(sc->sc_ah),
>> + "txq: %p axq_qnum: %i,"
>> + " axq_link: %p"
>> + " pending frames: %i"
>> + " axq_acq is not empty, but"
>> + " axq_depth is zero. Calling"
>> + " ath_txq_schedule to restart"
>> + " tx logic.\n",
>> + txq, txq->axq_qnum,
>> + txq->axq_link,
>> + txq->pending_frames);
>> + ATH_DBG_WARN_ON_ONCE(1);
>> + ath_txq_schedule(sc, txq);
>
> NAK. This complete work monitors the hw q periodically and does a reset if a
> hang is detected. This work is no way meant to schedule aggr. This
> change really does not make any sense. Scheduling a tid periodically
> would introduce reordering issues especially when there are more
> retries.

Ok, but if the system is in the state where it hits this code branch, it seems
that no new packets are sent to the chip to be transmitted. I see this,
for instance, in debugfs (with my debugfs patches applied):

axq-qnum: 2 3 1 0
axq-depth: 0 0 0 0
axq-ampdu_depth: 0 0 0 0
axq-stopped 1 0 0 0
tx-in-progress 0 0 0 0
pending-frames 70 0 0 0
txq_headidx: 0 0 0 0
txq_tailidx: 0 0 0 0
axq_q empty: 0 1 1 0
axq_acq empty: 0 1 1 1
txq_fifo_pending: 1 1 1 1

The queue is stopped, axq_acq and axq_q are not empty, there are pending frames,
and no axq-depth. I cannot figure out why the queue is stopped since
it seems it should be running when axq-depth is zero.

Thanks for the review, I'll back this hack out of my testing tree.

Ben

--
Ben Greear <[email protected]>
Candela Technologies Inc http://www.candelatech.com

2011-01-07 00:46:14

by Ben Greear

[permalink] [raw]
Subject: [PATCH 3/3] ath9k: Keep track of stations for debugfs.

From: Ben Greear <[email protected]>

The stations hold the ath_node, which holds the tid
and other xmit logic structures. In order to debug
stuck xmit logic, we need a way to print out the tid
state for the stations.

Signed-off-by: Ben Greear <[email protected]>
---
:100644 100644 deda815... a34141a... M drivers/net/wireless/ath/ath9k/ath9k.h
:100644 100644 faf84e4... 8e29fb9... M drivers/net/wireless/ath/ath9k/debug.c
:100644 100644 b716ffb... a3e5539... M drivers/net/wireless/ath/ath9k/main.c
drivers/net/wireless/ath/ath9k/ath9k.h | 4 +
drivers/net/wireless/ath/ath9k/debug.c | 100 +++++++++++++++++++++++++++++++-
drivers/net/wireless/ath/ath9k/main.c | 20 ++++++-
3 files changed, 120 insertions(+), 4 deletions(-)

diff --git a/drivers/net/wireless/ath/ath9k/ath9k.h b/drivers/net/wireless/ath/ath9k/ath9k.h
index deda815..a34141a 100644
--- a/drivers/net/wireless/ath/ath9k/ath9k.h
+++ b/drivers/net/wireless/ath/ath9k/ath9k.h
@@ -592,6 +592,10 @@ struct ath_softc {
struct work_struct paprd_work;
struct work_struct hw_check_work;
struct completion paprd_complete;
+#ifdef CONFIG_ATH9K_DEBUGFS
+#define ATH9K_MAX_STATIONS 1024
+ struct ieee80211_sta *stations[ATH9K_MAX_STATIONS]; /* dbg purposes */
+#endif
bool paprd_pending;

u32 intrstatus;
diff --git a/drivers/net/wireless/ath/ath9k/debug.c b/drivers/net/wireless/ath/ath9k/debug.c
index faf84e4..8e29fb9 100644
--- a/drivers/net/wireless/ath/ath9k/debug.c
+++ b/drivers/net/wireless/ath/ath9k/debug.c
@@ -587,6 +587,8 @@ static const struct file_operations fops_wiphy = {
sc->debug.stats.txstats[WME_AC_BK].elem, \
sc->debug.stats.txstats[WME_AC_VI].elem, \
sc->debug.stats.txstats[WME_AC_VO].elem); \
+ if (len >= size) \
+ goto done; \
} while(0)

#define PRX(str, elem) \
@@ -597,6 +599,8 @@ do { \
(unsigned int)(sc->tx.txq[WME_AC_BK].elem), \
(unsigned int)(sc->tx.txq[WME_AC_VI].elem), \
(unsigned int)(sc->tx.txq[WME_AC_VO].elem)); \
+ if (len >= size) \
+ goto done; \
} while(0)

#define PRQLE(str, elem) \
@@ -607,6 +611,8 @@ do { \
list_empty(&sc->tx.txq[WME_AC_BK].elem), \
list_empty(&sc->tx.txq[WME_AC_VI].elem), \
list_empty(&sc->tx.txq[WME_AC_VO].elem)); \
+ if (len >= size) \
+ goto done; \
} while (0)

static ssize_t read_file_xmit(struct file *file, char __user *user_buf,
@@ -614,7 +620,7 @@ static ssize_t read_file_xmit(struct file *file, char __user *user_buf,
{
struct ath_softc *sc = file->private_data;
char *buf;
- unsigned int len = 0, size = 4000;
+ unsigned int len = 0, size = 8000;
int i;
ssize_t retval = 0;
char tmp[32];
@@ -623,7 +629,10 @@ static ssize_t read_file_xmit(struct file *file, char __user *user_buf,
if (buf == NULL)
return -ENOMEM;

- len += sprintf(buf, "%30s %10s%10s%10s\n\n", "BE", "BK", "VI", "VO");
+ len += sprintf(buf, "Num-Tx-Queues: %i tx-queues-setup: 0x%x\n"
+ "%30s %10s%10s%10s\n\n",
+ ATH9K_NUM_TX_QUEUES, sc->tx.txqsetup,
+ "BE", "BK", "VI", "VO");

PR("MPDUs Queued: ", queued);
PR("MPDUs Completed: ", completed);
@@ -644,6 +653,14 @@ static ssize_t read_file_xmit(struct file *file, char __user *user_buf,
PR("hw-put-tx-buf: ", puttxbuf);
PR("hw-tx-start: ", txstart);
PR("hw-tx-proc-desc: ", txprocdesc);
+ len += snprintf(buf + len, size - len,
+ "%s%11p%11p%10p%10p\n", "txq-memory-address:",
+ &(sc->tx.txq[WME_AC_BE]),
+ &(sc->tx.txq[WME_AC_BK]),
+ &(sc->tx.txq[WME_AC_VI]),
+ &(sc->tx.txq[WME_AC_VO]));
+ if (len >= size)
+ goto done;

PRX("axq-qnum: ", axq_qnum);
PRX("axq-depth: ", axq_depth);
@@ -661,6 +678,74 @@ static ssize_t read_file_xmit(struct file *file, char __user *user_buf,
snprintf(tmp, sizeof(tmp) - 1, "txq_fifo[%i] empty: ", i);
PRQLE(tmp, txq_fifo[i]);
}
+
+done:
+ if (len > size)
+ len = size;
+
+ retval = simple_read_from_buffer(user_buf, count, ppos, buf, len);
+ kfree(buf);
+
+ return retval;
+}
+
+static ssize_t read_file_stations(struct file *file, char __user *user_buf,
+ size_t count, loff_t *ppos)
+{
+ struct ath_softc *sc = file->private_data;
+ char *buf;
+ unsigned int len = 0, size = 64000;
+ int i;
+ ssize_t retval = 0;
+
+ buf = kzalloc(size, GFP_KERNEL);
+ if (buf == NULL)
+ return -ENOMEM;
+
+ len += snprintf(buf + len, size - len,
+ "Stations:\n"
+ " tid: addr sched paused buf_q-empty an ac\n"
+ " ac: addr sched tid_q-empty txq\n");
+
+ for (i = 0; i < ATH9K_MAX_STATIONS; i++) {
+ if (sc->stations[i]) {
+ struct ath_node *an;
+ int q;
+ an = (struct ath_node *)(sc->stations[i]->drv_priv);
+
+ len += snprintf(buf + len, size - len,
+ "%pM\n", sc->stations[i]->addr);
+ if (len >= size)
+ goto done;
+
+ for (q = 0; q < WME_NUM_TID; q++) {
+ struct ath_atx_tid *tid = &(an->tid[q]);
+ len += snprintf(buf + len, size - len,
+ " tid: %p %s %s %i %p %p\n",
+ tid,
+ tid->sched ? "sched" : "idle",
+ tid->paused ? "pause" : " ",
+ list_empty(&tid->buf_q),
+ tid->an, tid->ac);
+ if (len >= size)
+ goto done;
+ }
+
+ for (q = 0; q < WME_NUM_AC; q++) {
+ struct ath_atx_ac *ac = &(an->ac[q]);
+ len += snprintf(buf + len, size - len,
+ " ac: %p %s %i %p\n",
+ ac,
+ ac->sched ? "sched" : "idle",
+ list_empty(&ac->tid_q),
+ ac->txq);
+ if (len >= size)
+ goto done;
+ }
+ }
+ }
+
+done:
if (len > size)
len = size;

@@ -708,6 +793,13 @@ static const struct file_operations fops_xmit = {
.llseek = default_llseek,
};

+static const struct file_operations fops_stations = {
+ .read = read_file_stations,
+ .open = ath9k_debugfs_open,
+ .owner = THIS_MODULE,
+ .llseek = default_llseek,
+};
+
static ssize_t read_file_recv(struct file *file, char __user *user_buf,
size_t count, loff_t *ppos)
{
@@ -945,6 +1037,10 @@ int ath9k_init_debug(struct ath_hw *ah)
sc, &fops_xmit))
goto err;

+ if (!debugfs_create_file("stations", S_IRUSR, sc->debug.debugfs_phy,
+ sc, &fops_stations))
+ goto err;
+
if (!debugfs_create_file("recv", S_IRUSR, sc->debug.debugfs_phy,
sc, &fops_recv))
goto err;
diff --git a/drivers/net/wireless/ath/ath9k/main.c b/drivers/net/wireless/ath/ath9k/main.c
index b716ffb..a3e5539 100644
--- a/drivers/net/wireless/ath/ath9k/main.c
+++ b/drivers/net/wireless/ath/ath9k/main.c
@@ -1769,7 +1769,15 @@ static int ath9k_sta_add(struct ieee80211_hw *hw,
{
struct ath_wiphy *aphy = hw->priv;
struct ath_softc *sc = aphy->sc;
-
+#ifdef CONFIG_ATH9K_DEBUGFS
+ int i;
+ for (i = 0; i < ATH9K_MAX_STATIONS; i++) {
+ if (!sc->stations[i]) {
+ sc->stations[i] = sta;
+ break;
+ }
+ }
+#endif
ath_node_attach(sc, sta);

return 0;
@@ -1781,7 +1789,15 @@ static int ath9k_sta_remove(struct ieee80211_hw *hw,
{
struct ath_wiphy *aphy = hw->priv;
struct ath_softc *sc = aphy->sc;
-
+#ifdef CONFIG_ATH9K_DEBUGFS
+ int i;
+ for (i = 0; i < ATH9K_MAX_STATIONS; i++) {
+ if (sc->stations[i] == sta) {
+ sc->stations[i] = NULL;
+ break;
+ }
+ }
+#endif
ath_node_detach(sc, sta);

return 0;
--
1.7.2.3


Subject: Re: [ath9k-devel] [PATCH 2/3] ath9k: Re-start xmit logic in xmit watchdog timer.

On Fri, Jan 07, 2011 at 08:41:48PM +0530, Peter Stuge wrote:
> Vasanthakumar Thiagarajan wrote:
> > > +++ b/drivers/net/wireless/ath/ath9k/xmit.c
> > > @@ -2110,6 +2110,27 @@ static void ath_tx_complete_poll_work(struct work_struct *work)
> > > } else {
> > > txq->axq_tx_inprogress = true;
> > > }
> > > + } else {
> ..
> > > + if (!list_empty(&txq->axq_acq)) {
> ..
> > > + ath_txq_schedule(sc, txq);
> >
> > NAK. This complete work monitors the hw q periodically and does a
> > reset if a hang is detected.
>
> Could you please point to that hang check?

It not exactly a hang check. It is monitoring tx intr when hw q is
non-empty.

Vasanth

2011-01-07 02:49:13

by Luis R. Rodriguez

[permalink] [raw]
Subject: Re: [ath9k-devel] [PATCH 3/3] ath9k: Keep track of stations for debugfs.

On Thu, Jan 06, 2011 at 06:45:33PM -0800, Ben Greear wrote:
> On 01/06/2011 06:30 PM, Luis R. Rodriguez wrote:
> > On Thu, Jan 6, 2011 at 4:46 PM,<[email protected]> wrote:
> >> +#define ATH9K_MAX_STATIONS 1024
> >
> > How about making this a Kconfig with a default to a value of the known
> > (by you) max workable number of STAs that one can use on ath9k, which
> > is modifiable to other values by power of two up to 1024. Advise in
> > the kconfig that if more STAs are used then some issue may arise but
> > should be reported (so the issue can be fixed). This way by default
> > normal users (you're not normal) won't enable> max known workable
> > stable number of STAs on ath9k.
>
> This is just for debugging at this point. It wastes a bit of memory
> when debugfs is enabled, but otherwise doesn't affect anything. It's
> not even really a problem if there are more stations than fit in
> the array.

I meant to use the value as a limit on the # of STAs you can create
with one ath9k device. The debugfs can still be used as you did,
only that the limit would come from the kconfig value.

> I can reproduce all my problems with < 128, so if you'd prefer
> the number be smaller, that's fine with me. I don't think it's
> worth a configurable value, however.

I thikn we should limit the # STAs to whatever it is that you can
use in a realiable way, this should be a driver limitation, but
I figured you'd want to configure this to a higher value yourself
for whatever tests you are doing. We should fix it though but at
least other clueless users would not go over stable limit you have
found.

Luis

2011-01-07 02:33:33

by Eric Dumazet

[permalink] [raw]
Subject: Re: [PATCH 1/3] ath9k: Decrease skb size to fit into one page.

Le vendredi 07 janvier 2011 à 03:24 +0100, Eric Dumazet a écrit :

> Given IEEE80211_MAX_MPDU_LEN is more than 3840, and skb shinfo adds more
> than 256 bytes, I can assert rx_bufsize is greater than 4096 : order-1

On 64bit arches :

sizeof(struct skb_shared_info) = 0x198 = 408




2011-01-07 02:45:38

by Ben Greear

[permalink] [raw]
Subject: Re: [ath9k-devel] [PATCH 3/3] ath9k: Keep track of stations for debugfs.

On 01/06/2011 06:30 PM, Luis R. Rodriguez wrote:
> On Thu, Jan 6, 2011 at 4:46 PM,<[email protected]> wrote:
>> +#define ATH9K_MAX_STATIONS 1024
>
> How about making this a Kconfig with a default to a value of the known
> (by you) max workable number of STAs that one can use on ath9k, which
> is modifiable to other values by power of two up to 1024. Advise in
> the kconfig that if more STAs are used then some issue may arise but
> should be reported (so the issue can be fixed). This way by default
> normal users (you're not normal) won't enable> max known workable
> stable number of STAs on ath9k.

This is just for debugging at this point. It wastes a bit of memory
when debugfs is enabled, but otherwise doesn't affect anything. It's
not even really a problem if there are more stations than fit in
the array.

I can reproduce all my problems with < 128, so if you'd prefer
the number be smaller, that's fine with me. I don't think it's
worth a configurable value, however.

Thanks,
Ben

>
> Luis


--
Ben Greear <[email protected]>
Candela Technologies Inc http://www.candelatech.com

2011-01-07 19:46:10

by Luis R. Rodriguez

[permalink] [raw]
Subject: Re: [ath9k-devel] [PATCH 3/3] ath9k: Keep track of stations for debugfs.

On Thu, Jan 06, 2011 at 07:17:32PM -0800, Ben Greear wrote:
> On 01/06/2011 06:49 PM, Luis R. Rodriguez wrote:
> > On Thu, Jan 06, 2011 at 06:45:33PM -0800, Ben Greear wrote:
> >> On 01/06/2011 06:30 PM, Luis R. Rodriguez wrote:
> >>> On Thu, Jan 6, 2011 at 4:46 PM,<[email protected]> wrote:
> >>>> +#define ATH9K_MAX_STATIONS 1024
> >>>
> >>> How about making this a Kconfig with a default to a value of the known
> >>> (by you) max workable number of STAs that one can use on ath9k, which
> >>> is modifiable to other values by power of two up to 1024. Advise in
> >>> the kconfig that if more STAs are used then some issue may arise but
> >>> should be reported (so the issue can be fixed). This way by default
> >>> normal users (you're not normal) won't enable> max known workable
> >>> stable number of STAs on ath9k.
> >>
> >> This is just for debugging at this point. It wastes a bit of memory
> >> when debugfs is enabled, but otherwise doesn't affect anything. It's
> >> not even really a problem if there are more stations than fit in
> >> the array.
> >
> > I meant to use the value as a limit on the # of STAs you can create
> > with one ath9k device. The debugfs can still be used as you did,
> > only that the limit would come from the kconfig value.
> >
> >> I can reproduce all my problems with< 128, so if you'd prefer
> >> the number be smaller, that's fine with me. I don't think it's
> >> worth a configurable value, however.
> >
> > I thikn we should limit the # STAs to whatever it is that you can
> > use in a realiable way, this should be a driver limitation, but
> > I figured you'd want to configure this to a higher value yourself
> > for whatever tests you are doing. We should fix it though but at
> > least other clueless users would not go over stable limit you have
> > found.
>
> I think it's very likely that the problems I find are general issues that
> are just much easier to hit with lots of stations.

I agree 100% :)

> There is probably no
> 'safe' number of stations...just takes longer to see bugs with
> fewer stations.

The point is to prevent users from going above the known safe limit.
No point in allowing more STAs in the driver if they won't work.
As a matter of fact, this may be a welcomed cfg80211 driver limitation
which can be exposed via nl80211.

> For instance, you still see the failure-to-stop-DMA errors with a single station, right?

On some hard corner cases but yes, and this is exactly why I agree that
the issues you are seeing *must* be debugged and fixed. The issues you have
found with over 60 STAs on ath9k with one device have helped us reproduce
some hard corner case bugs that were triggerable before only in rare situations.

> And the tx locking stuff was just easier to exercise with lots of stations,
> but it would have been possible to hit it with 2 stations.

Right.

> The current tx-hang stuff I'm chasing seems like logic bugs in the queueing,
> probably nothing in particular about the chipset.

Sure.

Luis

2011-01-07 02:07:57

by Eric Dumazet

[permalink] [raw]
Subject: Re: [PATCH 1/3] ath9k: Decrease skb size to fit into one page.

Le jeudi 06 janvier 2011 à 17:57 -0800, Luis R. Rodriguez a écrit :
> On Thu, Jan 6, 2011 at 5:23 PM, Eric Dumazet <[email protected]> wrote:
> > Le vendredi 07 janvier 2011 à 02:04 +0100, Christian Lamparter a écrit :
> >> On Friday 07 January 2011 01:46:03 [email protected] wrote:
> >> > From: Ben Greear <[email protected]>
> >> >
> >> > Patch is from Eric Dumazet, as described here:
> >> > https://patchwork.kernel.org/patch/104271/
> >> >
> >> > Reported-by: Michael Guntsche <[email protected]>
> >> > Signed-off-by: Eric Dumazet <[email protected]>
> >> > Signed-off-by: Ben Greear <[email protected]>
> >> > ---
> >> >
> >> > NOTE: This needs review by ath9k and/or other informed
> >> > people.
> >>
> >> Does the hardware support vector-i/o for rx (like for instance iwlagn)?
> >> Else, this change would break A-MSDU rx - which is a mandatory feature
> >> (although, not very popular) of 802.11n -
> >>
> >> See for example 802.11n-2009 9.7c:
> >>
> >> "Support for the reception of an A-MSDU, where [...], is mandatory for
> >> an HT STA"
> >>
> >> And 7.1.2 "The maximum frame body size is determined by the maximum
> >> MSDU size (2304 octets) OR the maximum A-MSDU (3839 or 7935 octets,
> >> depending upon the STA's capability), plus any overhead from security
> >> encapsulation.
> >
> > Then, only solution is to mark this broken, and perform a copy of each
> > received frame, to keep a order-1 buffer(s) allocated for hardware.
>
> -ENOTPOSSIBLE -- its an WFA requirement to RX AMSDU.
>
> > Its too easy to have memory allocation failures for high order pages and
> > freeze the card.
>
> Can't we us paged RX skbs, which mac80211 supports now?
>
> See 2f301227a1ede57504694e1f64839839f5737cac and friends.
>

Maybe you dont understand the point. A fix is needed for stable kernels.
paged RX skbs is probably a bit complex, even if copy/pasted from other
drivers.

If the hardware needs 8192 bytes (or 16384) buffers to perform its
operation, it should not give them back to linux, because there is no
guarantee it can allocate fresh ones for the next frames.




2011-01-07 01:57:43

by Luis R. Rodriguez

[permalink] [raw]
Subject: Re: [PATCH 1/3] ath9k: Decrease skb size to fit into one page.

On Thu, Jan 6, 2011 at 5:23 PM, Eric Dumazet <[email protected]> wrote:
> Le vendredi 07 janvier 2011 à 02:04 +0100, Christian Lamparter a écrit :
>> On Friday 07 January 2011 01:46:03 [email protected] wrote:
>> > From: Ben Greear <[email protected]>
>> >
>> > Patch is from Eric Dumazet, as described here:
>> > https://patchwork.kernel.org/patch/104271/
>> >
>> > Reported-by: Michael Guntsche <[email protected]>
>> > Signed-off-by: Eric Dumazet <[email protected]>
>> > Signed-off-by: Ben Greear <[email protected]>
>> > ---
>> >
>> > NOTE:  This needs review by ath9k and/or other informed
>> > people.
>>
>> Does the hardware support vector-i/o for rx (like for instance iwlagn)?
>> Else, this change would break A-MSDU rx - which is a mandatory feature
>> (although, not very popular) of 802.11n -
>>
>> See for example 802.11n-2009 9.7c:
>>
>> "Support for the reception of an A-MSDU, where [...], is mandatory for
>> an HT STA"
>>
>> And 7.1.2 "The maximum frame body size is determined by the maximum
>> MSDU size (2304 octets) OR the maximum A-MSDU (3839 or 7935 octets,
>> depending upon the STA's capability), plus any overhead from security
>> encapsulation.
>
> Then, only solution is to mark this broken, and perform a copy of each
> received frame, to keep a order-1 buffer(s) allocated for hardware.

-ENOTPOSSIBLE -- its an WFA requirement to RX AMSDU.

> Its too easy to have memory allocation failures for high order pages and
> freeze the card.

Can't we us paged RX skbs, which mac80211 supports now?

See 2f301227a1ede57504694e1f64839839f5737cac and friends.

Luis

2011-01-07 22:47:01

by Ben Greear

[permalink] [raw]
Subject: Re: [ath9k-devel] [PATCH 1/3] ath9k: Decrease skb size to fit into one page.

On 01/07/2011 02:26 PM, Eric Dumazet wrote:
> Le vendredi 07 janvier 2011 à 14:20 -0800, Ben Greear a écrit :
>> On 0
>>> Using skb_copy() is wrong then, since it makes a copy (order-1
>>> allocations)
>>>
>>> It should use :
>>> skb_alloc( actual_size_of_frame not the 3840 thing ...)
>>> copy(data)
>>
>> We need the extra stuff copied too I think (like skb->cb).
>>
>> If you could provide a bit more complete example code, I'll be happy
>> to test it...
>
> take a random drivers/net using copybreak ... say ... tg3.c
>
> lines 4785
>
> len = length_of_the_current_frame
> copy_skb = netdev_alloc_skb(..., len);
> // allocates exact required size not a byte more....
> ...
> skb_copy_from_linear_data(skb, copy_skb->data, len);
> ...
> skb_put(skb, len);
> ...

That seems fine for drivers, but I'm guessing something different
would be needed in the mac80211/rx.c code:

I figure doing the fix here might be a nice quick fix,
and would help with all funky drivers. Later, can fix the
ath9k to do the skb copying as soon as DMA is done?

/*
* This function returns whether or not the SKB
* was destined for RX processing or not, which,
* if consume is true, is equivalent to whether
* or not the skb was consumed.
*/
static bool ieee80211_prepare_and_rx_handle(struct ieee80211_rx_data *rx,
struct sk_buff *skb, bool consume)
{
struct ieee80211_local *local = rx->local;
struct ieee80211_sub_if_data *sdata = rx->sdata;
struct ieee80211_rx_status *status = IEEE80211_SKB_RXCB(skb);
struct ieee80211_hdr *hdr = (void *)skb->data;
int prepares;

rx->skb = skb;
status->rx_flags |= IEEE80211_RX_RA_MATCH;
prepares = prepare_for_handlers(rx, hdr);

if (!prepares)
return false;

if (status->flag & RX_FLAG_MMIC_ERROR) {
if (status->rx_flags & IEEE80211_RX_RA_MATCH)
ieee80211_rx_michael_mic_report(hdr, rx);
return false;
}

if (!consume) {
skb = skb_copy(skb, GFP_ATOMIC);
if (!skb) {
if (net_ratelimit()) {
printk("failed skb_copy, skb->len: %i\n", skb->len);
wiphy_debug(local->hw.wiphy,
"failed to copy skb for %s\n",
sdata->name);
}
return true;
}

rx->skb = skb;
}

ieee80211_invoke_rx_handlers(rx);
return true;
}


Thanks,
Ben


--
Ben Greear <[email protected]>
Candela Technologies Inc http://www.candelatech.com


2011-01-07 20:09:14

by Luis R. Rodriguez

[permalink] [raw]
Subject: Re: [ath9k-devel] [PATCH 1/3] ath9k: Decrease skb size to fit into one page.

On Fri, Jan 07, 2011 at 10:34:52AM -0800, Ben Greear wrote:
> On 01/07/2011 02:58 AM, Johannes Berg wrote:
> > On Thu, 2011-01-06 at 16:46 -0800, [email protected] wrote:
> >> From: Ben Greear<[email protected]>
> >>
> >> Patch is from Eric Dumazet, as described here:
> >> https://patchwork.kernel.org/patch/104271/
> >>
> >> Reported-by: Michael Guntsche<[email protected]>
> >> Signed-off-by: Eric Dumazet<[email protected]>
> >> Signed-off-by: Ben Greear<[email protected]>
> >> ---
> >>
> >> NOTE: This needs review by ath9k and/or other informed
> >> people.
> >
> > This doesn't make sense. It might help, but it'll probably lead to not
> > being able to receive all frames off the air.
> >
> > If this is an issue, ath9k should do paged RX like iwlwifi.
>
> Ok, I backed this out..but now I'm back to getting buffer allocation
> failures (and this is on a system with 2GB RAM).
>
> Seems it's coming from mac80211 instead of ath9k, at least most of
> the time (I'm using 60 stations, so it probably needs to make lots of
> copies in the rx path). The traffic I'm generating/receiving is 1024 byte UDP
> payloads.
>
> Does this mean I really received a packet that was 3872 bytes long,
> or is the skb_copy allocating/copying empty data?

Good question. The buffer we setup for DMA should be large since we
need to support AMSDU RX up to a certain bytes of RX data for the frame.
Harwdware should tell us the right size for the RX'd data and the skb
should be set with that size, respectively. Following this logic,
skb_copy() should only allocate on the order of the required skb->len.

Remember that trick we did to force the older memory leak issues by
forcing an skb_copy() on every RX'd frame and then just discarding that
buffer immediately? You can try to do the same and print the skb->len
there, just to check what's going on.

Luis

2011-01-07 02:48:50

by Ben Greear

[permalink] [raw]
Subject: Re: [ath9k-devel] [PATCH 3/3] ath9k: Keep track of stations for debugfs.

On 01/06/2011 06:45 PM, Felix Fietkau wrote:
> On 2011-01-06 5:46 PM, [email protected] wrote:
>> From: Ben Greear<[email protected]>
>>
>> The stations hold the ath_node, which holds the tid
>> and other xmit logic structures. In order to debug
>> stuck xmit logic, we need a way to print out the tid
>> state for the stations.
>>
>> Signed-off-by: Ben Greear<[email protected]>
> I really don't like the array with the hardcoded size. How about just adding a struct list_head to the ath_node struct and tracking that?

I can do that..would probably want a pointer back from the ath_node to it's STA struct
as well.

I'll re-work the patch to use a list.

Thanks,
Ben

>
> - Felix


--
Ben Greear <[email protected]>
Candela Technologies Inc http://www.candelatech.com

2011-01-07 00:57:32

by Luis R. Rodriguez

[permalink] [raw]
Subject: Re: [ath9k-devel] [PATCH 1/3] ath9k: Decrease skb size to fit into one page.

On Thu, Jan 6, 2011 at 4:46 PM, <[email protected]> wrote:
> From: Ben Greear <[email protected]>
>
> Patch is from Eric Dumazet, as described here:
> https://patchwork.kernel.org/patch/104271/
>
> Reported-by: Michael Guntsche <[email protected]>
> Signed-off-by: Eric Dumazet <[email protected]>

Did he SOB it?

Luis

2011-01-07 10:58:05

by Johannes Berg

[permalink] [raw]
Subject: Re: [PATCH 1/3] ath9k: Decrease skb size to fit into one page.

On Thu, 2011-01-06 at 16:46 -0800, [email protected] wrote:
> From: Ben Greear <[email protected]>
>
> Patch is from Eric Dumazet, as described here:
> https://patchwork.kernel.org/patch/104271/
>
> Reported-by: Michael Guntsche <[email protected]>
> Signed-off-by: Eric Dumazet <[email protected]>
> Signed-off-by: Ben Greear <[email protected]>
> ---
>
> NOTE: This needs review by ath9k and/or other informed
> people.

This doesn't make sense. It might help, but it'll probably lead to not
being able to receive all frames off the air.

If this is an issue, ath9k should do paged RX like iwlwifi.

johannes


2011-01-07 15:36:09

by Peter Stuge

[permalink] [raw]
Subject: Re: [ath9k-devel] [PATCH 3/3] ath9k: Keep track of stations for debugfs.

Ben Greear wrote:
> > but at least other clueless users would not go over stable limit
> > you have found.
>
> I think it's very likely that the problems I find are general
> issues that are just much easier to hit with lots of stations.

I strongly support this. I recognize several of your problems from my
attempts at using ath9k with only a single STA which was all but stable.


> There is probably no 'safe' number of stations...just takes longer
> to see bugs with fewer stations.
>
> For instance, you still see the failure-to-stop-DMA errors with a
> single station, right? And the tx locking stuff was just easier to
> exercise with lots of stations, but it would have been possible to
> hit it with 2 stations.
>
> The current tx-hang stuff I'm chasing seems like logic bugs in the
> queueing, probably nothing in particular about the chipset.

This is also my impression. Since it is important for Atheros to have
bugzilla reports rather than discussion on list I will try to find
time for creating more and/or new bug reports about my problems.

(But the ones I have created so far did not really receive very good
response.)

Note that my AR9280 is not even detected correctly on PCI now so I
may switch back to AR5008 for that.

I really should try out FreeBSD too.


//Peter

2011-01-07 02:13:45

by Luis R. Rodriguez

[permalink] [raw]
Subject: Re: [PATCH 1/3] ath9k: Decrease skb size to fit into one page.

On Thu, Jan 6, 2011 at 6:07 PM, Eric Dumazet <[email protected]> wrote:
> Le jeudi 06 janvier 2011 à 17:57 -0800, Luis R. Rodriguez a écrit :
>> On Thu, Jan 6, 2011 at 5:23 PM, Eric Dumazet <[email protected]> wrote:
>> > Le vendredi 07 janvier 2011 à 02:04 +0100, Christian Lamparter a écrit :
>> >> On Friday 07 January 2011 01:46:03 [email protected] wrote:
>> >> > From: Ben Greear <[email protected]>
>> >> >
>> >> > Patch is from Eric Dumazet, as described here:
>> >> > https://patchwork.kernel.org/patch/104271/
>> >> >
>> >> > Reported-by: Michael Guntsche <[email protected]>
>> >> > Signed-off-by: Eric Dumazet <[email protected]>
>> >> > Signed-off-by: Ben Greear <[email protected]>
>> >> > ---
>> >> >
>> >> > NOTE:  This needs review by ath9k and/or other informed
>> >> > people.
>> >>
>> >> Does the hardware support vector-i/o for rx (like for instance iwlagn)?
>> >> Else, this change would break A-MSDU rx - which is a mandatory feature
>> >> (although, not very popular) of 802.11n -
>> >>
>> >> See for example 802.11n-2009 9.7c:
>> >>
>> >> "Support for the reception of an A-MSDU, where [...], is mandatory for
>> >> an HT STA"
>> >>
>> >> And 7.1.2 "The maximum frame body size is determined by the maximum
>> >> MSDU size (2304 octets) OR the maximum A-MSDU (3839 or 7935 octets,
>> >> depending upon the STA's capability), plus any overhead from security
>> >> encapsulation.
>> >
>> > Then, only solution is to mark this broken, and perform a copy of each
>> > received frame, to keep a order-1 buffer(s) allocated for hardware.
>>
>> -ENOTPOSSIBLE -- its an WFA requirement to RX AMSDU.
>>
>> > Its too easy to have memory allocation failures for high order pages and
>> > freeze the card.
>>
>> Can't we us paged RX skbs, which mac80211 supports now?
>>
>> See 2f301227a1ede57504694e1f64839839f5737cac and friends.
>>
>
> Maybe you dont understand the point. A fix is needed for stable kernels.
> paged RX skbs is probably a bit complex, even if copy/pasted from other
> drivers.

The only way to accept your patch is to use a debugfs option to
disable it, we need AMSDU support enabled by default.

> If the hardware needs 8192 bytes (or 16384) buffers to perform its
> operation, it should not give them back to linux, because there is no
> guarantee it can allocate fresh ones for the next frames.

Last I looked at this the issue was not the upper used by the driver
but an issue of a roundoff by the kernel that ended up on *some*
machines going a higher order. Not all machines use the higher order.
I wondered at one point if using ksize() might help here too but
again, this is a new API. Not sure how to fix it for older kernels.

Luis

2011-01-07 20:30:21

by Luis R. Rodriguez

[permalink] [raw]
Subject: Re: [ath9k-devel] [PATCH 3/3] ath9k: Keep track of stations for debugfs.

On Fri, Jan 07, 2011 at 12:25:31PM -0800, Luis Rodriguez wrote:
> On Fri, Jan 07, 2011 at 12:01:01PM -0800, Luis Rodriguez wrote:
> > On Fri, Jan 07, 2011 at 07:36:07AM -0800, Peter Stuge wrote:
> > > Ben Greear wrote:
> > > > > but at least other clueless users would not go over stable limit
> > > > > you have found.
> > > >
> > > > I think it's very likely that the problems I find are general
> > > > issues that are just much easier to hit with lots of stations.
> > >
> > > I strongly support this. I recognize several of your problems from my
> > > attempts at using ath9k with only a single STA which was all but stable.
> >
> > Sure, see my other e-mail.
> >
> > > > There is probably no 'safe' number of stations...just takes longer
> > > > to see bugs with fewer stations.
> > > >
> > > > For instance, you still see the failure-to-stop-DMA errors with a
> > > > single station, right? And the tx locking stuff was just easier to
> > > > exercise with lots of stations, but it would have been possible to
> > > > hit it with 2 stations.
> > > >
> > > > The current tx-hang stuff I'm chasing seems like logic bugs in the
> > > > queueing, probably nothing in particular about the chipset.
> > >
> > > This is also my impression. Since it is important for Atheros to have
> > > bugzilla reports rather than discussion on list
> >
> > I'm trying to tell you how you can more efficiently work with developers
> > on reporting issues, I'm not singling you out but I am telling you that
> > the energy you spend on complaining on things not being addressed can be
> > better put on reporting issues more efficiently.
> >
> > Reporting issues on the list helps but what helps is a describin the
> > issue for a specific release but the most difficult thing to do sometimes
> > is to come up with a recipe for a way to reproduce a specific issue. Without
> > this it is harder to debug issues. If you can come up with ways to reproduce
> > issues then it becomes easier for engineers to start digging. Additionally
> > if an issue is seen that was not observed before the reporter may also
> > do a git bisect to try to identify the culprit commit. Be aware though that
> > bisecting on wireless-testing can only be done against the master-* tags,
> > and not on the entire tree due to the way John updates his tree. If you
> > see the issue on a stable kernel though you can just use Linus' tree or
> > the linux-2.6-allstable git tree which will have the stable extra versioned
> > kernels as well.
> >
> > Reporting issues for stable kernels should go through the kernel bugzilla,
> > but can start off on the mailing list. ath9k-devel though is not ideal for
> > reporting major issues and I recommend linux-wireless to be used instead
> > for that. If an issue is reporting for wireless-testing with a good series
> > of reproducible steps chances are very high it will be addressed. Motivated
> > highly technical users willing to help further get bonus points if they go
> > the extra mile and bisect.
> >
> > Furthermore, issues can be kept track on here:
> >
> > http://wireless.kernel.org/en/users/Drivers/ath9k/bugs
> >
> > This keeps track of major issues reported, and what people are working on.
>
> I'll also note:
>
> http://wireless.kernel.org/en/users/Documentation/Reporting_bugs
>
> If this needs update feel free to edit. The hope is to make it easier for
> users to identify issues and report them properly.

Oh and before I forget, last tip, using a new wpa_supplicant always helps me :)

Luis

2011-01-07 18:35:02

by Ben Greear

[permalink] [raw]
Subject: Re: [PATCH 1/3] ath9k: Decrease skb size to fit into one page.

On 01/07/2011 02:58 AM, Johannes Berg wrote:
> On Thu, 2011-01-06 at 16:46 -0800, [email protected] wrote:
>> From: Ben Greear<[email protected]>
>>
>> Patch is from Eric Dumazet, as described here:
>> https://patchwork.kernel.org/patch/104271/
>>
>> Reported-by: Michael Guntsche<[email protected]>
>> Signed-off-by: Eric Dumazet<[email protected]>
>> Signed-off-by: Ben Greear<[email protected]>
>> ---
>>
>> NOTE: This needs review by ath9k and/or other informed
>> people.
>
> This doesn't make sense. It might help, but it'll probably lead to not
> being able to receive all frames off the air.
>
> If this is an issue, ath9k should do paged RX like iwlwifi.

Ok, I backed this out..but now I'm back to getting buffer allocation
failures (and this is on a system with 2GB RAM).

Seems it's coming from mac80211 instead of ath9k, at least most of
the time (I'm using 60 stations, so it probably needs to make lots of
copies in the rx path). The traffic I'm generating/receiving is 1024 byte UDP
payloads.

Does this mean I really received a packet that was 3872 bytes long,
or is the skb_copy allocating/copying empty data?

Jan 7 10:00:45 localhost kernel: skbuff alloc of size 3872 failed
Jan 7 10:00:45 localhost kernel: skbuff alloc of size 3872 failed
Jan 7 10:00:45 localhost kernel: __alloc_pages_slowpath: 2886 callbacks suppressed
Jan 7 10:00:45 localhost kernel: kswapd0: page allocation failure. order:2, mode:0x4020
Jan 7 10:00:45 localhost kernel: Pid: 29, comm: kswapd0 Not tainted 2.6.37-wl+ #62
Jan 7 10:00:45 localhost kernel: Call Trace:
Jan 7 10:00:45 localhost kernel: [<7878c822>] ? printk+0x18/0x1e
Jan 7 10:00:45 localhost kernel: [<784938ef>] __alloc_pages_nodemask+0x625/0x668
Jan 7 10:00:45 localhost kernel: [<784b66a5>] alloc_slab_page+0x1d/0x21
Jan 7 10:00:45 localhost kernel: [<784b66fc>] new_slab+0x53/0x144
Jan 7 10:00:45 localhost kernel: [<784b6c05>] __slab_alloc.clone.4+0x133/0x1f9
Jan 7 10:00:45 localhost kernel: [<784b6f3c>] ? kmem_cache_alloc+0x7c/0xa1
Jan 7 10:00:45 localhost kernel: [<786edc0d>] ? skb_copy+0x33/0x87
Jan 7 10:00:45 localhost kernel: [<784b7374>] __kmalloc_track_caller+0xc6/0x115
Jan 7 10:00:45 localhost kernel: [<786edc0d>] ? skb_copy+0x33/0x87
Jan 7 10:00:45 localhost kernel: [<786ed642>] __alloc_skb+0x58/0xf4
Jan 7 10:00:45 localhost kernel: [<786edc0d>] skb_copy+0x33/0x87
Jan 7 10:00:45 localhost kernel: [<fa4c73e2>] ieee80211_prepare_and_rx_handle+0x3be/0x86f [mac80211]
Jan 7 10:00:45 localhost kernel: [<fa4c80ba>] ieee80211_rx+0x795/0x853 [mac80211]
Jan 7 10:00:45 localhost kernel: [<fa4c7a1c>] ? ieee80211_rx+0xf7/0x853 [mac80211]
Jan 7 10:00:45 localhost kernel: [<7845007b>] ? pm_qos_update_request+0x4b/0x57
Jan 7 10:00:45 localhost kernel: [<fa591366>] ath_rx_send_to_mac80211+0x5a/0x60 [ath9k]
Jan 7 10:00:45 localhost kernel: [<fa592ce7>] ath_rx_tasklet+0x1318/0x13af [ath9k]
Jan 7 10:00:45 localhost kernel: [<7845a405>] ? mark_lock+0x1e/0x1eb
Jan 7 10:00:45 localhost kernel: [<7845a619>] ? mark_held_locks+0x47/0x5f
Jan 7 10:00:45 localhost kernel: [<7878ebcb>] ? _raw_spin_unlock_irqrestore+0x3c/0x48
Jan 7 10:00:45 localhost kernel: [<7845a85c>] ? trace_hardirqs_on_caller+0xeb/0x125
Jan 7 10:00:45 localhost kernel: [<7845a8a1>] ? trace_hardirqs_on+0xb/0xd
Jan 7 10:00:45 localhost kernel: [<fa590ce1>] ath9k_tasklet+0x98/0x12d [ath9k]
Jan 7 10:00:45 localhost kernel: [<7845a85c>] ? trace_hardirqs_on_caller+0xeb/0x125
Jan 7 10:00:45 localhost kernel: [<7843be09>] tasklet_action+0x88/0xe3
Jan 7 10:00:45 localhost kernel: [<7843c385>] __do_softirq+0x85/0x142
Jan 7 10:00:45 localhost kernel: [<7843c300>] ? __do_softirq+0x0/0x142
Jan 7 10:00:45 localhost kernel: [<7843c300>] ? __do_softirq+0x0/0x142
Jan 7 10:00:45 localhost kernel: <IRQ> [<7843c1a7>] ? irq_exit+0x35/0x69
Jan 7 10:00:45 localhost kernel: [<7841a399>] ? smp_apic_timer_interrupt+0x74/0x81
Jan 7 10:00:45 localhost kernel: [<785976e0>] ? trace_hardirqs_off_thunk+0xc/0x10
Jan 7 10:00:45 localhost kernel: [<7878f39f>] ? apic_timer_interrupt+0x2f/0x40
Jan 7 10:00:45 localhost kernel: [<7845007b>] ? pm_qos_update_request+0x4b/0x57
Jan 7 10:00:45 localhost kernel: [<784b700a>] ? kmem_cache_free+0xa9/0xb5
Jan 7 10:00:45 localhost kernel: [<7852cafa>] ? ext4_destroy_inode+0x91/0x98
Jan 7 10:00:45 localhost kernel: [<7852cafa>] ? ext4_destroy_inode+0x91/0x98
Jan 7 10:00:45 localhost kernel: [<7852cafa>] ? ext4_destroy_inode+0x91/0x98
Jan 7 10:00:45 localhost kernel: [<784d0c1d>] ? percpu_counter_dec+0x19/0x1b
Jan 7 10:00:45 localhost kernel: [<784d12f1>] ? destroy_inode+0x2d/0x3e
Jan 7 10:00:45 localhost kernel: [<784d17f8>] ? dispose_list+0x8d/0x9c
Jan 7 10:00:45 localhost kernel: [<784d1d8d>] ? shrink_icache_memory+0x1d7/0x218
Jan 7 10:00:45 localhost kernel: [<78497560>] ? shrink_slab+0xdc/0x144
Jan 7 10:00:45 localhost kernel: [<78498891>] ? kswapd+0x468/0x66f
Jan 7 10:00:45 localhost kernel: [<7844b776>] ? autoremove_wake_function+0x0/0x34
Jan 7 10:00:45 localhost kernel: [<78498429>] ? kswapd+0x0/0x66f
Jan 7 10:00:45 localhost kernel: [<7844b465>] ? kthread+0x62/0x67
Jan 7 10:00:45 localhost kernel: [<7844b403>] ? kthread+0x0/0x67
Jan 7 10:00:45 localhost kernel: [<784036c6>] ? kernel_thread_helper+0x6/0x1a
Jan 7 10:00:45 localhost kernel: Mem-Info:
Jan 7 10:00:45 localhost kernel: DMA per-cpu:
Jan 7 10:00:45 localhost kernel: CPU 0: hi: 0, btch: 1 usd: 0
Jan 7 10:00:45 localhost kernel: CPU 1: hi: 0, btch: 1 usd: 0
Jan 7 10:00:45 localhost kernel: Normal per-cpu:
Jan 7 10:00:45 localhost kernel: CPU 0: hi: 186, btch: 31 usd: 165
Jan 7 10:00:45 localhost kernel: CPU 1: hi: 186, btch: 31 usd: 61
Jan 7 10:00:45 localhost kernel: active_anon:48191 inactive_anon:1802 isolated_anon:0
Jan 7 10:00:45 localhost kernel: active_file:16287 inactive_file:29786 isolated_file:0
Jan 7 10:00:45 localhost kernel: unevictable:0 dirty:12 writeback:0 unstable:0
Jan 7 10:00:45 localhost kernel: free:7257 slab_reclaimable:12121 slab_unreclaimable:386040
Jan 7 10:00:45 localhost kernel: mapped:9692 shmem:1985 pagetables:1131 bounce:0
Jan 7 10:00:45 localhost kernel: DMA free:4288kB min:40kB low:48kB high:60kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB
unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15804kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB
slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Jan 7 10:00:45 localhost kernel: lowmem_reserve[]: 0 2007 2007 2007
Jan 7 10:00:45 localhost kernel: Normal free:24740kB min:5712kB low:7140kB high:8568kB active_anon:192764kB inactive_anon:7208kB active_file:65148kB
inactive_file:119144kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2056128kB mlocked:0kB dirty:48kB writeback:0kB mapped:38768kB shmem:7940kB
slab_reclaimable:48484kB slab_unreclaimable:1544160kB kernel_stack:6736kB pagetables:4524kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0
all_unreclaimable? no
Jan 7 10:00:45 localhost kernel: lowmem_reserve[]: 0 0 0 0
Jan 7 10:00:45 localhost kernel: DMA: 0*4kB 2*8kB 1*16kB 1*32kB 2*64kB 2*128kB 1*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 4288kB
Jan 7 10:00:45 localhost kernel: Normal: 3067*4kB 1491*8kB 26*16kB 0*32kB 0*64kB 1*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 24740kB
Jan 7 10:00:45 localhost kernel: 48061 total pagecache pages
Jan 7 10:00:45 localhost kernel: 0 pages in swap cache
Jan 7 10:00:45 localhost kernel: Swap cache stats: add 0, delete 0, find 0/0
Jan 7 10:00:45 localhost kernel: Free swap = 0kB
Jan 7 10:00:45 localhost kernel: Total swap = 0kB
Jan 7 10:00:45 localhost kernel: 522160 pages RAM
Jan 7 10:00:45 localhost kernel: 0 pages HighMem
Jan 7 10:00:45 localhost kernel: 12174 pages reserved
Jan 7 10:00:45 localhost kernel: 78943 pages shared
Jan 7 10:00:45 localhost kernel: 469325 pages non-shared

Thanks,
Ben

>
> johannes


--
Ben Greear <[email protected]>
Candela Technologies Inc http://www.candelatech.com

2011-01-07 15:11:51

by Peter Stuge

[permalink] [raw]
Subject: Re: [ath9k-devel] [PATCH 2/3] ath9k: Re-start xmit logic in xmit watchdog timer.

Vasanthakumar Thiagarajan wrote:
> > +++ b/drivers/net/wireless/ath/ath9k/xmit.c
> > @@ -2110,6 +2110,27 @@ static void ath_tx_complete_poll_work(struct work_struct *work)
> > } else {
> > txq->axq_tx_inprogress = true;
> > }
> > + } else {
..
> > + if (!list_empty(&txq->axq_acq)) {
..
> > + ath_txq_schedule(sc, txq);
>
> NAK. This complete work monitors the hw q periodically and does a
> reset if a hang is detected.

Could you please point to that hang check?


//Peter

2011-01-07 02:30:47

by Luis R. Rodriguez

[permalink] [raw]
Subject: Re: [ath9k-devel] [PATCH 3/3] ath9k: Keep track of stations for debugfs.

On Thu, Jan 6, 2011 at 4:46 PM, <[email protected]> wrote:
> +#define ATH9K_MAX_STATIONS 1024

How about making this a Kconfig with a default to a value of the known
(by you) max workable number of STAs that one can use on ath9k, which
is modifiable to other values by power of two up to 1024. Advise in
the kconfig that if more STAs are used then some issue may arise but
should be reported (so the issue can be fixed). This way by default
normal users (you're not normal) won't enable > max known workable
stable number of STAs on ath9k.

Luis

2011-01-07 01:03:58

by Ben Greear

[permalink] [raw]
Subject: Re: [ath9k-devel] [PATCH 1/3] ath9k: Decrease skb size to fit into one page.

On 01/06/2011 04:57 PM, Luis R. Rodriguez wrote:
> On Thu, Jan 6, 2011 at 4:46 PM,<[email protected]> wrote:
>> From: Ben Greear<[email protected]>
>>
>> Patch is from Eric Dumazet, as described here:
>> https://patchwork.kernel.org/patch/104271/
>>
>> Reported-by: Michael Guntsche<[email protected]>
>> Signed-off-by: Eric Dumazet<[email protected]>
>
> Did he SOB it?

I guess not explicitly. But, it's a small little patch that has been
languishing for 6 months. Please put the fix in one way or another.
I don't care that I get any credit what-so-ever.

Thanks,
Ben

--
Ben Greear <[email protected]>
Candela Technologies Inc http://www.candelatech.com


2011-01-07 20:25:33

by Luis R. Rodriguez

[permalink] [raw]
Subject: Re: [ath9k-devel] [PATCH 3/3] ath9k: Keep track of stations for debugfs.

On Fri, Jan 07, 2011 at 12:01:01PM -0800, Luis Rodriguez wrote:
> On Fri, Jan 07, 2011 at 07:36:07AM -0800, Peter Stuge wrote:
> > Ben Greear wrote:
> > > > but at least other clueless users would not go over stable limit
> > > > you have found.
> > >
> > > I think it's very likely that the problems I find are general
> > > issues that are just much easier to hit with lots of stations.
> >
> > I strongly support this. I recognize several of your problems from my
> > attempts at using ath9k with only a single STA which was all but stable.
>
> Sure, see my other e-mail.
>
> > > There is probably no 'safe' number of stations...just takes longer
> > > to see bugs with fewer stations.
> > >
> > > For instance, you still see the failure-to-stop-DMA errors with a
> > > single station, right? And the tx locking stuff was just easier to
> > > exercise with lots of stations, but it would have been possible to
> > > hit it with 2 stations.
> > >
> > > The current tx-hang stuff I'm chasing seems like logic bugs in the
> > > queueing, probably nothing in particular about the chipset.
> >
> > This is also my impression. Since it is important for Atheros to have
> > bugzilla reports rather than discussion on list
>
> I'm trying to tell you how you can more efficiently work with developers
> on reporting issues, I'm not singling you out but I am telling you that
> the energy you spend on complaining on things not being addressed can be
> better put on reporting issues more efficiently.
>
> Reporting issues on the list helps but what helps is a describin the
> issue for a specific release but the most difficult thing to do sometimes
> is to come up with a recipe for a way to reproduce a specific issue. Without
> this it is harder to debug issues. If you can come up with ways to reproduce
> issues then it becomes easier for engineers to start digging. Additionally
> if an issue is seen that was not observed before the reporter may also
> do a git bisect to try to identify the culprit commit. Be aware though that
> bisecting on wireless-testing can only be done against the master-* tags,
> and not on the entire tree due to the way John updates his tree. If you
> see the issue on a stable kernel though you can just use Linus' tree or
> the linux-2.6-allstable git tree which will have the stable extra versioned
> kernels as well.
>
> Reporting issues for stable kernels should go through the kernel bugzilla,
> but can start off on the mailing list. ath9k-devel though is not ideal for
> reporting major issues and I recommend linux-wireless to be used instead
> for that. If an issue is reporting for wireless-testing with a good series
> of reproducible steps chances are very high it will be addressed. Motivated
> highly technical users willing to help further get bonus points if they go
> the extra mile and bisect.
>
> Furthermore, issues can be kept track on here:
>
> http://wireless.kernel.org/en/users/Drivers/ath9k/bugs
>
> This keeps track of major issues reported, and what people are working on.

I'll also note:

http://wireless.kernel.org/en/users/Documentation/Reporting_bugs

If this needs update feel free to edit. The hope is to make it easier for
users to identify issues and report them properly.

Luis

2011-01-07 01:05:24

by Christian Lamparter

[permalink] [raw]
Subject: Re: [PATCH 1/3] ath9k: Decrease skb size to fit into one page.

On Friday 07 January 2011 01:46:03 [email protected] wrote:
> From: Ben Greear <[email protected]>
>
> Patch is from Eric Dumazet, as described here:
> https://patchwork.kernel.org/patch/104271/
>
> Reported-by: Michael Guntsche <[email protected]>
> Signed-off-by: Eric Dumazet <[email protected]>
> Signed-off-by: Ben Greear <[email protected]>
> ---
>
> NOTE: This needs review by ath9k and/or other informed
> people.

Does the hardware support vector-i/o for rx (like for instance iwlagn)?
Else, this change would break A-MSDU rx - which is a mandatory feature
(although, not very popular) of 802.11n -

See for example 802.11n-2009 9.7c:

"Support for the reception of an A-MSDU, where [...], is mandatory for
an HT STA"

And 7.1.2 "The maximum frame body size is determined by the maximum
MSDU size (2304 octets) OR the maximum A-MSDU (3839 or 7935 octets,
depending upon the STA's capability), plus any overhead from security
encapsulation.

> diff --git a/drivers/net/wireless/ath/ath9k/recv.c b/drivers/net/wireless/ath/ath9k/recv.c
> index b2497b8..270661d 100644
> --- a/drivers/net/wireless/ath/ath9k/recv.c
> +++ b/drivers/net/wireless/ath/ath9k/recv.c
> @@ -230,11 +230,11 @@ static int ath_rx_edma_init(struct ath_softc *sc, int nbufs)
> int error = 0, i;
> u32 size;
>
> -
> - common->rx_bufsize = roundup(IEEE80211_MAX_MPDU_LEN +
> - ah->caps.rx_status_len,
> - min(common->cachelsz, (u16)64));
> -
> + size = roundup(IEEE80211_MAX_MPDU_LEN + ah->caps.rx_status_len,
> + min(common->cachelsz, (u16)64));
> + common->rx_bufsize = max_t(u32, size,
> + SKB_MAX_ORDER(NET_SKB_PAD
> + + common->cachelsz, 0));
> ath9k_hw_set_rx_bufsize(ah, common->rx_bufsize -
> ah->caps.rx_status_len);
>
>

2011-01-07 20:01:06

by Luis R. Rodriguez

[permalink] [raw]
Subject: Re: [ath9k-devel] [PATCH 3/3] ath9k: Keep track of stations for debugfs.

On Fri, Jan 07, 2011 at 07:36:07AM -0800, Peter Stuge wrote:
> Ben Greear wrote:
> > > but at least other clueless users would not go over stable limit
> > > you have found.
> >
> > I think it's very likely that the problems I find are general
> > issues that are just much easier to hit with lots of stations.
>
> I strongly support this. I recognize several of your problems from my
> attempts at using ath9k with only a single STA which was all but stable.

Sure, see my other e-mail.

> > There is probably no 'safe' number of stations...just takes longer
> > to see bugs with fewer stations.
> >
> > For instance, you still see the failure-to-stop-DMA errors with a
> > single station, right? And the tx locking stuff was just easier to
> > exercise with lots of stations, but it would have been possible to
> > hit it with 2 stations.
> >
> > The current tx-hang stuff I'm chasing seems like logic bugs in the
> > queueing, probably nothing in particular about the chipset.
>
> This is also my impression. Since it is important for Atheros to have
> bugzilla reports rather than discussion on list

I'm trying to tell you how you can more efficiently work with developers
on reporting issues, I'm not singling you out but I am telling you that
the energy you spend on complaining on things not being addressed can be
better put on reporting issues more efficiently.

Reporting issues on the list helps but what helps is a describin the
issue for a specific release but the most difficult thing to do sometimes
is to come up with a recipe for a way to reproduce a specific issue. Without
this it is harder to debug issues. If you can come up with ways to reproduce
issues then it becomes easier for engineers to start digging. Additionally
if an issue is seen that was not observed before the reporter may also
do a git bisect to try to identify the culprit commit. Be aware though that
bisecting on wireless-testing can only be done against the master-* tags,
and not on the entire tree due to the way John updates his tree. If you
see the issue on a stable kernel though you can just use Linus' tree or
the linux-2.6-allstable git tree which will have the stable extra versioned
kernels as well.

Reporting issues for stable kernels should go through the kernel bugzilla,
but can start off on the mailing list. ath9k-devel though is not ideal for
reporting major issues and I recommend linux-wireless to be used instead
for that. If an issue is reporting for wireless-testing with a good series
of reproducible steps chances are very high it will be addressed. Motivated
highly technical users willing to help further get bonus points if they go
the extra mile and bisect.

Furthermore, issues can be kept track on here:

http://wireless.kernel.org/en/users/Drivers/ath9k/bugs

This keeps track of major issues reported, and what people are working on.

> I will try to find
> time for creating more and/or new bug reports about my problems.

Good, the bug reports will not be necessary if the issue is not on a
stable kernel as it will just be a regression against the last stable
kernel. If the issue is seen on a

> (But the ones I have created so far did not really receive very good
> response.)

Try to increase the quality of your reports and I assure you that your
issues will be addressed.

> Note that my AR9280 is not even detected correctly on PCI now so I
> may switch back to AR5008 for that.
>
> I really should try out FreeBSD too.

Good luck with that ;)

Luis

2011-01-07 02:24:52

by Eric Dumazet

[permalink] [raw]
Subject: Re: [PATCH 1/3] ath9k: Decrease skb size to fit into one page.

Le jeudi 06 janvier 2011 à 18:13 -0800, Luis R. Rodriguez a écrit :
> On Thu, Jan 6, 2011 at 6:07 PM, Eric Dumazet <[email protected]> wrote:

> The only way to accept your patch is to use a debugfs option to
> disable it, we need AMSDU support enabled by default.
>

I dont care of my patch ;)

There is a reason I felt it was not appropriate.

> > If the hardware needs 8192 bytes (or 16384) buffers to perform its
> > operation, it should not give them back to linux, because there is no
> > guarantee it can allocate fresh ones for the next frames.
>
> Last I looked at this the issue was not the upper used by the driver
> but an issue of a roundoff by the kernel that ended up on *some*
> machines going a higher order. Not all machines use the higher order.
> I wondered at one point if using ksize() might help here too but
> again, this is a new API. Not sure how to fix it for older kernels.

common->rx_bufsize = roundup(IEEE80211_MAX_MPDU_LEN +
ah->caps.rx_status_len,
min(common->cachelsz, (u16)64));

Given IEEE80211_MAX_MPDU_LEN is more than 3840, and skb shinfo adds more
than 256 bytes, I can assert rx_bufsize is greater than 4096 : order-1
allocations at minimum.
ksize() wont help you.

Many net drivers perform a copy (r8169.c comes to mind, because of a
hardware flaw).




2011-01-07 15:20:10

by Ben Greear

[permalink] [raw]
Subject: Re: [ath9k-devel] [PATCH 2/3] ath9k: Re-start xmit logic in xmit watchdog timer.

On 01/07/2011 07:11 AM, Peter Stuge wrote:
> Vasanthakumar Thiagarajan wrote:
>>> +++ b/drivers/net/wireless/ath/ath9k/xmit.c
>>> @@ -2110,6 +2110,27 @@ static void ath_tx_complete_poll_work(struct work_struct *work)
>>> } else {
>>> txq->axq_tx_inprogress = true;
>>> }
>>> + } else {
> ..
>>> + if (!list_empty(&txq->axq_acq)) {
> ..
>>> + ath_txq_schedule(sc, txq);
>>
>> NAK. This complete work monitors the hw q periodically and does a
>> reset if a hang is detected.
>
> Could you please point to that hang check?

It's right there in the method. Basically, first call of this poll_work might
set the txq to axq_tx_inprogress, and if it's still in progress on next call,
it resets the hardware.

Now that you mention it, however...the channel change logic restarts this worker
for immediate execution. That means that two channel changes very close together
could easily cause this reset to hit when it really shouldn't. Probably we need
to store the time at which tx-in-progress was set and only reset the NIC if the
timer has expired (not just if it was detected twice in a row)??

Thanks,
Ben

>
>
> //Peter
> --
> To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html


--
Ben Greear <[email protected]>
Candela Technologies Inc http://www.candelatech.com

2011-01-09 09:34:04

by Johannes Berg

[permalink] [raw]
Subject: Re: [ath9k-devel] [PATCH 1/3] ath9k: Decrease skb size to fit into one page.

On Fri, 2011-01-07 at 14:46 -0800, Ben Greear wrote:
> On 01/07/2011 02:26 PM, Eric Dumazet wrote:
> > Le vendredi 07 janvier 2011 à 14:20 -0800, Ben Greear a écrit :
> >> On 0
> >>> Using skb_copy() is wrong then, since it makes a copy (order-1
> >>> allocations)
> >>>
> >>> It should use :
> >>> skb_alloc( actual_size_of_frame not the 3840 thing ...)
> >>> copy(data)
> >>
> >> We need the extra stuff copied too I think (like skb->cb).
> >>
> >> If you could provide a bit more complete example code, I'll be happy
> >> to test it...
> >
> > take a random drivers/net using copybreak ... say ... tg3.c
> >
> > lines 4785
> >
> > len = length_of_the_current_frame
> > copy_skb = netdev_alloc_skb(..., len);
> > // allocates exact required size not a byte more....
> > ...
> > skb_copy_from_linear_data(skb, copy_skb->data, len);
> > ...
> > skb_put(skb, len);
> > ...
>
> That seems fine for drivers, but I'm guessing something different
> would be needed in the mac80211/rx.c code:
>
> I figure doing the fix here might be a nice quick fix,
> and would help with all funky drivers. Later, can fix the
> ath9k to do the skb copying as soon as DMA is done?

This code in mac80211 that you point out will nearly never do a copy
unless you have a lot of multicast traffic. I really don't think it
makes sense to think about changing that now.

johannes


2011-01-07 03:17:36

by Ben Greear

[permalink] [raw]
Subject: Re: [ath9k-devel] [PATCH 3/3] ath9k: Keep track of stations for debugfs.

On 01/06/2011 06:49 PM, Luis R. Rodriguez wrote:
> On Thu, Jan 06, 2011 at 06:45:33PM -0800, Ben Greear wrote:
>> On 01/06/2011 06:30 PM, Luis R. Rodriguez wrote:
>>> On Thu, Jan 6, 2011 at 4:46 PM,<[email protected]> wrote:
>>>> +#define ATH9K_MAX_STATIONS 1024
>>>
>>> How about making this a Kconfig with a default to a value of the known
>>> (by you) max workable number of STAs that one can use on ath9k, which
>>> is modifiable to other values by power of two up to 1024. Advise in
>>> the kconfig that if more STAs are used then some issue may arise but
>>> should be reported (so the issue can be fixed). This way by default
>>> normal users (you're not normal) won't enable> max known workable
>>> stable number of STAs on ath9k.
>>
>> This is just for debugging at this point. It wastes a bit of memory
>> when debugfs is enabled, but otherwise doesn't affect anything. It's
>> not even really a problem if there are more stations than fit in
>> the array.
>
> I meant to use the value as a limit on the # of STAs you can create
> with one ath9k device. The debugfs can still be used as you did,
> only that the limit would come from the kconfig value.
>
>> I can reproduce all my problems with< 128, so if you'd prefer
>> the number be smaller, that's fine with me. I don't think it's
>> worth a configurable value, however.
>
> I thikn we should limit the # STAs to whatever it is that you can
> use in a realiable way, this should be a driver limitation, but
> I figured you'd want to configure this to a higher value yourself
> for whatever tests you are doing. We should fix it though but at
> least other clueless users would not go over stable limit you have
> found.

I think it's very likely that the problems I find are general issues that
are just much easier to hit with lots of stations. There is probably no
'safe' number of stations...just takes longer to see bugs with
fewer stations.

For instance, you still see the failure-to-stop-DMA errors with a single station, right?
And the tx locking stuff was just easier to exercise with lots of stations,
but it would have been possible to hit it with 2 stations.

The current tx-hang stuff I'm chasing seems like logic bugs in the queueing,
probably nothing in particular about the chipset.

Thanks,
Ben

--
Ben Greear <[email protected]>
Candela Technologies Inc http://www.candelatech.com

2011-01-07 02:45:34

by Felix Fietkau

[permalink] [raw]
Subject: Re: [ath9k-devel] [PATCH 3/3] ath9k: Keep track of stations for debugfs.

On 2011-01-06 5:46 PM, [email protected] wrote:
> From: Ben Greear<[email protected]>
>
> The stations hold the ath_node, which holds the tid
> and other xmit logic structures. In order to debug
> stuck xmit logic, we need a way to print out the tid
> state for the stations.
>
> Signed-off-by: Ben Greear<[email protected]>
I really don't like the array with the hardcoded size. How about just
adding a struct list_head to the ath_node struct and tracking that?

- Felix

Subject: Re: [PATCH 2/3] ath9k: Re-start xmit logic in xmit watchdog timer.

On Fri, Jan 07, 2011 at 06:16:04AM +0530, [email protected] wrote:
> From: Ben Greear <[email protected]>
>
> We should not get to this state, but we do. What is
> worse, many times the xmit logic still will not start,
> probably due to tids being paused when they shouldn't be.
>
> Signed-off-by: Ben Greear <[email protected]>
> ---
>
> NOTE: This needs review. It might be too much of a hack
> for upstream code, and at best it works around a small part
> of the problem.
>
> :100644 100644 3aae523... 547fb44... M drivers/net/wireless/ath/ath9k/xmit.c
> drivers/net/wireless/ath/ath9k/xmit.c | 21 +++++++++++++++++++++
> 1 files changed, 21 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/net/wireless/ath/ath9k/xmit.c b/drivers/net/wireless/ath/ath9k/xmit.c
> index 3aae523..547fb44 100644
> --- a/drivers/net/wireless/ath/ath9k/xmit.c
> +++ b/drivers/net/wireless/ath/ath9k/xmit.c
> @@ -2110,6 +2110,27 @@ static void ath_tx_complete_poll_work(struct work_struct *work)
> } else {
> txq->axq_tx_inprogress = true;
> }
> + } else {
> + /* If the queue has pending buffers, then it
> + * should be doing tx work (and have axq_depth).
> + * Shouldn't get to this state I think..but
> + * perhaps we do.
> + */
> + if (!list_empty(&txq->axq_acq)) {
> + ath_err(ath9k_hw_common(sc->sc_ah),
> + "txq: %p axq_qnum: %i,"
> + " axq_link: %p"
> + " pending frames: %i"
> + " axq_acq is not empty, but"
> + " axq_depth is zero. Calling"
> + " ath_txq_schedule to restart"
> + " tx logic.\n",
> + txq, txq->axq_qnum,
> + txq->axq_link,
> + txq->pending_frames);
> + ATH_DBG_WARN_ON_ONCE(1);
> + ath_txq_schedule(sc, txq);

NAK. This complete work monitors the hw q periodically and does a reset if a
hang is detected. This work is no way meant to schedule aggr. This
change really does not make any sense. Scheduling a tid periodically
would introduce reordering issues especially when there are more
retries.


Vasanth

2011-01-07 22:34:52

by Eric Dumazet

[permalink] [raw]
Subject: Re: [ath9k-devel] [PATCH 1/3] ath9k: Decrease skb size to fit into one page.

Le vendredi 07 janvier 2011 à 14:20 -0800, Ben Greear a écrit :
> On 0
> > Using skb_copy() is wrong then, since it makes a copy (order-1
> > allocations)
> >
> > It should use :
> > skb_alloc( actual_size_of_frame not the 3840 thing ...)
> > copy(data)
>
> We need the extra stuff copied too I think (like skb->cb).
>
> If you could provide a bit more complete example code, I'll be happy
> to test it...

take a random drivers/net using copybreak ... say ... tg3.c

lines 4785

len = length_of_the_current_frame
copy_skb = netdev_alloc_skb(..., len);
// allocates exact required size not a byte more....
...
skb_copy_from_linear_data(skb, copy_skb->data, len);
...
skb_put(skb, len);
...