2013-03-14 18:46:23

by Larry Finger

[permalink] [raw]
Subject: [RFC/RFT] b43: A fix for DMA transmission sequence errors

From: Iestyn C. Elfick <[email protected]>

Intermittently, b43 will report "Out of order TX status report on DMA ring".
When this happens, the driver must be reset before communication can resume.
The cause of the problem is believed to be an error in the closed-source
firmware; however, all versions of the firmware are affected.

This change uses the observation that the expected status is always 2 less
than the observed value, and supplies a fake status report to skip one
header/data pair.

Signed-off-by: Larry Finger <[email protected]>
Cc: Stable <[email protected]>
---

Although this is a hack, it seems to work for the OP. It should not cause any
ill effects on any device that does not have the problem.

It does need a lot more testing, particularly on systems prone to the out-of-order
status report problem.

If anyone can think of a cleaner, less intrusive solution, please let us know.

Thanks,

Larry
---

Index: wireless-testing-rebased/drivers/net/wireless/b43/dma.c
===================================================================
--- wireless-testing-rebased.orig/drivers/net/wireless/b43/dma.c
+++ wireless-testing-rebased/drivers/net/wireless/b43/dma.c
@@ -1487,8 +1487,11 @@ void b43_dma_handle_txstatus(struct b43_
const struct b43_dma_ops *ops;
struct b43_dmaring *ring;
struct b43_dmadesc_meta *meta;
+ static const struct b43_txstatus fake; /* filled with 0 */
+ const struct b43_txstatus *txstat;
int slot, firstused;
bool frame_succeed;
+ int skip;

ring = parse_cookie(dev, status->cookie, &slot);
if (unlikely(!ring))
@@ -1501,13 +1504,26 @@ void b43_dma_handle_txstatus(struct b43_
firstused = ring->current_slot - ring->used_slots + 1;
if (firstused < 0)
firstused = ring->nr_slots + firstused;
+
+ skip = 0;
if (unlikely(slot != firstused)) {
/* This possibly is a firmware bug and will result in
- * malfunction, memory leaks and/or stall of DMA functionality. */
+ * malfunction, memory leaks and/or stall of DMA functionality.
+ */
b43dbg(dev->wl, "Out of order TX status report on DMA ring %d. "
"Expected %d, but got %d\n",
ring->index, firstused, slot);
- return;
+ if (slot == firstused + 2) {
+ /* If a single header/data pair was missed, skip over
+ * the first two slots in an attempt to recover.
+ */
+ slot = firstused;
+ skip = 2;
+ b43dbg(dev->wl, "Skip on DMA ring %d slot %d.\n",
+ ring->index, slot);
+ } else {
+ return;
+ }
}

ops = ring->ops;
@@ -1522,11 +1538,13 @@ void b43_dma_handle_txstatus(struct b43_
slot, firstused, ring->index);
break;
}
+
if (meta->skb) {
struct b43_private_tx_info *priv_info =
- b43_get_priv_tx_info(IEEE80211_SKB_CB(meta->skb));
+ b43_get_priv_tx_info(IEEE80211_SKB_CB(meta->skb));

- unmap_descbuffer(ring, meta->dmaaddr, meta->skb->len, 1);
+ unmap_descbuffer(ring, meta->dmaaddr,
+ meta->skb->len, 1);
kfree(priv_info->bouncebuffer);
priv_info->bouncebuffer = NULL;
} else {
@@ -1538,8 +1556,9 @@ void b43_dma_handle_txstatus(struct b43_
struct ieee80211_tx_info *info;

if (unlikely(!meta->skb)) {
- /* This is a scatter-gather fragment of a frame, so
- * the skb pointer must not be NULL. */
+ /* This is a scatter-gather fragment of a frame,
+ * so the skb pointer must not be NULL.
+ */
b43dbg(dev->wl, "TX status unexpected NULL skb "
"at slot %d (first=%d) on ring %d\n",
slot, firstused, ring->index);
@@ -1550,9 +1569,18 @@ void b43_dma_handle_txstatus(struct b43_

/*
* Call back to inform the ieee80211 subsystem about
- * the status of the transmission.
+ * the status of the transmission. When skipping over
+ * a missed TX status report, use a status structure
+ * filled with zeros to indicate that the frame was not
+ * sent (frame_count 0) and not acknowledged
*/
- frame_succeed = b43_fill_txstatus_report(dev, info, status);
+ if (unlikely(!skip))
+ txstat = status;
+ else
+ txstat = &fake;
+
+ frame_succeed = b43_fill_txstatus_report(dev, info,
+ txstat);
#ifdef CONFIG_B43_DEBUG
if (frame_succeed)
ring->nr_succeed_tx_packets++;
@@ -1580,12 +1608,14 @@ void b43_dma_handle_txstatus(struct b43_
/* Everything unmapped and free'd. So it's not used anymore. */
ring->used_slots--;

- if (meta->is_last_fragment) {
+ if (meta->is_last_fragment && !skip) {
/* This is the last scatter-gather
* fragment of the frame. We are done. */
break;
}
slot = next_slot(ring, slot);
+ if (skip > 0)
+ --skip;
}
if (ring->stopped) {
B43_WARN_ON(free_slots(ring) < TX_SLOTS_PER_FRAME);


2013-03-20 15:04:45

by ISE Development

[permalink] [raw]
Subject: Re: [RFC/RFT] b43: A fix for DMA transmission sequence errors

On Wednesday 20 Mar 2013 09:53:28 Larry Finger wrote:
> On 03/20/2013 06:26 AM, ISE Development wrote:
> > Did you get a chance to validate this patch?
> >
> > So far on my system, it has been running for over week without problems
> > (over 15GB of traffic).
>
> I have been using the patch, but I am testing rtlwifi drivers quite heavily
> at the moment, thus b43 has had limited usage. In fact, I was waiting to
> get a report from you before pushing the patch. It is ready to go.
>
> How frequently do you get the automatic reset?
>
> Larry

At least 3 or 4 per second when exceeding 2Mbit/s, otherwise around 1-2 per 10
second interval. The card was literally unusable without the patch.

--
-- isedev

2013-03-15 06:37:42

by Rafał Miłecki

[permalink] [raw]
Subject: Re: [RFC/RFT] b43: A fix for DMA transmission sequence errors

2013/3/14 Rafał Miłecki <[email protected]>:
> 2013/3/14 Larry Finger <[email protected]>:
>> From: Iestyn C. Elfick <[email protected]>
>>
>> Intermittently, b43 will report "Out of order TX status report on DMA ring".
>> When this happens, the driver must be reset before communication can resume.
>> The cause of the problem is believed to be an error in the closed-source
>> firmware; however, all versions of the firmware are affected.
>>
>> This change uses the observation that the expected status is always 2 less
>> than the observed value, and supplies a fake status report to skip one
>> header/data pair.
>>
>> Signed-off-by: Larry Finger <[email protected]>
>> Cc: Stable <[email protected]>
>> ---
>>
>> Although this is a hack, it seems to work for the OP. It should not cause any
>> ill effects on any device that does not have the problem.
>>
>> It does need a lot more testing, particularly on systems prone to the out-of-order
>> status report problem.
>>
>> If anyone can think of a cleaner, less intrusive solution, please let us know.
>
> I wonder if this is hardware supporting unaligned access with
> full-addressing for DMA engine. Maybe it support index-based indexes
> for backward compatibility only, and it's somehow broken?
>
> Can we test full-addressing first? I'll try to prepare a patch for
> this during the weekend.

OK, so after early tests, this card doesn't support full-addressing at all.

I'll give this patch a try during the weekend.

--
Rafał

2013-03-20 11:26:27

by ISE Development

[permalink] [raw]
Subject: Re: [RFC/RFT] b43: A fix for DMA transmission sequence errors

On Friday 15 Mar 2013 11:48:42 Larry Finger wrote:
> On 03/15/2013 01:37 AM, Rafał Miłecki wrote:
> > OK, so after early tests, this card doesn't support full-addressing at
> > all.
> >
> > I'll give this patch a try during the weekend.
>
> Please follow Michael's suggestion, and change
>
> if (slot == firstused + 2) {
>
> into
>
> if (slot == next_slot(ring, next_slot(ring, firstused))) {
>
> This change is needed just in case "firstused + 2" wraps around.
>
> Thanks,
>
> Larry

Did you get a chance to validate this patch?

So far on my system, it has been running for over week without problems (over
15GB of traffic).

Iestyn.

2013-03-20 13:48:49

by Chris Vine

[permalink] [raw]
Subject: Re: [RFC/RFT] b43: A fix for DMA transmission sequence errors

On Wed, 20 Mar 2013 12:26:23 +0100
ISE Development <[email protected]> wrote:

> On Friday 15 Mar 2013 11:48:42 Larry Finger wrote:
> > On 03/15/2013 01:37 AM, Rafał Miłecki wrote:
> > > OK, so after early tests, this card doesn't support
> > > full-addressing at all.
> > >
> > > I'll give this patch a try during the weekend.
> >
> > Please follow Michael's suggestion, and change
> >
> > if (slot == firstused + 2) {
> >
> > into
> >
> > if (slot == next_slot(ring, next_slot(ring,
> > firstused))) {
> >
> > This change is needed just in case "firstused + 2" wraps around.
> >
> > Thanks,
> >
> > Larry
>
> Did you get a chance to validate this patch?
>
> So far on my system, it has been running for over week without
> problems (over 15GB of traffic).

If it helps as a data point, I have run the patch with the wrap-over
change suggested by Michael Busch and with the unlikely->likely
branching correction and it has been working fine for the last 5 days
on the 3.8.2 kernel, with approx 5GB of traffic. Before that, I put
about 10GB through the previous version of the patch as originally sent
by isedev.

Chris

2013-03-15 16:48:47

by Larry Finger

[permalink] [raw]
Subject: Re: [RFC/RFT] b43: A fix for DMA transmission sequence errors

On 03/15/2013 01:37 AM, Rafał Miłecki wrote:
>
> OK, so after early tests, this card doesn't support full-addressing at all.
>
> I'll give this patch a try during the weekend.

Please follow Michael's suggestion, and change

if (slot == firstused + 2) {

into

if (slot == next_slot(ring, next_slot(ring, firstused))) {

This change is needed just in case "firstused + 2" wraps around.

Thanks,

Larry


2013-03-15 02:07:36

by Larry Finger

[permalink] [raw]
Subject: Re: [RFC/RFT] b43: A fix for DMA transmission sequence errors

On 03/14/2013 07:09 PM, ISE Development wrote:
> On Thursday 14 Mar 2013 13:46:19 Larry Finger wrote:
>> + if (unlikely(!skip))
>> + txstat = status;
>> + else
>> + txstat = &fake;
>
> The logic should probably be reversed. It's more unlikely to skip than not to
> do so (hopefully!)...

Yes, you are correct.

Larry



2013-03-20 16:16:36

by Chris Vine

[permalink] [raw]
Subject: Re: [RFC/RFT] b43: A fix for DMA transmission sequence errors

On Wed, 20 Mar 2013 16:04:29 +0100
ISE Development <[email protected]> wrote:

> On Wednesday 20 Mar 2013 09:53:28 Larry Finger wrote:
> > On 03/20/2013 06:26 AM, ISE Development wrote:
> > > Did you get a chance to validate this patch?
> > >
> > > So far on my system, it has been running for over week without
> > > problems (over 15GB of traffic).
> >
> > I have been using the patch, but I am testing rtlwifi drivers quite
> > heavily at the moment, thus b43 has had limited usage. In fact, I
> > was waiting to get a report from you before pushing the patch. It
> > is ready to go.
> >
> > How frequently do you get the automatic reset?
> >
> > Larry
>
> At least 3 or 4 per second when exceeding 2Mbit/s, otherwise around
> 1-2 per 10 second interval. The card was literally unusable without
> the patch.

For me, much less. At full speed, about one reset per 500MBytes. In
fact, the resets seem to depend on the number of bytes passed rather
than the speed of itself.

Chris

2013-03-14 20:17:19

by Rafał Miłecki

[permalink] [raw]
Subject: Re: [RFC/RFT] b43: A fix for DMA transmission sequence errors

2013/3/14 Larry Finger <[email protected]>:
> From: Iestyn C. Elfick <[email protected]>
>
> Intermittently, b43 will report "Out of order TX status report on DMA ring".
> When this happens, the driver must be reset before communication can resume.
> The cause of the problem is believed to be an error in the closed-source
> firmware; however, all versions of the firmware are affected.
>
> This change uses the observation that the expected status is always 2 less
> than the observed value, and supplies a fake status report to skip one
> header/data pair.
>
> Signed-off-by: Larry Finger <[email protected]>
> Cc: Stable <[email protected]>
> ---
>
> Although this is a hack, it seems to work for the OP. It should not cause any
> ill effects on any device that does not have the problem.
>
> It does need a lot more testing, particularly on systems prone to the out-of-order
> status report problem.
>
> If anyone can think of a cleaner, less intrusive solution, please let us know.

I wonder if this is hardware supporting unaligned access with
full-addressing for DMA engine. Maybe it support index-based indexes
for backward compatibility only, and it's somehow broken?

Can we test full-addressing first? I'll try to prepare a patch for
this during the weekend.

--
Rafał

2013-03-20 14:53:31

by Larry Finger

[permalink] [raw]
Subject: Re: [RFC/RFT] b43: A fix for DMA transmission sequence errors

On 03/20/2013 06:26 AM, ISE Development wrote:
>
> Did you get a chance to validate this patch?
>
> So far on my system, it has been running for over week without problems (over
> 15GB of traffic).

I have been using the patch, but I am testing rtlwifi drivers quite heavily at
the moment, thus b43 has had limited usage. In fact, I was waiting to get a
report from you before pushing the patch. It is ready to go.

How frequently do you get the automatic reset?

Larry



2013-03-14 18:59:45

by Michael Büsch

[permalink] [raw]
Subject: Re: [RFC/RFT] b43: A fix for DMA transmission sequence errors

On Thu, 14 Mar 2013 13:46:19 -0500
Larry Finger <[email protected]> wrote:

> if (unlikely(slot != firstused)) {
> /* This possibly is a firmware bug and will result in
> - * malfunction, memory leaks and/or stall of DMA functionality. */
> + * malfunction, memory leaks and/or stall of DMA functionality.
> + */
> b43dbg(dev->wl, "Out of order TX status report on DMA ring %d. "
> "Expected %d, but got %d\n",
> ring->index, firstused, slot);
> - return;
> + if (slot == firstused + 2) {

I guess you need to handle wrap-over here.
This would probably do:
if (slot == next_slot(ring, next_slot(ring, firstused)))

--
Michael


Attachments:
signature.asc (836.00 B)

2013-03-15 00:09:23

by ISE Development

[permalink] [raw]
Subject: Re: [RFC/RFT] b43: A fix for DMA transmission sequence errors

On Thursday 14 Mar 2013 13:46:19 Larry Finger wrote:
> + if (unlikely(!skip))
> + txstat = status;
> + else
> + txstat = &fake;

The logic should probably be reversed. It's more unlikely to skip than not to
do so (hopefully!)...

--
-- isedev