2013-03-20 19:02:34

by Larry Finger

Subject: [PATCH] b43: A fix for DMA transmission sequence errors

From: Iestyn C. Elfick <[email protected]>

Intermittently, b43 will report "Out of order TX status report on DMA ring".
When this happens, the driver must be reset before communication can resume.
The cause of the problem is believed to be an error in the closed-source
firmware; however, all versions of the firmware are affected.

This change uses the observation that the expected status is always 2 less
than the observed value, and supplies a fake status report to skip one
header/data pair.

Not all devices suffer from this problem, but it can occur several times
per second under heavy load. As each occurrence kills the unmodified driver,
this patch makes it possible for the affected devices to function. The patch
logs only the first instance of the reset operation to prevent spamming
the logs.

Tested-by: Chris Vine <[email protected]>
Signed-off-by: Larry Finger <[email protected]>
Cc: Stable <[email protected]>
---

John,

If possible, this material should be applied to 3.9.

Thanks,

Larry
---

Index: wireless-testing-save/drivers/net/wireless/b43/dma.c
===================================================================
--- wireless-testing-save.orig/drivers/net/wireless/b43/dma.c
+++ wireless-testing-save/drivers/net/wireless/b43/dma.c
@@ -1487,8 +1487,12 @@ void b43_dma_handle_txstatus(struct b43_
const struct b43_dma_ops *ops;
struct b43_dmaring *ring;
struct b43_dmadesc_meta *meta;
+ static const struct b43_txstatus fake; /* filled with 0 */
+ const struct b43_txstatus *txstat;
int slot, firstused;
bool frame_succeed;
+ int skip;
+ static u8 err_out1, err_out2;

ring = parse_cookie(dev, status->cookie, &slot);
if (unlikely(!ring))
@@ -1501,13 +1505,36 @@ void b43_dma_handle_txstatus(struct b43_
firstused = ring->current_slot - ring->used_slots + 1;
if (firstused < 0)
firstused = ring->nr_slots + firstused;
+
+ skip = 0;
if (unlikely(slot != firstused)) {
/* This possibly is a firmware bug and will result in
- * malfunction, memory leaks and/or stall of DMA functionality. */
- b43dbg(dev->wl, "Out of order TX status report on DMA ring %d. "
- "Expected %d, but got %d\n",
- ring->index, firstused, slot);
- return;
+ * malfunction, memory leaks and/or stall of DMA functionality.
+ */
+ if (slot == next_slot(ring, next_slot(ring, firstused))) {
+ /* If a single header/data pair was missed, skip over
+ * the first two slots in an attempt to recover.
+ */
+ slot = firstused;
+ skip = 2;
+ if (!err_out1) {
+ /* Report the error once. */
+ b43dbg(dev->wl,
+ "Skip on DMA ring %d slot %d.\n",
+ ring->index, slot);
+ err_out1 = 1;
+ }
+ } else {
+ /* More than a single header/data pair were missed.
+ * Report this error once.
+ */
+ if (!err_out2)
+ b43dbg(dev->wl,
+ "Out of order TX status report on DMA ring %d. Expected %d, but got %d\n",
+ ring->index, firstused, slot);
+ err_out2 = 1;
+ return;
+ }
}

ops = ring->ops;
@@ -1522,11 +1549,13 @@ void b43_dma_handle_txstatus(struct b43_
slot, firstused, ring->index);
break;
}
+
if (meta->skb) {
struct b43_private_tx_info *priv_info =
- b43_get_priv_tx_info(IEEE80211_SKB_CB(meta->skb));
+ b43_get_priv_tx_info(IEEE80211_SKB_CB(meta->skb));

- unmap_descbuffer(ring, meta->dmaaddr, meta->skb->len, 1);
+ unmap_descbuffer(ring, meta->dmaaddr,
+ meta->skb->len, 1);
kfree(priv_info->bouncebuffer);
priv_info->bouncebuffer = NULL;
} else {
@@ -1538,8 +1567,9 @@ void b43_dma_handle_txstatus(struct b43_
struct ieee80211_tx_info *info;

if (unlikely(!meta->skb)) {
- /* This is a scatter-gather fragment of a frame, so
- * the skb pointer must not be NULL. */
+ /* This is a scatter-gather fragment of a frame,
+ * so the skb pointer must not be NULL.
+ */
b43dbg(dev->wl, "TX status unexpected NULL skb "
"at slot %d (first=%d) on ring %d\n",
slot, firstused, ring->index);
@@ -1550,9 +1580,18 @@ void b43_dma_handle_txstatus(struct b43_

/*
* Call back to inform the ieee80211 subsystem about
- * the status of the transmission.
+ * the status of the transmission. When skipping over
+ * a missed TX status report, use a status structure
+ * filled with zeros to indicate that the frame was not
+ * sent (frame_count 0) and not acknowledged
*/
- frame_succeed = b43_fill_txstatus_report(dev, info, status);
+ if (unlikely(skip))
+ txstat = &fake;
+ else
+ txstat = status;
+
+ frame_succeed = b43_fill_txstatus_report(dev, info,
+ txstat);
#ifdef CONFIG_B43_DEBUG
if (frame_succeed)
ring->nr_succeed_tx_packets++;
@@ -1580,12 +1619,14 @@ void b43_dma_handle_txstatus(struct b43_
/* Everything unmapped and free'd. So it's not used anymore. */
ring->used_slots--;

- if (meta->is_last_fragment) {
+ if (meta->is_last_fragment && !skip) {
/* This is the last scatter-gather
* fragment of the frame. We are done. */
break;
}
slot = next_slot(ring, slot);
+ if (skip > 0)
+ --skip;
}
if (ring->stopped) {
B43_WARN_ON(free_slots(ring) < TX_SLOTS_PER_FRAME);
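
The recovery test in the patch comes down to ring arithmetic: compute the oldest in-use slot, then check whether the reported slot is exactly two positions further on. The following is a minimal sketch of that check only, not the driver code. `struct ring_model`, `first_used()` and `slots_to_skip()` are hypothetical names made up for illustration; `next_slot()` mirrors the wrap-around behaviour the patch relies on.

```c
#include <assert.h>

/* Hypothetical, simplified stand-in for the driver's ring state. */
struct ring_model {
	int nr_slots;      /* total descriptor slots in the ring */
	int current_slot;  /* most recently filled slot */
	int used_slots;    /* slots still waiting for a TX status */
};

/* Advance one slot, wrapping at the end of the ring. */
static int next_slot(const struct ring_model *ring, int slot)
{
	assert(slot >= 0 && slot < ring->nr_slots);
	return (slot == ring->nr_slots - 1) ? 0 : slot + 1;
}

/* Oldest slot still in use, computed as in the patch context lines. */
static int first_used(const struct ring_model *ring)
{
	int firstused = ring->current_slot - ring->used_slots + 1;

	if (firstused < 0)
		firstused = ring->nr_slots + firstused;
	return firstused;
}

/*
 * Returns 0 if the report is in order, 2 if exactly one header/data
 * pair was missed (the recoverable case), and -1 otherwise.
 */
static int slots_to_skip(const struct ring_model *ring, int reported)
{
	int firstused = first_used(ring);

	if (reported == firstused)
		return 0;
	if (reported == next_slot(ring, next_slot(ring, firstused)))
		return 2;
	return -1;
}
```

For example, with 256 slots, current_slot 1 and 4 slots in use, the oldest slot is 254; a status for slot 0 is two slots ahead (wrapping past 255), so it falls into the recoverable case.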


2013-03-24 21:51:28

by Chris Vine

Subject: Re: [PATCH] b43: A fix for DMA transmission sequence errors

On Sat, 23 Mar 2013 19:01:26 +0100
Rafał Miłecki <[email protected]> wrote:
> 2013/3/23 Michael Büsch <[email protected]>:
> > On Sat, 23 Mar 2013 18:28:33 +0100
> > Rafał Miłecki <[email protected]> wrote:
> >
> >> > Could you try changing
> >> > B43_DMA32_RINGMEMSIZE
> >> > from 4096 to 8192 in dma.h?
> >>
> >> Blah, ignore that. Your card has 64b DMA, not 32b :|
> >
> > You could try playing with the constants anyway. Maybe something
> > interesting happens.
>
> Won't hurt.
>
> Chris, isedev: can you try replacing B43_DMA64_RINGMEMSIZE with some
> higher values than 8192?
>
> 8192 is 0x2000, so please try 0x4000 or even 0x8000

I still get the errors with these values.

Chris

2013-03-23 15:54:00

by Chris Vine

Subject: Re: [PATCH] b43: A fix for DMA transmission sequence errors

On Sat, 23 Mar 2013 11:35:17 +0100
Michael Büsch <[email protected]> wrote:
> On Sat, 23 Mar 2013 00:27:30 +0100
> Rafał Miłecki <[email protected]> wrote:
>
> > Today I've plugged my 14e4:4315 and (unfortunately?) it's working
> > pretty well. I hoped to reproduce some problems but failed to do
> > so. I was transmitting for an hour with average speed 11MiB/s and
> > didn't notice any DMA issues.
> >
> > I was using iperf with interval of 60 seconds and only 3 results
> > showed some problems (8.5MiB/s, 2.5MiB/s, 4.5MiB/s). No
> > disconnections however and no DMA errors. I just got "Group
> > rekeying completed..." in wpa_supplicant.
> >
> > So as I can't reproduce this, I can't find any other fix for this
> > issue, and there's no reason to stop this workaround. I'll just
> > apply it and test over weekend to check for any regressions, but
> > they are highly unlikely.
>
> I don't really believe in this being a firmware bug.
>
> Some b43 DMA engines (all?) have some alignment and
> page-boundary-crossing constraints. I would rather guess that on some
> kernels with some options turned on, alignment and/or boundary
> constraints are violated every now and then. (and thus the packet
> never reaches the firmware).
>
> I don't remember the details, though. Too long since I worked on that.
> But a few sanity checks could probably be added to the code to check
> this hypothesis.
>
> Does the failing kernel/machine have any special things w.r.t. memory?
> Like iommu, hugepages, whatever...

For what it is worth, this happens to me on both home compiled and
distributor kernels (ubuntu and slackware): in fact, any 32-bit kernel
that I have tried it on.

And it does not happen with the wl driver on the same kernels. So if
this is right, the wl driver must be doing something that the b43 driver
does not with respect to alignment: and you might well be right about
that.

Chris


2013-03-23 18:01:27

by Rafał Miłecki

Subject: Re: [PATCH] b43: A fix for DMA transmission sequence errors

2013/3/23 Michael Büsch <[email protected]>:
> On Sat, 23 Mar 2013 18:28:33 +0100
> Rafał Miłecki <[email protected]> wrote:
>
>> > Could you try changing
>> > B43_DMA32_RINGMEMSIZE
>> > from 4096 to 8192 in dma.h?
>>
>> Blah, ignore that. Your card has 64b DMA, not 32b :|
>
> You could try playing with the constants anyway. Maybe something
> interesting happens.

Won't hurt.

Chris, isedev: can you try replacing B43_DMA64_RINGMEMSIZE with some
higher values than 8192?

8192 is 0x2000, so please try 0x4000 or even 0x8000

--
Rafał
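
The sizes suggested above can be related to descriptor counts with a short calculation. This is only a sketch under an assumption: it models a 64-bit DMA descriptor as four 32-bit words (16 bytes). `struct desc64_model` and `descriptors_that_fit()` are hypothetical names, and the result says only how many descriptors of that assumed size fit in the ring memory, not how many slots the driver actually uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed 64-bit descriptor layout: four 32-bit words, 16 bytes. */
struct desc64_model {
	uint32_t control0;
	uint32_t control1;
	uint32_t address_low;
	uint32_t address_high;
};

/* Number of whole descriptors that fit in ring_mem_size bytes. */
static size_t descriptors_that_fit(size_t ring_mem_size)
{
	/* The ring memory is expected to hold a whole number of descriptors. */
	assert(ring_mem_size % sizeof(struct desc64_model) == 0);
	return ring_mem_size / sizeof(struct desc64_model);
}
```

Under that assumption, 0x2000 bytes hold 512 descriptors, 0x4000 hold 1024 and 0x8000 hold 2048.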

2013-03-23 18:20:44

by ISE Development

Subject: Re: [PATCH] b43: A fix for DMA transmission sequence errors

On Saturday 23 Mar 2013 19:01:26 Rafał Miłecki wrote:
> 2013/3/23 Michael Büsch <[email protected]>:
> > On Sat, 23 Mar 2013 18:28:33 +0100
> >
> > Rafał Miłecki <[email protected]> wrote:
> >> > Could you try changing
> >> > B43_DMA32_RINGMEMSIZE
> >> > from 4096 to 8192 in dma.h?
> >>
> >> Blah, ignore that. Your card has 64b DMA, not 32b :|
> >
> > You could try playing with the constants anyway. Maybe something
> > interesting happens.
>
> Won't hurt.
>
> Chris, isedev: can you try replacing B43_DMA64_RINGMEMSIZE with some
> higher values than 8192?
>
> 8192 is 0x2000, so please try 0x4000 or even 0x8000

Tried 16384 and 32768. No difference observed (same sequence of errors).

--
-- isedev



2013-03-23 17:44:16

by Michael Büsch

Subject: Re: [PATCH] b43: A fix for DMA transmission sequence errors

On Sat, 23 Mar 2013 18:28:33 +0100
Rafał Miłecki <[email protected]> wrote:

> > Could you try changing
> > B43_DMA32_RINGMEMSIZE
> > from 4096 to 8192 in dma.h?
>
> Blah, ignore that. Your card has 64b DMA, not 32b :|

You could try playing with the constants anyway. Maybe something
interesting happens.

--
Michael



2013-03-23 10:35:38

by Michael Büsch

Subject: Re: [PATCH] b43: A fix for DMA transmission sequence errors

On Sat, 23 Mar 2013 00:27:30 +0100
Rafał Miłecki <[email protected]> wrote:

> Today I've plugged my 14e4:4315 and (unfortunately?) it's working
> pretty well. I hoped to reproduce some problems but failed to do so. I
> was transmitting for an hour with average speed 11MiB/s and didn't
> notice any DMA issues.
>
> I was using iperf with interval of 60 seconds and only 3 results
> showed some problems (8.5MiB/s, 2.5MiB/s, 4.5MiB/s). No disconnections
> however and no DMA errors. I just got "Group rekeying completed..." in
> wpa_supplicant.
>
> So as I can't reproduce this, I can't find any other fix for this
> issue, and there's no reason to stop this workaround. I'll just apply
> it and test over weekend to check for any regressions, but they are
> highly unlikely.

I don't really believe in this being a firmware bug.

Some b43 DMA engines (all?) have some alignment and page-boundary-crossing
constraints. I would rather guess that on some kernels with some options
turned on, alignment and/or boundary constraints are violated every
now and then. (and thus the packet never reaches the firmware).

I don't remember the details, though. Too long since I worked on that.
But a few sanity checks could probably be added to the code to check
this hypothesis.

Does the failing kernel/machine have any special things w.r.t. memory?
Like iommu, hugepages, whatever...

--
Michael
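
The kind of sanity check suggested above could look roughly like the following. This is a minimal sketch, assuming the constraints take the form "the buffer start must be aligned to some power of two" and "the buffer must not cross a power-of-two boundary". The actual constraints of the b43 DMA engines are not stated in this thread, and `is_aligned()` and `crosses_boundary()` are hypothetical helpers, not existing driver functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if addr is a multiple of alignment (a power of two). */
static bool is_aligned(uint64_t addr, uint64_t alignment)
{
	assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
	return (addr & (alignment - 1)) == 0;
}

/*
 * True if the buffer [addr, addr + len) touches more than one
 * boundary-sized window. boundary must be a power of two.
 */
static bool crosses_boundary(uint64_t addr, size_t len, uint64_t boundary)
{
	uint64_t mask;

	assert(boundary != 0 && (boundary & (boundary - 1)) == 0);
	if (len == 0)
		return false;
	mask = ~(boundary - 1);
	/* Compare the window of the first byte with that of the last byte. */
	return (addr & mask) != ((addr + len - 1) & mask);
}
```

A check like this could be run on each mapped buffer address and length before the descriptor is handed to the hardware, logging any violation, which would confirm or rule out the hypothesis on an affected machine.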



2013-03-23 17:26:52

by Rafał Miłecki

Subject: Re: [PATCH] b43: A fix for DMA transmission sequence errors

2013/3/23 Chris Vine <[email protected]>:
> On Sat, 23 Mar 2013 11:35:17 +0100
> Michael Büsch <[email protected]> wrote:
>> On Sat, 23 Mar 2013 00:27:30 +0100
>> Rafał Miłecki <[email protected]> wrote:
>>
>> > Today I've plugged my 14e4:4315 and (unfortunately?) it's working
>> > pretty well. I hoped to reproduce some problems but failed to do
>> > so. I was transmitting for an hour with average speed 11MiB/s and
>> > didn't notice any DMA issues.
>> >
>> > I was using iperf with interval of 60 seconds and only 3 results
>> > showed some problems (8.5MiB/s, 2.5MiB/s, 4.5MiB/s). No
>> > disconnections however and no DMA errors. I just got "Group
>> > rekeying completed..." in wpa_supplicant.
>> >
>> > So as I can't reproduce this, I can't find any other fix for this
>> > issue, and there's no reason to stop this workaround. I'll just
>> > apply it and test over weekend to check for any regressions, but
>> > they are highly unlikely.
>>
>> I don't really believe in this being a firmware bug.
>>
>> Some b43 DMA engines (all?) have some alignment and
>> page-boundary-crossing constraints. I would rather guess that on some
>> kernels with some options turned on, alignment and/or boundary
>> constraints are violated every now and then. (and thus the packet
>> never reaches the firmware).
>>
>> I don't remember the details, though. Too long since I worked on that.
>> But a few sanity checks could probably be added to the code to check
>> this hypothesis.
>>
>> Does the failing kernel/machine have any special things w.r.t. memory?
>> Like iommu, hugepages, whatever...
>
> For what it is worth, this happens to me on both home compiled and
> distributor kernels (ubuntu and slackware): in fact, any 32-bit kernel
> that I have tried it on.
>
> And it does not happen with the wl driver on the same kernels. So if
> this is right, the wl driver must be doing something that the b43 driver
> does not with respect to alignment: and you might well be right about
> that.

Could you try changing
B43_DMA32_RINGMEMSIZE
from 4096 to 8192 in dma.h?

--
Rafał

2013-03-23 17:28:34

by Rafał Miłecki

Subject: Re: [PATCH] b43: A fix for DMA transmission sequence errors

2013/3/23 Rafał Miłecki <[email protected]>:
> 2013/3/23 Chris Vine <[email protected]>:
>> On Sat, 23 Mar 2013 11:35:17 +0100
>> Michael Büsch <[email protected]> wrote:
>>> On Sat, 23 Mar 2013 00:27:30 +0100
>>> Rafał Miłecki <[email protected]> wrote:
>>>
>>> > Today I've plugged my 14e4:4315 and (unfortunately?) it's working
>>> > pretty well. I hoped to reproduce some problems but failed to do
>>> > so. I was transmitting for an hour with average speed 11MiB/s and
>>> > didn't notice any DMA issues.
>>> >
>>> > I was using iperf with interval of 60 seconds and only 3 results
>>> > showed some problems (8.5MiB/s, 2.5MiB/s, 4.5MiB/s). No
>>> > disconnections however and no DMA errors. I just got "Group
>>> > rekeying completed..." in wpa_supplicant.
>>> >
>>> > So as I can't reproduce this, I can't find any other fix for this
>>> > issue, and there's no reason to stop this workaround. I'll just
>>> > apply it and test over weekend to check for any regressions, but
>>> > they are highly unlikely.
>>>
>>> I don't really believe in this being a firmware bug.
>>>
>>> Some b43 DMA engines (all?) have some alignment and
>>> page-boundary-crossing constraints. I would rather guess that on some
>>> kernels with some options turned on, alignment and/or boundary
>>> constraints are violated every now and then. (and thus the packet
>>> never reaches the firmware).
>>>
>>> I don't remember the details, though. Too long since I worked on that.
>>> But a few sanity checks could probably be added to the code to check
>>> this hypothesis.
>>>
>>> Does the failing kernel/machine have any special things w.r.t. memory?
>>> Like iommu, hugepages, whatever...
>>
>> For what it is worth, this happens to me on both home compiled and
>> distributor kernels (ubuntu and slackware): in fact, any 32-bit kernel
>> that I have tried it on.
>>
>> And it does not happen with the wl driver on the same kernels. So if
>> this is right, the wl driver must be doing something that the b43 driver
>> does not with respect to alignment: and you might well be right about
>> that.
>
> Could you try changing
> B43_DMA32_RINGMEMSIZE
> from 4096 to 8192 in dma.h?

Blah, ignore that. Your card has 64b DMA, not 32b :|

--
Rafał

2013-03-22 23:27:32

by Rafał Miłecki

Subject: Re: [PATCH] b43: A fix for DMA transmission sequence errors

2013/3/20 Larry Finger <[email protected]>:
> From: Iestyn C. Elfick <[email protected]>
>
> Intermittently, b43 will report "Out of order TX status report on DMA ring".
> When this happens, the driver must be reset before communication can resume.
> The cause of the problem is believed to be an error in the closed-source
> firmware; however, all versions of the firmware are affected.
>
> This change uses the observation that the expected status is always 2 less
> than the observed value, and supplies a fake status report to skip one
> header/data pair.
>
> Not all devices suffer from this problem, but it can occur several times
> per second under heavy load. As each occurrence kills the unmodified driver,
> this patch makes it possible for the affected devices to function. The patch
> logs only the first instance of the reset operation to prevent spamming
> the logs.
>
> Tested-by: Chris Vine <[email protected]>
> Signed-off-by: Larry Finger <[email protected]>
> Cc: Stable <[email protected]>

I promised to perform some tests, but unfortunately I didn't find enough
time last weekend.

Today I've plugged my 14e4:4315 and (unfortunately?) it's working
pretty well. I hoped to reproduce some problems but failed to do so. I
was transmitting for an hour with average speed 11MiB/s and didn't
notice any DMA issues.

I was using iperf with interval of 60 seconds and only 3 results
showed some problems (8.5MiB/s, 2.5MiB/s, 4.5MiB/s). No disconnections
however and no DMA errors. I just got "Group rekeying completed..." in
wpa_supplicant.

So as I can't reproduce this, I can't find any other fix for this
issue, and there's no reason to stop this workaround. I'll just apply
it and test over weekend to check for any regressions, but they are
highly unlikely.

--
Rafał