2013-04-23 19:45:21

by Thommy Jakobsson

[permalink] [raw]
Subject: [PATCH] B43: Handle DMA RX descriptor underrun

From: Thommy Jakobsson <[email protected]>

Add handling of rx descriptor underflow. This fixes a fault that could
happen on slow machines, where data is received faster than the CPU can
handle. In such a case the device will use up all rx descriptors and
refuse to send any more data before confirming that it is ok. This
patch enables necessary interrupt to discover such a situation and will
handle them by dropping everything in the ring buffer.

Reviewed-by: Michael Buesch <[email protected]>
Signed-off-by: Thommy Jakobsson <[email protected]>
Cc: stable <[email protected]>
---
Hi John,

patch for fixing problems seen on slow hardware and b43. Discussed at b43
mailinglist and openwrt ticket system
(https://dev.openwrt.org/ticket/7552)

BR,
Thommy Jakobsson
---
drivers/net/wireless/b43/dma.c | 19 +++++++++++++++++
drivers/net/wireless/b43/dma.h | 4 +++-
drivers/net/wireless/b43/main.c | 43 ++++++++++++++++-----------------------
3 files changed, 40 insertions(+), 26 deletions(-)

diff --git a/drivers/net/wireless/b43/dma.c b/drivers/net/wireless/b43/dma.c
index 1221469..ee3d640 100644
--- a/drivers/net/wireless/b43/dma.c
+++ b/drivers/net/wireless/b43/dma.c
@@ -1733,6 +1733,25 @@ drop_recycle_buffer:
sync_descbuffer_for_device(ring, dmaaddr, ring->rx_buffersize);
}

+void b43_dma_handle_rx_overflow(struct b43_dmaring *ring)
+{
+ int current_slot, previous_slot;
+
+ B43_WARN_ON(ring->tx);
+
+ /* Device has filled all buffers, drop all packets and let TCP
+ * decrease speed.
+ * Decrement RX index by one will let the device to see all slots
+ * as free again
+ */
+ /*
+ *TODO: How to increase rx_drop in mac80211?
+ */
+ current_slot = ring->ops->get_current_rxslot(ring);
+ previous_slot = prev_slot(ring, current_slot);
+ ring->ops->set_current_rxslot(ring, previous_slot);
+}
+
void b43_dma_rx(struct b43_dmaring *ring)
{
const struct b43_dma_ops *ops = ring->ops;
diff --git a/drivers/net/wireless/b43/dma.h b/drivers/net/wireless/b43/dma.h
index 9fdd198..df8c8cd 100644
--- a/drivers/net/wireless/b43/dma.h
+++ b/drivers/net/wireless/b43/dma.h
@@ -9,7 +9,7 @@
/* DMA-Interrupt reasons. */
#define B43_DMAIRQ_FATALMASK ((1 << 10) | (1 << 11) | (1 << 12) \
| (1 << 14) | (1 << 15))
-#define B43_DMAIRQ_NONFATALMASK (1 << 13)
+#define B43_DMAIRQ_RDESC_UFLOW (1 << 13)
#define B43_DMAIRQ_RX_DONE (1 << 16)

/*** 32-bit DMA Engine. ***/
@@ -295,6 +295,8 @@ int b43_dma_tx(struct b43_wldev *dev,
void b43_dma_handle_txstatus(struct b43_wldev *dev,
const struct b43_txstatus *status);

+void b43_dma_handle_rx_overflow(struct b43_dmaring *ring);
+
void b43_dma_rx(struct b43_dmaring *ring);

void b43_dma_direct_fifo_rx(struct b43_wldev *dev,
diff --git a/drivers/net/wireless/b43/main.c b/drivers/net/wireless/b43/main.c
index d377f77..6dd07e2 100644
--- a/drivers/net/wireless/b43/main.c
+++ b/drivers/net/wireless/b43/main.c
@@ -1902,30 +1902,18 @@ static void b43_do_interrupt_thread(struct b43_wldev *dev)
}
}

- if (unlikely(merged_dma_reason & (B43_DMAIRQ_FATALMASK |
- B43_DMAIRQ_NONFATALMASK))) {
- if (merged_dma_reason & B43_DMAIRQ_FATALMASK) {
- b43err(dev->wl, "Fatal DMA error: "
- "0x%08X, 0x%08X, 0x%08X, "
- "0x%08X, 0x%08X, 0x%08X\n",
- dma_reason[0], dma_reason[1],
- dma_reason[2], dma_reason[3],
- dma_reason[4], dma_reason[5]);
- b43err(dev->wl, "This device does not support DMA "
+ if (unlikely(merged_dma_reason & (B43_DMAIRQ_FATALMASK))) {
+ b43err(dev->wl,
+ "Fatal DMA error: 0x%08X, 0x%08X, 0x%08X, 0x%08X, 0x%08X, 0x%08X\n",
+ dma_reason[0], dma_reason[1],
+ dma_reason[2], dma_reason[3],
+ dma_reason[4], dma_reason[5]);
+ b43err(dev->wl, "This device does not support DMA "
"on your system. It will now be switched to PIO.\n");
- /* Fall back to PIO transfers if we get fatal DMA errors! */
- dev->use_pio = true;
- b43_controller_restart(dev, "DMA error");
- return;
- }
- if (merged_dma_reason & B43_DMAIRQ_NONFATALMASK) {
- b43err(dev->wl, "DMA error: "
- "0x%08X, 0x%08X, 0x%08X, "
- "0x%08X, 0x%08X, 0x%08X\n",
- dma_reason[0], dma_reason[1],
- dma_reason[2], dma_reason[3],
- dma_reason[4], dma_reason[5]);
- }
+ /* Fall back to PIO transfers if we get fatal DMA errors! */
+ dev->use_pio = true;
+ b43_controller_restart(dev, "DMA error");
+ return;
}

if (unlikely(reason & B43_IRQ_UCODE_DEBUG))
@@ -1944,6 +1932,11 @@ static void b43_do_interrupt_thread(struct b43_wldev *dev)
handle_irq_noise(dev);

/* Check the DMA reason registers for received data. */
+ if (dma_reason[0] & B43_DMAIRQ_RDESC_UFLOW) {
+ if (B43_DEBUG)
+ b43warn(dev->wl, "RX descriptor underrun\n");
+ b43_dma_handle_rx_overflow(dev->dma.rx_ring);
+ }
if (dma_reason[0] & B43_DMAIRQ_RX_DONE) {
if (b43_using_pio_transfers(dev))
b43_pio_rx(dev->pio.rx_queue);
@@ -2001,7 +1994,7 @@ static irqreturn_t b43_do_interrupt(struct b43_wldev *dev)
return IRQ_NONE;

dev->dma_reason[0] = b43_read32(dev, B43_MMIO_DMA0_REASON)
- & 0x0001DC00;
+ & 0x0001FC00;
dev->dma_reason[1] = b43_read32(dev, B43_MMIO_DMA1_REASON)
& 0x0000DC00;
dev->dma_reason[2] = b43_read32(dev, B43_MMIO_DMA2_REASON)
@@ -3130,7 +3123,7 @@ static int b43_chip_init(struct b43_wldev *dev)
b43_write32(dev, 0x018C, 0x02000000);
}
b43_write32(dev, B43_MMIO_GEN_IRQ_REASON, 0x00004000);
- b43_write32(dev, B43_MMIO_DMA0_IRQ_MASK, 0x0001DC00);
+ b43_write32(dev, B43_MMIO_DMA0_IRQ_MASK, 0x0001FC00);
b43_write32(dev, B43_MMIO_DMA1_IRQ_MASK, 0x0000DC00);
b43_write32(dev, B43_MMIO_DMA2_IRQ_MASK, 0x0000DC00);
b43_write32(dev, B43_MMIO_DMA3_IRQ_MASK, 0x0001DC00);
--
1.7.9.5



2013-04-23 20:34:18

by Michael Büsch

[permalink] [raw]
Subject: [PATCH] b43: rename stop-index DMA op

Rename the stop-index-write DMA operation to set_rx_stop_slot().

Signed-off-by: Michael Buesch <[email protected]>

---

This patch depends on:
"B43: Handle DMA RX descriptor underrun"


Index: wireless-testing/drivers/net/wireless/b43/dma.c
===================================================================
--- wireless-testing.orig/drivers/net/wireless/b43/dma.c 2013-04-23 21:50:47.655004059 +0200
+++ wireless-testing/drivers/net/wireless/b43/dma.c 2013-04-23 21:51:24.132064549 +0200
@@ -156,9 +156,9 @@
return (val / sizeof(struct b43_dmadesc32));
}

-static void op32_set_current_rxslot(struct b43_dmaring *ring, int slot)
+static void op32_set_rx_stop_slot(struct b43_dmaring *ring, int slot)
{
- b43_dma_write(ring, B43_DMA32_RXINDEX,
+ b43_dma_write(ring, B43_DMA32_RXSTOPINDEX,
(u32) (slot * sizeof(struct b43_dmadesc32)));
}

@@ -169,7 +169,7 @@
.tx_suspend = op32_tx_suspend,
.tx_resume = op32_tx_resume,
.get_current_rxslot = op32_get_current_rxslot,
- .set_current_rxslot = op32_set_current_rxslot,
+ .set_rx_stop_slot = op32_set_rx_stop_slot,
};

/* 64bit DMA ops. */
@@ -251,9 +251,9 @@
return (val / sizeof(struct b43_dmadesc64));
}

-static void op64_set_current_rxslot(struct b43_dmaring *ring, int slot)
+static void op64_set_rx_stop_slot(struct b43_dmaring *ring, int slot)
{
- b43_dma_write(ring, B43_DMA64_RXINDEX,
+ b43_dma_write(ring, B43_DMA64_RXSTOPINDEX,
(u32) (slot * sizeof(struct b43_dmadesc64)));
}

@@ -264,7 +264,7 @@
.tx_suspend = op64_tx_suspend,
.tx_resume = op64_tx_resume,
.get_current_rxslot = op64_get_current_rxslot,
- .set_current_rxslot = op64_set_current_rxslot,
+ .set_rx_stop_slot = op64_set_rx_stop_slot,
};

static inline int free_slots(struct b43_dmaring *ring)
@@ -743,7 +743,7 @@
b43_dma_write(ring, B43_DMA64_RXCTL, value);
b43_dma_write(ring, B43_DMA64_RXRINGLO, addrlo);
b43_dma_write(ring, B43_DMA64_RXRINGHI, addrhi);
- b43_dma_write(ring, B43_DMA64_RXINDEX, ring->nr_slots *
+ b43_dma_write(ring, B43_DMA64_RXSTOPINDEX, ring->nr_slots *
sizeof(struct b43_dmadesc64));
} else {
u32 ringbase = (u32) (ring->dmabase);
@@ -758,7 +758,7 @@
value |= B43_DMA32_RXPARITYDISABLE;
b43_dma_write(ring, B43_DMA32_RXCTL, value);
b43_dma_write(ring, B43_DMA32_RXRING, addrlo);
- b43_dma_write(ring, B43_DMA32_RXINDEX, ring->nr_slots *
+ b43_dma_write(ring, B43_DMA32_RXSTOPINDEX, ring->nr_slots *
sizeof(struct b43_dmadesc32));
}
}
@@ -1749,7 +1749,7 @@
*/
current_slot = ring->ops->get_current_rxslot(ring);
previous_slot = prev_slot(ring, current_slot);
- ring->ops->set_current_rxslot(ring, previous_slot);
+ ring->ops->set_rx_stop_slot(ring, previous_slot);
}

void b43_dma_rx(struct b43_dmaring *ring)
@@ -1768,7 +1768,7 @@
update_max_used_slots(ring, ++used_slots);
}
wmb();
- ops->set_current_rxslot(ring, slot);
+ ops->set_rx_stop_slot(ring, slot);
ring->current_slot = slot;
}

Index: wireless-testing/drivers/net/wireless/b43/dma.h
===================================================================
--- wireless-testing.orig/drivers/net/wireless/b43/dma.h 2013-04-23 21:50:47.655004059 +0200
+++ wireless-testing/drivers/net/wireless/b43/dma.h 2013-04-23 21:50:47.635003479 +0200
@@ -49,7 +49,7 @@
#define B43_DMA32_RXADDREXT_MASK 0x00030000
#define B43_DMA32_RXADDREXT_SHIFT 16
#define B43_DMA32_RXRING 0x14
-#define B43_DMA32_RXINDEX 0x18
+#define B43_DMA32_RXSTOPINDEX 0x18
#define B43_DMA32_RXSTATUS 0x1C
#define B43_DMA32_RXDPTR 0x00000FFF
#define B43_DMA32_RXSTATE 0x0000F000
@@ -117,7 +117,7 @@
#define B43_DMA64_RXPARITYDISABLE 0x00000800
#define B43_DMA64_RXADDREXT_MASK 0x00030000
#define B43_DMA64_RXADDREXT_SHIFT 16
-#define B43_DMA64_RXINDEX 0x24
+#define B43_DMA64_RXSTOPINDEX 0x24
#define B43_DMA64_RXRINGLO 0x28
#define B43_DMA64_RXRINGHI 0x2C
#define B43_DMA64_RXSTATUS 0x30
@@ -207,7 +207,7 @@
void (*tx_suspend) (struct b43_dmaring * ring);
void (*tx_resume) (struct b43_dmaring * ring);
int (*get_current_rxslot) (struct b43_dmaring * ring);
- void (*set_current_rxslot) (struct b43_dmaring * ring, int slot);
+ void (*set_rx_stop_slot) (struct b43_dmaring * ring, int slot);
};

enum b43_dmatype {


--
Michael


Attachments:
signature.asc (836.00 B)

2013-04-24 07:01:00

by Rafał Miłecki

[permalink] [raw]
Subject: Re: [PATCH] B43: Handle DMA RX descriptor underrun

2013/4/23 Thommy Jakobsson <[email protected]>:
> From: Thommy Jakobsson <[email protected]>
>
> Add handling of rx descriptor underflow. This fixes a fault that could
> happen on slow machines, where data is received faster than the CPU can
> handle. In such a case the device will use up all rx descriptors and
> refuse to send any more data before confirming that it is ok. This
> patch enables necessary interrupt to discover such a situation and will
> handle them by dropping everything in the ring buffer.

Thanks for working on this, I didn't really got enough time to follow
your active discussion actively.

I'll try to test this patch on my devices over the next weekend.

--
Rafał

2013-05-05 17:24:40

by Michael Büsch

[permalink] [raw]
Subject: Re: [PATCH] B43: Handle DMA RX descriptor underrun

On Sun, 5 May 2013 18:31:20 +0200
Rafał Miłecki <[email protected]> wrote:

> Still worth considering is my previous e-mail. Why writing (for
> example) 1 to RXSTOPINDEX doesn't stop firmware from using slot 1?

What makes you think this register does not work?

Do you write a 1 to the register, or do you mean "the offset that
corresponds to slot 1"?

> Can it be because it's "too late"? For example:
> 1) Firmware writes to slot 0, changes "curr descr" to 1
> 2) Firmware grabs slot 1
> 3) Firmware generates IRQ
> 4) Driver reads slot 0
> 5) Driver writes RX stop index 1

The stop index is the index that the firmware does _not_ write to.
If you set it _after_ it already wrote, you clearly are too late (for
this ring wrapping).

--
Michael


Attachments:
signature.asc (836.00 B)

2013-05-05 19:58:19

by Michael Büsch

[permalink] [raw]
Subject: Re: [PATCH] B43: Handle DMA RX descriptor underrun

On Sun, 5 May 2013 21:50:33 +0200
Rafał Miłecki <[email protected]> wrote:

> 2013/5/5 Michael Büsch <[email protected]>:
> > On Sun, 5 May 2013 18:31:20 +0200
> > Rafał Miłecki <[email protected]> wrote:
> >
> >> Still worth considering is my previous e-mail. Why writing (for
> >> example) 1 to RXSTOPINDEX doesn't stop firmware from using slot 1?
> >
> > What makes you think this register does not work?
>
> Take a look at this:
>
> [ 327.224976] [DBG] old current:5 new current:6
> [ 327.224982] [DBG] reading slot 5
> [ 327.224997] [DBG] writing stop slot 6
>
> In above ring->slot was 5, but IRQ was generated, and we read new
> "current" using get_current_rxslot. It appeared to be 6. So we read
> packet from slot 5 and then called
> ops->set_current_rxslot(ring, 6);
> AFAIU hardware shouldn't use slot 6, right? But take a look at what
> happens next:
>
> [ 327.319582] [DBG] old current:6 new current:7
> [ 327.319590] [DBG] reading slot 6
> [ 327.319619] [DBG] writing stop slot 7
>
> Hardware generated IRQ and we get_current_rxslot returned 7. It means
> we're allowed to read slots up to 7 (excluding). It other words it
> means firmware used slot 6... but 100ms earlier we forbid firmware to
> use slot 6!

I'd rather say that this is a race condition between your testing code
and the firmware.

--
Michael


Attachments:
signature.asc (836.00 B)

2013-05-05 19:09:57

by Thommy Jakobsson

[permalink] [raw]
Subject: Re: [PATCH] B43: Handle DMA RX descriptor underrun



On Sun, 5 May 2013, Rafa? Mi?ecki wrote:

> 2013/5/5 Rafa? Mi?ecki <[email protected]>:
> > 2013/4/23 Thommy Jakobsson <[email protected]>:
> >> Add handling of rx descriptor underflow. This fixes a fault that could
> >> happen on slow machines, where data is received faster than the CPU can
> >> handle. In such a case the device will use up all rx descriptors and
> >> refuse to send any more data before confirming that it is ok. This
> >> patch enables necessary interrupt to discover such a situation and will
> >> handle them by dropping everything in the ring buffer.
> >
> > Thommy: does it mean firmware actually ignores what we write to the
> > B43_DMA64_RXINDEX (recently renamed to the B43_DMA64_RXSTOPINDEX)? Is
> > our set_current_rxslot and op64_set_current_rxslot (same for 32bit
> > version) useless in this situation?
>
> I've done some tests with this register. First I've added simple debugging:
>
> [ 326.496207] [DBG] old current:0 new current:1
> [ 326.496215] [DBG] reading slot 0
> [ 326.496237] [DBG] writing stop slot 1
>
> [ 326.921657] [DBG] old current:1 new current:2
> [ 326.921665] [DBG] reading slot 1
> [ 326.921691] [DBG] writing stop slot 2
>
> I'm not sure how does it work. If we write 1 to RXSTOPINDEX it means
> firmware should not use slot "1", right? But then it'll IRQ and
> "current" will be 2 meaning there is a packet on slot 1.
> So I'm not sure anymore meaning of that index register.

See that this got stuck in my outbox, sorry. Michael already answered the
question I think, but I send it anyway.


If we write 1 to rxstopindex it means stop when you reach that index. The
difference in the normal loop is that the device has already marked what
we write to RXSTOPINDEX as current. One cannot remove a slot from the
firmware.

//Thommy



2013-05-03 19:40:40

by Thommy Jakobsson

[permalink] [raw]
Subject: Re: [PATCH] B43: Handle DMA RX descriptor underrun



On Fri, 3 May 2013, John W. Linville wrote:

> On Wed, Apr 24, 2013 at 09:00:59AM +0200, Rafa? Mi?ecki wrote:
> > 2013/4/23 Thommy Jakobsson <[email protected]>:
> > > From: Thommy Jakobsson <[email protected]>
> > >
> > > Add handling of rx descriptor underflow. This fixes a fault that could
> > > happen on slow machines, where data is received faster than the CPU can
> > > handle. In such a case the device will use up all rx descriptors and
> > > refuse to send any more data before confirming that it is ok. This
> > > patch enables necessary interrupt to discover such a situation and will
> > > handle them by dropping everything in the ring buffer.
> >
> > Thanks for working on this, I didn't really got enough time to follow
> > your active discussion actively.
> >
> > I'll try to test this patch on my devices over the next weekend.
>
> Any word on this testing? Should this go to 3.10?
It still runs on my device without any problem. Don't know anything about
Rafael's testing though

BR,
Thommy Jakobsson

2013-05-05 12:44:16

by Rafał Miłecki

[permalink] [raw]
Subject: Re: [PATCH] B43: Handle DMA RX descriptor underrun

2013/4/23 Thommy Jakobsson <[email protected]>:
> Add handling of rx descriptor underflow. This fixes a fault that could
> happen on slow machines, where data is received faster than the CPU can
> handle. In such a case the device will use up all rx descriptors and
> refuse to send any more data before confirming that it is ok. This
> patch enables necessary interrupt to discover such a situation and will
> handle them by dropping everything in the ring buffer.

Thommy: does it mean firmware actually ignores what we write to the
B43_DMA64_RXINDEX (recently renamed to the B43_DMA64_RXSTOPINDEX)? Is
our set_current_rxslot and op64_set_current_rxslot (same for 32bit
version) useless in this situation?

Could this be a off-by-one issue? Maybe we're writing a value too low
by a one and firmware believes the whole ring is empty while it's
full?

I think we may want turning this interrupt anyway, but I wonder if
there is another issue we just hide a bit better.

--
Rafał

2013-05-05 16:31:21

by Rafał Miłecki

[permalink] [raw]
Subject: Re: [PATCH] B43: Handle DMA RX descriptor underrun

2013/5/5 Rafał Miłecki <[email protected]>:
> 2013/5/5 Rafał Miłecki <[email protected]>:
>> 2013/4/23 Thommy Jakobsson <[email protected]>:
>>> Add handling of rx descriptor underflow. This fixes a fault that could
>>> happen on slow machines, where data is received faster than the CPU can
>>> handle. In such a case the device will use up all rx descriptors and
>>> refuse to send any more data before confirming that it is ok. This
>>> patch enables necessary interrupt to discover such a situation and will
>>> handle them by dropping everything in the ring buffer.
>>
>> Thommy: does it mean firmware actually ignores what we write to the
>> B43_DMA64_RXINDEX (recently renamed to the B43_DMA64_RXSTOPINDEX)? Is
>> our set_current_rxslot and op64_set_current_rxslot (same for 32bit
>> version) useless in this situation?
>
> I've also grab some old logs from BCM4311 from wl driver.
>
> DMA setup:
> write32 0xfaafc210 <- 0x0000083d B43_DMA32_RXCTL
> write32 0xfaafc214 <- 0x5581e000 B43_DMA32_RXRING
> write32 0xfaafc218 <- 0x00000100 B43_DMA32_RXINDEX
>
> While running:
>
> read32 0xfaafc21c -> 0x02001018 B43_DMA32_RXSTATUS
> read32 0xfaafc21c -> 0x02001018 B43_DMA32_RXSTATUS
> read32 0xfaafc180 -> 0x0006230f
> read32 0xfaafc184 -> 0x00000000
> write32 0xfaafc218 <- 0x00000118 B43_DMA32_RXINDEX
>
> (...)
>
> read32 0xfaafc21c -> 0x02801020 B43_DMA32_RXSTATUS
> read32 0xfaafc21c -> 0x02801020 B43_DMA32_RXSTATUS
> read32 0xfaafc180 -> 0x00064c28
> read32 0xfaafc184 -> 0x00000000
> write32 0xfaafc218 <- 0x00000120 B43_DMA32_RXINDEX
>
> Interesting part is that they write amount of slots + slot index to
> the B43_DMA32_RXINDEX (like 0x120 instead of 0x20). May be worth
> investigating.

Hm, this may be related to the unaligned addressing. After hacking
b43_dma_rx to:
ops->set_current_rxslot(ring, ring->nr_slots + 4);
(I've added ring->nr_slots) stopped stopping firmware from using slot
4. So I think we can ignore my above research.

Still worth considering is my previous e-mail. Why writing (for
example) 1 to RXSTOPINDEX doesn't stop firmware from using slot 1?
Can it be because it's "too late"? For example:
1) Firmware writes to slot 0, changes "curr descr" to 1
2) Firmware grabs slot 1
3) Firmware generates IRQ
4) Driver reads slot 0
5) Driver writes RX stop index 1

Firmware took slot 1 before driver forbid it to use it.

--
Rafał

2013-05-05 19:22:50

by Larry Finger

[permalink] [raw]
Subject: Re: [PATCH] B43: Handle DMA RX descriptor underrun

On 05/05/2013 10:43 AM, Rafał Miłecki wrote:
>
> Interesting part is that they write amount of slots + slot index to
> the B43_DMA32_RXINDEX (like 0x120 instead of 0x20). May be worth
> investigating.

Getting that behavior from wl might be useful, but their response to an
interrupt with bit 13 set will not be of much help. Whenever that occurs, they
reinitialize the device, and do not try a recovery.

Larry


2013-05-02 15:01:53

by Larry Finger

[permalink] [raw]
Subject: Re: [PATCH] B43: Handle DMA RX descriptor underrun

On 05/02/2013 08:06 AM, Michael B?sch wrote:
> On Tue, 23 Apr 2013 21:45:11 +0200 (CEST)
> Thommy Jakobsson <[email protected]> wrote:
>
>> From: Thommy Jakobsson <[email protected]>
>>
>> Add handling of rx descriptor underflow. This fixes a fault that could
>> happen on slow machines, where data is received faster than the CPU can
>> handle. In such a case the device will use up all rx descriptors and
>> refuse to send any more data before confirming that it is ok. This
>> patch enables necessary interrupt to discover such a situation and will
>> handle them by dropping everything in the ring buffer.
>>
>> Reviewed-by: Michael Buesch <[email protected]>
>> Signed-off-by: Thommy Jakobsson <[email protected]>
>> Cc: stable <[email protected]>
>
> Anybody interested in porting this to b43legacy (and testing)?

Michael,

I will take care of that. It may take a while due to other projects.

Larry



2013-05-13 18:27:17

by Thommy Jakobsson

[permalink] [raw]
Subject: Re: [PATCH] B43: Handle DMA RX descriptor underrun



On Sun, 5 May 2013, Rafa? Mi?ecki wrote:

> I think we may want turning this interrupt anyway, but I wonder if
> there is another issue we just hide a bit better.
>
Any news on the testing Rafael? Did you have any more questions about the
solution? Just let me know if I can help

//Thommy

2013-05-25 20:37:05

by Larry Finger

[permalink] [raw]
Subject: Re: [PATCH] B43: Handle DMA RX descriptor underrun

On 05/25/2013 02:42 PM, Michael Büsch wrote:
> On Sat, 25 May 2013 21:37:32 +0200
> Hauke Mehrtens <[email protected]> wrote:
>
>> On 05/25/2013 09:02 PM, Thommy Jakobsson wrote:
>>>
>>>
>>> On Fri, 24 May 2013, John W. Linville wrote:
>>>
>>>> Ping? Should I drop this one?
>>>>
>>> Still working great on my device, so from that I think we should go for
>>> it. But it would be great if Rafal (or someone) tested it on some other
>>> hardware as well.
>>>
>>> //Thommy
>>
>> This is included in OpenWrt for ~4 weeks now and I haven't read of
>> anybody complaining about any new problems introduced by this patch or
>> something which looks like the DMA under run.
>> I am also for adding this to the kernel.
>
> I'm probably missing something, but isn't this patch already in Linus' tree?
> It's in stable review after all.

The patch is in wireless-testing with John's s-o-b, and in the mainline tree
with the same annotation.

It has been working OK here, but my devices did not show underrun.

Larry



2013-05-05 20:06:18

by Thommy Jakobsson

[permalink] [raw]
Subject: Re: [PATCH] B43: Handle DMA RX descriptor underrun



On Sun, 5 May 2013, Michael B?sch wrote:

> On Sun, 5 May 2013 21:50:33 +0200
> Rafa? Mi?ecki <[email protected]> wrote:
>
> > 2013/5/5 Michael B?sch <[email protected]>:
> > > On Sun, 5 May 2013 18:31:20 +0200
> > > Rafa? Mi?ecki <[email protected]> wrote:
> > >
> > >> Still worth considering is my previous e-mail. Why writing (for
> > >> example) 1 to RXSTOPINDEX doesn't stop firmware from using slot 1?
> > >
> > > What makes you think this register does not work?
> >
> > Take a look at this:
> >
> > [ 327.224976] [DBG] old current:5 new current:6
> > [ 327.224982] [DBG] reading slot 5
> > [ 327.224997] [DBG] writing stop slot 6
> >
> > In above ring->slot was 5, but IRQ was generated, and we read new
> > "current" using get_current_rxslot. It appeared to be 6. So we read
> > packet from slot 5 and then called
> > ops->set_current_rxslot(ring, 6);
> > AFAIU hardware shouldn't use slot 6, right? But take a look at what
> > happens next:
> >
> > [ 327.319582] [DBG] old current:6 new current:7
> > [ 327.319590] [DBG] reading slot 6
> > [ 327.319619] [DBG] writing stop slot 7
> >
> > Hardware generated IRQ and we get_current_rxslot returned 7. It means
> > we're allowed to read slots up to 7 (excluding). It other words it
> > means firmware used slot 6... but 100ms earlier we forbid firmware to
> > use slot 6!
>
> I'd rather say that this is a race condition between your testing code
> and the firmware.
If I understood Rafael this is the normal rx-code (with some
printouts). When you use set_current_rxslot you dont forbid the hardware
to use slot 6, you say that the last slot that you can use is 5.So the
firmware actually checks if we step from 5 to 6, not if the slot is 6.

//Thommy

2013-05-05 20:00:00

by Thommy Jakobsson

[permalink] [raw]
Subject: Re: [PATCH] B43: Handle DMA RX descriptor underrun



On Sun, 5 May 2013, Rafa? Mi?ecki wrote:

> 2013/5/5 Michael B?sch <[email protected]>:
> > On Sun, 5 May 2013 18:31:20 +0200
> > Rafa? Mi?ecki <[email protected]> wrote:
> >
> >> Still worth considering is my previous e-mail. Why writing (for
> >> example) 1 to RXSTOPINDEX doesn't stop firmware from using slot 1?
> >
> > What makes you think this register does not work?
>
> Take a look at this:
>
> [ 327.224976] [DBG] old current:5 new current:6
> [ 327.224982] [DBG] reading slot 5
> [ 327.224997] [DBG] writing stop slot 6
>
> In above ring->slot was 5, but IRQ was generated, and we read new
> "current" using get_current_rxslot. It appeared to be 6. So we read
> packet from slot 5 and then called
> ops->set_current_rxslot(ring, 6);
> AFAIU hardware shouldn't use slot 6, right? But take a look at what
> happens next:
>
> [ 327.319582] [DBG] old current:6 new current:7
> [ 327.319590] [DBG] reading slot 6
> [ 327.319619] [DBG] writing stop slot 7
>
> Hardware generated IRQ and we get_current_rxslot returned 7. It means
> we're allowed to read slots up to 7 (excluding). It other words it
> means firmware used slot 6... but 100ms earlier we forbid firmware to
> use slot 6!
>
> This is part I don't understand. It looks like firmware ignores what
> we set with the ops->set_current_rxslot
As I wrote before, you cannot remove a slot that the device already marked
as next. The register that you write to with set_current_rxslot is only
checked when the firmware steps up the current descriptor register. At
least thats what I have seen from my testing.

//Thommy

2013-05-05 13:56:42

by Michael Büsch

[permalink] [raw]
Subject: Re: [PATCH] B43: Handle DMA RX descriptor underrun

On Sun, 5 May 2013 14:44:14 +0200
Rafał Miłecki <[email protected]> wrote:

> 2013/4/23 Thommy Jakobsson <[email protected]>:
> > Add handling of rx descriptor underflow. This fixes a fault that could
> > happen on slow machines, where data is received faster than the CPU can
> > handle. In such a case the device will use up all rx descriptors and
> > refuse to send any more data before confirming that it is ok. This
> > patch enables necessary interrupt to discover such a situation and will
> > handle them by dropping everything in the ring buffer.
>
> Thommy: does it mean firmware actually ignores what we write to the
> B43_DMA64_RXINDEX (recently renamed to the B43_DMA64_RXSTOPINDEX)? Is
> our set_current_rxslot and op64_set_current_rxslot (same for 32bit
> version) useless in this situation?
>
> Could this be a off-by-one issue? Maybe we're writing a value too low
> by a one and firmware believes the whole ring is empty while it's
> full?

The ring looks the same if it's full or empty. We can only know that it is
full when this interrupt fires, which happens as the indexes collide.

--
Michael


Attachments:
signature.asc (836.00 B)

2013-05-25 19:37:43

by Hauke Mehrtens

[permalink] [raw]
Subject: Re: [PATCH] B43: Handle DMA RX descriptor underrun

On 05/25/2013 09:02 PM, Thommy Jakobsson wrote:
>
>
> On Fri, 24 May 2013, John W. Linville wrote:
>
>> Ping? Should I drop this one?
>>
> Still working great on my device, so from that I think we should go for
> it. But it would be great if Rafal (or someone) tested it on some other
> hardware as well.
>
> //Thommy

This is included in OpenWrt for ~4 weeks now and I haven't read of
anybody complaining about any new problems introduced by this patch or
something which looks like the DMA under run.
I am also for adding this to the kernel.

Hauke


2013-05-28 17:45:12

by John W. Linville

[permalink] [raw]
Subject: Re: [PATCH] B43: Handle DMA RX descriptor underrun

On Sat, May 25, 2013 at 03:37:02PM -0500, Larry Finger wrote:
> On 05/25/2013 02:42 PM, Michael B?sch wrote:
> >On Sat, 25 May 2013 21:37:32 +0200
> >Hauke Mehrtens <[email protected]> wrote:
> >
> >>On 05/25/2013 09:02 PM, Thommy Jakobsson wrote:
> >>>
> >>>
> >>>On Fri, 24 May 2013, John W. Linville wrote:
> >>>
> >>>>Ping? Should I drop this one?
> >>>>
> >>>Still working great on my device, so from that I think we should go for
> >>>it. But it would be great if Rafal (or someone) tested it on some other
> >>>hardware as well.
> >>>
> >>>//Thommy
> >>
> >>This is included in OpenWrt for ~4 weeks now and I haven't read of
> >>anybody complaining about any new problems introduced by this patch or
> >>something which looks like the DMA under run.
> >>I am also for adding this to the kernel.
> >
> >I'm probably missing something, but isn't this patch already in Linus' tree?
> >It's in stable review after all.
>
> The patch is in wireless-testing with John's s-o-b, and in the
> mainline tree with the same annotation.
>
> It has been working OK here, but my devices did not show underrun.

OK, thanks -- sorry for the confusion!

--
John W. Linville Someday the world will need a hero, and you
[email protected] might be all we have. Be ready.

2013-05-25 19:03:05

by Thommy Jakobsson

[permalink] [raw]
Subject: Re: [PATCH] B43: Handle DMA RX descriptor underrun



On Fri, 24 May 2013, John W. Linville wrote:

> Ping? Should I drop this one?
>
Still working great on my device, so from that I think we should go for
it. But it would be great if Rafal (or someone) tested it on some other
hardware as well.

//Thommy

2013-05-24 17:00:14

by John W. Linville

[permalink] [raw]
Subject: Re: [PATCH] B43: Handle DMA RX descriptor underrun

On Mon, May 13, 2013 at 08:27:08PM +0200, Thommy Jakobsson wrote:
>
>
> On Sun, 5 May 2013, Rafa? Mi?ecki wrote:
>
> > I think we may want turning this interrupt anyway, but I wonder if
> > there is another issue we just hide a bit better.
> >
> Any news on the testing Rafael? Did you have any more questions about the
> solution? Just let me know if I can help
>
> //Thommy

Ping? Should I drop this one?

--
John W. Linville Someday the world will need a hero, and you
[email protected] might be all we have. Be ready.

2013-05-05 19:50:34

by Rafał Miłecki

[permalink] [raw]
Subject: Re: [PATCH] B43: Handle DMA RX descriptor underrun

2013/5/5 Michael Büsch <[email protected]>:
> On Sun, 5 May 2013 18:31:20 +0200
> Rafał Miłecki <[email protected]> wrote:
>
>> Still worth considering is my previous e-mail. Why writing (for
>> example) 1 to RXSTOPINDEX doesn't stop firmware from using slot 1?
>
> What makes you think this register does not work?

Take a look at this:

[ 327.224976] [DBG] old current:5 new current:6
[ 327.224982] [DBG] reading slot 5
[ 327.224997] [DBG] writing stop slot 6

In above ring->slot was 5, but IRQ was generated, and we read new
"current" using get_current_rxslot. It appeared to be 6. So we read
packet from slot 5 and then called
ops->set_current_rxslot(ring, 6);
AFAIU hardware shouldn't use slot 6, right? But take a look at what
happens next:

[ 327.319582] [DBG] old current:6 new current:7
[ 327.319590] [DBG] reading slot 6
[ 327.319619] [DBG] writing stop slot 7

Hardware generated IRQ and we get_current_rxslot returned 7. It means
we're allowed to read slots up to 7 (excluding). It other words it
means firmware used slot 6... but 100ms earlier we forbid firmware to
use slot 6!

This is part I don't understand. It looks like firmware ignores what
we set with the ops->set_current_rxslot


> Do you write a 1 to the register, or do you mean "the offset that
> corresponds to slot 1"?

I mean "the offset that corresponds to slot 1". In code it's:
ops->set_current_rxslot(ring, 1);


>> Can it be because it's "too late"? For example:
>> 1) Firmware writes to slot 0, changes "curr descr" to 1
>> 2) Firmware grabs slot 1
>> 3) Firmware generates IRQ
>> 4) Driver reads slot 0
>> 5) Driver writes RX stop index 1
>
> The stop index is the index that the firmware does _not_ write to.
> If you set it _after_ it already wrote, you clearly are too late (for
> this ring wrapping).

Take a look at above log. First we forbid firmware to use slot 6:
[ 327.224997] [DBG] writing stop slot 6
but 100ms later firmware used that slot:
[ 327.319582] [DBG] old current:6 new current:7
(it filled slot 6 and bumped value in B43_DMA64_RXSTATUS)

--
Rafał

2013-05-25 19:42:39

by Michael Büsch

[permalink] [raw]
Subject: Re: [PATCH] B43: Handle DMA RX descriptor underrun

On Sat, 25 May 2013 21:37:32 +0200
Hauke Mehrtens <[email protected]> wrote:

> On 05/25/2013 09:02 PM, Thommy Jakobsson wrote:
> >
> >
> > On Fri, 24 May 2013, John W. Linville wrote:
> >
> >> Ping? Should I drop this one?
> >>
> > Still working great on my device, so from that I think we should go for
> > it. But it would be great if Rafal (or someone) tested it on some other
> > hardware as well.
> >
> > //Thommy
>
> This is included in OpenWrt for ~4 weeks now and I haven't read of
> anybody complaining about any new problems introduced by this patch or
> something which looks like the DMA under run.
> I am also for adding this to the kernel.

I'm probably missing something, but isn't this patch already in Linus' tree?
It's in stable review after all.

--
Michael


Attachments:
signature.asc (836.00 B)

2013-05-03 17:35:47

by John W. Linville

[permalink] [raw]
Subject: Re: [PATCH] B43: Handle DMA RX descriptor underrun

On Wed, Apr 24, 2013 at 09:00:59AM +0200, Rafał Miłecki wrote:
> 2013/4/23 Thommy Jakobsson <[email protected]>:
> > From: Thommy Jakobsson <[email protected]>
> >
> > Add handling of rx descriptor underflow. This fixes a fault that could
> > happen on slow machines, where data is received faster than the CPU can
> > handle. In such a case the device will use up all rx descriptors and
> > refuse to send any more data before confirming that it is ok. This
> > patch enables necessary interrupt to discover such a situation and will
> > handle them by dropping everything in the ring buffer.
>
> Thanks for working on this, I didn't really got enough time to follow
> your active discussion actively.
>
> I'll try to test this patch on my devices over the next weekend.

Any word on this testing? Should this go to 3.10?

--
John W. Linville Someday the world will need a hero, and you
[email protected] might be all we have. Be ready.

2013-05-02 13:06:24

by Michael Büsch

[permalink] [raw]
Subject: Re: [PATCH] B43: Handle DMA RX descriptor underrun

On Tue, 23 Apr 2013 21:45:11 +0200 (CEST)
Thommy Jakobsson <[email protected]> wrote:

> From: Thommy Jakobsson <[email protected]>
>
> Add handling of rx descriptor underflow. This fixes a fault that could
> happen on slow machines, where data is received faster than the CPU can
> handle. In such a case the device will use up all rx descriptors and
> refuse to send any more data before confirming that it is ok. This
> patch enables necessary interrupt to discover such a situation and will
> handle them by dropping everything in the ring buffer.
>
> Reviewed-by: Michael Buesch <[email protected]>
> Signed-off-by: Thommy Jakobsson <[email protected]>
> Cc: stable <[email protected]>

Anybody interested in porting this to b43legacy (and testing)?

--
Michael


Attachments:
signature.asc (836.00 B)

2013-05-05 15:34:19

by Rafał Miłecki

[permalink] [raw]
Subject: Re: [PATCH] B43: Handle DMA RX descriptor underrun

2013/5/5 Rafał Miłecki <[email protected]>:
> 2013/4/23 Thommy Jakobsson <[email protected]>:
>> Add handling of rx descriptor underflow. This fixes a fault that could
>> happen on slow machines, where data is received faster than the CPU can
>> handle. In such a case the device will use up all rx descriptors and
>> refuse to send any more data before confirming that it is ok. This
>> patch enables necessary interrupt to discover such a situation and will
>> handle them by dropping everything in the ring buffer.
>
> Thommy: does it mean firmware actually ignores what we write to the
> B43_DMA64_RXINDEX (recently renamed to the B43_DMA64_RXSTOPINDEX)? Is
> our set_current_rxslot and op64_set_current_rxslot (same for 32bit
> version) useless in this situation?

I've done some tests with this register. First I've added simple debugging:

[ 326.496207] [DBG] old current:0 new current:1
[ 326.496215] [DBG] reading slot 0
[ 326.496237] [DBG] writing stop slot 1

[ 326.921657] [DBG] old current:1 new current:2
[ 326.921665] [DBG] reading slot 1
[ 326.921691] [DBG] writing stop slot 2

[ 326.931333] [DBG] old current:2 new current:3
[ 326.931339] [DBG] reading slot 2
[ 326.931357] [DBG] writing stop slot 3

[ 327.152025] [DBG] old current:3 new current:4
[ 327.152034] [DBG] reading slot 3
[ 327.152052] [DBG] writing stop slot 4

[ 327.217176] [DBG] old current:4 new current:5
[ 327.217182] [DBG] reading slot 4
[ 327.217206] [DBG] writing stop slot 5

[ 327.224976] [DBG] old current:5 new current:6
[ 327.224982] [DBG] reading slot 5
[ 327.224997] [DBG] writing stop slot 6

[ 327.319582] [DBG] old current:6 new current:7
[ 327.319590] [DBG] reading slot 6
[ 327.319619] [DBG] writing stop slot 7

[ 330.450532] [DBG] old current:7 new current:0
[ 330.450540] [DBG] reading slot 7
[ 330.450559] [DBG] writing stop slot 0

[ 330.889299] [DBG] old current:0 new current:1
[ 330.889306] [DBG] reading slot 0
[ 330.889329] [DBG] writing stop slot 1

I'm not sure how does it work. If we write 1 to RXSTOPINDEX it means
firmware should not use slot "1", right? But then it'll IRQ and
"current" will be 2 meaning there is a packet on slot 1.
So I'm not sure anymore meaning of that index register.

I've also done some extra experiment in b43_dma_rx and changed
set_current call to the one with hardcoded 4:
ops->set_current_rxslot(ring, 4);
There goes a log from that experiment:

[ 659.557475] [DBG] old current:0 new current:1
[ 659.557483] [DBG] reading slot 0
[ 659.557503] [DBG] writing stop slot 4

[ 659.929288] [DBG] old current:1 new current:2
[ 659.929295] [DBG] reading slot 1
[ 659.929319] [DBG] writing stop slot 4

[ 659.994291] [DBG] old current:2 new current:3
[ 659.994296] [DBG] reading slot 2
[ 659.994311] [DBG] writing stop slot 4

[ 660.125386] [DBG] old current:3 new current:4
[ 660.125394] [DBG] reading slot 3
[ 660.125421] [DBG] writing stop slot 4

And then RX stalled. It seems this time index register worked as
expected. We wrote "4" to it, so firmware didn't try to use slot 4,
probably waiting for b43 driver dumping that value.

So why that stop index register doesn't work that way in normal
situation? Any ideas?

--
Rafał

2013-05-05 15:43:26

by Rafał Miłecki

[permalink] [raw]
Subject: Re: [PATCH] B43: Handle DMA RX descriptor underrun

2013/5/5 Rafał Miłecki <[email protected]>:
> 2013/4/23 Thommy Jakobsson <[email protected]>:
>> Add handling of rx descriptor underflow. This fixes a fault that could
>> happen on slow machines, where data is received faster than the CPU can
>> handle. In such a case the device will use up all rx descriptors and
>> refuse to send any more data before confirming that it is ok. This
>> patch enables necessary interrupt to discover such a situation and will
>> handle them by dropping everything in the ring buffer.
>
> Thommy: does it mean firmware actually ignores what we write to the
> B43_DMA64_RXINDEX (recently renamed to the B43_DMA64_RXSTOPINDEX)? Is
> our set_current_rxslot and op64_set_current_rxslot (same for 32bit
> version) useless in this situation?

I've also grab some old logs from BCM4311 from wl driver.

DMA setup:
write32 0xfaafc210 <- 0x0000083d B43_DMA32_RXCTL
write32 0xfaafc214 <- 0x5581e000 B43_DMA32_RXRING
write32 0xfaafc218 <- 0x00000100 B43_DMA32_RXINDEX

While running:

read32 0xfaafc21c -> 0x02001018 B43_DMA32_RXSTATUS
read32 0xfaafc21c -> 0x02001018 B43_DMA32_RXSTATUS
read32 0xfaafc180 -> 0x0006230f
read32 0xfaafc184 -> 0x00000000
write32 0xfaafc218 <- 0x00000118 B43_DMA32_RXINDEX

(...)

read32 0xfaafc21c -> 0x02801020 B43_DMA32_RXSTATUS
read32 0xfaafc21c -> 0x02801020 B43_DMA32_RXSTATUS
read32 0xfaafc180 -> 0x00064c28
read32 0xfaafc184 -> 0x00000000
write32 0xfaafc218 <- 0x00000120 B43_DMA32_RXINDEX

Interesting part is that they write amount of slots + slot index to
the B43_DMA32_RXINDEX (like 0x120 instead of 0x20). May be worth
investigating.

--
Rafał

2013-06-10 09:40:43

by Rafał Miłecki

[permalink] [raw]
Subject: Re: [PATCH] B43: Handle DMA RX descriptor underrun

2013/5/25 Hauke Mehrtens <[email protected]>:
> On 05/25/2013 09:02 PM, Thommy Jakobsson wrote:
>>
>>
>> On Fri, 24 May 2013, John W. Linville wrote:
>>
>>> Ping? Should I drop this one?
>>>
>> Still working great on my device, so from that I think we should go for
>> it. But it would be great if Rafal (or someone) tested it on some other
>> hardware as well.
>>
>> //Thommy
>
> This is included in OpenWrt for ~4 weeks now and I haven't read of
> anybody complaining about any new problems introduced by this patch or
> something which looks like the DMA under run.
> I am also for adding this to the kernel.

Yeah, I saw some ticker on OpenWrt with very positive feedback. Nice
we have that issue worked out.

I'm still curious how Broadcom handles that without this interrupt.
Anyway, glad we found our way :)

--
Rafał