2017-12-07 22:57:07

by Niklas Cassel

[permalink] [raw]
Subject: [PATCH net-next] net: stmmac: fix broken dma_interrupt handling for multi-queues

There is nothing that says that number of TX queues == number of RX
queues. E.g. the ARTPEC-6 SoC has 2 TX queues and 1 RX queue.

This code is obviously wrong:
for (chan = 0; chan < tx_channel_count; chan++) {
struct stmmac_rx_queue *rx_q = &priv->rx_queue[chan];

priv->rx_queue has size MTL_MAX_RX_QUEUES, so this will send an
uninitialized napi_struct to __napi_schedule(), causing us to
crash in net_rx_action(), because napi_struct->poll is zero.

[12846.759880] Unable to handle kernel NULL pointer dereference at virtual address 00000000
[12846.768014] pgd = (ptrval)
[12846.770742] [00000000] *pgd=39ec7831, *pte=00000000, *ppte=00000000
[12846.777023] Internal error: Oops: 80000007 [#1] PREEMPT SMP ARM
[12846.782942] Modules linked in:
[12846.785998] CPU: 0 PID: 161 Comm: dropbear Not tainted 4.15.0-rc2-00285-gf5fb5f2f39a7 #36
[12846.794177] Hardware name: Axis ARTPEC-6 Platform
[12846.798879] task: (ptrval) task.stack: (ptrval)
[12846.803407] PC is at 0x0
[12846.805942] LR is at net_rx_action+0x274/0x43c
[12846.810383] pc : [<00000000>] lr : [<80bff064>] psr: 200e0113
[12846.816648] sp : b90d9ae8 ip : b90d9ae8 fp : b90d9b44
[12846.821871] r10: 00000008 r9 : 0013250e r8 : 00000100
[12846.827094] r7 : 0000012c r6 : 00000000 r5 : 00000001 r4 : bac84900
[12846.833619] r3 : 00000000 r2 : b90d9b08 r1 : 00000000 r0 : bac84900

Since each DMA channel can be used for rx and tx simultaneously,
the current code should probably be rewritten so that napi_struct is
embedded in a new struct stmmac_channel.
That way, stmmac_poll() can call stmmac_tx_clean() on just the tx queue
where we got the IRQ, instead of looping through all tx queues.
This is also how the xgbe driver does it (another driver for this IP).

Fixes: c22a3f48ef99 ("net: stmmac: adding multiple napi mechanism")
Signed-off-by: Niklas Cassel <[email protected]>
---
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 54 +++++++++++++++++++----
1 file changed, 46 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index d7250539d0bd..c52a9963c19d 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -1997,22 +1997,60 @@ static void stmmac_set_dma_operation_mode(struct stmmac_priv *priv, u32 txmode,
static void stmmac_dma_interrupt(struct stmmac_priv *priv)
{
u32 tx_channel_count = priv->plat->tx_queues_to_use;
- int status;
+ u32 rx_channel_count = priv->plat->rx_queues_to_use;
+ u32 channels_to_check = tx_channel_count > rx_channel_count ?
+ tx_channel_count : rx_channel_count;
u32 chan;
+ bool poll_scheduled = false;
+ int status[channels_to_check];
+
+ /* Each DMA channel can be used for rx and tx simultaneously, yet
+ * napi_struct is embedded in struct stmmac_rx_queue rather than in a
+ * stmmac_channel struct.
+ * Because of this, stmmac_poll currently checks (and possibly wakes)
+ * all tx queues rather than just a single tx queue.
+ */
+ for (chan = 0; chan < channels_to_check; chan++)
+ status[chan] = priv->hw->dma->dma_interrupt(priv->ioaddr,
+ &priv->xstats,
+ chan);

- for (chan = 0; chan < tx_channel_count; chan++) {
- struct stmmac_rx_queue *rx_q = &priv->rx_queue[chan];
+ for (chan = 0; chan < rx_channel_count; chan++) {
+ if (likely(status[chan] & handle_rx)) {
+ struct stmmac_rx_queue *rx_q = &priv->rx_queue[chan];

- status = priv->hw->dma->dma_interrupt(priv->ioaddr,
- &priv->xstats, chan);
- if (likely((status & handle_rx)) || (status & handle_tx)) {
if (likely(napi_schedule_prep(&rx_q->napi))) {
stmmac_disable_dma_irq(priv, chan);
__napi_schedule(&rx_q->napi);
+ poll_scheduled = true;
+ }
+ }
+ }
+
+ /* If we scheduled poll, we already know that tx queues will be checked.
+ * If we didn't schedule poll, see if any DMA channel (used by tx) has a
+ * completed transmission, if so, call stmmac_poll (once).
+ */
+ if (!poll_scheduled) {
+ for (chan = 0; chan < tx_channel_count; chan++) {
+ if (status[chan] & handle_tx) {
+ /* It doesn't matter what rx queue we choose
+ * here. We use 0 since it always exists.
+ */
+ struct stmmac_rx_queue *rx_q =
+ &priv->rx_queue[0];
+
+ if (likely(napi_schedule_prep(&rx_q->napi))) {
+ stmmac_disable_dma_irq(priv, chan);
+ __napi_schedule(&rx_q->napi);
+ }
+ break;
}
}
+ }

- if (unlikely(status & tx_hard_error_bump_tc)) {
+ for (chan = 0; chan < tx_channel_count; chan++) {
+ if (unlikely(status[chan] & tx_hard_error_bump_tc)) {
/* Try to bump up the dma threshold on this failure */
if (unlikely(priv->xstats.threshold != SF_DMA_MODE) &&
(tc <= 256)) {
@@ -2029,7 +2067,7 @@ static void stmmac_dma_interrupt(struct stmmac_priv *priv)
chan);
priv->xstats.threshold = tc;
}
- } else if (unlikely(status == tx_hard_error)) {
+ } else if (unlikely(status[chan] == tx_hard_error)) {
stmmac_tx_err(priv, chan);
}
}
--
2.14.2


2017-12-07 23:57:51

by Niklas Cassel

[permalink] [raw]
Subject: Re: [PATCH net-next] net: stmmac: fix broken dma_interrupt handling for multi-queues

On Thu, Dec 07, 2017 at 11:56:10PM +0100, Niklas Cassel wrote:
> Since each DMA channel can be used for rx and tx simultaneously,
> the current code should probably be rewritten so that napi_struct is
> embedded in a new struct stmmac_channel.
> That way, stmmac_poll() can call stmmac_tx_clean() on just the tx queue
> where we got the IRQ, instead of looping through all tx queues.
> This is also how the xgbe driver does it (another driver for this IP).

Did anyone at Synopsys ever try this driver with a device tree
where num tx queues != num rx queues?

I know your hardware has 8 rx queues and 8 tx queues, but
that doesn't mean that you have to enable them all.

After fixing this crash, I'm still seeing tx timeouts with multi-queue
device trees. (At Axis we usually only enable 1 tx queue and 1 rx queue).

The multi-queue code seems a bit messy.. refactoring so that napi_struct
is in a struct stmmac_channel might help you guys with overall code
readability, since this would then actually match how the hardware works.

With the xgbe driver, xgbe_one_poll disable tx+rx irqs for a specific
channel, then calls futher function only with that specific channel.

With the stmmac driver, there is no connection between napi_struct
and tx. Disable irqs for tx? stmmac_poll loops through all tx queues :)

2017-12-08 19:19:13

by David Miller

[permalink] [raw]
Subject: Re: [PATCH net-next] net: stmmac: fix broken dma_interrupt handling for multi-queues

From: Niklas Cassel <[email protected]>
Date: Thu, 7 Dec 2017 23:56:10 +0100

> There is nothing that says that number of TX queues == number of RX
> queues. E.g. the ARTPEC-6 SoC has 2 TX queues and 1 RX queue.
>
> This code is obviously wrong:
> for (chan = 0; chan < tx_channel_count; chan++) {
> struct stmmac_rx_queue *rx_q = &priv->rx_queue[chan];
>
> priv->rx_queue has size MTL_MAX_RX_QUEUES, so this will send an
> uninitialized napi_struct to __napi_schedule(), causing us to
> crash in net_rx_action(), because napi_struct->poll is zero.
...
> Since each DMA channel can be used for rx and tx simultaneously,
> the current code should probably be rewritten so that napi_struct is
> embedded in a new struct stmmac_channel.
> That way, stmmac_poll() can call stmmac_tx_clean() on just the tx queue
> where we got the IRQ, instead of looping through all tx queues.
> This is also how the xgbe driver does it (another driver for this IP).
>
> Fixes: c22a3f48ef99 ("net: stmmac: adding multiple napi mechanism")
> Signed-off-by: Niklas Cassel <[email protected]>

Applied, but indeed a lot more fixes are needed in this area.

2017-12-13 10:04:44

by Joao Pinto

[permalink] [raw]
Subject: Re: [PATCH net-next] net: stmmac: fix broken dma_interrupt handling for multi-queues


Hi Niklas,

Às 10:56 PM de 12/7/2017, Niklas Cassel escreveu:
> There is nothing that says that number of TX queues == number of RX
> queues. E.g. the ARTPEC-6 SoC has 2 TX queues and 1 RX queue.
>

Yes you are totally right. Our Hardware was configured with 4RX queues and 4TX
queues and that lead us not to detect some of the issues that you are reporting.
We are already developing a prototype with a number of RX Queue different from
TX Queues.

We would like to resume a discussion that we had a couple of months ago, about
Synopsys testing stmmac in market boards. Currently I am the new Software Team
Manager and so I would like to say to you that we are interested in testing the
driver in several boards to assure quality in the stmmac driver, since we will
be constantly developing the new features for future HW releases.

If some vendors can send us sample boards it would be great, if we have to buy
some, it would be very useful to have reference suggestions.

Thanks and best regards,
Joao

> This code is obviously wrong:
> for (chan = 0; chan < tx_channel_count; chan++) {
> struct stmmac_rx_queue *rx_q = &priv->rx_queue[chan];
>
> priv->rx_queue has size MTL_MAX_RX_QUEUES, so this will send an
> uninitialized napi_struct to __napi_schedule(), causing us to
> crash in net_rx_action(), because napi_struct->poll is zero.
>
> [12846.759880] Unable to handle kernel NULL pointer dereference at virtual address 00000000
> [12846.768014] pgd = (ptrval)
> [12846.770742] [00000000] *pgd=39ec7831, *pte=00000000, *ppte=00000000
> [12846.777023] Internal error: Oops: 80000007 [#1] PREEMPT SMP ARM
> [12846.782942] Modules linked in:
> [12846.785998] CPU: 0 PID: 161 Comm: dropbear Not tainted 4.15.0-rc2-00285-gf5fb5f2f39a7 #36
> [12846.794177] Hardware name: Axis ARTPEC-6 Platform
> [12846.798879] task: (ptrval) task.stack: (ptrval)
> [12846.803407] PC is at 0x0
> [12846.805942] LR is at net_rx_action+0x274/0x43c
> [12846.810383] pc : [<00000000>] lr : [<80bff064>] psr: 200e0113
> [12846.816648] sp : b90d9ae8 ip : b90d9ae8 fp : b90d9b44
> [12846.821871] r10: 00000008 r9 : 0013250e r8 : 00000100
> [12846.827094] r7 : 0000012c r6 : 00000000 r5 : 00000001 r4 : bac84900
> [12846.833619] r3 : 00000000 r2 : b90d9b08 r1 : 00000000 r0 : bac84900
>
> Since each DMA channel can be used for rx and tx simultaneously,
> the current code should probably be rewritten so that napi_struct is
> embedded in a new struct stmmac_channel.
> That way, stmmac_poll() can call stmmac_tx_clean() on just the tx queue
> where we got the IRQ, instead of looping through all tx queues.
> This is also how the xgbe driver does it (another driver for this IP).
>
> Fixes: c22a3f48ef99 ("net: stmmac: adding multiple napi mechanism")
> Signed-off-by: Niklas Cassel <[email protected]>
> ---
> drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 54 +++++++++++++++++++----
> 1 file changed, 46 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> index d7250539d0bd..c52a9963c19d 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> @@ -1997,22 +1997,60 @@ static void stmmac_set_dma_operation_mode(struct stmmac_priv *priv, u32 txmode,
> static void stmmac_dma_interrupt(struct stmmac_priv *priv)
> {
> u32 tx_channel_count = priv->plat->tx_queues_to_use;
> - int status;
> + u32 rx_channel_count = priv->plat->rx_queues_to_use;
> + u32 channels_to_check = tx_channel_count > rx_channel_count ?
> + tx_channel_count : rx_channel_count;
> u32 chan;
> + bool poll_scheduled = false;
> + int status[channels_to_check];
> +
> + /* Each DMA channel can be used for rx and tx simultaneously, yet
> + * napi_struct is embedded in struct stmmac_rx_queue rather than in a
> + * stmmac_channel struct.
> + * Because of this, stmmac_poll currently checks (and possibly wakes)
> + * all tx queues rather than just a single tx queue.
> + */
> + for (chan = 0; chan < channels_to_check; chan++)
> + status[chan] = priv->hw->dma->dma_interrupt(priv->ioaddr,
> + &priv->xstats,
> + chan);
>
> - for (chan = 0; chan < tx_channel_count; chan++) {
> - struct stmmac_rx_queue *rx_q = &priv->rx_queue[chan];
> + for (chan = 0; chan < rx_channel_count; chan++) {
> + if (likely(status[chan] & handle_rx)) {
> + struct stmmac_rx_queue *rx_q = &priv->rx_queue[chan];
>
> - status = priv->hw->dma->dma_interrupt(priv->ioaddr,
> - &priv->xstats, chan);
> - if (likely((status & handle_rx)) || (status & handle_tx)) {
> if (likely(napi_schedule_prep(&rx_q->napi))) {
> stmmac_disable_dma_irq(priv, chan);
> __napi_schedule(&rx_q->napi);
> + poll_scheduled = true;
> + }
> + }
> + }
> +
> + /* If we scheduled poll, we already know that tx queues will be checked.
> + * If we didn't schedule poll, see if any DMA channel (used by tx) has a
> + * completed transmission, if so, call stmmac_poll (once).
> + */
> + if (!poll_scheduled) {
> + for (chan = 0; chan < tx_channel_count; chan++) {
> + if (status[chan] & handle_tx) {
> + /* It doesn't matter what rx queue we choose
> + * here. We use 0 since it always exists.
> + */
> + struct stmmac_rx_queue *rx_q =
> + &priv->rx_queue[0];
> +
> + if (likely(napi_schedule_prep(&rx_q->napi))) {
> + stmmac_disable_dma_irq(priv, chan);
> + __napi_schedule(&rx_q->napi);
> + }
> + break;
> }
> }
> + }
>
> - if (unlikely(status & tx_hard_error_bump_tc)) {
> + for (chan = 0; chan < tx_channel_count; chan++) {
> + if (unlikely(status[chan] & tx_hard_error_bump_tc)) {
> /* Try to bump up the dma threshold on this failure */
> if (unlikely(priv->xstats.threshold != SF_DMA_MODE) &&
> (tc <= 256)) {
> @@ -2029,7 +2067,7 @@ static void stmmac_dma_interrupt(struct stmmac_priv *priv)
> chan);
> priv->xstats.threshold = tc;
> }
> - } else if (unlikely(status == tx_hard_error)) {
> + } else if (unlikely(status[chan] == tx_hard_error)) {
> stmmac_tx_err(priv, chan);
> }
> }
>