LinuxLists.cc - [PATCH 1/7] can: rx-offload: continue on error

2019-09-26 08:38:49

Subject: [PATCH 1/7] can: rx-offload: continue on error

While can_rx_offload_offload_one will call mailbox_read to discard
the mailbox in case of an error, can_rx_offload_irq_offload_timestamp
bails out in the error case. Since it is typically called from a while
loop, all message will eventually be discarded. So lets continue on
error instead to discard them directly.

Signed-off-by: Jeroen Hofstee <[email protected]>
---
drivers/net/can/rx-offload.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/can/rx-offload.c b/drivers/net/can/rx-offload.c
index e6a668ee7730..39df41280e2d 100644
--- a/drivers/net/can/rx-offload.c
+++ b/drivers/net/can/rx-offload.c
@@ -158,7 +158,7 @@ int can_rx_offload_irq_offload_timestamp(struct can_rx_offload *offload, u64 pen

skb = can_rx_offload_offload_one(offload, i);
if (!skb)
- break;
+ continue;

__skb_queue_add_sort(&skb_queue, skb, can_rx_offload_compare);
}
--
2.17.1

2019-10-09 13:21:09

by Marc Kleine-Budde

[permalink] [raw]

Subject: Re: [PATCH 1/7] can: rx-offload: continue on error

Hello Jeroen,

I'm currently looking at the rx-offload error handling in detail.

TLDR: I've added the patch to linux-can.

Thanks,
Marc

For the record, the details:

On 9/24/19 8:45 PM, Jeroen Hofstee wrote:
> While can_rx_offload_offload_one will call mailbox_read to discard
> the mailbox in case of an error,

Yes.

can_rx_offload_offload_one() will read into the discard mailbox in case
of an error.

Currently there are two kinds of errors:
1) the rx-offoad skb queue (between the IRQ handler and the NAPI)
is full
2) alloc_can_skb() fails to allocate a skb, due to OOM

> can_rx_offload_irq_offload_timestamp bails out in the error case.

Yes, in case of an error both can_rx_offload_irq_offload_timestamp() and
can_rx_offload_irq_offload_fifo() will stop reading mailboxes, add the
already filled skbs to the queue and schedule NAPI if needed.

Currently both can_rx_offload_irq_offload_timestamp() and
can_rx_offload_irq_offload_fifo() will return the number of queued
mailboxes.

This means in case of queue overflow or OOM, only one mailbox is
discaded, before can_rx_offload_irq_offload_*() returning the number of
successfully queued mailboxes so far.

Looking at the flexcan driver:

https://elixir.bootlin.com/linux/latest/source/drivers/net/can/flexcan.c#L867

> while ((reg_iflag = flexcan_read_reg_iflag_rx(priv))) {
> handled = IRQ_HANDLED;
> ret = can_rx_offload_irq_offload_timestamp(&priv->offload,
> reg_iflag);
> if (!ret)
> break;
> }
[...]
> if (reg_iflag1 & FLEXCAN_IFLAG_RX_FIFO_AVAILABLE) {
> handled = IRQ_HANDLED;
> can_rx_offload_irq_offload_fifo(&priv->offload);
> }

This means for the timestamp mode, at least one mailbox is discarded or
if the error occurred after reading one or more mailboxes the while loop
will try again. If the error persists a second mailbox is discarded.

For the fifo mode, only one mailbox is discarded.

Then the flexcan's IRQ is exited. If there are messages in mailboxes are
pending another IRQ is triggered.... I doubt that this is a good idea.

On the other hand the ti_hecc driver:

> /* offload RX mailboxes and let NAPI deliver them */
> while ((rx_pending = hecc_read(priv, HECC_CANRMP))) {
> can_rx_offload_irq_offload_timestamp(&priv->offload,
> rx_pending);
> hecc_write(priv, HECC_CANRMP, rx_pending);
> }

The error is ignored and the _all_ mailboxes are discarded (given the
error persists).

> Since it is typically called from a while loop, all message will
> eventually be discarded. So lets continue on error instead to discard
> them directly.

After reading my own code and writing it up, your patch totally makes sense.

If there is a shortage of resources, queue overrun or OOM, it makes no
sense to return from the IRQ handler, if a mailbox is still active as it
will trigger the IRQ again. Entering the IRQ handler again probably
doesn't give the system time to recover from the resource problem.

> Signed-off-by: Jeroen Hofstee <[email protected]>
> ---
> drivers/net/can/rx-offload.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/can/rx-offload.c b/drivers/net/can/rx-offload.c
> index e6a668ee7730..39df41280e2d 100644
> --- a/drivers/net/can/rx-offload.c
> +++ b/drivers/net/can/rx-offload.c
> @@ -158,7 +158,7 @@ int can_rx_offload_irq_offload_timestamp(struct can_rx_offload *offload, u64 pen
>
> skb = can_rx_offload_offload_one(offload, i);
> if (!skb)
> - break;
> + continue;
>
> __skb_queue_add_sort(&skb_queue, skb, can_rx_offload_compare);
> }
>

--
Pengutronix e.K. | Marc Kleine-Budde |
Industrial Linux Solutions | Phone: +49-231-2826-924 |
Vertretung West/Dortmund | Fax: +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686 | http://www.pengutronix.de |

Attachments:

signature.asc (499.00 B)
OpenPGP digital signature

2019-10-13 21:30:35

by Jeroen Hofstee

[permalink] [raw]

Subject: Re: [PATCH 1/7] can: rx-offload: continue on error

Hello Marc,

On 10/9/19 3:18 PM, Marc Kleine-Budde wrote:
> Hello Jeroen,
>
> I'm currently looking at the rx-offload error handling in detail.
>
> TLDR: I've added the patch to linux-can.
>
> Thanks,
> Marc
>
> For the record, the details:
>
> On 9/24/19 8:45 PM, Jeroen Hofstee wrote:
>> While can_rx_offload_offload_one will call mailbox_read to discard
>> the mailbox in case of an error,
> Yes.
>
> can_rx_offload_offload_one() will read into the discard mailbox in case
> of an error.
>
> Currently there are two kinds of errors:
> 1) the rx-offoad skb queue (between the IRQ handler and the NAPI)
> is full
> 2) alloc_can_skb() fails to allocate a skb, due to OOM
>
>> can_rx_offload_irq_offload_timestamp bails out in the error case.
> Yes, in case of an error both can_rx_offload_irq_offload_timestamp() and
> can_rx_offload_irq_offload_fifo() will stop reading mailboxes, add the
> already filled skbs to the queue and schedule NAPI if needed.
>
> Currently both can_rx_offload_irq_offload_timestamp() and
> can_rx_offload_irq_offload_fifo() will return the number of queued
> mailboxes.
>
> This means in case of queue overflow or OOM, only one mailbox is
> discaded, before can_rx_offload_irq_offload_*() returning the number of
> successfully queued mailboxes so far.
>
> Looking at the flexcan driver:
>
> https://elixir.bootlin.com/linux/latest/source/drivers/net/can/flexcan.c#L867
>
>> while ((reg_iflag = flexcan_read_reg_iflag_rx(priv))) {
>> handled = IRQ_HANDLED;
>> ret = can_rx_offload_irq_offload_timestamp(&priv->offload,
>> reg_iflag);
>> if (!ret)
>> break;
>> }
> [...]
>> if (reg_iflag1 & FLEXCAN_IFLAG_RX_FIFO_AVAILABLE) {
>> handled = IRQ_HANDLED;
>> can_rx_offload_irq_offload_fifo(&priv->offload);
>> }
> This means for the timestamp mode, at least one mailbox is discarded or
> if the error occurred after reading one or more mailboxes the while loop
> will try again. If the error persists a second mailbox is discarded.
>
> For the fifo mode, only one mailbox is discarded.
>
> Then the flexcan's IRQ is exited. If there are messages in mailboxes are
> pending another IRQ is triggered.... I doubt that this is a good idea.
>
> On the other hand the ti_hecc driver:
>
>> /* offload RX mailboxes and let NAPI deliver them */
>> while ((rx_pending = hecc_read(priv, HECC_CANRMP))) {
>> can_rx_offload_irq_offload_timestamp(&priv->offload,
>> rx_pending);
>> hecc_write(priv, HECC_CANRMP, rx_pending);
>> }
> The error is ignored and the _all_ mailboxes are discarded (given the
> error persists).
>
>> Since it is typically called from a while loop, all message will
>> eventually be discarded. So lets continue on error instead to discard
>> them directly.
> After reading my own code and writing it up, your patch totally makes sense.
>
> If there is a shortage of resources, queue overrun or OOM, it makes no
> sense to return from the IRQ handler, if a mailbox is still active as it
> will trigger the IRQ again. Entering the IRQ handler again probably
> doesn't give the system time to recover from the resource problem.

Indeed, I have nothing to comment on that, besides thanks for
being willing to reconsider your own code.

With kind regards,

Jeroen