2019-06-26 13:10:53

by Nikita Yushchenko

Subject: [PATCH resend] can: rcar_canfd: fix possible IRQ storm on high load

We have observed the rcar_canfd driver entering an IRQ storm under high
load, in the following scenario:
- rcar_canfd_global_interrupt() is entered due to Rx available,
- napi_schedule_prep() is called and sets NAPIF_STATE_SCHED in state,
- Rx FIFO interrupts are masked,
- rcar_canfd_global_interrupt() is entered again, this time due to an
  error interrupt (e.g. due to overflow),
- since the scheduled napi poller has not yet executed, the condition for
  calling napi_schedule_prep() from rcar_canfd_global_interrupt() remains
  true, thus napi_schedule_prep() gets called and sets the
  NAPIF_STATE_MISSED flag in state,
- later, the napi poller function rcar_canfd_rx_poll() gets executed and
  calls napi_complete_done(),
- due to the NAPIF_STATE_MISSED flag in state, this call does not clear
  the NAPIF_STATE_SCHED flag from state,
- on return from napi_complete_done(), rcar_canfd_rx_poll() unmasks Rx
  interrupts,
- an Rx interrupt happens, rcar_canfd_global_interrupt() gets called
  and calls napi_schedule_prep(),
- since NAPIF_STATE_SCHED is set in state at this time, this call
  returns false,
- due to that false return, rcar_canfd_global_interrupt() returns
  without masking the Rx interrupt,
- and this results in an IRQ storm: the unmasked Rx interrupt fires again
  and again, and is misprocessed in the same way each time.

This patch fixes that scenario by unmasking Rx interrupts only when
napi_complete_done() returns true, which means it has cleared
NAPIF_STATE_SCHED in state.

Signed-off-by: Nikita Yushchenko <[email protected]>
---
drivers/net/can/rcar/rcar_canfd.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/net/can/rcar/rcar_canfd.c b/drivers/net/can/rcar/rcar_canfd.c
index 05410008aa6b..de34a4b82d4a 100644
--- a/drivers/net/can/rcar/rcar_canfd.c
+++ b/drivers/net/can/rcar/rcar_canfd.c
@@ -1508,10 +1508,11 @@ static int rcar_canfd_rx_poll(struct napi_struct *napi, int quota)

 	/* All packets processed */
 	if (num_pkts < quota) {
-		napi_complete_done(napi, num_pkts);
-		/* Enable Rx FIFO interrupts */
-		rcar_canfd_set_bit(priv->base, RCANFD_RFCC(ridx),
-				   RCANFD_RFCC_RFIE);
+		if (napi_complete_done(napi, num_pkts)) {
+			/* Enable Rx FIFO interrupts */
+			rcar_canfd_set_bit(priv->base, RCANFD_RFCC(ridx),
+					   RCANFD_RFCC_RFIE);
+		}
 	}
 	return num_pkts;
 }
--
2.11.0
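
For readers following the scenario above, here is a minimal userspace sketch of the flag handling it describes. It is not the kernel's NAPI code and not the driver: every name in it (struct model, MODEL_SCHED, MODEL_MISSED, model_schedule_prep(), model_complete_done(), model_rx_irq(), model_poll_done(), model_storm_possible()) is hypothetical, and the two helper functions only imitate the behaviour the commit message attributes to napi_schedule_prep() and napi_complete_done().

```c
#include <stdbool.h>

/* Hypothetical stand-ins for NAPIF_STATE_SCHED and NAPIF_STATE_MISSED. */
enum { MODEL_SCHED = 1u << 0, MODEL_MISSED = 1u << 1 };

struct model {
	unsigned int state;	/* stands in for the napi state word */
	bool rx_irq_enabled;	/* stands in for the Rx FIFO interrupt enable bit */
};

/* Imitates napi_schedule_prep() as described: succeeds only when SCHED
 * was clear; if SCHED is already set, it records MISSED and fails. */
static bool model_schedule_prep(struct model *m)
{
	if (m->state & MODEL_SCHED) {
		m->state |= MODEL_MISSED;
		return false;
	}
	m->state |= MODEL_SCHED;
	return true;
}

/* Imitates napi_complete_done() as described: with MISSED set, SCHED is
 * kept (the poll will run again) and false is returned. */
static bool model_complete_done(struct model *m)
{
	if (m->state & MODEL_MISSED) {
		m->state &= ~MODEL_MISSED;
		return false;
	}
	m->state &= ~MODEL_SCHED;
	return true;
}

/* Interrupt handler path: Rx is masked only if scheduling succeeded. */
static void model_rx_irq(struct model *m)
{
	if (model_schedule_prep(m))
		m->rx_irq_enabled = false;
}

/* Poll completion: 'fixed' selects the behaviour proposed by the patch,
 * i.e. unmask Rx only when completion really cleared SCHED. */
static void model_poll_done(struct model *m, bool fixed)
{
	bool completed = model_complete_done(m);

	if (!fixed || completed)
		m->rx_irq_enabled = true;
}

/* Replays the scenario and reports whether Rx ends up unmasked while
 * SCHED is still set, which is the precondition for the storm. */
static bool model_storm_possible(bool fixed)
{
	struct model m = { .state = 0, .rx_irq_enabled = true };

	model_rx_irq(&m);		/* Rx available: SCHED set, Rx masked */
	model_rx_irq(&m);		/* second entry (error irq): MISSED set */
	model_poll_done(&m, fixed);	/* poller finishes below quota */

	return m.rx_irq_enabled && (m.state & MODEL_SCHED);
}
```

In this model, model_storm_possible(false) returns true and model_storm_possible(true) returns false: with the patched completion path Rx stays masked until a later poll clears the SCHED stand-in.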


2019-06-26 13:13:31

by Wolfram Sang

Subject: Re: [PATCH resend] can: rcar_canfd: fix possible IRQ storm on high load

On Wed, Jun 26, 2019 at 04:08:48PM +0300, Nikita Yushchenko wrote:
> We have observed rcar_canfd driver entering IRQ storm under high load,
> with following scenario:
> - rcar_canfd_global_interrupt() is entered due to Rx available,
> - napi_schedule_prep() is called, and sets NAPIF_STATE_SCHED in state
> - Rx fifo interrupts are masked,
> - rcar_canfd_global_interrupt() is entered again, this time due to
> error interrupt (e.g. due to overflow),
> - since scheduled napi poller has not yet executed, condition for calling
> napi_schedule_prep() from rcar_canfd_global_interrupt() remains true,
> thus napi_schedule_prep() gets called and sets NAPIF_STATE_MISSED flag
> in state,
> - later, napi poller function rcar_canfd_rx_poll() gets executed, and
> calls napi_complete_done(),
> - due to NAPIF_STATE_MISSED flag in state, this call does not clear
> NAPIF_STATE_SCHED flag from state,
> - on return from napi_complete_done(), rcar_canfd_rx_poll() unmasks Rx
> interrupts,
> - Rx interrupt happens, rcar_canfd_global_interrupt() gets called
> and calls napi_schedule_prep(),
> - since NAPIF_STATE_SCHED is set in state at this time, this call
> returns false,
> - due to that false return, rcar_canfd_global_interrupt() returns
> without masking Rx interrupt
> - and this results in IRQ storm: unmasked Rx interrupt happens again
> and again is misprocessed in the same way.
>
> This patch fixes that scenario by unmasking Rx interrupts only when
> napi_complete_done() returns true, which means it has cleared
> NAPIF_STATE_SCHED in state.
>
> Signed-off-by: Nikita Yushchenko <[email protected]>

CCing the driver author...

> ---
> drivers/net/can/rcar/rcar_canfd.c | 9 +++++----
> 1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/can/rcar/rcar_canfd.c b/drivers/net/can/rcar/rcar_canfd.c
> index 05410008aa6b..de34a4b82d4a 100644
> --- a/drivers/net/can/rcar/rcar_canfd.c
> +++ b/drivers/net/can/rcar/rcar_canfd.c
> @@ -1508,10 +1508,11 @@ static int rcar_canfd_rx_poll(struct napi_struct *napi, int quota)
>
>  	/* All packets processed */
>  	if (num_pkts < quota) {
> -		napi_complete_done(napi, num_pkts);
> -		/* Enable Rx FIFO interrupts */
> -		rcar_canfd_set_bit(priv->base, RCANFD_RFCC(ridx),
> -				   RCANFD_RFCC_RFIE);
> +		if (napi_complete_done(napi, num_pkts)) {
> +			/* Enable Rx FIFO interrupts */
> +			rcar_canfd_set_bit(priv->base, RCANFD_RFCC(ridx),
> +					   RCANFD_RFCC_RFIE);
> +		}
>  	}
>  	return num_pkts;
>  }
> --
> 2.11.0
>


2019-06-26 13:34:19

by Wolfram Sang

Subject: Re: [PATCH resend] can: rcar_canfd: fix possible IRQ storm on high load

On Wed, Jun 26, 2019 at 03:12:51PM +0200, Wolfram Sang wrote:
> On Wed, Jun 26, 2019 at 04:08:48PM +0300, Nikita Yushchenko wrote:
> > We have observed rcar_canfd driver entering IRQ storm under high load,
> > with following scenario:
> > - rcar_canfd_global_interrupt() is entered due to Rx available,
> > - napi_schedule_prep() is called, and sets NAPIF_STATE_SCHED in state
> > - Rx fifo interrupts are masked,
> > - rcar_canfd_global_interrupt() is entered again, this time due to
> > error interrupt (e.g. due to overflow),
> > - since scheduled napi poller has not yet executed, condition for calling
> > napi_schedule_prep() from rcar_canfd_global_interrupt() remains true,
> > thus napi_schedule_prep() gets called and sets NAPIF_STATE_MISSED flag
> > in state,
> > - later, napi poller function rcar_canfd_rx_poll() gets executed, and
> > calls napi_complete_done(),
> > - due to NAPIF_STATE_MISSED flag in state, this call does not clear
> > NAPIF_STATE_SCHED flag from state,
> > - on return from napi_complete_done(), rcar_canfd_rx_poll() unmasks Rx
> > interrupts,
> > - Rx interrupt happens, rcar_canfd_global_interrupt() gets called
> > and calls napi_schedule_prep(),
> > - since NAPIF_STATE_SCHED is set in state at this time, this call
> > returns false,
> > - due to that false return, rcar_canfd_global_interrupt() returns
> > without masking Rx interrupt
> > - and this results in IRQ storm: unmasked Rx interrupt happens again
> > and again is misprocessed in the same way.
> >
> > This patch fixes that scenario by unmasking Rx interrupts only when
> > napi_complete_done() returns true, which means it has cleared
> > NAPIF_STATE_SCHED in state.
> >
> > Signed-off-by: Nikita Yushchenko <[email protected]>
>
> CCing the driver author...

Bounced :(

