Dirk sent a very interesting bug report[1]. This series is what I found
out by reviewing the driver. It is not tested yet because I couldn't
trigger the code path yet. The console still works normally with these
patches. Still, I am already curious in hearing your opinions, so here
is what I have...
[1] https://lore.kernel.org/r/[email protected]
Wolfram Sang (2):
serial: sh-sci: start hrtimer after setting up DMA
serial: sh-sci: always cancel hrtimer when DMA RX is invalidated
drivers/tty/serial/sh-sci.c | 13 +++----------
1 file changed, 3 insertions(+), 10 deletions(-)
--
2.43.0
Clear the timer whenever 'chan_rx' is cleared to avoid an OOPS.
Currently, the driver only runs the timer when 'chan_rx' is set before.
However, it is good defensive programming to make sure the hrtimer is
always stopped before clearing the 'chan_rx' pointer.
Reported-by: Dirk Behme <[email protected]>
Closes: https://lore.kernel.org/r/[email protected]
Fixes: 9ab765566086 ("serial: sh-sci: Remove timer on shutdown of port")
Signed-off-by: Wolfram Sang <[email protected]>
---
Locking needs to be double-checked here. This patch is mainly calling
for opinions.
drivers/tty/serial/sh-sci.c | 9 +--------
1 file changed, 1 insertion(+), 8 deletions(-)
diff --git a/drivers/tty/serial/sh-sci.c b/drivers/tty/serial/sh-sci.c
index 1e3c26c11c49..5ad73933c1c5 100644
--- a/drivers/tty/serial/sh-sci.c
+++ b/drivers/tty/serial/sh-sci.c
@@ -1262,6 +1262,7 @@ static void sci_dma_rx_chan_invalidate(struct sci_port *s)
{
unsigned int i;
+ hrtimer_cancel(&s->rx_timer);
s->chan_rx = NULL;
for (i = 0; i < ARRAY_SIZE(s->cookie_rx); i++)
s->cookie_rx[i] = -EINVAL;
@@ -2242,14 +2243,6 @@ static void sci_shutdown(struct uart_port *port)
scr & (SCSCR_CKE1 | SCSCR_CKE0 | s->hscif_tot));
uart_port_unlock_irqrestore(port, flags);
-#ifdef CONFIG_SERIAL_SH_SCI_DMA
- if (s->chan_rx_saved) {
- dev_dbg(port->dev, "%s(%d) deleting rx_timer\n", __func__,
- port->line);
- hrtimer_cancel(&s->rx_timer);
- }
-#endif
-
if (s->rx_trigger > 1 && s->rx_fifo_timeout > 0)
del_timer_sync(&s->rx_fifo_timer);
sci_free_irq(s);
--
2.43.0
Hi Wolfram,
On Tue, Apr 16, 2024 at 2:35 PM Wolfram Sang
<[email protected]> wrote:
> Clear the timer whenever 'chan_rx' is cleared to avoid an OOPS.
> Currently, the driver only runs the timer when 'chan_rx' is set before.
> However, it is good defensive programming to make sure the hrtimer is
> always stopped before clearing the 'chan_rx' pointer.
>
> Reported-by: Dirk Behme <[email protected]>
> Closes: https://lore.kernel.org/r/[email protected]
> Fixes: 9ab765566086 ("serial: sh-sci: Remove timer on shutdown of port")
> Signed-off-by: Wolfram Sang <[email protected]>
Thanks for your patch!
> Locking needs to be double-checked here. This patch is mainly calling
> for opinions.
I do think you need to cancel the timer: even when not restarting
the timer in sci_dma_rx_complete() due to a DMA failure, the previous
timer may still be running, and will cause a NULL pointer dereference
on s->chan_rx on timer expiry.
> --- a/drivers/tty/serial/sh-sci.c
> +++ b/drivers/tty/serial/sh-sci.c
> @@ -1262,6 +1262,7 @@ static void sci_dma_rx_chan_invalidate(struct sci_port *s)
> {
> unsigned int i;
>
> + hrtimer_cancel(&s->rx_timer);
Is it safe to do this unconditionally on shutdown (cfr. the old check
for s->chan_rx_saved)?
> s->chan_rx = NULL;
> for (i = 0; i < ARRAY_SIZE(s->cookie_rx); i++)
> s->cookie_rx[i] = -EINVAL;
> @@ -2242,14 +2243,6 @@ static void sci_shutdown(struct uart_port *port)
> scr & (SCSCR_CKE1 | SCSCR_CKE0 | s->hscif_tot));
> uart_port_unlock_irqrestore(port, flags);
>
> -#ifdef CONFIG_SERIAL_SH_SCI_DMA
> - if (s->chan_rx_saved) {
> - dev_dbg(port->dev, "%s(%d) deleting rx_timer\n", __func__,
> - port->line);
> - hrtimer_cancel(&s->rx_timer);
> - }
> -#endif
> -
> if (s->rx_trigger > 1 && s->rx_fifo_timeout > 0)
> del_timer_sync(&s->rx_fifo_timer);
> sci_free_irq(s);
Gr{oetje,eeting}s,
Geert
--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68korg
In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
Hi Geert,
good news, I was able to trigger the DMA rx code path. I dunno what I
did wrong last time. I started from scratch again and it worked easily
by dd-ing random data to the second non-console debug port.
> I do think you need to cancel the timer: even when not restarting
> the timer in sci_dma_rx_complete() due to a DMA failure, the previous
> timer may still be running, and will cause a NULL pointer dereference
> on s->chan_rx on timer expiry.
Taking locking into account, I think this patch is bogus. If we run into
this NULL-pointer, we have a locking problem and cancelling the timer in
sci_dma_rx_chan_invalidate() is not going to fix the underlying locking
problem. sci_dma_rx_chan_invalidate() does not only clear the pointer
but also the cookie_rx-array. sci_dma_rx_timer_fn() bails out via
sci_dma_rx_find_active() if that array is cleared. It does so before
accessing the chan_rx-pointer. So, it looks to me that should work once
all calls to sci_dma_rx_chan_invalidate() are protected. And there is
one path where this is not true, namely via sci_dma_rx_release() during
shutdown. This is why I asked Dirk if the system was about to shutdown.
Currently, I don't see any other problematic code path.
> > -#ifdef CONFIG_SERIAL_SH_SCI_DMA
> > - if (s->chan_rx_saved) {
> > - dev_dbg(port->dev, "%s(%d) deleting rx_timer\n", __func__,
> > - port->line);
> > - hrtimer_cancel(&s->rx_timer);
> > - }
> > -#endif
Also, this chunk needs to stay. I suggested in patch 1 to cancel the
timer on successful dma_rx_complete, so the timer only runs when a DMA
is in progress. Then, of course, we need to cancel it in shutdown.
I hope I am not seeing "no wood for the trees" by now. I am not
convinced that I actually found Dirk's race condition yet...
All the best,
Wolfram