2024-04-04 15:02:06

by Andy Shevchenko

Subject: [PATCH v1 1/1] serial: core: Clearing the circular buffer before NULLifying it

The circular buffer is NULLified in uart_tty_port_shutdown()
under the spin lock. However, the PM or other timer-based callbacks
may still trigger after this event without knowing that the buffer pointer
is not valid. Since the serial code is a bit inconsistent in checking
the buffer state (some rely on the head-tail positions, some on the
buffer pointer), it's better to have both aligned, i.e. the buffer pointer
to be NULL and the head-tail positions to be the same, meaning it's empty.
This will prevent asynchronous calls from dereferencing a NULL pointer, as
reported recently in the 8250 case:

BUG: kernel NULL pointer dereference, address: 00000cf5
Workqueue: pm pm_runtime_work
EIP: serial8250_tx_chars (drivers/tty/serial/8250/8250_port.c:1809)
...
? serial8250_tx_chars (drivers/tty/serial/8250/8250_port.c:1809)
__start_tx (drivers/tty/serial/8250/8250_port.c:1551)
serial8250_start_tx (drivers/tty/serial/8250/8250_port.c:1654)
serial_port_runtime_suspend (include/linux/serial_core.h:667 drivers/tty/serial/serial_port.c:63)
__rpm_callback (drivers/base/power/runtime.c:393)
? serial_port_remove (drivers/tty/serial/serial_port.c:50)
rpm_suspend (drivers/base/power/runtime.c:447)

The proposed change will prevent ->start_tx() from being called during
suspend on a shut-down port.

Fixes: 43066e32227e ("serial: port: Don't suspend if the port is still busy")
Reported-by: kernel test robot <[email protected]>
Closes: https://lore.kernel.org/oe-lkp/[email protected]
Signed-off-by: Andy Shevchenko <[email protected]>
---

I ran into a very similar issue while working on the max3100 driver.
I haven't checked the 8250 case, but for mine the culprit is the same
and this patch fixes it. Hence I assume it will fix the 8250 case as
well.

drivers/tty/serial/serial_core.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/drivers/tty/serial/serial_core.c b/drivers/tty/serial/serial_core.c
index a005fc06a077..ba3a674a8bbf 100644
--- a/drivers/tty/serial/serial_core.c
+++ b/drivers/tty/serial/serial_core.c
@@ -1788,6 +1788,7 @@ static void uart_tty_port_shutdown(struct tty_port *port)
 	 * Free the transmit buffer.
 	 */
 	uart_port_lock_irq(uport);
+	uart_circ_clear(&state->xmit);
 	buf = state->xmit.buf;
 	state->xmit.buf = NULL;
 	uart_port_unlock_irq(uport);
--
2.43.0.rc1.1.gbec44491f096
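
As an illustration of the inconsistency described in the commit message,
here is a minimal userspace sketch in C. It is not kernel code: struct
circ_model, model_put(), model_shutdown() and late_caller_would_touch_buf()
are hypothetical names that only mimic the head/tail/buf layout of the
transmit buffer. Under those assumptions it shows that NULLifying the
pointer alone can leave head != tail, so a late caller that trusts only the
positions would still go for the buffer, while resetting the positions
first makes both checks agree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define XMIT_SIZE 16 /* power of two, as the index masking below requires */

/* Hypothetical, simplified stand-in for the transmit circular buffer. */
struct circ_model {
	char *buf;
	int head;
	int tail;
};

/* Emptiness judged by positions only (what some callers rely on). */
static bool model_empty(const struct circ_model *c)
{
	return c->head == c->tail;
}

/* Queue one byte; returns false if the buffer is gone or full. */
static bool model_put(struct circ_model *c, char ch)
{
	if (!c->buf || ((c->head + 1) & (XMIT_SIZE - 1)) == c->tail)
		return false;
	c->buf[c->head] = ch;
	c->head = (c->head + 1) & (XMIT_SIZE - 1);
	return true;
}

/*
 * Model of the shutdown path. With clear_first == false only the pointer
 * is dropped, so stale head/tail positions survive. With clear_first ==
 * true the positions are reset before the pointer is NULLified.
 */
static void model_shutdown(struct circ_model *c, bool clear_first)
{
	if (clear_first)
		c->head = c->tail = 0;
	free(c->buf);
	c->buf = NULL;
}

/* A late caller that only looks at head/tail before using buf. */
static bool late_caller_would_touch_buf(const struct circ_model *c)
{
	return !model_empty(c);
}
```

Queueing a byte and then calling model_shutdown() with clear_first set to
false leaves late_caller_would_touch_buf() returning true although buf is
NULL; with clear_first set to true it returns false.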



2024-04-05 05:25:18

by Jiri Slaby

Subject: Re: [PATCH v1 1/1] serial: core: Clearing the circular buffer before NULLifying it

On 04. 04. 24, 16:59, Andy Shevchenko wrote:
> The circular buffer is NULLified in uart_tty_port_shutdown()
> under the spin lock. However, the PM or other timer based callbacks
> may still trigger after this event without knowing that buffer pointer
> is not valid. Since the serial code is a bit inconsistent in checking
> the buffer state (some rely on the head-tail positions, some on the
> buffer pointer), it's better to have both aligned, i.e. buffer pointer
> to be NULL and head-tail positions to be the same, meaning it's empty.
> This will prevent asynchronous calls to dereference NULL pointer as
> reported recently in 8250 case:
>
> BUG: kernel NULL pointer dereference, address: 00000cf5
> Workqueue: pm pm_runtime_work
> EIP: serial8250_tx_chars (drivers/tty/serial/8250/8250_port.c:1809)
> ...
> ? serial8250_tx_chars (drivers/tty/serial/8250/8250_port.c:1809)
> __start_tx (drivers/tty/serial/8250/8250_port.c:1551)
> serial8250_start_tx (drivers/tty/serial/8250/8250_port.c:1654)
> serial_port_runtime_suspend (include/linux/serial_core.h:667 drivers/tty/serial/serial_port.c:63)
> __rpm_callback (drivers/base/power/runtime.c:393)
> ? serial_port_remove (drivers/tty/serial/serial_port.c:50)
> rpm_suspend (drivers/base/power/runtime.c:447)

Yeah, I noticed start_tx() is called repeatedly after shutdown()
yesterday too. So thanks for looking into this.

And it's pretty weird. I think it's new with the runtime PM (sure, /me
reads Fixes: now). I am not sure if it is documented, but most of the
code in tty/ assumes NO ordinary ->ops (like start_tx()) are called
after shutdown(). Actually, it seems to me that serial8250_start_tx()
should not be called in the first place. It makes no sense after all.

BTW, can't x_char be enqueued at that time too (the other check in the
if)? But again, serial8250_start_tx() should not be called after shutdown().

> The proposed change will prevent ->start_tx() to be called during
> suspend on shut down port.
>
> Fixes: 43066e32227e ("serial: port: Don't suspend if the port is still busy")
> Reported-by: kernel test robot <[email protected]>
> Closes: https://lore.kernel.org/oe-lkp/[email protected]
> Signed-off-by: Andy Shevchenko <[email protected]>
> ---
>
> I have got into the very similar issue while working on max3100 driver.
> I haven't checked the 8250 case, but for mine the culprit is the same
> and this patch fixes it. Hence I assume it will fix the 8250 case as
> well.
>
> drivers/tty/serial/serial_core.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/drivers/tty/serial/serial_core.c b/drivers/tty/serial/serial_core.c
> index a005fc06a077..ba3a674a8bbf 100644
> --- a/drivers/tty/serial/serial_core.c
> +++ b/drivers/tty/serial/serial_core.c
> @@ -1788,6 +1788,7 @@ static void uart_tty_port_shutdown(struct tty_port *port)
> * Free the transmit buffer.
> */
> uart_port_lock_irq(uport);
> + uart_circ_clear(&state->xmit);
> buf = state->xmit.buf;
> state->xmit.buf = NULL;
> uart_port_unlock_irq(uport);

--
js
suse labs


2024-04-05 05:42:54

by Tony Lindgren

Subject: Re: [PATCH v1 1/1] serial: core: Clearing the circular buffer before NULLifying it

* Jiri Slaby <[email protected]> [240405 05:25]:
> But again, serial8250_start_tx() should not be called after shutdown().

Sounds like we should add some check in addition to UPF_DEAD to
serial_port_runtime_suspend() and serial_port_runtime_resume() to
bail out early on shutdown.

Regards,

Tony


2024-04-05 16:29:33

by Andy Shevchenko

Subject: Re: [PATCH v1 1/1] serial: core: Clearing the circular buffer before NULLifying it

On Fri, Apr 05, 2024 at 07:25:03AM +0200, Jiri Slaby wrote:
> On 04. 04. 24, 16:59, Andy Shevchenko wrote:
> > The circular buffer is NULLified in uart_tty_port_shutdown()
> > under the spin lock. However, the PM or other timer based callbacks
> > may still trigger after this event without knowing that buffer pointer
> > is not valid. Since the serial code is a bit inconsistent in checking
> > the buffer state (some rely on the head-tail positions, some on the
> > buffer pointer), it's better to have both aligned, i.e. buffer pointer
> > to be NULL and head-tail positions to be the same, meaning it's empty.
> > This will prevent asynchronous calls to dereference NULL pointer as
> > reported recently in 8250 case:
> >
> > BUG: kernel NULL pointer dereference, address: 00000cf5
> > Workqueue: pm pm_runtime_work
> > EIP: serial8250_tx_chars (drivers/tty/serial/8250/8250_port.c:1809)
> > ...
> > ? serial8250_tx_chars (drivers/tty/serial/8250/8250_port.c:1809)
> > __start_tx (drivers/tty/serial/8250/8250_port.c:1551)
> > serial8250_start_tx (drivers/tty/serial/8250/8250_port.c:1654)
> > serial_port_runtime_suspend (include/linux/serial_core.h:667 drivers/tty/serial/serial_port.c:63)
> > __rpm_callback (drivers/base/power/runtime.c:393)
> > ? serial_port_remove (drivers/tty/serial/serial_port.c:50)
> > rpm_suspend (drivers/base/power/runtime.c:447)
>
> Yeah, I noticed start_tx() is called repeatedly after shutdown() yesterday
> too. So thanks for looking into this.

> And it's pretty weird. I think it's new with the runtime PM (sure, /me reads
> Fixes: now). I am not sure if it is documented, but most of the code in tty/
> assumes NO ordinary ->ops (like start_tx()) are called after shutdown().
> Actually, to me it occurs like serial8250_start_tx() should not be called in
> the first place. It makes no sense after all.
>
> BTW cannot be x_char en/queued at that time too (the other check in the if)?
> But again, serial8250_start_tx() should not be called after shutdown().

Yes, and I have no clue how we can check this as startup can be called again
and so on. The PM callback is timer based AFAIU, meaning it may happen at any
time.

But do you agree that this patch has value on its own?

--
With Best Regards,
Andy Shevchenko



2024-04-05 23:51:47

by Andy Shevchenko

Subject: Re: [PATCH v1 1/1] serial: core: Clearing the circular buffer before NULLifying it

On Fri, Apr 05, 2024 at 06:17:54PM +0300, Andy Shevchenko wrote:
> On Fri, Apr 05, 2024 at 07:25:03AM +0200, Jiri Slaby wrote:
> > On 04. 04. 24, 16:59, Andy Shevchenko wrote:
> > > The circular buffer is NULLified in uart_tty_port_shutdown()
> > > under the spin lock. However, the PM or other timer based callbacks
> > > may still trigger after this event without knowing that buffer pointer
> > > is not valid. Since the serial code is a bit inconsistent in checking
> > > the buffer state (some rely on the head-tail positions, some on the
> > > buffer pointer), it's better to have both aligned, i.e. buffer pointer
> > > to be NULL and head-tail positions to be the same, meaning it's empty.
> > > This will prevent asynchronous calls to dereference NULL pointer as
> > > reported recently in 8250 case:
> > >
> > > BUG: kernel NULL pointer dereference, address: 00000cf5
> > > Workqueue: pm pm_runtime_work
> > > EIP: serial8250_tx_chars (drivers/tty/serial/8250/8250_port.c:1809)
> > > ...
> > > ? serial8250_tx_chars (drivers/tty/serial/8250/8250_port.c:1809)
> > > __start_tx (drivers/tty/serial/8250/8250_port.c:1551)
> > > serial8250_start_tx (drivers/tty/serial/8250/8250_port.c:1654)
> > > serial_port_runtime_suspend (include/linux/serial_core.h:667 drivers/tty/serial/serial_port.c:63)
> > > __rpm_callback (drivers/base/power/runtime.c:393)
> > > ? serial_port_remove (drivers/tty/serial/serial_port.c:50)
> > > rpm_suspend (drivers/base/power/runtime.c:447)
> >
> > Yeah, I noticed start_tx() is called repeatedly after shutdown() yesterday
> > too. So thanks for looking into this.
>
> > And it's pretty weird. I think it's new with the runtime PM (sure, /me reads
> > Fixes: now). I am not sure if it is documented, but most of the code in tty/
> > assumes NO ordinary ->ops (like start_tx()) are called after shutdown().
> > Actually, to me it occurs like serial8250_start_tx() should not be called in
> > the first place. It makes no sense after all.
> >
> > BTW cannot be x_char en/queued at that time too (the other check in the if)?
> > But again, serial8250_start_tx() should not be called after shutdown().
>
> Yes, and I have no clue how we can check this as startup can be called again
> and so on. The PM callback is timer based AFAIU, meaning it may happen at any
> time.
>
> But do you agree that this patch has value on its own?

FWIW, https://lore.kernel.org/all/[email protected]/T/

--
With Best Regards,
Andy Shevchenko



2024-04-06 05:46:51

by Tony Lindgren

Subject: Re: [PATCH v1 1/1] serial: core: Clearing the circular buffer before NULLifying it

* Andy Shevchenko <[email protected]> [240405 22:37]:
> On Fri, Apr 05, 2024 at 06:17:54PM +0300, Andy Shevchenko wrote:
> > On Fri, Apr 05, 2024 at 07:25:03AM +0200, Jiri Slaby wrote:
> > > BTW cannot be x_char en/queued at that time too (the other check in the if)?
> > > But again, serial8250_start_tx() should not be called after shutdown().
> >
> > Yes, and I have no clue how we can check this as startup can be called again
> > and so on. The PM callback is timer based AFAIU, meaning it may happen at any
> > time.

So below is an incomplete pseudo patch just showing where we could disable
tx for runtime PM.

The patch won't compile, and assumes we only disable tx for runtime PM.

However, if we need it elsewhere also, then we may want to set up some
UPF_TX_ENABLED type flag instead of serial_base_port specific calls.

My preference would be to limit it to serial_port.c if we can get away
with that.

Anybody have better ideas for enabling and disabling tx?

> > But do you agree that this patch has value on its own?
>
> FWIW, https://lore.kernel.org/all/[email protected]/T/

No objections from me for clearing the xmit. But should it also be done for
uart_shutdown() in addition to uart_tty_port_shutdown()?

Regards,

Tony

8< -----------------------
diff --git a/drivers/tty/serial/serial_core.c b/drivers/tty/serial/serial_core.c
--- a/drivers/tty/serial/serial_core.c
+++ b/drivers/tty/serial/serial_core.c
@@ -345,16 +345,23 @@ static int uart_startup(struct tty_struct *tty, struct uart_state *state,
 			bool init_hw)
 {
 	struct tty_port *port = &state->port;
+	struct uart_port *uport;
 	int retval;

 	if (tty_port_initialized(port))
-		return 0;
+		goto enable_tx;

 	retval = uart_port_startup(tty, state, init_hw);
-	if (retval)
+	if (retval) {
 		set_bit(TTY_IO_ERROR, &tty->flags);
+		return retval;
+	}

-	return retval;
+enable_tx:
+	uport = uart_port_check(state);
+	serial_base_port_enable_tx(uport);
+
+	return 0;
 }

 /*
@@ -377,6 +384,9 @@ static void uart_shutdown(struct tty_struct *tty, struct uart_state *state)
 	if (tty)
 		set_bit(TTY_IO_ERROR, &tty->flags);

+	if (uport)
+		serial_base_port_disable_tx(uport);
+
 	if (tty_port_initialized(port)) {
 		tty_port_set_initialized(port, false);

@@ -1821,6 +1831,7 @@ static void uart_tty_port_shutdown(struct tty_port *port)
 	uport->ops->stop_rx(uport);
 	uart_port_unlock_irq(uport);

+	serial_base_port_disable_tx(uport);
 	uart_port_shutdown(port);

 	/*
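
The idea of gating Tx for runtime PM can be modeled in userspace roughly as
follows. This is only a sketch under stated assumptions: struct port_model,
its tx_enabled flag and the pm_model_*() helpers are hypothetical and do not
exist in the kernel. The suspend callback is modeled on the behavior
described in this thread (kick Tx and refuse to suspend while data is
pending), with -EBUSY as the assumed "busy" return value.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a UART port; not the kernel struct. */
struct port_model {
	bool tx_enabled;             /* set on startup, cleared on shutdown */
	unsigned int pending;        /* bytes still queued for transmit */
	unsigned int start_tx_calls; /* how often Tx was kicked */
};

/* Stand-in for the driver's ->start_tx() operation. */
static void pm_model_start_tx(struct port_model *p)
{
	p->start_tx_calls++;
}

static void pm_model_startup(struct port_model *p)
{
	p->tx_enabled = true;
}

static void pm_model_shutdown(struct port_model *p)
{
	p->tx_enabled = false;
}

/*
 * Model of a runtime-suspend callback. Returns 0 to allow the suspend and
 * -EBUSY to refuse it. On a shut-down port it bails out early and never
 * kicks Tx, even if stale "pending" state is still visible.
 */
static int pm_model_runtime_suspend(struct port_model *p)
{
	if (!p->tx_enabled)
		return 0;

	if (p->pending) {
		pm_model_start_tx(p);
		return -EBUSY;
	}

	return 0;
}
```

With the port started and data pending, pm_model_runtime_suspend() kicks Tx
and returns -EBUSY; after pm_model_shutdown() it returns 0 without calling
pm_model_start_tx(), which is the property the flag is meant to provide.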

2024-04-07 09:49:33

by Yicong Yang

Subject: Re: [PATCH v1 1/1] serial: core: Clearing the circular buffer before NULLifying it

Hi Andy,

On 2024/4/4 22:59, Andy Shevchenko wrote:
> The circular buffer is NULLified in uart_tty_port_shutdown()
> under the spin lock. However, the PM or other timer based callbacks
> may still trigger after this event without knowing that buffer pointer
> is not valid. Since the serial code is a bit inconsistent in checking
> the buffer state (some rely on the head-tail positions, some on the
> buffer pointer), it's better to have both aligned, i.e. buffer pointer
> to be NULL and head-tail positions to be the same, meaning it's empty.
> This will prevent asynchronous calls to dereference NULL pointer as
> reported recently in 8250 case:
>
> BUG: kernel NULL pointer dereference, address: 00000cf5
> Workqueue: pm pm_runtime_work
> EIP: serial8250_tx_chars (drivers/tty/serial/8250/8250_port.c:1809)
> ...
> ? serial8250_tx_chars (drivers/tty/serial/8250/8250_port.c:1809)
> __start_tx (drivers/tty/serial/8250/8250_port.c:1551)
> serial8250_start_tx (drivers/tty/serial/8250/8250_port.c:1654)
> serial_port_runtime_suspend (include/linux/serial_core.h:667 drivers/tty/serial/serial_port.c:63)
> __rpm_callback (drivers/base/power/runtime.c:393)
> ? serial_port_remove (drivers/tty/serial/serial_port.c:50)
> rpm_suspend (drivers/base/power/runtime.c:447)
>
> The proposed change will prevent ->start_tx() to be called during
> suspend on shut down port.
>

Just saw the issue, and thanks for your timely fix. I don't have a board with an 8250, and sorry for
not finding this issue.

FYI, I checked device_shutdown() and it seems to call pm_runtime_barrier() to wait for all
the scheduled RPM callbacks to finish and to keep the device in the resumed state. So ideally there
shouldn't be any pending requests later, since we handled them before shutdown?

Someone encountered the same issue in shutdown() due to runtime PM and fixed it in
af8db1508f2c ("PM / driver core: disable device's runtime PM during shutdown").
That patch seems to have still had some problem, which was later fixed by
fe6b91f47080 ("PM / Driver core: leave runtime PM enabled during system shutdown").

But it seems the handling in the driver core doesn't cover the case here.

> Fixes: 43066e32227e ("serial: port: Don't suspend if the port is still busy")
> Reported-by: kernel test robot <[email protected]>
> Closes: https://lore.kernel.org/oe-lkp/[email protected]
> Signed-off-by: Andy Shevchenko <[email protected]>
> ---
>
> I have got into the very similar issue while working on max3100 driver.
> I haven't checked the 8250 case, but for mine the culprit is the same
> and this patch fixes it. Hence I assume it will fix the 8250 case as
> well.
>
> drivers/tty/serial/serial_core.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/drivers/tty/serial/serial_core.c b/drivers/tty/serial/serial_core.c
> index a005fc06a077..ba3a674a8bbf 100644
> --- a/drivers/tty/serial/serial_core.c
> +++ b/drivers/tty/serial/serial_core.c
> @@ -1788,6 +1788,7 @@ static void uart_tty_port_shutdown(struct tty_port *port)
> * Free the transmit buffer.
> */
> uart_port_lock_irq(uport);
> + uart_circ_clear(&state->xmit);
> buf = state->xmit.buf;
> state->xmit.buf = NULL;
> uart_port_unlock_irq(uport);
>

2024-04-08 14:36:53

by Andy Shevchenko

Subject: Re: [PATCH v1 1/1] serial: core: Clearing the circular buffer before NULLifying it

On Sun, Apr 07, 2024 at 05:49:19PM +0800, Yicong Yang wrote:
> On 2024/4/4 22:59, Andy Shevchenko wrote:
> > The circular buffer is NULLified in uart_tty_port_shutdown()
> > under the spin lock. However, the PM or other timer based callbacks
> > may still trigger after this event without knowing that buffer pointer
> > is not valid. Since the serial code is a bit inconsistent in checking
> > the buffer state (some rely on the head-tail positions, some on the
> > buffer pointer), it's better to have both aligned, i.e. buffer pointer
> > to be NULL and head-tail positions to be the same, meaning it's empty.
> > This will prevent asynchronous calls to dereference NULL pointer as
> > reported recently in 8250 case:
> >
> > BUG: kernel NULL pointer dereference, address: 00000cf5
> > Workqueue: pm pm_runtime_work
> > EIP: serial8250_tx_chars (drivers/tty/serial/8250/8250_port.c:1809)
> > ...
> > ? serial8250_tx_chars (drivers/tty/serial/8250/8250_port.c:1809)
> > __start_tx (drivers/tty/serial/8250/8250_port.c:1551)
> > serial8250_start_tx (drivers/tty/serial/8250/8250_port.c:1654)
> > serial_port_runtime_suspend (include/linux/serial_core.h:667 drivers/tty/serial/serial_port.c:63)
> > __rpm_callback (drivers/base/power/runtime.c:393)
> > ? serial_port_remove (drivers/tty/serial/serial_port.c:50)
> > rpm_suspend (drivers/base/power/runtime.c:447)
> >
> > The proposed change will prevent ->start_tx() to be called during
> > suspend on shut down port.
>
> Just saw the issue and thanks for your timely fix. I didn't got a board with
> 8250 and sorry for didn't found this issue.

But does this change cause any regression in your case? Can you test it?

> FYI, I checked device_shutdown() and seems it called pm_runtime_barrier() for waiting all
> the scheduled RPM callbacks finished and keep the device in resume state. So ideally there
> shouldn't be any pending requests later since we handled them before shutdown?
>
> There's someone encountered the same issue in shutdown() due to runtime pm and fixed it in
> af8db1508f2c ("PM / driver core: disable device's runtime PM during shutdown")
> patch above seems to still have some problem and later fixed by:
> fe6b91f47080 ("PM / Driver core: leave runtime PM enabled during system shutdown")

Ah, yes, thanks for the reminder (yeah, I saw those patches). Let me test it on my setup.

> But seems the handling in the driver core doesn't cover the case here..

Of course, since we have our own shutdown on the upper level, we don't kill the
device, but we do release _some_ resources.

--
With Best Regards,
Andy Shevchenko



2024-04-08 15:45:21

by Andy Shevchenko

Subject: Re: [PATCH v1 1/1] serial: core: Clearing the circular buffer before NULLifying it

On Fri, Apr 05, 2024 at 07:25:03AM +0200, Jiri Slaby wrote:
> On 04. 04. 24, 16:59, Andy Shevchenko wrote:
> > The circular buffer is NULLified in uart_tty_port_shutdown()
> > under the spin lock. However, the PM or other timer based callbacks
> > may still trigger after this event without knowing that buffer pointer
> > is not valid. Since the serial code is a bit inconsistent in checking
> > the buffer state (some rely on the head-tail positions, some on the
> > buffer pointer), it's better to have both aligned, i.e. buffer pointer
> > to be NULL and head-tail positions to be the same, meaning it's empty.
> > This will prevent asynchronous calls to dereference NULL pointer as
> > reported recently in 8250 case:
> >
> > BUG: kernel NULL pointer dereference, address: 00000cf5
> > Workqueue: pm pm_runtime_work
> > EIP: serial8250_tx_chars (drivers/tty/serial/8250/8250_port.c:1809)
> > ...
> > ? serial8250_tx_chars (drivers/tty/serial/8250/8250_port.c:1809)
> > __start_tx (drivers/tty/serial/8250/8250_port.c:1551)
> > serial8250_start_tx (drivers/tty/serial/8250/8250_port.c:1654)
> > serial_port_runtime_suspend (include/linux/serial_core.h:667 drivers/tty/serial/serial_port.c:63)
> > __rpm_callback (drivers/base/power/runtime.c:393)
> > ? serial_port_remove (drivers/tty/serial/serial_port.c:50)
> > rpm_suspend (drivers/base/power/runtime.c:447)
>
> Yeah, I noticed start_tx() is called repeatedly after shutdown() yesterday
> too. So thanks for looking into this.
>
> And it's pretty weird. I think it's new with the runtime PM (sure, /me reads
> Fixes: now). I am not sure if it is documented, but most of the code in tty/
> assumes NO ordinary ->ops (like start_tx()) are called after shutdown().
> Actually, to me it occurs like serial8250_start_tx() should not be called in
> the first place. It makes no sense after all.

So, with PM autosuspend the [PM] callback can be called in two cases:
- port is open and busy, but PM is not informed (yet) of it
- port is closed while PM timer is still counting

The Fixes: commit seems to be about the first case (so we need to call Tx there).
The second one can probably be fixed properly with a PM barrier.

This fix is just against the oops, as far as I understand the bug report and my own case.

--
With Best Regards,
Andy Shevchenko