2023-12-19 17:20:11

by Hugo Villeneuve

Subject: [PATCH 02/18] serial: sc16is7xx: fix invalid sc16is7xx_lines bitfield in case of probe error

From: Hugo Villeneuve <[email protected]>

If an error occurs during probing, the sc16is7xx_lines bitfield may be left
in a state that doesn't represent the correct state of lines allocation.

For example, in a system with two SC16 devices, if an error occurs only
during probing of channel (port) B of the second device, sc16is7xx_lines
final state will be 00001011b instead of the expected 00000011b.

This is caused in part by the "i--" in the for loop of the out_ports:
error path. That loop skips the channel that failed, so its line bit is
never cleared.

Fix this by checking the return value of uart_add_one_port() and setting
the line allocation bit only if it was successful. This allows refactoring
the obfuscated for(i--...) loop in the error path so that
uart_remove_one_port() is called only when needed and the line allocation
bits are properly cleared.

Also use the same mechanism in remove() when calling uart_remove_one_port().

Fixes: c64349722d14 ("sc16is7xx: support multiple devices")
Cc: [email protected]
Cc: Yury Norov <[email protected]>
Signed-off-by: Hugo Villeneuve <[email protected]>
---
There is already a patch by Yury Norov <[email protected]> to simplify
sc16is7xx_alloc_line():
https://lore.kernel.org/all/[email protected]/

Since this patch removes sc16is7xx_alloc_line() entirely, it makes Yury's
patch unnecessary.
---
drivers/tty/serial/sc16is7xx.c | 44 ++++++++++++++--------------------
1 file changed, 18 insertions(+), 26 deletions(-)

diff --git a/drivers/tty/serial/sc16is7xx.c b/drivers/tty/serial/sc16is7xx.c
index b585663c1e6e..b92fd01cfeec 100644
--- a/drivers/tty/serial/sc16is7xx.c
+++ b/drivers/tty/serial/sc16is7xx.c
@@ -407,19 +407,6 @@ static void sc16is7xx_port_update(struct uart_port *port, u8 reg,
regmap_update_bits(one->regmap, reg, mask, val);
}

-static int sc16is7xx_alloc_line(void)
-{
- int i;
-
- BUILD_BUG_ON(SC16IS7XX_MAX_DEVS > BITS_PER_LONG);
-
- for (i = 0; i < SC16IS7XX_MAX_DEVS; i++)
- if (!test_and_set_bit(i, &sc16is7xx_lines))
- break;
-
- return i;
-}
-
static void sc16is7xx_power(struct uart_port *port, int on)
{
sc16is7xx_port_update(port, SC16IS7XX_IER_REG,
@@ -1550,6 +1537,13 @@ static int sc16is7xx_probe(struct device *dev,
SC16IS7XX_IOCONTROL_SRESET_BIT);

for (i = 0; i < devtype->nr_uart; ++i) {
+ s->p[i].port.line = find_first_zero_bit(&sc16is7xx_lines,
+ SC16IS7XX_MAX_DEVS);
+ if (s->p[i].port.line >= SC16IS7XX_MAX_DEVS) {
+ ret = -ERANGE;
+ goto out_ports;
+ }
+
/* Initialize port data */
s->p[i].port.dev = dev;
s->p[i].port.irq = irq;
@@ -1569,14 +1563,8 @@ static int sc16is7xx_probe(struct device *dev,
s->p[i].port.rs485_supported = sc16is7xx_rs485_supported;
s->p[i].port.ops = &sc16is7xx_ops;
s->p[i].old_mctrl = 0;
- s->p[i].port.line = sc16is7xx_alloc_line();
s->p[i].regmap = regmaps[i];

- if (s->p[i].port.line >= SC16IS7XX_MAX_DEVS) {
- ret = -ENOMEM;
- goto out_ports;
- }
-
mutex_init(&s->p[i].efr_lock);

ret = uart_get_rs485_mode(&s->p[i].port);
@@ -1594,8 +1582,13 @@ static int sc16is7xx_probe(struct device *dev,
kthread_init_work(&s->p[i].tx_work, sc16is7xx_tx_proc);
kthread_init_work(&s->p[i].reg_work, sc16is7xx_reg_proc);
kthread_init_delayed_work(&s->p[i].ms_work, sc16is7xx_ms_proc);
+
/* Register port */
- uart_add_one_port(&sc16is7xx_uart, &s->p[i].port);
+ ret = uart_add_one_port(&sc16is7xx_uart, &s->p[i].port);
+ if (ret)
+ goto out_ports;
+
+ set_bit(s->p[i].port.line, &sc16is7xx_lines);

/* Enable EFR */
sc16is7xx_port_write(&s->p[i].port, SC16IS7XX_LCR_REG,
@@ -1653,10 +1646,9 @@ static int sc16is7xx_probe(struct device *dev,
#endif

out_ports:
- for (i--; i >= 0; i--) {
- uart_remove_one_port(&sc16is7xx_uart, &s->p[i].port);
- clear_bit(s->p[i].port.line, &sc16is7xx_lines);
- }
+ for (i = 0; i < devtype->nr_uart; i++)
+ if (test_and_clear_bit(s->p[i].port.line, &sc16is7xx_lines))
+ uart_remove_one_port(&sc16is7xx_uart, &s->p[i].port);

kthread_stop(s->kworker_task);

@@ -1683,8 +1675,8 @@ static void sc16is7xx_remove(struct device *dev)

for (i = 0; i < s->devtype->nr_uart; i++) {
kthread_cancel_delayed_work_sync(&s->p[i].ms_work);
- uart_remove_one_port(&sc16is7xx_uart, &s->p[i].port);
- clear_bit(s->p[i].port.line, &sc16is7xx_lines);
+ if (test_and_clear_bit(s->p[i].port.line, &sc16is7xx_lines))
+ uart_remove_one_port(&sc16is7xx_uart, &s->p[i].port);
sc16is7xx_power(&s->p[i].port, 0);
}

--
2.39.2



2023-12-20 15:41:05

by Andy Shevchenko

Subject: Re: [PATCH 02/18] serial: sc16is7xx: fix invalid sc16is7xx_lines bitfield in case of probe error

On Tue, Dec 19, 2023 at 12:18:46PM -0500, Hugo Villeneuve wrote:
> From: Hugo Villeneuve <[email protected]>
>
> If an error occurs during probing, the sc16is7xx_lines bitfield may be left
> in a state that doesn't represent the correct state of lines allocation.
>
> For example, in a system with two SC16 devices, if an error occurs only
> during probing of channel (port) B of the second device, sc16is7xx_lines
> final state will be 00001011b instead of the expected 00000011b.
>
> This is caused in part because of the "i--" in the for/loop located in
> the out_ports: error path.
>
> Fix this by checking the return value of uart_add_one_port() and set line
> allocation bit only if this was successful. This allows the refactor of
> the obfuscated for(i--...) loop in the error path, and properly call
> uart_remove_one_port() only when needed, and properly unset line allocation
> bits.
>
> Also use same mechanism in remove() when calling uart_remove_one_port().

Yes, this seems to be the correct fix for the problem described in
patch 1. I don't know why patch 1 even exists.

As for Yury's patch, you are doing fixes, so your changes take priority over his.

--
With Best Regards,
Andy Shevchenko



2023-12-21 15:57:10

by Hugo Villeneuve

Subject: Re: [PATCH 02/18] serial: sc16is7xx: fix invalid sc16is7xx_lines bitfield in case of probe error

On Wed, 20 Dec 2023 17:40:42 +0200
Andy Shevchenko <[email protected]> wrote:

> On Tue, Dec 19, 2023 at 12:18:46PM -0500, Hugo Villeneuve wrote:
> > From: Hugo Villeneuve <[email protected]>
> >
> > If an error occurs during probing, the sc16is7xx_lines bitfield may be left
> > in a state that doesn't represent the correct state of lines allocation.
> >
> > For example, in a system with two SC16 devices, if an error occurs only
> > during probing of channel (port) B of the second device, sc16is7xx_lines
> > final state will be 00001011b instead of the expected 00000011b.
> >
> > This is caused in part because of the "i--" in the for/loop located in
> > the out_ports: error path.
> >
> > Fix this by checking the return value of uart_add_one_port() and set line
> > allocation bit only if this was successful. This allows the refactor of
> > the obfuscated for(i--...) loop in the error path, and properly call
> > uart_remove_one_port() only when needed, and properly unset line allocation
> > bits.
> >
> > Also use same mechanism in remove() when calling uart_remove_one_port().
>
> Yes, this seems to be the correct one to fix the problem described in
> the patch 1. I dunno why the patch 1 even exists.

Hi,
this will indeed fix the problem described in patch 1.

However, if I drop patch 1 and simulate the same probe error described in
patch 1, the driver now hangs forever when I try to remove it. I had
observed this before, and patch 1 also corrected it.

The hang occurs in sc16is7xx_remove(), in this call:

kthread_flush_worker(&s->kworker);

I am not sure how best to handle that without patch 1.

Hugo Villeneuve

2023-12-21 16:08:45

by Andy Shevchenko

Subject: Re: [PATCH 02/18] serial: sc16is7xx: fix invalid sc16is7xx_lines bitfield in case of probe error

On Thu, Dec 21, 2023 at 10:56:39AM -0500, Hugo Villeneuve wrote:
> On Wed, 20 Dec 2023 17:40:42 +0200
> Andy Shevchenko <[email protected]> wrote:
> > On Tue, Dec 19, 2023 at 12:18:46PM -0500, Hugo Villeneuve wrote:

...

> > Yes, this seems to be the correct one to fix the problem described in
> > the patch 1. I dunno why the patch 1 even exists.
>
> Hi,
> this will indeed fix the problem described in patch 1.
>
> However, if I remove patch 1, and I simulate the same probe error as
> described in patch 1, now we get stuck forever when trying to
> remove the driver. This is something that I observed before and
> that patch 1 also corrected.
>
> The problem is caused in sc16is7xx_remove() when calling this function
>
> kthread_flush_worker(&s->kworker);
>
> I am not sure how best to handle that without patch 1.

So we need to root-cause this issue, because patch 1 looks
really bogus.

--
With Best Regards,
Andy Shevchenko



2023-12-21 16:17:47

by Hugo Villeneuve

Subject: Re: [PATCH 02/18] serial: sc16is7xx: fix invalid sc16is7xx_lines bitfield in case of probe error

On Thu, 21 Dec 2023 10:56:39 -0500
Hugo Villeneuve <[email protected]> wrote:

> On Wed, 20 Dec 2023 17:40:42 +0200
> Andy Shevchenko <[email protected]> wrote:
>
> > On Tue, Dec 19, 2023 at 12:18:46PM -0500, Hugo Villeneuve wrote:
> > > From: Hugo Villeneuve <[email protected]>
> > >
> > > If an error occurs during probing, the sc16is7xx_lines bitfield may be left
> > > in a state that doesn't represent the correct state of lines allocation.
> > >
> > > For example, in a system with two SC16 devices, if an error occurs only
> > > during probing of channel (port) B of the second device, sc16is7xx_lines
> > > final state will be 00001011b instead of the expected 00000011b.
> > >
> > > This is caused in part because of the "i--" in the for/loop located in
> > > the out_ports: error path.
> > >
> > > Fix this by checking the return value of uart_add_one_port() and set line
> > > allocation bit only if this was successful. This allows the refactor of
> > > the obfuscated for(i--...) loop in the error path, and properly call
> > > uart_remove_one_port() only when needed, and properly unset line allocation
> > > bits.
> > >
> > > Also use same mechanism in remove() when calling uart_remove_one_port().
> >
> > Yes, this seems to be the correct one to fix the problem described in
> > the patch 1. I dunno why the patch 1 even exists.
>
> Hi,
> this will indeed fix the problem described in patch 1.
>
> However, if I remove patch 1, and I simulate the same probe error as
> described in patch 1, now we get stuck forever when trying to
> remove the driver. This is something that I observed before and
> that patch 1 also corrected.
>
> The problem is caused in sc16is7xx_remove() when calling this function
>
> kthread_flush_worker(&s->kworker);
>
> I am not sure how best to handle that without patch 1.

Also, if we manage to get past kthread_flush_worker() and
kthread_stop() (commented out for testing purposes), we get another bug:

# rmmod sc16is7xx
...
crystal-duart-24m already disabled
WARNING: CPU: 2 PID: 340 at drivers/clk/clk.c:1090
clk_core_disable+0x1b0/0x1e0
...
Call trace:
clk_core_disable+0x1b0/0x1e0
clk_disable+0x38/0x60
sc16is7xx_remove+0x1e4/0x240 [sc16is7xx]

This one is caused by calling clk_disable_unprepare() in
sc16is7xx_remove(), even though clk_disable_unprepare() has already been
called in the probe error-handling code. Patch 1 also fixed this...

Hugo Villeneuve

2023-12-21 16:20:45

by Andy Shevchenko

Subject: Re: [PATCH 02/18] serial: sc16is7xx: fix invalid sc16is7xx_lines bitfield in case of probe error

On Thu, Dec 21, 2023 at 11:13:37AM -0500, Hugo Villeneuve wrote:
> On Thu, 21 Dec 2023 10:56:39 -0500
> Hugo Villeneuve <[email protected]> wrote:
> > On Wed, 20 Dec 2023 17:40:42 +0200
> > Andy Shevchenko <[email protected]> wrote:

...

> > this will indeed fix the problem described in patch 1.
> >
> > However, if I remove patch 1, and I simulate the same probe error as
> > described in patch 1, now we get stuck forever when trying to
> > remove the driver. This is something that I observed before and
> > that patch 1 also corrected.
> >
> > The problem is caused in sc16is7xx_remove() when calling this function
> >
> > kthread_flush_worker(&s->kworker);
> >
> > I am not sure how best to handle that without patch 1.
>
> Also, if we manage to get past kthread_flush_worker() and
> kthread_stop() (commented out for testing purposes), we get another bug:
>
> # rmmod sc16is7xx
> ...
> crystal-duart-24m already disabled
> WARNING: CPU: 2 PID: 340 at drivers/clk/clk.c:1090
> clk_core_disable+0x1b0/0x1e0
> ...
> Call trace:
> clk_core_disable+0x1b0/0x1e0
> clk_disable+0x38/0x60
> sc16is7xx_remove+0x1e4/0x240 [sc16is7xx]
>
> This one is caused by calling clk_disable_unprepare(). But
> clk_disable_unprepare() has already been called in probe error handling
> code. Patch 1 also fixed this...

Word "fixed" is incorrect. "Papered over" is what it did.

--
With Best Regards,
Andy Shevchenko



2023-12-21 17:13:56

by Hugo Villeneuve

Subject: Re: [PATCH 02/18] serial: sc16is7xx: fix invalid sc16is7xx_lines bitfield in case of probe error

On Thu, 21 Dec 2023 18:16:40 +0200
Andy Shevchenko <[email protected]> wrote:

> On Thu, Dec 21, 2023 at 11:13:37AM -0500, Hugo Villeneuve wrote:
> > On Thu, 21 Dec 2023 10:56:39 -0500
> > Hugo Villeneuve <[email protected]> wrote:
> > > On Wed, 20 Dec 2023 17:40:42 +0200
> > > Andy Shevchenko <[email protected]> wrote:
>
> ...
>
> > > this will indeed fix the problem described in patch 1.
> > >
> > > However, if I remove patch 1, and I simulate the same probe error as
> > > described in patch 1, now we get stuck forever when trying to
> > > remove the driver. This is something that I observed before and
> > > that patch 1 also corrected.
> > >
> > > The problem is caused in sc16is7xx_remove() when calling this function
> > >
> > > kthread_flush_worker(&s->kworker);
> > >
> > > I am not sure how best to handle that without patch 1.
> >
> > Also, if we manage to get past kthread_flush_worker() and
> > kthread_stop() (commented out for testing purposes), we get another bug:
> >
> > # rmmod sc16is7xx
> > ...
> > crystal-duart-24m already disabled
> > WARNING: CPU: 2 PID: 340 at drivers/clk/clk.c:1090
> > clk_core_disable+0x1b0/0x1e0
> > ...
> > Call trace:
> > clk_core_disable+0x1b0/0x1e0
> > clk_disable+0x38/0x60
> > sc16is7xx_remove+0x1e4/0x240 [sc16is7xx]
> >
> > This one is caused by calling clk_disable_unprepare(). But
> > clk_disable_unprepare() has already been called in probe error handling
> > code. Patch 1 also fixed this...
>
> Word "fixed" is incorrect. "Papered over" is what it did.

Hi,
I just found the problem, and it was in my bug simulation, not in the
driver itself. When I simulated the bug, I forgot to set "ret" to an
error code, so sc16is7xx_probe() returned 0. This is why
sc16is7xx_remove() was called when unloading the driver, even though it
shouldn't have been.

If I simulate my probe error and return "-EINVAL" at the end of
sc16is7xx_probe(), sc16is7xx_remove() is not called when
unloading the driver.

Sorry for the noise. I will drop patch 1, keep patch "fix invalid
sc16is7xx_lines bitfield in case of probe error" as is, and simply
remove the comments about Yury's patch.

Hugo.