2020-06-15 08:10:29

by Krzysztof Kozlowski

[permalink] [raw]
Subject: [PATCH v2 1/3] spi: spi-fsl-dspi: Fix external abort on interrupt in exit paths

If interrupt comes late, during probe error path or device remove (could
be triggered with CONFIG_DEBUG_SHIRQ), the interrupt handler
dspi_interrupt() will access registers with the clock being disabled. This
leads to external abort on non-linefetch on Toradex Colibri VF50 module
(with Vybrid VF5xx):

$ echo 4002d000.spi > /sys/devices/platform/soc/40000000.bus/4002d000.spi/driver/unbind

Unhandled fault: external abort on non-linefetch (0x1008) at 0x8887f02c
Internal error: : 1008 [#1] ARM
CPU: 0 PID: 136 Comm: sh Not tainted 5.7.0-next-20200610-00009-g5c913fa0f9c5-dirty #74
Hardware name: Freescale Vybrid VF5xx/VF6xx (Device Tree)
(regmap_mmio_read32le) from [<8061885c>] (regmap_mmio_read+0x48/0x68)
(regmap_mmio_read) from [<8060e3b8>] (_regmap_bus_reg_read+0x24/0x28)
(_regmap_bus_reg_read) from [<80611c50>] (_regmap_read+0x70/0x1c0)
(_regmap_read) from [<80611dec>] (regmap_read+0x4c/0x6c)
(regmap_read) from [<80678ca0>] (dspi_interrupt+0x3c/0xa8)
(dspi_interrupt) from [<8017acec>] (free_irq+0x26c/0x3cc)
(free_irq) from [<8017dcec>] (devm_irq_release+0x1c/0x20)
(devm_irq_release) from [<805f98ec>] (release_nodes+0x1e4/0x298)
(release_nodes) from [<805f9ac8>] (devres_release_all+0x40/0x60)
(devres_release_all) from [<805f5134>] (device_release_driver_internal+0x108/0x1ac)
(device_release_driver_internal) from [<805f521c>] (device_driver_detach+0x20/0x24)

Fixes: 349ad66c0ab0 ("spi:Add Freescale DSPI driver for Vybrid VF610 platform")
Cc: <[email protected]>
Signed-off-by: Krzysztof Kozlowski <[email protected]>

---

This is an follow up of my other patch for I2C IMX driver [1]. Let's fix the
issues consistently.

[1] https://lore.kernel.org/lkml/[email protected]/T/#u

Changes since v1:
1. Disable the IRQ instead of using non-devm interface.
---
drivers/spi/spi-fsl-dspi.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/spi/spi-fsl-dspi.c b/drivers/spi/spi-fsl-dspi.c
index 58190c94561f..023e05c53b85 100644
--- a/drivers/spi/spi-fsl-dspi.c
+++ b/drivers/spi/spi-fsl-dspi.c
@@ -1400,7 +1400,7 @@ static int dspi_probe(struct platform_device *pdev)
ret = dspi_request_dma(dspi, res->start);
if (ret < 0) {
dev_err(&pdev->dev, "can't get dma channels\n");
- goto out_clk_put;
+ goto disable_irq;
}
}

@@ -1415,11 +1415,14 @@ static int dspi_probe(struct platform_device *pdev)
ret = spi_register_controller(ctlr);
if (ret != 0) {
dev_err(&pdev->dev, "Problem registering DSPI ctlr\n");
- goto out_clk_put;
+ goto disable_irq;
}

return ret;

+disable_irq:
+ if (dspi->irq > 0)
+ disable_irq(dspi->irq);
out_clk_put:
clk_disable_unprepare(dspi->clk);
out_ctlr_put:
@@ -1435,6 +1438,8 @@ static int dspi_remove(struct platform_device *pdev)

/* Disconnect from the SPI framework */
dspi_release_dma(dspi);
+ if (dspi->irq > 0)
+ disable_irq(dspi->irq);
clk_disable_unprepare(dspi->clk);
spi_unregister_controller(dspi->ctlr);

--
2.7.4


2020-06-15 08:10:32

by Krzysztof Kozlowski

[permalink] [raw]
Subject: [PATCH v2 2/3] spi: spi-fsl-dspi: Initialize completion before possible interrupt

The interrupt handler calls completion and is IRQ requested before the
completion is initialized. Logically it should be the other way.

Fixes: 4f5ee75ea171 ("spi: spi-fsl-dspi: Replace interruptible wait queue with a simple completion")
Cc: <[email protected]>
Signed-off-by: Krzysztof Kozlowski <[email protected]>

---

Changes since v1:
1. Rework the commit msg.
---
drivers/spi/spi-fsl-dspi.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/spi/spi-fsl-dspi.c b/drivers/spi/spi-fsl-dspi.c
index 023e05c53b85..080c5624bd1e 100644
--- a/drivers/spi/spi-fsl-dspi.c
+++ b/drivers/spi/spi-fsl-dspi.c
@@ -1385,6 +1385,8 @@ static int dspi_probe(struct platform_device *pdev)
goto poll_mode;
}

+ init_completion(&dspi->xfer_done);
+
ret = devm_request_irq(&pdev->dev, dspi->irq, dspi_interrupt,
IRQF_SHARED, pdev->name, dspi);
if (ret < 0) {
@@ -1392,8 +1394,6 @@ static int dspi_probe(struct platform_device *pdev)
goto out_clk_put;
}

- init_completion(&dspi->xfer_done);
-
poll_mode:

if (dspi->devtype_data->trans_mode == DSPI_DMA_MODE) {
--
2.7.4

2020-06-15 08:12:57

by Krzysztof Kozlowski

[permalink] [raw]
Subject: [PATCH v2 3/3] genirq: Do not test disabled IRQs with DEBUG_SHIRQ

Testing events during freeing of disabled shared interrupts
(CONFIG_DEBUG_SHIRQ) leads to false positives. The driver disabled
interrupts on purpose to be sure that they will not fire during device
removal.

Signed-off-by: Krzysztof Kozlowski <[email protected]>

---

Changes since v1:
1. New patch.
---
kernel/irq/manage.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index 761911168438..f19f0dedc30d 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -1775,12 +1775,14 @@ static struct irqaction *__free_irq(struct irq_desc *desc, void *dev_id)
/*
* It's a shared IRQ -- the driver ought to be prepared for an IRQ
* event to happen even now it's being freed, so let's make sure that
- * is so by doing an extra call to the handler ....
+ * is so by doing an extra call to the handler.
+ * Although the driver could disable the interrupts just before freeing
+ * just to avoid such trouble - don't test it then.
*
* ( We do this after actually deregistering it, to make sure that a
* 'real' IRQ doesn't run in parallel with our fake. )
*/
- if (action->flags & IRQF_SHARED) {
+ if (action->flags & IRQF_SHARED && !desc->depth) {
local_irq_save(flags);
action->handler(irq, dev_id);
local_irq_restore(flags);
--
2.7.4

2020-06-15 08:20:30

by Marc Kleine-Budde

[permalink] [raw]
Subject: Re: [PATCH v2 1/3] spi: spi-fsl-dspi: Fix external abort on interrupt in exit paths

On 6/15/20 10:07 AM, Krzysztof Kozlowski wrote:
> If interrupt comes late, during probe error path or device remove (could
> be triggered with CONFIG_DEBUG_SHIRQ), the interrupt handler
> dspi_interrupt() will access registers with the clock being disabled. This
> leads to external abort on non-linefetch on Toradex Colibri VF50 module
> (with Vybrid VF5xx):
>
> $ echo 4002d000.spi > /sys/devices/platform/soc/40000000.bus/4002d000.spi/driver/unbind
>
> Unhandled fault: external abort on non-linefetch (0x1008) at 0x8887f02c
> Internal error: : 1008 [#1] ARM
> CPU: 0 PID: 136 Comm: sh Not tainted 5.7.0-next-20200610-00009-g5c913fa0f9c5-dirty #74
> Hardware name: Freescale Vybrid VF5xx/VF6xx (Device Tree)
> (regmap_mmio_read32le) from [<8061885c>] (regmap_mmio_read+0x48/0x68)
> (regmap_mmio_read) from [<8060e3b8>] (_regmap_bus_reg_read+0x24/0x28)
> (_regmap_bus_reg_read) from [<80611c50>] (_regmap_read+0x70/0x1c0)
> (_regmap_read) from [<80611dec>] (regmap_read+0x4c/0x6c)
> (regmap_read) from [<80678ca0>] (dspi_interrupt+0x3c/0xa8)
> (dspi_interrupt) from [<8017acec>] (free_irq+0x26c/0x3cc)
> (free_irq) from [<8017dcec>] (devm_irq_release+0x1c/0x20)
> (devm_irq_release) from [<805f98ec>] (release_nodes+0x1e4/0x298)
> (release_nodes) from [<805f9ac8>] (devres_release_all+0x40/0x60)
> (devres_release_all) from [<805f5134>] (device_release_driver_internal+0x108/0x1ac)
> (device_release_driver_internal) from [<805f521c>] (device_driver_detach+0x20/0x24)
>
> Fixes: 349ad66c0ab0 ("spi:Add Freescale DSPI driver for Vybrid VF610 platform")
> Cc: <[email protected]>
> Signed-off-by: Krzysztof Kozlowski <[email protected]>
>
> ---
>
> This is an follow up of my other patch for I2C IMX driver [1]. Let's fix the
> issues consistently.
>
> [1] https://lore.kernel.org/lkml/[email protected]/T/#u
>
> Changes since v1:
> 1. Disable the IRQ instead of using non-devm interface.
> ---
> drivers/spi/spi-fsl-dspi.c | 9 +++++++--
> 1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/spi/spi-fsl-dspi.c b/drivers/spi/spi-fsl-dspi.c
> index 58190c94561f..023e05c53b85 100644
> --- a/drivers/spi/spi-fsl-dspi.c
> +++ b/drivers/spi/spi-fsl-dspi.c
> @@ -1400,7 +1400,7 @@ static int dspi_probe(struct platform_device *pdev)
> ret = dspi_request_dma(dspi, res->start);
> if (ret < 0) {
> dev_err(&pdev->dev, "can't get dma channels\n");
> - goto out_clk_put;
> + goto disable_irq;
> }
> }
>
> @@ -1415,11 +1415,14 @@ static int dspi_probe(struct platform_device *pdev)
> ret = spi_register_controller(ctlr);
> if (ret != 0) {
> dev_err(&pdev->dev, "Problem registering DSPI ctlr\n");
> - goto out_clk_put;
> + goto disable_irq;
> }
>
> return ret;
>
> +disable_irq:
> + if (dspi->irq > 0)
> + disable_irq(dspi->irq);
> out_clk_put:
> clk_disable_unprepare(dspi->clk);
> out_ctlr_put:
> @@ -1435,6 +1438,8 @@ static int dspi_remove(struct platform_device *pdev)
>
> /* Disconnect from the SPI framework */
> dspi_release_dma(dspi);
> + if (dspi->irq > 0)
> + disable_irq(dspi->irq);

What happens, if you re-bind the driver?
Is the IRQ still working?
Who is taking care of calling the enable_irq() again?
What happens, if you really have a shared IRQ line?
Is the IRQ disabled for all other devices on the same IRQ line?

> clk_disable_unprepare(dspi->clk);
> spi_unregister_controller(dspi->ctlr);
>
>
Marc

--
Pengutronix e.K. | Marc Kleine-Budde |
Embedded Linux | https://www.pengutronix.de |
Vertretung West/Dortmund | Phone: +49-231-2826-924 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |

2020-06-15 09:26:23

by Vladimir Oltean

[permalink] [raw]
Subject: Re: [PATCH v2 1/3] spi: spi-fsl-dspi: Fix external abort on interrupt in exit paths

On Mon, 15 Jun 2020 at 11:18, Marc Kleine-Budde <[email protected]> wrote:
>
> On 6/15/20 10:07 AM, Krzysztof Kozlowski wrote:
> > If interrupt comes late, during probe error path or device remove (could
> > be triggered with CONFIG_DEBUG_SHIRQ), the interrupt handler
> > dspi_interrupt() will access registers with the clock being disabled. This
> > leads to external abort on non-linefetch on Toradex Colibri VF50 module
> > (with Vybrid VF5xx):
> >
> > $ echo 4002d000.spi > /sys/devices/platform/soc/40000000.bus/4002d000.spi/driver/unbind
> >
> > Unhandled fault: external abort on non-linefetch (0x1008) at 0x8887f02c
> > Internal error: : 1008 [#1] ARM
> > CPU: 0 PID: 136 Comm: sh Not tainted 5.7.0-next-20200610-00009-g5c913fa0f9c5-dirty #74
> > Hardware name: Freescale Vybrid VF5xx/VF6xx (Device Tree)
> > (regmap_mmio_read32le) from [<8061885c>] (regmap_mmio_read+0x48/0x68)
> > (regmap_mmio_read) from [<8060e3b8>] (_regmap_bus_reg_read+0x24/0x28)
> > (_regmap_bus_reg_read) from [<80611c50>] (_regmap_read+0x70/0x1c0)
> > (_regmap_read) from [<80611dec>] (regmap_read+0x4c/0x6c)
> > (regmap_read) from [<80678ca0>] (dspi_interrupt+0x3c/0xa8)
> > (dspi_interrupt) from [<8017acec>] (free_irq+0x26c/0x3cc)
> > (free_irq) from [<8017dcec>] (devm_irq_release+0x1c/0x20)
> > (devm_irq_release) from [<805f98ec>] (release_nodes+0x1e4/0x298)
> > (release_nodes) from [<805f9ac8>] (devres_release_all+0x40/0x60)
> > (devres_release_all) from [<805f5134>] (device_release_driver_internal+0x108/0x1ac)
> > (device_release_driver_internal) from [<805f521c>] (device_driver_detach+0x20/0x24)
> >
> > Fixes: 349ad66c0ab0 ("spi:Add Freescale DSPI driver for Vybrid VF610 platform")
> > Cc: <[email protected]>
> > Signed-off-by: Krzysztof Kozlowski <[email protected]>
> >
> > ---
> >
> > This is an follow up of my other patch for I2C IMX driver [1]. Let's fix the
> > issues consistently.
> >
> > [1] https://lore.kernel.org/lkml/[email protected]/T/#u
> >
> > Changes since v1:
> > 1. Disable the IRQ instead of using non-devm interface.
> > ---
> > drivers/spi/spi-fsl-dspi.c | 9 +++++++--
> > 1 file changed, 7 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/spi/spi-fsl-dspi.c b/drivers/spi/spi-fsl-dspi.c
> > index 58190c94561f..023e05c53b85 100644
> > --- a/drivers/spi/spi-fsl-dspi.c
> > +++ b/drivers/spi/spi-fsl-dspi.c
> > @@ -1400,7 +1400,7 @@ static int dspi_probe(struct platform_device *pdev)
> > ret = dspi_request_dma(dspi, res->start);
> > if (ret < 0) {
> > dev_err(&pdev->dev, "can't get dma channels\n");
> > - goto out_clk_put;
> > + goto disable_irq;
> > }
> > }
> >
> > @@ -1415,11 +1415,14 @@ static int dspi_probe(struct platform_device *pdev)
> > ret = spi_register_controller(ctlr);
> > if (ret != 0) {
> > dev_err(&pdev->dev, "Problem registering DSPI ctlr\n");
> > - goto out_clk_put;
> > + goto disable_irq;
> > }
> >
> > return ret;
> >
> > +disable_irq:
> > + if (dspi->irq > 0)
> > + disable_irq(dspi->irq);
> > out_clk_put:
> > clk_disable_unprepare(dspi->clk);
> > out_ctlr_put:
> > @@ -1435,6 +1438,8 @@ static int dspi_remove(struct platform_device *pdev)
> >
> > /* Disconnect from the SPI framework */
> > dspi_release_dma(dspi);
> > + if (dspi->irq > 0)
> > + disable_irq(dspi->irq);
>
> What happens, if you re-bind the driver?
> Is the IRQ still working?
> Who is taking care of calling the enable_irq() again?
> What happens, if you really have a shared IRQ line?
> Is the IRQ disabled for all other devices on the same IRQ line?
>

Yup, devm is completely broken for shared IRQs.

> > clk_disable_unprepare(dspi->clk);
> > spi_unregister_controller(dspi->ctlr);
> >
> >
> Marc
>
> --
> Pengutronix e.K. | Marc Kleine-Budde |
> Embedded Linux | https://www.pengutronix.de |
> Vertretung West/Dortmund | Phone: +49-231-2826-924 |
> Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |

2020-06-15 12:11:11

by Mark Brown

[permalink] [raw]
Subject: Re: [PATCH v2 3/3] genirq: Do not test disabled IRQs with DEBUG_SHIRQ

On Mon, Jun 15, 2020 at 10:07:19AM +0200, Krzysztof Kozlowski wrote:
> Testing events during freeing of disabled shared interrupts
> (CONFIG_DEBUG_SHIRQ) leads to false positives. The driver disabled
> interrupts on purpose to be sure that they will not fire during device
> removal.

Surely the whole issue with shared IRQs that's being tested for here is
that when the interrupt is shared some other device connected to the
same interrupt line may trigger an interrupt regardless of what's going
on with this device?


Attachments:
(No filename) (530.00 B)
signature.asc (499.00 B)
Download all attachments

2020-06-15 12:59:00

by Vladimir Oltean

[permalink] [raw]
Subject: Re: [PATCH v2 1/3] spi: spi-fsl-dspi: Fix external abort on interrupt in exit paths

On Mon, 15 Jun 2020 at 15:35, Mark Brown <[email protected]> wrote:
>

>
> Indeed. The upshot of all this is that the interrupt needs to be freed
> not disabled before the clocks are disabled, or some other mechanism
> needs to be used to ensure that the interrupt handler won't attempt to
> access the hardware when it shouldn't. As Vladimir says there are
> serious issues using devm for interrupt handlers (or anything else that
> might cause code to be run) due to problems like this.

And the down-shot is that whatever is done in dspi_remove (free_irq)
also needs to be done in dspi_suspend, but with extra care in
dspi_resume not only to request the irq again, but also to flush the
module's FIFOs and clear interrupts, because there might have been
nasty stuff uncaught during sleep:

regmap_update_bits(dspi->regmap, SPI_MCR,
SPI_MCR_CLR_TXF | SPI_MCR_CLR_RXF,
SPI_MCR_CLR_TXF | SPI_MCR_CLR_RXF);
regmap_write(dspi->regmap, SPI_SR, SPI_SR_CLEAR);

So it's pretty messy.

-Vladimir

2020-06-15 13:12:40

by Mark Brown

[permalink] [raw]
Subject: Re: [PATCH v2 1/3] spi: spi-fsl-dspi: Fix external abort on interrupt in exit paths

On Mon, Jun 15, 2020 at 03:56:01PM +0300, Vladimir Oltean wrote:

> And the down-shot is that whatever is done in dspi_remove (free_irq)
> also needs to be done in dspi_suspend, but with extra care in
> dspi_resume not only to request the irq again, but also to flush the
> module's FIFOs and clear interrupts, because there might have been
> nasty stuff uncaught during sleep:

> regmap_update_bits(dspi->regmap, SPI_MCR,
> SPI_MCR_CLR_TXF | SPI_MCR_CLR_RXF,
> SPI_MCR_CLR_TXF | SPI_MCR_CLR_RXF);
> regmap_write(dspi->regmap, SPI_SR, SPI_SR_CLEAR);

> So it's pretty messy.

It's a bit unusual to need to actually free the IRQ over suspend -
what's driving that requirement here? I can see we might need to
reinit the hardware but usually the interrupt handler can be left there
safely.


Attachments:
(No filename) (846.00 B)
signature.asc (499.00 B)
Download all attachments

2020-06-15 13:12:42

by Krzysztof Kozlowski

[permalink] [raw]
Subject: Re: [PATCH v2 1/3] spi: spi-fsl-dspi: Fix external abort on interrupt in exit paths

On Mon, Jun 15, 2020 at 12:23:41PM +0300, Vladimir Oltean wrote:
> On Mon, 15 Jun 2020 at 11:18, Marc Kleine-Budde <[email protected]> wrote:
> >
> > On 6/15/20 10:07 AM, Krzysztof Kozlowski wrote:
> > > If interrupt comes late, during probe error path or device remove (could
> > > be triggered with CONFIG_DEBUG_SHIRQ), the interrupt handler
> > > dspi_interrupt() will access registers with the clock being disabled. This
> > > leads to external abort on non-linefetch on Toradex Colibri VF50 module
> > > (with Vybrid VF5xx):
> > >
> > > $ echo 4002d000.spi > /sys/devices/platform/soc/40000000.bus/4002d000.spi/driver/unbind
> > >
> > > Unhandled fault: external abort on non-linefetch (0x1008) at 0x8887f02c
> > > Internal error: : 1008 [#1] ARM
> > > CPU: 0 PID: 136 Comm: sh Not tainted 5.7.0-next-20200610-00009-g5c913fa0f9c5-dirty #74
> > > Hardware name: Freescale Vybrid VF5xx/VF6xx (Device Tree)
> > > (regmap_mmio_read32le) from [<8061885c>] (regmap_mmio_read+0x48/0x68)
> > > (regmap_mmio_read) from [<8060e3b8>] (_regmap_bus_reg_read+0x24/0x28)
> > > (_regmap_bus_reg_read) from [<80611c50>] (_regmap_read+0x70/0x1c0)
> > > (_regmap_read) from [<80611dec>] (regmap_read+0x4c/0x6c)
> > > (regmap_read) from [<80678ca0>] (dspi_interrupt+0x3c/0xa8)
> > > (dspi_interrupt) from [<8017acec>] (free_irq+0x26c/0x3cc)
> > > (free_irq) from [<8017dcec>] (devm_irq_release+0x1c/0x20)
> > > (devm_irq_release) from [<805f98ec>] (release_nodes+0x1e4/0x298)
> > > (release_nodes) from [<805f9ac8>] (devres_release_all+0x40/0x60)
> > > (devres_release_all) from [<805f5134>] (device_release_driver_internal+0x108/0x1ac)
> > > (device_release_driver_internal) from [<805f521c>] (device_driver_detach+0x20/0x24)
> > >
> > > Fixes: 349ad66c0ab0 ("spi:Add Freescale DSPI driver for Vybrid VF610 platform")
> > > Cc: <[email protected]>
> > > Signed-off-by: Krzysztof Kozlowski <[email protected]>
> > >
> > > ---
> > >
> > > This is an follow up of my other patch for I2C IMX driver [1]. Let's fix the
> > > issues consistently.
> > >
> > > [1] https://lore.kernel.org/lkml/[email protected]/T/#u
> > >
> > > Changes since v1:
> > > 1. Disable the IRQ instead of using non-devm interface.
> > > ---
> > > drivers/spi/spi-fsl-dspi.c | 9 +++++++--
> > > 1 file changed, 7 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/spi/spi-fsl-dspi.c b/drivers/spi/spi-fsl-dspi.c
> > > index 58190c94561f..023e05c53b85 100644
> > > --- a/drivers/spi/spi-fsl-dspi.c
> > > +++ b/drivers/spi/spi-fsl-dspi.c
> > > @@ -1400,7 +1400,7 @@ static int dspi_probe(struct platform_device *pdev)
> > > ret = dspi_request_dma(dspi, res->start);
> > > if (ret < 0) {
> > > dev_err(&pdev->dev, "can't get dma channels\n");
> > > - goto out_clk_put;
> > > + goto disable_irq;
> > > }
> > > }
> > >
> > > @@ -1415,11 +1415,14 @@ static int dspi_probe(struct platform_device *pdev)
> > > ret = spi_register_controller(ctlr);
> > > if (ret != 0) {
> > > dev_err(&pdev->dev, "Problem registering DSPI ctlr\n");
> > > - goto out_clk_put;
> > > + goto disable_irq;
> > > }
> > >
> > > return ret;
> > >
> > > +disable_irq:
> > > + if (dspi->irq > 0)
> > > + disable_irq(dspi->irq);
> > > out_clk_put:
> > > clk_disable_unprepare(dspi->clk);
> > > out_ctlr_put:
> > > @@ -1435,6 +1438,8 @@ static int dspi_remove(struct platform_device *pdev)
> > >
> > > /* Disconnect from the SPI framework */
> > > dspi_release_dma(dspi);
> > > + if (dspi->irq > 0)
> > > + disable_irq(dspi->irq);
> >
> > What happens, if you re-bind the driver?
> > Is the IRQ still working?
> > Who is taking care of calling the enable_irq() again?
> > What happens, if you really have a shared IRQ line?
> > Is the IRQ disabled for all other devices on the same IRQ line?
> >
>
> Yup, devm is completely broken for shared IRQs.

Then we're back to this massive rework of using non-devm interface.

Best regards,
Krzysztof

2020-06-15 13:14:06

by Krzysztof Kozlowski

[permalink] [raw]
Subject: Re: [PATCH v2 1/3] spi: spi-fsl-dspi: Fix external abort on interrupt in exit paths

On Mon, Jun 15, 2020 at 03:56:01PM +0300, Vladimir Oltean wrote:
> On Mon, 15 Jun 2020 at 15:35, Mark Brown <[email protected]> wrote:
> >
>
> >
> > Indeed. The upshot of all this is that the interrupt needs to be freed
> > not disabled before the clocks are disabled, or some other mechanism
> > needs to be used to ensure that the interrupt handler won't attempt to
> > access the hardware when it shouldn't. As Vladimir says there are
> > serious issues using devm for interrupt handlers (or anything else that
> > might cause code to be run) due to problems like this.
>
> And the down-shot is that whatever is done in dspi_remove (free_irq)
> also needs to be done in dspi_suspend, but with extra care in
> dspi_resume not only to request the irq again, but also to flush the
> module's FIFOs and clear interrupts, because there might have been
> nasty stuff uncaught during sleep:
>
> regmap_update_bits(dspi->regmap, SPI_MCR,
> SPI_MCR_CLR_TXF | SPI_MCR_CLR_RXF,
> SPI_MCR_CLR_TXF | SPI_MCR_CLR_RXF);
> regmap_write(dspi->regmap, SPI_SR, SPI_SR_CLEAR);
>
> So it's pretty messy.

It is a slightly different bug which so this patch should have a follow
up.

Best regards,
Krzysztof

2020-06-15 13:16:56

by Vladimir Oltean

[permalink] [raw]
Subject: Re: [PATCH v2 1/3] spi: spi-fsl-dspi: Fix external abort on interrupt in exit paths

On Mon, 15 Jun 2020 at 16:10, Krzysztof Kozlowski <[email protected]> wrote:
>

>
> It is a slightly different bug which so this patch should have a follow
> up.
>
> Best regards,
> Krzysztof
>

Why is it a different bug? It's the same bug.

2020-06-15 13:17:14

by Vladimir Oltean

[permalink] [raw]
Subject: Re: [PATCH v2 1/3] spi: spi-fsl-dspi: Fix external abort on interrupt in exit paths

On Mon, 15 Jun 2020 at 16:10, Mark Brown <[email protected]> wrote:

>
> It's a bit unusual to need to actually free the IRQ over suspend -
> what's driving that requirement here?

clk_disable_unprepare(dspi->clk); is driving the requirement - same as
in dspi_remove case, the module will fault when its registers are
accessed without a clock.

-Vladimir

2020-06-15 13:27:01

by Mark Brown

[permalink] [raw]
Subject: Re: [PATCH v2 1/3] spi: spi-fsl-dspi: Fix external abort on interrupt in exit paths

On Mon, Jun 15, 2020 at 04:12:28PM +0300, Vladimir Oltean wrote:
> On Mon, 15 Jun 2020 at 16:10, Mark Brown <[email protected]> wrote:

> > It's a bit unusual to need to actually free the IRQ over suspend -
> > what's driving that requirement here?

> clk_disable_unprepare(dspi->clk); is driving the requirement - same as
> in dspi_remove case, the module will fault when its registers are
> accessed without a clock.

I see - this could be fixed by having the interrupt handler bounce the
clock on, there's a little overhead from that but hopefully not too
much. That should also help with the remove case I guess so long as the
clock is registered before the interrupt is requested?


Attachments:
(No filename) (702.00 B)
signature.asc (499.00 B)
Download all attachments

2020-06-15 13:31:59

by Krzysztof Kozlowski

[permalink] [raw]
Subject: Re: [PATCH v2 1/3] spi: spi-fsl-dspi: Fix external abort on interrupt in exit paths

On Mon, Jun 15, 2020 at 04:14:06PM +0300, Vladimir Oltean wrote:
> On Mon, 15 Jun 2020 at 16:10, Krzysztof Kozlowski <[email protected]> wrote:
> >
>
> >
> > It is a slightly different bug which so this patch should have a follow
> > up.
> >
> > Best regards,
> > Krzysztof
> >
>
> Why is it a different bug? It's the same bug.

One bug is using devm-interface for shared interrupts and second is not
caring about suspend/resume.

Best regards,
Krzysztof

2020-06-15 13:32:37

by Vladimir Oltean

[permalink] [raw]
Subject: Re: [PATCH v2 1/3] spi: spi-fsl-dspi: Fix external abort on interrupt in exit paths

On Mon, 15 Jun 2020 at 16:24, Mark Brown <[email protected]> wrote:
>
> On Mon, Jun 15, 2020 at 04:12:28PM +0300, Vladimir Oltean wrote:
> > On Mon, 15 Jun 2020 at 16:10, Mark Brown <[email protected]> wrote:
>
> > > It's a bit unusual to need to actually free the IRQ over suspend -
> > > what's driving that requirement here?
>
> > clk_disable_unprepare(dspi->clk); is driving the requirement - same as
> > in dspi_remove case, the module will fault when its registers are
> > accessed without a clock.
>
> I see - this could be fixed by having the interrupt handler bounce the
> clock on, there's a little overhead from that but hopefully not too
> much. That should also help with the remove case I guess so long as the
> clock is registered before the interrupt is requested?

Doesn't this mean that we risk leaving the clock enabled during suspend?
Is there any function in the SPI core that quiesces any pending
transactions, and then stops the controller? I would have expected
spi_controller_suspend to do that, but I'm not sure (it doesn't look
like it).

2020-06-15 13:36:49

by Vladimir Oltean

[permalink] [raw]
Subject: Re: [PATCH v2 1/3] spi: spi-fsl-dspi: Fix external abort on interrupt in exit paths

On Mon, 15 Jun 2020 at 16:28, Krzysztof Kozlowski <[email protected]> wrote:
>
> On Mon, Jun 15, 2020 at 04:14:06PM +0300, Vladimir Oltean wrote:
> > On Mon, 15 Jun 2020 at 16:10, Krzysztof Kozlowski <[email protected]> wrote:
> > >
> >
> > >
> > > It is a slightly different bug which so this patch should have a follow
> > > up.
> > >
> > > Best regards,
> > > Krzysztof
> > >
> >
> > Why is it a different bug? It's the same bug.
>
> One bug is using devm-interface for shared interrupts and second is not
> caring about suspend/resume.
>
> Best regards,
> Krzysztof
>

The problem is that you don't have a way to stop servicing a shared
interrupt safely and on demand, before clk_disable_unprepare.
So it's exactly the same problem on suspend and on remove.
Avoiding to think about the suspend problem now means that you'll end
up having an overall worse solution.

2020-06-15 13:38:56

by Mark Brown

[permalink] [raw]
Subject: Re: [PATCH v2 1/3] spi: spi-fsl-dspi: Fix external abort on interrupt in exit paths

On Mon, Jun 15, 2020 at 04:29:15PM +0300, Vladimir Oltean wrote:
> On Mon, 15 Jun 2020 at 16:24, Mark Brown <[email protected]> wrote:

> > I see - this could be fixed by having the interrupt handler bounce the
> > clock on, there's a little overhead from that but hopefully not too
> > much. That should also help with the remove case I guess so long as the
> > clock is registered before the interrupt is requested?

> Doesn't this mean that we risk leaving the clock enabled during suspend?

If we suspend with the interrupt handler running but IIRC the suspend
sequence will allow interrupt handlers to complete.

> Is there any function in the SPI core that quiesces any pending
> transactions, and then stops the controller? I would have expected
> spi_controller_suspend to do that, but I'm not sure (it doesn't look
> like it).

spi_stop_queue() should do this (but will time out if the queue is too
busy). It doesn't stop new transactions being issued, I'm guessing
because that'll most likely cause more problems than it solves but that
code predates my involvement.


Attachments:
(No filename) (1.08 kB)
signature.asc (499.00 B)
Download all attachments

2020-06-15 13:44:13

by Krzysztof Kozlowski

[permalink] [raw]
Subject: Re: [PATCH v2 1/3] spi: spi-fsl-dspi: Fix external abort on interrupt in exit paths

On Mon, Jun 15, 2020 at 04:12:28PM +0300, Vladimir Oltean wrote:
> On Mon, 15 Jun 2020 at 16:10, Mark Brown <[email protected]> wrote:
>
> >
> > It's a bit unusual to need to actually free the IRQ over suspend -
> > what's driving that requirement here?
>
> clk_disable_unprepare(dspi->clk); is driving the requirement - same as
> in dspi_remove case, the module will fault when its registers are
> accessed without a clock.

In few cases when I have shared interrupt in different drivers, they
were just disabling it during suspend. Why it has to be freed?

Best regards,
Krzysztof

2020-06-15 14:26:19

by Vladimir Oltean

[permalink] [raw]
Subject: Re: [PATCH v2 1/3] spi: spi-fsl-dspi: Fix external abort on interrupt in exit paths

On Mon, 15 Jun 2020 at 16:41, Krzysztof Kozlowski <[email protected]> wrote:
>
> On Mon, Jun 15, 2020 at 04:12:28PM +0300, Vladimir Oltean wrote:
> > On Mon, 15 Jun 2020 at 16:10, Mark Brown <[email protected]> wrote:
> >
> > >
> > > It's a bit unusual to need to actually free the IRQ over suspend -
> > > what's driving that requirement here?
> >
> > clk_disable_unprepare(dspi->clk); is driving the requirement - same as
> > in dspi_remove case, the module will fault when its registers are
> > accessed without a clock.
>
> In few cases when I have shared interrupt in different drivers, they
> were just disabling it during suspend. Why it has to be freed?
>
> Best regards,
> Krzysztof
>

Not saying it _has_ to be freed, just to be prevented from running
concurrently with us disabling the clock.
But if we can get away in dspi_suspend with just disable_irq, can't we
also get away in dspi_remove with just devm_free_irq?

-Vladimir

2020-06-15 14:59:45

by Krzysztof Kozlowski

[permalink] [raw]
Subject: Re: [PATCH v2 1/3] spi: spi-fsl-dspi: Fix external abort on interrupt in exit paths

On Mon, Jun 15, 2020 at 05:23:28PM +0300, Vladimir Oltean wrote:
> On Mon, 15 Jun 2020 at 16:41, Krzysztof Kozlowski <[email protected]> wrote:
> >
> > On Mon, Jun 15, 2020 at 04:12:28PM +0300, Vladimir Oltean wrote:
> > > On Mon, 15 Jun 2020 at 16:10, Mark Brown <[email protected]> wrote:
> > >
> > > >
> > > > It's a bit unusual to need to actually free the IRQ over suspend -
> > > > what's driving that requirement here?
> > >
> > > clk_disable_unprepare(dspi->clk); is driving the requirement - same as
> > > in dspi_remove case, the module will fault when its registers are
> > > accessed without a clock.
> >
> > In few cases when I have shared interrupt in different drivers, they
> > were just disabling it during suspend. Why it has to be freed?
> >
> > Best regards,
> > Krzysztof
> >
>
> Not saying it _has_ to be freed, just to be prevented from running
> concurrently with us disabling the clock.
> But if we can get away in dspi_suspend with just disable_irq, can't we
> also get away in dspi_remove with just devm_free_irq?

One reason why they have to be different could be following scenario:
1. Device could be unbound any time and disabling IRQ in remove() would
effectively disable the IRQ also for other devices using this shared
line. First disable_irq() really disables it, the latter just
increases the counter.
2. However during system suspend, it is expected that all drivers in
their suspend (and later resume) callbacks will do the same - disable
the shared IRQ line. And finally the system disables interrupts
globally so the line will be balanced.

Freeing IRQ solves the case #1 without causing any imbalance between
enables/disables or requests/frees. Disabling IRQ solves the #2, also
without any imbalance.

Best regards,
Krzysztof



2020-06-15 15:04:47

by Vladimir Oltean

[permalink] [raw]
Subject: Re: [PATCH v2 1/3] spi: spi-fsl-dspi: Fix external abort on interrupt in exit paths

On Mon, 15 Jun 2020 at 17:57, Krzysztof Kozlowski <[email protected]> wrote:
>
> On Mon, Jun 15, 2020 at 05:23:28PM +0300, Vladimir Oltean wrote:
> > On Mon, 15 Jun 2020 at 16:41, Krzysztof Kozlowski <[email protected]> wrote:
> > >
> > > On Mon, Jun 15, 2020 at 04:12:28PM +0300, Vladimir Oltean wrote:
> > > > On Mon, 15 Jun 2020 at 16:10, Mark Brown <[email protected]> wrote:
> > > >
> > > > >
> > > > > It's a bit unusual to need to actually free the IRQ over suspend -
> > > > > what's driving that requirement here?
> > > >
> > > > clk_disable_unprepare(dspi->clk); is driving the requirement - same as
> > > > in dspi_remove case, the module will fault when its registers are
> > > > accessed without a clock.
> > >
> > > In few cases when I have shared interrupt in different drivers, they
> > > were just disabling it during suspend. Why it has to be freed?
> > >
> > > Best regards,
> > > Krzysztof
> > >
> >
> > Not saying it _has_ to be freed, just to be prevented from running
> > concurrently with us disabling the clock.
> > But if we can get away in dspi_suspend with just disable_irq, can't we
> > also get away in dspi_remove with just devm_free_irq?
>
> One reason why they have to be different could be following scenario:
> 1. Device could be unbound any time and disabling IRQ in remove() would
> effectively disable the IRQ also for other devices using this shared
> line. First disable_irq() really disables it, the latter just
> increases the counter.
> 2. However during system suspend, it is expected that all drivers in
> their suspend (and later resume) callbacks will do the same - disable
> the shared IRQ line. And finally the system disables interrupts
> globally so the line will be balanced.
>
> Freeing IRQ solves the case #1 without causing any imbalance between
> enables/disables or requests/frees. Disabling IRQ solves the #2, also
> without any imbalance.
>
> Best regards,
> Krzysztof
>
>
>

So the answer to my question is 'yes', right?

2020-06-15 17:05:17

by Mark Brown

[permalink] [raw]
Subject: Re: [PATCH v2 1/3] spi: spi-fsl-dspi: Fix external abort on interrupt in exit paths

On Mon, Jun 15, 2020 at 10:17:07AM +0200, Marc Kleine-Budde wrote:
> On 6/15/20 10:07 AM, Krzysztof Kozlowski wrote:
> > If interrupt comes late, during probe error path or device remove (could
> > be triggered with CONFIG_DEBUG_SHIRQ), the interrupt handler
> > dspi_interrupt() will access registers with the clock being disabled. This
> > leads to external abort on non-linefetch on Toradex Colibri VF50 module
> > (with Vybrid VF5xx):

> > Unhandled fault: external abort on non-linefetch (0x1008) at 0x8887f02c
> > Internal error: : 1008 [#1] ARM
> > CPU: 0 PID: 136 Comm: sh Not tainted 5.7.0-next-20200610-00009-g5c913fa0f9c5-dirty #74
> > Hardware name: Freescale Vybrid VF5xx/VF6xx (Device Tree)
> > (regmap_mmio_read32le) from [<8061885c>] (regmap_mmio_read+0x48/0x68)
> > (regmap_mmio_read) from [<8060e3b8>] (_regmap_bus_reg_read+0x24/0x28)
> > (_regmap_bus_reg_read) from [<80611c50>] (_regmap_read+0x70/0x1c0)

Please think hard before including complete backtraces in upstream
reports, they are very large and contain almost no useful information
relative to their size so often obscure the relevant content in your
message. If part of the backtrace is usefully illustrative (it often is
for search engines if nothing else) then it's usually better to pull out
the relevant sections.

> > +disable_irq:
> > + if (dspi->irq > 0)
> > + disable_irq(dspi->irq);

> What happens, if you re-bind the driver?
> Is the IRQ still working?
> Who is taking care of calling the enable_irq() again?
> What happens, if you really have a shared IRQ line?
> Is the IRQ disabled for all other devices on the same IRQ line?

Indeed. The upshot of all this is that the interrupt needs to be freed
not disabled before the clocks are disabled, or some other mechanism
needs to be used to ensure that the interrupt handler won't attempt to
access the hardware when it shouldn't. As Vladimir says there are
serious issues using devm for interrupt handlers (or anything else that
might cause code to be run) due to problems like this.


Attachments:
(No filename) (2.05 kB)
signature.asc (499.00 B)
Download all attachments

2020-06-16 10:13:52

by Krzysztof Kozlowski

[permalink] [raw]
Subject: Re: [PATCH v2 3/3] genirq: Do not test disabled IRQs with DEBUG_SHIRQ

On Mon, Jun 15, 2020 at 01:08:44PM +0100, Mark Brown wrote:
> On Mon, Jun 15, 2020 at 10:07:19AM +0200, Krzysztof Kozlowski wrote:
> > Testing events during freeing of disabled shared interrupts
> > (CONFIG_DEBUG_SHIRQ) leads to false positives. The driver disabled
> > interrupts on purpose to be sure that they will not fire during device
> > removal.
>
> Surely the whole issue with shared IRQs that's being tested for here is
> that when the interrupt is shared some other device connected to the
> same interrupt line may trigger an interrupt regardless of what's going
> on with this device?

Yes. However if that device disabled the interrupt, it should not be
fired for other users. In such case the testing does not point to a
real issue.

Anyway, this patch is not necessary with my v3 approach to SPI shared
interrupts issue.

Best regards,
Krzysztof

2020-06-16 10:42:45

by Mark Brown

[permalink] [raw]
Subject: Re: [PATCH v2 3/3] genirq: Do not test disabled IRQs with DEBUG_SHIRQ

On Tue, Jun 16, 2020 at 12:11:17PM +0200, Krzysztof Kozlowski wrote:
> On Mon, Jun 15, 2020 at 01:08:44PM +0100, Mark Brown wrote:
> > On Mon, Jun 15, 2020 at 10:07:19AM +0200, Krzysztof Kozlowski wrote:
> > > Testing events during freeing of disabled shared interrupts
> > > (CONFIG_DEBUG_SHIRQ) leads to false positives. The driver disabled
> > > interrupts on purpose to be sure that they will not fire during device
> > > removal.

> > Surely the whole issue with shared IRQs that's being tested for here is
> > that when the interrupt is shared some other device connected to the
> > same interrupt line may trigger an interrupt regardless of what's going
> > on with this device?

> Yes. However if that device disabled the interrupt, it should not be
> fired for other users. In such case the testing does not point to a
> real issue.

To be honest I'd say that if you're disabling a shared interrupt that's
a bit of an issue regardless of anything else that's going on, it'll
disrupt other devices connected to it.


Attachments:
(No filename) (1.02 kB)
signature.asc (499.00 B)
Download all attachments

2020-06-17 09:33:23

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH v2 3/3] genirq: Do not test disabled IRQs with DEBUG_SHIRQ

Mark Brown <[email protected]> writes:
> On Tue, Jun 16, 2020 at 12:11:17PM +0200, Krzysztof Kozlowski wrote:
>> On Mon, Jun 15, 2020 at 01:08:44PM +0100, Mark Brown wrote:
>> > On Mon, Jun 15, 2020 at 10:07:19AM +0200, Krzysztof Kozlowski wrote:
>> > > Testing events during freeing of disabled shared interrupts
>> > > (CONFIG_DEBUG_SHIRQ) leads to false positives. The driver disabled
>> > > interrupts on purpose to be sure that they will not fire during device
>> > > removal.
>
>> > Surely the whole issue with shared IRQs that's being tested for here is
>> > that when the interrupt is shared some other device connected to the
>> > same interrupt line may trigger an interrupt regardless of what's going
>> > on with this device?
>
>> Yes. However if that device disabled the interrupt, it should not be
>> fired for other users. In such case the testing does not point to a
>> real issue.
>
> To be honest I'd say that if you're disabling a shared interrupt that's
> a bit of an issue regardless of anything else that's going on, it'll
> disrupt other devices connected to it.

Correct.

Shared interrupts are broken by design and I really can't understand why
hardware people still insist on them.

Thanks,

tglx