2018-07-17 00:58:18

by Daniel Kurtz

[permalink] [raw]
Subject: [PATCH 1/2] pinctrl/amd: only handle irq if it is pending and unmasked

The AMD pinctrl driver demultiplexes GPIO interrupts and fires off their
individual handlers.

If one of these GPIO irqs is configured as a level interrupt, and its
downstream handler is a threaded ONESHOT interrupt, the GPIO interrupt
source is masked by handle_level_irq() until the eventual return of the
threaded irq handler. During this time the level GPIO interrupt status
will still report as high until the actual gpio source is cleared - both
in the individual GPIO interrupt status bit (INTERRUPT_STS_OFF) and in
its corresponding "WAKE_INT_STATUS_REG" bit.

Thus, if another GPIO interrupt occurs during this time,
amd_gpio_irq_handler() will see that the (masked-and-not-yet-cleared)
level irq is still pending and incorrectly call its handler again.

To fix this, have amd_gpio_irq_handler() check for both interrupts status
and mask before calling generic_handle_irq().

Note: Is it possible that this bug was the source of the interrupt storm
on Ryzen when using chained interrupts before commit ba714a9c1dea85
("pinctrl/amd: Use regular interrupt instead of chained")?

Signed-off-by: Daniel Kurtz <[email protected]>
---
drivers/pinctrl/pinctrl-amd.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/pinctrl/pinctrl-amd.c b/drivers/pinctrl/pinctrl-amd.c
index 04ae139671c8a8..b91db89eb9247c 100644
--- a/drivers/pinctrl/pinctrl-amd.c
+++ b/drivers/pinctrl/pinctrl-amd.c
@@ -552,7 +552,8 @@ static irqreturn_t amd_gpio_irq_handler(int irq, void *dev_id)
/* Each status bit covers four pins */
for (i = 0; i < 4; i++) {
regval = readl(regs + i);
- if (!(regval & PIN_IRQ_PENDING))
+ if (!(regval & PIN_IRQ_PENDING) ||
+ !(regval & BIT(INTERRUPT_MASK_OFF)))
continue;
irq = irq_find_mapping(gc->irq.domain, irqnr + i);
generic_handle_irq(irq);
--
2.18.0.203.gfac676dfb9-goog



2018-07-17 00:58:51

by Daniel Kurtz

[permalink] [raw]
Subject: [PATCH 2/2] pinctrl/amd: use byte access to clear irq/wake status bits

Commit 6afb10267c1692 ("pinctrl/amd: fix masking of GPIO interrupts")
changed to the clearing of interrupt status bits to a RMW in a critical
section. This works, but is a bit overkill.

The relevant interrupt/wake status bits are in the Most Significant Byte
of a 32-bit word. These two are the only write-able bits in this byte.

Therefore, it should be safe to just write these bits back as a byte
access without any additional locking.

Signed-off-by: Daniel Kurtz <[email protected]>
---
drivers/pinctrl/pinctrl-amd.c | 12 ++++--------
1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/drivers/pinctrl/pinctrl-amd.c b/drivers/pinctrl/pinctrl-amd.c
index b91db89eb9247c..52efe77ffb9991 100644
--- a/drivers/pinctrl/pinctrl-amd.c
+++ b/drivers/pinctrl/pinctrl-amd.c
@@ -558,15 +558,11 @@ static irqreturn_t amd_gpio_irq_handler(int irq, void *dev_id)
irq = irq_find_mapping(gc->irq.domain, irqnr + i);
generic_handle_irq(irq);

- /* Clear interrupt.
- * We must read the pin register again, in case the
- * value was changed while executing
- * generic_handle_irq() above.
+ /*
+ * Write-1-to-clear irq/wake status bits in MSByte.
+ * All other bits in this byte are read-only.
*/
- raw_spin_lock_irqsave(&gpio_dev->lock, flags);
- regval = readl(regs + i);
- writel(regval, regs + i);
- raw_spin_unlock_irqrestore(&gpio_dev->lock, flags);
+ writeb((regval >> 24), (u8 *)(regs + i) + 3);
ret = IRQ_HANDLED;
}
}
--
2.18.0.203.gfac676dfb9-goog


2018-07-17 12:32:33

by Daniel Drake

[permalink] [raw]
Subject: Re: [PATCH 2/2] pinctrl/amd: use byte access to clear irq/wake status bits

On Mon, Jul 16, 2018 at 7:57 PM, Daniel Kurtz <[email protected]> wrote:
> Commit 6afb10267c1692 ("pinctrl/amd: fix masking of GPIO interrupts")
> changed to the clearing of interrupt status bits to a RMW in a critical
> section. This works, but is a bit overkill.
>
> The relevant interrupt/wake status bits are in the Most Significant Byte
> of a 32-bit word. These two are the only write-able bits in this byte.

I don't have the hardware to test this any more, and I also don't have
any docs to double if those are really the only writable bits, but
looking at the existing driver code it does seem to be the case.

I think you should retain the comment noting that the value of the
register may have changed since it was read just a few lines above
(and hence explaining more precisely why we make the special effort
just to modify the MSB), just in case there is further rework of this
code in future and we end up walking into the same trap. It was one of
those issues that took a frustratingly long time to figure out...

Thanks
Daniel

2018-07-19 14:55:57

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH 1/2] pinctrl/amd: only handle irq if it is pending and unmasked

On Mon, 16 Jul 2018, Daniel Kurtz wrote:

> The AMD pinctrl driver demultiplexes GPIO interrupts and fires off their
> individual handlers.
>
> If one of these GPIO irqs is configured as a level interrupt, and its
> downstream handler is a threaded ONESHOT interrupt, the GPIO interrupt
> source is masked by handle_level_irq() until the eventual return of the
> threaded irq handler. During this time the level GPIO interrupt status
> will still report as high until the actual gpio source is cleared - both
> in the individual GPIO interrupt status bit (INTERRUPT_STS_OFF) and in
> its corresponding "WAKE_INT_STATUS_REG" bit.
>
> Thus, if another GPIO interrupt occurs during this time,
> amd_gpio_irq_handler() will see that the (masked-and-not-yet-cleared)
> level irq is still pending and incorrectly call its handler again.
>
> To fix this, have amd_gpio_irq_handler() check for both interrupts status
> and mask before calling generic_handle_irq().
>
> Note: Is it possible that this bug was the source of the interrupt storm
> on Ryzen when using chained interrupts before commit ba714a9c1dea85
> ("pinctrl/amd: Use regular interrupt instead of chained")?
>
> Signed-off-by: Daniel Kurtz <[email protected]>

Acked-by: Thomas Gleixner <[email protected]>

2018-07-20 23:39:35

by Daniel Kurtz

[permalink] [raw]
Subject: Re: [PATCH 2/2] pinctrl/amd: use byte access to clear irq/wake status bits

Hi Daniel,

On Tue, Jul 17, 2018 at 6:30 AM Daniel Drake <[email protected]> wrote:
>
> On Mon, Jul 16, 2018 at 7:57 PM, Daniel Kurtz <[email protected]> wrote:
> > Commit 6afb10267c1692 ("pinctrl/amd: fix masking of GPIO interrupts")
> > changed to the clearing of interrupt status bits to a RMW in a critical
> > section. This works, but is a bit overkill.
> >
> > The relevant interrupt/wake status bits are in the Most Significant Byte
> > of a 32-bit word. These two are the only write-able bits in this byte.
>
> I don't have the hardware to test this any more, and I also don't have
> any docs to double if those are really the only writable bits, but
> looking at the existing driver code it does seem to be the case.
>
> I think you should retain the comment noting that the value of the
> register may have changed since it was read just a few lines above
> (and hence explaining more precisely why we make the special effort
> just to modify the MSB), just in case there is further rework of this
> code in future and we end up walking into the same trap. It was one of
> those issues that took a frustratingly long time to figure out...

Sounds reasonable. How about:

- /* Clear interrupt.
- * We must read the pin register again, in case the
- * value was changed while executing
- * generic_handle_irq() above.
+ /*
+ * Write-1-to-clear irq/wake status bits in MSByte.
+ * All other bits in this byte are read-only.
+ * This avoids modifying the lower 24-bits
because they may have
+ * changed while executing generic_handle_irq() above.
*/


>
> Thanks
> Daniel

2018-07-24 12:30:50

by Daniel Drake

[permalink] [raw]
Subject: Re: [PATCH 2/2] pinctrl/amd: use byte access to clear irq/wake status bits

On Fri, Jul 20, 2018 at 6:38 PM, Daniel Kurtz <[email protected]> wrote:
> Sounds reasonable. How about:
>
> - /* Clear interrupt.
> - * We must read the pin register again, in case the
> - * value was changed while executing
> - * generic_handle_irq() above.
> + /*
> + * Write-1-to-clear irq/wake status bits in MSByte.
> + * All other bits in this byte are read-only.
> + * This avoids modifying the lower 24-bits
> because they may have
> + * changed while executing generic_handle_irq() above.
> */

That looks good. Thanks

Daniel

2018-07-29 21:06:46

by Linus Walleij

[permalink] [raw]
Subject: Re: [PATCH 1/2] pinctrl/amd: only handle irq if it is pending and unmasked

On Tue, Jul 17, 2018 at 2:57 AM Daniel Kurtz <[email protected]> wrote:

> The AMD pinctrl driver demultiplexes GPIO interrupts and fires off their
> individual handlers.
>
> If one of these GPIO irqs is configured as a level interrupt, and its
> downstream handler is a threaded ONESHOT interrupt, the GPIO interrupt
> source is masked by handle_level_irq() until the eventual return of the
> threaded irq handler. During this time the level GPIO interrupt status
> will still report as high until the actual gpio source is cleared - both
> in the individual GPIO interrupt status bit (INTERRUPT_STS_OFF) and in
> its corresponding "WAKE_INT_STATUS_REG" bit.
>
> Thus, if another GPIO interrupt occurs during this time,
> amd_gpio_irq_handler() will see that the (masked-and-not-yet-cleared)
> level irq is still pending and incorrectly call its handler again.
>
> To fix this, have amd_gpio_irq_handler() check for both interrupts status
> and mask before calling generic_handle_irq().
>
> Note: Is it possible that this bug was the source of the interrupt storm
> on Ryzen when using chained interrupts before commit ba714a9c1dea85
> ("pinctrl/amd: Use regular interrupt instead of chained")?
>
> Signed-off-by: Daniel Kurtz <[email protected]>

Patch applied with TGLX ACK.

Yours,
Linus Walleij