2023-05-30 22:27:14

by James Gowans

Subject: [PATCH 1/2] genirq: Expand doc for PENDING and REPLAY flags

Add a bit more info about what the flags are used for; this may help
future code readers.

Signed-off-by: James Gowans <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Liao Chang <[email protected]>
---
kernel/irq/internals.h | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/kernel/irq/internals.h b/kernel/irq/internals.h
index 5fdc0b557579..c443a0ddc07e 100644
--- a/kernel/irq/internals.h
+++ b/kernel/irq/internals.h
@@ -47,9 +47,12 @@ enum {
* detection
* IRQS_POLL_INPROGRESS - polling in progress
* IRQS_ONESHOT - irq is not unmasked in primary handler
- * IRQS_REPLAY - irq is replayed
+ * IRQS_REPLAY - irq has been resent and will not be resent
+ * again until the handler has run and cleared
+ * this flag.
* IRQS_WAITING - irq is waiting
- * IRQS_PENDING - irq is pending and replayed later
+ * IRQS_PENDING - irq needs to be resent and should be resent
+ * at the next available opportunity.
* IRQS_SUSPENDED - irq is suspended
* IRQS_NMI - irq line is used to deliver NMIs
* IRQS_SYSFS - descriptor has been added to sysfs
--
2.25.1
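
As context for these flag comments: the machinery that consumes them lives
in kernel/irq/resend.c. A simplified, from-memory sketch of
check_irq_resend() (a paraphrase, not the verbatim kernel source; details
may differ):

	int check_irq_resend(struct irq_desc *desc, bool inject)
	{
		int err = 0;

		/* Level interrupts are re-raised by hardware while active */
		if (irq_settings_is_level(desc)) {
			desc->istate &= ~IRQS_PENDING;
			return -EINVAL;
		}

		/* A resend is already in flight; don't queue another one */
		if (desc->istate & IRQS_REPLAY)
			return -EBUSY;

		if (!(desc->istate & IRQS_PENDING) && !inject)
			return 0;

		desc->istate &= ~IRQS_PENDING;

		/*
		 * Retrigger in hardware if the chip supports it, else
		 * fall back to a software resend.
		 */
		if (!try_retrigger(desc))
			err = irq_sw_resend(desc);

		/* Block further resends until a flow handler clears REPLAY */
		if (!err)
			desc->istate |= IRQS_REPLAY;

		return err;
	}

IRQS_REPLAY is cleared by the flow handlers themselves (the
"desc->istate &= ~(IRQS_REPLAY | IRQS_WAITING);" line visible in the diff
in the next patch), which is the "until the handler has run" part of the
new comment.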



2023-05-30 22:36:05

by James Gowans

Subject: [PATCH 2/2] genirq: fasteoi resends interrupt on concurrent invoke

Update the generic handle_fasteoi_irq to cater for the case when the
next interrupt comes in while the previous handler is still running.
Currently when that happens the irq_may_run() early out causes the next
IRQ to be lost. Change the behaviour to mark the interrupt as pending
and re-send the interrupt when handle_fasteoi_irq sees that the pending
flag has been set. This is largely inspired by handle_edge_irq.

Generally it should not be possible for the next interrupt to arrive
while the previous handler is still running: the next interrupt should
only arrive after the EOI message has been sent and the previous handler
has returned. However, there is a race where if the interrupt affinity
is changed while the previous handler is running, then the next
interrupt can arrive at a different CPU while the previous handler is
still running. In that case there will be a concurrent invoke and the
early out will be taken.

For example:

CPU 0 | CPU 1
-----------------------------|-----------------------------
interrupt start |
handle_fasteoi_irq | set_affinity(CPU 1)
handler |
... | interrupt start
... | handle_fasteoi_irq -> early out
handle_fasteoi_irq return | interrupt end
interrupt end |

This issue was observed specifically on an arm64 system with a GIC-v3
handling MSIs; GIC-v3 uses the handle_fasteoi_irq handler. The issue is
that the global ITS is responsible for affinity but does not know
whether interrupts are pending/running, only the CPU-local redistributor
handles the EOI. Hence when the affinity is changed in the ITS, the new
CPU's redistributor does not know that the original CPU is still running
the handler.

Implementation notes:

It is believed that it's NOT necessary to mask the interrupt in
handle_fasteoi_irq() the way that handle_edge_irq() does. This is
because handle_edge_irq() caters for controllers which are too simple to
gate interrupts from the same source, so the kernel explicitly masks the
interrupt if it re-occurs [0].

[0] https://lore.kernel.org/all/[email protected]/
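
For comparison, the handle_edge_irq() behaviour referred to above looks
roughly like this (a from-memory sketch of the relevant kernel/irq/chip.c
early-out, not a verbatim quote):

	if (!irq_may_run(desc)) {
		desc->istate |= IRQS_PENDING;
		/*
		 * Simple edge chips cannot gate a re-fire from the same
		 * source, so the interrupt is masked here and unmasked
		 * again once the pending occurrence is handled.
		 */
		mask_ack_irq(desc);
		goto out_unlock;
	}

handle_fasteoi_irq() deliberately skips the mask_ack_irq() step, for the
reason given above.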

Suggested-by: Liao Chang <[email protected]>
Signed-off-by: James Gowans <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: KarimAllah Raslan <[email protected]>
Cc: Yipeng Zou <[email protected]>
Cc: Zhang Jianhua <[email protected]>
---
kernel/irq/chip.c | 15 ++++++++++++++-
1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
index 49e7bc871fec..42f33e77c16b 100644
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -692,8 +692,15 @@ void handle_fasteoi_irq(struct irq_desc *desc)

raw_spin_lock(&desc->lock);

- if (!irq_may_run(desc))
+ /*
+ * When an affinity change races with IRQ delivery, the next interrupt
+ * can arrive on the new CPU before the original CPU has completed
+ * handling the previous one. Mark it as pending and return EOI.
+ */
+ if (!irq_may_run(desc)) {
+ desc->istate |= IRQS_PENDING;
goto out;
+ }

desc->istate &= ~(IRQS_REPLAY | IRQS_WAITING);

@@ -715,6 +722,12 @@ void handle_fasteoi_irq(struct irq_desc *desc)

cond_unmask_eoi_irq(desc, chip);

+ /*
+ * When the race descibed above happens, this will resend the interrupt.
+ */
+ if (unlikely(desc->istate & IRQS_PENDING))
+ check_irq_resend(desc, false);
+
raw_spin_unlock(&desc->lock);
return;
out:
--
2.25.1


2023-05-30 23:53:58

by Randy Dunlap

Subject: Re: [PATCH 2/2] genirq: fasteoi resends interrupt on concurrent invoke

Hi--

On 5/30/23 14:38, James Gowans wrote:

> ---
> kernel/irq/chip.c | 15 ++++++++++++++-
> 1 file changed, 14 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
> index 49e7bc871fec..42f33e77c16b 100644
> --- a/kernel/irq/chip.c
> +++ b/kernel/irq/chip.c
> @@ -692,8 +692,15 @@ void handle_fasteoi_irq(struct irq_desc *desc)
>
> raw_spin_lock(&desc->lock);
>
> - if (!irq_may_run(desc))
> + /*
> + * When an affinity change races with IRQ delivery, the next interrupt
> + * can arrive on the new CPU before the original CPU has completed
> + * handling the previous one. Mark it as pending and return EOI.
> + */
> + if (!irq_may_run(desc)) {
> + desc->istate |= IRQS_PENDING;
> goto out;
> + }
>
> desc->istate &= ~(IRQS_REPLAY | IRQS_WAITING);
>
> @@ -715,6 +722,12 @@ void handle_fasteoi_irq(struct irq_desc *desc)
>
> cond_unmask_eoi_irq(desc, chip);
>
> + /*
> + * When the race descibed above happens, this will resend the interrupt.

described

> + */
> + if (unlikely(desc->istate & IRQS_PENDING))
> + check_irq_resend(desc, false);
> +
> raw_spin_unlock(&desc->lock);
> return;
> out:

--
~Randy

2023-05-31 07:03:57

by Marc Zyngier

Subject: Re: [PATCH 2/2] genirq: fasteoi resends interrupt on concurrent invoke

On Tue, 30 May 2023 22:38:48 +0100,
James Gowans <[email protected]> wrote:
>
> Update the generic handle_fasteoi_irq to cater for the case when the
> next interrupt comes in while the previous handler is still running.
> Currently when that happens the irq_may_run() early out causes the next
> IRQ to be lost. Change the behaviour to mark the interrupt as pending
> and re-send the interrupt when handle_fasteoi_irq sees that the pending
> flag has been set. This is largely inspired by handle_edge_irq.
>
> Generally it should not be possible for the next interrupt to arrive
> while the previous handler is still running: the next interrupt should
> only arrive after the EOI message has been sent and the previous handler
> has returned.

There is no such message with LPIs. I pointed that out previously.

> However, there is a race where if the interrupt affinity
> is changed while the previous handler is running, then the next
> interrupt can arrive at a different CPU while the previous handler is
> still running. In that case there will be a concurrent invoke and the
> early out will be taken.
>
> For example:
>
> CPU 0 | CPU 1
> -----------------------------|-----------------------------
> interrupt start |
> handle_fasteoi_irq | set_affinity(CPU 1)
> handler |
> ... | interrupt start
> ... | handle_fasteoi_irq -> early out
> handle_fasteoi_irq return | interrupt end
> interrupt end |
>
> This issue was observed specifically on an arm64 system with a GIC-v3
> handling MSIs; GIC-v3 uses the handle_fasteoi_irq handler. The issue is
> that the global ITS is responsible for affinity but does not know
> whether interrupts are pending/running, only the CPU-local redistributor
> handles the EOI. Hence when the affinity is changed in the ITS, the new
> CPU's redistributor does not know that the original CPU is still running
> the handler.

Similar to your previous patch, you don't explain *why* the interrupt
gets delivered when it is an LPI, and not for any of the other GICv3
interrupt types. That's an important point.

>
> Implementation notes:
>
> It is believed that it's NOT necessary to mask the interrupt in
> handle_fasteoi_irq() the way that handle_edge_irq() does. This is
> because handle_edge_irq() caters for controllers which are too simple to
> gate interrupts from the same source, so the kernel explicitly masks the
> interrupt if it re-occurs [0].
>
> [0] https://lore.kernel.org/all/[email protected]/
>
> Suggested-by: Liao Chang <[email protected]>
> Signed-off-by: James Gowans <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Marc Zyngier <[email protected]>
> Cc: KarimAllah Raslan <[email protected]>
> Cc: Yipeng Zou <[email protected]>
> Cc: Zhang Jianhua <[email protected]>
> ---
> kernel/irq/chip.c | 15 ++++++++++++++-
> 1 file changed, 14 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
> index 49e7bc871fec..42f33e77c16b 100644
> --- a/kernel/irq/chip.c
> +++ b/kernel/irq/chip.c
> @@ -692,8 +692,15 @@ void handle_fasteoi_irq(struct irq_desc *desc)
>
> raw_spin_lock(&desc->lock);
>
> - if (!irq_may_run(desc))
> + /*
> + * When an affinity change races with IRQ delivery, the next interrupt
> + * can arrive on the new CPU before the original CPU has completed
> + * handling the previous one. Mark it as pending and return EOI.
> + */
> + if (!irq_may_run(desc)) {
> + desc->istate |= IRQS_PENDING;
> goto out;
> + }
>
> desc->istate &= ~(IRQS_REPLAY | IRQS_WAITING);
>
> @@ -715,6 +722,12 @@ void handle_fasteoi_irq(struct irq_desc *desc)
>
> cond_unmask_eoi_irq(desc, chip);
>
> + /*
> + * When the race descibed above happens, this will resend the interrupt.
> + */
> + if (unlikely(desc->istate & IRQS_PENDING))
> + check_irq_resend(desc, false);
> +
> raw_spin_unlock(&desc->lock);
> return;
> out:

While I'm glad that you eventually decided to use the resend mechanism
instead of spinning on the "old" CPU, I still think imposing this
behaviour on all users without any discrimination is wrong.

Look at what it does if an interrupt is a wake-up source. You'd
pointlessly requeue the interrupt (bonus points if the irqchip doesn't
provide a HW-based retrigger mechanism).

I still maintain that this change should only be applied for the
particular interrupts that *require* it, and not as a blanket change
affecting everything under the sun. I have proposed such a change in
the past, feel free to use it or roll your own.

In the meantime, I strongly oppose this change as proposed.

Thanks,

M.

--
Without deviation from the norm, progress is not possible.

2023-06-01 07:27:49

by James Gowans

Subject: Re: [PATCH 2/2] genirq: fasteoi resends interrupt on concurrent invoke

On Wed, 2023-05-31 at 08:00 +0100, Marc Zyngier wrote:
> > Generally it should not be possible for the next interrupt to arrive
> > while the previous handler is still running: the next interrupt should
> > only arrive after the EOI message has been sent and the previous handler
> > has returned.
>
> There is no such message with LPIs. I pointed that out previously.

Arg, thanks, I'll re-word this to:

"Generally it should not be possible for the next interrupt to arrive
while the previous handler is still running: the CPU will not preempt an
interrupt with another from the same source or same priority."

I hope that's more accurate?

> > This issue was observed specifically on an arm64 system with a GIC-v3
> > handling MSIs; GIC-v3 uses the handle_fasteoi_irq handler. The issue is
> > that the global ITS is responsible for affinity but does not know
> > whether interrupts are pending/running, only the CPU-local redistributor
> > handles the EOI. Hence when the affinity is changed in the ITS, the new
> > CPU's redistributor does not know that the original CPU is still running
> > the handler.
>
> Similar to your previous patch, you don't explain *why* the interrupt
> gets delivered when it is an LPI, and not for any of the other GICv3
> interrupt types. That's an important point.

Right, you pointed out the issue with this sentence too and I missed
updating it. :-/ How about:

"This issue was observed specifically on an arm64 system with a GIC-v3
handling MSIs; GIC-v3 uses the handle_fasteoi_irq handler. The issue is
that the GIC-v3's physical LPIs do not have a global active state. If LPIs
had an active state, then the interrupt would not be able to be
retriggered until the first CPU had issued a deactivation."

>
> >
> > + /*
> > + * When the race descibed above happens, this will resend the interrupt.
> > + */
> > + if (unlikely(desc->istate & IRQS_PENDING))
> > + check_irq_resend(desc, false);
> > +
> > raw_spin_unlock(&desc->lock);
> > return;
> > out:
>
> While I'm glad that you eventually decided to use the resend mechanism
> instead of spinning on the "old" CPU, I still think imposing this
> behaviour on all users without any discrimination is wrong.
>
> Look at what it does if an interrupt is a wake-up source. You'd
> pointlessly requeue the interrupt (bonus points if the irqchip doesn't
> provide a HW-based retrigger mechanism).
>
> I still maintain that this change should only be applied for the
> particular interrupts that *require* it, and not as a blanket change
> affecting everything under the sun. I have proposed such a change in
> the past, feel free to use it or roll your own.

Thanks for the example of where this blanket functionality wouldn't be
desired - I'll re-work this to introduce and use
the IRQD_RESEND_WHEN_IN_PROGRESS flag as you originally suggested.
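
Sketched out, that per-interrupt gating in the early-out path might look
like the following (irqd_needs_resend_when_in_progress() is an assumed
accessor name for the new flag, not something defined in this thread):

	if (!irq_may_run(desc)) {
		/* Only interrupts that opted in get flagged for a resend */
		if (irqd_needs_resend_when_in_progress(&desc->irq_data))
			desc->istate |= IRQS_PENDING;
		goto out;
	}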

Just one more thing before I post V3: are you okay with doing the resend
here *after* the handler has finished running, and using the IRQS_PENDING
flag to know to resend it? Or would you like it to be resent in
the !irq_may_run(desc) block as you suggested?

I have a slight preference to do it after, only when we know it's ready to
be run again; that way there's no need to modify check_irq_resend() to
cater for multiple retries.
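
To make the multiple-retries concern concrete, here is a hypothetical
sketch of resending from the early-out path instead (not what either
version of the patch proposes):

	if (!irq_may_run(desc)) {
		/*
		 * First concurrent invoke: check_irq_resend() succeeds and
		 * sets IRQS_REPLAY. If the resent interrupt arrives while
		 * the original handler is *still* running, this path runs
		 * again, but check_irq_resend() now bails with -EBUSY
		 * because IRQS_REPLAY is only cleared once a handler
		 * actually runs - so the interrupt would be dropped unless
		 * check_irq_resend() learned to retry.
		 */
		check_irq_resend(desc, false);
		goto out;
	}

Resending only after the handler completes sidesteps that, because the
handler invocation has already cleared IRQS_REPLAY by then.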

JG

2023-06-05 16:42:52

by James Gowans

Subject: Re: [PATCH 2/2] genirq: fasteoi resends interrupt on concurrent invoke

Hi Marc,

On Thu, 2023-06-01 at 09:24 +0200, James Gowans wrote:
> > While I'm glad that you eventually decided to use the resend mechanism
> > instead of spinning on the "old" CPU, I still think imposing this
> > behaviour on all users without any discrimination is wrong.
> >
> > Look at what it does if an interrupt is a wake-up source. You'd
> > pointlessly requeue the interrupt (bonus points if the irqchip doesn't
> > provide a HW-based retrigger mechanism).
> >
> > I still maintain that this change should only be applied for the
> > particular interrupts that *require* it, and not as a blanket change
> > affecting everything under the sun. I have proposed such a change in
> > the past, feel free to use it or roll your own.
>
> Thanks for the example of where this blanket functionality wouldn't be
> desired - I'll re-work this to introduce and use
> the IRQD_RESEND_WHEN_IN_PROGRESS flag as you originally suggested.
>
> Just one more thing before I post V3: are you okay with doing the resend
> here *after* the handler has finished running, and using the IRQS_PENDING
> flag to know to resend it? Or would you like it to be resent in
> the !irq_may_run(desc) block as you suggested?
>
> I have a slight preference to do it after, only when we know it's ready to
> be run again; that way there's no need to modify check_irq_resend() to
> cater for multiple retries.

Hoping/assuming that you're okay with keeping the resend at the end of
the function, here's V3:
https://lore.kernel.org/lkml/[email protected]/

It's now very close to what you suggested originally. :-)

One thing: I'm not totally sure that it's necessary or correct to set this
flag in its_vpe_irq_domain_alloc() too - on my system (r6g.metal Graviton
2 host) that function doesn't seem to be invoked. My guess is that it's
for a GIC-v4 system which supports posted interrupts: those are generally
delivered directly to the vCPU if it is running, but sometimes need to be
delivered to the hypervisor (for example if the vCPU is not running). I
can try to test this on r7g.metal, which might have GIC-v4, to be sure...
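
For reference, the opt-in in the V3 linked above boils down to tagging
each LPI's irq_data in the ITS driver's alloc paths, roughly (paraphrased,
not the exact V3 hunk):

	/* For each LPI being allocated, e.g. in its_irq_domain_alloc(): */
	irqd_set_resend_when_in_progress(irq_get_irq_data(virq + i));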

JG