From: "H. Peter Anvin (Intel)" <[email protected]>
The current IRQ vector allocation code should be "clean" and never
issue a IRQ_MOVE_CLEANUP_VECTOR IPI for an interrupt that could still
be pending. This should make it possible to move it to the "normal"
system IRQ vector range. This should probably be a three-step process:
1. Introduce this WARN_ONCE() on this event ever occurring.
2. Remove the self-IPI hack.
3. Move the IRQ_MOVE_CLEANUP_VECTOR to the sysvec range.
This implements step 1.
[ Previous versions of this patch had steps 2 and 3 reversed. ]
Suggested-by: Thomas Gleixner <[email protected]>
Signed-off-by: H. Peter Anvin (Intel) <[email protected]>
---
arch/x86/kernel/apic/vector.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index 6dbdc7c22bb7..7ba2982a3585 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -939,9 +939,14 @@ DEFINE_IDTENTRY_SYSVEC(sysvec_irq_move_cleanup)
* to this CPU. IRQ_MOVE_CLEANUP_VECTOR is the lowest
* priority external vector, so on return from this
* interrupt the device interrupt will happen first.
+ *
+ * *** This should never happen with the current IRQ
+ * cleanup code, so WARN_ONCE() for now, and
+ * eventually get rid of this hack.
*/
irr = apic_read(APIC_IRR + (vector / 32 * 0x10));
if (irr & (1U << (vector % 32))) {
+ WARN_ONCE(1, "irq_move_cleanup called on still pending interrupt\n");
apic->send_IPI_self(IRQ_MOVE_CLEANUP_VECTOR);
continue;
}
--
2.31.1
On Wed, May 19 2021 at 14:21, H. Peter Anvin wrote:
> The current IRQ vector allocation code should be "clean" and never
> issue a IRQ_MOVE_CLEANUP_VECTOR IPI for an interrupt that could still
> be pending. This should make it possible to move it to the "normal"
> system IRQ vector range. This should probably be a three-step process:
>
> 1. Introduce this WARN_ONCE() on this event ever occurring.
> 2. Remove the self-IPI hack.
> 3. Move the IRQ_MOVE_CLEANUP_VECTOR to the sysvec range.
Second thoughts on this.
Despite having a halfways sane mechanism now, that warning still can be
triggered for remapped interrupts which can be moved from task context.
Interrupt is bound to CPU1 and moved to CPU2.
CPU0 CPU1 CPU2 Device
set_affinity() local_irq_disable()
Raise interrupt
-> pending in IRR on CPU1
remap to CPU2
(after this point all interrupts will go to CPU2)
Raise interrupt
on CPU2
handle_irq()
send_cleanup_ipi()
local_irq_enable()
handle_irq()
handle_cleanup_ipi()
Now if we move the cleanup vector up (higher priority) then CPU1 will
have:
local_irq_enable()
handle_cleanup_ipi()
WARN(irq set in IRR)
Unlikely but bound to happen.
Adding the WARN_ON() to the current code is harmless because it can't
trigger. Let me think some more about that.
Thanks,
tglx