2007-08-08 11:00:28

by Jarek Poplawski

[permalink] [raw]
Subject: [patch] genirq: temporary fix for level-triggered IRQ resend

Hi,

I see there is a bit of complaining on this original resend temporary
patch. But, since it seems to do a good job for some people, here is
my proposal to limit the 'range of fire' a little bit.

Marcin and Jean-Baptiste: try to test this with 2.6.23-rc2, please.
(Unless Ingo or Thomas have other plans with this problem?)

Thanks,
Jarek P.

-------->

Subject: [patch] genirq: temporary fix for level-triggered IRQ resend


Marcin Slusarz reported a ne2k-pci "hung network interface" regression.

delayed disable relies on the ability to re-trigger the interrupt in the
case that a real interrupt happens after the software disable was set.
In this case we actually disable the interrupt on the hardware level
_after_ it occurred.

On enable_irq, we need to re-trigger the interrupt. On i386 this relies
on a hardware resend mechanism (send_IPI_self()).

Actually we only need the resend for edge type interrupts. Level type
interrupts come back once enable_irq() re-enables the interrupt line.

Marcin found that when he disables the irq line on the hardware level
(removing the delayed disable) the card is kept alive.

Thomas Gleixner prepared a testing patch which proved to fix hung ups
after limiting irqs resending to edge type only. Then Ingo Molnar's
version of this patch was applied to 2.6.23-rc2 kernel as a temporary
solution. Since, the problem is still diagnosed, but seems to affect
mainly X86_64 arch, here is a modified, but still temporary, version of
this solution, which should limit the range of warnings and changes in
interrupt handling to minimum.


Signed-off-by: Jarek Poplawski <[email protected]>
Cc: Marcin Slusarz <[email protected]>
Cc: Jean-Baptiste Vignaud <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>

---

diff -Nurp 2.6.23-rc2-/kernel/irq/resend.c 2.6.23-rc2/kernel/irq/resend.c
--- 2.6.23-rc2-/kernel/irq/resend.c 2007-08-08 11:48:15.000000000 +0200
+++ 2.6.23-rc2/kernel/irq/resend.c 2007-08-08 11:58:42.000000000 +0200
@@ -62,18 +62,19 @@ void check_irq_resend(struct irq_desc *d
*/
desc->chip->enable(irq);

+ if ((status & (IRQ_PENDING | IRQ_REPLAY)) == IRQ_PENDING) {
+ desc->status = (status & ~IRQ_PENDING) | IRQ_REPLAY;
+
+#ifdef CONFIG_X86_64
/*
* Temporary hack to figure out more about the problem, which
* is causing the ancient network cards to die.
*/
- if (desc->handle_irq != handle_edge_irq) {
- WARN_ON_ONCE(1);
- return;
- }
-
- if ((status & (IRQ_PENDING | IRQ_REPLAY)) == IRQ_PENDING) {
- desc->status = (status & ~IRQ_PENDING) | IRQ_REPLAY;
-
+ if (desc->handle_irq == handle_fasteoi_irq) {
+ WARN_ON_ONCE(1);
+ return;
+ }
+#endif
if (!desc->chip || !desc->chip->retrigger ||
!desc->chip->retrigger(irq)) {
#ifdef CONFIG_HARDIRQS_SW_RESEND


2007-08-09 08:13:40

by Jean-Baptiste Vignaud

[permalink] [raw]
Subject: Re:[patch] genirq: temporary fix for level-triggered IRQ resend

> Hi,
>
> I see there is a bit of complaining on this original resend temporary
> patch. But, since it seems to do a good job for some people, here is
> my proposal to limit the 'range of fire' a little bit.
>
> Marcin and Jean-Baptiste: try to test this with 2.6.23-rc2, please.
> (Unless Ingo or Thomas have other plans with this problem?)

2.6.23-rc2 + this patch, and the box's cards are still networking after 20 hours.

RX bytes:452423991847 (421.3 GiB) TX bytes:13464471620 (12.5 GiB)

still testing.

Jb