2006-03-06 16:40:53

by Vivek Goyal

Subject: [RFC][PATCH] kdump: x86_64 timer interrupt lockup due to pending interrupt


o The check_timer() routine fails while the second kernel is booting after a
crash on an Opteron box. The problem happens because the timer vector (0x31)
seems to be locked.

o After a system crash, it is not safe to service interrupts any more, hence
interrupts are disabled. This leads to pending interrupts at the LAPIC. The
LAPIC sends these interrupts to the CPU during early boot of the second
kernel. Other pending interrupts are discarded as unexpected traps, but the
timer interrupt is serviced and the CPU does not issue an LAPIC EOI because
it thinks this interrupt came from the i8259 and sends the ack to the 8259
instead. This leaves vector 0x31 locked, as the LAPIC does not clear the
corresponding ISR bit and keeps waiting for an EOI.

o In this patch, one extra EOI is issued in check_timer() to unlock the
vector. Please suggest if there is a better way to handle this situation.

Signed-off-by: Vivek Goyal <[email protected]>
---

arch/x86_64/kernel/io_apic.c | 15 +++++++++++++++
1 files changed, 15 insertions(+)

diff -puN arch/x86_64/kernel/io_apic.c~x86_64-kdump-pending-timer-interrupt-fix arch/x86_64/kernel/io_apic.c
--- linux-2.6.16-rc5-16M/arch/x86_64/kernel/io_apic.c~x86_64-kdump-pending-timer-interrupt-fix 2006-03-03 12:25:51.000000000 -0500
+++ linux-2.6.16-rc5-16M-root/arch/x86_64/kernel/io_apic.c 2006-03-03 12:32:05.000000000 -0500
@@ -1809,6 +1809,21 @@ static inline void check_timer(void)
if (timer_over_8254 > 0)
enable_8259A_irq(0);

+#ifdef CONFIG_CRASH_DUMP
+ /*
+ * After a crash, we no longer service the interrupts and a pending
+ * timer interrupt (0x31) from previous kernel might still have ISR
+ * bit set. Most probably by now CPU has serviced that pending
+ * interrupt and it did not do ack_APIC_irq() because it thought,
+ * interrupt came from i8259 as ExtInt. LAPIC did not get EOI so it
+ * does not clear the ISR bit and cpu thinks it has already serviced
+ * the interrupt. Hence vector 0x31 is locked. Issue an extra EOI to
+ * LAPIC to unlock.
+ */
+ if (!disable_apic)
+ ack_APIC_irq();
+#endif
+
pin1 = find_isa_irq_pin(0, mp_INT);
apic1 = find_isa_irq_apic(0, mp_INT);
pin2 = ioapic_i8259.pin;
_


2006-03-06 21:43:41

by Andi Kleen

Subject: Re: [RFC][PATCH] kdump: x86_64 timer interrupt lockup due to pending interrupt

On Mon, Mar 06, 2006 at 11:40:34AM -0500, Vivek Goyal wrote:
>
> o check_timer() routine fails while second kernel is booting after a crash
> on an Opteron box. Problem happens because timer vector (0x31) seems to be
> locked.
>
> o After a system crash, it is not safe to service interrupts any more, hence
> interrupts are disabled. This leads to pending interrupts at LAPIC. LAPIC
> sends these interrupts to the CPU during early boot of second kernel. Other
> pending interrupts are discarded saying unexpected trap but timer interrupt
> is serviced and CPU does not issue an LAPIC EOI because it thinks this
> interrupt came from i8259 and sends ack to 8259. This leads to vector 0x31
> locking as LAPIC does not clear respective ISR and keeps on waiting for
> EOI.
>
> o In this patch, one extra EOI is being issued in check_timer() to unlock the
> vector. Please suggest if there is a better way to handle this situation.

Shouldn't we rather do this for all interrupts when the APIC is set up?
I don't see how the timer is special here.

-Andi

2006-03-07 22:21:13

by Vivek Goyal

Subject: Re: [RFC][PATCH] kdump: x86_64 timer interrupt lockup due to pending interrupt

On Mon, Mar 06, 2006 at 10:43:32PM +0100, Andi Kleen wrote:
> On Mon, Mar 06, 2006 at 11:40:34AM -0500, Vivek Goyal wrote:
> >
> > o check_timer() routine fails while second kernel is booting after a crash
> > on an Opteron box. Problem happens because timer vector (0x31) seems to be
> > locked.
> >
> > o After a system crash, it is not safe to service interrupts any more, hence
> > interrupts are disabled. This leads to pending interrupts at LAPIC. LAPIC
> > sends these interrupts to the CPU during early boot of second kernel. Other
> > pending interrupts are discarded saying unexpected trap but timer interrupt
> > is serviced and CPU does not issue an LAPIC EOI because it thinks this
> > interrupt came from i8259 and sends ack to 8259. This leads to vector 0x31
> > locking as LAPIC does not clear respective ISR and keeps on waiting for
> > EOI.
> >
> > o In this patch, one extra EOI is being issued in check_timer() to unlock the
> > vector. Please suggest if there is a better way to handle this situation.
>
> Shouldn't we rather do this for all interrupts when the APIC is set up?
> I don't see how the timer is special here.
>

The timer is a special case here.

In the other cases, the moment interrupts are enabled on the CPU, the LAPIC
pushes pending interrupts to the CPU and they are ignored as bad IRQs via
ack_bad_irq(). This still sends an EOI to the LAPIC if LAPIC support is
compiled in.

But for the timer, the moment the pending interrupt is pushed to the CPU, it
is handled as a valid interrupt; the CPU assumes that it came from the 8259
and sends the ack to the 8259 and not to the LAPIC. This leads to a missing
EOI for the timer vector and a deadlock.

Still, doing it in a generic manner for all interrupts while setting up the
LAPIC probably makes more sense. Please find the patch attached.




o The check_timer() routine fails while the second kernel is booting after a
crash on an Opteron box. The problem happens because the timer vector (0x31)
seems to be locked.

o After a system crash, it is not safe to service interrupts any more, hence
interrupts are disabled. This leads to pending interrupts at the LAPIC. The
LAPIC sends these interrupts to the CPU during early boot of the second
kernel. Other pending interrupts are discarded as unexpected traps, but the
timer interrupt is serviced and the CPU does not issue an LAPIC EOI because
it thinks this interrupt came from the i8259 and sends the ack to the 8259
instead. This leaves vector 0x31 locked, as the LAPIC does not clear the
corresponding ISR bit and keeps waiting for an EOI.

o This patch issues an extra EOI for each pending interrupt that has its ISR
bit set.

o Today only the timer seems to be a special case, because in early boot the
kernel thinks its interrupts are coming from the i8259, uses
mask_and_ack_8259A() as the ack handler, and does not issue an LAPIC EOI.
Still, doing it in a generic manner for all vectors probably makes sense.

Signed-off-by: Vivek Goyal <[email protected]>
---

arch/x86_64/kernel/apic.c | 21 +++++++++++++++++++++
include/asm-x86_64/apicdef.h | 1 +
2 files changed, 22 insertions(+)

diff -puN arch/x86_64/kernel/apic.c~x86_64-pending-interrupt-fix arch/x86_64/kernel/apic.c
--- linux-2.6.16-rc5-16M/arch/x86_64/kernel/apic.c~x86_64-pending-interrupt-fix 2006-03-07 15:37:49.000000000 -0500
+++ linux-2.6.16-rc5-16M-root/arch/x86_64/kernel/apic.c 2006-03-07 16:25:11.000000000 -0500
@@ -342,6 +342,7 @@ void __init init_bsp_APIC(void)
void __cpuinit setup_local_APIC (void)
{
unsigned int value, maxlvt;
+ int i, j;

value = apic_read(APIC_LVR);

@@ -370,6 +371,26 @@ void __cpuinit setup_local_APIC (void)
value &= ~APIC_TPRI_MASK;
apic_write(APIC_TASKPRI, value);

+#ifdef CONFIG_CRASH_DUMP
+ /*
+ * After a crash, we no longer service the interrupts and a pending
+ * interrupt from previous kernel might still have ISR bit set.
+ *
+ * Most probably by now CPU has serviced that pending interrupt and
+ * it might not have done the ack_APIC_irq() because it thought,
+ * interrupt came from i8259 as ExtInt. LAPIC did not get EOI so it
+ * does not clear the ISR bit and cpu thinks it has already serviced
+ * the interrupt. Hence a vector might get locked. It was noticed
+ * for timer irq (vector 0x31). Issue an extra EOI to clear ISR.
+ */
+ for (i = APIC_ISR_NR - 1; i >= 0; i--) {
+ value = apic_read(APIC_ISR + i*0x10);
+ for (j = 31; j >= 0; j--) {
+ if (value & (1<<j))
+ ack_APIC_irq();
+ }
+ }
+#endif
/*
* Now that we are all set up, enable the APIC
*/
diff -puN include/asm-x86_64/apicdef.h~x86_64-pending-interrupt-fix include/asm-x86_64/apicdef.h
--- linux-2.6.16-rc5-16M/include/asm-x86_64/apicdef.h~x86_64-pending-interrupt-fix 2006-03-07 15:56:10.000000000 -0500
+++ linux-2.6.16-rc5-16M-root/include/asm-x86_64/apicdef.h 2006-03-07 16:27:45.000000000 -0500
@@ -39,6 +39,7 @@
#define APIC_SPIV_FOCUS_DISABLED (1<<9)
#define APIC_SPIV_APIC_ENABLED (1<<8)
#define APIC_ISR 0x100
+#define APIC_ISR_NR 0x8 /* Number of 32 bit ISR registers. */
#define APIC_TMR 0x180
#define APIC_IRR 0x200
#define APIC_ESR 0x280
_

2006-03-07 23:45:52

by Eric W. Biederman

Subject: Re: [RFC][PATCH] kdump: x86_64 timer interrupt lockup due to pending interrupt

Vivek Goyal <[email protected]> writes:

> On Mon, Mar 06, 2006 at 10:43:32PM +0100, Andi Kleen wrote:
>> On Mon, Mar 06, 2006 at 11:40:34AM -0500, Vivek Goyal wrote:
>> >
>> > o check_timer() routine fails while second kernel is booting after a crash
>> > on an Opteron box. Problem happens because timer vector (0x31) seems to be
>> > locked.
>> >
>> > o After a system crash, it is not safe to service interrupts any more, hence
>> > interrupts are disabled. This leads to pending interrupts at LAPIC. LAPIC
>> > sends these interrupts to the CPU during early boot of second kernel. Other
>> > pending interrupts are discarded saying unexpected trap but timer interrupt
>> > is serviced and CPU does not issue an LAPIC EOI because it thinks this
>> > interrupt came from i8259 and sends ack to 8259. This leads to vector 0x31
>> > locking as LAPIC does not clear respective ISR and keeps on waiting for
>> > EOI.
>> >
>> > o In this patch, one extra EOI is being issued in check_timer() to unlock the
>> > vector. Please suggest if there is a better way to handle this situation.
>>
>> Shouldn't we rather do this for all interrupts when the APIC is set up?
>> I don't see how the timer is special here.
>>
>
> Timer is a special case here.
>
> In other cases, the moment interrupts are enabled on cpu, LAPIC pushes pending
> interrupts to cpu and it is ignored as bad irq using ack_bad_irq(). This
> still sends EOI to LAPIC if LAPIC support is compiled in.
>
> But for timer, the moment pending interrupt is pushed to cpu, it is handled
> as valid interrupt and cpu assumes that it came from 8259 and sends ack to
> 8259 and not to LAPIC. Hence leads to missing EOI for timer vector and
> deadlock.
>
> But still doing it generic manner for all interrupts while setting up LAPIC
> probably makes more sense. Please find attached the patch.

A couple of questions.

Does this need to be in #ifdef CONFIG_CRASH_DUMP?
If this code is truly safe I expect we could run it on every bootup
simply to be more robust.

Why is APIC_ISR_NR a hard code? I think there is an apic register
that tells the count.

Does ack_APIC_irq take an argument? I am confused that we are calling
ack_APIC_irq() potentially 8*32 times without passing it anything.


Eric

2006-03-08 01:27:14

by Vivek Goyal

Subject: Re: [RFC][PATCH] kdump: x86_64 timer interrupt lockup due to pending interrupt

On Tue, Mar 07, 2006 at 04:43:07PM -0700, Eric W. Biederman wrote:
> Vivek Goyal <[email protected]> writes:
> > On Mon, Mar 06, 2006 at 10:43:32PM +0100, Andi Kleen wrote:
> >> On Mon, Mar 06, 2006 at 11:40:34AM -0500, Vivek Goyal wrote:
> >> >

[..]
> >> >
> >> > o In this patch, one extra EOI is being issued in check_timer() to unlock the
> >> > vector. Please suggest if there is a better way to handle this situation.
> >>
> >> Shouldn't we rather do this for all interrupts when the APIC is set up?
> >> I don't see how the timer is special here.
> >>
> >
> > Timer is a special case here.
> >
> > In other cases, the moment interrupts are enabled on cpu, LAPIC pushes pending
> > interrupts to cpu and it is ignored as bad irq using ack_bad_irq(). This
> > still sends EOI to LAPIC if LAPIC support is compiled in.
> >
> > But for timer, the moment pending interrupt is pushed to cpu, it is handled
> > as valid interrupt and cpu assumes that it came from 8259 and sends ack to
> > 8259 and not to LAPIC. Hence leads to missing EOI for timer vector and
> > deadlock.
> >
> > But still doing it generic manner for all interrupts while setting up LAPIC
> > probably makes more sense. Please find attached the patch.
>
> A couple of questions.
>
> Does this need to be in #ifdef CONFIG_CRASH_DUMP?
> If this code is truly safe I expect we could run it on every bootup
> simply to be more robust.
>

AFAIK, we can run this code safely on every bootup and can get rid of
CONFIG_CRASH_DUMP. I had simply put it under that option because I observed
the problem only in crashdump scenarios. But removing it should be good, as
it protects against buggy boards. Modified patch is attached.


> Why is APIC_ISR_NR a hard code? I think there is an apic register
> that tells the count.
>

I did not find any such register. Basically the ISR is a 256-bit register. We
are reading 32 bits at a time, so logically we can view it as eight 32-bit
registers. I had two options: either put a constant number in the for()
loop or #define it. I chose the latter.
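
To make the layout concrete, here is a minimal user-space sketch of how a
vector number maps onto the eight 32-bit ISR words. The helper names
isr_reg_offset() and isr_bit_mask() are made up for illustration and are not
kernel functions; the 0x10 stride between words is taken from the
apic_read(APIC_ISR + i*0x10) expression in the patch.

```c
#include <assert.h>
#include <stdint.h>

#define APIC_ISR     0x100  /* offset of the first ISR word, as in apicdef.h */
#define APIC_ISR_NR  0x8    /* eight 32-bit words cover all 256 vectors */

/* Hypothetical helper: offset of the 32-bit ISR word that holds `vector`. */
static inline uint32_t isr_reg_offset(unsigned int vector)
{
	assert(vector < 32 * APIC_ISR_NR);
	return APIC_ISR + (vector / 32) * 0x10;
}

/* Hypothetical helper: mask of the bit for `vector` inside that word. */
static inline uint32_t isr_bit_mask(unsigned int vector)
{
	assert(vector < 32 * APIC_ISR_NR);
	return UINT32_C(1) << (vector % 32);
}
```

With these definitions the timer vector 0x31 (decimal 49) falls in word 1,
i.e. offset 0x110, at bit 17.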

> Does ack_APIC_irq take an argument? I am confused that we are calling
> ack_APIC_irq() potentially 8*32 times without passing it anything.
>

It does not take any argument. Whenever a zero is written to the EOI
register, the LAPIC resets the one ISR bit corresponding to the
highest-priority in-service interrupt. So if all the ISR bits are set, we
will call ack_APIC_irq() 8*32 times to reset them all.
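
As a rough illustration of why one argument-less EOI per set bit is enough,
here is a small user-space model. It is only a sketch under simplifying
assumptions: struct lapic_model, model_eoi(), model_clear_isr() and
model_isr_empty() are invented names, the model tracks only the ISR (no IRR,
no TPR), and real hardware is of course not an array in memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ISR_WORDS 8	/* 256 ISR bits viewed as eight 32-bit words */

/* Toy model of the LAPIC in-service register (assumption: ISR only). */
struct lapic_model {
	uint32_t isr[ISR_WORDS];
};

/* Model of an EOI write: clear the highest-numbered set ISR bit, if any. */
static void model_eoi(struct lapic_model *l)
{
	assert(l != NULL);
	for (int i = ISR_WORDS - 1; i >= 0; i--) {
		for (int j = 31; j >= 0; j--) {
			if (l->isr[i] & (UINT32_C(1) << j)) {
				l->isr[i] &= ~(UINT32_C(1) << j);
				return;
			}
		}
	}
}

/*
 * Same shape as the loop in the patch: read each word from the top down
 * and issue one EOI per set bit. Returns the number of EOIs issued.
 */
static int model_clear_isr(struct lapic_model *l)
{
	int eois = 0;

	for (int i = ISR_WORDS - 1; i >= 0; i--) {
		uint32_t value = l->isr[i];

		for (int j = 31; j >= 0; j--) {
			if (value & (UINT32_C(1) << j)) {
				model_eoi(l);
				eois++;
			}
		}
	}
	return eois;
}

/* Returns 1 when no ISR bit is left set. */
static int model_isr_empty(const struct lapic_model *l)
{
	for (int i = 0; i < ISR_WORDS; i++)
		if (l->isr[i])
			return 0;
	return 1;
}
```

Because the words are walked from the highest down, every EOI issued while
looking at word i clears a bit in word i: all higher words are already empty
by then. The count of EOIs therefore equals the count of set bits.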

Thanks
Vivek



o The check_timer() routine fails while the second kernel is booting after a
crash on an Opteron box. The problem happens because the timer vector (0x31)
seems to be locked.

o After a system crash, it is not safe to service interrupts any more, hence
interrupts are disabled. This leads to pending interrupts at the LAPIC. The
LAPIC sends these interrupts to the CPU during early boot of the second
kernel. Other pending interrupts are discarded as unexpected traps, but the
timer interrupt is serviced and the CPU does not issue an LAPIC EOI because
it thinks this interrupt came from the i8259 and sends the ack to the 8259
instead. This leaves vector 0x31 locked, as the LAPIC does not clear the
corresponding ISR bit and keeps waiting for an EOI.

o This patch issues an extra EOI for each pending interrupt that has its ISR
bit set.

o Today only the timer seems to be a special case, because in early boot the
kernel thinks its interrupts are coming from the i8259, uses
mask_and_ack_8259A() as the ack handler, and does not issue an LAPIC EOI.
Still, doing it in a generic manner for all vectors probably makes sense.

Signed-off-by: Vivek Goyal <[email protected]>
---


diff -puN arch/x86_64/kernel/apic.c~x86_64-pending-interrupt-fix arch/x86_64/kernel/apic.c
--- linux-2.6.16-rc5-16M/arch/x86_64/kernel/apic.c~x86_64-pending-interrupt-fix 2006-03-08 11:42:33.000000000 +0530
+++ linux-2.6.16-rc5-16M-root/arch/x86_64/kernel/apic.c 2006-03-08 11:44:49.000000000 +0530
@@ -342,6 +342,7 @@ void __init init_bsp_APIC(void)
void __cpuinit setup_local_APIC (void)
{
unsigned int value, maxlvt;
+ int i, j;

value = apic_read(APIC_LVR);

@@ -371,6 +372,25 @@ void __cpuinit setup_local_APIC (void)
apic_write(APIC_TASKPRI, value);

/*
+ * After a crash, we no longer service the interrupts and a pending
+ * interrupt from previous kernel might still have ISR bit set.
+ *
+ * Most probably by now CPU has serviced that pending interrupt and
+ * it might not have done the ack_APIC_irq() because it thought,
+ * interrupt came from i8259 as ExtInt. LAPIC did not get EOI so it
+ * does not clear the ISR bit and cpu thinks it has already serviced
+ * the interrupt. Hence a vector might get locked. It was noticed
+ * for timer irq (vector 0x31). Issue an extra EOI to clear ISR.
+ */
+ for (i = APIC_ISR_NR - 1; i >= 0; i--) {
+ value = apic_read(APIC_ISR + i*0x10);
+ for (j = 31; j >= 0; j--) {
+ if (value & (1<<j))
+ ack_APIC_irq();
+ }
+ }
+
+ /*
* Now that we are all set up, enable the APIC
*/
value = apic_read(APIC_SPIV);
diff -puN include/asm-x86_64/apicdef.h~x86_64-pending-interrupt-fix include/asm-x86_64/apicdef.h
--- linux-2.6.16-rc5-16M/include/asm-x86_64/apicdef.h~x86_64-pending-interrupt-fix 2006-03-08 11:42:33.000000000 +0530
+++ linux-2.6.16-rc5-16M-root/include/asm-x86_64/apicdef.h 2006-03-08 11:42:33.000000000 +0530
@@ -39,6 +39,7 @@
#define APIC_SPIV_FOCUS_DISABLED (1<<9)
#define APIC_SPIV_APIC_ENABLED (1<<8)
#define APIC_ISR 0x100
+#define APIC_ISR_NR 0x8 /* Number of 32 bit ISR registers. */
#define APIC_TMR 0x180
#define APIC_IRR 0x200
#define APIC_ESR 0x280
_

2006-03-08 04:07:16

by Eric W. Biederman

Subject: Re: [RFC][PATCH] kdump: x86_64 timer interrupt lockup due to pending interrupt

Vivek Goyal <[email protected]> writes:

> On Tue, Mar 07, 2006 at 04:43:07PM -0700, Eric W. Biederman wrote:
>> Vivek Goyal <[email protected]> writes:
>> > On Mon, Mar 06, 2006 at 10:43:32PM +0100, Andi Kleen wrote:
>> >> On Mon, Mar 06, 2006 at 11:40:34AM -0500, Vivek Goyal wrote:
>> >> >
>
> [..]
>> >> >
>> >> > o In this patch, one extra EOI is being issued in check_timer() to unlock the
>> >> > vector. Please suggest if there is a better way to handle this situation.
>> >>
>> >> Shouldn't we rather do this for all interrupts when the APIC is set up?
>> >> I don't see how the timer is special here.
>> >>
>> >
>> > Timer is a special case here.
>> >
>> > In other cases, the moment interrupts are enabled on cpu, LAPIC pushes pending
>> > interrupts to cpu and it is ignored as bad irq using ack_bad_irq(). This
>> > still sends EOI to LAPIC if LAPIC support is compiled in.
>> >
>> > But for timer, the moment pending interrupt is pushed to cpu, it is handled
>> > as valid interrupt and cpu assumes that it came from 8259 and sends ack to
>> > 8259 and not to LAPIC. Hence leads to missing EOI for timer vector and
>> > deadlock.
>> >
>> > But still doing it generic manner for all interrupts while setting up LAPIC
>> > probably makes more sense. Please find attached the patch.
>>
>> A couple of questions.
>>
>> Does this need to be in #ifdef CONFIG_CRASH_DUMP?
>> If this code is truly safe I expect we could run it on every bootup
>> simply to be more robust.
>>
>
> AFAIK, we can run this code safely on every bootup and can get rid of
> CONFIG_CRASH_DUMP. I have simply put it under it because I observed it
> only for crashdump scenarios. But removing this should be good as it
> protects against buggy boards. Modified patch is attached.
>
>
>> Why is APIC_ISR_NR a hard code? I think there is an apic register
>> that tells the count.
>>
>
> I did not find any such register. Basically ISR is a 256bit register. We
> are reading 32 bits at a time, so logically we can view it as 8, 32 bit
> registers. I had two options. Either I put a constant number in for()
> loop or #define it. I chose the latter.
>
>> Does ack_APIC_irq take an argument? I am confused that we are calling
>> ack_APIC_irq() potentially 8*32 times without passing it anything.
>>
>
> It does not take any argument. Whenever a zero is written to EOI register
> LAPIC resets one ISR register bit corresponding to highest priority
> interrupt. So if all the ISR bits are set, we will call ack_APIC_irq()
> 8*32 times to reset them all.

Ok. That makes sense.

Looks good to me.

Eric

2006-03-08 13:04:42

by Andi Kleen

Subject: Re: [RFC][PATCH] kdump: x86_64 timer interrupt lockup due to pending interrupt


> o Though today only timer seems to be the special case because in early
> boot it thinks interrupts are coming from i8259 and uses
> mask_and_ack_8259A() as ack handler and does not issue LAPIC EOI. But
> probably doing it in generic manner for all vectors makes sense.

Applied, thanks.

Not sure if this is still 2.6.16 material though. Might be too late for that.

-Andi