2007-09-26 18:23:57

by David Bahi

[permalink] [raw]
Subject: nmi_watchdog fix for x86_64 to be more like i386

Thanks to tglx and ghaskins for all the help in tracking down a very
early nmi_watchdog crash on certain x86_64 machines.


Attachments:
(No filename) (123.00 B)
watchdog_use_timer_and_hpet_on_x86_64.patch (1.08 kB)
signature.asc (190.00 B)
Download all attachments

2007-10-01 17:36:17

by Andi Kleen

[permalink] [raw]
Subject: Re: nmi_watchdog fix for x86_64 to be more like i386

On Wednesday 26 September 2007 20:03:12 David Bahi wrote:
> Thanks to tglx and ghaskins for all the help in tracking down a very
> early nmi_watchdog crash on certain x86_64 machines.

The patch is totally bogus. irq 0 doesn't say anything about whether
the current CPU still works or not. You always need some local
interrupt. This basically disables the NMI watchdog for the non boot CPUs.

It's even wrong on i386 -- i wonder how that broken patch
made it in there. I'll remove it there.

-Andi

2007-10-01 18:54:37

by Thomas Gleixner

[permalink] [raw]
Subject: Re: nmi_watchdog fix for x86_64 to be more like i386

On Mon, 1 Oct 2007, Andi Kleen wrote:

> On Wednesday 26 September 2007 20:03:12 David Bahi wrote:
> > Thanks to tglx and ghaskins for all the help in tracking down a very
> > early nmi_watchdog crash on certain x86_64 machines.
>
> The patch is totally bogus. irq 0 doesn't say anything about whether
> the current CPU still works or not. You always need some local
> interrupt. This basically disables the NMI watchdog for the non boot CPUs.
>
> It's even wrong on i386 -- i wonder how that broken patch
> made it in there. I'll remove it there.

Right, it's wrong for the broadcast case, but simply removing it will
trigger false positives on the CPU which runs the broadcast timer. I
fix this proper.

tglx

2007-10-01 19:17:19

by Andi Kleen

[permalink] [raw]
Subject: Re: nmi_watchdog fix for x86_64 to be more like i386

On Monday 01 October 2007 20:54:21 Thomas Gleixner wrote:
> On Mon, 1 Oct 2007, Andi Kleen wrote:
>
> > On Wednesday 26 September 2007 20:03:12 David Bahi wrote:
> > > Thanks to tglx and ghaskins for all the help in tracking down a very
> > > early nmi_watchdog crash on certain x86_64 machines.
> >
> > The patch is totally bogus. irq 0 doesn't say anything about whether
> > the current CPU still works or not. You always need some local
> > interrupt. This basically disables the NMI watchdog for the non boot CPUs.
> >
> > It's even wrong on i386 -- i wonder how that broken patch
> > made it in there. I'll remove it there.
>
> Right, it's wrong for the broadcast case, but simply removing it will
> trigger false positives on the CPU which runs the broadcast timer. I
> fix this proper.

I already did this here by checking for cpu != 0. But it also needs either tracking
or forbidding migrations of irq 0. I can take care of the patch.

-Andi

2007-10-01 19:28:17

by Thomas Gleixner

[permalink] [raw]
Subject: Re: nmi_watchdog fix for x86_64 to be more like i386

On Mon, 1 Oct 2007, Andi Kleen wrote:
> On Monday 01 October 2007 20:54:21 Thomas Gleixner wrote:
> > On Mon, 1 Oct 2007, Andi Kleen wrote:
> >
> > > On Wednesday 26 September 2007 20:03:12 David Bahi wrote:
> > > > Thanks to tglx and ghaskins for all the help in tracking down a very
> > > > early nmi_watchdog crash on certain x86_64 machines.
> > >
> > > The patch is totally bogus. irq 0 doesn't say anything about whether
> > > the current CPU still works or not. You always need some local
> > > interrupt. This basically disables the NMI watchdog for the non boot CPUs.
> > >
> > > It's even wrong on i386 -- i wonder how that broken patch
> > > made it in there. I'll remove it there.
> >
> > Right, it's wrong for the broadcast case, but simply removing it will
> > trigger false positives on the CPU which runs the broadcast timer. I
> > fix this proper.
>
> I already did this here by checking for cpu != 0. But it also needs either tracking
> or forbidding migrations of irq 0. I can take care of the patch.

I was thinking about the same fix. On i386 we already have the irq
migration / balancing of irq 0 disabled. That's why we setup IRQ0 with
IRQ_NOBALANCING.

tglx




2007-10-01 19:58:31

by Arjan van de Ven

[permalink] [raw]
Subject: Re: nmi_watchdog fix for x86_64 to be more like i386

On Mon, 1 Oct 2007 21:27:39 +0200 (CEST)

> > I already did this here by checking for cpu != 0. But it also needs
> > either tracking or forbidding migrations of irq 0. I can take care
> > of the patch.
>
> I was thinking about the same fix. On i386 we already have the irq
> migration / balancing of irq 0 disabled. That's why we setup IRQ0 with
> IRQ_NOBALANCING.

btw doing this is a problem if the user decides to hot(un)plug cpu 0...
he then can't move the irqs away to do that

2007-10-01 20:11:20

by Dave Jones

[permalink] [raw]
Subject: Re: nmi_watchdog fix for x86_64 to be more like i386

On Mon, Oct 01, 2007 at 12:56:26PM -0700, Arjan van de Ven wrote:
> On Mon, 1 Oct 2007 21:27:39 +0200 (CEST)
>
> > > I already did this here by checking for cpu != 0. But it also needs
> > > either tracking or forbidding migrations of irq 0. I can take care
> > > of the patch.
> >
> > I was thinking about the same fix. On i386 we already have the irq
> > migration / balancing of irq 0 disabled. That's why we setup IRQ0 with
> > IRQ_NOBALANCING.
>
> btw doing this is a problem if the user decides to hot(un)plug cpu 0...
> he then can't move the irqs away to do that

You can't hot unplug cpu0.
Take a look in sysfs, no /sys/devices/system/cpu/cpu0/online file.

Dave

--
http://www.codemonkey.org.uk

2007-10-01 20:12:21

by Paul E. McKenney

[permalink] [raw]
Subject: Re: nmi_watchdog fix for x86_64 to be more like i386

On Mon, Oct 01, 2007 at 12:56:26PM -0700, Arjan van de Ven wrote:
> On Mon, 1 Oct 2007 21:27:39 +0200 (CEST)
>
> > > I already did this here by checking for cpu != 0. But it also needs
> > > either tracking or forbidding migrations of irq 0. I can take care
> > > of the patch.
> >
> > I was thinking about the same fix. On i386 we already have the irq
> > migration / balancing of irq 0 disabled. That's why we setup IRQ0 with
> > IRQ_NOBALANCING.
>
> btw doing this is a problem if the user decides to hot(un)plug cpu 0...
> he then can't move the irqs away to do that

Last I tried it, x86_64 and i386 in fact prohibit hotunplugging CPU 0.

Thanx, Paul

2007-10-01 21:18:19

by Thomas Gleixner

[permalink] [raw]
Subject: Re: nmi_watchdog fix for x86_64 to be more like i386

On Mon, 1 Oct 2007, Arjan van de Ven wrote:
> > > I already did this here by checking for cpu != 0. But it also needs
> > > either tracking or forbidding migrations of irq 0. I can take care
> > > of the patch.
> >
> > I was thinking about the same fix. On i386 we already have the irq
> > migration / balancing of irq 0 disabled. That's why we setup IRQ0 with
> > IRQ_NOBALANCING.
>
> btw doing this is a problem if the user decides to hot(un)plug cpu 0...
> he then can't move the irqs away to do that

IRQ_NOBALANCING is not preventing cpu unplug. It moves the affinity to the
next CPU, but the check in NMI watchdog for CPU == 0 would not longer
work.

Fix below. Post .23 material. I work out a separate one for the x8664
clock events series.

tglx

------------>

[PATCH] i386: Fix nmi watchdog per cpu timer irq accounting

The clock events patches changed the interrupt distribution and the
local apic timer interrupt accounting for the broadcast case. The per
cpu clock events handler of the cpu, which runs the broadcast
interrupt, is executed directly in the broadcast irq context. This
does not invoke the low level arch code, which does the local apic
timer irq accounting. The work around for false positives in the nmi
watchdog was to add the irq0 interrupts (broadcast device) to the
local apic timer interrupts. This falsifies the results for the CPUs
which are not handling the broadcast interrupt, i.e. stuck CPUs might
be not detected, as noticed by Andi Kleen.

It would be possible to move the clockevents handler invocation of the
CPU which runs the broadcast interrupt into the tick device broadcast
function, but this would require to handle the per cpu device to this
function and perform the direct operation in the clock device specific
architecture code. Right now this is only i386 and x86_64, but MIPS is
on the way to use the broadcast mode as well.

Introduce a weak function tick_broadcast_account(), which allows x86
to adjust the local apic timer interrupt counter in the case when the
cpu local timer handler has been invoked. This keeps the cpu local
handler decision and invocation in the common code and allows x86 to
handle the nmi watchdog accounting correctly.

Signed-off-by: Thomas Gleixner <[email protected]>

diff --git a/arch/i386/kernel/apic.c b/arch/i386/kernel/apic.c
index 3d67ae1..180dde8 100644
--- a/arch/i386/kernel/apic.c
+++ b/arch/i386/kernel/apic.c
@@ -283,6 +283,16 @@ static void lapic_timer_broadcast(cpumask_t mask)
}

/*
+ * Called from the broadcasting code to keep the local apic timer irq
+ * accounting straight for the nmi watchdog. Is called with interrupts
+ * disabled.
+ */
+void tick_broadcast_account(int cpu)
+{
+ per_cpu(irq_stat, cpu).apic_timer_irqs++;
+}
+
+/*
* Setup the local APIC timer for this CPU. Copy the initilized values
* of the boot CPU and register the clock event in the framework.
*/
diff --git a/arch/i386/kernel/nmi.c b/arch/i386/kernel/nmi.c
index c7227e2..03cdcaf 100644
--- a/arch/i386/kernel/nmi.c
+++ b/arch/i386/kernel/nmi.c
@@ -349,11 +349,7 @@ __kprobes int nmi_watchdog_tick(struct pt_regs * regs, unsigned reason)
cpu_clear(cpu, backtrace_mask);
}

- /*
- * Take the local apic timer and PIT/HPET into account. We don't
- * know which one is active, when we have highres/dyntick on
- */
- sum = per_cpu(irq_stat, cpu).apic_timer_irqs + kstat_cpu(cpu).irqs[0];
+ sum = per_cpu(irq_stat, cpu).apic_timer_irqs;

/* if the none of the timers isn't firing, this cpu isn't doing much */
if (!touched && last_irq_sums[cpu] == sum) {
diff --git a/include/linux/tick.h b/include/linux/tick.h
index 9a7252e..99b3021 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -73,6 +73,7 @@ static inline void tick_cancel_sched_timer(int cpu) { }
# ifdef CONFIG_GENERIC_CLOCKEVENTS_BROADCAST
extern struct tick_device *tick_get_broadcast_device(void);
extern cpumask_t *tick_get_broadcast_mask(void);
+extern void tick_broadcast_account(int cpu);

# ifdef CONFIG_TICK_ONESHOT
extern cpumask_t *tick_get_broadcast_oneshot_mask(void);
diff --git a/kernel/time/tick-broadcast.c b/kernel/time/tick-broadcast.c
index 0962e05..43d0085 100644
--- a/kernel/time/tick-broadcast.c
+++ b/kernel/time/tick-broadcast.c
@@ -123,6 +123,16 @@ int tick_device_uses_broadcast(struct clock_event_device *dev, int cpu)
}

/*
+ * Weak function for cpu local interrupt accounting. Used by x86 to
+ * keep the lapic accounting correct for nmi_watchdog.
+ *
+ * Must be called with interrupts disabled.
+ */
+void __attribute__((weak)) tick_broadcast_account(int cpu)
+{
+}
+
+/*
* Broadcast the event to the cpus, which are set in the mask
*/
int tick_do_broadcast(cpumask_t mask)
@@ -137,6 +147,7 @@ int tick_do_broadcast(cpumask_t mask)
cpu_clear(cpu, mask);
td = &per_cpu(tick_cpu_device, cpu);
td->evtdev->event_handler(td->evtdev);
+ tick_broadcast_account(cpu);
ret = 1;
}

2007-10-01 21:43:46

by Andi Kleen

[permalink] [raw]
Subject: Re: nmi_watchdog fix for x86_64 to be more like i386


> IRQ_NOBALANCING is not preventing cpu unplug. It moves the affinity to the
> next CPU, but the check in NMI watchdog for CPU == 0 would not longer
> work.

That cannot happen right now because cpu_disable() on both i386/x86-64
reject CPU #0. So just setting IRQ_NOBALANCING is sufficient and both
do that already. I was wrong earlier in being concerned about this.

> int tick_do_broadcast(cpumask_t mask)
> @@ -137,6 +147,7 @@ int tick_do_broadcast(cpumask_t mask)
> cpu_clear(cpu, mask);
> td = &per_cpu(tick_cpu_device, cpu);
> td->evtdev->event_handler(td->evtdev);
> + tick_broadcast_account(cpu);

That would not handle the case with a single CPU running only
irq 0 but not broadcasting I think.

I believe ftp://ftp.firstfloor.org/pub/ak/x86_64/quilt/patches/fix-watchdog
is the correct fix

-Andi

2007-10-01 21:59:19

by Thomas Gleixner

[permalink] [raw]
Subject: Re: nmi_watchdog fix for x86_64 to be more like i386

On Mon, 1 Oct 2007, Andi Kleen wrote:
>
> > IRQ_NOBALANCING is not preventing cpu unplug. It moves the affinity to the
> > next CPU, but the check in NMI watchdog for CPU == 0 would not longer
> > work.
>
> That cannot happen right now because cpu_disable() on both i386/x86-64
> reject CPU #0. So just setting IRQ_NOBALANCING is sufficient and both
> do that already. I was wrong earlier in being concerned about this.
>
> > int tick_do_broadcast(cpumask_t mask)
> > @@ -137,6 +147,7 @@ int tick_do_broadcast(cpumask_t mask)
> > cpu_clear(cpu, mask);
> > td = &per_cpu(tick_cpu_device, cpu);
> > td->evtdev->event_handler(td->evtdev);
> > + tick_broadcast_account(cpu);
>
> That would not handle the case with a single CPU running only
> irq 0 but not broadcasting I think.

Hmm. The only situation where this can happen is when you add
"nolapic_timer" to the command line on a single CPU system. We do not
register the lapic "dummy" clock event device then.

> I believe
> ftp://ftp.firstfloor.org/pub/ak/x86_64/quilt/patches/fix-watchdog
> is the correct fix

Yup, I completely missed the fact, that we reject CPU#0 unplugging, so
your fix seems indeed to be more correct and simpler.

OTOH, the accounting hook would allow us to remove the IRQ#0 -> CPU#0
restriction. Not sure whether it's worth the trouble.

tglx

2007-10-01 22:09:03

by Andi Kleen

[permalink] [raw]
Subject: Re: nmi_watchdog fix for x86_64 to be more like i386


> OTOH, the accounting hook would allow us to remove the IRQ#0 -> CPU#0
> restriction. Not sure whether it's worth the trouble.

Some SIS chipsets hang the machine when you migrate irq 0 to another
CPU. It's better to keep that Also I wouldn't be surprised if there are some
other assumptions about this elsewhere.

Ok in theory it could be done only on SIS, but that probably would really
not be worth the trouble

-Andi

2007-10-01 22:47:53

by Thomas Gleixner

[permalink] [raw]
Subject: Re: nmi_watchdog fix for x86_64 to be more like i386

On Tue, 2 Oct 2007, Andi Kleen wrote:
>
> > OTOH, the accounting hook would allow us to remove the IRQ#0 -> CPU#0
> > restriction. Not sure whether it's worth the trouble.
>
> Some SIS chipsets hang the machine when you migrate irq 0 to another
> CPU. It's better to keep that Also I wouldn't be surprised if there are some
> other assumptions about this elsewhere.
>
> Ok in theory it could be done only on SIS, but that probably would really
> not be worth the trouble

Agreed.

I just got a x8664-hrt report, where I found the following oddity:

0: 1197 172881 IO-APIC-edge timer

That's one of those infamous AMD C1E boxen. Strange, all my systems have
IRQ#0 on CPU#0 and nowhere else. Any idea ?

tglx

2007-10-01 22:54:18

by Arjan van de Ven

[permalink] [raw]
Subject: Re: nmi_watchdog fix for x86_64 to be more like i386

On Tue, 2 Oct 2007 00:47:12 +0200 (CEST)
Thomas Gleixner <[email protected]> wrote:

> On Tue, 2 Oct 2007, Andi Kleen wrote:
> >
> > > OTOH, the accounting hook would allow us to remove the IRQ#0 ->
> > > CPU#0 restriction. Not sure whether it's worth the trouble.
> >
> > Some SIS chipsets hang the machine when you migrate irq 0 to another
> > CPU. It's better to keep that Also I wouldn't be surprised if there
> > are some other assumptions about this elsewhere.
> >
> > Ok in theory it could be done only on SIS, but that probably would
> > really not be worth the trouble
>
> Agreed.
>
> I just got a x8664-hrt report, where I found the following oddity:
>
> 0: 1197 172881 IO-APIC-edge timer
>
> That's one of those infamous AMD C1E boxen. Strange, all my systems
> have IRQ#0 on CPU#0 and nowhere else. Any idea ?

2 things;
the current irq balancers don't balance the timer interrupt, in fact
they'll leave it alone (so the chipset might balance)
older ones pin it to cpu 0 or rotate it....

2007-10-02 04:57:00

by Mika Penttilä

[permalink] [raw]
Subject: Re: nmi_watchdog fix for x86_64 to be more like i386

Thomas Gleixner wrote:
> On Tue, 2 Oct 2007, Andi Kleen wrote:
>
>>> OTOH, the accounting hook would allow us to remove the IRQ#0 -> CPU#0
>>> restriction. Not sure whether it's worth the trouble.
>>>
>> Some SIS chipsets hang the machine when you migrate irq 0 to another
>> CPU. It's better to keep that Also I wouldn't be surprised if there are some
>> other assumptions about this elsewhere.
>>
>> Ok in theory it could be done only on SIS, but that probably would really
>> not be worth the trouble
>>
>
> Agreed.
>
> I just got a x8664-hrt report, where I found the following oddity:
>
> 0: 1197 172881 IO-APIC-edge timer
>
> That's one of those infamous AMD C1E boxen. Strange, all my systems have
> IRQ#0 on CPU#0 and nowhere else. Any idea ?
>
> tglx
>
>
Here I have with stock FC7 (2.6.22.9-91) kernel :
0: 107835 133459760 IO-APIC-edge timer

Processor:
vendor_id : AuthenticAMD
cpu family : 15
model : 107
model name : AMD Athlon(tm) 64 X2 Dual Core Processor 4000+
stepping : 1
cpu MHz : 2109.721
cache size : 512 KB

MB:
Asus M2N-E (NF570)

--Mika


2007-10-02 05:02:30

by Arjan van de Ven

[permalink] [raw]
Subject: Re: nmi_watchdog fix for x86_64 to be more like i386

On Tue, 02 Oct 2007 07:56:24 +0300
Mika Penttilä <[email protected]> wrote:


> Here I have with stock FC7 (2.6.22.9-91) kernel :
> 0: 107835 133459760 IO-APIC-edge timer
>


fwiw this is entirely done by the hardware; no irq balancer has touched
this one (fc7 has the new irqbalancer which leaves irq0 alone)

2007-10-02 05:51:55

by Andi Kleen

[permalink] [raw]
Subject: Re: nmi_watchdog fix for x86_64 to be more like i386


> Agreed.
>
> I just got a x8664-hrt report, where I found the following oddity:
>
> 0: 1197 172881 IO-APIC-edge timer
>
> That's one of those infamous AMD C1E boxen. Strange, all my systems have
> IRQ#0 on CPU#0 and nowhere else. Any idea ?

Hmm, in lowestpriority mode it would be possible that the APIC changes
the CPU to #1 once; but IRQ 0 is always set to fixed mode. Also even
if that happens you should have them all on 1.

Maybe the chipset is just ignoring the IO-APIC configuration in this case?

Is it always the same chipset? Is it seen on i386 too?

The problem is really that if this happens it's more than the NMI watchdog
that is broken. If you don't run an additional APIC timer interrupt on CPU #0
it's possible that CPU #0 won't schedule at all.

The only workaround for chipsets ignoring IRQ affinity would be to keep
track on which CPU irq 0 happens and then restart APIC timer interrupts
on the others (or send IPIs) as needed. But that would be fairly ugly.

IRQ 0 is unfortunately an generally under-validated area because Windows doesn't
use it.

-Andi

2007-10-02 06:19:26

by Thomas Gleixner

[permalink] [raw]
Subject: Re: nmi_watchdog fix for x86_64 to be more like i386

On Tue, 2 Oct 2007, Andi Kleen wrote:
> > Agreed.
> >
> > I just got a x8664-hrt report, where I found the following oddity:
> >
> > 0: 1197 172881 IO-APIC-edge timer
> >
> > That's one of those infamous AMD C1E boxen. Strange, all my systems have
> > IRQ#0 on CPU#0 and nowhere else. Any idea ?
>
> Hmm, in lowestpriority mode it would be possible that the APIC changes
> the CPU to #1 once; but IRQ 0 is always set to fixed mode. Also even
> if that happens you should have them all on 1.
>
> Maybe the chipset is just ignoring the IO-APIC configuration in this case?
>
> Is it always the same chipset? Is it seen on i386 too?
>
> The problem is really that if this happens it's more than the NMI watchdog
> that is broken. If you don't run an additional APIC timer interrupt on CPU #0
> it's possible that CPU #0 won't schedule at all.
>
> The only workaround for chipsets ignoring IRQ affinity would be to keep
> track on which CPU irq 0 happens and then restart APIC timer interrupts
> on the others (or send IPIs) as needed. But that would be fairly ugly.

The clock events code does handle this already. The broadcast interrupt
can come in on any cpu. It's just the nmi watchdog which would be affected
by that.

tglx

2007-10-05 16:01:35

by David Bahi

[permalink] [raw]
Subject: Re: nmi_watchdog fix for x86_64 to be more like i386

On Mon, 2007-10-01 at 23:41 +0200, Andi Kleen wrote:
> > IRQ_NOBALANCING is not preventing cpu unplug. It moves the affinity to the
> > next CPU, but the check in NMI watchdog for CPU == 0 would not longer
> > work.
>
> That cannot happen right now because cpu_disable() on both i386/x86-64
> reject CPU #0. So just setting IRQ_NOBALANCING is sufficient and both
> do that already. I was wrong earlier in being concerned about this.
>
> > int tick_do_broadcast(cpumask_t mask)
> > @@ -137,6 +147,7 @@ int tick_do_broadcast(cpumask_t mask)
> > cpu_clear(cpu, mask);
> > td = &per_cpu(tick_cpu_device, cpu);
> > td->evtdev->event_handler(td->evtdev);
> > + tick_broadcast_account(cpu);
>
> That would not handle the case with a single CPU running only
> irq 0 but not broadcasting I think.
>
> I believe ftp://ftp.firstfloor.org/pub/ak/x86_64/quilt/patches/fix-watchdog
> is the correct fix
>
> -Andi

Andi,

If it's agreed that this is the fix - can you submit a proper [PATCH] so
all users of watchdog_use_timer_and_hpet_on_x86_64.patch can be removed,
and replaced with yours.

Thank you very much



Attachments:
(No filename) (1.09 kB)
signature.asc (190.00 B)
Download all attachments

2007-10-05 16:05:38

by Andi Kleen

[permalink] [raw]
Subject: Re: nmi_watchdog fix for x86_64 to be more like i386


> If it's agreed that this is the fix - can you submit a proper [PATCH] so
> all users of watchdog_use_timer_and_hpet_on_x86_64.patch can be removed,
> and replaced with yours.

ftp://ftp.firstfloor.org/pub/ak/x86_64/quilt/patches/watchdog-fix

-Andi

2007-10-05 17:41:10

by Peter W. Morreale

[permalink] [raw]
Subject: Re: nmi_watchdog fix for x86_64 to be more like i386

On Fri, 2007-10-05 at 18:03 +0200, Andi Kleen wrote:
> > If it's agreed that this is the fix - can you submit a proper [PATCH] so
> > all users of watchdog_use_timer_and_hpet_on_x86_64.patch can be removed,
> > and replaced with yours.
>
> ftp://ftp.firstfloor.org/pub/ak/x86_64/quilt/patches/watchdog-fix
>
> -Andi
>

Hummm, a directory appears to be protected, or DNE. I get "failed to
change directory".

Thx,
-PWM



2007-10-05 18:00:29

by Andi Kleen

[permalink] [raw]
Subject: Re: nmi_watchdog fix for x86_64 to be more like i386


> Hummm, a directory appears to be protected, or DNE. I get "failed to
> change directory".

Sorry typo

ftp://ftp.firstfloor.org/pub/ak/x86_64/quilt/patches/fix-watchdog

-Andi

2007-10-05 18:29:20

by Pallipadi, Venkatesh

[permalink] [raw]
Subject: RE: nmi_watchdog fix for x86_64 to be more like i386



>-----Original Message-----
>From: [email protected]
>[mailto:[email protected]] On Behalf Of
>Thomas Gleixner
>Sent: Monday, October 01, 2007 11:19 PM
>To: Andi Kleen
>Cc: Arjan van de Ven; David Bahi; LKML;
>[email protected]; Andrew Morton; Ingo Molnar;
>Gregory Haskins
>Subject: Re: nmi_watchdog fix for x86_64 to be more like i386
>
>>
>> The only workaround for chipsets ignoring IRQ affinity would
>be to keep
>> track on which CPU irq 0 happens and then restart APIC timer
>interrupts
>> on the others (or send IPIs) as needed. But that would be
>fairly ugly.
>
>The clock events code does handle this already. The broadcast
>interrupt
>can come in on any cpu. It's just the nmi watchdog which would
>be affected
>by that.
>

Probably we can workaround this by keeping track of IRQ0 count at percpu
level and
use local apic timer + this percpu counter in NMI. Or just increment
local
apic timer count in IRQ0 with nohz enabled.

Thanks,
Venki

2007-10-05 20:37:49

by Thomas Gleixner

[permalink] [raw]
Subject: RE: nmi_watchdog fix for x86_64 to be more like i386

On Thu, 4 Oct 2007, Pallipadi, Venkatesh wrote:
> >-----Original Message-----
> >From: [email protected]
> >[mailto:[email protected]] On Behalf Of
> >Thomas Gleixner
> >Sent: Monday, October 01, 2007 11:19 PM
> >To: Andi Kleen
> >Cc: Arjan van de Ven; David Bahi; LKML;
> >[email protected]; Andrew Morton; Ingo Molnar;
> >Gregory Haskins
> >Subject: Re: nmi_watchdog fix for x86_64 to be more like i386
> >
> >>
> >> The only workaround for chipsets ignoring IRQ affinity would
> >be to keep
> >> track on which CPU irq 0 happens and then restart APIC timer
> >interrupts
> >> on the others (or send IPIs) as needed. But that would be
> >fairly ugly.
> >
> >The clock events code does handle this already. The broadcast
> >interrupt
> >can come in on any cpu. It's just the nmi watchdog which would
> >be affected
> >by that.
> >
>
> Probably we can workaround this by keeping track of IRQ0 count at percpu
> level and
> use local apic timer + this percpu counter in NMI. Or just increment
> local
> apic timer count in IRQ0 with nohz enabled.

No, I tried that. It's ugly.

The per cpu accounting is the correct way to go if we want to take
care of those systems, which ignore the CPU0 binding of irq0.

See patch against the x86 tree below.

tglx

-------------------->
commit 093976c7ad206a008bd5de4619f40f6bca4a79c3
Author: Thomas Gleixner <[email protected]>
Date: Fri Oct 5 22:19:18 2007 +0200

x86: Fix irq0 / local apic timer accounting

The clock events merge introduced a change to the nmi watchdog code to
handle the not longer increasing local apic timer count in the
broadcast mode. This is fine for UP, but on SMP it pampers over a
stuck CPU which is not handling the broadcast interrupt due to the
unconditional sum up of local apic timer count and irq0 count.

To cover all cases we need to keep track on which CPU irq0 is
handled. In theory this is CPU#0 due to the explicit disabling of irq
balancing for irq0, but there are systems which ignore this on the
hardware level. The per cpu irq0 accounting allows us to remove the
irq0 to CPU0 binding as well.

Add a per cpu counter for irq0 and evaluate this instead of the global
irq0 count in the nmi watchdog code.

Signed-off-by: Thomas Gleixner <[email protected]>

diff --git a/arch/x86/kernel/nmi_32.c b/arch/x86/kernel/nmi_32.c
index c7227e2..95d3fc2 100644
--- a/arch/x86/kernel/nmi_32.c
+++ b/arch/x86/kernel/nmi_32.c
@@ -353,7 +353,8 @@ __kprobes int nmi_watchdog_tick(struct pt_regs * regs, unsigned reason)
* Take the local apic timer and PIT/HPET into account. We don't
* know which one is active, when we have highres/dyntick on
*/
- sum = per_cpu(irq_stat, cpu).apic_timer_irqs + kstat_cpu(cpu).irqs[0];
+ sum = per_cpu(irq_stat, cpu).apic_timer_irqs +
+ per_cpu(irq_stat, cpu).irq0_irqs;

/* if the none of the timers isn't firing, this cpu isn't doing much */
if (!touched && last_irq_sums[cpu] == sum) {
diff --git a/arch/x86/kernel/time_32.c b/arch/x86/kernel/time_32.c
index 19a6c67..3571d0a 100644
--- a/arch/x86/kernel/time_32.c
+++ b/arch/x86/kernel/time_32.c
@@ -157,6 +157,9 @@ EXPORT_SYMBOL(profile_pc);
*/
irqreturn_t timer_interrupt(int irq, void *dev_id)
{
+ /* Keep nmi watchdog up to date */
+ per_cpu(irq_stat, cpu).irq0_irqs++;
+
#ifdef CONFIG_X86_IO_APIC
if (timer_ack) {
/*
diff --git a/include/asm-x86/hardirq_32.h b/include/asm-x86/hardirq_32.h
index ed7cf97..9188635 100644
--- a/include/asm-x86/hardirq_32.h
+++ b/include/asm-x86/hardirq_32.h
@@ -9,6 +9,7 @@ typedef struct {
unsigned long idle_timestamp;
unsigned int __nmi_count; /* arch dependent */
unsigned int apic_timer_irqs; /* arch dependent */
+ unsigned int irq0_irqs;
} ____cacheline_aligned irq_cpustat_t;

DECLARE_PER_CPU(irq_cpustat_t, irq_stat);