2005-04-19 10:54:54

by Mikael Pettersson

Subject: x86_64 NMI watchdog breakage in 2.6.12-rc2-mm3

Andi & Andrew,

The "x86_64-switch-smp-bootup-over-to-new-cpu-hotplug-state.patch" in
2.6.12-rc2-mm3 appears to have broken the NMI watchdog. Specifically:

diff -puN arch/x86_64/kernel/nmi.c~x86_64-switch-smp-bootup-over-to-new-cpu-hotplug-state arch/x86_64/kernel/nmi.c
--- 25/arch/x86_64/kernel/nmi.c~x86_64-switch-smp-bootup-over-to-new-cpu-hotplug-state Thu Apr 7 15:15:01 2005
+++ 25-akpm/arch/x86_64/kernel/nmi.c Thu Apr 7 15:15:01 2005
@@ -133,12 +133,6 @@ static int __init check_nmi_watchdog (vo
 	mdelay((10*1000)/nmi_hz); // wait 10 ticks

 	for (cpu = 0; cpu < NR_CPUS; cpu++) {
-#ifdef CONFIG_SMP
-		/* Check cpu_callin_map here because that is set
-		   after the timer is started. */
-		if (!cpu_isset(cpu, cpu_callin_map))
-			continue;
-#endif
 		if (cpu_pda[cpu].__nmi_count - counts[cpu] <= 5) {
 			printk("CPU#%d: NMI appears to be stuck (%d)!\n",
 				cpu,

This is wrong because in general the number of actual CPUs is _less_
than the number of configured CPUs (== NR_CPUS). Hence the code will
now check the NMI counts of non-existent CPUs, complain that they are
stuck, and disable the NMI watchdog. Actually the disablement is broken
in this case, but that's a different issue.

The error is easily reproducible by booting an SMP kernel on a UP box.
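
To illustrate the failure mode, here is a minimal user-space sketch, not the kernel code itself. The names count_stuck(), simulate_up_box() and the present[] array are hypothetical stand-ins (present[] plays the role of cpu_callin_map), and NR_CPUS is set to 8 purely for the example. It models a UP box running a kernel built for several CPUs, where only CPU 0 takes NMIs:

```c
#include <stdbool.h>
#include <stdio.h>

#define NR_CPUS 8	/* configured maximum; value chosen only for this sketch */

/*
 * Count CPUs whose NMI counter advanced by 5 or fewer ticks.
 * With check_present set, slots that have no running CPU are skipped,
 * which is the job the removed cpu_callin_map test was doing.
 */
static int count_stuck(const unsigned int before[NR_CPUS],
		       const unsigned int after[NR_CPUS],
		       const bool present[NR_CPUS],
		       bool check_present)
{
	int stuck = 0;

	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		if (check_present && !present[cpu])
			continue;
		if (after[cpu] - before[cpu] <= 5) {
			printf("CPU#%d: NMI appears to be stuck (%u)!\n",
			       cpu, after[cpu]);
			stuck++;
		}
	}
	return stuck;
}

/* UP box, SMP kernel: only CPU 0 exists and its counter advances. */
static int simulate_up_box(bool check_present)
{
	unsigned int before[NR_CPUS] = {0};
	unsigned int after[NR_CPUS] = {0};
	bool present[NR_CPUS] = {false};

	present[0] = true;
	before[0] = 100;
	after[0] = 110;		/* 10 ticks elapsed on the one real CPU */

	return count_stuck(before, after, present, check_present);
}
```

In this model simulate_up_box(true) reports no stuck CPUs, while simulate_up_box(false) flags every slot other than CPU 0, because the counters of non-existent CPUs never move.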

/Mikael


2005-04-19 13:07:07

by Andi Kleen

Subject: Re: x86_64 NMI watchdog breakage in 2.6.12-rc2-mm3

> This is wrong because in general the number of actual CPUs is _less_
> than the number of configured CPUs (== NR_CPUS). Hence the code will
> now check the NMI counts of non-existent CPUs, complain that they are
> stuck, and disable the NMI watchdog. Actually the disablement is broken
> in this case, but that's a different issue.

Yes, I know. I already have it fixed in my tree, together with a lot
of other NMI watchdog bugs (including some long-standing ones inherited
from i386). I will post the full patchkit later.

Andrew, please don't apply any NMI watchdog changes for x86-64 right now.
I will fix the current breakage before .12.

-Andi