Andi & Andrew,
The "x86_64-switch-smp-bootup-over-to-new-cpu-hotplug-state.patch" in
2.6.12-rc2-mm3 appears to have broken the NMI watchdog. Specifically:
diff -puN arch/x86_64/kernel/nmi.c~x86_64-switch-smp-bootup-over-to-new-cpu-hotplug-state arch/x86_64/kernel/nmi.c
--- 25/arch/x86_64/kernel/nmi.c~x86_64-switch-smp-bootup-over-to-new-cpu-hotplug-state Thu Apr 7 15:15:01 2005
+++ 25-akpm/arch/x86_64/kernel/nmi.c Thu Apr 7 15:15:01 2005
@@ -133,12 +133,6 @@ static int __init check_nmi_watchdog (vo
 	mdelay((10*1000)/nmi_hz); // wait 10 ticks
 	for (cpu = 0; cpu < NR_CPUS; cpu++) {
-#ifdef CONFIG_SMP
-		/* Check cpu_callin_map here because that is set
-		   after the timer is started. */
-		if (!cpu_isset(cpu, cpu_callin_map))
-			continue;
-#endif
 		if (cpu_pda[cpu].__nmi_count - counts[cpu] <= 5) {
 			printk("CPU#%d: NMI appears to be stuck (%d)!\n",
 				cpu,
This is wrong because in general the number of actual CPUs is _less_
than the number of configured CPUs (== NR_CPUS). Hence the code will
now check the NMI counts of non-existent CPUs, complain that they are
stuck, and disable the NMI watchdog. Actually the disablement is broken
in this case, but that's a different issue.
The error is easily reproducible by booting an SMP kernel on a UP box.
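To make the failure mode concrete, here is a rough userspace sketch of the loop with and without the removed check. It is not kernel code: NR_CPUS is set to an arbitrary 8, and cpu_called_in[], nmi_before[], nmi_after[], simulate_box() and count_stuck() are hypothetical stand-ins for cpu_callin_map, counts[], cpu_pda[cpu].__nmi_count and the body of check_nmi_watchdog().

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 8	/* configured maximum; arbitrary value for this sketch */

/* Hypothetical stand-ins for the kernel state used by check_nmi_watchdog(). */
static bool cpu_called_in[NR_CPUS];		/* plays the role of cpu_callin_map */
static unsigned int nmi_before[NR_CPUS];	/* plays the role of counts[] */
static unsigned int nmi_after[NR_CPUS];		/* plays the role of __nmi_count */

/* Model a box with 'present' CPUs, each of which took 10 NMIs in the window. */
static void simulate_box(int present)
{
	assert(present >= 1 && present <= NR_CPUS);
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		cpu_called_in[cpu] = cpu < present;
		nmi_before[cpu] = 0;
		nmi_after[cpu] = cpu < present ? 10 : 0;
	}
}

/*
 * Return how many CPUs the check would report as stuck.
 * skip_absent == true models the old code (with the callin test),
 * skip_absent == false models the loop after the patch.
 */
static int count_stuck(bool skip_absent)
{
	int stuck = 0;

	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		if (skip_absent && !cpu_called_in[cpu])
			continue;
		if (nmi_after[cpu] - nmi_before[cpu] <= 5)
			stuck++;
	}
	return stuck;
}
```

With simulate_box(1), i.e. the UP case, count_stuck(true) finds nothing wrong, while count_stuck(false) flags every one of the NR_CPUS - 1 slots that have no CPU behind them.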
/Mikael
> This is wrong because in general the number of actual CPUs is _less_
> than the number of configured CPUs (== NR_CPUS). Hence the code will
> now check the NMI counts of non-existent CPUs, complain that they are
> stuck, and disable the NMI watchdog. Actually the disablement is broken
> in this case, but that's a different issue.
Yes, I know. I already have it fixed in my tree, together with a lot
of other NMI watchdog bugs (including some long-standing ones inherited
from i386). I will post the full patchkit later.
Andrew, please don't apply any NMI watchdog changes for x86-64 right now.
I will fix the current breakage before .12.
-Andi