(Note: Patch also attached because the inline version is certain to get
line wrapped.)
Don't call nmi_watchdog_tick() when the NMI watchdog isn't active.
Signed-off-by: Jan Beulich <[email protected]>
diff -Npru 2.6.13/arch/i386/kernel/traps.c 2.6.13-i386-watchdog-active/arch/i386/kernel/traps.c
--- 2.6.13/arch/i386/kernel/traps.c 2005-08-29 01:41:01.000000000 +0200
+++ 2.6.13-i386-watchdog-active/arch/i386/kernel/traps.c 2005-09-01 14:04:35.000000000 +0200
@@ -611,7 +611,7 @@ static void default_do_nmi(struct pt_reg
 	 * Ok, so this is none of the documented NMI sources,
 	 * so it must be the NMI watchdog.
 	 */
-	if (nmi_watchdog) {
+	if (nmi_watchdog && nmi_active > 0) {
 		nmi_watchdog_tick(regs);
 		return;
 	}
diff -Npru 2.6.13/include/asm-i386/apic.h 2.6.13-i386-watchdog-active/include/asm-i386/apic.h
--- 2.6.13/include/asm-i386/apic.h 2005-08-29 01:41:01.000000000 +0200
+++ 2.6.13-i386-watchdog-active/include/asm-i386/apic.h 2005-09-01 11:32:11.000000000 +0200
@@ -125,6 +125,7 @@ extern void enable_APIC_timer(void);
extern void enable_NMI_through_LVT0 (void * dummy);
extern unsigned int nmi_watchdog;
+extern int nmi_active;
#define NMI_NONE 0
#define NMI_IO_APIC 1
#define NMI_LOCAL_APIC 2
On Thu, 8 Sep 2005, Jan Beulich wrote:
> diff -Npru 2.6.13/arch/i386/kernel/traps.c 2.6.13-i386-watchdog-active/arch/i386/kernel/traps.c
> --- 2.6.13/arch/i386/kernel/traps.c 2005-08-29 01:41:01.000000000 +0200
> +++ 2.6.13-i386-watchdog-active/arch/i386/kernel/traps.c 2005-09-01 14:04:35.000000000 +0200
> @@ -611,7 +611,7 @@ static void default_do_nmi(struct pt_reg
>  	 * Ok, so this is none of the documented NMI sources,
>  	 * so it must be the NMI watchdog.
>  	 */
> -	if (nmi_watchdog) {
> +	if (nmi_watchdog && nmi_active > 0) {
>  		nmi_watchdog_tick(regs);
>  		return;
>  	}
I dislike this patch, and it's not your fault. The reason is that a few
systems (I have one) always report "CPU stuck" during watchdog setup, but
the watchdog eventually starts ticking at runtime anyway. Unfortunately, if
this goes in, those systems will get lots of the following:
Uhhuh. NMI received for unknown reason 00 on CPU 1.
Dazed and confused, but trying to continue
Do you have a strange power saving mode enabled?
Uhhuh. NMI received for unknown reason 21 on CPU 0.
So, before the patch can go in, the "CPU stuck" systems probably need
looking at. Since I have one, I'll have a look.
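To make the concern concrete, here is a rough user-space model of the
dispatch decision, not kernel code. The enum and both route_* helpers are
made-up names for illustration only, and it assumes that a failed setup
check ("CPU stuck") leaves nmi_active at a non-positive value even though
watchdog NMIs start arriving later.

```c
/* Hypothetical model of the tail of default_do_nmi(); names are illustrative. */
enum nmi_route {
	ROUTE_WATCHDOG_TICK,	/* NMI is handed to the watchdog tick handler */
	ROUTE_UNKNOWN_NMI	/* NMI falls through to the "unknown reason" path */
};

/* Behaviour before the patch: only nmi_watchdog is consulted. */
enum nmi_route route_unpatched(unsigned int nmi_watchdog, int nmi_active)
{
	(void)nmi_active;	/* deliberately ignored by the old check */
	return nmi_watchdog ? ROUTE_WATCHDOG_TICK : ROUTE_UNKNOWN_NMI;
}

/* Behaviour with the patch: the watchdog must also be marked active. */
enum nmi_route route_patched(unsigned int nmi_watchdog, int nmi_active)
{
	if (nmi_watchdog && nmi_active > 0)
		return ROUTE_WATCHDOG_TICK;
	return ROUTE_UNKNOWN_NMI;
}
```

With nmi_watchdog set and nmi_active at 0 (the assumed "CPU stuck" state),
route_unpatched() still picks the watchdog tick, while route_patched() picks
the unknown-NMI path. Under that assumption, every late watchdog tick on such
a system would be reported as an unknown NMI.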
Thanks,
Zwane
P.S. Why is the NMI watchdog perpetually broken?