2003-07-23 17:28:37

by Ville Herva

Subject: [PATCH] NMI watchdog documentation

Documentation/nmi_watchdog.txt doesn't actually tell what options need to be
enabled in kernel config in order to use NMI watchdog. I for one found it
confusing.

I vaguely recall someone posted a similar patch some time ago, but it still
doesn't seem to be present in 2.4 or 2.6-test.

Andi: what about x86-64 - does it have something similar that should be
mentioned?


-- v --


--- linux/Documentation/nmi_watchdog.txt Tue Sep 18 09:03:09 2001
+++ linux~/Documentation/nmi_watchdog.txt Wed Jul 23 20:25:42 2003
@@ -8,9 +8,20 @@
which get executed even if the system is otherwise locked up hard).
This can be used to debug hard kernel lockups. By executing periodic
NMI interrupts, the kernel can monitor whether any CPU has locked up,
-and print out debugging messages if so. You must enable the NMI
-watchdog at boot time with the 'nmi_watchdog=n' boot parameter. Eg.
-the relevant lilo.conf entry:
+and print out debugging messages if so.
+
+In order to use the NMI watchdog, you need to have APIC support in your
+kernel. For SMP kernels, APIC support gets compiled in automatically. For
+UP, enable either CONFIG_X86_UP_APIC (Processor type and features -> Local
+APIC support on uniprocessors) or CONFIG_X86_UP_IOAPIC (Processor type and
+features -> IO-APIC support on uniprocessors) in your kernel config.
+CONFIG_X86_UP_APIC is for uniprocessor machines without an IO-APIC.
+CONFIG_X86_UP_IOAPIC is for uniprocessor machines with an IO-APIC. [Note: certain
+kernel debugging options, such as Kernel Stack Meter or Kernel Tracer,
+may implicitly disable the NMI watchdog.]
+
+To actually enable the NMI watchdog, use the 'nmi_watchdog=N' boot
+parameter. Eg. the relevant lilo.conf entry:

append="nmi_watchdog=1"


2003-07-28 17:38:29

by Andi Kleen

Subject: Re: [PATCH] NMI watchdog documentation

On Wed, 23 Jul 2003 20:43:25 +0300
Ville Herva <[email protected]> wrote:

> Documentation/nmi_watchdog.txt doesn't actually tell what options need to be
> enabled in kernel config in order to use NMI watchdog. I for one found it
> confusing.
>
> I vaguely recall someone posted a similar patch some time ago, but it still
> doesn't seem to be present in 2.4 or 2.6-test.
>
> Andi: what about x86-64 - does it have something similar that should be
> mentioned?

x86-64 is the same, except APIC is always compiled in and the nmi watchdog is
always enabled with perfctr mode. mode=2 seems to also not work correctly currently.

However one caveat (even for i386): when you use perfctr mode 1 you lose the first
performance register which you may need for other things.

-Andi

2003-07-28 19:21:55

by Ville Herva

Subject: Re: [PATCH] NMI watchdog documentation

On Mon, Jul 28, 2003 at 07:53:42PM +0200, you [Andi Kleen] wrote:
>
> x86-64 is the same, except APIC is always compiled in and the nmi watchdog is
> always enabled with perfctr mode. mode=2 seems to also not work correctly currently.
>
> However one caveat (even for i386): when you use perfctr mode 1 you lose the first
> performance register which you may need for other things.

Thanks.

So, is something like the following ok by you (patch is relative to -test2
nmi_watchdog.txt)? If it is, I'll send it to Linus and Marcelo.


-- v --


--- /usr/src/linux/Documentation/nmi_watchdog.txt Mon Jul 28 22:10:18 2003
+++ /usr/src/linux~/Documentation/nmi_watchdog.txt Mon Jul 28 22:18:10 2003
@@ -1,9 +1,11 @@

-Is your ix86 system locking up unpredictably? No keyboard activity, just
+[NMI watchdog is available for x86 and x86-64 architectures]
+
+Is your system locking up unpredictably? No keyboard activity, just
a frustrating complete hard lockup? Do you want to help us debugging
such lockups? If all yes then this document is definitely for you.

-On Intel and similar ix86 type hardware there is a feature that enables
+On many x86/x86-64 type hardware there is a feature that enables
us to generate 'watchdog NMI interrupts'. (NMI: Non Maskable Interrupt
which get executed even if the system is otherwise locked up hard).
This can be used to debug hard kernel lockups. By executing periodic
@@ -20,6 +22,13 @@
kernel debugging options such as Kernel Stack Meter or Kernel Tracer
may implicitly disable NMI watchdog.]

+For x86-64, the needed APIC is always compiled in, and the NMI watchdog is
+always enabled with perfctr mode. Currently, mode=2 does not work on x86-64.
+
+Using NMI watchdog (in mode=1) needs the first performance register, so you
+can't use it for other purposes (such as high precision performance
+profiling.)
+
To actually enable the NMI watchdog, use the 'nmi_watchdog=N' boot
parameter. Eg. the relevant lilo.conf entry:

2003-07-29 10:37:55

by Mikael Pettersson

Subject: Re: [PATCH] NMI watchdog documentation

On Mon, 28 Jul 2003 19:53:42 +0200, Andi Kleen wrote:
>On Wed, 23 Jul 2003 20:43:25 +0300
>Ville Herva <[email protected]> wrote:
>
>> Documentation/nmi_watchdog.txt doesn't actually tell what options need to be
>> enabled in kernel config in order to use NMI watchdog. I for one found it
>> confusing.
>>
>> I vaguely recall someone posted a similar patch some time ago, but it still
>> doesn't seem to be present in 2.4 or 2.6-test.
>>
>> Andi: what about x86-64 - does it have something similar that should be
>> mentioned?
>
>x86-64 is the same, except APIC is always compiled in and the nmi watchdog is
>always enabled with perfctr mode. mode=2 seems to also not work correctly currently.
>
>However one caveat (even for i386): when you use perfctr mode 1 you lose the first
>performance register which you may need for other things.

Andi, you have the numbers mixed up. mode 1 is I/O-APIC, mode 2 is local APIC,
and x86-64 defaults nmi_watchdog to I/O-APIC mode.
Now, is it the I/O-APIC or local APIC watchdog that doesn't work in x86-64?

/Mikael

2003-07-29 16:08:13

by Andi Kleen

Subject: Re: [PATCH] NMI watchdog documentation

> Andi, you have the numbers mixed up. mode 1 is I/O-APIC, mode 2 is local APIC,
> and x86-64 defaults nmi_watchdog to I/O-APIC mode.
> Now, is it the I/O-APIC or local APIC watchdog that doesn't work in x86-64?

Right, 1 and 2 need to be exchanged. Anyway, local APIC mode does not seem
to work; the kernel always reports "NMI stuck" at bootup.
IO APIC mode is the default.

I have not tested if it works with a 32bit kernel on an Opteron box.

-Andi

2003-07-29 17:58:36

by Ville Herva

Subject: Re: [PATCH] NMI watchdog documentation

On Tue, Jul 29, 2003 at 06:06:30PM +0200, you [Andi Kleen] wrote:
> > Andi, you have the numbers mixed up. mode 1 is I/O-APIC, mode 2 is local APIC,
> > and x86-64 defaults nmi_watchdog to I/O-APIC mode.
> > Now, is it the I/O-APIC or local APIC watchdog that doesn't work in x86-64?
>
> Right, 1 and 2 need to be exchanged. Anyway, local APIC mode does not seem
> to work; the kernel always reports "NMI stuck" at bootup.
> IO APIC mode is the default.
>
> I have not tested if it works with a 32bit kernel on an Opteron box.

Ok, I'll send the following to Linus and Marcelo unless you object.


-- v --


--- /usr/src/linux/Documentation/nmi_watchdog.txt Mon Jul 28 22:10:18 2003
+++ /usr/src/linux~/Documentation/nmi_watchdog.txt Mon Jul 28 22:18:10 2003
@@ -1,9 +1,11 @@

-Is your ix86 system locking up unpredictably? No keyboard activity, just
+[NMI watchdog is available for x86 and x86-64 architectures]
+
+Is your system locking up unpredictably? No keyboard activity, just
a frustrating complete hard lockup? Do you want to help us debugging
such lockups? If all yes then this document is definitely for you.

-On Intel and similar ix86 type hardware there is a feature that enables
+On many x86/x86-64 type hardware there is a feature that enables
us to generate 'watchdog NMI interrupts'. (NMI: Non Maskable Interrupt
which get executed even if the system is otherwise locked up hard).
This can be used to debug hard kernel lockups. By executing periodic
@@ -20,6 +22,13 @@
kernel debugging options such as Kernel Stack Meter or Kernel Tracer
may implicitly disable NMI watchdog.]

+For x86-64, the needed APIC is always compiled in, and the NMI watchdog is
+always enabled with perfctr mode. Currently, mode=1 does not work on x86-64.
+
+Using NMI watchdog (in mode=2) needs the first performance register, so you
+can't use it for other purposes (such as high precision performance
+profiling.)
+
To actually enable the NMI watchdog, use the 'nmi_watchdog=N' boot
parameter. Eg. the relevant lilo.conf entry:

2003-07-30 19:24:50

by Mikael Pettersson

Subject: Re: [PATCH] NMI watchdog documentation

On Tue, 29 Jul 2003 20:53:19 +0300, Ville Herva wrote:
>On Tue, Jul 29, 2003 at 06:06:30PM +0200, you [Andi Kleen] wrote:
>> > Andi, you have the numbers mixed up. mode 1 is I/O-APIC, mode 2 is local APIC,
>> > and x86-64 defaults nmi_watchdog to I/O-APIC mode.
>> > Now, is it the I/O-APIC or local APIC watchdog that doesn't work in x86-64?
>>
>> Right, 1 and 2 need to be exchanged. Anyway, local APIC mode does not seem
>> to work; the kernel always reports "NMI stuck" at bootup.
>> IO APIC mode is the default.
...
>+For x86-64, the needed APIC is always compiled in, and the NMI watchdog is
>+always enabled with perfctr mode. Currently, mode=1 does not work on x86-64.

Didn't Andi just say it's the other way around? nmi_watchdog=1 (I/O-APIC)
by default since nmi_watchdog=2 (local APIC) doesn't work.

/Mikael

2003-07-30 19:19:53

by Mikael Pettersson

Subject: Re: [PATCH] NMI watchdog documentation

On Tue, 29 Jul 2003 18:06:30 +0200, Andi Kleen wrote:
>Right, 1 and 2 need to be exchanged. Anyway, local APIC mode does not seem
>to work; the kernel always reports "NMI stuck" at bootup.
>IO APIC mode is the default.

That's strange. I've tested perfctr-generated interrupts through
the local APIC on Opteron, and they work with the perfctr driver.

Two things you might want to test:
- In case the unofficial event 0x76 really doesn't work in your
version of the chip, try this event specifier instead: it
creates a clock-like event using an inverted threshold approach.
I've tested this on K8 and P6 with the perfctr driver. The event
code (0xC0) is immaterial, 0x00 and 0xFF work equally well.

--- linux-2.6.0-test2/arch/x86_64/kernel/nmi.c.~1~ 2003-07-03 12:32:44.000000000 +0200
+++ linux-2.6.0-test2/arch/x86_64/kernel/nmi.c 2003-07-30 20:46:21.412657728 +0200
@@ -51,7 +51,7 @@
#define K7_EVNTSEL_OS (1 << 17)
#define K7_EVNTSEL_USR (1 << 16)
#define K7_EVENT_CYCLES_PROCESSOR_IS_RUNNING 0x76
-#define K7_NMI_EVENT K7_EVENT_CYCLES_PROCESSOR_IS_RUNNING
+#define K7_NMI_EVENT (0xC0 | (1<<23) | (0xFF << 24))

#define P6_EVNTSEL0_ENABLE (1 << 22)
#define P6_EVNTSEL_INT (1 << 20)

- My perfctr driver routes interrupts through LVTPC programmed for
Fixed delivery mode. Maybe the NMI delivery mode is broken. You
could try changing the NMI watchdog to use a new vector and Fixed
delivery mode, just to see if the watchdog starts ticking.

/Mikael

2003-07-30 19:41:14

by Ville Herva

Subject: Re: [PATCH] NMI watchdog documentation

On Wed, Jul 30, 2003 at 09:20:33PM +0200, you [Mikael Pettersson] wrote:
> On Tue, 29 Jul 2003 20:53:19 +0300, Ville Herva wrote:
> >On Tue, Jul 29, 2003 at 06:06:30PM +0200, you [Andi Kleen] wrote:
> >> > Andi, you have the numbers mixed up. mode 1 is I/O-APIC, mode 2 is local APIC,
> >> > and x86-64 defaults nmi_watchdog to I/O-APIC mode.
> >> > Now, is it the I/O-APIC or local APIC watchdog that doesn't work in x86-64?
> >>
> >> Right, 1 and 2 need to be exchanged. Anyway, local APIC mode does not seem
> >> to work; the kernel always reports "NMI stuck" at bootup.
> >> IO APIC mode is the default.
> ...
> >+For x86-64, the needed APIC is always compiled in, and the NMI watchdog is
> >+always enabled with perfctr mode. Currently, mode=1 does not work on x86-64.
>
> Didn't Andi just say it's the other way around? nmi_watchdog=1 (I/O-APIC)
> by default since nmi_watchdog=2 (local APIC) doesn't work.

Ok, you got me confused (thankfully I didn't submit anything for inclusion
yet. :)

Initially, Andi said:

http://marc.theaimsgroup.com/?l=linux-kernel&m=105941508314399&w=2
> x86-64 is the same, except APIC is always compiled in and the nmi watchdog
> is always enabled with perfctr mode. mode=2 seems to also not work
> correctly currently.
>
> However one caveat (even for i386): when you use perfctr mode 1 you lose
> the first performance register which you may need for other things.

To which I replied:
http://marc.theaimsgroup.com/?l=linux-kernel&m=105942026020567&w=2
> So, is something like the following ok by you
>
> +For x86-64, the needed APIC is always compiled in, and the NMI watchdog is
> +always enabled with perfctr mode. Currently, mode=2 does not work on x86-64.
> +
> +Using NMI watchdog (in mode=1) needs the first performance register, so you
> +can't use it for other purposes (such as high precision performance
> +profiling.)

But you pointed out it was the other way around:
http://marc.theaimsgroup.com/?l=linux-kernel&m=105947532631384&w=2
> Andi, you have the numbers mixed up. mode 1 is I/O-APIC, mode 2 is local
> APIC, and x86-64 defaults nmi_watchdog to I/O-APIC mode. Now, is it the
> I/O-APIC or local APIC watchdog that doesn't work in x86-64?

And Andi agreed:
http://marc.theaimsgroup.com/?l=linux-kernel&m=105949540722325&w=2
> Right, 1 and 2 need to be exchanged. Anyway, local APIC mode does not seem
> to work; the kernel always reports "NMI stuck" at bootup. IO APIC mode
> is the default.

So I proposed (blindly exchanging the numbers):
http://marc.theaimsgroup.com/?l=linux-kernel&m=105950174531125&w=2
> +For x86-64, the needed APIC is always compiled in, and the NMI watchdog is
> +always enabled with perfctr mode. Currently, mode=1 does not work on x86-64.
> +
> +Using NMI watchdog (in mode=2) needs the first performance register, so you
> +can't use it for other purposes (such as high precision performance
> +profiling.)

So... Should it be something like:

+For x86-64, the needed APIC is always compiled in, and the NMI watchdog is
+always enabled with perfctr mode. Currently, mode=2 (local APIC) does not
+work on x86-64. IO APIC mode (mode=1) is the default. Using NMI watchdog
+(mode=1) needs the first performance register, so you can't use it for
+other purposes (such as high precision performance profiling.)

(Is the last sentence only valid for x86-64?)


-- v --


2003-07-30 22:53:09

by Mikael Pettersson

Subject: Re: [PATCH] NMI watchdog documentation

On Wed, 30 Jul 2003 22:40:52 +0300, Ville Herva wrote:
>Ok, you got me confused (thankfully I didn't submit anything for inclusion
>yet. :)
...
>So... Should it be something like:
>
>+For x86-64, the needed APIC is always compiled in, and the NMI watchdog is
>+always enabled with perfctr mode. Currently, mode=2 (local APIC) does not

always enabled with I/O-APIC mode.

>+work on x86-64. IO APIC mode (mode=1) is the default. Using NMI watchdog

Using local APIC

>+(mode=1) needs the first performance register, so you can't use it for

(mode=2)

>+other purposes (such as high precision performance profiling.)

>(Is the last sentence only valid for x86-64?)

No, it's true for both x86 and x86-64. However, both oprofile
and the perfctr driver disable the local APIC NMI watchdog, so
the statement is only true for other drivers that don't do this.

/Mikael

2003-07-31 05:44:58

by Ville Herva

Subject: Re: [PATCH] NMI watchdog documentation

On Thu, Jul 31, 2003 at 12:53:00AM +0200, you [Mikael Pettersson] wrote:
> On Wed, 30 Jul 2003 22:40:52 +0300, Ville Herva wrote:
> >Ok, you got me confused (thankfully I didn't submit anything for inclusion
> >yet. :)
> ...
> >So... Should it be something like:
> >
> >+For x86-64, the needed APIC is always compiled in, and the NMI watchdog is
> >+always enabled with perfctr mode. Currently, mode=2 (local APIC) does not
>
> always enabled with I/O-APIC mode.
>
> >+work on x86-64. IO APIC mode (mode=1) is the default. Using NMI watchdog
>
> Using local APIC
>
> >+(mode=1) needs the first performance register, so you can't use it for
>
> (mode=2)
>
> >+other purposes (such as high precision performance profiling.)
>
> >(Is the last sentence only valid for x86-64?)
>
> No, it's true for both x86 and x86-64. However, both oprofile
> and the perfctr driver disable the local APIC NMI watchdog, so
> the statement is only true for other drivers that don't do this.

Uuh, sorry. Is the one below ok by you for submission to Linus and Marcelo?


-- v --


--- linux/Documentation/nmi_watchdog.txt Sun Jul 27 19:58:26 2003
+++ linux~/Documentation/nmi_watchdog.txt Tue Jul 29 21:08:01 2003
@@ -1,9 +1,11 @@

-Is your ix86 system locking up unpredictably? No keyboard activity, just
+[NMI watchdog is available for x86 and x86-64 architectures]
+
+Is your system locking up unpredictably? No keyboard activity, just
a frustrating complete hard lockup? Do you want to help us debugging
such lockups? If all yes then this document is definitely for you.

-On Intel and similar ix86 type hardware there is a feature that enables
+On many x86/x86-64 type hardware there is a feature that enables
us to generate 'watchdog NMI interrupts'. (NMI: Non Maskable Interrupt
which get executed even if the system is otherwise locked up hard).
This can be used to debug hard kernel lockups. By executing periodic
@@ -20,6 +22,15 @@
kernel debugging options, such as Kernel Stack Meter or Kernel Tracer,
may implicitly disable the NMI watchdog.]

+For x86-64, the needed APIC is always compiled in, and the NMI watchdog is
+always enabled with I/O-APIC mode (nmi_watchdog=1). Currently, local APIC
+mode (nmi_watchdog=2) does not work on x86-64.
+
+Using local APIC (nmi_watchdog=2) needs the first performance register, so
+you can't use it for other purposes (such as high precision performance
+profiling.) However, at least oprofile and the perfctr driver disable the
+local APIC NMI watchdog automatically.
+
To actually enable the NMI watchdog, use the 'nmi_watchdog=N' boot
parameter. Eg. the relevant lilo.conf entry:

2003-07-31 21:25:28

by Mikael Pettersson

Subject: Re: [PATCH] NMI watchdog documentation

On Thu, 31 Jul 2003 08:44:48 +0300, Ville Herva wrote:
>Uuh, sorry. Is the one below ok by you for submission to Linus and Marcelo?
>
>
>-- v --
>
>[email protected]
>
>--- linux/Documentation/nmi_watchdog.txt Sun Jul 27 19:58:26 2003
>+++ linux~/Documentation/nmi_watchdog.txt Tue Jul 29 21:08:01 2003
>@@ -1,9 +1,11 @@
>
>-Is your ix86 system locking up unpredictably? No keyboard activity, just
>+[NMI watchdog is available for x86 and x86-64 architectures]
>+
>+Is your system locking up unpredictably? No keyboard activity, just
> a frustrating complete hard lockup? Do you want to help us debugging
> such lockups? If all yes then this document is definitely for you.
>
>-On Intel and similar ix86 type hardware there is a feature that enables
>+On many x86/x86-64 type hardware there is a feature that enables
> us to generate 'watchdog NMI interrupts'. (NMI: Non Maskable Interrupt
> which get executed even if the system is otherwise locked up hard).
> This can be used to debug hard kernel lockups. By executing periodic
>@@ -20,6 +22,15 @@
> kernel debugging options, such as Kernel Stack Meter or Kernel Tracer,
> may implicitly disable the NMI watchdog.]
>
>+For x86-64, the needed APIC is always compiled in, and the NMI watchdog is
>+always enabled with I/O-APIC mode (nmi_watchdog=1). Currently, local APIC
>+mode (nmi_watchdog=2) does not work on x86-64.
>+
>+Using local APIC (nmi_watchdog=2) needs the first performance register, so
>+you can't use it for other purposes (such as high precision performance
>+profiling.) However, at least oprofile and the perfctr driver disable the
>+local APIC NMI watchdog automatically.
>+
> To actually enable the NMI watchdog, use the 'nmi_watchdog=N' boot
> parameter. Eg. the relevant lilo.conf entry:

Looks Ok to me.

/Mikael