2006-03-07 02:13:38

by Masanori Goto

Subject: [PATCH] x86: Fix i386 nmi_watchdog that does not trigger die_nmi

This patch fixes the i386 nmi_watchdog, which never reaches its timeout
condition. die_nmi() is not called when it should be, because
nmi_watchdog_tick() in arch/i386/kernel/nmi.c resets alert_counter
right after incrementing it, so the counter never accumulates:

void nmi_watchdog_tick (struct pt_regs * regs) {
	if (last_irq_sums[cpu] == sum) {
		alert_counter[cpu]++;		/* <- counts up alert_counter, but */
		if (alert_counter[cpu] == 5*nmi_hz)
			die_nmi(regs, "NMI Watchdog detected LOCKUP");
		alert_counter[cpu] = 0;		/* <- resets alert_counter every time */

This patch restores the previous, working version.
Tested with 2.6.15; it is also OK for 2.6.16-rc5.

This bug was found, and the fix originally written, by Kohta NAKASHIMA.

-- gotom

Signed-Off-By: GOTO Masanori <[email protected]>
---

--- linux-2.6.15/arch/i386/kernel/nmi.c.gotom 2006-03-02 17:52:49.021365056 +0900
+++ linux-2.6.15/arch/i386/kernel/nmi.c 2006-03-02 17:53:19.939664760 +0900
@@ -544,7 +544,7 @@ void nmi_watchdog_tick (struct pt_regs *
* die_nmi will return ONLY if NOTIFY_STOP happens..
*/
die_nmi(regs, "NMI Watchdog detected LOCKUP");
-
+ } else {
last_irq_sums[cpu] = sum;
alert_counter[cpu] = 0;
}


2006-03-07 03:46:49

by Andi Kleen

Subject: Re: [PATCH] x86: Fix i386 nmi_watchdog that does not trigger die_nmi

GOTO Masanori <[email protected]> writes:

> This patch fixes the i386 nmi_watchdog, which never reaches its timeout
> condition. die_nmi() is not called when it should be, because
> nmi_watchdog_tick() in arch/i386/kernel/nmi.c resets alert_counter
> right after incrementing it, so the counter never accumulates:
>
> void nmi_watchdog_tick (struct pt_regs * regs) {
> 	if (last_irq_sums[cpu] == sum) {
> 		alert_counter[cpu]++;		/* <- counts up alert_counter, but */
> 		if (alert_counter[cpu] == 5*nmi_hz)
> 			die_nmi(regs, "NMI Watchdog detected LOCKUP");
> 		alert_counter[cpu] = 0;		/* <- resets alert_counter every time */
>
> This patch restores the previous, working version.
> Tested with 2.6.15; it is also OK for 2.6.16-rc5.
>
> This bug was found, and the fix originally written, by Kohta NAKASHIMA.

Oops. That looks quite bad. A real 2.6.16 candidate, I guess.

-Andi
