2007-09-01 21:00:29

by Daniel Walker

Subject: [PATCH 1/1] i386: fix a hang on stuck nmi watchdog

When an NMI gets stuck, endflag stays equal to zero. This causes the
busy loops on the other CPUs to continue, even though the NMI test
is done.

On my machine, without the change below, the system would hang right after
check_nmi_watchdog(). The change below just sets endflag before checking
whether the test was successful.
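
For readers without the source at hand, the ordering problem can be sketched in user space. This is only an analogy under stated assumptions, not the kernel code: threads stand in for the other CPUs, and the names spin_until_endflag(), check_watchdog_sketch() and NWORKERS are hypothetical. Only endflag and nmi_active are borrowed from the patch.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define NWORKERS 3	/* hypothetical number of "other CPUs" */

static atomic_int endflag;	/* stand-in for the flag the patch moves */
static atomic_int nmi_active;	/* stand-in for the count of working NMI sources */

/* Plays the role of the busy loop the other CPUs run during the test. */
static void *spin_until_endflag(void *unused)
{
	(void)unused;
	while (!atomic_load(&endflag))
		;	/* keep spinning until the tester releases us */
	return NULL;
}

/*
 * Returns 0 when at least one simulated NMI source works, -1 when all
 * of them are stuck. Either way, every spinner is released first.
 */
int check_watchdog_sketch(int working_sources)
{
	pthread_t workers[NWORKERS];
	int i, rc, ret = 0;

	atomic_store(&endflag, 0);
	atomic_store(&nmi_active, working_sources);

	for (i = 0; i < NWORKERS; i++) {
		rc = pthread_create(&workers[i], NULL, spin_until_endflag, NULL);
		assert(rc == 0);
		(void)rc;
	}

	/* Patched order: release the spinners before looking at the result. */
	atomic_store(&endflag, 1);
	if (!atomic_load(&nmi_active))
		ret = -1;	/* early exit no longer leaves anyone spinning */

	/*
	 * If endflag were set only on the success path, these joins would
	 * block forever when working_sources == 0.
	 */
	for (i = 0; i < NWORKERS; i++)
		pthread_join(workers[i], NULL);

	return ret;
}
```

Calling check_watchdog_sketch(0) models the stuck case: it reports failure with -1, yet every spinning thread still terminates because the flag is set before the result is examined.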

Signed-off-by: Daniel Walker <[email protected]>

---
arch/i386/kernel/nmi.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.6.22/arch/i386/kernel/nmi.c
===================================================================
--- linux-2.6.22.orig/arch/i386/kernel/nmi.c
+++ linux-2.6.22/arch/i386/kernel/nmi.c
@@ -115,12 +115,12 @@ static int __init check_nmi_watchdog(voi
 			atomic_dec(&nmi_active);
 		}
 	}
+	endflag = 1;
 	if (!atomic_read(&nmi_active)) {
 		kfree(prev_nmi_count);
 		atomic_set(&nmi_active, -1);
 		return -1;
 	}
-	endflag = 1;
 	printk("OK.\n");

 	/* now that we know it works we can reduce NMI frequency to
--


2007-09-01 21:33:28

by Stephane Eranian

Subject: Re: [PATCH 1/1] i386: fix a hang on stuck nmi watchdog

Daniel,

Thanks for your help tracking down this bug. Maybe we can close
the bugzilla report now.

On Sat, Sep 01, 2007 at 01:54:17PM -0700, Daniel Walker wrote:
> When an NMI gets stuck, endflag stays equal to zero. This causes the
> busy loops on the other CPUs to continue, even though the NMI test
> is done.
>
> On my machine, without the change below, the system would hang right after
> check_nmi_watchdog(). The change below just sets endflag before checking
> whether the test was successful.
>
> Signed-off-by: Daniel Walker <[email protected]>
>
> ---
> arch/i386/kernel/nmi.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> Index: linux-2.6.22/arch/i386/kernel/nmi.c
> ===================================================================
> --- linux-2.6.22.orig/arch/i386/kernel/nmi.c
> +++ linux-2.6.22/arch/i386/kernel/nmi.c
> @@ -115,12 +115,12 @@ static int __init check_nmi_watchdog(voi
>  			atomic_dec(&nmi_active);
>  		}
>  	}
> +	endflag = 1;
>  	if (!atomic_read(&nmi_active)) {
>  		kfree(prev_nmi_count);
>  		atomic_set(&nmi_active, -1);
>  		return -1;
>  	}
> -	endflag = 1;
>  	printk("OK.\n");
>
>  	/* now that we know it works we can reduce NMI frequency to
> --
>
> --

--

-Stephane

2007-09-01 21:37:28

by Daniel Walker

Subject: Re: [PATCH 1/1] i386: fix a hang on stuck nmi watchdog

On Sat, 2007-09-01 at 14:33 -0700, Stephane Eranian wrote:
> Daniel,
>
> Thanks for your help tracking down this bug. Maybe we can close
> the bugzilla report now.

No problem, and thanks for taking the time to dig into it.

I'll close the bugzilla report.

Daniel

2007-09-01 21:45:20

by Andi Kleen

Subject: Re: [PATCH 1/1] i386: fix a hang on stuck nmi watchdog

On Saturday 01 September 2007 22:54:17 Daniel Walker wrote:
> When an NMI gets stuck, endflag stays equal to zero. This causes the
> busy loops on the other CPUs to continue, even though the NMI test
> is done.
>
> On my machine, without the change below, the system would hang right after
> check_nmi_watchdog(). The change below just sets endflag before checking
> whether the test was successful.
>
> Signed-off-by: Daniel Walker <[email protected]>

Added, thanks. I guess it's .23 material.

-Andi

2007-09-01 22:38:16

by Daniel Walker

Subject: Re: [PATCH 1/1] i386: fix a hang on stuck nmi watchdog

On Sat, 2007-09-01 at 23:45 +0200, Andi Kleen wrote:
> On Saturday 01 September 2007 22:54:17 Daniel Walker wrote:
> > When an NMI gets stuck, endflag stays equal to zero. This causes the
> > busy loops on the other CPUs to continue, even though the NMI test
> > is done.
> >
> > On my machine, without the change below, the system would hang right after
> > check_nmi_watchdog(). The change below just sets endflag before checking
> > whether the test was successful.
> >
> > Signed-off-by: Daniel Walker <[email protected]>
>
> Added, thanks. I guess it's .23 material.

Putting it in 2.6.23 seems appropriate. Stephane's patch might be good
for 2.6.23 too.

Daniel