Date: Mon, 18 Oct 2004 20:00:17 +0200
From: Andi Kleen <ak@suse.de>
To: "Randy.Dunlap" <rddunlap@osdl.org>
Cc: bevand_m@epita.fr, linux-kernel@vger.kernel.org, discuss@x86-64.org
Subject: Re: NMI watchdog detected lockup
Message-Id: <20041018200017.0098710d.ak@suse.de>
In-Reply-To: <4173F9A7.2090504@osdl.org>
References: <4172F91D.8090109@osdl.org>
	<ckv123$pcs$1@sea.gmane.org>
	<4173F9A7.2090504@osdl.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 1593
Lines: 48

On Mon, 18 Oct 2004 10:13:11 -0700
"Randy.Dunlap" <rddunlap@osdl.org> wrote:

> Marc Bevand wrote:
> > On 2004-10-17, Randy.Dunlap <rddunlap@osdl.org> wrote:
> > | 
> > |  I'm seeing this often during a kernel build on AIC79xx.
> > |  I did one kernel build on SATA without seeing this.
> > |  This is on a dual-Opteron IBM Workstation A with
> > |  2 GB RAM, SATA, & SCSI.
> > |  [...]
> > |  NMI Watchdog detected LOCKUP on CPU0, registers:
> > |  [...]
> > 
> > You are not the first one to observe frequent watchdog timeout
> > lockup on dual Opteron systems during intense I/O operations,
> > see this thread:
> > 
> >   http://thread.gmane.org/gmane.linux.ide/1933
> > 
> > Note: this does *not* seem to be SATA-related.
> 
> Hi,
> 
> Zwane suspected NMI spikes and advised me to disable nmi_watchdog
> (nmi_watchdog=0).  After doing that, a kernel build completes
> successfully, although with many messages like these:
> 
> Uhhuh. NMI received for unknown reason 21.

Something on your system creates bogus NMI interrupts. What chipset
are you using exactly?

Sometimes chipsets can be programmed to raise NMIs when a PCI bus
error occurs.

21 (hex) is the normal state of the reason byte (PIT timer running,
but no errors logged).

If you have an AMD 8131 it could in theory be erratum 54, but then
one of the error bits in the reason byte should normally be set.
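
For reference, a rough sketch of how such a reason byte decodes. It
assumes the value is what the NMI handler reads from I/O port 0x61 and
that the message prints it in hex; the bit layout is the usual PC/AT
one. The macro and function names are made up for illustration and are
not the kernel's.

```c
#include <stdint.h>
#include <stdio.h>

/* Read-side bits of I/O port 0x61 (PC/AT layout). Names are illustrative. */
#define REASON_TIMER2_GATE   0x01  /* PIT channel 2 gate enabled */
#define REASON_SPEAKER_DATA  0x02  /* speaker data enabled */
#define REASON_SERR_MASKED   0x04  /* set: SERR#/parity NMI is disabled */
#define REASON_IOCHK_MASKED  0x08  /* set: IOCHK# NMI is disabled */
#define REASON_REFRESH       0x10  /* refresh toggle */
#define REASON_TIMER2_OUT    0x20  /* PIT channel 2 output state */
#define REASON_IOCHK         0x40  /* IOCHK# error logged */
#define REASON_SERR          0x80  /* SERR#/parity error logged */

/* Nonzero if either of the two error bits is set in the reason byte. */
int nmi_reason_has_error(uint8_t reason)
{
    return (reason & (REASON_SERR | REASON_IOCHK)) != 0;
}

/* Print one line per bit that is set in the reason byte. */
void nmi_reason_describe(uint8_t reason)
{
    printf("reason %02x:\n", reason);
    if (reason & REASON_TIMER2_GATE)  printf("  timer 2 gate enabled\n");
    if (reason & REASON_SPEAKER_DATA) printf("  speaker data enabled\n");
    if (reason & REASON_SERR_MASKED)  printf("  SERR#/parity NMI masked\n");
    if (reason & REASON_IOCHK_MASKED) printf("  IOCHK# NMI masked\n");
    if (reason & REASON_REFRESH)      printf("  refresh toggle high\n");
    if (reason & REASON_TIMER2_OUT)   printf("  timer 2 output high\n");
    if (reason & REASON_IOCHK)        printf("  IOCHK# error logged\n");
    if (reason & REASON_SERR)         printf("  SERR#/parity error logged\n");
    if (!nmi_reason_has_error(reason))
        printf("  no error bits set\n");
}
```

Calling nmi_reason_describe(0x21) reports only the two timer bits and
no error bits, which is why the NMI shows up as "unknown reason".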

-Andi
