Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S267165AbUJRRWT (ORCPT ); Mon, 18 Oct 2004 13:22:19 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S267170AbUJRRWT (ORCPT ); Mon, 18 Oct 2004 13:22:19 -0400 Received: from fire.osdl.org ([65.172.181.4]:46521 "EHLO fire-1.osdl.org") by vger.kernel.org with ESMTP id S267165AbUJRRWO (ORCPT ); Mon, 18 Oct 2004 13:22:14 -0400 Message-ID: <4173F9A7.2090504@osdl.org> Date: Mon, 18 Oct 2004 10:13:11 -0700 From: "Randy.Dunlap" Organization: OSDL User-Agent: Mozilla Thunderbird 0.8 (X11/20040913) X-Accept-Language: en-us, en MIME-Version: 1.0 To: Marc Bevand CC: linux-kernel@vger.kernel.org, ak@suse.de Subject: Re: NMI watchdog detected lockup References: <4172F91D.8090109@osdl.org> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3766 Lines: 101 Marc Bevand wrote: > On 2004-10-17, Randy.Dunlap wrote: > | > | I'm seeing this often during a kernel build on AIC79xx. > | I did one kernel build on SATA without seeing this. > | This is on a dual-Opteron IBM Workstation A with > | 2 GB RAM, SATA, & SCSI. > | [...] > | NMI Watchdog detected LOCKUP on CPU0, registers: > | [...] > > You are not the first one to observe frequent watchdog timeout > lockup on dual Opteron systems during intense I/O operations, > see this thread: > > http://thread.gmane.org/gmane.linux.ide/1933 > > Note: this does *not* seem to be SATA-related. Hi, Zwane suspected NMI spikes and advised me to disable nmi_watchdog (nmi_watchdog=0). After doing that, a kernel build completes successfully, although with many messages like these: Uhhuh. NMI received for unknown reason 21. Dazed and confused, but trying to continue Do you have a strange power saving mode enabled? Uhhuh. NMI received for unknown reason 31. Dazed and confused, but trying to continue Do you have a strange power saving mode enabled? Dazed and confused, but trying to continue Do you have a strange power saving mode enabled? Uhhuh. NMI received for unknown reason 31. Dazed and confused, but trying to continue Do you have a strange power saving mode enabled? Uhhuh. NMI received for unknown reason 31. Uhhuh. NMI received for unknown reason 31. Dazed and confused, but trying to continue Do you have a strange power saving mode enabled? Uhhuh. NMI received for unknown reason 21. Dazed and confused, but trying to continue Do you have a strange power saving mode enabled? Uhhuh. NMI received for unknown reason 21. Dazed and confused, but trying to continue Do you have a strange power saving mode enabled? Dazed and confused, but trying to continue Do you have a strange power saving mode enabled? Uhhuh. NMI received for unknown reason 21. Dazed and confused, but trying to continue Do you have a strange power saving mode enabled? I've also seen reason == 20. This is on 2.6.9-rc4. Andi, any ideas? I've had several hundred of these messages, with only 1 dazed & confused that did not continue OK. Adding show_registers(regs); in the NMI handler points to default_idle(): Dazed and confused, but trying to continue Do you have a strange power saving mode enabled? CPU 0 Modules linked in: aic79xx usbserial aic7xxx ohci1394 ieee1394 Pid: 0, comm: swapper Not tainted 2.6.9-rc4 RIP: 0010:[] {default_idle+32} RSP: 0018:ffffffff805e3fb8 EFLAGS: 00000246 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000018 RDX: ffffffff8010f5d0 RSI: ffffffff80472b00 RDI: 0000010001e11b20 RBP: 0000000000000000 R08: 00000000ffffffff R09: 0000000000000001 R10: 0000000000000080 R11: ffffffff80562ae0 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 FS: 0000002a95b2e4c0(0000) GS:ffffffff805de800(0000) knlGS:0000000000000000 CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b CR2: 0000002a955a6000 CR3: 0000000000101000 CR4: 00000000000006e0 Process swapper (pid: 0, threadinfo ffffffff805e2000, task ffffffff80472b00) Stack: ffffffff8010f9fd 0000000000000000 ffffffff805e56e5 0000000000000000 ffffffff8055fbe0 0000000000000800 ffffffff805e51e0 0000000000000404 0000000000000000 Call Trace:{cpu_idle+29} {start_kernel+421} {_sinittext+480} Code: c3 fb f3 c3 66 66 66 90 66 66 66 90 66 66 66 90 48 83 ec 38 -- ~Randy - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/