Date: Wed, 26 Sep 2007 19:48:14 -0400
From: Jim Paris
To: AndrewL733, Randy Dunlap, Alan Cox, samson yeung
Cc: linux-kernel, bbermack@alum.mit.edu, Justin Mazzola Paluska
Subject: Re: NMI error and Intel S5000PSL Motherboards
Message-ID: <20070926234814.GA27743@jim.sh>

Hello,

> We have about 100 servers based on Intel S5000PSL-SATA motherboards.
> They have been running for anywhere between 1 and 10 months. For the
> past few months, after updating them all to the 2.6.20.15 kernel
> (because of a bug in the 2.6.18 kernel), we are seeing some strange NMI
> errors. For example:
>
> Aug 29 09:02:10 master kernel: Uhhuh. NMI received for unknown reason 30.
> Aug 29 09:02:10 master kernel: Do you have a strange power saving mode enabled?
> Aug 29 09:02:10 master kernel: Dazed and confused, but trying to continue

I'm also working with Andrew and Samson. The cause of the problem
appears to be CONFIG_PCIEAER (PCI Express Advanced Error Reporting),
which was introduced after 2.6.18 and defaults to y.

With CONFIG_PCIEAER=n, scanpci works fine with no errors. This is the
workaround that they'll likely use for now.

With CONFIG_PCIEAER=y, scanpci always triggers the NMI error. The boot
option aerdriver.forceload=1 has no effect. The related dmesg output at
boot is:

  Evaluate _OSC Set fails. Status = 0x0005
  Evaluate _OSC Set fails. Status = 0x0005
  aer_init: AER service init fails - Run ACPI _OSC fails
  aer: probe of 0000:00:02.0:pcie01 failed with error 2
  aer_init: AER service init fails - No ACPI _OSC support
  aer: probe of 0000:00:03.0:pcie01 failed with error 1
  Evaluate _OSC Set fails. Status = 0x0005
  Evaluate _OSC Set fails. Status = 0x0005
  aer_init: AER service init fails - Run ACPI _OSC fails
  aer: probe of 0000:00:04.0:pcie01 failed with error 2
  Evaluate _OSC Set fails. Status = 0x0005
  Evaluate _OSC Set fails. Status = 0x0005
  aer_init: AER service init fails - Run ACPI _OSC fails
  aer: probe of 0000:00:05.0:pcie01 failed with error 2
  Evaluate _OSC Set fails. Status = 0x0005
  Evaluate _OSC Set fails. Status = 0x0005
  aer_init: AER service init fails - Run ACPI _OSC fails
  aer: probe of 0000:00:06.0:pcie01 failed with error 2
  aer_init: AER service init fails - No ACPI _OSC support
  aer: probe of 0000:00:07.0:pcie01 failed with error 1

Full dmesg, lspci, and ACPI DSDT are available here:
http://jim.sh/~jim/tmp/nmi/

-jim