Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751108AbWCKWIF (ORCPT ); Sat, 11 Mar 2006 17:08:05 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1751240AbWCKWIF (ORCPT ); Sat, 11 Mar 2006 17:08:05 -0500 Received: from lucidpixels.com ([66.45.37.187]:51688 "EHLO lucidpixels.com") by vger.kernel.org with ESMTP id S1751108AbWCKWIE (ORCPT ); Sat, 11 Mar 2006 17:08:04 -0500 Date: Sat, 11 Mar 2006 17:07:54 -0500 (EST) From: Justin Piszcz X-X-Sender: jpiszcz@p34 To: Marco Roeland cc: linux-kernel@vger.kernel.org Subject: Re: MCE Errors, Bad CPU, Memory or Motherboard? In-Reply-To: <20060311095113.GA32274@fiberbit.xs4all.nl> Message-ID: References: <20060311095113.GA32274@fiberbit.xs4all.nl> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 1692 Lines: 38 Thanks for this response. On Sat, 11 Mar 2006, Marco Roeland wrote: > On Friday March 10th 2006 Justin Piszcz wrote: > >> This is the first time I have seen an MCE error, googling the EIP value at >> the time of the panic does not return any useful results. >> >> Does anyone know whether it is the CPU or MEMORY that is bad in this >> machine? As it shows some problems with BANK4; however, if the CPU is >> bad, then it is possible to get all sorts of unpredictable results, right? > > If it's the memory you can try swapping RAM sticks or taking them out > altogether, if you have more than one. > > Recently I saw these kind of MCE events also happen on a 3 year old > Athlon 2000 machine, which was on 24/7. There it finally turned out that > I ran an "athcool" program which enables powersaving mode on idle state. > Although this saves energy and reduces the fan noise, switching back to > work mode apparently wasn't fast enough anymore after these years. After > disabling the powersaving the MCE errors were gone. > > The difference between these MCE's on memory or on the ACPI powersaving > (if you use that at all) is really simple to diagnose: for memory they > will occur on a very busy system, whereas the powersaving errors will > occur on a completely idle system. > > But you might have real hardware problems of course. Then use the usual > routines: memtest(86), check the cooling, power surges, etc. > -- > Marco Roeland > - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/