2008-03-30 20:58:06

by Gene Heskett

Subject: MCE going wild, 14 megs of this in the logs

Greetings;
Mar 30 16:46:42 coyote kernel: [469249.031832] MCE: The hardware reports a non
fatal, correctable incident occurred on CPU 0.
Mar 30 16:46:42 coyote kernel: [469249.031838] Bank 1: d400400000000152
Mar 30 16:46:42 coyote kernel: [469249.031841] MCE: The hardware reports a non
fatal, correctable incident occurred on CPU 0.
Mar 30 16:46:42 coyote kernel: [469249.031844] Bank 2: d40040000000017a

It's always the same two addresses reported, every 15 seconds. So I now have the
non-fatal part of MCE turned off, and 2.6.24.4 is rebuilding.

I saw this once before, and a nearly round-the-clock run of memtest86 gave my
memory a clean bill of health. The processor is an XP-2800, on a Biostar
mainboard with an nForce2 chipset. Is this possibly a known artifact of this
hardware?

--
Cheers, Gene
"There are four boxes to be used in defense of liberty:
soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
If I am elected, the concrete barriers around the WHITE HOUSE will be
replaced by tasteful foam replicas of ANN MARGARET!


2008-03-30 23:50:10

by Jiri Kosina

Subject: Re: MCE going wild, 14 megs of this in the logs

On Sun, 30 Mar 2008, Gene Heskett wrote:

> Mar 30 16:46:42 coyote kernel: [469249.031832] MCE: The hardware reports a non
> fatal, correctable incident occurred on CPU 0.
> Mar 30 16:46:42 coyote kernel: [469249.031838] Bank 1: d400400000000152
> Mar 30 16:46:42 coyote kernel: [469249.031841] MCE: The hardware reports a non
> fatal, correctable incident occurred on CPU 0.
> Mar 30 16:46:42 coyote kernel: [469249.031844] Bank 2: d40040000000017a
> It's always the same two addresses reported, every 15 seconds. So I now have the
> non-fatal part of MCE turned off, and 2.6.24.4 is rebuilding.

Hi,

You possibly have some buggy hardware. I'd suggest running this through
mcelog, which should decode the reason for the MCE.

--
Jiri Kosina
SUSE Labs
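
For readers without mcelog at hand, the two bank values from the log can be
roughly unpacked by hand. The following is only a minimal sketch: it assumes
the values are MCi_STATUS-style registers using the architectural machine-check
layout (flag bits 63 down to 57, error code in bits 15..0), and that this
layout applies to the Athlon XP in question, which has not been verified
against the vendor documentation. The function name decode_status is made up
for this illustration, and model-specific bits are left undecoded.

```python
# Sketch: unpack the architectural fields of a machine-check bank status value.
# Assumption: the architectural MCi_STATUS layout applies to this CPU.

FLAGS = {
    63: "VAL",    # status register contains valid data
    62: "OVER",   # overflow: an earlier error was overwritten
    61: "UC",     # error was NOT corrected by hardware
    60: "EN",     # reporting of this error was enabled
    59: "MISCV",  # MISC register holds valid data
    58: "ADDRV",  # ADDR register holds valid data
    57: "PCC",    # processor context may be corrupt
}

# Fields of a memory-hierarchy error code: 0000 0001 RRRR TTLL
REQUEST = {0: "ERR", 1: "RD", 2: "WR", 3: "DRD", 4: "DWR",
           5: "IRD", 6: "PREFETCH", 7: "EVICT", 8: "SNOOP"}
TYPE = {0: "instruction", 1: "data", 2: "generic"}
LEVEL = {0: "L0", 1: "L1", 2: "L2", 3: "generic"}


def decode_status(hex_value):
    """Return the set flags and, if applicable, the cache error fields."""
    status = int(hex_value, 16)
    flags = [name for bit, name in sorted(FLAGS.items(), reverse=True)
             if (status >> bit) & 1]
    code = status & 0xFFFF
    result = {"flags": flags, "error_code": code, "cache": None}
    if (code >> 8) == 0x01:  # memory-hierarchy (cache) error pattern
        request = REQUEST.get((code >> 4) & 0xF, "unknown")
        kind = TYPE.get((code >> 2) & 0x3, "unknown")
        level = LEVEL[code & 0x3]
        result["cache"] = (request, kind, level)
    return result


if __name__ == "__main__":
    for bank, value in (("Bank 1", "d400400000000152"),
                        ("Bank 2", "d40040000000017a")):
        print(bank, decode_status(value))
```

Under those assumptions, both values have UC clear (consistent with the
"correctable" wording in the log) and both error codes fall in the
memory-hierarchy pattern at level L2. Treat this as a hint only; mcelog
remains the proper tool for decoding.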

2008-03-31 14:03:09

by Pavel Machek

Subject: Re: MCE going wild, 14 megs of this in the logs

On Sun 2008-03-30 16:57:53, Gene Heskett wrote:
> Greetings;
> Mar 30 16:46:42 coyote kernel: [469249.031832] MCE: The hardware reports a non
> fatal, correctable incident occurred on CPU 0.
> Mar 30 16:46:42 coyote kernel: [469249.031838] Bank 1: d400400000000152
> Mar 30 16:46:42 coyote kernel: [469249.031841] MCE: The hardware reports a non
> fatal, correctable incident occurred on CPU 0.
> Mar 30 16:46:42 coyote kernel: [469249.031844] Bank 2: d40040000000017a
>
> It's always the same two addresses reported, every 15 seconds. So I now have the
> non-fatal part of MCE turned off, and 2.6.24.4 is rebuilding.
>
> I saw this once before, and a nearly round-the-clock run of memtest86 gave my
> memory a clean bill of health. The processor is an XP-2800, on a Biostar
> mainboard with an nForce2 chipset. Is this possibly a known artifact of this
> hardware?

That's expected. If ECC can correct the problem, memtest will pass.

I had similar problems, and was told by AMD that I had a CPU with a bad L2 cache.

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html