2002-01-25 10:47:46

by Stephan von Krawczynski

Subject: Machine Check Exception ?

Hello,

I get the following message under 2.2.19:

Message from syslogd@diehard at Thu Jan 24 14:44:49 2002 ...
diehard kernel: CPU 0: Machine Check Exception: 0000000000000004

and the box is dead.
Can anybody please explain what this means, or what the underlying
problem might be?

Thank you in advance
Stephan


2002-01-25 12:37:39

by Marcel Kunath

Subject: Re: Machine Check Exception ?

What's the mobo? What do you mean by "the box is dead"? Is it really dead, or
does it stall on boot?

mk

> Hello,
>
> I get the following message under 2.2.19:
>
> Message from syslogd@diehard at Thu Jan 24 14:44:49 2002 ...
> diehard kernel: CPU 0: Machine Check Exception: 0000000000000004
>
> and the box is dead.
> Can anybody please enlighten me what this means or what a possible
> problem behind might be?
>
> Thank you in advance
> Stephan

2002-01-25 12:41:09

by Denis Oliver Kropp

Subject: Re: Machine Check Exception ?

Quoting Stephan von Krawczynski ([email protected]):
> Hello,
>
> I get the following message under 2.2.19:
>
> Message from syslogd@diehard at Thu Jan 24 14:44:49 2002 ...
> diehard kernel: CPU 0: Machine Check Exception: 0000000000000004

Hi,

I sometimes got the same error under heavy load (compiling).
I replaced a memory module with another one and it hasn't crashed since,
at least so far ;)

--
Best regards,
Denis Oliver Kropp

.------------------------------------------.
| DirectFB - Hardware accelerated graphics |
| http://www.directfb.org/ |
"------------------------------------------"

convergence integrated media GmbH

2002-01-25 13:18:05

by Dave Jones

Subject: Re: Machine Check Exception ?

On Fri, Jan 25, 2002 at 11:47:18AM +0100, Stephan von Krawczynski wrote:
> Message from syslogd@diehard at Thu Jan 24 14:44:49 2002 ...
> diehard kernel: CPU 0: Machine Check Exception: 0000000000000004
>
> and the box is dead.
> Can anybody please enlighten me what this means or what a possible
> problem behind might be?

Typically a hardware problem. Some older systems generate them
spuriously though, which is why we have a "nomce" boot option.

--
| Dave Jones. http://www.codemonkey.org.uk
| SuSE Labs

2002-01-25 13:27:55

by Denis Oliver Kropp

Subject: Re: Machine Check Exception ?

Quoting Dave Jones ([email protected]):
> On Fri, Jan 25, 2002 at 11:47:18AM +0100, Stephan von Krawczynski wrote:
> > Message from syslogd@diehard at Thu Jan 24 14:44:49 2002 ...
> > diehard kernel: CPU 0: Machine Check Exception: 0000000000000004
> >
> > and the box is dead.
> > Can anybody please enlighten me what this means or what a possible
> > problem behind might be?
>
> Typically a hardware problem. Some older systems generate them
> spuriously though, which is why we have a "nomce" boot option.

My system here is a P3 800 Coppermine with Infineon RAM.
After I removed that module, the error no longer occurred. This is on Linux 2.4.17.

--
Best regards,
Denis Oliver Kropp

.------------------------------------------.
| DirectFB - Hardware accelerated graphics |
| http://www.directfb.org/ |
"------------------------------------------"

convergence integrated media GmbH

2002-01-25 16:37:08

by Stephan von Krawczynski

Subject: Re: Machine Check Exception ?

On Fri, 25 Jan 2002 07:37:24 -0500 (EST)
"Marcel Kunath" <[email protected]> wrote:

> What's the mobo?

I will answer later; the machine is remote, so I have to check. It is UP (uniprocessor), btw.

> What do you mean by "the box is dead"? Is it really dead, or
> does it stall on boot?

It boots and runs for quite a while (weeks), then suddenly freezes and
shows this message.
It happens only very rarely.
Does the number have any meaning, or is it a goof?

Regards,
Stephan

> > Hello,
> >
> > I get the following message under 2.2.19:
> >
> > Message from syslogd@diehard at Thu Jan 24 14:44:49 2002 ...
> > diehard kernel: CPU 0: Machine Check Exception: 0000000000000004
> >
> > and the box is dead.
> > Can anybody please enlighten me what this means or what a possible
> > problem behind might be?

2002-01-25 16:48:03

by Stephan von Krawczynski

Subject: Re: Machine Check Exception ?

On Fri, 25 Jan 2002 07:37:24 -0500 (EST)
"Marcel Kunath" <[email protected]> wrote:

> What's the mobo?

OK, here we go:

diehard:~ # lspci
00:00.0 Host bridge: Intel Corporation 440BX/ZX - 82443BX/ZX Host bridge (rev 03)
00:01.0 PCI bridge: Intel Corporation 440BX/ZX - 82443BX/ZX AGP bridge (rev 03)
00:04.0 ISA bridge: Intel Corporation 82371AB PIIX4 ISA (rev 02)
00:04.1 IDE interface: Intel Corporation 82371AB PIIX4 IDE (rev 01)
00:04.2 USB Controller: Intel Corporation 82371AB PIIX4 USB (rev 01)
00:04.3 Bridge: Intel Corporation 82371AB PIIX4 ACPI (rev 02)
00:09.0 SCSI storage controller: Symbios Logic Inc. (formerly NCR) 53c810 (rev 23)
00:0a.0 PCI bridge: Intel Corporation 80960RP [i960 RP Microprocessor/Bridge] (rev 05)
00:0a.1 RAID bus controller: Mylex Corporation DAC960PX (rev 05)
00:0b.0 Ethernet controller: 3Com Corporation 3c905C-TX [Fast Etherlink] (rev 74)
01:00.0 VGA compatible controller: S3 Inc. 86c368 [Trio 3D/2X] (rev 02)

Regards,
Stephan


2002-01-25 17:23:08

by Dave Jones

Subject: Re: Machine Check Exception ?

On Fri, Jan 25, 2002 at 05:36:23PM +0100, Stephan von Krawczynski wrote:

> It boots and runs for quite a while (weeks), then suddenly freezes and
> shows this message.
> It does not happen often, but very rarely.
> Has the number any meaning, or is it a goof?

> > > diehard kernel: CPU 0: Machine Check Exception: 0000000000000004

This number alone doesn't really tell you anything.
There should be an extra line in the log / on the console,
which contains two more 64-bit numbers. Feeding these to
http://www.codemonkey.org.uk/cruft/parsemce.c will decode
them and hopefully give you a better idea of what's wrong.
(I really should make that tool a little friendlier ..
I'll do it soon)

--
| Dave Jones. http://www.codemonkey.org.uk
| SuSE Labs
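
For readers wondering what the value on that log line could encode, here is
a minimal sketch in Python. It assumes the 16 hex digits printed after
"Machine Check Exception:" are the raw contents of the IA32_MCG_STATUS
register, and that the extra line Dave mentions carries a per-bank
IA32_MCi_STATUS word; neither assumption is confirmed anywhere in this
thread. The bit positions are the architectural ones from the Intel manuals.
The function names are made up for illustration, and this is not a
replacement for parsemce.c.

```python
# Sketch only: decode_mcg_status and decode_bank_status are hypothetical
# helper names, not part of the kernel or of parsemce.c.

# Architectural flag bits of IA32_MCG_STATUS (global machine-check status).
MCG_STATUS_FLAGS = {
    0: "RIPV",  # restart IP valid: execution could resume at the saved IP
    1: "EIPV",  # error IP valid: the saved IP is associated with the error
    2: "MCIP",  # machine check in progress
}

# Architectural flag bits at the top of a per-bank IA32_MCi_STATUS word.
BANK_STATUS_FLAGS = {
    63: "VAL",    # this bank holds a valid error
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # uncorrected error
    60: "EN",     # error reporting was enabled for this bank
    59: "MISCV",  # the bank's MISC register holds extra information
    58: "ADDRV",  # the bank's ADDR register holds an address
    57: "PCC",    # processor context may be corrupt
}


def _set_flags(value, table):
    """Return the names of all flags in `table` whose bit is set in `value`."""
    return [name for bit, name in sorted(table.items()) if (value >> bit) & 1]


def decode_mcg_status(value):
    """Assumes `value` is the raw IA32_MCG_STATUS contents."""
    return _set_flags(value, MCG_STATUS_FLAGS)


def decode_bank_status(value):
    """Assumes `value` is a raw IA32_MCi_STATUS word for one bank."""
    return {
        "flags": _set_flags(value, BANK_STATUS_FLAGS),
        "model_specific_code": (value >> 16) & 0xFFFF,
        "mca_error_code": value & 0xFFFF,
    }


# The value from Stephan's log line:
print(decode_mcg_status(0x0000000000000004))  # ['MCIP']
```

Under that assumption, 0000000000000004 has only MCIP set and RIPV clear:
the CPU signalled a machine check it could not reliably restart from, but
the value says nothing about which component failed. The error code itself
would be in the per-bank status word, which was not posted here.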

2002-01-25 18:37:43

by Dana Lacoste

Subject: RE: Machine Check Exception ?

I used to get these all the time as well (with a very
similar hardware setup) and although I have never
identified exactly what was wrong (still using 2.2.x)
I don't get them any more after doing this :
1 - switched from IDE to SCSI
2 - changed RAM vendors (yes, this was unpleasant)
and, most significantly :
3 - made sure the BIOS had the correct microcode update
for the CPU. the one it had was out of date, and
changing to the latest from Intel solved a LOT of
instability issues....


2002-01-25 18:48:28

by Dave Jones

Subject: RE: Machine Check Exception ?

On Fri, 25 Jan 2002, Dana Lacoste wrote:

> I don't get them any more after doing this :
> 1 - switched from IDE to SCSI
> 2 - changed RAM vendors (yes, this was unpleasant)
> and, most significantly :
> 3 - made sure the BIOS had the correct microcode update
> for the CPU. the one it had was out of date, and
> changing to the latest from Intel solved a LOT of
> instability issues....

Flaky RAM tends to be one of the more common triggers
of these exceptions, so (2) above was more than likely your
cause, as opposed to (3). (1) seems incredibly unlikely,
unless it changed the power draw for the better.

--
| Dave Jones. http://www.codemonkey.org.uk
| SuSE Labs