2005-12-27 02:52:26

by Legend W.

Subject: Machine Check Exception !

Hello,

I get the following message under 2.4.21 from RedHat:

CPU 3: Machine Check Exception: 0000000000000004
<Bank 0: b20000001040080f

and the box is dead.

When I run parsemce on it, it says:
Status: (4) Machine Check in progress.
Restart IP invalid.
parsebank(0): b20000001040080f @ 3
External tag parity error
CPU state corrupt. Restart not possible
Error enabled in control register
Error not corrected.
Bus and interconnect error
Participation: Local processor originated request
Timeout: Request did not timeout
Request: Generic error
Transaction type : Invalid
Memory/IO : Other

Can anybody please enlighten me as to what this means, or what the
underlying problem might be?

Thank you in advance

PS: my box has dual Xeon 2.8 GHz CPUs


2005-12-27 06:46:57

by Nauman Tahir

Subject: Re: Machine Check Exception !

On 12/27/05, Legend W. <[email protected]> wrote:
> Hello,
>
> I get the following message under 2.4.21 from RedHat:
>
> CPU 3: Machine Check Exception: 0000000000000004
> <Bank 0: b20000001040080f
>
> and the box is dead.
>
> When i use parsemce, it said:
> Status: (4) Machine Check in progress.
> Restart IP invalid.
> parsebank(0): b20000001040080f @ 3
> External tag parity error
> CPU state corrupt. Restart not possible
> Error enabled in control register
> Error not corrected.
> Bus and interconnect error
> Participation: Local processor originated request
> Timeout: Request did not timeout
> Request: Generic error
> Transaction type : Invalid
> Memory/IO : Other
>
> Can anybody please enlighten me what this means or what a possible
> problem behind might be?
>
> Thank you in advance
>
> PS: my box has dual Xeon 2.8G CPU

If you want to make your machine run anyway, add "nomce" at the boot
prompt to the kernel line of the relevant grub entry.

2005-12-27 17:47:25

by Lee Revell

Subject: Re: Machine Check Exception !

On Tue, 2005-12-27 at 11:46 +0500, Nauman Tahir wrote:
> On 12/27/05, Legend W. <[email protected]> wrote:
> > Hello,
> >
> > I get the following message under 2.4.21 from RedHat:
> >
> > CPU 3: Machine Check Exception: 0000000000000004
> > <Bank 0: b20000001040080f
> >
> > and the box is dead.
> >
> > When i use parsemce, it said:
> > Status: (4) Machine Check in progress.
> > Restart IP invalid.
> > parsebank(0): b20000001040080f @ 3
> > External tag parity error
> > CPU state corrupt. Restart not possible
> > Error enabled in control register
> > Error not corrected.
> > Bus and interconnect error
> > Participation: Local processor originated request
> > Timeout: Request did not timeout
> > Request: Generic error
> > Transaction type : Invalid
> > Memory/IO : Other
> >
> > Can anybody please enlighten me what this means or what a possible
> > problem behind might be?
> >
> > Thank you in advance
> >
> > PS: my box has dual Xeon 2.8G CPU
>
> if you want to make your machine run any way use "nomce" at boot
> prompt against your respective grub entry.

This is a terrible idea. MCEs indicate some kind of hardware problem;
it would be idiotic to just ignore that.

Figure out the hardware problem and fix it (bad RAM, overheating, poorly
seated card, etc).

Lee

2005-12-31 01:03:13

by Alan

Subject: Re: Machine Check Exception !

On Maw, 2005-12-27 at 10:52 +0800, Legend W. wrote:
> parsebank(0): b20000001040080f @ 3
> External tag parity error
> CPU state corrupt. Restart not possible
> Error enabled in control register
> Error not corrected.
> Bus and interconnect error
> Participation: Local processor originated request
> Timeout: Request did not timeout
> Request: Generic error
> Transaction type : Invalid
> Memory/IO : Other
>
> Can anybody please enlighten me what this means or what a possible
> problem behind might be?

Executive summary - your hardware is broken. In this case it's reporting
a parity error on external tag bits - presumably cache bits. "Contact
your system vendor for advice" as they say 8)
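
For anyone who wants to check the parsemce output by hand, here is a
minimal sketch in Python that pulls apart the two values from the report.
It assumes the architectural x86 layout of the global machine check status
register (RIPV, EIPV, MCIP in bits 0-2) and of a bank status register (VAL,
OVER, UC, EN, MISCV, ADDRV, PCC in bits 63-57, with the MCA error code in
bits 15:0), plus the "0000 1PPT RRRR IILL" pattern for bus and interconnect
errors. The function names are made up for illustration and are not part of
parsemce or the kernel. The model-specific code in bits 31:16, which is
presumably where "External tag parity error" comes from, is only extracted
here, not interpreted.

```python
# Sketch only: decode the two values from the report above, assuming the
# architectural x86 machine-check register layout. All names here are
# hypothetical helpers, not parsemce or kernel functions.

MCG_STATUS = 0x0000000000000004    # "Machine Check Exception: ..." value
BANK0_STATUS = 0xB20000001040080F  # "Bank 0: ..." value


def bit(value, n):
    """Return bit n of value as 0 or 1."""
    return (value >> n) & 1


def decode_mcg_status(value):
    """Decode the low three bits of the global status register."""
    return {
        "RIPV": bit(value, 0),  # restart IP valid
        "EIPV": bit(value, 1),  # error IP valid
        "MCIP": bit(value, 2),  # machine check in progress
    }


def decode_bank_status(value):
    """Decode the flag bits and error codes of a bank status register."""
    return {
        "VAL": bit(value, 63),    # register contents valid
        "OVER": bit(value, 62),   # an earlier error was overwritten
        "UC": bit(value, 61),     # error not corrected
        "EN": bit(value, 60),     # error enabled in control register
        "MISCV": bit(value, 59),  # MISC register valid
        "ADDRV": bit(value, 58),  # ADDR register valid
        "PCC": bit(value, 57),    # processor context corrupt
        "model_specific_code": (value >> 16) & 0xFFFF,
        "mca_error_code": value & 0xFFFF,
    }


def decode_bus_error(code):
    """Split a bus/interconnect code (0000 1PPT RRRR IILL) into fields.

    Returns None if the code does not match that pattern.
    """
    if (code & 0xF800) != 0x0800:
        return None
    return {
        "PP": (code >> 9) & 0x3,    # participation; 0 = local processor originated
        "T": (code >> 8) & 0x1,     # 1 = request timed out
        "RRRR": (code >> 4) & 0xF,  # request type; 0 = generic error
        "II": (code >> 2) & 0x3,    # memory/IO; 3 = other
        "LL": code & 0x3,           # level; 3 = generic
    }


if __name__ == "__main__":
    print(decode_mcg_status(MCG_STATUS))
    status = decode_bank_status(BANK0_STATUS)
    print(status)
    print(decode_bus_error(status["mca_error_code"]))
```

Under those assumptions the decoded flags line up with what parsemce
printed: MCIP set with RIPV clear (restart IP invalid), and UC, EN and PCC
set in bank 0 (not corrected, enabled, CPU state corrupt).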