2003-07-12 09:55:00

by Nicolas Mailhot

Subject: MCE exception advice

[ Please CC me on answers since I'm not on the list ]

Hi,

I've been getting MCEs repeatedly today when trying to compile
2.5.75-bk1 on 2.5.75-bk1 (obviously I didn't have them yesterday when I
built my first 2.5.75-bk1 kernel on a 2.4 kernel).

The MCE is always the same (I think) and reads like this:

CPU 0: Machine Check Exception: 0000000000000004
Bank 0: b600000000000135 at 000000000b99b9f0
Kernel panic: CPU context corrupt

Decoded with parsemce, this gives:

[nim@rousalka parse]$ ./parse -i < mce
CPU 0
Status: (4) Machine Check in progress.
Restart IP invalid.
parsebank(0): b600000000000135 @ b99b9f0
External tag parity error
CPU state corrupt. Restart not possible
Address in addr register valid
Error enabled in control register
Error not corrected.
Memory heirarchy error
Request: Generic error
Transaction type : Data
Memory/IO : Reserved
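
For reference, the flag lines in the output above can be cross-checked by hand against the two raw words. What follows is only a minimal sketch in Python, assuming the usual x86 machine-check layout (MCG_STATUS bits 0-2, bank status bits 57-63, error code in the low 16 bits). The names set_flags and decode are made up for illustration, and the model-specific fields are not interpreted.

```python
# Bit positions are an assumption: the usual x86 machine-check register layout.
MCG_FLAGS = {0: "RIPV", 1: "EIPV", 2: "MCIP"}
MCI_FLAGS = {63: "VAL", 62: "OVER", 61: "UC", 60: "EN",
             59: "MISCV", 58: "ADDRV", 57: "PCC"}


def set_flags(value, table):
    """Return the names of the flags set in value, highest bit first."""
    return [name for bit, name in sorted(table.items(), reverse=True)
            if (value >> bit) & 1]


def decode(mcg_status, bank_status):
    """Split the two raw words from the panic message into named flags."""
    return {
        "mcg": set_flags(mcg_status, MCG_FLAGS),
        "bank": set_flags(bank_status, MCI_FLAGS),
        "mca_error_code": bank_status & 0xFFFF,  # low 16 bits
    }


# Values copied from the panic message above.
result = decode(0x0000000000000004, 0xB600000000000135)
print(result["mcg"])
print(result["bank"])
print(hex(result["mca_error_code"]))
```

With these values, MCIP set and RIPV clear correspond to "Machine Check in progress" and "Restart IP invalid", while UC, EN, ADDRV and PCC correspond to the "not corrected", "enabled", "address valid" and "CPU state corrupt" lines.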

I'd like some advice on what to do next. Is this a 2.5 bug? A
hardware problem only triggered in 2.5 because it exercises the hardware
in a different way? Should I change something in the system? If so,
should I change memory, CPU, PSU, something else?

I don't usually build 2.5 on 2.5, but then again yesterday was very hot
and the hardware might have suffered (even the best case cooling cannot
do much with a room temperature of 30+ °C).

Any hint will be welcome - this is my first MCE encounter.

Regards,

--
Nicolas Mailhot


Attachments:
signature.asc (189.00 B)
This is a digitally signed message part

2003-07-12 10:06:09

by Nicolas Mailhot

Subject: Re: MCE exception advice

On Sat 12/07/2003 at 12:09, Nicolas Mailhot wrote:
> [ Please CC me on answers since I'm not on the list ]
>
> Hi,
>
> I've been getting MCEs repeatedly today when trying to compile
> 2.5.75-bk1 on 2.5.75-bk1 (obviously I didn't have them yesterday when I
> built my first 2.5.75-bk1 kernel on a 2.4 kernel).
>
> The MCE is always the same (I think) and reads like this:
>
> CPU 0: Machine Check Exception: 0000000000000004
> Bank 0: b600000000000135 at 000000000b99b9f0

Well, looking at the logs, the MCE type is always the same but the actual
address changes:

/var/log/messages:4371:Jul 12 11:14:08 rousalka kernel: Bank 0: b67e800000000135 at 0000000004fc8678
/var/log/messages:4692:Jul 12 11:22:52 rousalka kernel: Bank 0: b607000000000135 at 0000000011b6e7f0
/var/log/messages:4982:Jul 12 11:29:49 rousalka kernel: Bank 0: b674000000000135 at 0000000017c029f0
/var/log/messages:5265:Jul 12 11:45:15 rousalka kernel: Bank 0: b600000000000135 at 000000000b99b9f0
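
To make the comparison explicit, the status words can be split into fields. This is just a rough Python sketch, assuming the same layout as before (flags in the top byte, error code in the low 16 bits). The summarize helper and the "middle" field name are made up here, and the meaning of bits 16-55 is model-specific, so they are only printed, not interpreted.

```python
import re

# The four entries from /var/log/messages, one per line.
LOG = """\
Jul 12 11:14:08 rousalka kernel: Bank 0: b67e800000000135 at 0000000004fc8678
Jul 12 11:22:52 rousalka kernel: Bank 0: b607000000000135 at 0000000011b6e7f0
Jul 12 11:29:49 rousalka kernel: Bank 0: b674000000000135 at 0000000017c029f0
Jul 12 11:45:15 rousalka kernel: Bank 0: b600000000000135 at 000000000b99b9f0
"""

PATTERN = re.compile(r"Bank (\d+): ([0-9a-f]{16}) at ([0-9a-f]{16})")


def summarize(text):
    """Split each logged status word into flags, middle bits and error code."""
    rows = []
    for bank, status, addr in PATTERN.findall(text):
        word = int(status, 16)
        rows.append({
            "bank": int(bank),
            "flags": word >> 56,                     # bits 56-63
            "middle": (word >> 16) & 0xFFFFFFFFFF,   # bits 16-55, not interpreted
            "error_code": word & 0xFFFF,             # bits 0-15
            "addr": int(addr, 16),
        })
    return rows


rows = summarize(LOG)
for row in rows:
    print(f"flags={row['flags']:#04x} code={row['error_code']:#06x} "
          f"middle={row['middle']:#012x} addr={row['addr']:#010x}")
```

In all four entries the flags byte and the error code are identical. Only the address and the middle bits differ.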

What's the best course of action now?

--
Nicolas Mailhot


Attachments:
signature.asc (189.00 B)
This is a digitally signed message part