2005-11-02 15:32:09

by Evgeny Rodichev

Subject: x86_64 mce_log question

Hello,

On an Opteron-based x86_64 system I sometimes get the message

Machine check events logged

(non-fatal). How can I read the corresponding events? From the source
code (arch/x86_64/kernel/mce.c) it looks like a misc device with
MISC_MCELOG_MINOR 227 is registered (with the name "mcelog"?), but there is
no such device under /dev.

Thank you.
_________________________________________________________________________
Evgeny Rodichev                 Sternberg Astronomical Institute
email: [email protected]         Moscow State University
Phone: 007 (095) 939 2383
Fax: 007 (095) 932 8841         http://www.sai.msu.su/~er


2005-11-05 17:35:55

by Andi Kleen

Subject: Re: x86_64 mce_log question

Evgeny Rodichev <[email protected]> writes:

> Hello,
>
> On an Opteron-based x86_64 system I sometimes get the message
>
> Machine check events logged
>
> (non-fatal). How can I read the corresponding events?

Read the help

config X86_MCE
        bool "Machine check support" if EMBEDDED
        default y
        help
          Include a machine check error handler to report hardware errors.
          This version will require the mcelog utility to decode some
          machine check error logs. See
          ftp://ftp.x86-64.org/pub/linux/tools/mcelog


> From the source
> code (arch/x86_64/kernel/mce.c) it looks like a misc device with
> MISC_MCELOG_MINOR 227 is registered (with the name "mcelog"?), but there is
> no such device under /dev.

Your distribution is broken then. In fact it is supposed to run
mcelog regularly from a cronjob to log machine check events into
a disk log. Complain to them.

-Andi
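
For readers hitting the same missing-device problem, the check can be sketched in a few lines of Python. This is only an illustration, not part of the thread: it assumes the node should live at /dev/mcelog, that misc character devices use major number 10, and that the minor is the MISC_MCELOG_MINOR value 227 quoted above. The function name check_mcelog_node is hypothetical. On systems where udev or the distribution creates the node automatically, the suggested mknod step should not be needed.

```python
import os
import stat

MISC_MAJOR = 10          # assumption: major number shared by Linux misc char devices
MISC_MCELOG_MINOR = 227  # minor number quoted in the thread from mce.c


def check_mcelog_node(path="/dev/mcelog"):
    """Return a one-line status string for the device node at `path`."""
    try:
        st = os.stat(path)
    except FileNotFoundError:
        # Node is absent: show the mknod invocation that would create it.
        return (f"{path} missing; as root try: "
                f"mknod {path} c {MISC_MAJOR} {MISC_MCELOG_MINOR}")
    if not stat.S_ISCHR(st.st_mode):
        return f"{path} exists but is not a character device"
    major, minor = os.major(st.st_rdev), os.minor(st.st_rdev)
    if (major, minor) != (MISC_MAJOR, MISC_MCELOG_MINOR):
        return (f"{path} has device numbers {major}:{minor}, "
                f"expected {MISC_MAJOR}:{MISC_MCELOG_MINOR}")
    return f"{path} looks correct ({major}:{minor})"


if __name__ == "__main__":
    print(check_mcelog_node())
```

The script only inspects the node; it does not read or decode any machine check records. Decoding is still the job of the mcelog utility mentioned in the help text, typically run periodically as described in the reply above.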