2011-03-12 20:49:22

by Jan Engelhardt

Subject: MCE hardware error, but no message



Running Linux 2.6.37, I am getting these errors on one box:

[696782.810387] [Hardware Error]: No human readable MCE decoding support on this CPU type.
[696782.810470] [Hardware Error]: Run the message through 'mcelog --ascii' to decode.
[696783.585853] [Hardware Error]: No human readable MCE decoding support on this CPU type.
[696783.585937] [Hardware Error]: Run the message through 'mcelog --ascii' to decode.

Except that it never tells me the actual non-human readable form.
The error starts to show 6-48 hours after a reboot (including
warm reboots). A second machine of the exact same configuration shows no
problems over the past 30 days. Environmental sensors of the problem box
show normal parameters.

How would I get the messages to run through mcelog?
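
One way to check what the kernel log actually contains is to collect the
[Hardware Error] entries together with their timestamps. The Python sketch
below is illustrative only: the function name hardware_errors and the SAMPLE
text are assumptions made for this example, and it treats any line without a
leading timestamp as a mail-wrapped continuation of the previous entry.

```python
import re

# Matches a kernel log line such as "[696782.810387] [Hardware Error]: text".
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+\[Hardware Error\]:\s*(.*)$")

# Sample input copied from the log excerpt in this message.
SAMPLE = """\
[696782.810387] [Hardware Error]: No human readable MCE decoding support
on this CPU type.
[696782.810470] [Hardware Error]: Run the message through 'mcelog
--ascii' to decode.
[696783.585853] [Hardware Error]: No human readable MCE decoding support
on this CPU type.
[696783.585937] [Hardware Error]: Run the message through 'mcelog
--ascii' to decode.
"""


def hardware_errors(log_text):
    """Return (seconds_since_boot, message) pairs for [Hardware Error] lines."""
    entries = []
    in_entry = False  # True while the previous line belonged to a matching entry
    for raw in log_text.splitlines():
        line = raw.strip()
        match = LINE_RE.match(line)
        if match:
            entries.append([float(match.group(1)), match.group(2)])
            in_entry = True
        elif line and not line.startswith("[") and in_entry:
            # No timestamp: assume a wrapped continuation of the last entry.
            entries[-1][1] += " " + line
        else:
            # Blank line or an unrelated timestamped message ends the entry.
            in_entry = False
    return [(stamp, message) for stamp, message in entries]


if __name__ == "__main__":
    for stamp, message in hardware_errors(SAMPLE):
        print(f"{stamp:.6f}s since boot: {message}")
```

If the output shows only the two hint messages and no register dump, the
kernel log alone does not carry the raw record.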


2011-03-15 01:27:41

by Hidetoshi Seto

Subject: Re: MCE hardware error, but no message

(2011/03/13 5:49), Jan Engelhardt wrote:
>
>
> Running Linux 2.6.37, I am getting these errors on one box:
>
> [696782.810387] [Hardware Error]: No human readable MCE decoding support on this CPU type.
> [696782.810470] [Hardware Error]: Run the message through 'mcelog --ascii' to decode.
> [696783.585853] [Hardware Error]: No human readable MCE decoding support on this CPU type.
> [696783.585937] [Hardware Error]: Run the message through 'mcelog --ascii' to decode.
>
> Except that it never tells me the actual non-human readable form.
> The error starts to show 6-48 hours after a reboot (including
> warm reboots). A second machine of the exact same configuration shows no
> problems over the past 30 days. Environmental sensors of the problem box
> show normal parameters.
>
> How would I get the messages to run through mcelog?

It looks like a kind of corrected error.

Let's try the latest mcelog:
git://git.kernel.org/pub/scm/utils/cpu/mce/mcelog.git


Thanks,
H.Seto

2011-03-16 18:32:56

by Andi Kleen

Subject: Re: MCE hardware error, but no message

Jan Engelhardt writes:

> Running Linux 2.6.37, I am getting these errors on one box:
>
> [696782.810387] [Hardware Error]: No human readable MCE decoding support on this CPU type.
> [696782.810470] [Hardware Error]: Run the message through 'mcelog --ascii' to decode.
> [696783.585853] [Hardware Error]: No human readable MCE decoding support on this CPU type.
> [696783.585937] [Hardware Error]: Run the message through 'mcelog --ascii' to decode.
>
> Except that it never tells me the actual non-human readable form.

mcelog logs them. The kernel shouldn't be spewing these messages
at all, especially not for corrected errors (this is a still-unfixed
regression for Intel CPUs).

Here's an older fix:

http://git.kernel.org/?p=linux/kernel/git/ak/linux-mce-2.6.git;a=commit;h=6e3c7411d2b86bff210c59caa432e8e862037bfd

> How would I get the messages to run through mcelog?

They are already logged, no need to do anything further.

-Andi

--
Speaking for myself only