2003-07-19 22:44:51

by Dave Gilbert (Home)

Subject: AMD Athlon MP Machine check exceptions

Hi,
Is there any information on decoding AMD Athlon MP machine
check exceptions? I can't seem to find the appropriate AMD
document on their website. It would be nice to know
whether it was RAM, cache or something else that caused it.

The error reported is:

Jul 19 21:07:37 gallifrey kernel: MCE: The hardware reports a non fatal,
correctable incident occurred on CPU 0.
Jul 19 21:07:37 gallifrey kernel: Bank 2: 940040000000017a

That's from 2.5.75 on a dual Athlon MP on a Tyan 760MP motherboard.

The machine has apparently been running fine for some time now - perhaps
this is heat related due to the unusually warm weather over here,
or perhaps it is the machine check polling picking
up something that has been going dodgy for a while.

Dave
-----Open up your eyes, open up your mind, open up your code -------
/ Dr. David Alan Gilbert | Running GNU/Linux on Alpha,68K| Happy \
\ gro.gilbert @ treblig.org | MIPS,x86,ARM,SPARC,PPC & HPPA | In Hex /
\ _________________________|_____ http://www.treblig.org |_______/


2003-07-19 23:18:48

by Ian Molton

Subject: Re: AMD Athlon MP Machine check exceptions

On Sat, 19 Jul 2003 23:59:35 +0100
"Dr. David Alan Gilbert" <[email protected]> wrote:

>
> That's from 2.5.75 on a dual Athlon MP on a Tyan 760MP motherboard.
>
> The machine has apparently been running fine for some time now - perhaps
> this is heat related due to the unusually warm weather over here,
> or perhaps it is the machine check polling picking
> up something that has been going dodgy for a while.

I've seen a couple of these too, and would appreciate knowing what's going on :)

--
Spyros lair: http://www.mnementh.co.uk/ |||| Maintainer: arm26 linux

Do not meddle in the affairs of Dragons, for you are tasty and good with ketchup.

2003-07-20 08:06:38

by Willy Tarreau

Subject: Re: AMD Athlon MP Machine check exceptions

Hi !

You should feed it through Dave Jones' parsemce program. BTW, he already
replied a few months ago to exactly the same report (search for
940040000000017a on Google and you will find it already decoded :-))

Cheers,
Willy


2003-07-20 11:39:35

by Dave Gilbert (Home)

Subject: Re: AMD Athlon MP Machine check exceptions

* Willy Tarreau ([email protected]) wrote:
> Hi !
>
> You should feed it through Dave Jones' parsemce program.

Thank you! Unfortunately Dave's site seems to be down at the moment
(and Google don't seem to have it cached - why?)

> BTW, he already
> replied a few months ago to exactly the same report (search 940040000000017a
> on google, you have it already decoded :-))

Ah - I hadn't thought of searching for the hex. I'd presumed that it
related to a particular bank/address/cache line, and that the chances of
lots of people hitting the same one would be slim - unless there is a
problem? Perhaps when I get parsemce it will be clearer.
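
As a rough illustration of why many people can report an identical value:
assuming the status word follows the architectural x86 MCi_STATUS layout,
it carries flag bits and an error-type code rather than an address (when
ADDRV is set, the address is held in a separate register). The Python
sketch below is not parsemce and is not taken from it. The name
decode_status is made up for illustration, and the CECC/UECC positions
(bits 46 and 45) are an assumption drawn from AMD's documentation for
later processors, so they may not hold for the Athlon MP.

```python
# Sketch only: split a machine check bank status word into flags and
# an error code. Not parsemce; names here are made up for illustration.

# Bits 63-57 are the architectural x86 MCi_STATUS flags.
# ASSUMPTION: bits 46/45 (CECC/UECC) follow AMD's layout for later parts.
FLAGS = {
    63: "VAL",    # register holds a valid error
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # error was not corrected
    60: "EN",     # reporting was enabled for this error
    59: "MISCV",  # MISC register holds extra information
    58: "ADDRV",  # ADDR register holds the address
    57: "PCC",    # processor context may be corrupt
    46: "CECC",   # correctable ECC error (assumed position)
    45: "UECC",   # uncorrectable ECC error (assumed position)
}

# Field encodings of the memory hierarchy error code 0000 0001 RRRR TTLL.
REQUESTS = ["ERR", "RD", "WR", "DRD", "DWR", "IRD", "PREFETCH", "EVICT", "SNOOP"]
TRANSACTIONS = ["instruction", "data", "generic", "reserved"]
LEVELS = ["L0", "L1", "L2", "generic"]


def decode_status(status):
    """Return the set flags and, where recognised, the decoded error code."""
    flags = [name for bit, name in sorted(FLAGS.items(), reverse=True)
             if (status >> bit) & 1]
    code = status & 0xFFFF  # low 16 bits: error code
    result = {"flags": flags, "error_code": code}
    # Only the memory hierarchy form is decoded in this sketch.
    if (code >> 8) == 0x01:
        rrrr = (code >> 4) & 0xF
        result["kind"] = "memory hierarchy"
        result["request"] = REQUESTS[rrrr] if rrrr < len(REQUESTS) else "reserved"
        result["transaction"] = TRANSACTIONS[(code >> 2) & 0x3]
        result["level"] = LEVELS[code & 0x3]
    return result


if __name__ == "__main__":
    print(decode_status(0x940040000000017A))
```

Under those assumptions, 940040000000017a reads as VAL, EN, ADDRV and CECC
set with UC clear, and an error code of memory hierarchy / EVICT / generic /
L2. Nothing in the word itself identifies a particular address or cache
line, which would explain several machines reporting the same value.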

Dave

2003-07-21 18:20:59

by Dave Jones

Subject: Re: AMD Athlon MP Machine check exceptions

On Sun, Jul 20, 2003 at 12:54:32PM +0100, Dr. David Alan Gilbert wrote:
> > You should feed it through Dave Jones' parsemce program.
> Thank you! Unfortunately Dave's site seems to be down at the moment

Gah, a pox on Cobalt RaQs...
Such problems always seem to happen when I'm a few thousand miles
away from said box, so I'm not really too sure what's happening
with it right now.
When I get a chance, I'll move it (and post-halloween and other bits)
to kernel.org.

> (and google don't seem to have it cached - why?)

Beats me.


Dave