2000-11-15 20:40:50

by H. Peter Anvin

Subject: test11-pre5, Athlon, and Machine Check Architecture

Hi friends,

I noticed a slight bug in my CPUID code in 2.4.0-test11-pre5, and when
I unwound it, I found some interesting things.

This relates to the Machine Check Architecture code (bluesmoke.c),
which in the previous code was made conditional on running on an Intel
CPU. It appears that this restriction shouldn't be necessary.

However, since at least AMD Athlon actually advertises MCA, I would
like to verify that the code works on these processors before
submitting it to Linus. Most importantly, of course, that it doesn't
crash; I don't expect anyone to actually see an #MC exception in real
life. I'm trying to get confirmation from AMD that the code should
be correct even for Athlon.

If it *doesn't* work, there are two possibilities:

a) Athlon is advertising a capability it doesn't have, or implements
incorrectly. This can be handled in setup.c as an Athlon bug.
b) Athlon is implementing a by-the-(Intel-)book correct version of MCA
that still differs in the details from Intel, and the code isn't
handling that correctly. This would be a bluesmoke.c bug and
should be fixed there.

So I would really appreciate it if Athlon-equipped people would test
this patch (against 2.4.0-test11-pre5). Comments from AMD people on
their implementation of MCA would also be very welcome.

In the future, if I get around to it, I might also extend bluesmoke.c
to handle the case of a processor which implements MCE but not MCA (in
which case you get the exception that something died, but no
information about what caused it.)
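
For illustration, the MCE/MCA distinction above can be sketched as a
small classification over the CPUID leaf 1 EDX feature word. This is a
minimal sketch, not code from bluesmoke.c: the enum and the function
classify_mc() are hypothetical names, and the only facts assumed are
the bit positions (MCE is bit 7, MCA is bit 14), which match the
X86_FEATURE_* numbering in cpufeature.h.

```c
#include <stdint.h>

/* CPUID leaf 1, EDX feature bits (same bit numbers as the
   X86_FEATURE_* values in word 0 of x86_capability). */
#define CPUID_EDX_MCE (1u << 7)   /* Machine Check Exception */
#define CPUID_EDX_MCA (1u << 14)  /* Machine Check Architecture */

/* Hypothetical classification, for illustration only. */
enum mc_support {
    MC_NONE,      /* no machine check exception at all */
    MC_MCE_ONLY,  /* exception is raised, but no MCA banks to read */
    MC_FULL_MCA   /* exception plus MCA status registers */
};

/* Decide what level of machine check support the feature word claims.
   MCA without MCE is treated as no support, mirroring the order of
   the two test_bit() checks in the patch below. */
static enum mc_support classify_mc(uint32_t edx)
{
    if (!(edx & CPUID_EDX_MCE))
        return MC_NONE;
    if (!(edx & CPUID_EDX_MCA))
        return MC_MCE_ONLY;
    return MC_FULL_MCA;
}
```

With this patch, mcheck_init() only proceeds in the MC_FULL_MCA case;
the MC_MCE_ONLY case is the possible future extension mentioned above.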

This patch is also available at:

ftp://ftp.kernel.org/pub/linux/kernel/people/hpa/cpuid-2.4.0-test11-pre5-1.diff

Thanks,

-hpa

--- include/asm-i386/cpufeature.h.old Wed Nov 15 11:24:21 2000
+++ include/asm-i386/cpufeature.h Wed Nov 15 11:35:10 2000
@@ -20,7 +20,7 @@
#define X86_FEATURE_TSC (0*32+ 4) /* Time Stamp Counter */
#define X86_FEATURE_MSR (0*32+ 5) /* Model-Specific Registers, RDMSR, WRMSR */
#define X86_FEATURE_PAE (0*32+ 6) /* Physical Address Extensions */
-#define X86_FEATURE_MCE (0*32+ 7) /* Machine Check Architecture */
+#define X86_FEATURE_MCE (0*32+ 7) /* Machine Check Exception */
#define X86_FEATURE_CX8 (0*32+ 8) /* CMPXCHG8 instruction */
#define X86_FEATURE_APIC (0*32+ 9) /* Onboard APIC */
#define X86_FEATURE_SEP (0*32+11) /* SYSENTER/SYSEXIT */
--- arch/i386/kernel/bluesmoke.c.old Wed Nov 15 11:31:55 2000
+++ arch/i386/kernel/bluesmoke.c Wed Nov 15 11:34:21 2000
@@ -72,10 +72,12 @@
int i;
static int done;

- if( c->x86_vendor != X86_VENDOR_INTEL )
- return;
-
- if( !test_bit(X86_FEATURE_TSC, &c->x86_capability) )
+ /* We should not check for vendor here. The MCA capability
+ bit, below, should only be set if the CPU has the Intel
+ Machine Check Architecture (it's up to identify_cpu() to
+ make sure that is true! */
+
+ if( !test_bit(X86_FEATURE_MCE, &c->x86_capability) )
return;

if( !test_bit(X86_FEATURE_MCA, &c->x86_capability) )
--- arch/i386/kernel/setup.c.old Wed Nov 15 11:24:19 2000
+++ arch/i386/kernel/setup.c Wed Nov 15 11:37:38 2000
@@ -1483,7 +1483,6 @@
#ifndef CONFIG_M686
static int f00f_workaround_enabled = 0;
#endif
- extern void mcheck_init(struct cpuinfo_x86 *c);
char *p = NULL;

#ifndef CONFIG_M686
@@ -1575,9 +1574,6 @@

if ( p )
strcpy(c->x86_model_id, p);
-
- /* Enable MCA if available */
- mcheck_init(c);
}

void __init get_cpu_vendor(struct cpuinfo_x86 *c)
@@ -1797,6 +1793,8 @@
return have_cpuid_p(); /* Check to see if CPUID now enabled? */
}

+extern void mcheck_init(struct cpuinfo_x86 *c);
+
/*
* This does the hard work of actually picking apart the CPU stuff...
*/
@@ -1919,6 +1917,9 @@
* The vendor-specific functions might have changed features. Now
* we do "generic changes."
*/
+
+ /* Enable Machine Check Architecture if appropriate */
+ mcheck_init(c);

/* TSC disabled? */
#ifdef CONFIG_TSC
--
<[email protected]> at work, <[email protected]> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt


2000-11-15 21:42:53

by Dave Jones

Subject: Re: test11-pre5, Athlon, and Machine Check Architecture


> However, since at least AMD Athlon actually advertises MCA, I would
> like to verify that the code works on these processors before
> submitting it to Linus.

The Athlon MCA is basically the same architecture-wise as Pentium Pro/II,
but there are some differences. Until AMD makes document 21656 (BIOS
writers guide) public (or even a subset of it), we'll not be able to take
advantage of these extra features.

I'd suggest that until this happens, we leave bluesmoke.c Intel only.

regards,

Davej.

--
| Dave Jones <[email protected]> http://www.suse.de/~davej
| SuSE Labs

2000-11-15 21:45:32

by Rogier Wolff

Subject: Re: test11-pre5, Athlon, and Machine Check Architecture

H. Peter Anvin wrote:
> crash; I don't expect anyone to actually see an #MC exception in real
> life. I'm trying to get confirmation from AMD that the code should
> be correct even for Athlon.

Peter,

Would it be an idea to invite people to lower the voltage on their
CPUs a bit, to try and trigger #MC exceptions?

(I started thinking about slowly overclocking the CPUs, to try and
trigger them, but that's not necessary. At lower voltages, you'll also
get errors, but shouldn't risk smoking your CPU.... )

Roger.

--
** [email protected] ** http://www.BitWizard.nl/ ** +31-15-2137555 **
*-- BitWizard writes Linux device drivers for any device you may have! --*
* Common sense is the collection of *
****** prejudices acquired by age eighteen. -- Albert Einstein ********

2000-11-15 22:03:19

by H. Peter Anvin

Subject: Re: test11-pre5, Athlon, and Machine Check Architecture

Followup to: <[email protected]>
By author: [email protected] (Rogier Wolff)
In newsgroup: linux.dev.kernel
>
> H. Peter Anvin wrote:
> > crash; I don't expect anyone to actually see an #MC exception in real
> > life. I'm trying to get confirmation from AMD that the code should
> > be correct even for Athlon.
>
> Peter,
>
> Would it be an idea to invite people to lower the voltage on their
> CPUs a bit, to try and trigger #MC exceptions?
>
> (I started thinking about slowly overclocking the CPUs, to try and
> trigger them, but that's not necessary. At lower voltages, you'll also
> get errors, but shouldn't risk smoking your CPU.... )
>

If they wouldn't mind, I certainly would appreciate it... but on the
other hand, once you have gotten an #MC you have no guarantee of
proper operation anyway... after all, the code itself could be
corrupt.

-hpa
--
<[email protected]> at work, <[email protected]> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt

2000-11-15 22:04:49

by H. Peter Anvin

Subject: Re: test11-pre5, Athlon, and Machine Check Architecture

Followup to: <[email protected]>
By author: [email protected]
In newsgroup: linux.dev.kernel
>
>
> > However, since at least AMD Athlon actually advertises MCA, I would
> > like to verify that the code works on these processors before
> > submitting it to Linus.
>
> The Athlon MCA is basically the same architecture-wise as Pentium Pro/II,
> but there are some differences. Until AMD makes document 21656 (BIOS
> writers guide) public (or even a subset of it), we'll not be able to take
> advantage of these extra features.
>
> I'd suggest that until this happens, we leave bluesmoke.c Intel only.
>

That's completely the wrong way to look at it. AMD are certainly free
to add features; what they aren't free to do is make code that
expects the documented behaviour fail. If they do, it's their bug. I
have so far gotten no indication that that is the case; the only
report I have so far is a positive one, that it at least doesn't do
the wrong thing in the no-#MC case.
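
To make "documented behaviour" concrete, here is a rough sketch of the
register layout that generic MCA code relies on, as I understand the
Intel documentation: MCG_CAP is MSR 0x179, its low 8 bits give the
number of error-reporting banks, bit 8 says whether MCG_CTL exists,
each bank occupies four consecutive MSRs starting at 0x400, and bit 63
of a bank's STATUS register marks a valid logged error. The helper
names are hypothetical and this is not the bluesmoke.c code; whether
Athlon follows the same layout is exactly what needs confirming.

```c
#include <stdbool.h>
#include <stdint.h>

#define MSR_MCG_CAP 0x179u  /* global machine check capabilities */
#define MSR_MC0_CTL 0x400u  /* first register of bank 0 */

/* Number of error-reporting banks: MCG_CAP bits 7:0. */
static unsigned mcg_bank_count(uint64_t mcg_cap)
{
    return (unsigned)(mcg_cap & 0xff);
}

/* MCG_CTL is present only if MCG_CAP bit 8 is set. */
static bool mcg_has_ctl(uint64_t mcg_cap)
{
    return ((mcg_cap >> 8) & 1) != 0;
}

/* Each bank has CTL, STATUS, ADDR, MISC in that order, so the
   STATUS register of bank n is at 0x400 + 4*n + 1. */
static uint32_t mc_bank_status_msr(unsigned bank)
{
    return MSR_MC0_CTL + 4u * bank + 1u;
}

/* A bank holds a valid logged error if STATUS bit 63 is set. */
static bool mc_status_valid(uint64_t status)
{
    return ((status >> 63) & 1) != 0;
}
```

Code written only against these fields should keep working on any CPU
that adds features on top of them, which is the point being made above.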

-hpa
--
<[email protected]> at work, <[email protected]> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt