The 2.6.22-rc1 boot panics early in amd_mcheck_init() with my k6-III/450.
It panics early enough that some of what I'm sure would be useful has
already scrolled off the screen, and there's no scrollback buffer at
that point. If more detail is needed, I'll have to transcribe what I
*can* see by hand.
Here's /proc/cpuinfo on the chance it will help someone track this down:
processor : 0
vendor_id : AuthenticAMD
cpu family : 5
model : 9
model name : AMD-K6(tm) 3D+ Processor
stepping : 1
cpu MHz : 451.032
cache size : 256 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr mce cx8 pge mmx syscall 3dnow k6_mtrr
bogomips : 902.76
clflush size : 32
--
-----------------------------------------------------------------------
Bob Tracy WTO + WIPO = DMCA? http://www.anti-dmca.org
[email protected]
-----------------------------------------------------------------------
On Tue, 15 May 2007 22:13:14 -0500 (CDT) Bob Tracy wrote:
> The 2.6.22-rc1 boot panics early in amd_mcheck_init() with my k6-III/450.
> It panics early enough that some of what I'm sure would be useful has
> already scrolled off the screen, and there's no scrollback buffer at
> that point. If more detail is needed, I'll have to transcribe what I
> *can* see by hand.
Or photograph the screen with a digital camera, if you have one.
It's a good idea to increase the screen virtual size by decreasing
the font size if your video driver supports that.
> Here's /proc/cpuinfo on the chance it will help someone track this down:
>
> processor : 0
> vendor_id : AuthenticAMD
> cpu family : 5
> model : 9
> model name : AMD-K6(tm) 3D+ Processor
> stepping : 1
> cpu MHz : 451.032
> cache size : 256 KB
> fdiv_bug : no
> hlt_bug : no
> f00f_bug : no
> coma_bug : no
> fpu : yes
> fpu_exception : yes
> cpuid level : 1
> wp : yes
> flags : fpu vme de pse tsc msr mce cx8 pge mmx syscall 3dnow k6_mtrr
> bogomips : 902.76
> clflush size : 32
---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***
On May 15 2007 21:27, Randy Dunlap wrote:
>On Tue, 15 May 2007 22:13:14 -0500 (CDT) Bob Tracy wrote:
>
>> The 2.6.22-rc1 boot panics early in amd_mcheck_init() with my k6-III/450.
>> It panics early enough that some of what I'm sure would be useful has
>> already scrolled off the screen, and there's no scrollback buffer at
>> that point. If more detail is needed, I'll have to transcribe what I
>> *can* see by hand.
>
>or use digital camera photo of screen if you have such a camera.
>
>It's a good idea to increase the screen virtual size by decreasing
>the font size if your video driver supports that.
In short, boot with vga=6 (80x50), or even vga=ask, and select a mode with
lots of characters (132x60, for example).
Jan
--
Jan Engelhardt wrote:
> >On Tue, 15 May 2007 22:13:14 -0500 (CDT) Bob Tracy wrote:
> >
> >> The 2.6.22-rc1 boot panics early in amd_mcheck_init() with my k6-III/450.
>
> In short, boot with vga=6 (80x50), or even vga=ask, selecting something with
> lots of chars (132x60 for example)
That did it...
(...)
CPU: L1 I Cache: (...)
CPU: L2 Cache: (...)
Intel machine check architecture supported.
general protection fault: 0000 [#1]
PREEMPT
Modules linked in:
CPU: 0
EIP: 0060:[<c01079f4>] Not tainted VLI
EFLAGS: 00010286 (2.6.22-rc1 #1)
EIP is at amd_mcheck_init+0x2b/0xc3
eax: 0000002f ebx: 00000000 ecx: 00000179 edx: 00000001
esi: 000a0b00 edi: c03c1000 ebp: 00843007 esp: c03c7fb0
ds:007b es:007b fs:0000 gs:0000 ss:0068
Process swapper (pid:0, ti=c03c6000 task=c0399a20 task.ti=c03c6000)
Stack: c03569fc 00000000 00843007 c0189398 ffffffff 00000000 000a0600 c03cb34f
00000002 c03a2b40 00000000 c03cb79c 00040000 00000000 00000000 c03c8842
00000054 c03c8388 c03dfb40 00000000
Call Trace:
[<c0189398>] proc_register+0x3b/0xb7
[<c03cb34f>] identify_boot_cpu+0xd/0x1f
[<c03cb79c>] check_bugs+0x8/0x4e
[<c03c8842>] start_kernel+0x19a/0x1a3
[<c03c8388>] unknown_bootoption+0x0/0x191
==========================================
Code: 56 53 83 ec 14 c7 05 08 f1 39 c0 3c 78 10 c0 8b 40 0c a8 80 0f 84 a0 00 00 00 c7 04 24 fc 69 35 c0 e8 79 f3 00 00 b9 79 01 00 00 <0f> 32 f6 c4 01 89 c3 74 0a b1 7b 83 c8 ff 83 ca ff 0f 30 0f b6
EIP:[<c01079f4>] amd_mcheck_init+0x2b/0xc3 SS:ESP 0068:c03c7fb0
Kernel panic - not syncing: Attempted to kill the idle task!
Bob Tracy wrote:
> Jan Engelhardt wrote:
>>> On Tue, 15 May 2007 22:13:14 -0500 (CDT) Bob Tracy wrote:
>>>
>>>> The 2.6.22-rc1 boot panics early in amd_mcheck_init() with my k6-III/450.
> Intel machine check architecture supported.
> general protection fault: 0000 [#1]
> PREEMPT
> Modules linked in:
> CPU: 0
> EIP: 0060:[<c01079f4>] Not tainted VLI
> EFLAGS: 00010286 (2.6.22-rc1 #1)
> EIP is at amd_mcheck_init+0x2b/0xc3
>
The faulting instruction is rdmsr with ecx == 0x179 (Machine Check Global
Capabilities Register). The K6 probably doesn't have that MSR.
Caused by:
[PATCH] i386: check capability
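
The reading of the oops above can be checked by hand against its Code: line: the bytes just before the marked <0f> 32 (rdmsr) are b9 79 01 00 00, i.e. a mov of the immediate 0x179 into ecx. Below is a minimal C sketch of that pattern match. It is not a disassembler, and the function name is made up for illustration:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Look for "b9 imm32" (mov $imm32,%ecx) immediately followed by
 * "0f 32" (rdmsr) and return the little-endian immediate, i.e. the
 * MSR index that the rdmsr would read. Returns -1 if not found.
 * Hypothetical helper; a plain byte-pattern match, nothing more.
 */
int64_t msr_index_before_rdmsr(const uint8_t *code, size_t len)
{
    for (size_t i = 0; i + 7 <= len; i++) {
        if (code[i] == 0xb9 && code[i + 5] == 0x0f && code[i + 6] == 0x32) {
            uint32_t imm = (uint32_t)code[i + 1]
                         | ((uint32_t)code[i + 2] << 8)
                         | ((uint32_t)code[i + 3] << 16)
                         | ((uint32_t)code[i + 4] << 24);
            return (int64_t)imm;
        }
    }
    return -1;
}
```

Fed the tail of the Code: bytes from the oops, this returns 0x179, which agrees with the ecx value in the register dump.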
On Wed, May 16, 2007 at 11:53:22AM -0400, Chuck Ebbert wrote:
> Bob Tracy wrote:
> > Jan Engelhardt wrote:
> >>> On Tue, 15 May 2007 22:13:14 -0500 (CDT) Bob Tracy wrote:
> >>>
> >>>> The 2.6.22-rc1 boot panics early in amd_mcheck_init() with my k6-III/450.
>
> > Intel machine check architecture supported.
> > general protection fault: 0000 [#1]
> > PREEMPT
> > Modules linked in:
> > CPU: 0
> > EIP: 0060:[<c01079f4>] Not tainted VLI
> > EFLAGS: 00010286 (2.6.22-rc1 #1)
> > EIP is at amd_mcheck_init+0x2b/0xc3
> >
>
> rdmsr with ecx == 0x179 (Machine Check Global Capabilities Register)
>
> Probably K6 doesn't have that.
sounds right. Intel style MCE capability was introduced with the Athlon
on AMD systems iirc.
> Caused by:
>
> [PATCH] i386: check capability
Though this would imply that Bob's K6-3 is reporting that it does have
that bit in its cpuid flags.
Bob, can you send your /proc/cpuinfo and dmesg |grep CPU ?
Dave
--
http://www.codemonkey.org.uk
Dave Jones wrote:
> On Wed, May 16, 2007 at 11:53:22AM -0400, Chuck Ebbert wrote:
> > Bob Tracy wrote:
> > > Jan Engelhardt wrote:
> > >>> On Tue, 15 May 2007 22:13:14 -0500 (CDT) Bob Tracy wrote:
> > >>>> The 2.6.22-rc1 boot panics early in amd_mcheck_init() with my k6-III/450.
> >
> > > Intel machine check architecture supported.
> > > general protection fault: 0000 [#1]
> > > PREEMPT
> > > Modules linked in:
> > > CPU: 0
> > > EIP: 0060:[<c01079f4>] Not tainted VLI
> > > EFLAGS: 00010286 (2.6.22-rc1 #1)
> > > EIP is at amd_mcheck_init+0x2b/0xc3
> > >
> >
> > rdmsr with ecx == 0x179 (Machine Check Global Capabilities Register)
> >
> > Probably K6 doesn't have that.
>
> sounds right. Intel style MCE capability was introduced with the Athlon
> on AMD systems iirc.
>
> > Caused by:
> >
> > [PATCH] i386: check capability
>
> Though this would imply that Bob's K6-3 is reporting that it does have
> that bit in its cpuid flags.
>
> Bob, can you send your /proc/cpuinfo and dmesg |grep CPU ?
/proc/cpuinfo sent in the first message in this thread (anticipated your
request :-)), but it's small enough to repeat:
processor : 0
vendor_id : AuthenticAMD
cpu family : 5
model : 9
model name : AMD-K6(tm) 3D+ Processor
stepping : 1
cpu MHz : 451.040
cache size : 256 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr mce cx8 pge mmx syscall 3dnow k6_mtrr
bogomips : 902.78
clflush size : 32
Here's the requested dmesg output (for 2.6.21):
Initializing CPU#0
CPU: After generic identify, caps: 008021bf 808029bf 00000000 00000000 00000000 00000000 00000000
CPU: L1 I Cache: 32K (32 bytes/line), D cache 32K (32 bytes/line)
CPU: L2 Cache: 256K (32 bytes/line)
CPU: After all inits, caps: 008021bf 808029bf 00000000 00000002 00000000 00000000 00000000
CPU: AMD-K6(tm) 3D+ Processor stepping 01
NVRM: CPU does not support the PAT, falling back to MTRRs.
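
For reference, the first caps word in the dmesg lines above is the CPUID function 1 EDX value, where bit 7 is MCE and bit 14 is MCA. Here is a minimal C sketch that decodes it (the macro and function names are made up for illustration, not taken from the kernel):

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions in CPUID function 1, EDX. */
#define CPUID_EDX_MCE_BIT 7u   /* machine check exception */
#define CPUID_EDX_MCA_BIT 14u  /* machine check architecture */

/* True if the given bit is set in the EDX feature word. */
static bool edx_has(uint32_t edx, unsigned bit)
{
    return ((edx >> bit) & 1u) != 0;
}

/* Hypothetical helpers: does the CPU report MCE / MCA? */
bool reports_mce(uint32_t edx) { return edx_has(edx, CPUID_EDX_MCE_BIT); }
bool reports_mca(uint32_t edx) { return edx_has(edx, CPUID_EDX_MCA_BIT); }
```

For 0x008021bf this gives MCE set and MCA clear, which matches the flags line in /proc/cpuinfo.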
On Wed, May 16, 2007 at 02:11:56PM -0500, Bob Tracy wrote:
> flags : fpu vme de pse tsc msr mce cx8 pge mmx syscall 3dnow k6_mtrr
> bogomips : 902.78
> clflush size : 32
Ah so it really does think it has mce.
I just dug out the datasheet for the K6-3, and true enough, it does have MCE.
However, it isn't Intel compatible: it has two MSRs (MCAR at 0x0 and MCTR at 0x01).
Then the punchline:
"Because the processor does not support machine check exceptions, the contents of the
MCAR and MCTR are only affected by the WRMSR instruction and by RESET being sampled
asserted (where all bits in each register are reset to 0)."
In short, it's useless.
We could clear the capability bit and pretend it isn't there, at no loss of
functionality, or we could revert back to doing model checks instead of cpuid flag checks.
Dave
--
http://www.codemonkey.org.uk
On Wed, May 16, 2007 at 03:22:48PM -0400, Dave Jones wrote:
> On Wed, May 16, 2007 at 02:11:56PM -0500, Bob Tracy wrote:
>
> > flags : fpu vme de pse tsc msr mce cx8 pge mmx syscall 3dnow k6_mtrr
> > bogomips : 902.78
> > clflush size : 32
>
> Ah so it really does think it has mce.
> I just dug out the datasheet for the K6-3, and true enough, it does have MCE.
> However, it isn't Intel compatible: it has two MSRs (MCAR at 0x0 and MCTR at 0x01).
> Then the punchline:
>
> "Because the processor does not support machine check exceptions, the contents of the
> MCAR and MCTR are only affected by the WRMSR instruction and by RESET being sampled
> asserted (where all bits in each register are reset to 0)."
>
> In short, it's useless.
> We could clear the capability bit and pretend it isn't there, at no loss of
> functionality, or we could revert back to doing model checks instead of cpuid flag checks.
Bob, does this patch make it boot again for you?
Dave
Some AMD K6's advertise machine check capability, but don't actually
have an Intel compatible implementation. It also doesn't actually work,
so don't advertise it as being present.
Signed-off-by: Dave Jones <[email protected]>
diff --git a/arch/i386/kernel/cpu/amd.c b/arch/i386/kernel/cpu/amd.c
index 4fec702..3a75c5b 100644
--- a/arch/i386/kernel/cpu/amd.c
+++ b/arch/i386/kernel/cpu/amd.c
@@ -197,7 +197,14 @@ static void __cpuinit init_amd(struct cpuinfo_x86 *c)
/* placeholder for any needed mods */
break;
}
+
+ /*
+ * Some K6's advertise MCE, but it's incompatible
+ * to Intel style MCE, and also non-functional.
+ */
+ clear_bit(X86_FEATURE_MCE, c->x86_capability);
break;
+
case 6: /* An Athlon/Duron */
/* Bit 15 of Athlon specific MSR 15, needs to be 0
--
http://www.codemonkey.org.uk
Dave Jones wrote:
> Bob, does this patch make it boot again for you?
>
> Dave
>
> Some AMD K6's advertise machine check capability, but don't actually
> have an Intel compatible implementation. It also doesn't actually work,
> so don't advertise it as being present.
>
> Signed-off-by: Dave Jones <[email protected]>
NAK. No difference. Identical panic message. (Yes, I double-checked
to make sure I was booting the patched kernel :-)).
On Wed, May 16, 2007 at 11:36:46PM -0500, Bob Tracy wrote:
> Dave Jones wrote:
> > Bob, does this patch make it boot again for you?
> >
> > Dave
> >
> > Some AMD K6's advertise machine check capability, but don't actually
> > have an Intel compatible implementation. It also doesn't actually work,
> > so don't advertise it as being present.
> >
> > Signed-off-by: Dave Jones <[email protected]>
>
> NAK. No difference. Identical panic message. (Yes, I double-checked
> to make sure I was booting the patched kernel :-)).
Hmm, odd.
Does reverting the patch that Chuck fingered fix it?
Dave
--
http://www.codemonkey.org.uk
Dave Jones wrote:
> On Wed, May 16, 2007 at 11:36:46PM -0500, Bob Tracy wrote:
> > Dave Jones wrote:
> > > Bob, does this patch make it boot again for you?
> >
> > NAK. No difference. Identical panic message. (Yes, I double-checked
> > to make sure I was booting the patched kernel :-)).
>
> Hmm, odd.
> Does reverting the patch that Chuck fingered fix it?
ACK. I'm running 2.6.22-rc1 minus Joachim's patch as I type this.
Anticipating the question, here's the "Processor type and features"
section from "linux/.config":
#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.22-rc1
# Sun May 13 00:29:57 2007
#
(...)
#
# Processor type and features
#
CONFIG_TICK_ONESHOT=y
# CONFIG_NO_HZ is not set
CONFIG_HIGH_RES_TIMERS=y
# CONFIG_SMP is not set
CONFIG_X86_PC=y
# CONFIG_X86_ELAN is not set
# CONFIG_X86_VOYAGER is not set
# CONFIG_X86_NUMAQ is not set
# CONFIG_X86_SUMMIT is not set
# CONFIG_X86_BIGSMP is not set
# CONFIG_X86_VISWS is not set
# CONFIG_X86_GENERICARCH is not set
# CONFIG_X86_ES7000 is not set
# CONFIG_PARAVIRT is not set
# CONFIG_M386 is not set
# CONFIG_M486 is not set
# CONFIG_M586 is not set
# CONFIG_M586TSC is not set
# CONFIG_M586MMX is not set
# CONFIG_M686 is not set
# CONFIG_MPENTIUMII is not set
# CONFIG_MPENTIUMIII is not set
# CONFIG_MPENTIUMM is not set
# CONFIG_MCORE2 is not set
# CONFIG_MPENTIUM4 is not set
CONFIG_MK6=y
# CONFIG_MK7 is not set
# CONFIG_MK8 is not set
# CONFIG_MCRUSOE is not set
# CONFIG_MEFFICEON is not set
# CONFIG_MWINCHIPC6 is not set
# CONFIG_MWINCHIP2 is not set
# CONFIG_MWINCHIP3D is not set
# CONFIG_MGEODEGX1 is not set
# CONFIG_MGEODE_LX is not set
# CONFIG_MCYRIXIII is not set
# CONFIG_MVIAC3_2 is not set
# CONFIG_MVIAC7 is not set
# CONFIG_X86_GENERIC is not set
CONFIG_X86_CMPXCHG=y
CONFIG_X86_L1_CACHE_SHIFT=5
CONFIG_X86_XADD=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
# CONFIG_ARCH_HAS_ILOG2_U32 is not set
# CONFIG_ARCH_HAS_ILOG2_U64 is not set
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_X86_WP_WORKS_OK=y
CONFIG_X86_INVLPG=y
CONFIG_X86_BSWAP=y
CONFIG_X86_POPAD_OK=y
CONFIG_X86_CMPXCHG64=y
CONFIG_X86_ALIGNMENT_16=y
CONFIG_X86_USE_PPRO_CHECKSUM=y
CONFIG_X86_TSC=y
CONFIG_X86_MINIMUM_CPU_MODEL=4
CONFIG_HPET_TIMER=y
CONFIG_HPET_EMULATE_RTC=y
# CONFIG_PREEMPT_NONE is not set
# CONFIG_PREEMPT_VOLUNTARY is not set
CONFIG_PREEMPT=y
CONFIG_PREEMPT_BKL=y
# CONFIG_X86_UP_APIC is not set
CONFIG_X86_MCE=y
# CONFIG_X86_MCE_NONFATAL is not set
CONFIG_VM86=y
# CONFIG_TOSHIBA is not set
# CONFIG_I8K is not set
# CONFIG_X86_REBOOTFIXUPS is not set
# CONFIG_MICROCODE is not set
# CONFIG_X86_MSR is not set
# CONFIG_X86_CPUID is not set
On Wednesday 16 May 2007 17:53, Chuck Ebbert wrote:
> Bob Tracy wrote:
> > Jan Engelhardt wrote:
> >>> On Tue, 15 May 2007 22:13:14 -0500 (CDT) Bob Tracy wrote:
> >>>> The 2.6.22-rc1 boot panics early in amd_mcheck_init() with my
> >>>> k6-III/450.
> >
> > Intel machine check architecture supported.
> > general protection fault: 0000 [#1]
> > PREEMPT
> > Modules linked in:
> > CPU: 0
> > EIP: 0060:[<c01079f4>] Not tainted VLI
> > EFLAGS: 00010286 (2.6.22-rc1 #1)
> > EIP is at amd_mcheck_init+0x2b/0xc3
>
> rdmsr with ecx == 0x179 (Machine Check Global Capabilities Register)
>
> Probably K6 doesn't have that.
Hmpf.
We could either use rdmsr_safe, add a family check again, or clear it
in the K6 setup. I think clearing it in setup is cleanest.
Does this patch work?
-Andi
Clear MCE flag on AMD K6
It reports machine check capability in CPUID, but doesn't actually
implement all the necessary MSRs of the standard Intel machine
check architecture.
This fixes a boot failure recently introduced.
Signed-off-by: Andi Kleen <[email protected]>
Index: linux/arch/i386/kernel/cpu/amd.c
===================================================================
--- linux.orig/arch/i386/kernel/cpu/amd.c
+++ linux/arch/i386/kernel/cpu/amd.c
@@ -280,6 +280,10 @@ static void __cpuinit init_amd(struct cp
if (c->x86 == 0x10 && !force_mwait)
clear_bit(X86_FEATURE_MWAIT, c->x86_capability);
+
+ /* K6s report MCE but don't actually have all the MSRs */
+ if (c->x86 < 6)
+ clear_bit(X86_FEATURE_MCE, c->x86_capability);
}
static unsigned int __cpuinit amd_size_cache(struct cpuinfo_x86 * c, unsigned int size)
Index: linux/arch/i386/kernel/cpu/mcheck/k7.c
===================================================================
--- linux.orig/arch/i386/kernel/cpu/mcheck/k7.c
+++ linux/arch/i386/kernel/cpu/mcheck/k7.c
@@ -72,12 +72,12 @@ void amd_mcheck_init(struct cpuinfo_x86
u32 l, h;
int i;
- machine_check_vector = k7_machine_check;
- wmb();
-
if (!cpu_has(c, X86_FEATURE_MCE))
return;
+ machine_check_vector = k7_machine_check;
+ wmb();
+
printk (KERN_INFO "Intel machine check architecture supported.\n");
rdmsr (MSR_IA32_MCG_CAP, l, h);
if (l & (1<<8)) /* Control register present ? */
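
To see the effect of the patch above without building a kernel, here is a userspace model of it. This is not kernel code: struct cpu_model and the function names are invented here, and only word 0 of the capability array is modelled. It clears the MCE bit for families below 6, after which the machine check setup returns early instead of reaching the rdmsr.

```c
#include <stdbool.h>
#include <stdint.h>

#define FEATURE_MCE_BIT 7u  /* same position as CPUID.1:EDX bit 7 */

/* Hypothetical stand-in for the fields of struct cpuinfo_x86 used here. */
struct cpu_model {
    unsigned family;  /* models c->x86 */
    uint32_t caps0;   /* models word 0 of c->x86_capability */
};

/* Models the init_amd() hunk: K6-class parts lose the MCE flag. */
void quirk_clear_mce(struct cpu_model *c)
{
    if (c->family < 6)
        c->caps0 &= ~(1u << FEATURE_MCE_BIT);
}

/* Models the cpu_has(c, X86_FEATURE_MCE) test in amd_mcheck_init():
 * true means the code would go on to read MSR_IA32_MCG_CAP. */
bool mcheck_would_probe(const struct cpu_model *c)
{
    return ((c->caps0 >> FEATURE_MCE_BIT) & 1u) != 0;
}
```

With family 5 and caps 0x008021bf, the quirk leaves 0x0080213f and the probe is skipped. A family 6 part is left untouched.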
Andi Kleen wrote:
> Clear MCE flag on AMD K6
>
> It reports machine check capability in CPUID, but doesn't actually
> implement all the necessary MSRs of the standard Intel machine
> check architecture.
>
> This fixes a boot failure recently introduced.
>
> Signed-off-by: Andi Kleen <[email protected]>
I want to acknowledge receiving the above, but it arrived too late for
me to test this morning (the work day intrudes). I'll get a new kernel
built, test this, and report back in about 10-11 hours.
Andi Kleen wrote:
> Hmpf.
>
> We could either use rdmsr_safe, add a family check again, or clear it
> in the K6 setup. I think clearing it in setup is cleanest.
>
> Does this patch work?
>
> -Andi
>
> Clear MCE flag on AMD K6
>
> It reports machine check capability in CPUID, but doesn't actually
> implement all the necessary MSRs of the standard Intel machine
> check architecture.
>
> This fixes a boot failure recently introduced.
>
> Signed-off-by: Andi Kleen <[email protected]>
ACK. I reinstalled Joachim's patch (default 2.6.22-rc1 state), and
added your patch. Life is good: we have a fix/workaround.
On Friday 18 May 2007 01:38:13 [email protected] wrote:
> Andi Kleen wrote:
> > Hmpf.
> >
> > We could either use rdmsr_safe, add a family check again, or clear it
> > in the K6 setup. I think clearing it in setup is cleanest.
> >
> > Does this patch work?
> >
> > -Andi
> >
> > Clear MCE flag on AMD K6
> >
> > It reports machine check capability in CPUID, but doesn't actually
> > implement all the necessary MSRs of the standard Intel machine
> > check architecture.
> >
> > This fixes a boot failure recently introduced.
> >
> > Signed-off-by: Andi Kleen <[email protected]>
>
> ACK. I reinstalled Joachim's patch (default 2.6.22-rc1 state), and
> added your patch. Life is good: we have a fix/workaround.
Great, thanks for finding this, and thanks to Andi for the patch. I'll talk to
our labs about dusting off old systems for testing ;)
-Joachim