This problem seems to be getting worse. With the latest linus tree, after
the initial hang where the power button is needed to bring the system back
to life, the boot process slows to a crawl.
Although if I repeatedly press the shift key it'll help move things along.
I'm guessing the keyboard interrupt firing off is doing something to help
it. The system seems OK though after booting is finished.
Upon shutdown it actually stalls out without help from the shift key.
Any thoughts on this, or tips to help debug it further?
thx, Cal
On Mon, 30 Jul 2007, Cal Peake wrote:
> On Sun, 29 Jul 2007, Gabriel C wrote:
>
> > Frank Hale wrote:
> > [ added linux-acpi to CC ]
> > > I have an Averatec 2370 laptop with the nVidia MCP51. With kernel
> > > 2.6.20 I had no issues with ACPI however with 2.6.21 and higher the
> > > kernel will hang on boot until I press the suspend button or the power
> > > button in which case the kernel wakes up and finishes the boot
> > > process. Enabling only the following option is enough to cause the issue:
> > >
> > > [*] ACPI Support
> > >
> > > What I mean by that is every ACPI option has been deactivated and only
> > > ACPI support checked. The boot process with 2.6.21 and higher hangs at
> > > the point where the Scheduler is being registered.
> > >
> > > io scheduler cfq registered (default)
> > >
> > > If I allow it to sit there it never comes back to life and finishes
> > > booting. If I press the power or suspend button it will finish booting
> > > as expected.
> > >
> > > I've scoured google for quite a while but cannot find any relevant
> > > information pertaining to this issue. For now I've disabled ACPI
> > > altogether.
>
> Frank, thanks for the tip about 2.6.20 being good, it gave me a nice place
> to start bisecting from.
>
> Thomas, Ingo,
>
> Regarding the issue described above that Frank and I are having, I've
> narrowed it down to commit e9e2cdb4[1]: [PATCH] clockevents: i386 drivers
>
> About our systems:
> Averatec 2370/2371 Laptop
> AMD Turion 64 X2 TL-50/TL-52
> nVidia MCP51 chipset
>
> Here is a small matrix of my tests:
>
> 2.6.20.15 SMP : OK
> 2.6.21.5 SMP : hang
> 2.6.21.5 UP w/o APIC : OK
> 2.6.22.1 UP : hang
> 2.6.22.1 UP w/o IO-APIC : hang
> 2.6.22.1 UP w/o APIC : OK
> 2.6.22.1 SMP : hang
> 2.6.22.1 SMP w/o ACPI : OK
>
> Please let me know if there's anything else I can provide to help.
>
> thanks!
> --
> Cal Peake
>
> [1] http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=e9e2cdb412412326c4827fc78ba27f410d837e6e
On 08/02/2007 01:50 PM, Cal Peake wrote:
> This problem seems to be getting worse. With the latest linus tree, after
> the initial hang where the power button is needed to bring the system back
> to life, the boot process slows to a crawl.
>
> Although if I repeatedly press the shift key it'll help move things along.
> I'm guessing the keyboard interrupt firing off is doing something to help
> it. The system seems OK though after booting is finished.
>
> Upon shutdown it actually stalls out without help from the shift key.
>
> Any thoughts on this, or tips to help debug it further?
Try the 'nolapic_timer' option.
On Thu, 2 Aug 2007, Chuck Ebbert wrote:
> On 08/02/2007 01:50 PM, Cal Peake wrote:
> > This problem seems to be getting worse. With the latest linus tree, after
> > the initial hang where the power button is needed to bring the system back
> > to life, the boot process slows to a crawl.
> >
> > Although if I repeatedly press the shift key it'll help move things along.
> > I'm guessing the keyboard interrupt firing off is doing something to help
> > it. The system seems OK though after booting is finished.
> >
> > Upon shutdown it actually stalls out without help from the shift key.
> >
> > Any thoughts on this, or tips to help debug it further?
>
> Try the 'nolapic_timer' option.
Ah, thank you Chuck! This looks to have fixed the stalling/hanging
problems I was having.
Now I'm wondering if arch/i386/kernel/cpu/amd.c:amd_apic_timer_broken()
can (or needs to) be updated for this particular CPU revision.
Andi?
Thanks,
--
Cal Peake
On 08/02/2007 03:42 PM, Cal Peake wrote:
> On Thu, 2 Aug 2007, Chuck Ebbert wrote:
>
>> On 08/02/2007 01:50 PM, Cal Peake wrote:
>>> This problem seems to be getting worse. With the latest linus tree, after
>>> the initial hang where the power button is needed to bring the system back
>>> to life, the boot process slows to a crawl.
>>>
>>> Although if I repeatedly press the shift key it'll help move things along.
>>> I'm guessing the keyboard interrupt firing off is doing something to help
>>> it. The system seems OK though after booting is finished.
>>>
>>> Upon shutdown it actually stalls out without help from the shift key.
>>>
>>> Any thoughts on this, or tips to help debug it further?
>> Try the 'nolapic_timer' option.
>
> Ah, thank you Chuck! This looks to have fixed the stalling/hanging
> problems I was having.
>
> Now I'm wondering if arch/i386/kernel/cpu/amd.c:amd_apic_timer_broken()
> can (or needs to) be updated for this particular CPU revision.
What does your /proc/cpuinfo say?
On Thu, 2 Aug 2007, Chuck Ebbert wrote:
> On 08/02/2007 03:42 PM, Cal Peake wrote:
> > On Thu, 2 Aug 2007, Chuck Ebbert wrote:
> >
> >> Try the 'nolapic_timer' option.
> >
> > Ah, thank you Chuck! This looks to have fixed the stalling/hanging
> > problems I was having.
> >
> > Now I'm wondering if arch/i386/kernel/cpu/amd.c:amd_apic_timer_broken()
> > can (or needs to) be updated for this particular CPU revision.
>
> What does your /proc/cpuinfo say?
Figured I should have sent that right after I hit the send key...
processor : 0
vendor_id : AuthenticAMD
cpu family : 15
model : 72
model name : AMD Turion(tm) 64 X2 Mobile Technology TL-52
stepping : 2
cpu MHz : 1607.320
cache size : 512 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm 3dnowext 3dnow pni cx16 lahf_lm cmp_legacy svm extapic cr8_legacy ts fid vid ttp tm stc
bogomips : 3216.76
clflush size : 64
processor : 1
vendor_id : AuthenticAMD
cpu family : 15
model : 72
model name : AMD Turion(tm) 64 X2 Mobile Technology TL-52
stepping : 2
cpu MHz : 1607.320
cache size : 512 KB
physical id : 0
siblings : 2
core id : 1
cpu cores : 2
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm 3dnowext 3dnow pni cx16 lahf_lm cmp_legacy svm extapic cr8_legacy ts fid vid ttp tm stc
bogomips : 3214.55
clflush size : 64
--
Cal Peake
Here is my /proc/cpuinfo, I have SMP disabled at the moment. Looks
like my model is slightly older than Cal's.
processor : 0
vendor_id : AuthenticAMD
cpu family : 15
model : 72
model name : AMD Turion(tm) 64 X2 Mobile Technology TL-50
stepping : 2
cpu MHz : 800.000
cache size : 256 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext
fxsr_opt rdtscp lm 3dnowext 3dnow pni cx16 lahf_lm cmp_legacy svm
extapic cr8_legacy ts fid vid ttp tm stc
bogomips : 1609.36
clflush size : 64
On Thu, 2 Aug 2007, Cal Peake wrote:
>
> Figured I should have sent that right after I hit the send key...
>
> processor : 0
> vendor_id : AuthenticAMD
> cpu family : 15
> model : 72
> model name : AMD Turion(tm) 64 X2 Mobile Technology TL-52
Sadly, this doesn't show the "extended family" stuff from cpuid.
So it doesn't show any of the bits we actually care about. Sad.
That said, the "AMD Turion(tm) 64 X2 Mobile Technology TL-52" _should_ be
a REV-F CPU afaik, and it should have thus fallen through to the
"ENABLE_C1E_MASK" logic. Afaik that's broken.
Cal - can you
(a) test that forcing a "return 1" from that amd_apic_timer_broken()
function fixes it for you.
(b) make that function print out the values it uses for debugging (ie the
extended family and model numbers, and the MSR_K8_ENABLE_C1E MSR
values)?
Andi, can you check with your AMD contacts that those bits are correct..
Maybe the "Mobile Technology" things *always* have the broken "Enhanced
Halt State", regardless of any MSR settings? That would perhaps be what
makes them "Mobile".
Linus
On Thu, 2 Aug 2007, Linus Torvalds wrote:
> On Thu, 2 Aug 2007, Cal Peake wrote:
> >
> > Figured I should have sent that right after I hit the send key...
> >
> > processor : 0
> > vendor_id : AuthenticAMD
> > cpu family : 15
> > model : 72
> > model name : AMD Turion(tm) 64 X2 Mobile Technology TL-52
>
> Sadly, this doesn't show the "extended family" stuff from cpuid.
>
> So it doesn't show any of the bits we actually care about. Sad.
>
> That said, the "AMD Turion(tm) 64 X2 Mobile Technology TL-52" _should_ be
> a REV-F CPU afaik, and it should have thus fallen through to the
> "ENABLE_C1E_MASK" logic. Afaik that's broken.
>
> Cal - can you
> (a) test that forcing a "return 1" from that amd_apic_timer_broken()
> function fixes it for you.
ACK
> (b) make that function print out the values it uses for debugging (ie the
> extended family and model numbers, and the MSR_K8_ENABLE_C1E MSR
> values)?
eax & CPUID_XFAM == 0x00000000
eax & CPUID_XMOD == 0x00040000
MSR_K8_ENABLE_C1E lo == 0x04c14015
MSR_K8_ENABLE_C1E hi == 0x00000000
lo & ENABLE_C1E_MASK == 0
amd_apic_timer_broken: forcing return value of 1
--
Cal Peake
On Thu, 2007-08-02 at 14:07 -0700, Linus Torvalds wrote:
>
> On Thu, 2 Aug 2007, Cal Peake wrote:
> >
> > Figured I should have sent that right after I hit the send key...
> >
> > processor : 0
> > vendor_id : AuthenticAMD
> > cpu family : 15
> > model : 72
> > model name : AMD Turion(tm) 64 X2 Mobile Technology TL-52
>
> Sadly, this doesn't show the "extended family" stuff from cpuid.
>
> So it doesn't show any of the bits we actually care about. Sad.
>
> That said, the "AMD Turion(tm) 64 X2 Mobile Technology TL-52" _should_ be
> a REV-F CPU afaik, and it should have thus fallen through to the
> "ENABLE_C1E_MASK" logic. Afaik that's broken.
>
> Cal - can you
> (a) test that forcing a "return 1" from that amd_apic_timer_broken()
> function fixes it for you.
> (b) make that function print out the values it uses for debugging (ie the
> extended family and model numbers, and the MSR_K8_ENABLE_C1E MSR
> values)?
>
> Andi, can you check with your AMD contacts that those bits are correct..
> Maybe the "Mobile Technology" things *always* have the broken "Enhanced
> Halt State", regardless of any MSR settings? That would perhaps be what
> makes them "Mobile".
This is the same problem I'm seeing (See Subject: Regression in 2.6.22,
clock problems on Turion with 32-bit kernel).
This commit is what we bisected to:
commit e9e2cdb412412326c4827fc78ba27f410d837e6e
Author: Thomas Gleixner <[email protected]>
Date: Fri Feb 16 01:28:04 2007 -0800
[PATCH] clockevents: i386 drivers
Add clockevent drivers for i386: lapic (local) and PIT/HPET (global). Update
the timer IRQ to call into the PIT/HPET driver's event handler and the
lapic-timer IRQ to call into the lapic clockevent driver. The assignment of
timer functionality is delegated to the core framework code and replaces the
compile and runtime evaluation in do_timer_interrupt_hook()
Use the clockevents broadcast support and implement the lapic_broadcast
function for ACPI.
No changes to existing functionality.
--
Ubuntu : http://www.ubuntu.com/
Linux1394: http://wiki.linux1394.org/
On 08/03/2007 11:52 AM, Ben Collins wrote:
>
> This is the same problem I'm seeing (See Subject: Regression in 2.6.22,
> clock problems on Turion with 32-bit kernel).
>
> This commit is what we bisected to:
>
> commit e9e2cdb412412326c4827fc78ba27f410d837e6e
> Author: Thomas Gleixner <[email protected]>
> Date: Fri Feb 16 01:28:04 2007 -0800
>
> [PATCH] clockevents: i386 drivers
>
Yes, and the lapic timer apparently worked okay until then, right?
FWIW when you disable it this appears in the boot messages:
Clockevents: could not switch to one-shot mode:<6>Clockevents: could not switch to one-shot mode: lapic is not functional.
Could not switch to high resolution mode on CPU 0
lapic is not functional.
Could not switch to high resolution mode on CPU 1
Yet "highres=off" does not fix the problem. Very strange...
On Thu, 2 Aug 2007, Cal Peake wrote:
> On Thu, 2 Aug 2007, Linus Torvalds wrote:
> >
> > That said, the "AMD Turion(tm) 64 X2 Mobile Technology TL-52" _should_ be
> > a REV-F CPU afaik, and it should have thus fallen through to the
> > "ENABLE_C1E_MASK" logic. Afaik that's broken.
> >
> > Cal - can you
> > (a) test that forcing a "return 1" from that amd_apic_timer_broken()
> > function fixes it for you.
>
> ACK
>
> > (b) make that function print out the values it uses for debugging (ie the
> > extended family and model numbers, and the MSR_K8_ENABLE_C1E MSR
> > values)?
>
> eax & CPUID_XFAM == 0x00000000
> eax & CPUID_XMOD == 0x00040000
Yeah, that's a REV-F
> MSR_K8_ENABLE_C1E lo == 0x04c14015
> MSR_K8_ENABLE_C1E hi == 0x00000000
> lo & ENABLE_C1E_MASK == 0
And yeah, that claims that C1E is not on, but:
> amd_apic_timer_broken: forcing return value of 1
since this makes it all work for you, it does appear that the AMD local
timer stops in C1 even when that isn't true, and as such is not useful.
Sad. It probably means that we have to disable the local timer for *all*
modern AMD CPUs.
Thomas/Ingo - did something change in the local apic programming? Or why
did this work before? Was it just that we didn't use the local timer apic
for some other reason?
Linus
> > amd_apic_timer_broken: forcing return value of 1
>
> since this makes it all work for you, it does appear that the AMD local
> timer stops in C1 even when that isn't true, and as such is not useful.
Probably messed up SMM code. Might not apply to all AMD CPUs.
> Sad. It probably means that we have to disable the local timer for *all*
> modern AMD CPUs.
I saw an APIC timer problem on an older AMD laptop too (long before RevF).
Back then I asked and it was apparently some SMM bug of that specific
BIOS.
What exact type of machine is it?
>
> Thomas/Ingo - did something change in the local apic programming? Or why
> did this work before? Was it just that we didn't use the local timer apic
We used to always use interrupt 0 too, which woke up the CPU. The APIC
interrupt then piggybacked on that wakeup, which works once the CPU
is awake.
Thomas changed that then when he redid the code for clock sources
because interrupt 0/PIT isn't good for frequent reprogramming.
I did similar experiments some time ago but found it caused a lot of
problems and wasn't as brave as Thomas in working through them, so
I never pushed these patches.
-Andi
> What exact type of machine is it?
Not sure which one Cal has but I have an Averatec 2370 12inch laptop
with an nVidia MCP51 chipset.
AMD Turion(tm) 64 X2 Mobile Technology TL-50
On Sat, 2007-08-04 at 11:30 +0200, Andi Kleen wrote:
> > > amd_apic_timer_broken: forcing return value of 1
> What exact type of machine is it?
FYI: There seems to be a very wide range of Turion machines affected by
the latest no_hz/time/clockevents changes.
Tim Gardner reported a hang while booting and also pointed to commit:
e9e2cdb412412326c4827fc78ba27f410d837e6e
In the thread: ACPI Regression on Dell E1501
http://marc.info/?t=118246118400004&r=1&w=2
I am also seeing the hang Tim reported with a Ferrari F5000
(AMD Turion(tm) 64 X2 Mobile Technology TL-60)
On this machine, the hang only occurs sporadically (about every third
time). It seems that once it gets past a critical init section, all is fine?
Hope that helps in some way...
Thomas
On Fri, 3 Aug 2007, Linus Torvalds wrote:
> > MSR_K8_ENABLE_C1E lo == 0x04c14015
> > MSR_K8_ENABLE_C1E hi == 0x00000000
> > lo & ENABLE_C1E_MASK == 0
>
> And yeah, that claims that C1E is not on, but:
>
> > amd_apic_timer_broken: forcing return value of 1
So it seems my initial debugging report was, err, incomplete. I failed to
notice that the amd_apic_timer_broken function was getting called twice,
once for each core.
The second call shows this:
MSR_K8_ENABLE_C1E == 0x14c14015
which causes our ENABLE_C1E_MASK check to be true and thus properly
return 1 from the function. So when we call the above function from
init_amd we prolly need to do a
set_bit(X86_FEATURE_LAPIC_TIMER_BROKEN, c->x86_capability);
for each core if any of them happen to return true upon checking for a
broken timer.
Andi, does that seem right?
--
Cal Peake
On Tue, Aug 07, 2007 at 06:15:37PM -0400, Cal Peake wrote:
> On Fri, 3 Aug 2007, Linus Torvalds wrote:
>
> > > MSR_K8_ENABLE_C1E lo == 0x04c14015
> > > MSR_K8_ENABLE_C1E hi == 0x00000000
> > > lo & ENABLE_C1E_MASK == 0
> >
> > And yeah, that claims that C1E is not on, but:
> >
> > > amd_apic_timer_broken: forcing return value of 1
>
> So it seems my initial debugging report was, err, incomplete. I failed to
> notice that the amd_apic_timer_broken function was getting called twice,
> once for each core.
>
> The second call shows this:
>
> MSR_K8_ENABLE_C1E == 0x14c14015
Ah interesting. Ok finally that all starts making sense.
Not sure why the MSR varies between cores though.
> which causes our ENABLE_C1E_MASK check to be true and thus properly
> return 1 from the function. So when we call the above function from
> init_amd we prolly need to do a
>
> set_bit(X86_FEATURE_LAPIC_TIMER_BROKEN, c->x86_capability);
>
> for each core if any of them happen to return true upon checking for a
> broken timer.
It's better to just make it a global instead.
-Andi
On Wed, 8 Aug 2007, Andi Kleen wrote:
> Not sure why the MSR varies between cores though.
Yeah that boggled me too.
> It's better to just make it a global instead.
Haven't gotten to figuring out how to do *that* yet... but here's a
cleanup for the detection function:
From: Cal Peake <[email protected]>
We only care about the lower 32 bits when reading the Interrupt Pending
Message Register so drop the 'hi' variable and use rdmsrl() instead.
Signed-off-by: Cal Peake <[email protected]>
--- ./arch/i386/kernel/cpu/amd.c~orig 2007-08-07 20:22:26.000000000 -0400
+++ ./arch/i386/kernel/cpu/amd.c 2007-08-07 20:23:22.000000000 -0400
@@ -34,7 +34,7 @@ __asm__(".align 4\nvide: ret");
 /* AMD systems with C1E don't have a working lAPIC timer. Check for that. */
 static __cpuinit int amd_apic_timer_broken(void)
 {
-	u32 lo, hi;
+	u32 msr;
 	u32 eax = cpuid_eax(CPUID_PROCESSOR_SIGNATURE);
 	switch (eax & CPUID_XFAM) {
 	case CPUID_XFAM_K8:
@@ -42,8 +42,8 @@ static __cpuinit int amd_apic_timer_brok
 			break;
 	case CPUID_XFAM_10H:
 	case CPUID_XFAM_11H:
-		rdmsr(MSR_K8_ENABLE_C1E, lo, hi);
-		if (lo & ENABLE_C1E_MASK)
+		rdmsrl(MSR_K8_ENABLE_C1E, msr);
+		if (msr & ENABLE_C1E_MASK)
 			return 1;
 		break;
 	default:
On Wednesday 08 August 2007 02:53:21 Cal Peake wrote:
> On Wed, 8 Aug 2007, Andi Kleen wrote:
>
> > Not sure why the MSR varies between cores though.
>
> Yeah that boggled me too.
>
> > It's better to just make it a global instead.
>
> Haven't gotten to figuring out how to do *that* yet... but here's a
> cleanup for the detection function:
Can you please test if this patch works?
BTW I checked with AMD and they seem to think it's just a buggy BIOS.
-Andi
Use global flag to disable broken local apic timer on AMD CPUs.
The Averatec 2370 laptop BIOS seems to program the ENABLE_C1E
MSR inconsistently between cores. This confuses the lapic
use heuristics, which want to know if C1E is enabled anywhere.
Use a global flag instead of a per cpu flag to handle this.
If any CPU has C1E enabled, disable lapic use.
Thanks to Cal Peake for debugging.
Signed-off-by: Andi Kleen <[email protected]>
Index: linux/arch/i386/kernel/apic.c
===================================================================
--- linux.orig/arch/i386/kernel/apic.c
+++ linux/arch/i386/kernel/apic.c
@@ -61,8 +61,9 @@ static int enable_local_apic __initdata
/* Local APIC timer verification ok */
static int local_apic_timer_verify_ok;
-/* Disable local APIC timer from the kernel commandline or via dmi quirk */
-static int local_apic_timer_disabled;
+/* Disable local APIC timer from the kernel commandline or via dmi quirk
+ or using CPU MSR check */
+int local_apic_timer_disabled;
/* Local APIC timer works in C2 */
int local_apic_timer_c2_ok;
EXPORT_SYMBOL_GPL(local_apic_timer_c2_ok);
@@ -370,9 +371,6 @@ void __init setup_boot_APIC_clock(void)
long delta, deltapm;
int pm_referenced = 0;
- if (boot_cpu_has(X86_FEATURE_LAPIC_TIMER_BROKEN))
- local_apic_timer_disabled = 1;
-
/*
* The local apic timer can be disabled via the kernel
* commandline or from the test above. Register the lapic
Index: linux/arch/i386/kernel/cpu/amd.c
===================================================================
--- linux.orig/arch/i386/kernel/cpu/amd.c
+++ linux/arch/i386/kernel/cpu/amd.c
@@ -3,6 +3,7 @@
#include <linux/mm.h>
#include <asm/io.h>
#include <asm/processor.h>
+#include <asm/apic.h>
#include "cpu.h"
@@ -22,6 +23,7 @@
extern void vide(void);
__asm__(".align 4\nvide: ret");
+#ifdef CONFIG_X86_LOCAL_APIC
#define ENABLE_C1E_MASK 0x18000000
#define CPUID_PROCESSOR_SIGNATURE 1
#define CPUID_XFAM 0x0ff00000
@@ -52,6 +54,7 @@ static __cpuinit int amd_apic_timer_brok
}
return 0;
}
+#endif
int force_mwait __cpuinitdata;
@@ -282,8 +285,10 @@ static void __cpuinit init_amd(struct cp
num_cache_leaves = 3;
}
+#ifdef CONFIG_X86_LOCAL_APIC
if (amd_apic_timer_broken())
- set_bit(X86_FEATURE_LAPIC_TIMER_BROKEN, c->x86_capability);
+ local_apic_timer_disabled = 1;
+#endif
if (c->x86 == 0x10 && !force_mwait)
clear_bit(X86_FEATURE_MWAIT, c->x86_capability);
Index: linux/include/asm-i386/apic.h
===================================================================
--- linux.orig/include/asm-i386/apic.h
+++ linux/include/asm-i386/apic.h
@@ -116,6 +116,8 @@ extern void enable_NMI_through_LVT0 (voi
extern int timer_over_8254;
extern int local_apic_timer_c2_ok;
+extern int local_apic_timer_disabled;
+
#else /* !CONFIG_X86_LOCAL_APIC */
static inline void lapic_shutdown(void) { }
Index: linux/include/asm-i386/cpufeature.h
===================================================================
--- linux.orig/include/asm-i386/cpufeature.h
+++ linux/include/asm-i386/cpufeature.h
@@ -79,7 +79,7 @@
#define X86_FEATURE_ARCH_PERFMON (3*32+11) /* Intel Architectural PerfMon */
#define X86_FEATURE_PEBS (3*32+12) /* Precise-Event Based Sampling */
#define X86_FEATURE_BTS (3*32+13) /* Branch Trace Store */
-#define X86_FEATURE_LAPIC_TIMER_BROKEN (3*32+ 14) /* lapic timer broken in C1 */
+/* 14 free */
#define X86_FEATURE_SYNC_RDTSC (3*32+15) /* RDTSC synchronizes the CPU */
#define X86_FEATURE_REP_GOOD (3*32+16) /* rep microcode works well on this CPU */
On Wednesday 08 August 2007 02:06:31 Andi Kleen wrote:
> On Tue, Aug 07, 2007 at 06:15:37PM -0400, Cal Peake wrote:
> > On Fri, 3 Aug 2007, Linus Torvalds wrote:
> > > > MSR_K8_ENABLE_C1E lo == 0x04c14015
> > > > MSR_K8_ENABLE_C1E hi == 0x00000000
> > > > lo & ENABLE_C1E_MASK == 0
> > >
> > > And yeah, that claims that C1E is not on, but:
> > > > amd_apic_timer_broken: forcing return value of 1
> >
> > So it seems my initial debugging report was, err, incomplete. I failed to
> > notice that the amd_apic_timer_broken function was getting called twice,
> > once for each core.
> >
> > The second call shows this:
> >
> > MSR_K8_ENABLE_C1E == 0x14c14015
>
> Ah interesting. Ok finally that all starts making sense.
>
> Not sure why the MSR varies between cores though.
This is a BIOS bug as the BIOS should have programmed the MSR the same for
both cores. See section 10.2.4 of the Rev F BKDG [1] (10.2.4.1 talks about
the SMI case but a newer version of the doc not yet released has similar
wording about both cores needing to have the bit set for the chipset case).
-Joachim
[1]
http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/32559.pdf
On Wed, 8 Aug 2007, Andi Kleen wrote:
> Can you please test if this patch works?
Yep, seems to do the trick. Thanks!
> BTW I checked with AMD and they seem to think it's just a buggy BIOS.
Nod. At least we can work around it.
> Use global flag to disable broken local apic timer on AMD CPUs.
>
> The Averatec 2370 laptop BIOS seems to program the ENABLE_C1E
s~2370~2370/2371~ to be completely accurate ;)
> MSR inconsistently between cores. This confuses the lapic
> use heuristics, which want to know if C1E is enabled anywhere.
>
> Use a global flag instead of a per cpu flag to handle this.
> If any CPU has C1E enabled, disable lapic use.
>
> Thanks to Cal Peake for debugging.
> Signed-off-by: Andi Kleen <[email protected]>
Acked-by: Cal Peake <[email protected]>
--
Cal Peake
Cal Peake wrote:
> On Wed, 8 Aug 2007, Andi Kleen wrote:
>
>> Can you please test if this patch works?
>
> Yep, seems to do the trick. Thanks!
>
>> BTW I checked with AMD and they seem to think it's just a buggy BIOS.
>
> Nod. At least we can work around it.
>
>> Use global flag to disable broken local apic timer on AMD CPUs.
>>
>> The Averatec 2370 laptop BIOS seems to program the ENABLE_C1E
>
> s~2370~2370/2371~ to be completely accurate ;)
>
>> MSR inconsistently between cores. This confuses the lapic
>> use heuristics, which want to know if C1E is enabled anywhere.
>>
>> Use a global flag instead of a per cpu flag to handle this.
>> If any CPU has C1E enabled, disable lapic use.
>>
>> Thanks to Cal Peake for debugging.
>> Signed-off-by: Andi Kleen <[email protected]>
>
> Acked-by: Cal Peake <[email protected]>
>
This patch also solves the boot problem on a Dell E1501. I started the
thread "ACPI Regression on Dell E1501" regarding this issue on June 21,
2007.
Acked-by: Tim Gardner <[email protected]>
--
Tim Gardner [email protected]
I have the latest BIOS update for my laptop, which I suppose is the buggy
one. There has been only one update this year, if memory serves. Is there
any hope of fixing this, or am I at the mercy of the hardware vendor, which
apparently isn't going to release another update this year? Please forgive
me, as I am not a kernel developer but a concerned user. I've sacrificed
ACPI in favor of SMP at this point. I don't know what it's buying me, but
the kernel boots and works fine, with the drawback that I have no ACPI and
have to power the computer down manually by pressing the power button when
it halts. I can live with that if that is the solution, but I haven't
really followed the high-level dev discussion in this thread and don't know
what the solution might be.
On 8/8/07, Joachim Deguara <[email protected]> wrote:
> On Wednesday 08 August 2007 02:06:31 Andi Kleen wrote:
> > On Tue, Aug 07, 2007 at 06:15:37PM -0400, Cal Peake wrote:
> > > On Fri, 3 Aug 2007, Linus Torvalds wrote:
> > > > > MSR_K8_ENABLE_C1E lo == 0x04c14015
> > > > > MSR_K8_ENABLE_C1E hi == 0x00000000
> > > > > lo & ENABLE_C1E_MASK == 0
> > > >
> > > > And yeah, that claims that C1E is not on, but:
> > > > > amd_apic_timer_broken: forcing return value of 1
> > >
> > > So it seems my initial debugging report was, err, incomplete. I failed to
> > > notice that the amd_apic_timer_broken function was getting called twice,
> > > once for each core.
> > >
> > > The second call shows this:
> > >
> > > MSR_K8_ENABLE_C1E == 0x14c14015
> >
> > Ah interesting. Ok finally that all starts making sense.
> >
> > Not sure why the MSR varies between cores though.
>
> This is a BIOS bug as the BIOS should have programmed the MSR the same for
> both cores. See section 10.2.4 of the Rev F BKDG [1] (10.2.4.1 talks about
> the SMI case but a newer version of the doc not yet released has similar
> wording about both cores needing to have the bit set for the chipset case).
>
> -Joachim
>
> [1]
> http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/32559.pdf
>
>
>
On Thursday 09 August 2007 01:52:37 Frank Hale wrote:
> I have the latest BIOS update for my laptop which is buggy I suppose.
> There has been only one update this year if my memory serves me
> correctly. Is there any hope to fix this or am I at the mercy of the
> hardware vendor which apparently doesn't look like they will release
> another patch this year.
You can fix this with the kernel parameter nolapic_timer, which Cal reported
earlier as working, or try Andi's patch to automagically mark the lapic
timer as bad. This should work fine with ACPI.
The BIOS is at fault but Andi's patch just works around the BIOS'
incompleteness and has been successful for two other people.
-Joachim