2010-08-16 19:25:56

by Alok Kataria

Subject: [Patch] Skip cpu_calibrate for kernel running under hypervisors.

Hi,

This is a trivial change to fix the cpu_khz value returned when running
in a virtualized environment. We have seen instances where the cpu_khz
value is off by a couple of MHz when running on VMware's platform on AMD
hardware.

--
Since the TSC frequency read from the hypervisor is accurate for the guest,
and since the hypervisor will always clock the vcpu at the TSC frequency,
there is no need to calibrate it again. To avoid calibration errors
introduced by calibrate_cpu, this patch skips the calibrate_cpu call for
kernels running under hypervisors.

Signed-off-by: Alok N Kataria <[email protected]>
Cc: K. Y. Srinivasan <[email protected]>
Cc: Greg KH <[email protected]>
Cc: H. Peter Anvin <[email protected]>

Index: linux-x86-tree.git/arch/x86/kernel/tsc.c
===================================================================
--- linux-x86-tree.git.orig/arch/x86/kernel/tsc.c 2010-08-03 12:21:20.000000000 -0700
+++ linux-x86-tree.git/arch/x86/kernel/tsc.c 2010-08-13 15:07:08.000000000 -0700
@@ -927,7 +927,7 @@ void __init tsc_init(void)
}

if (cpu_has(&boot_cpu_data, X86_FEATURE_CONSTANT_TSC) &&
- (boot_cpu_data.x86_vendor == X86_VENDOR_AMD))
+ (boot_cpu_data.x86_vendor == X86_VENDOR_AMD) && !x86_hyper)
cpu_khz = calibrate_cpu();

printk("Detected %lu.%03lu MHz processor.\n",


2010-08-16 23:56:44

by H. Peter Anvin

Subject: Re: [Patch] Skip cpu_calibrate for kernel running under hypervisors.

On 08/16/2010 12:25 PM, Alok Kataria wrote:
> Hi,
>
> This is a trivial change to fix the cpu_khz value returned when running
> on a virtualized environment. We have seen instances when the cpu_khz
> value is off by couple of MHz's when running on VMware's platform on AMD
> hardware.
>
> --
> Since the TSC frequency read from hypervisor is accurate for the guest, and
> since the hypervisor will always clock the vcpu at the TSC frequency, there is
> no need to calibrate it again. To avoid any calibration errors through
> calibrate_cpu this patch skips calling calibrate_cpu for kernel running
> under hypervisors.
>

I'm somewhat reluctant to take this one, since it assumes all the
hypervisors act the same. This seems rather inherently wrong. In fact,
the whole statement is fishy as heck... instead of being dependent on
AMD and so on, this should either be a function pointer or a CPU
(mis)feature bit.

Can we do this saner?

-hpa

2010-08-17 05:51:53

by Alok Kataria

Subject: Re: [Patch] Skip cpu_calibrate for kernel running under hypervisors.

Thanks for taking a look.

On Mon, 2010-08-16 at 16:56 -0700, H. Peter Anvin wrote:
> On 08/16/2010 12:25 PM, Alok Kataria wrote:
> > Hi,
> >
> > This is a trivial change to fix the cpu_khz value returned when running
> > on a virtualized environment. We have seen instances when the cpu_khz
> > value is off by couple of MHz's when running on VMware's platform on AMD
> > hardware.
> >
> > --
> > Since the TSC frequency read from hypervisor is accurate for the guest, and
> > since the hypervisor will always clock the vcpu at the TSC frequency, there is
> > no need to calibrate it again. To avoid any calibration errors through
> > calibrate_cpu this patch skips calling calibrate_cpu for kernel running
> > under hypervisors.
> >
>
> I'm somewhat reluctant to take this one, since it assumes all the
> hypervisors act the same. This seems rather inherently wrong. In fact,
> the whole statement is fishy as heck... instead of being dependent on
> AMD and so on,

The check about being on AMD is something that was already there.

> this should either be a function pointer or a CPU
> (mis)feature bit.

In any case, I agree that my previous patch did assume all hypervisors
to be the same, which might be wrong. How about using the
X86_FEATURE_TSC_RELIABLE bit for this too? I.e., skip the calibrate_cpu call
if the TSC_RELIABLE bit is set. As of now that bit is set only for VMware.

Something like the below.

Signed-off-by: Alok N Kataria <[email protected]>
Cc: H. Peter Anvin <[email protected]>

Index: linux-x86-tree.git/arch/x86/kernel/tsc.c
===================================================================
--- linux-x86-tree.git.orig/arch/x86/kernel/tsc.c 2010-08-03 12:21:20.000000000 -0700
+++ linux-x86-tree.git/arch/x86/kernel/tsc.c 2010-08-16 21:59:32.000000000 -0700
@@ -927,7 +927,8 @@ void __init tsc_init(void)
}

if (cpu_has(&boot_cpu_data, X86_FEATURE_CONSTANT_TSC) &&
- (boot_cpu_data.x86_vendor == X86_VENDOR_AMD))
+ (boot_cpu_data.x86_vendor == X86_VENDOR_AMD) &&
+ !(cpu_has(&boot_cpu_data, X86_FEATURE_TSC_RELIABLE)))
cpu_khz = calibrate_cpu();

printk("Detected %lu.%03lu MHz processor.\n",

2010-08-17 06:30:55

by H. Peter Anvin

Subject: Re: [Patch] Skip cpu_calibrate for kernel running under hypervisors.

On 08/16/2010 10:51 PM, Alok Kataria wrote:
>>
>> I'm somewhat reluctant to take this one, since it assumes all the
>> hypervisors act the same. This seems rather inherently wrong. In fact,
>> the whole statement is fishy as heck... instead of being dependent on
>> AMD and so on,
>
> The check about being on AMD is something that was already there.
>

I know it was... and calibrate_cpu() seems to be an AMD-specific
function, but that's rather crappy. I'm thinking that perhaps we should
make it an x86_init function, then the AMD CPU detection can install it
and the vmware hypervisor detection can uninstall it.
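Roughly the shape I have in mind (just a sketch; the ops struct used here and
the names are placeholders, not existing code):

	/* AMD CPU detection installs the hook, e.g. somewhere in cpu/amd.c: */
	x86_platform.calibrate_cpu = amd_calibrate_cpu;

	/* ... and the VMware hypervisor detection uninstalls it again, since
	 * the hypervisor-provided TSC frequency is already authoritative: */
	x86_platform.calibrate_cpu = NULL;

	/* tsc_init() then only recalibrates if somebody installed the hook: */
	if (x86_platform.calibrate_cpu)
		cpu_khz = x86_platform.calibrate_cpu();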

>> this should either be a function pointer or a CPU
>> (mis)feature bit.
>
> In any case, I agree that my previous patch did assume all hypervisors
> to be same, which might be wrong. How about using the
> X86_FEATURE_TSC_RELIABLE bit for this too ? i.e. Skip cpu_calibrate call
> if TSC_RELIABLE bit is set. As of now that bit is set for vmware only.
>
> Something like the below.
>
> Signed-off-by: Alok N Kataria <[email protected]>
> Cc: H. Peter Anvin <[email protected]>
>
> Index: linux-x86-tree.git/arch/x86/kernel/tsc.c
> ===================================================================
> --- linux-x86-tree.git.orig/arch/x86/kernel/tsc.c 2010-08-03 12:21:20.000000000 -0700
> +++ linux-x86-tree.git/arch/x86/kernel/tsc.c 2010-08-16 21:59:32.000000000 -0700
> @@ -927,7 +927,8 @@ void __init tsc_init(void)
> }
>
> if (cpu_has(&boot_cpu_data, X86_FEATURE_CONSTANT_TSC) &&
> - (boot_cpu_data.x86_vendor == X86_VENDOR_AMD))
> + (boot_cpu_data.x86_vendor == X86_VENDOR_AMD) &&
> + !(cpu_has(&boot_cpu_data, X86_FEATURE_TSC_RELIABLE)))
> cpu_khz = calibrate_cpu();
>
> printk("Detected %lu.%03lu MHz processor.\n",
>

That seems like a much better approach.

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

2010-08-17 07:05:25

by Borislav Petkov

Subject: Re: [Patch] Skip cpu_calibrate for kernel running under hypervisors.

From: "H. Peter Anvin" <[email protected]>
Date: Mon, Aug 16, 2010 at 11:30:48PM -0700

> On 08/16/2010 10:51 PM, Alok Kataria wrote:
> >>
> >> I'm somewhat reluctant to take this one, since it assumes all the
> >> hypervisors act the same. This seems rather inherently wrong. In fact,
> >> the whole statement is fishy as heck... instead of being dependent on
> >> AMD and so on,
> >
> > The check about being on AMD is something that was already there.
> >
>
> I know it was... and calibrate_cpu() seems to be an AMD-specific
> function, but that's rather crappy. I'm thinking that perhaps we should
> make it an x86_init function, then the AMD CPU detection can install it
> and the vmware hypervisor detection can uninstall it.

Btw, can we revisit this AMD-specific issue? IIUC, Alok, you're seeing
a mismatch between the calibrated TSC value and the cpu frequency even
on cpus which have the CONSTANT_TSC bit set, i.e. their TSC is counting
with the P0 frequency. Can you please elaborate on which systems you're
seeing this (cpu family, chipset, etc.)?

Thanks.

--
Regards/Gruss,
Boris.

2010-08-17 16:45:38

by Alok Kataria

Subject: Re: [Patch] Skip cpu_calibrate for kernel running under hypervisors.

Hi Borislav,

On Tue, 2010-08-17 at 00:05 -0700, Borislav Petkov wrote:
> From: "H. Peter Anvin" <[email protected]>
> Date: Mon, Aug 16, 2010 at 11:30:48PM -0700
>
> > On 08/16/2010 10:51 PM, Alok Kataria wrote:
> > >>
> > >> I'm somewhat reluctant to take this one, since it assumes all the
> > >> hypervisors act the same. This seems rather inherently wrong. In fact,
> > >> the whole statement is fishy as heck... instead of being dependent on
> > >> AMD and so on,
> > >
> > > The check about being on AMD is something that was already there.
> > >
> >
> > I know it was... and calibrate_cpu() seems to be an AMD-specific
> > function, but that's rather crappy. I'm thinking that perhaps we should
> > make it an x86_init function, then the AMD CPU detection can install it
> > and the vmware hypervisor detection can uninstall it.
>
> Btw, can we revisit this AMD-specific issue? IIUC, Alok you're seeing
> a mismatch between the calibrated TSC value and the cpu frequency even
> on cpus which have the CONSTANT_TSC bit set, i.e. their TSC is counting
> with P0 frequency. Can you please elaborate more on what systems you're
> seeing this (cpu family, chipset, etc)?

We have seen these issues when running inside a virtual machine on
VMware's platform. Please look at the vmware_set_cpu_features function;
it relies on the hypervisor to provide a constant/reliable TSC. Still,
when running the kernel on virtual cpus the timing characteristics differ
from those on physical cpus, since virtual cpus have to time-share
physical cpus with each other, which may result in errors during
calibration. As a result it's better to get these values directly from
the hypervisor rather than trying to calibrate them.
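
For reference, the hook I mean looks roughly like this (paraphrased from
arch/x86/kernel/cpu/vmware.c of that era; exact details may differ by
kernel version):

	static void __cpuinit vmware_set_cpu_features(struct cpuinfo_x86 *c)
	{
		/* the hypervisor guarantees a fixed-rate, always-running TSC */
		set_cpu_cap(c, X86_FEATURE_CONSTANT_TSC);
		set_cpu_cap(c, X86_FEATURE_TSC_RELIABLE);
	}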

And just to clarify, we have never seen this on a physical machine.

Thanks,
Alok


>
> Thanks.
>

2010-08-17 16:48:33

by Alok Kataria

Subject: Re: [Patch] Skip cpu_calibrate for kernel running under hypervisors.

Hi HPA,

On Mon, 2010-08-16 at 23:30 -0700, H. Peter Anvin wrote:
> On 08/16/2010 10:51 PM, Alok Kataria wrote:
> >>
> >> I'm somewhat reluctant to take this one, since it assumes all the
> >> hypervisors act the same. This seems rather inherently wrong. In fact,
> >> the whole statement is fishy as heck... instead of being dependent on
> >> AMD and so on,
> >
> > The check about being on AMD is something that was already there.
> >
>
> I know it was... and calibrate_cpu() seems to be an AMD-specific
> function, but that's rather crappy. I'm thinking that perhaps we should
> make it an x86_init function, then the AMD CPU detection can install it
> and the vmware hypervisor detection can uninstall it.

I am planning to add a calibrate_apic function pointer in x86_platform_ops,
to get the APIC frequency directly from the hypervisor as well. If you
want I can add this calibrate_cpu function pointer too, or is the patch
below okay for now?

Thanks,
Alok

>
> >> this should either be a function pointer or a CPU
> >> (mis)feature bit.
> >
> > In any case, I agree that my previous patch did assume all hypervisors
> > to be same, which might be wrong. How about using the
> > X86_FEATURE_TSC_RELIABLE bit for this too ? i.e. Skip cpu_calibrate call
> > if TSC_RELIABLE bit is set. As of now that bit is set for vmware only.
> >
> > Something like the below.
> >
> > Signed-off-by: Alok N Kataria <[email protected]>
> > Cc: H. Peter Anvin <[email protected]>
> >
> > Index: linux-x86-tree.git/arch/x86/kernel/tsc.c
> > ===================================================================
> > --- linux-x86-tree.git.orig/arch/x86/kernel/tsc.c 2010-08-03 12:21:20.000000000 -0700
> > +++ linux-x86-tree.git/arch/x86/kernel/tsc.c 2010-08-16 21:59:32.000000000 -0700
> > @@ -927,7 +927,8 @@ void __init tsc_init(void)
> > }
> >
> > if (cpu_has(&boot_cpu_data, X86_FEATURE_CONSTANT_TSC) &&
> > - (boot_cpu_data.x86_vendor == X86_VENDOR_AMD))
> > + (boot_cpu_data.x86_vendor == X86_VENDOR_AMD) &&
> > + !(cpu_has(&boot_cpu_data, X86_FEATURE_TSC_RELIABLE)))
> > cpu_khz = calibrate_cpu();
> >
> > printk("Detected %lu.%03lu MHz processor.\n",
> >
>
> That seems like a much better approach.
>
> -hpa
>

2010-08-17 16:49:47

by H. Peter Anvin

Subject: Re: [Patch] Skip cpu_calibrate for kernel running under hypervisors.

On 08/17/2010 09:48 AM, Alok Kataria wrote:
>
> I am planning to add a calibrate_apic function ptr in x86_platform_ops,
> for getting the APIC frequency too directly from the hypervisor. If you
> want I can add this calibrate_cpu function ptr too or is the patch below
> okay for now ?
>

That would be good.

-hpa

2010-08-17 18:56:42

by Borislav Petkov

Subject: Re: [Patch] Skip cpu_calibrate for kernel running under hypervisors.

From: Alok Kataria <[email protected]>
Date: Tue, Aug 17, 2010 at 09:45:32AM -0700

Hi Alok,

> > > I know it was... and calibrate_cpu() seems to be an AMD-specific
> > > function, but that's rather crappy. I'm thinking that perhaps we should
> > > make it an x86_init function, then the AMD CPU detection can install it
> > > and the vmware hypervisor detection can uninstall it.
> >
> > Btw, can we revisit this AMD-specific issue? IIUC, Alok you're seeing
> > a mismatch between the calibrated TSC value and the cpu frequency even
> > on cpus which have the CONSTANT_TSC bit set, i.e. their TSC is counting
> > with P0 frequency. Can you please elaborate more on what systems you're
> > seeing this (cpu family, chipset, etc)?
>
> We have seen these issues when running inside a Virtual Machine on
> VMware's platform. Please look at the vmware_set_cpu_features function,
> it relies on the hypervisor to provide a constant/reliable TSC. Though
> still when running the kernel on virtual cpus, as compared to
> running on physical cpus, the timing characteristics are different,
> since virtual cpus have to time share physical cpus with each other,
> which may result in errors during calibration. As a result its better to
> get these values directly from the hypervisor rather than trying to
> calibrate them.

Thanks for clarifying this. I kinda stumbled over "AMD hardware" in your
previous email, but now it's perfectly clear.

> And just to clarify, we have never seen this on a physical machine.

Anyway, your patch got me looking into the code and currently we do
calibrate_cpu() on all AMD systems with invariant TSC. However, machines
starting with F10h revB have a setting in MSRC001_0015[24] which, when
set, denotes that the TSC counts with P0 frequency and thus equals the
core frequency. So for those machines we won't need to calibrate the cpu
frequency. I'm figuring out some additional details internally and will
be sending a fix soon.
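
Just to spell out the check I mean (illustrative snippet only, not the actual
fix; the variable names are made up - MSR_K7_HWCR is the kernel's name for
MSRC001_0015):

	u64 hwcr;
	bool tsc_counts_at_p0;

	rdmsrl(MSR_K7_HWCR, hwcr);		/* MSRC001_0015 */
	tsc_counts_at_p0 = !!(hwcr & BIT(24));	/* bit 24: TSC increments with P0 frequency */
	if (!tsc_counts_at_p0)
		cpu_khz = calibrate_cpu();	/* only then is recalibration needed */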

Thanks.

--
Regards/Gruss,
Boris.

2010-08-18 16:14:41

by Borislav Petkov

Subject: [PATCH] x86, tsc: Limit CPU frequency calibration on AMD

Commit 6b37f5a20c0e5c334c010a587058354215433e92 introduced the CPU frequency
calibration code for AMD CPUs whose TSCs didn't increment with the
core's P0 frequency. From F10h revB onward, the TSC increment rate is
denoted by MSRC001_0015[24], and when this bit is set (which is normally
done by the BIOS) the TSC increments with the P0 frequency, so the
calibration is not needed and booting can be a couple of msecs faster on
those machines.

Signed-off-by: Borislav Petkov <[email protected]>
---
arch/x86/kernel/tsc.c | 14 ++++++++++++--
1 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index ce8e502..41b2b8b 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -927,8 +927,18 @@ void __init tsc_init(void)
}

if (cpu_has(&boot_cpu_data, X86_FEATURE_CONSTANT_TSC) &&
- (boot_cpu_data.x86_vendor == X86_VENDOR_AMD))
- cpu_khz = calibrate_cpu();
+ (boot_cpu_data.x86_vendor == X86_VENDOR_AMD)) {
+
+ if (boot_cpu_data.x86 > 0x10 ||
+ (boot_cpu_data.x86 == 0x10 &&
+ boot_cpu_data.x86_model >= 0x2)) {
+ u64 val;
+
+ rdmsrl(MSR_K7_HWCR, val);
+ if (!(val & BIT(24)))
+ cpu_khz = calibrate_cpu();
+ }
+ }

printk("Detected %lu.%03lu MHz processor.\n",
(unsigned long)cpu_khz / 1000,
--
1.7.1


--
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632

2010-08-18 16:24:40

by H. Peter Anvin

Subject: Re: [PATCH] x86, tsc: Limit CPU frequency calibration on AMD

On 08/18/2010 09:16 AM, Borislav Petkov wrote:
> 6b37f5a20c0e5c334c010a587058354215433e92 introduced the CPU frequency
> calibration code for AMD CPUs whose TSCs didn't increment with the
> core's P0 frequency. From F10h, revB onward, the TSC increment rate is
> denoted by MSRC001_0015[24] and when this bit is set (which is normally
> done by the BIOS,) the TSC increments with the P0 frequency so the
> calibration is not needed and booting can be a couple of mcecs faster on
> those machines.
>
> Signed-off-by: Borislav Petkov <[email protected]>
> ---
> arch/x86/kernel/tsc.c | 14 ++++++++++++--
> 1 files changed, 12 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
> index ce8e502..41b2b8b 100644
> --- a/arch/x86/kernel/tsc.c
> +++ b/arch/x86/kernel/tsc.c
> @@ -927,8 +927,18 @@ void __init tsc_init(void)
> }
>
> if (cpu_has(&boot_cpu_data, X86_FEATURE_CONSTANT_TSC) &&
> - (boot_cpu_data.x86_vendor == X86_VENDOR_AMD))
> - cpu_khz = calibrate_cpu();
> + (boot_cpu_data.x86_vendor == X86_VENDOR_AMD)) {
> +
> + if (boot_cpu_data.x86 > 0x10 ||
> + (boot_cpu_data.x86 == 0x10 &&
> + boot_cpu_data.x86_model >= 0x2)) {
> + u64 val;
> +
> + rdmsrl(MSR_K7_HWCR, val);
> + if (!(val & BIT(24)))
> + cpu_khz = calibrate_cpu();
> + }
> + }
>
> printk("Detected %lu.%03lu MHz processor.\n",
> (unsigned long)cpu_khz / 1000,

Yuck! There are a number of problems with this code, quite a few of
which are, of course, pre-existing, but this makes it worse.

calibrate_cpu() is AMD-specific, despite the generic name. (It is also,
strangely enough, only compiled on 64 bits for some reason???) Either
way, it is definitely not okay for the test for when the code applies to
be this distant from the code itself.

The easy way to fix this is to rename it amd_calibrate_cpu() and move
the applicability test into the routine itself. That is probably okay
as long as there are no other users. However, if there are other users,
then this really should move into x86_init and have a function pointer
associated with it.

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

2010-08-18 17:32:00

by Borislav Petkov

Subject: Re: [PATCH] x86, tsc: Limit CPU frequency calibration on AMD

From: "H. Peter Anvin" <[email protected]>
Date: Wed, Aug 18, 2010 at 12:23:08PM -0400

> calibrate_cpu() is AMD-specific, despite the generic name. (It is also,
> strangely enough, only compiled on 64 bits for some reason???)

Yeah, this is ancient stuff, who knows. I will test the fix on 32-bit
too after we agree on the design, though.

> Either which way, it is definitely not okay for the test for when the
> code applies to be this distant from the code itself.

Makes sense.

> The easy way to fix this is to rename it amd_calibrate_cpu() and move
> the applicability test into the routine itself. That is probably okay
> as long as there are no other users. However, if there are other users,
> then this really should move into x86_init and have a function pointer
> associated with it.

Right, do you have a strong preference between x86_init and x86_platform?
The version below uses x86_platform because it has the calibrate_tsc()
function in there too. Also, the version below nicely moves all that
AMD-specific code to cpu/amd.c.

I didn't opt for a calibrate_cpu_noop stub because I didn't want to
pollute x86_init.c with yet another noop prototype. But I guess I should
do that, since the pointer test is still executed at runtime while noop
stubs are removed completely by smart compilers :).
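
For comparison, the two variants would look roughly like this (sketch only):

	/* variant 1 (what the patch below does): test the pointer at the call site */
	if (x86_platform.calibrate_cpu)
		cpu_khz = x86_platform.calibrate_cpu();

	/* variant 2: install a default no-op stub in x86_init.c and always
	 * call through the pointer, e.g. */
	static unsigned long calibrate_cpu_noop(void) { return cpu_khz; }
	...
	cpu_khz = x86_platform.calibrate_cpu();	/* defaults to the noop */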

Anyway, the version below builds, I'll test tomorrow and send an
official version then:

--

diff --git a/arch/x86/include/asm/x86_init.h b/arch/x86/include/asm/x86_init.h
index baa579c..f00ed28 100644
--- a/arch/x86/include/asm/x86_init.h
+++ b/arch/x86/include/asm/x86_init.h
@@ -146,6 +146,7 @@ struct x86_cpuinit_ops {
*/
struct x86_platform_ops {
unsigned long (*calibrate_tsc)(void);
+ unsigned long (*calibrate_cpu)(void);
unsigned long (*get_wallclock)(void);
int (*set_wallclock)(unsigned long nowtime);
void (*iommu_shutdown)(void);
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index 60a57b1..6478bd5 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -6,6 +6,7 @@
#include <asm/processor.h>
#include <asm/apic.h>
#include <asm/cpu.h>
+#include <asm/nmi.h>
#include <asm/pci-direct.h>

#ifdef CONFIG_X86_64
@@ -381,6 +382,56 @@ static void __cpuinit early_init_amd_mc(struct cpuinfo_x86 *c)
#endif
}

+/*
+ * calibrate_cpu is used on systems with fixed rate TSCs to determine
+ * processor frequency
+ */
+#define TICK_COUNT 100000000
+unsigned long __init amd_calibrate_cpu(void)
+{
+ int tsc_start, tsc_now;
+ int i, no_ctr_free;
+ unsigned long evntsel3 = 0, pmc3 = 0, pmc_now = 0;
+ unsigned long flags;
+
+ for (i = 0; i < 4; i++)
+ if (avail_to_resrv_perfctr_nmi_bit(i))
+ break;
+ no_ctr_free = (i == 4);
+ if (no_ctr_free) {
+ WARN(1, KERN_WARNING "Warning: AMD perfctrs busy ... "
+ "cpu_khz value may be incorrect.\n");
+ i = 3;
+ rdmsrl(MSR_K7_EVNTSEL3, evntsel3);
+ wrmsrl(MSR_K7_EVNTSEL3, 0);
+ rdmsrl(MSR_K7_PERFCTR3, pmc3);
+ } else {
+ reserve_perfctr_nmi(MSR_K7_PERFCTR0 + i);
+ reserve_evntsel_nmi(MSR_K7_EVNTSEL0 + i);
+ }
+ local_irq_save(flags);
+ /* start measuring cycles, incrementing from 0 */
+ wrmsrl(MSR_K7_PERFCTR0 + i, 0);
+ wrmsrl(MSR_K7_EVNTSEL0 + i, 1 << 22 | 3 << 16 | 0x76);
+ rdtscl(tsc_start);
+ do {
+ rdmsrl(MSR_K7_PERFCTR0 + i, pmc_now);
+ tsc_now = get_cycles();
+ } while ((tsc_now - tsc_start) < TICK_COUNT);
+
+ local_irq_restore(flags);
+ if (no_ctr_free) {
+ wrmsrl(MSR_K7_EVNTSEL3, 0);
+ wrmsrl(MSR_K7_PERFCTR3, pmc3);
+ wrmsrl(MSR_K7_EVNTSEL3, evntsel3);
+ } else {
+ release_perfctr_nmi(MSR_K7_PERFCTR0 + i);
+ release_evntsel_nmi(MSR_K7_EVNTSEL0 + i);
+ }
+
+ return pmc_now * tsc_khz / (tsc_now - tsc_start);
+}
+
static void __cpuinit early_init_amd(struct cpuinfo_x86 *c)
{
early_init_amd_mc(c);
@@ -412,6 +463,23 @@ static void __cpuinit early_init_amd(struct cpuinfo_x86 *c)
set_cpu_cap(c, X86_FEATURE_EXTD_APICID);
}
#endif
+
+ /*
+ * We can check for constant TSC CPUID bit here since this bit is
+ * introduced with F10h.
+ */
+ if (cpu_has(c, X86_FEATURE_CONSTANT_TSC)) {
+
+ if (c->x86 > 0x10 ||
+ (c->x86 == 0x10 && c->x86_model >= 0x2)) {
+ u64 val;
+
+ rdmsrl(MSR_K7_HWCR, val);
+ if (!(val & BIT(24)))
+ x86_platform.calibrate_cpu = amd_calibrate_cpu;
+ }
+ } else
+ x86_platform.calibrate_cpu = amd_calibrate_cpu;
}

static void __cpuinit init_amd(struct cpuinfo_x86 *c)
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 41b2b8b..1915706 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -854,60 +854,6 @@ static void __init init_tsc_clocksource(void)
clocksource_register_khz(&clocksource_tsc, tsc_khz);
}

-#ifdef CONFIG_X86_64
-/*
- * calibrate_cpu is used on systems with fixed rate TSCs to determine
- * processor frequency
- */
-#define TICK_COUNT 100000000
-static unsigned long __init calibrate_cpu(void)
-{
- int tsc_start, tsc_now;
- int i, no_ctr_free;
- unsigned long evntsel3 = 0, pmc3 = 0, pmc_now = 0;
- unsigned long flags;
-
- for (i = 0; i < 4; i++)
- if (avail_to_resrv_perfctr_nmi_bit(i))
- break;
- no_ctr_free = (i == 4);
- if (no_ctr_free) {
- WARN(1, KERN_WARNING "Warning: AMD perfctrs busy ... "
- "cpu_khz value may be incorrect.\n");
- i = 3;
- rdmsrl(MSR_K7_EVNTSEL3, evntsel3);
- wrmsrl(MSR_K7_EVNTSEL3, 0);
- rdmsrl(MSR_K7_PERFCTR3, pmc3);
- } else {
- reserve_perfctr_nmi(MSR_K7_PERFCTR0 + i);
- reserve_evntsel_nmi(MSR_K7_EVNTSEL0 + i);
- }
- local_irq_save(flags);
- /* start measuring cycles, incrementing from 0 */
- wrmsrl(MSR_K7_PERFCTR0 + i, 0);
- wrmsrl(MSR_K7_EVNTSEL0 + i, 1 << 22 | 3 << 16 | 0x76);
- rdtscl(tsc_start);
- do {
- rdmsrl(MSR_K7_PERFCTR0 + i, pmc_now);
- tsc_now = get_cycles();
- } while ((tsc_now - tsc_start) < TICK_COUNT);
-
- local_irq_restore(flags);
- if (no_ctr_free) {
- wrmsrl(MSR_K7_EVNTSEL3, 0);
- wrmsrl(MSR_K7_PERFCTR3, pmc3);
- wrmsrl(MSR_K7_EVNTSEL3, evntsel3);
- } else {
- release_perfctr_nmi(MSR_K7_PERFCTR0 + i);
- release_evntsel_nmi(MSR_K7_EVNTSEL0 + i);
- }
-
- return pmc_now * tsc_khz / (tsc_now - tsc_start);
-}
-#else
-static inline unsigned long calibrate_cpu(void) { return cpu_khz; }
-#endif
-
void __init tsc_init(void)
{
u64 lpj;
@@ -926,19 +872,8 @@ void __init tsc_init(void)
return;
}

- if (cpu_has(&boot_cpu_data, X86_FEATURE_CONSTANT_TSC) &&
- (boot_cpu_data.x86_vendor == X86_VENDOR_AMD)) {
-
- if (boot_cpu_data.x86 > 0x10 ||
- (boot_cpu_data.x86 == 0x10 &&
- boot_cpu_data.x86_model >= 0x2)) {
- u64 val;
-
- rdmsrl(MSR_K7_HWCR, val);
- if (!(val & BIT(24)))
- cpu_khz = calibrate_cpu();
- }
- }
+ if (x86_platform.calibrate_cpu)
+ cpu_khz = x86_platform.calibrate_cpu();

printk("Detected %lu.%03lu MHz processor.\n",
(unsigned long)cpu_khz / 1000,


--
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632

2010-08-18 17:46:26

by H. Peter Anvin

Subject: Re: [PATCH] x86, tsc: Limit CPU frequency calibration on AMD

On 08/18/2010 10:34 AM, Borislav Petkov wrote:
>
> Right, do you have strong preferences between x86_init and x86_platform?
> The version below uses x86_platform because it has the calibrate_tsc()
> function in there too. Also, the version below nicely moves all that
> AMD-specific code to cpu/amd.c.
>

x86_init if it is expected to be __init code, otherwise x86_platform.

> I didn't opt for a calibrate_cpu_noop stub because I didn't want to
> pollute x86_init.c with yet another noop prototype. But I guess I should
> do that since the pointer testing is still executed while stubs are
> removed completely by smart compilers :).

Don't think it matters much, but tglx might have an opinion.

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

2010-08-18 17:51:38

by Alok Kataria

Subject: Re: [PATCH] x86, tsc: Limit CPU frequency calibration on AMD

Hi Boris,

On Wed, 2010-08-18 at 10:34 -0700, Borislav Petkov wrote:
> From: "H. Peter Anvin" <[email protected]>
> Date: Wed, Aug 18, 2010 at 12:23:08PM -0400
>
> > calibrate_cpu() is AMD-specific, despite the generic name. (It is also,
> > strangely enough, only compiled on 64 bits for some reason???)
>
> Yeah, this is ancient stuff, who knows. I will test the fix on 32-bit
> too after we agree on the design tho.
>
> > Either which way, it is definitely not okay for the test for when the
> > code applies to be this distant from the code itself.
>
> Makes sense.
>
> > The easy way to fix this is to rename it amd_calibrate_cpu() and move
> > the applicability test into the routine itself. That is probably okay
> > as long as there are no other users. However, if there are other users,
> > then this really should move into x86_init and have a function pointer
> > associated with it.
>
> Right, do you have strong preferences between x86_init and x86_platform?
> The version below uses x86_platform because it has the calibrate_tsc()
> function in there too. Also, the version below nicely moves all that
> AMD-specific code to cpu/amd.c.
>
> I didn't opt for a calibrate_cpu_noop stub because I didn't want to
> pollute x86_init.c with yet another noop prototype. But I guess I should
> do that since the pointer testing is still executed while stubs are
> removed completely by smart compilers :).
>
> Anyway, the version below builds, I'll test tomorrow and send an
> official version then:
>

Thanks for doing this. I already had a patch ready doing just this,
though this change looks much nicer since you have moved all the code to
the amd file. Guess I just have to wait for you to do the noop stuff if
you are doing that. Once the patch is completed I can then just
initialize this func ptr to the noop routine when on VMware's platform.
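
Roughly what I have in mind for the VMware side (hypothetical sketch until
the final form of your patch is in; the function name is made up):

	/* hypothetical: keep the cpu_khz derived from the hypervisor-provided
	 * TSC frequency instead of recalibrating against the perfctrs */
	static unsigned long vmware_calibrate_cpu(void)
	{
		return cpu_khz;
	}
	...
	x86_platform.calibrate_cpu = vmware_calibrate_cpu;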

Alok
> --
>
> diff --git a/arch/x86/include/asm/x86_init.h b/arch/x86/include/asm/x86_init.h
> index baa579c..f00ed28 100644
> --- a/arch/x86/include/asm/x86_init.h
> +++ b/arch/x86/include/asm/x86_init.h
> @@ -146,6 +146,7 @@ struct x86_cpuinit_ops {
> */
> struct x86_platform_ops {
> unsigned long (*calibrate_tsc)(void);
> + unsigned long (*calibrate_cpu)(void);
> unsigned long (*get_wallclock)(void);
> int (*set_wallclock)(unsigned long nowtime);
> void (*iommu_shutdown)(void);
> diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
> index 60a57b1..6478bd5 100644
> --- a/arch/x86/kernel/cpu/amd.c
> +++ b/arch/x86/kernel/cpu/amd.c
> @@ -6,6 +6,7 @@
> #include <asm/processor.h>
> #include <asm/apic.h>
> #include <asm/cpu.h>
> +#include <asm/nmi.h>
> #include <asm/pci-direct.h>
>
> #ifdef CONFIG_X86_64
> @@ -381,6 +382,56 @@ static void __cpuinit early_init_amd_mc(struct cpuinfo_x86 *c)
> #endif
> }
>
> +/*
> + * calibrate_cpu is used on systems with fixed rate TSCs to determine
> + * processor frequency
> + */
> +#define TICK_COUNT 100000000
> +unsigned long __init amd_calibrate_cpu(void)
> +{
> + int tsc_start, tsc_now;
> + int i, no_ctr_free;
> + unsigned long evntsel3 = 0, pmc3 = 0, pmc_now = 0;
> + unsigned long flags;
> +
> + for (i = 0; i < 4; i++)
> + if (avail_to_resrv_perfctr_nmi_bit(i))
> + break;
> + no_ctr_free = (i == 4);
> + if (no_ctr_free) {
> + WARN(1, KERN_WARNING "Warning: AMD perfctrs busy ... "
> + "cpu_khz value may be incorrect.\n");
> + i = 3;
> + rdmsrl(MSR_K7_EVNTSEL3, evntsel3);
> + wrmsrl(MSR_K7_EVNTSEL3, 0);
> + rdmsrl(MSR_K7_PERFCTR3, pmc3);
> + } else {
> + reserve_perfctr_nmi(MSR_K7_PERFCTR0 + i);
> + reserve_evntsel_nmi(MSR_K7_EVNTSEL0 + i);
> + }
> + local_irq_save(flags);
> + /* start measuring cycles, incrementing from 0 */
> + wrmsrl(MSR_K7_PERFCTR0 + i, 0);
> + wrmsrl(MSR_K7_EVNTSEL0 + i, 1 << 22 | 3 << 16 | 0x76);
> + rdtscl(tsc_start);
> + do {
> + rdmsrl(MSR_K7_PERFCTR0 + i, pmc_now);
> + tsc_now = get_cycles();
> + } while ((tsc_now - tsc_start) < TICK_COUNT);
> +
> + local_irq_restore(flags);
> + if (no_ctr_free) {
> + wrmsrl(MSR_K7_EVNTSEL3, 0);
> + wrmsrl(MSR_K7_PERFCTR3, pmc3);
> + wrmsrl(MSR_K7_EVNTSEL3, evntsel3);
> + } else {
> + release_perfctr_nmi(MSR_K7_PERFCTR0 + i);
> + release_evntsel_nmi(MSR_K7_EVNTSEL0 + i);
> + }
> +
> + return pmc_now * tsc_khz / (tsc_now - tsc_start);
> +}
> +
> static void __cpuinit early_init_amd(struct cpuinfo_x86 *c)
> {
> early_init_amd_mc(c);
> @@ -412,6 +463,23 @@ static void __cpuinit early_init_amd(struct cpuinfo_x86 *c)
> set_cpu_cap(c, X86_FEATURE_EXTD_APICID);
> }
> #endif
> +
> + /*
> + * We can check for constant TSC CPUID bit here since this bit is
> + * introduced with F10h.
> + */
> + if (cpu_has(c, X86_FEATURE_CONSTANT_TSC)) {
> +
> + if (c->x86 > 0x10 ||
> + (c->x86 == 0x10 && c->x86_model >= 0x2)) {
> + u64 val;
> +
> + rdmsrl(MSR_K7_HWCR, val);
> + if (!(val & BIT(24)))
> + x86_platform.calibrate_cpu = amd_calibrate_cpu;
> + }
> + } else
> + x86_platform.calibrate_cpu = amd_calibrate_cpu;
> }
>
> static void __cpuinit init_amd(struct cpuinfo_x86 *c)
> diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
> index 41b2b8b..1915706 100644
> --- a/arch/x86/kernel/tsc.c
> +++ b/arch/x86/kernel/tsc.c
> @@ -854,60 +854,6 @@ static void __init init_tsc_clocksource(void)
> clocksource_register_khz(&clocksource_tsc, tsc_khz);
> }
>
> -#ifdef CONFIG_X86_64
> -/*
> - * calibrate_cpu is used on systems with fixed rate TSCs to determine
> - * processor frequency
> - */
> -#define TICK_COUNT 100000000
> -static unsigned long __init calibrate_cpu(void)
> -{
> - int tsc_start, tsc_now;
> - int i, no_ctr_free;
> - unsigned long evntsel3 = 0, pmc3 = 0, pmc_now = 0;
> - unsigned long flags;
> -
> - for (i = 0; i < 4; i++)
> - if (avail_to_resrv_perfctr_nmi_bit(i))
> - break;
> - no_ctr_free = (i == 4);
> - if (no_ctr_free) {
> - WARN(1, KERN_WARNING "Warning: AMD perfctrs busy ... "
> - "cpu_khz value may be incorrect.\n");
> - i = 3;
> - rdmsrl(MSR_K7_EVNTSEL3, evntsel3);
> - wrmsrl(MSR_K7_EVNTSEL3, 0);
> - rdmsrl(MSR_K7_PERFCTR3, pmc3);
> - } else {
> - reserve_perfctr_nmi(MSR_K7_PERFCTR0 + i);
> - reserve_evntsel_nmi(MSR_K7_EVNTSEL0 + i);
> - }
> - local_irq_save(flags);
> - /* start measuring cycles, incrementing from 0 */
> - wrmsrl(MSR_K7_PERFCTR0 + i, 0);
> - wrmsrl(MSR_K7_EVNTSEL0 + i, 1 << 22 | 3 << 16 | 0x76);
> - rdtscl(tsc_start);
> - do {
> - rdmsrl(MSR_K7_PERFCTR0 + i, pmc_now);
> - tsc_now = get_cycles();
> - } while ((tsc_now - tsc_start) < TICK_COUNT);
> -
> - local_irq_restore(flags);
> - if (no_ctr_free) {
> - wrmsrl(MSR_K7_EVNTSEL3, 0);
> - wrmsrl(MSR_K7_PERFCTR3, pmc3);
> - wrmsrl(MSR_K7_EVNTSEL3, evntsel3);
> - } else {
> - release_perfctr_nmi(MSR_K7_PERFCTR0 + i);
> - release_evntsel_nmi(MSR_K7_EVNTSEL0 + i);
> - }
> -
> - return pmc_now * tsc_khz / (tsc_now - tsc_start);
> -}
> -#else
> -static inline unsigned long calibrate_cpu(void) { return cpu_khz; }
> -#endif
> -
> void __init tsc_init(void)
> {
> u64 lpj;
> @@ -926,19 +872,8 @@ void __init tsc_init(void)
> return;
> }
>
> - if (cpu_has(&boot_cpu_data, X86_FEATURE_CONSTANT_TSC) &&
> - (boot_cpu_data.x86_vendor == X86_VENDOR_AMD)) {
> -
> - if (boot_cpu_data.x86 > 0x10 ||
> - (boot_cpu_data.x86 == 0x10 &&
> - boot_cpu_data.x86_model >= 0x2)) {
> - u64 val;
> -
> - rdmsrl(MSR_K7_HWCR, val);
> - if (!(val & BIT(24)))
> - cpu_khz = calibrate_cpu();
> - }
> - }
> + if (x86_platform.calibrate_cpu)
> + cpu_khz = x86_platform.calibrate_cpu();
>
> printk("Detected %lu.%03lu MHz processor.\n",
> (unsigned long)cpu_khz / 1000,
>
>

2010-08-18 18:46:01

by Borislav Petkov

Subject: Re: [PATCH] x86, tsc: Limit CPU frequency calibration on AMD

From: Alok Kataria <[email protected]>
Date: Wed, Aug 18, 2010 at 01:51:35PM -0400

Hi,

> Thanks for doing this. I already had a patch ready doing just this,

sorry if I have caused any conflicts with your patch.

> though this change looks much nicer since you have moved all the code
> to the amd file. Guess I just have to wait for you to do the noop
> stuff if you are doing that. Once the patch is completed I can then
> just initialize this func ptr to the noop routine when on VMware's
> platform.

yeah, actually I was on the verge of completely removing that
calibrate_cpu code, since maybe 99% of the systems out there should have
a TSC counting at the P0 rate, so the issue becomes moot and there's no
need for a fix. But according to Murphy, there'll be that one system
which needs it, and thus we have to keep it, unfortunately.

Anyway, I'll keep you posted.

Thanks.

--
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632

2010-08-19 18:47:38

by john stultz

Subject: Re: [PATCH] x86, tsc: Limit CPU frequency calibration on AMD

On Wed, Aug 18, 2010 at 9:16 AM, Borislav Petkov <[email protected]> wrote:
> 6b37f5a20c0e5c334c010a587058354215433e92 introduced the CPU frequency
> calibration code for AMD CPUs whose TSCs didn't increment with the
> core's P0 frequency. From F10h, revB onward, the TSC increment rate is
> denoted by MSRC001_0015[24] and when this bit is set (which is normally
> done by the BIOS,) the TSC increments with the P0 frequency so the
> calibration is not needed and booting can be a couple of mcecs faster on
> those machines.

Very cool. This was brought up on an earlier thread and is really nice
because it also avoids the 50+ ppm variability easily seen in the
calibration results from boot to boot. That variance makes it difficult
to keep close NTP sync across reboots, as the persistent drift value
is invalid after a reboot.

I recently proposed a timer-based solution that doesn't block bootup
and allows for very accurate calibration. This might help fill the
gaps on legacy systems that do not provide TSC frequency information. I'd
be interested if you have any comments.

https://kerneltrap.org/mailarchive/linux-kernel/2010/7/28/4598868

Notes on the code below.

> Signed-off-by: Borislav Petkov <[email protected]>
> ---
>  arch/x86/kernel/tsc.c |   14 ++++++++++++--
>  1 files changed, 12 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
> index ce8e502..41b2b8b 100644
> --- a/arch/x86/kernel/tsc.c
> +++ b/arch/x86/kernel/tsc.c
> @@ -927,8 +927,18 @@ void __init tsc_init(void)
>        }
>
>        if (cpu_has(&boot_cpu_data, X86_FEATURE_CONSTANT_TSC) &&
> -                       (boot_cpu_data.x86_vendor == X86_VENDOR_AMD))
> -               cpu_khz = calibrate_cpu();
> +                       (boot_cpu_data.x86_vendor == X86_VENDOR_AMD)) {
> +
> +               if (boot_cpu_data.x86 > 0x10 ||
> +                   (boot_cpu_data.x86 == 0x10 &&
> +                    boot_cpu_data.x86_model >= 0x2)) {
> +                       u64 val;
> +
> +                       rdmsrl(MSR_K7_HWCR, val);
> +                       if (!(val & BIT(24)))
> +                               cpu_khz = calibrate_cpu();
> +               }
> +       }

Am I just missing the point in the code where you actually use the MSR
value to assign cpu_khz? Or is the idea that the default TSC calibration
is already correct, and we don't need to refine it further?

And yea, the calibrate_cpu function needs to be renamed.

thanks
-john

2010-08-19 20:29:34

by Borislav Petkov

Subject: Re: [PATCH] x86, tsc: Limit CPU frequency calibration on AMD

From: john stultz <[email protected]>
Date: Thu, Aug 19, 2010 at 02:47:35PM -0400

Hi John,

> On Wed, Aug 18, 2010 at 9:16 AM, Borislav Petkov <[email protected]> wrote:
> > 6b37f5a20c0e5c334c010a587058354215433e92 introduced the CPU frequency
> > calibration code for AMD CPUs whose TSCs didn't increment with the
> > core's P0 frequency. From F10h, revB onward, the TSC increment rate is
> > denoted by MSRC001_0015[24] and when this bit is set (which is normally
> > done by the BIOS,) the TSC increments with the P0 frequency so the
> > calibration is not needed and booting can be a couple of mcecs faster on
> > those machines.
>
> Very cool. This was brought up on an earlier thread and is really nice
> because it also avoids the 50+ppm variability easily seen in the
> calibrate results boot to boot. That variance causes difficulty
> keeping close NTP sync across reboots, as the persistent drift value
> is invalid after a reboot.
>
> I recently proposed a timer based solution that doesn't block bootup,
> and allows for very accurate calibration. This might help fill the
> gaps on legacy systems that do not provide TSC freq information. I'd
> be interested if you had any comments.
>
> https://kerneltrap.org/mailarchive/linux-kernel/2010/7/28/4598868
>
> Notes on the code below.
>
> > Signed-off-by: Borislav Petkov <[email protected]>
> > ---
> >  arch/x86/kernel/tsc.c |   14 ++++++++++++--
> >  1 files changed, 12 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
> > index ce8e502..41b2b8b 100644
> > --- a/arch/x86/kernel/tsc.c
> > +++ b/arch/x86/kernel/tsc.c
> > @@ -927,8 +927,18 @@ void __init tsc_init(void)
> >        }
> >
> >        if (cpu_has(&boot_cpu_data, X86_FEATURE_CONSTANT_TSC) &&
> > -                       (boot_cpu_data.x86_vendor == X86_VENDOR_AMD))
> > -               cpu_khz = calibrate_cpu();
> > +                       (boot_cpu_data.x86_vendor == X86_VENDOR_AMD)) {
> > +
> > +               if (boot_cpu_data.x86 > 0x10 ||
> > +                   (boot_cpu_data.x86 == 0x10 &&
> > +                    boot_cpu_data.x86_model >= 0x2)) {
> > +                       u64 val;
> > +
> > +                       rdmsrl(MSR_K7_HWCR, val);
> > +                       if (!(val & BIT(24)))
> > +                               cpu_khz = calibrate_cpu();
> > +               }
> > +       }
>
> Am I just missing the point in the code where you actually use the msr
> value to assign cpu_khz? Or is the idea that the default tsc
> calibration already is correct, and we don't need to further refine
> it?

Yes.

Actually Alok brought the code to my attention originally, and what
puzzled me was the fact that we do calibrate_cpu() needlessly on all F10h
and later AMD machines (well, almost all, maybe 99%). This is because,
on F10h revB machines and later, we support "TSC increments with P0
frequency", so what native_calibrate_tsc() comes up with in terms of
tsc_khz should be the same as cpu_khz.

So the MSR bit check above is there to see whether the TSC increments
with the P0 frequency, and calibrate_cpu is called only if it does _not_.

As a result, this patch basically drops the additional cpu_khz
calibration when it isn't needed. As such it doesn't have a whole
lot to do with the actual TSC calibration - that part is up to you guys :).

The original reason for calibrate_cpu() is that there's the possibility
of the TSC counting at the northbridge clock rate, and in that case we
obviously need to recalibrate.

Hope that makes it more clear.

> And yea, the calibrate_cpu function needs to be renamed.

Done, the whole code is moved to cpu/amd.c anyway. I'm currently testing
the new version and will be sending out maybe tomorrow or so.

Thanks.

--
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632

2010-08-19 20:53:09

by john stultz

Subject: Re: [PATCH] x86, tsc: Limit CPU frequency calibration on AMD

On Thu, 2010-08-19 at 22:29 +0200, Borislav Petkov wrote:
> From: john stultz <[email protected]>
> Date: Thu, Aug 19, 2010 at 02:47:35PM -0400
>
> Hi John,
>
> > On Wed, Aug 18, 2010 at 9:16 AM, Borislav Petkov <[email protected]> wrote:
> > > 6b37f5a20c0e5c334c010a587058354215433e92 introduced the CPU frequency
> > > calibration code for AMD CPUs whose TSCs didn't increment with the
> > > core's P0 frequency. From F10h, revB onward, the TSC increment rate is
> > > denoted by MSRC001_0015[24] and when this bit is set (which is normally
> > > done by the BIOS,) the TSC increments with the P0 frequency so the
> > > calibration is not needed and booting can be a couple of mcecs faster on
> > > those machines.
> >
> > Very cool. This was brought up on an earlier thread and is really nice
> > because it also avoids the 50+ppm variability easily seen in the
> > calibrate results boot to boot. That variance causes difficulty
> > keeping close NTP sync across reboots, as the persistent drift value
> > is invalid after a reboot.
> >
> > I recently proposed a timer based solution that doesn't block bootup,
> > and allows for very accurate calibration. This might help fill the
> > gaps on legacy systems that do not provide TSC freq information. I'd
> > be interested if you had any comments.
> >
> > https://kerneltrap.org/mailarchive/linux-kernel/2010/7/28/4598868
> >
> > Notes on the code below.
> >
> > > Signed-off-by: Borislav Petkov <[email protected]>
> > > ---
> > > arch/x86/kernel/tsc.c | 14 ++++++++++++--
> > > 1 files changed, 12 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
> > > index ce8e502..41b2b8b 100644
> > > --- a/arch/x86/kernel/tsc.c
> > > +++ b/arch/x86/kernel/tsc.c
> > > @@ -927,8 +927,18 @@ void __init tsc_init(void)
> > > }
> > >
> > > if (cpu_has(&boot_cpu_data, X86_FEATURE_CONSTANT_TSC) &&
> > > - (boot_cpu_data.x86_vendor == X86_VENDOR_AMD))
> > > - cpu_khz = calibrate_cpu();
> > > + (boot_cpu_data.x86_vendor == X86_VENDOR_AMD)) {
> > > +
> > > + if (boot_cpu_data.x86 > 0x10 ||
> > > + (boot_cpu_data.x86 == 0x10 &&
> > > + boot_cpu_data.x86_model >= 0x2)) {
> > > + u64 val;
> > > +
> > > + rdmsrl(MSR_K7_HWCR, val);
> > > + if (!(val & BIT(24)))
> > > + cpu_khz = calibrate_cpu();
> > > + }
> > > + }
> >
> > Am I just missing the point in the code where you actually use the msr
> > value to assign cpu_khz? Or is the idea that the default tsc
> > calibration already is correct, and we don't need to further refine
> > it?
>
> Yes.
>
> Actually Alok brought the code to my attention originally, and what
> puzzled me was the fact that we do calibrate_cpu() on all F10h and later
> AMD machines needlessly (well, almost all, maybe 99%). This is because,
> on F10h, revB machines and later we support "TSC increments with P0
> frequency" so what native_calibrate_tsc() comes up with in terms of
> tsc_khz should be the same as cpu_khz.
>
> So the MSR bit check above is to see whether the TSC increments with P0
> frequency and call calibrate_cpu only if _not_.

Ah. I see, sorry for misreading it.


> As a result, this patch basically drops the additional cpu_khz
> calibration when it isn't needed. And as such it doesn't have a whole
> lot to do with the actual TSC calibration - this is up to you guys :).
>
> The original reason for the calibrate_cpu() is that there's the
> possibility for the TSC to count with the northbridge clockrate and
> there we need to recalibrate obviously.
>
> Hope that makes it more clear.
>
> > And yea, the calibrate_cpu function needs to be renamed.
>
> Done, the whole code is moved to cpu/amd.c anyway. I'm currently testing
> the new version and will be sending out maybe tomorrow or so.

Great!

thanks
-john

2010-08-24 15:52:56

by Borislav Petkov

Subject: [PATCH -v2] x86, tsc: Limit CPU frequency calibration on AMD

Commit 6b37f5a20c0e5c334c010a587058354215433e92 introduced the CPU frequency
calibration code for AMD CPUs whose TSCs didn't increment with the
core's P0 frequency. From F10h revB onward, the TSC increment rate is
denoted by MSRC001_0015[24], and when this bit is set (which is normally
done by the BIOS) the TSC increments with the P0 frequency, so the
calibration is not needed and booting can be a couple of msecs faster on
those machines.

While at it, make the code work on 32-bit. In addition, use the 4th
perfctr since using perfctr 0 might clash with perfctr-watchdog.c during
LAPIC init. Finally, warn about a wrongly calibrated value in the rare
cases when the core TSC is not incrementing with the P0 frequency.

Signed-off-by: Borislav Petkov <[email protected]>
---

Here's the new version; I had to change quite a lot and check all families
first.

@Alok, I think in your case you will want to do

x86_cpuinit.calibrate_cpu = NULL;

since this means no need to recalibrate.

Sorry for the delay.


arch/x86/include/asm/x86_init.h | 1 +
arch/x86/kernel/cpu/amd.c | 74 +++++++++++++++++++++++++++++++++++++++
arch/x86/kernel/tsc.c | 65 ++++------------------------------
3 files changed, 83 insertions(+), 57 deletions(-)

diff --git a/arch/x86/include/asm/x86_init.h b/arch/x86/include/asm/x86_init.h
index baa579c..c63ab76 100644
--- a/arch/x86/include/asm/x86_init.h
+++ b/arch/x86/include/asm/x86_init.h
@@ -133,6 +133,7 @@ struct x86_init_ops {
*/
struct x86_cpuinit_ops {
void (*setup_percpu_clockev)(void);
+ unsigned long (*calibrate_cpu)(void);
};

/**
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index ba5f62f..236bcff 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -6,6 +6,7 @@
#include <asm/processor.h>
#include <asm/apic.h>
#include <asm/cpu.h>
+#include <asm/nmi.h>
#include <asm/pci-direct.h>

#ifdef CONFIG_X86_64
@@ -381,6 +382,62 @@ static void __cpuinit early_init_amd_mc(struct cpuinfo_x86 *c)
#endif
}

+/*
+ * This is used on systems with fixed rate TSCs to determine processor frequency
+ * when the TSC on those systems is not incrementing with the P0 frequency.
+ */
+#define TICK_COUNT 100000000
+unsigned long __cpuinit amd_calibrate_cpu(void)
+{
+ u64 evntsel3 = 0, pmc3 = 0, pmcs = 0;
+ int tsc_start, tscs, i, no_ctr_free;
+ unsigned long flags;
+
+ for (i = 3; i >= -1; i--)
+ if (avail_to_resrv_perfctr_nmi_bit(i))
+ break;
+
+ no_ctr_free = (i == -1);
+ if (no_ctr_free) {
+ WARN(1, KERN_WARNING "Warning: AMD perfctrs busy ... "
+ "cpu_khz value may be incorrect.\n");
+ i = 3;
+ rdmsrl(MSR_K7_EVNTSEL3, evntsel3);
+ wrmsrl(MSR_K7_EVNTSEL3, 0);
+ rdmsrl(MSR_K7_PERFCTR3, pmc3);
+ } else {
+ reserve_perfctr_nmi(MSR_K7_PERFCTR0 + i);
+ reserve_evntsel_nmi(MSR_K7_EVNTSEL0 + i);
+ }
+
+ /* start measuring cycles, incrementing from 0 */
+ local_irq_save(flags);
+ wrmsrl(MSR_K7_PERFCTR0 + i, 0);
+ wrmsrl(MSR_K7_EVNTSEL0 + i, 1 << 22 | 3 << 16 | 0x76);
+ rdtscl(tsc_start);
+ do {
+ rdmsrl(MSR_K7_PERFCTR0 + i, pmcs);
+ tscs = get_cycles();
+ } while ((tscs - tsc_start) < TICK_COUNT);
+ local_irq_restore(flags);
+
+ if (no_ctr_free) {
+ wrmsrl(MSR_K7_EVNTSEL3, 0);
+ wrmsrl(MSR_K7_PERFCTR3, pmc3);
+ wrmsrl(MSR_K7_EVNTSEL3, evntsel3);
+ } else {
+ release_perfctr_nmi(MSR_K7_PERFCTR0 + i);
+ release_evntsel_nmi(MSR_K7_EVNTSEL0 + i);
+ }
+
+ pmcs *= tsc_khz;
+ tscs -= tsc_start;
+
+ (void)do_div(pmcs, tscs);
+
+ return pmcs;
+}
+
static void __cpuinit early_init_amd(struct cpuinfo_x86 *c)
{
early_init_amd_mc(c);
@@ -412,6 +469,23 @@ static void __cpuinit early_init_amd(struct cpuinfo_x86 *c)
set_cpu_cap(c, X86_FEATURE_EXTD_APICID);
}
#endif
+
+ /* We need to do the following only once */
+ if (c != &boot_cpu_data)
+ return;
+
+ if (cpu_has(c, X86_FEATURE_CONSTANT_TSC)) {
+
+ if (c->x86 > 0x10 ||
+ (c->x86 == 0x10 && c->x86_model >= 0x2)) {
+ u64 val;
+
+ rdmsrl(MSR_K7_HWCR, val);
+ if (!(val & BIT(24)))
+ x86_cpuinit.calibrate_cpu = amd_calibrate_cpu;
+ } else
+ x86_cpuinit.calibrate_cpu = amd_calibrate_cpu;
+ }
}

static void __cpuinit init_amd(struct cpuinfo_x86 *c)
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index ce8e502..6b4f22f 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -854,60 +854,6 @@ static void __init init_tsc_clocksource(void)
clocksource_register_khz(&clocksource_tsc, tsc_khz);
}

-#ifdef CONFIG_X86_64
-/*
- * calibrate_cpu is used on systems with fixed rate TSCs to determine
- * processor frequency
- */
-#define TICK_COUNT 100000000
-static unsigned long __init calibrate_cpu(void)
-{
- int tsc_start, tsc_now;
- int i, no_ctr_free;
- unsigned long evntsel3 = 0, pmc3 = 0, pmc_now = 0;
- unsigned long flags;
-
- for (i = 0; i < 4; i++)
- if (avail_to_resrv_perfctr_nmi_bit(i))
- break;
- no_ctr_free = (i == 4);
- if (no_ctr_free) {
- WARN(1, KERN_WARNING "Warning: AMD perfctrs busy ... "
- "cpu_khz value may be incorrect.\n");
- i = 3;
- rdmsrl(MSR_K7_EVNTSEL3, evntsel3);
- wrmsrl(MSR_K7_EVNTSEL3, 0);
- rdmsrl(MSR_K7_PERFCTR3, pmc3);
- } else {
- reserve_perfctr_nmi(MSR_K7_PERFCTR0 + i);
- reserve_evntsel_nmi(MSR_K7_EVNTSEL0 + i);
- }
- local_irq_save(flags);
- /* start measuring cycles, incrementing from 0 */
- wrmsrl(MSR_K7_PERFCTR0 + i, 0);
- wrmsrl(MSR_K7_EVNTSEL0 + i, 1 << 22 | 3 << 16 | 0x76);
- rdtscl(tsc_start);
- do {
- rdmsrl(MSR_K7_PERFCTR0 + i, pmc_now);
- tsc_now = get_cycles();
- } while ((tsc_now - tsc_start) < TICK_COUNT);
-
- local_irq_restore(flags);
- if (no_ctr_free) {
- wrmsrl(MSR_K7_EVNTSEL3, 0);
- wrmsrl(MSR_K7_PERFCTR3, pmc3);
- wrmsrl(MSR_K7_EVNTSEL3, evntsel3);
- } else {
- release_perfctr_nmi(MSR_K7_PERFCTR0 + i);
- release_evntsel_nmi(MSR_K7_EVNTSEL0 + i);
- }
-
- return pmc_now * tsc_khz / (tsc_now - tsc_start);
-}
-#else
-static inline unsigned long calibrate_cpu(void) { return cpu_khz; }
-#endif
-
void __init tsc_init(void)
{
u64 lpj;
@@ -926,9 +872,14 @@ void __init tsc_init(void)
return;
}

- if (cpu_has(&boot_cpu_data, X86_FEATURE_CONSTANT_TSC) &&
- (boot_cpu_data.x86_vendor == X86_VENDOR_AMD))
- cpu_khz = calibrate_cpu();
+ if (x86_cpuinit.calibrate_cpu) {
+ cpu_khz = x86_cpuinit.calibrate_cpu();
+ if (cpu_khz < tsc_khz) {
+ printk(KERN_WARNING "TSC possibly calibrated at non-P0 "
+ "core frequency, fall back to previous value.\n");
+ cpu_khz = tsc_khz;
+ }
+ }

printk("Detected %lu.%03lu MHz processor.\n",
(unsigned long)cpu_khz / 1000,
--
1.7.1

--
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632

2010-08-24 17:51:34

by Alok Kataria

Subject: Re: [PATCH -v2] x86, tsc: Limit CPU frequency calibration on AMD

Hi Borislav,

Thanks for working on this change.

On Tue, 2010-08-24 at 08:53 -0700, Borislav Petkov wrote:
> 6b37f5a20c0e5c334c010a587058354215433e92 introduced the CPU frequency
> calibration code for AMD CPUs whose TSCs didn't increment with the
> core's P0 frequency. From F10h, revB onward, the TSC increment rate is
> denoted by MSRC001_0015[24] and when this bit is set (which is normally
> done by the BIOS,) the TSC increments with the P0 frequency so the
> calibration is not needed and booting can be a couple of mcecs faster on
> those machines.
>
> While at it, make the code work on 32-bit. In addition, use the 4th
> perfctr since using perfctr 0 might clash with perfctr-watchdog.c during
> LAPIC init. Finally, warn about wrongly calibrated value in the most
> seldom cases when the core TSC is not incrementing with P0 frequency.
>
> Signed-off-by: Borislav Petkov <[email protected]>
> ---
>
> Here's the new version, had to change quite a lot and check all families
> first.
>
> @Alok, I think in your case you will want to do
>
> x86_cpuinit.calibrate_cpu = NULL;

> since this means no need to recalibrate.

Right, I will send that patch once this is committed.

Alok
>
> Sorry for the delay.
>
>
> arch/x86/include/asm/x86_init.h | 1 +
> arch/x86/kernel/cpu/amd.c | 74 +++++++++++++++++++++++++++++++++++++++
> arch/x86/kernel/tsc.c | 65 ++++------------------------------
> 3 files changed, 83 insertions(+), 57 deletions(-)
>
> diff --git a/arch/x86/include/asm/x86_init.h b/arch/x86/include/asm/x86_init.h
> index baa579c..c63ab76 100644
> --- a/arch/x86/include/asm/x86_init.h
> +++ b/arch/x86/include/asm/x86_init.h
> @@ -133,6 +133,7 @@ struct x86_init_ops {
> */
> struct x86_cpuinit_ops {
> void (*setup_percpu_clockev)(void);
> + unsigned long (*calibrate_cpu)(void);
> };
>
> /**
> diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
> index ba5f62f..236bcff 100644
> --- a/arch/x86/kernel/cpu/amd.c
> +++ b/arch/x86/kernel/cpu/amd.c
> @@ -6,6 +6,7 @@
> #include <asm/processor.h>
> #include <asm/apic.h>
> #include <asm/cpu.h>
> +#include <asm/nmi.h>
> #include <asm/pci-direct.h>
>
> #ifdef CONFIG_X86_64
> @@ -381,6 +382,62 @@ static void __cpuinit early_init_amd_mc(struct cpuinfo_x86 *c)
> #endif
> }
>
> +/*
> + * This is used on systems with fixed rate TSCs to determine processor frequency
> + * when the TSC on those systems is not incrementing with the P0 frequency.
> + */
> +#define TICK_COUNT 100000000
> +unsigned long __cpuinit amd_calibrate_cpu(void)
> +{
> + u64 evntsel3 = 0, pmc3 = 0, pmcs = 0;
> + int tsc_start, tscs, i, no_ctr_free;
> + unsigned long flags;
> +
> + for (i = 3; i >= -1; i--)
> + if (avail_to_resrv_perfctr_nmi_bit(i))
> + break;
> +
> + no_ctr_free = (i == -1);
> + if (no_ctr_free) {
> + WARN(1, KERN_WARNING "Warning: AMD perfctrs busy ... "
> + "cpu_khz value may be incorrect.\n");
> + i = 3;
> + rdmsrl(MSR_K7_EVNTSEL3, evntsel3);
> + wrmsrl(MSR_K7_EVNTSEL3, 0);
> + rdmsrl(MSR_K7_PERFCTR3, pmc3);
> + } else {
> + reserve_perfctr_nmi(MSR_K7_PERFCTR0 + i);
> + reserve_evntsel_nmi(MSR_K7_EVNTSEL0 + i);
> + }
> +
> + /* start measuring cycles, incrementing from 0 */
> + local_irq_save(flags);
> + wrmsrl(MSR_K7_PERFCTR0 + i, 0);
> + wrmsrl(MSR_K7_EVNTSEL0 + i, 1 << 22 | 3 << 16 | 0x76);
> + rdtscl(tsc_start);
> + do {
> + rdmsrl(MSR_K7_PERFCTR0 + i, pmcs);
> + tscs = get_cycles();
> + } while ((tscs - tsc_start) < TICK_COUNT);
> + local_irq_restore(flags);
> +
> + if (no_ctr_free) {
> + wrmsrl(MSR_K7_EVNTSEL3, 0);
> + wrmsrl(MSR_K7_PERFCTR3, pmc3);
> + wrmsrl(MSR_K7_EVNTSEL3, evntsel3);
> + } else {
> + release_perfctr_nmi(MSR_K7_PERFCTR0 + i);
> + release_evntsel_nmi(MSR_K7_EVNTSEL0 + i);
> + }
> +
> + pmcs *= tsc_khz;
> + tscs -= tsc_start;
> +
> + (void)do_div(pmcs, tscs);
> +
> + return pmcs;
> +}
> +
> static void __cpuinit early_init_amd(struct cpuinfo_x86 *c)
> {
> early_init_amd_mc(c);
> @@ -412,6 +469,23 @@ static void __cpuinit early_init_amd(struct cpuinfo_x86 *c)
> set_cpu_cap(c, X86_FEATURE_EXTD_APICID);
> }
> #endif
> +
> + /* We need to do the following only once */
> + if (c != &boot_cpu_data)
> + return;
> +
> + if (cpu_has(c, X86_FEATURE_CONSTANT_TSC)) {
> +
> + if (c->x86 > 0x10 ||
> + (c->x86 == 0x10 && c->x86_model >= 0x2)) {
> + u64 val;
> +
> + rdmsrl(MSR_K7_HWCR, val);
> + if (!(val & BIT(24)))
> + x86_cpuinit.calibrate_cpu = amd_calibrate_cpu;
> + } else
> + x86_cpuinit.calibrate_cpu = amd_calibrate_cpu;
> + }
> }
>
> static void __cpuinit init_amd(struct cpuinfo_x86 *c)
> diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
> index ce8e502..6b4f22f 100644
> --- a/arch/x86/kernel/tsc.c
> +++ b/arch/x86/kernel/tsc.c
> @@ -854,60 +854,6 @@ static void __init init_tsc_clocksource(void)
> clocksource_register_khz(&clocksource_tsc, tsc_khz);
> }
>
> -#ifdef CONFIG_X86_64
> -/*
> - * calibrate_cpu is used on systems with fixed rate TSCs to determine
> - * processor frequency
> - */
> -#define TICK_COUNT 100000000
> -static unsigned long __init calibrate_cpu(void)
> -{
> - int tsc_start, tsc_now;
> - int i, no_ctr_free;
> - unsigned long evntsel3 = 0, pmc3 = 0, pmc_now = 0;
> - unsigned long flags;
> -
> - for (i = 0; i < 4; i++)
> - if (avail_to_resrv_perfctr_nmi_bit(i))
> - break;
> - no_ctr_free = (i == 4);
> - if (no_ctr_free) {
> - WARN(1, KERN_WARNING "Warning: AMD perfctrs busy ... "
> - "cpu_khz value may be incorrect.\n");
> - i = 3;
> - rdmsrl(MSR_K7_EVNTSEL3, evntsel3);
> - wrmsrl(MSR_K7_EVNTSEL3, 0);
> - rdmsrl(MSR_K7_PERFCTR3, pmc3);
> - } else {
> - reserve_perfctr_nmi(MSR_K7_PERFCTR0 + i);
> - reserve_evntsel_nmi(MSR_K7_EVNTSEL0 + i);
> - }
> - local_irq_save(flags);
> - /* start measuring cycles, incrementing from 0 */
> - wrmsrl(MSR_K7_PERFCTR0 + i, 0);
> - wrmsrl(MSR_K7_EVNTSEL0 + i, 1 << 22 | 3 << 16 | 0x76);
> - rdtscl(tsc_start);
> - do {
> - rdmsrl(MSR_K7_PERFCTR0 + i, pmc_now);
> - tsc_now = get_cycles();
> - } while ((tsc_now - tsc_start) < TICK_COUNT);
> -
> - local_irq_restore(flags);
> - if (no_ctr_free) {
> - wrmsrl(MSR_K7_EVNTSEL3, 0);
> - wrmsrl(MSR_K7_PERFCTR3, pmc3);
> - wrmsrl(MSR_K7_EVNTSEL3, evntsel3);
> - } else {
> - release_perfctr_nmi(MSR_K7_PERFCTR0 + i);
> - release_evntsel_nmi(MSR_K7_EVNTSEL0 + i);
> - }
> -
> - return pmc_now * tsc_khz / (tsc_now - tsc_start);
> -}
> -#else
> -static inline unsigned long calibrate_cpu(void) { return cpu_khz; }
> -#endif
> -
> void __init tsc_init(void)
> {
> u64 lpj;
> @@ -926,9 +872,14 @@ void __init tsc_init(void)
> return;
> }
>
> - if (cpu_has(&boot_cpu_data, X86_FEATURE_CONSTANT_TSC) &&
> - (boot_cpu_data.x86_vendor == X86_VENDOR_AMD))
> - cpu_khz = calibrate_cpu();
> + if (x86_cpuinit.calibrate_cpu) {
> + cpu_khz = x86_cpuinit.calibrate_cpu();
> + if (cpu_khz < tsc_khz) {
> + printk(KERN_WARNING "TSC possibly calibrated at non-P0 "
> + "core frequency, fall back to previous value.\n");
> + cpu_khz = tsc_khz;
> + }
> + }
>
> printk("Detected %lu.%03lu MHz processor.\n",
> (unsigned long)cpu_khz / 1000,
> --
> 1.7.1
>

2010-08-24 22:34:31

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH -v2] x86, tsc: Limit CPU frequency calibration on AMD

On 08/24/2010 08:53 AM, Borislav Petkov wrote:
> 6b37f5a20c0e5c334c010a587058354215433e92 introduced the CPU frequency
> calibration code for AMD CPUs whose TSCs didn't increment with the
> core's P0 frequency. From F10h, revB onward, the TSC increment rate is
> denoted by MSRC001_0015[24], and when this bit is set (which is normally
> done by the BIOS) the TSC increments with the P0 frequency, so the
> calibration is not needed and booting can be a couple of msecs faster on
> those machines.
>
> While at it, make the code work on 32-bit. In addition, use the 4th
> perfctr, since using perfctr 0 might clash with perfctr-watchdog.c during
> LAPIC init. Finally, warn about a wrongly calibrated value in the rare
> cases when the core TSC is not incrementing with the P0 frequency.
>
> Signed-off-by: Borislav Petkov <[email protected]>
> ---
>
> Here's the new version, had to change quite a lot and check all families
> first.
>

Build failure:

/home/hpa/kernel/linux-2.6-tip.cpu/arch/x86/kernel/cpu/amd.c: In
function ‘amd_calibrate_cpu’:
/home/hpa/kernel/linux-2.6-tip.cpu/arch/x86/kernel/cpu/amd.c:397: error:
implicit declaration of function ‘avail_to_resrv_perfctr_nmi_bit’
/home/hpa/kernel/linux-2.6-tip.cpu/arch/x86/kernel/cpu/amd.c:409: error:
implicit declaration of function ‘reserve_perfctr_nmi’
/home/hpa/kernel/linux-2.6-tip.cpu/arch/x86/kernel/cpu/amd.c:410: error:
implicit declaration of function ‘reserve_evntsel_nmi’
/home/hpa/kernel/linux-2.6-tip.cpu/arch/x86/kernel/cpu/amd.c:429: error:
implicit declaration of function ‘release_perfctr_nmi’
/home/hpa/kernel/linux-2.6-tip.cpu/arch/x86/kernel/cpu/amd.c:430: error:
implicit declaration of function ‘release_evntsel_nmi’

Reproducible by doing "make ARCH=i386 allnoconfig".

-hpa

2010-08-25 07:06:31

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH -v2] x86, tsc: Limit CPU frequency calibration on AMD

From: "H. Peter Anvin" <[email protected]>
Date: Tue, Aug 24, 2010 at 06:33:07PM -0400

> Build failure:
>
> /home/hpa/kernel/linux-2.6-tip.cpu/arch/x86/kernel/cpu/amd.c: In
> function ‘amd_calibrate_cpu’:
> /home/hpa/kernel/linux-2.6-tip.cpu/arch/x86/kernel/cpu/amd.c:397: error:
> implicit declaration of function ‘avail_to_resrv_perfctr_nmi_bit’
> /home/hpa/kernel/linux-2.6-tip.cpu/arch/x86/kernel/cpu/amd.c:409: error:
> implicit declaration of function ‘reserve_perfctr_nmi’
> /home/hpa/kernel/linux-2.6-tip.cpu/arch/x86/kernel/cpu/amd.c:410: error:
> implicit declaration of function ‘reserve_evntsel_nmi’
> /home/hpa/kernel/linux-2.6-tip.cpu/arch/x86/kernel/cpu/amd.c:429: error:
> implicit declaration of function ‘release_perfctr_nmi’
> /home/hpa/kernel/linux-2.6-tip.cpu/arch/x86/kernel/cpu/amd.c:430: error:
> implicit declaration of function ‘release_evntsel_nmi’
>
> Reproducible by doing "make ARCH=i386 allnoconfig".

Sh*t, I can't catch a break with that Kconfig dependency stuff, can I?

This happens because perfctr-watchdog.c gets pulled in by
CONFIG_X86_LOCAL_APIC which is, _of course_, not selected in an
allnoconfig build. Fixing this would mean exporting all that perfcounter
reservation functionality for the allnoconfig case, which is of course
doable but I'm starting to question the need for recalibrating the TSC
at all:

I mean, in 99% of the cases MSRC001_0015[24] should be set by the
BIOS, and if it isn't, that BIOS is pretty b0rked anyway.
So I'm thinking of removing the recalibration code and simply warning
the user instead, for the 1% case.
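
IOW, roughly something like the following in early_init_amd() - only a
sketch of what I mean, reusing the HWCR check from the patch above,
completely untested:

	/* sketch: warn about a b0rked BIOS instead of recalibrating */
	if (cpu_has(c, X86_FEATURE_CONSTANT_TSC) &&
	    (c->x86 > 0x10 || (c->x86 == 0x10 && c->x86_model >= 0x2))) {
		u64 val;

		rdmsrl(MSR_K7_HWCR, val);
		if (!(val & BIT(24)))
			printk(KERN_WARNING FW_BUG
			       "TSC doesn't count with P0 frequency!\n");
	}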

Andreas, what do you think?

--
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632

Subject: Re: [PATCH -v2] x86, tsc: Limit CPU frequency calibration on AMD

On Wed, Aug 25, 2010 at 03:06:53AM -0400, Borislav Petkov wrote:
> From: "H. Peter Anvin" <[email protected]>
> Date: Tue, Aug 24, 2010 at 06:33:07PM -0400
>
> > Build failure:
> >
> > /home/hpa/kernel/linux-2.6-tip.cpu/arch/x86/kernel/cpu/amd.c: In
> > function ‘amd_calibrate_cpu’:
> > /home/hpa/kernel/linux-2.6-tip.cpu/arch/x86/kernel/cpu/amd.c:397: error:
> > implicit declaration of function ‘avail_to_resrv_perfctr_nmi_bit’
> > /home/hpa/kernel/linux-2.6-tip.cpu/arch/x86/kernel/cpu/amd.c:409: error:
> > implicit declaration of function ‘reserve_perfctr_nmi’
> > /home/hpa/kernel/linux-2.6-tip.cpu/arch/x86/kernel/cpu/amd.c:410: error:
> > implicit declaration of function ‘reserve_evntsel_nmi’
> > /home/hpa/kernel/linux-2.6-tip.cpu/arch/x86/kernel/cpu/amd.c:429: error:
> > implicit declaration of function ‘release_perfctr_nmi’
> > /home/hpa/kernel/linux-2.6-tip.cpu/arch/x86/kernel/cpu/amd.c:430: error:
> > implicit declaration of function ‘release_evntsel_nmi’
> >
> > Reproducible by doing "make ARCH=i386 allnoconfig".
>
> Sh*t, I can't catch a break with that Kconfig dependency stuff, can I?
>
> This happens because perfctr-watchdog.c gets pulled in by
> CONFIG_X86_LOCAL_APIC which is, _of course_, not selected in an
> allnoconfig build. Fixing this would mean exporting all that perfcounter
> reservation functionality for the allnoconfig case, which is of course
> doable but I'm starting to question the need for recalibrating the TSC
> at all:
>
> I mean, in 99% of the cases MSRC001_0015[24] should be set by the
> BIOS, and if it isn't, that BIOS is pretty b0rked anyway.
> So I'm thinking of removing the recalibration code and simply warning
> the user instead, for the 1% case.
>
> Andreas, what do you think?

I opt for removing the recalibration code plus keeping a FIRMWARE_WARN
for borked BIOSes (just in case there are any old systems with
the wrong setting).


Andreas

--
Operating | Advanced Micro Devices GmbH
System | Einsteinring 24, 85609 Dornach b. München, Germany
Research | Geschäftsführer: Alberto Bozzo, Andrew Bowd
Center | Sitz: Dornach, Gemeinde Aschheim, Landkreis München
(OSRC) | Registergericht München, HRB Nr. 43632

Subject: Re: [PATCH -v2] x86, tsc: Limit CPU frequency calibration on AMD

On Wed, Aug 25, 2010 at 03:04:54PM +0200, Andreas Herrmann wrote:
> On Wed, Aug 25, 2010 at 03:06:53AM -0400, Borislav Petkov wrote:
> > From: "H. Peter Anvin" <[email protected]>
> > Date: Tue, Aug 24, 2010 at 06:33:07PM -0400
> >
> > > Build failure:
> > >
> > > /home/hpa/kernel/linux-2.6-tip.cpu/arch/x86/kernel/cpu/amd.c: In
> > > function ‘amd_calibrate_cpu’:
> > > /home/hpa/kernel/linux-2.6-tip.cpu/arch/x86/kernel/cpu/amd.c:397: error:
> > > implicit declaration of function ‘avail_to_resrv_perfctr_nmi_bit’
> > > /home/hpa/kernel/linux-2.6-tip.cpu/arch/x86/kernel/cpu/amd.c:409: error:
> > > implicit declaration of function ‘reserve_perfctr_nmi’
> > > /home/hpa/kernel/linux-2.6-tip.cpu/arch/x86/kernel/cpu/amd.c:410: error:
> > > implicit declaration of function ‘reserve_evntsel_nmi’
> > > /home/hpa/kernel/linux-2.6-tip.cpu/arch/x86/kernel/cpu/amd.c:429: error:
> > > implicit declaration of function ‘release_perfctr_nmi’
> > > /home/hpa/kernel/linux-2.6-tip.cpu/arch/x86/kernel/cpu/amd.c:430: error:
> > > implicit declaration of function ‘release_evntsel_nmi’
> > >
> > > Reproducible by doing "make ARCH=i386 allnoconfig".
> >
> > Sh*t, I can't catch a break with that Kconfig dependency stuff, can I?
> >
> > This happens because perfctr-watchdog.c gets pulled in by
> > CONFIG_X86_LOCAL_APIC which is, _of course_, not selected in an
> > allnoconfig build. Fixing this would mean exporting all that perfcounter
> > reservation functionality for the allnoconfig case, which is of course
> > doable but I'm starting to question the need for recalibrating the TSC
> > at all:
> >
> > I mean, in 99% of the cases MSRC001_0015[24] should be set by the
> > BIOS, and if it isn't, that BIOS is pretty b0rked anyway.
> > So I'm thinking of removing the recalibration code and simply warning
> > the user instead, for the 1% case.
> >
> > Andreas, what do you think?
>
> I opt for removing the recalibration code plus keeping a FIRMWARE_WARN
> for borked BIOSes (just in case there are any old systems with
> the wrong setting).

... and checking the HWCR MSR and issuing the firmware warning should
only happen if not running on a hypervisor. (Validate whether the CPU
has the X86_FEATURE_HYPERVISOR bit set.)
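
I.e., on top of the warn-only variant, something like this (just a
sketch, family/model checks omitted, assuming the check stays in
early_init_amd()):

	/* sketch: skip the HWCR/FW_BUG check when running as a guest */
	if (!cpu_has(c, X86_FEATURE_HYPERVISOR) &&
	    cpu_has(c, X86_FEATURE_CONSTANT_TSC)) {
		u64 val;

		rdmsrl(MSR_K7_HWCR, val);
		if (!(val & BIT(24)))
			printk(KERN_WARNING FW_BUG
			       "TSC doesn't count with P0 frequency!\n");
	}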


Andreas

--
Operating | Advanced Micro Devices GmbH
System | Einsteinring 24, 85609 Dornach b. München, Germany
Research | Geschäftsführer: Alberto Bozzo, Andrew Bowd
Center | Sitz: Dornach, Gemeinde Aschheim, Landkreis München
(OSRC) | Registergericht München, HRB Nr. 43632

2010-08-25 16:27:58

by Borislav Petkov

[permalink] [raw]
Subject: [PATCH -v3] x86, tsc: Remove CPU frequency calibration on AMD

6b37f5a20c0e5c334c010a587058354215433e92 introduced the CPU frequency
calibration code for AMD CPUs whose TSCs didn't increment with the
core's P0 frequency. From F10h, revB onward, however, the TSC increment
rate is denoted by MSRC001_0015[24] and when this bit is set (which
should be done by the BIOS) the TSC increments with the P0 frequency
so the calibration is not needed and booting can be a couple of msecs
faster on those machines.

Besides, there should be virtually no machines out there which don't
have this bit set, therefore this calibration can be safely removed. It
is a shaky hack anyway since it assumes implicitly that the core is in
P0 when BIOS hands off to the OS, which might not always be the case.

Signed-off-by: Borislav Petkov <[email protected]>
---
arch/x86/kernel/cpu/amd.c | 17 +++++++++++++
arch/x86/kernel/tsc.c | 58 ---------------------------------------------
2 files changed, 17 insertions(+), 58 deletions(-)

diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index ba5f62f..fc563fa 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -412,6 +412,23 @@ static void __cpuinit early_init_amd(struct cpuinfo_x86 *c)
set_cpu_cap(c, X86_FEATURE_EXTD_APICID);
}
#endif
+
+ /* We need to do the following only once */
+ if (c != &boot_cpu_data)
+ return;
+
+ if (cpu_has(c, X86_FEATURE_CONSTANT_TSC)) {
+
+ if (c->x86 > 0x10 ||
+ (c->x86 == 0x10 && c->x86_model >= 0x2)) {
+ u64 val;
+
+ rdmsrl(MSR_K7_HWCR, val);
+ if (!(val & BIT(24)))
+ printk(KERN_WARNING FW_BUG "TSC doesn't count "
+ "with P0 frequency!\n");
+ }
+ }
}

static void __cpuinit init_amd(struct cpuinfo_x86 *c)
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index ce8e502..13b6a6c 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -854,60 +854,6 @@ static void __init init_tsc_clocksource(void)
clocksource_register_khz(&clocksource_tsc, tsc_khz);
}

-#ifdef CONFIG_X86_64
-/*
- * calibrate_cpu is used on systems with fixed rate TSCs to determine
- * processor frequency
- */
-#define TICK_COUNT 100000000
-static unsigned long __init calibrate_cpu(void)
-{
- int tsc_start, tsc_now;
- int i, no_ctr_free;
- unsigned long evntsel3 = 0, pmc3 = 0, pmc_now = 0;
- unsigned long flags;
-
- for (i = 0; i < 4; i++)
- if (avail_to_resrv_perfctr_nmi_bit(i))
- break;
- no_ctr_free = (i == 4);
- if (no_ctr_free) {
- WARN(1, KERN_WARNING "Warning: AMD perfctrs busy ... "
- "cpu_khz value may be incorrect.\n");
- i = 3;
- rdmsrl(MSR_K7_EVNTSEL3, evntsel3);
- wrmsrl(MSR_K7_EVNTSEL3, 0);
- rdmsrl(MSR_K7_PERFCTR3, pmc3);
- } else {
- reserve_perfctr_nmi(MSR_K7_PERFCTR0 + i);
- reserve_evntsel_nmi(MSR_K7_EVNTSEL0 + i);
- }
- local_irq_save(flags);
- /* start measuring cycles, incrementing from 0 */
- wrmsrl(MSR_K7_PERFCTR0 + i, 0);
- wrmsrl(MSR_K7_EVNTSEL0 + i, 1 << 22 | 3 << 16 | 0x76);
- rdtscl(tsc_start);
- do {
- rdmsrl(MSR_K7_PERFCTR0 + i, pmc_now);
- tsc_now = get_cycles();
- } while ((tsc_now - tsc_start) < TICK_COUNT);
-
- local_irq_restore(flags);
- if (no_ctr_free) {
- wrmsrl(MSR_K7_EVNTSEL3, 0);
- wrmsrl(MSR_K7_PERFCTR3, pmc3);
- wrmsrl(MSR_K7_EVNTSEL3, evntsel3);
- } else {
- release_perfctr_nmi(MSR_K7_PERFCTR0 + i);
- release_evntsel_nmi(MSR_K7_EVNTSEL0 + i);
- }
-
- return pmc_now * tsc_khz / (tsc_now - tsc_start);
-}
-#else
-static inline unsigned long calibrate_cpu(void) { return cpu_khz; }
-#endif
-
void __init tsc_init(void)
{
u64 lpj;
@@ -926,10 +872,6 @@ void __init tsc_init(void)
return;
}

- if (cpu_has(&boot_cpu_data, X86_FEATURE_CONSTANT_TSC) &&
- (boot_cpu_data.x86_vendor == X86_VENDOR_AMD))
- cpu_khz = calibrate_cpu();
-
printk("Detected %lu.%03lu MHz processor.\n",
(unsigned long)cpu_khz / 1000,
(unsigned long)cpu_khz % 1000);
--
1.7.1

--
Regards/Gruss,
Boris.

Operating Systems Research Center
Advanced Micro Devices, Inc.

2010-08-25 21:38:50

by Borislav Petkov

[permalink] [raw]
Subject: [tip:x86/cpu] x86, tsc: Remove CPU frequency calibration on AMD

Commit-ID: acf01734b1747b1ec4be6f159aff579ea5f7f8e2
Gitweb: http://git.kernel.org/tip/acf01734b1747b1ec4be6f159aff579ea5f7f8e2
Author: Borislav Petkov <[email protected]>
AuthorDate: Wed, 25 Aug 2010 18:28:23 +0200
Committer: H. Peter Anvin <[email protected]>
CommitDate: Wed, 25 Aug 2010 13:32:52 -0700

x86, tsc: Remove CPU frequency calibration on AMD

6b37f5a20c0e5c334c010a587058354215433e92 introduced the CPU frequency
calibration code for AMD CPUs whose TSCs didn't increment with the
core's P0 frequency. From F10h, revB onward, however, the TSC increment
rate is denoted by MSRC001_0015[24] and when this bit is set (which
should be done by the BIOS) the TSC increments with the P0 frequency
so the calibration is not needed and booting can be a couple of msecs
faster on those machines.

Besides, there should be virtually no machines out there which don't
have this bit set, therefore this calibration can be safely removed. It
is a shaky hack anyway since it assumes implicitly that the core is in
P0 when BIOS hands off to the OS, which might not always be the case.

Signed-off-by: Borislav Petkov <[email protected]>
LKML-Reference: <20100825162823.GE26438@aftab>
Signed-off-by: H. Peter Anvin <[email protected]>
---
arch/x86/kernel/cpu/amd.c | 17 +++++++++++++
arch/x86/kernel/tsc.c | 58 ---------------------------------------------
2 files changed, 17 insertions(+), 58 deletions(-)

diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index ba5f62f..fc563fa 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -412,6 +412,23 @@ static void __cpuinit early_init_amd(struct cpuinfo_x86 *c)
set_cpu_cap(c, X86_FEATURE_EXTD_APICID);
}
#endif
+
+ /* We need to do the following only once */
+ if (c != &boot_cpu_data)
+ return;
+
+ if (cpu_has(c, X86_FEATURE_CONSTANT_TSC)) {
+
+ if (c->x86 > 0x10 ||
+ (c->x86 == 0x10 && c->x86_model >= 0x2)) {
+ u64 val;
+
+ rdmsrl(MSR_K7_HWCR, val);
+ if (!(val & BIT(24)))
+ printk(KERN_WARNING FW_BUG "TSC doesn't count "
+ "with P0 frequency!\n");
+ }
+ }
}

static void __cpuinit init_amd(struct cpuinfo_x86 *c)
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index ce8e502..13b6a6c 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -854,60 +854,6 @@ static void __init init_tsc_clocksource(void)
clocksource_register_khz(&clocksource_tsc, tsc_khz);
}

-#ifdef CONFIG_X86_64
-/*
- * calibrate_cpu is used on systems with fixed rate TSCs to determine
- * processor frequency
- */
-#define TICK_COUNT 100000000
-static unsigned long __init calibrate_cpu(void)
-{
- int tsc_start, tsc_now;
- int i, no_ctr_free;
- unsigned long evntsel3 = 0, pmc3 = 0, pmc_now = 0;
- unsigned long flags;
-
- for (i = 0; i < 4; i++)
- if (avail_to_resrv_perfctr_nmi_bit(i))
- break;
- no_ctr_free = (i == 4);
- if (no_ctr_free) {
- WARN(1, KERN_WARNING "Warning: AMD perfctrs busy ... "
- "cpu_khz value may be incorrect.\n");
- i = 3;
- rdmsrl(MSR_K7_EVNTSEL3, evntsel3);
- wrmsrl(MSR_K7_EVNTSEL3, 0);
- rdmsrl(MSR_K7_PERFCTR3, pmc3);
- } else {
- reserve_perfctr_nmi(MSR_K7_PERFCTR0 + i);
- reserve_evntsel_nmi(MSR_K7_EVNTSEL0 + i);
- }
- local_irq_save(flags);
- /* start measuring cycles, incrementing from 0 */
- wrmsrl(MSR_K7_PERFCTR0 + i, 0);
- wrmsrl(MSR_K7_EVNTSEL0 + i, 1 << 22 | 3 << 16 | 0x76);
- rdtscl(tsc_start);
- do {
- rdmsrl(MSR_K7_PERFCTR0 + i, pmc_now);
- tsc_now = get_cycles();
- } while ((tsc_now - tsc_start) < TICK_COUNT);
-
- local_irq_restore(flags);
- if (no_ctr_free) {
- wrmsrl(MSR_K7_EVNTSEL3, 0);
- wrmsrl(MSR_K7_PERFCTR3, pmc3);
- wrmsrl(MSR_K7_EVNTSEL3, evntsel3);
- } else {
- release_perfctr_nmi(MSR_K7_PERFCTR0 + i);
- release_evntsel_nmi(MSR_K7_EVNTSEL0 + i);
- }
-
- return pmc_now * tsc_khz / (tsc_now - tsc_start);
-}
-#else
-static inline unsigned long calibrate_cpu(void) { return cpu_khz; }
-#endif
-
void __init tsc_init(void)
{
u64 lpj;
@@ -926,10 +872,6 @@ void __init tsc_init(void)
return;
}

- if (cpu_has(&boot_cpu_data, X86_FEATURE_CONSTANT_TSC) &&
- (boot_cpu_data.x86_vendor == X86_VENDOR_AMD))
- cpu_khz = calibrate_cpu();
-
printk("Detected %lu.%03lu MHz processor.\n",
(unsigned long)cpu_khz / 1000,
(unsigned long)cpu_khz % 1000);

2010-08-25 22:33:10

by Alok Kataria

[permalink] [raw]
Subject: Re: [PATCH -v3] x86, tsc: Remove CPU frequency calibration on AMD


On Wed, 2010-08-25 at 09:28 -0700, Borislav Petkov wrote:
> 6b37f5a20c0e5c334c010a587058354215433e92 introduced the CPU frequency
> calibration code for AMD CPUs whose TSCs didn't increment with the
> core's P0 frequency. From F10h, revB onward, however, the TSC increment
> rate is denoted by MSRC001_0015[24] and when this bit is set (which
> should be done by the BIOS) the TSC increments with the P0 frequency
> so the calibration is not needed and booting can be a couple of msecs
> faster on those machines.
>
> Besides, there should be virtually no machines out there which don't
> have this bit set, therefore this calibration can be safely removed. It
> is a shaky hack anyway since it assumes implicitly that the core is in
> P0 when BIOS hands off to the OS, which might not always be the case.

Nice... this works for us too, we don't muck with that MSR bit either;
it's directly passed as-is from the h/w to the guest. So no additional
changes would be needed for us with this.

Hope that the 3rd time is a charm for you too :)

Thanks,
Alok

>
> Signed-off-by: Borislav Petkov <[email protected]>
> ---
> arch/x86/kernel/cpu/amd.c | 17 +++++++++++++
> arch/x86/kernel/tsc.c | 58 ---------------------------------------------
> 2 files changed, 17 insertions(+), 58 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
> index ba5f62f..fc563fa 100644
> --- a/arch/x86/kernel/cpu/amd.c
> +++ b/arch/x86/kernel/cpu/amd.c
> @@ -412,6 +412,23 @@ static void __cpuinit early_init_amd(struct cpuinfo_x86 *c)
> set_cpu_cap(c, X86_FEATURE_EXTD_APICID);
> }
> #endif
> +
> + /* We need to do the following only once */
> + if (c != &boot_cpu_data)
> + return;
> +
> + if (cpu_has(c, X86_FEATURE_CONSTANT_TSC)) {
> +
> + if (c->x86 > 0x10 ||
> + (c->x86 == 0x10 && c->x86_model >= 0x2)) {
> + u64 val;
> +
> + rdmsrl(MSR_K7_HWCR, val);
> + if (!(val & BIT(24)))
> + printk(KERN_WARNING FW_BUG "TSC doesn't count "
> + "with P0 frequency!\n");
> + }
> + }
> }
>
> static void __cpuinit init_amd(struct cpuinfo_x86 *c)
> diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
> index ce8e502..13b6a6c 100644
> --- a/arch/x86/kernel/tsc.c
> +++ b/arch/x86/kernel/tsc.c
> @@ -854,60 +854,6 @@ static void __init init_tsc_clocksource(void)
> clocksource_register_khz(&clocksource_tsc, tsc_khz);
> }
>
> -#ifdef CONFIG_X86_64
> -/*
> - * calibrate_cpu is used on systems with fixed rate TSCs to determine
> - * processor frequency
> - */
> -#define TICK_COUNT 100000000
> -static unsigned long __init calibrate_cpu(void)
> -{
> - int tsc_start, tsc_now;
> - int i, no_ctr_free;
> - unsigned long evntsel3 = 0, pmc3 = 0, pmc_now = 0;
> - unsigned long flags;
> -
> - for (i = 0; i < 4; i++)
> - if (avail_to_resrv_perfctr_nmi_bit(i))
> - break;
> - no_ctr_free = (i == 4);
> - if (no_ctr_free) {
> - WARN(1, KERN_WARNING "Warning: AMD perfctrs busy ... "
> - "cpu_khz value may be incorrect.\n");
> - i = 3;
> - rdmsrl(MSR_K7_EVNTSEL3, evntsel3);
> - wrmsrl(MSR_K7_EVNTSEL3, 0);
> - rdmsrl(MSR_K7_PERFCTR3, pmc3);
> - } else {
> - reserve_perfctr_nmi(MSR_K7_PERFCTR0 + i);
> - reserve_evntsel_nmi(MSR_K7_EVNTSEL0 + i);
> - }
> - local_irq_save(flags);
> - /* start measuring cycles, incrementing from 0 */
> - wrmsrl(MSR_K7_PERFCTR0 + i, 0);
> - wrmsrl(MSR_K7_EVNTSEL0 + i, 1 << 22 | 3 << 16 | 0x76);
> - rdtscl(tsc_start);
> - do {
> - rdmsrl(MSR_K7_PERFCTR0 + i, pmc_now);
> - tsc_now = get_cycles();
> - } while ((tsc_now - tsc_start) < TICK_COUNT);
> -
> - local_irq_restore(flags);
> - if (no_ctr_free) {
> - wrmsrl(MSR_K7_EVNTSEL3, 0);
> - wrmsrl(MSR_K7_PERFCTR3, pmc3);
> - wrmsrl(MSR_K7_EVNTSEL3, evntsel3);
> - } else {
> - release_perfctr_nmi(MSR_K7_PERFCTR0 + i);
> - release_evntsel_nmi(MSR_K7_EVNTSEL0 + i);
> - }
> -
> - return pmc_now * tsc_khz / (tsc_now - tsc_start);
> -}
> -#else
> -static inline unsigned long calibrate_cpu(void) { return cpu_khz; }
> -#endif
> -
> void __init tsc_init(void)
> {
> u64 lpj;
> @@ -926,10 +872,6 @@ void __init tsc_init(void)
> return;
> }
>
> - if (cpu_has(&boot_cpu_data, X86_FEATURE_CONSTANT_TSC) &&
> - (boot_cpu_data.x86_vendor == X86_VENDOR_AMD))
> - cpu_khz = calibrate_cpu();
> -
> printk("Detected %lu.%03lu MHz processor.\n",
> (unsigned long)cpu_khz / 1000,
> (unsigned long)cpu_khz % 1000);
> --
> 1.7.1
>

2010-08-26 07:19:19

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH -v3] x86, tsc: Remove CPU frequency calibration on AMD

From: Alok Kataria <[email protected]>
Date: Wed, Aug 25, 2010 at 06:33:08PM -0400

>
> On Wed, 2010-08-25 at 09:28 -0700, Borislav Petkov wrote:
> > 6b37f5a20c0e5c334c010a587058354215433e92 introduced the CPU frequency
> > calibration code for AMD CPUs whose TSCs didn't increment with the
> > core's P0 frequency. From F10h, revB onward, however, the TSC increment
> > rate is denoted by MSRC001_0015[24] and when this bit is set (which
> > should be done by the BIOS) the TSC increments with the P0 frequency
> > so the calibration is not needed and booting can be a couple of msecs
> > faster on those machines.
> >
> > Besides, there should be virtually no machines out there which don't
> > have this bit set, therefore this calibration can be safely removed. It
> > is a shaky hack anyway since it assumes implicitly that the core is in
> > P0 when BIOS hands off to the OS, which might not always be the case.
>
> Nice... this works for us too, we don't muck with that MSR bit either;
> it's directly passed as-is from the h/w to the guest. So no additional
> changes would be needed for us with this.

That's nice, KVM appears to not hit it either due to unsynchronized
TSCs.

> Hope that the 3rd time is a charm for you too :)

Yeah, I think it is :). Sorry for taking so long but removing code which
is actively executed from the kernel is not such a light decision. But
the hw guys made sure that this bit is always set so we don't need the
calibration. It wouldn't work in all cases anyway (hint: boosted cores).

Thanks.

--
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632