2019-04-17 05:30:53

by Daniel Drake

[permalink] [raw]
Subject: Detecting x86 LAPIC timer frequency from CPUID data

The CPUID.0x16 leaf provides "Bus (Reference) Frequency (in MHz)".

In the thread "No 8254 PIT & no HPET on new Intel N3350 platforms
causes kernel panic during early boot" we are exploring ways to have
the kernel avoid using the PIT/HPET IRQ0 timer in more cases, and
Thomas Gleixner suggested that we could use this CPUID data to set
lapic_timer_frequency, avoiding the need for calibrate_APIC_clock()
to measure the APIC clock against the IRQ0 timer.

I'm thinking of the the following code change, however I get
unexpected results on Intel i7-8565U (Whiskey Lake). When
booting without this change, and with apic=notscdeadline (so that
APIC clock gets calibrated and used), the bus speed is detected as
23MHz:

... lapic delta = 149994
... PM-Timer delta = 357939
... PM-Timer result ok
..... delta 149994
..... mult: 6442193
..... calibration result: 23999
..... CPU clock speed is 1991.0916 MHz.
..... host bus clock speed is 23.0999 MHz.

However the CPUID.0x16 ECX reports a 100MHz bus speed on this device,
so this code change would produce a significantly different calibration.

Am I doing anything obviously wrong?

diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 3fae23834069..6c51ce842f86 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -679,6 +679,16 @@ static unsigned long cpu_khz_from_cpuid(void)

cpuid(0x16, &eax_base_mhz, &ebx_max_mhz, &ecx_bus_mhz, &edx);

+#ifdef CONFIG_X86_LOCAL_APIC
+ /*
+ * If bus frequency is provided in CPUID data, set
+ * global lapic_timer_frequency to bus_clock_cycles/jiffy.
+ * This avoids having to calibrate the APIC timer later.
+ */
+ if (ecx_bus_mhz)
+ lapic_timer_frequency = (ecx_bus_mhz * 1000000) / HZ;
+#endif
+
return eax_base_mhz * 1000;
}

--
2.19.1


2019-04-18 13:13:26

by Thomas Gleixner

[permalink] [raw]
Subject: Re: Detecting x86 LAPIC timer frequency from CPUID data

On Wed, 17 Apr 2019, Daniel Drake wrote:

> The CPUID.0x16 leaf provides "Bus (Reference) Frequency (in MHz)".
>
> In the thread "No 8254 PIT & no HPET on new Intel N3350 platforms
> causes kernel panic during early boot" we are exploring ways to have
> the kernel avoid using the PIT/HPET IRQ0 timer in more cases, and
> Thomas Gleixner suggested that we could use this CPUID data to set
> lapic_timer_frequency, avoiding the need for calibrate_APIC_clock()
> to measure the APIC clock against the IRQ0 timer.
>
> I'm thinking of the the following code change, however I get
> unexpected results on Intel i7-8565U (Whiskey Lake). When
> booting without this change, and with apic=notscdeadline (so that
> APIC clock gets calibrated and used), the bus speed is detected as
> 23MHz:
>
> ... lapic delta = 149994
> ... PM-Timer delta = 357939
> ... PM-Timer result ok
> ..... delta 149994
> ..... mult: 6442193
> ..... calibration result: 23999
> ..... CPU clock speed is 1991.0916 MHz.
> ..... host bus clock speed is 23.0999 MHz.
>
> However the CPUID.0x16 ECX reports a 100MHz bus speed on this device,
> so this code change would produce a significantly different calibration.
>
> Am I doing anything obviously wrong?

It's probably just my fault sending you down the wrong path. What's the
content of CPUUD.0x15 on that box?

Thanks,

tglx

2019-04-18 22:32:39

by Thomas Gleixner

[permalink] [raw]
Subject: Re: Detecting x86 LAPIC timer frequency from CPUID data

On Thu, 18 Apr 2019, Thomas Gleixner wrote:
> On Wed, 17 Apr 2019, Daniel Drake wrote:
>
> > The CPUID.0x16 leaf provides "Bus (Reference) Frequency (in MHz)".
> >
> > In the thread "No 8254 PIT & no HPET on new Intel N3350 platforms
> > causes kernel panic during early boot" we are exploring ways to have
> > the kernel avoid using the PIT/HPET IRQ0 timer in more cases, and
> > Thomas Gleixner suggested that we could use this CPUID data to set
> > lapic_timer_frequency, avoiding the need for calibrate_APIC_clock()
> > to measure the APIC clock against the IRQ0 timer.
> >
> > I'm thinking of the the following code change, however I get
> > unexpected results on Intel i7-8565U (Whiskey Lake). When
> > booting without this change, and with apic=notscdeadline (so that
> > APIC clock gets calibrated and used), the bus speed is detected as
> > 23MHz:
> >
> > ... lapic delta = 149994
> > ... PM-Timer delta = 357939
> > ... PM-Timer result ok
> > ..... delta 149994
> > ..... mult: 6442193
> > ..... calibration result: 23999

That's 24MHZ which is the nominal clock for these machines.

> > ..... CPU clock speed is 1991.0916 MHz.
> > ..... host bus clock speed is 23.0999 MHz.

I think that printout is wrong in two aspects. First it should be
23.9999Mhz, second it should be 100MHz. This stuff comes from old systems
which had completely different clock setups.

> > However the CPUID.0x16 ECX reports a 100MHz bus speed on this device,
> > so this code change would produce a significantly different calibration.

Yes.

> > Am I doing anything obviously wrong?
>
> It's probably just my fault sending you down the wrong path. What's the
> content of CPUUD.0x15 on that box?

I bet that CPUID.0x15 ECX says 24Mhz or it just says 0 like on my
machine. But then interestingly enough on that box I see:

Time Stamp Counter/Core Crystal Clock Information (0x15):
TSC/clock ratio = 168/2
nominal core crystal clock = 0 Hz

Processor Frequency Information (0x16):
Core Base Frequency (MHz) = 0x834 (2100)
Core Maximum Frequency (MHz) = 0xed8 (3800)
Bus (Reference) Frequency (MHz) = 0x64 (100)

Assuming that TSC and local APIC timer run from the same frequency on these
modern machines.

2100MHz * 2 / 168 = 25MHz

and disabling the tsc deadline timer tells me:

..... calibration result: 24999

Close enough. Thinking about it - that makes a lot of sense. With TSC
deadline timer available it would be pretty silly if there is yet another
clock feeding into local APIC.

And the 0x15 -> 0x16 correlation is clear as well. The TSC runs at the
nominal CPU frequency, in this case 2100MHz. So the nominator/denominator
pair from 0x15 allows to deduce the nominal core crystal clock which seems
to be exactly the clock which is fed into the local APIC timer.

If someone from Intel could confirm that, then we could make that work for
any modern system.

Thanks,

tglx

2019-04-19 19:05:18

by Thomas Gleixner

[permalink] [raw]
Subject: Re: Detecting x86 LAPIC timer frequency from CPUID data

On Fri, 19 Apr 2019, Daniel Drake wrote:
> On Fri, Apr 19, 2019 at 6:30 AM Thomas Gleixner <[email protected]> wrote:
> > Time Stamp Counter/Core Crystal Clock Information (0x15):
> > TSC/clock ratio = 168/2
> > nominal core crystal clock = 0 Hz
> >
> > Processor Frequency Information (0x16):
> > Core Base Frequency (MHz) = 0x834 (2100)
> > Core Maximum Frequency (MHz) = 0xed8 (3800)
> > Bus (Reference) Frequency (MHz) = 0x64 (100)
> >
> > Assuming that TSC and local APIC timer run from the same frequency on these
> > modern machines.
> >
> > 2100MHz * 2 / 168 = 25MHz
> >
> > and disabling the tsc deadline timer tells me:
> >
> > ..... calibration result: 24999
> >
> > Close enough.
>
> I tested all the Intel SoC generations we have on hand. The assumption that
> the core crystal clock feeds the APIC seems to be consistently true.
>
> (Please note that all the following results are done with CONFIG_HZ=250,
> which is why the "calibration result" is 4x higher than HZ=1000 as used
> in previous mails)
>
> In the easy case, the low cost platforms do not support CPUID.0x16 (so no
> CPU frequency reporting), but they do tell us the core crystal clock, which
> is consistent with the APIC calibration result:

...

> And the 4 higher-end SoCs that we have available for testing all report
> crystal clock 0Hz from CPUID 0x15, but by combining the CPUID.0x16 base
> frequency with the CPUID.0x15 TSC/clock ratio, the crystal frequency can
> be calculated as you describe, and it consistently matches the APIC timer
> calibration result.

...

> Is this data convincing enough or should we additionally wait for some
> comments from Intel?

For me it's pretty convincing, but having some confirmation from Intel
wouldn't be a bad thing.

> I came up with the patch below. However, upon testing, I realised that, at
> least for the platforms I have in hand, only the first hunk is really needed.
> We don't need to use your magic calculation to find the crystal frequency
> because Intel already told us! native_calibrate_tsc() already hardcodes the
> crystal frequency for Kabylake, and Amber/Whiskey/Coffee also report the
> 0x8e/0x9e Kabylake model codes.

I'd rather replace these model checks with math. These tables are horrible
to maintain.

> Plus ApolloLake/GeminiLake do report the crystal frequency in CPUID.0x15
> so that is covered too.

> While looking around this code I also spotted something curious.
> In calibrate_APIC_clock() for the case where lapic_timer_frequency has been
> externally provided, we have:
> lapic_clockevent.max_delta_ns =
> clockevent_delta2ns(0x7FFFFF, &lapic_clockevent);
> lapic_clockevent.max_delta_ticks = 0x7FFFFF;
>
> But in the case where we calibrate, we have:
> lapic_clockevent.max_delta_ns =
> clockevent_delta2ns(0x7FFFFFFF, &lapic_clockevent);
> lapic_clockevent.max_delta_ticks = 0x7FFFFFFF;
>
> 0x7FFFFF vs 0x7FFFFFFF, is that intentional?

I don't think so. Looks like a failed copy and paste. Cc'ed Jacob, he might
know.

Thanks,

tglx

2019-04-19 20:12:16

by Daniel Drake

[permalink] [raw]
Subject: Re: Detecting x86 LAPIC timer frequency from CPUID data

On Fri, Apr 19, 2019 at 6:30 AM Thomas Gleixner <[email protected]> wrote:
> Time Stamp Counter/Core Crystal Clock Information (0x15):
> TSC/clock ratio = 168/2
> nominal core crystal clock = 0 Hz
>
> Processor Frequency Information (0x16):
> Core Base Frequency (MHz) = 0x834 (2100)
> Core Maximum Frequency (MHz) = 0xed8 (3800)
> Bus (Reference) Frequency (MHz) = 0x64 (100)
>
> Assuming that TSC and local APIC timer run from the same frequency on these
> modern machines.
>
> 2100MHz * 2 / 168 = 25MHz
>
> and disabling the tsc deadline timer tells me:
>
> ..... calibration result: 24999
>
> Close enough.

I tested all the Intel SoC generations we have on hand. The assumption that
the core crystal clock feeds the APIC seems to be consistently true.

(Please note that all the following results are done with CONFIG_HZ=250,
which is why the "calibration result" is 4x higher than HZ=1000 as used
in previous mails)

In the easy case, the low cost platforms do not support CPUID.0x16 (so no
CPU frequency reporting), but they do tell us the core crystal clock, which
is consistent with the APIC calibration result:

N5000 (Gemini Lake)
[ 0.122948] ... lapic delta = 119999
[ 0.122948] ... PM-Timer delta = 357950
[ 0.122948] ... PM-Timer result ok
[ 0.122948] ..... delta 119999
[ 0.122948] ..... mult: 5153917
[ 0.122948] ..... calibration result: 76799
[ 0.122948] ..... CPU clock speed is 1094.1542 MHz.
[ 0.122948] ..... host bus clock speed is 19.0799 MHz.
Time Stamp Counter/Core Crystal Clock Information (0x15):
TSC/clock ratio = 171/3
nominal core crystal clock = 19200000 Hz
Processor Frequency Information (0x16):
Core Base Frequency (MHz) = 0x0 (0)
Core Maximum Frequency (MHz) = 0x0 (0)
Bus (Reference) Frequency (MHz) = 0x0 (0)

N3350 (Apollo Lake)
[ 0.248894] ... lapic delta = 119999
[ 0.248894] ... PM-Timer delta = 357949
[ 0.248894] ... PM-Timer result ok
[ 0.248894] ..... delta 119999
[ 0.248894] ..... mult: 5153917
[ 0.248894] ..... calibration result: 76799
[ 0.248894] ..... CPU clock speed is 1094.1540 MHz.
[ 0.248894] ..... host bus clock speed is 19.0799 MHz.
Time Stamp Counter/Core Crystal Clock Information (0x15):
TSC/clock ratio = 171/3
nominal core crystal clock = 19200000 Hz
(CPUID 0x16 not supported at all)

And the 4 higher-end SoCs that we have available for testing all report
crystal clock 0Hz from CPUID 0x15, but by combining the CPUID.0x16 base
frequency with the CPUID.0x15 TSC/clock ratio, the crystal frequency can
be calculated as you describe, and it consistently matches the APIC timer
calibration result.

i9-9980HK (Coffee Lake HR)
[ 0.379421] ... lapic delta = 149998
[ 0.379421] ... PM-Timer delta = 357950
[ 0.379421] ... PM-Timer result ok
[ 0.379421] ..... delta 149998
[ 0.379421] ..... mult: 6442365
[ 0.379421] ..... calibration result: 95998
[ 0.379421] ..... CPU clock speed is 2399.3902 MHz.
[ 0.379421] ..... host bus clock speed is 23.3998 MHz.
Time Stamp Counter/Core Crystal Clock Information (0x15):
TSC/clock ratio = 200/2
nominal core crystal clock = 0 Hz
Processor Frequency Information (0x16):
Core Base Frequency (MHz) = 0x960 (2400)
Core Maximum Frequency (MHz) = 0x1388 (5000)
Bus (Reference) Frequency (MHz) = 0x64 (100)

i7-8565U (Whiskey Lake)
[ 0.173776] ... lapic delta = 149998
[ 0.173776] ... PM-Timer delta = 357950
[ 0.173776] ... PM-Timer result ok
[ 0.173776] ..... delta 149998
[ 0.173776] ..... mult: 6442365
[ 0.173776] ..... calibration result: 95998
[ 0.173776] ..... CPU clock speed is 1991.3903 MHz.
[ 0.173776] ..... host bus clock speed is 23.3998 MHz.
Time Stamp Counter/Core Crystal Clock Information (0x15):
TSC/clock ratio = 166/2
nominal core crystal clock = 0 Hz
Processor Frequency Information (0x16):
Core Base Frequency (MHz) = 0x7d0 (2000)
Core Maximum Frequency (MHz) = 0x11f8 (4600)
Bus (Reference) Frequency (MHz) = 0x64 (100)

i5-7200U (Kabylake)
[ 0.219142] ... lapic delta = 149998
[ 0.219142] ... PM-Timer delta = 357951
[ 0.219142] ... PM-Timer result ok
[ 0.219142] ..... delta 149998
[ 0.219142] ..... mult: 6442365
[ 0.219142] ..... calibration result: 95998
[ 0.219142] ..... CPU clock speed is 2711.3880 MHz.
[ 0.219142] ..... host bus clock speed is 23.3998 MHz.
Time Stamp Counter/Core Crystal Clock Information (0x15):
TSC/clock ratio = 226/2
nominal core crystal clock = 0 Hz
Processor Frequency Information (0x16):
Core Base Frequency (MHz) = 0xa8c (2700)
Core Maximum Frequency (MHz) = 0xc1c (3100)
Bus (Reference) Frequency (MHz) = 0x64 (100)

m3-8110Y (Amber Lake)
[ 0.102289] ... lapic delta = 149998
[ 0.102289] ... PM-Timer delta = 357951
[ 0.102289] ... PM-Timer result ok
[ 0.102289] ..... delta 149998
[ 0.102289] ..... mult: 6442365
[ 0.102289] ..... calibration result: 95998
[ 0.102289] ..... CPU clock speed is 1607.3930 MHz.
[ 0.102289] ..... host bus clock speed is 23.3998 MHz.
Time Stamp Counter/Core Crystal Clock Information (0x15):
TSC/clock ratio = 134/2
nominal core crystal clock = 0 Hz
Processor Frequency Information (0x16):
Core Base Frequency (MHz) = 0x640 (1600)
Core Maximum Frequency (MHz) = 0xd48 (3400)
Bus (Reference) Frequency (MHz) = 0x64 (100)

Is this data convincing enough or should we additionally wait for some
comments from Intel?

I came up with the patch below. However, upon testing, I realised that, at
least for the platforms I have in hand, only the first hunk is really needed.
We don't need to use your magic calculation to find the crystal frequency
because Intel already told us! native_calibrate_tsc() already hardcodes the
crystal frequency for Kabylake, and Amber/Whiskey/Coffee also report the
0x8e/0x9e Kabylake model codes. Plus ApolloLake/GeminiLake do report the
crystal frequency in CPUID.0x15 so that is covered too.

While looking around this code I also spotted something curious.
In calibrate_APIC_clock() for the case where lapic_timer_frequency has been
externally provided, we have:
lapic_clockevent.max_delta_ns =
clockevent_delta2ns(0x7FFFFF, &lapic_clockevent);
lapic_clockevent.max_delta_ticks = 0x7FFFFF;

But in the case where we calibrate, we have:
lapic_clockevent.max_delta_ns =
clockevent_delta2ns(0x7FFFFFFF, &lapic_clockevent);
lapic_clockevent.max_delta_ticks = 0x7FFFFFFF;

0x7FFFFF vs 0x7FFFFFFF, is that intentional?

---
arch/x86/kernel/tsc.c | 51 +++++++++++++++++++++++++++++++++++++++++++
1 file changed, 51 insertions(+)

diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 3fae23834069..8dcfd9a30b24 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -648,6 +648,23 @@ unsigned long native_calibrate_tsc(void)

if (crystal_khz == 0)
return 0;
+
+#ifdef CONFIG_X86_LOCAL_APIC
+ /* The local APIC appears to be fed by the core crystal clock
+ * (which sounds entirely sensible). We can set the global
+ * lapic_timer_frequency here to avoid having to calibrate the APIC
+ * timer later.
+ *
+ * This has been empirically verified on ApolloLake and GeminiLake:
+ * the APIC calibration otherwise determines the same bus frequency
+ * as the CPUID.0x15 crystal frequency.
+ * It has also been empirically verified on KabyLake, where the
+ * APIC calibration otherwise determines the same bus frequency
+ * as the one hardcoded above.
+ */
+ lapic_timer_frequency = (crystal_khz * 1000) / HZ;
+#endif
+
/*
* TSC frequency determined by CPUID is a "hardware reported"
* frequency and is the most accurate one so far we have. This
@@ -665,6 +682,38 @@ unsigned long native_calibrate_tsc(void)
return crystal_khz * ebx_numerator / eax_denominator;
}

+static void set_lapic_timer_freq_from_cpu_freq(unsigned int base_mhz)
+{
+#ifdef CONFIG_X86_LOCAL_APIC
+ unsigned int eax_denominator, ebx_numerator, ecx_hz, edx;
+
+ /*
+ * Use CPUID.0x16 CPU base frequency to calculate the global
+ * lapic_timer_frequency, which avoids having to calibrate the
+ * APIC timer later.
+ *
+ * Like in native_calibrate_tsc(), we rely on the observation
+ * that the local APIC is fed by the core crystal clock.
+ * Experimenting with KabyLake, CoffeeLake, WhiskeyLake and
+ * AmberLake, the core crystal clock is not reported in CPUID.0x15,
+ * but we can calculate it by considering the CPU base frequency
+ * and the TSC/clock ratio. The result has been verified to closely
+ * match the result that would otherwise be obtained through
+ * APIC clock calibration.
+ */
+
+ if (!base_mhz)
+ return;
+
+ cpuid(0x15, &eax_denominator, &ebx_numerator, &ecx_hz, &edx);
+ if (!eax_denominator || !ebx_numerator)
+ return;
+
+ lapic_timer_frequency = (base_mhz * 1000000 * \
+ eax_denominator / ebx_numerator) / HZ;
+#endif
+}
+
static unsigned long cpu_khz_from_cpuid(void)
{
unsigned int eax_base_mhz, ebx_max_mhz, ecx_bus_mhz, edx;
@@ -679,6 +728,8 @@ static unsigned long cpu_khz_from_cpuid(void)

cpuid(0x16, &eax_base_mhz, &ebx_max_mhz, &ecx_bus_mhz, &edx);

+ set_lapic_timer_freq_from_cpu_freq(eax_base_mhz);
+
return eax_base_mhz * 1000;
}

--
2.19.1


2019-04-19 20:49:25

by Jacob Pan

[permalink] [raw]
Subject: Re: Detecting x86 LAPIC timer frequency from CPUID data

On Fri, 19 Apr 2019 10:57:10 +0200 (CEST)
Thomas Gleixner <[email protected]> wrote:

> On Fri, 19 Apr 2019, Daniel Drake wrote:
> > On Fri, Apr 19, 2019 at 6:30 AM Thomas Gleixner
> > <[email protected]> wrote:
> > > Time Stamp Counter/Core Crystal Clock Information (0x15):
> > > TSC/clock ratio = 168/2
> > > nominal core crystal clock = 0 Hz
> > >
> > > Processor Frequency Information (0x16):
> > > Core Base Frequency (MHz) = 0x834 (2100)
> > > Core Maximum Frequency (MHz) = 0xed8 (3800)
> > > Bus (Reference) Frequency (MHz) = 0x64 (100)
> > >
> > > Assuming that TSC and local APIC timer run from the same
> > > frequency on these modern machines.
> > >
> > > 2100MHz * 2 / 168 = 25MHz
> > >
> > > and disabling the tsc deadline timer tells me:
> > >
> > > ..... calibration result: 24999
> > >
> > > Close enough.
> >
> > I tested all the Intel SoC generations we have on hand. The
> > assumption that the core crystal clock feeds the APIC seems to be
> > consistently true.
> >
> > (Please note that all the following results are done with
> > CONFIG_HZ=250, which is why the "calibration result" is 4x higher
> > than HZ=1000 as used in previous mails)
> >
> > In the easy case, the low cost platforms do not support CPUID.0x16
> > (so no CPU frequency reporting), but they do tell us the core
> > crystal clock, which is consistent with the APIC calibration
> > result:
>
> ...
>
> > And the 4 higher-end SoCs that we have available for testing all
> > report crystal clock 0Hz from CPUID 0x15, but by combining the
> > CPUID.0x16 base frequency with the CPUID.0x15 TSC/clock ratio, the
> > crystal frequency can be calculated as you describe, and it
> > consistently matches the APIC timer calibration result.
>
> ...
>
> > Is this data convincing enough or should we additionally wait for
> > some comments from Intel?
>
> For me it's pretty convincing, but having some confirmation from Intel
> wouldn't be a bad thing.
>
> > I came up with the patch below. However, upon testing, I realised
> > that, at least for the platforms I have in hand, only the first
> > hunk is really needed. We don't need to use your magic calculation
> > to find the crystal frequency because Intel already told us!
> > native_calibrate_tsc() already hardcodes the crystal frequency for
> > Kabylake, and Amber/Whiskey/Coffee also report the 0x8e/0x9e
> > Kabylake model codes.
>
> I'd rather replace these model checks with math. These tables are
> horrible to maintain.
>
> > Plus ApolloLake/GeminiLake do report the crystal frequency in
> > CPUID.0x15 so that is covered too.
>
> > While looking around this code I also spotted something curious.
> > In calibrate_APIC_clock() for the case where lapic_timer_frequency
> > has been externally provided, we have:
> > lapic_clockevent.max_delta_ns =
> > clockevent_delta2ns(0x7FFFFF,
> > &lapic_clockevent); lapic_clockevent.max_delta_ticks = 0x7FFFFF;
> >
> > But in the case where we calibrate, we have:
> > lapic_clockevent.max_delta_ns =
> > clockevent_delta2ns(0x7FFFFFFF, &lapic_clockevent);
> > lapic_clockevent.max_delta_ticks = 0x7FFFFFFF;
> >
> > 0x7FFFFF vs 0x7FFFFFFF, is that intentional?
>
> I don't think so. Looks like a failed copy and paste. Cc'ed Jacob, he
> might know.
>
At the time of v2.6.35 both places use 0x7FFFFF. But later this patch
increased the latter to 0x7FFFFFFF but forgot the first part. So I
guess it is not exactly a failed copy and paste.

commit 4aed89d6b515b9185351706ca95cd712c9d8d6a3
Author: Pierre Tardy <[email protected]>
Date: Thu Jan 6 16:23:29 2011 +0100

x86, lapic-timer: Increase the max_delta to 31 bits


> Thanks,
>
> tglx

[Jacob Pan]

2019-04-19 20:53:15

by Thomas Gleixner

[permalink] [raw]
Subject: Re: Detecting x86 LAPIC timer frequency from CPUID data

On Fri, 19 Apr 2019, Jacob Pan wrote:
> On Fri, 19 Apr 2019 10:57:10 +0200 (CEST)
> Thomas Gleixner <[email protected]> wrote:
> > On Fri, 19 Apr 2019, Daniel Drake wrote:
> > > 0x7FFFFF vs 0x7FFFFFFF, is that intentional?
> >
> > I don't think so. Looks like a failed copy and paste. Cc'ed Jacob, he
> > might know.
> >
> At the time of v2.6.35 both places use 0x7FFFFF. But later this patch
> increased the latter to 0x7FFFFFFF but forgot the first part. So I
> guess it is not exactly a failed copy and paste.
>
> commit 4aed89d6b515b9185351706ca95cd712c9d8d6a3
> Author: Pierre Tardy <[email protected]>
> Date: Thu Jan 6 16:23:29 2011 +0100
>
> x86, lapic-timer: Increase the max_delta to 31 bits

Indeed. Thanks for digging that up!

tglx

2019-04-19 23:08:35

by Jacob Pan

[permalink] [raw]
Subject: Re: Detecting x86 LAPIC timer frequency from CPUID data

On Fri, 19 Apr 2019 22:52:01 +0200 (CEST)
Thomas Gleixner <[email protected]> wrote:

> On Fri, 19 Apr 2019, Jacob Pan wrote:
> > On Fri, 19 Apr 2019 10:57:10 +0200 (CEST)
> > Thomas Gleixner <[email protected]> wrote:
> > > On Fri, 19 Apr 2019, Daniel Drake wrote:
> > > > 0x7FFFFF vs 0x7FFFFFFF, is that intentional?
> > >
> > > I don't think so. Looks like a failed copy and paste. Cc'ed
> > > Jacob, he might know.
> > >
> > At the time of v2.6.35 both places use 0x7FFFFF. But later this
> > patch increased the latter to 0x7FFFFFFF but forgot the first part.
> > So I guess it is not exactly a failed copy and paste.
> >
> > commit 4aed89d6b515b9185351706ca95cd712c9d8d6a3
> > Author: Pierre Tardy <[email protected]>
> > Date: Thu Jan 6 16:23:29 2011 +0100
> >
> > x86, lapic-timer: Increase the max_delta to 31 bits
>
> Indeed. Thanks for digging that up!
>
> tglx

How about a fix like this? I should have taken your advice 9 years ago
to avoid duplicated code :)
https://lkml.org/lkml/2010/5/11/499

From 18450bb67e09f5b472f1ed313d7f87a983cb0ac1 Mon Sep 17 00:00:00 2001
From: Jacob Pan <[email protected]>
Date: Fri, 19 Apr 2019 15:56:06 -0700
Subject: [PATCH] x86/apic: Fix duplicated lapic timer calculation

Local APIC timer clockevent parameters can be calculated based on
platform specific methods. However the code is mostly duplicated
with the interrupt based calibration. This causes further updates
prone to mistakes.

Fixes: 4aed89d ("x86, lapic-timer: Increase the max_delta to 31 bits")

Signed-off-by: Jacob Pan <[email protected]>
---
arch/x86/kernel/apic/apic.c | 46
++++++++++++++++++++++++++------------------- 1 file changed, 27
insertions(+), 19 deletions(-)

diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index b7bcdd7..b2ef91c 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -802,6 +802,24 @@ calibrate_by_pmtimer(long deltapm, long *delta,
long *deltatsc) return 0;
}

+static int __init calculate_lapic_clockevent(void)
+{
+ if (!lapic_timer_frequency)
+ return -1;
+
+ /* Calculate the scaled math multiplication factor */
+ lapic_clockevent.mult =
div_sc(lapic_timer_frequency/APIC_DIVISOR,
+ TICK_NSEC,
lapic_clockevent.shift);
+ lapic_clockevent.max_delta_ns =
+ clockevent_delta2ns(0x7FFFFFFF, &lapic_clockevent);
+ lapic_clockevent.max_delta_ticks = 0x7FFFFFFF;
+ lapic_clockevent.min_delta_ns =
+ clockevent_delta2ns(0xF, &lapic_clockevent);
+ lapic_clockevent.min_delta_ticks = 0xF;
+
+ return 0;
+}
+
static int __init calibrate_APIC_clock(void)
{
struct clock_event_device *levt = this_cpu_ptr(&lapic_events);
@@ -818,18 +836,17 @@ static int __init calibrate_APIC_clock(void)

if (boot_cpu_has(X86_FEATURE_TSC_DEADLINE_TIMER)) {
return 0;
- } else if (lapic_timer_frequency) {
+ }
+
+ if (!calculate_lapic_clockevent()) {
apic_printk(APIC_VERBOSE, "lapic timer already
calibrated %d\n", lapic_timer_frequency);
- lapic_clockevent.mult =
div_sc(lapic_timer_frequency/APIC_DIVISOR,
- TICK_NSEC,
lapic_clockevent.shift);
- lapic_clockevent.max_delta_ns =
- clockevent_delta2ns(0x7FFFFF,
&lapic_clockevent);
- lapic_clockevent.max_delta_ticks = 0x7FFFFF;
- lapic_clockevent.min_delta_ns =
- clockevent_delta2ns(0xF, &lapic_clockevent);
- lapic_clockevent.min_delta_ticks = 0xF;
+ /*
+ * Newer platforms provide early calibration methods
must have always
+ * running local APIC timers, no need for boradcast
timer.
+ */
lapic_clockevent.features &= ~CLOCK_EVT_FEAT_DUMMY;
+
return 0;
}

@@ -869,17 +886,8 @@ static int __init calibrate_APIC_clock(void)
pm_referenced = !calibrate_by_pmtimer(lapic_cal_pm2 -
lapic_cal_pm1, &delta, &deltatsc);

- /* Calculate the scaled math multiplication factor */
- lapic_clockevent.mult = div_sc(delta, TICK_NSEC *
LAPIC_CAL_LOOPS,
- lapic_clockevent.shift);
- lapic_clockevent.max_delta_ns =
- clockevent_delta2ns(0x7FFFFFFF, &lapic_clockevent);
- lapic_clockevent.max_delta_ticks = 0x7FFFFFFF;
- lapic_clockevent.min_delta_ns =
- clockevent_delta2ns(0xF, &lapic_clockevent);
- lapic_clockevent.min_delta_ticks = 0xF;
-
lapic_timer_frequency = (delta * APIC_DIVISOR) /
LAPIC_CAL_LOOPS;
+ calculate_lapic_clockevent();

apic_printk(APIC_VERBOSE, "..... delta %ld\n", delta);
apic_printk(APIC_VERBOSE, "..... mult: %u\n",
lapic_clockevent.mult);
--
2.7.4

Subject: [tip:x86/apic] x86/tsc: Use CPUID.0x16 to calculate missing crystal frequency

Commit-ID: 604dc9170f2435d27da5039a3efd757dceadc684
Gitweb: https://git.kernel.org/tip/604dc9170f2435d27da5039a3efd757dceadc684
Author: Daniel Drake <[email protected]>
AuthorDate: Thu, 9 May 2019 13:54:15 +0800
Committer: Ingo Molnar <[email protected]>
CommitDate: Thu, 9 May 2019 11:06:48 +0200

x86/tsc: Use CPUID.0x16 to calculate missing crystal frequency

native_calibrate_tsc() had a data mapping Intel CPU families
and crystal clock speed, but hardcoded tables are not ideal, and this
approach was already problematic at least in the Skylake X case, as
seen in commit:

b51120309348 ("x86/tsc: Fix erroneous TSC rate on Skylake Xeon")

By examining CPUID data from http://instlatx64.atw.hu/ and units
in the lab, we have found that 3 different scenarios need to be dealt
with, and we can eliminate most of the hardcoded data using an approach a
little more advanced than before:

1. ApolloLake, GeminiLake, CannonLake (and presumably all new chipsets
from this point) report the crystal frequency directly via CPUID.0x15.
That's definitive data that we can rely upon.

2. Skylake, Kabylake and all variants of those two chipsets report a
crystal frequency of zero, however we can calculate the crystal clock
speed by condidering data from CPUID.0x16.

This method correctly distinguishes between the two crystal clock
frequencies present on different Skylake X variants that caused
headaches before.

As the calculations do not quite match the previously-hardcoded values
in some cases (e.g. 23913043Hz instead of 24MHz), TSC refinement is
enabled on all platforms where we had to calculate the crystal
frequency in this way.

3. Denverton (GOLDMONT_X) reports a crystal frequency of zero and does
not support CPUID.0x16, so we leave this entry hardcoded.

Suggested-by: Thomas Gleixner <[email protected]>
Signed-off-by: Daniel Drake <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/kernel/tsc.c | 47 +++++++++++++++++++++++++++--------------------
1 file changed, 27 insertions(+), 20 deletions(-)

diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 15b5e98a86f9..6e6d933fb99c 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -631,31 +631,38 @@ unsigned long native_calibrate_tsc(void)

crystal_khz = ecx_hz / 1000;

- if (crystal_khz == 0) {
- switch (boot_cpu_data.x86_model) {
- case INTEL_FAM6_SKYLAKE_MOBILE:
- case INTEL_FAM6_SKYLAKE_DESKTOP:
- case INTEL_FAM6_KABYLAKE_MOBILE:
- case INTEL_FAM6_KABYLAKE_DESKTOP:
- crystal_khz = 24000; /* 24.0 MHz */
- break;
- case INTEL_FAM6_ATOM_GOLDMONT_X:
- crystal_khz = 25000; /* 25.0 MHz */
- break;
- case INTEL_FAM6_ATOM_GOLDMONT:
- crystal_khz = 19200; /* 19.2 MHz */
- break;
- }
- }
+ /*
+ * Denverton SoCs don't report crystal clock, and also don't support
+ * CPUID.0x16 for the calculation below, so hardcode the 25MHz crystal
+ * clock.
+ */
+ if (crystal_khz == 0 &&
+ boot_cpu_data.x86_model == INTEL_FAM6_ATOM_GOLDMONT_X)
+ crystal_khz = 25000;

- if (crystal_khz == 0)
- return 0;
/*
- * TSC frequency determined by CPUID is a "hardware reported"
+ * TSC frequency reported directly by CPUID is a "hardware reported"
* frequency and is the most accurate one so far we have. This
* is considered a known frequency.
*/
- setup_force_cpu_cap(X86_FEATURE_TSC_KNOWN_FREQ);
+ if (crystal_khz != 0)
+ setup_force_cpu_cap(X86_FEATURE_TSC_KNOWN_FREQ);
+
+ /*
+ * Some Intel SoCs like Skylake and Kabylake don't report the crystal
+ * clock, but we can easily calculate it to a high degree of accuracy
+ * by considering the crystal ratio and the CPU speed.
+ */
+ if (crystal_khz == 0 && boot_cpu_data.cpuid_level >= 0x16) {
+ unsigned int eax_base_mhz, ebx, ecx, edx;
+
+ cpuid(0x16, &eax_base_mhz, &ebx, &ecx, &edx);
+ crystal_khz = eax_base_mhz * 1000 *
+ eax_denominator / ebx_numerator;
+ }
+
+ if (crystal_khz == 0)
+ return 0;

/*
* For Atom SoCs TSC is the only reliable clocksource.