2017-12-22 05:28:05

by Len Brown

[permalink] [raw]
Subject: [PATCH 1/3] x86/tsc: Future-proof native_calibrate_tsc()

From: Len Brown <[email protected]>

If native_calibrate_tsc() can not discover the TSC frequency,
via CPUID or via built-in table, it must return without
setting X86_FEATURE_TSC_KNOWN_FREQ. Otherwise, X86_FEATURE_TSC_KNOWN_FREQ
will prevent TSC refined calibration.

This patch allows Linux to correctly support future Intel hardware,
that has (cpu_khz != tsc_khz), and support for CPUID.15
without support for CPUID.15.crystal_khz.

This patch is needed since X86_FEATURE_TSC_KNOWN_FREQ was added in Linux-4.10:

commit 4ca4df0b7eb0
("x86/tsc: Mark TSC frequency determined by CPUID as known")

If not applied, such systems will run with tsc_khz = cpu_khz,
which may result in under-stated TSC rate, and time-of-day drift.

Signed-off-by: Len Brown <[email protected]>
Cc: Bin Gao <[email protected]>
Cc: <[email protected]> # v4.10+
---
arch/x86/kernel/tsc.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 8ea117f8142e..ce4b71119c36 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -612,6 +612,8 @@ unsigned long native_calibrate_tsc(void)
}
}

+ if (crystal_khz == 0)
+ return 0;
/*
* TSC frequency determined by CPUID is a "hardware reported"
* frequency and is the most accurate one so far we have. This
--
2.14.0-rc0


2017-12-22 05:28:09

by Len Brown

[permalink] [raw]
Subject: [PATCH 2/3] x86/tsc: Fix erroneous TSC rate on Skylake Xeon

From: Len Brown <[email protected]>

Linux-4.9 added INTEL_FAM6_SKYLAKE_X to native_calibrate_tsc():

commit 6baf3d61821f
("x86/tsc: Add additional Intel CPU models to the crystal quirk list")

There are several problems with doing this.

The first is that while SKX servers use a 25 MHz crystal,
SKX workstations (with same model #) use a 24 MHz crystal.
This results in a -4.0% time drift rate on SKX workstations.

While SKX servers do have a 25 MHz crystal, but they too have a problem.
All SKX subject the crystal to an EMI reduction circuit that
reduces its actual frequency by (approximately) -0.25%.
This results in -1 second per 10 minute time drift
as compared to network time.

This issue can also trigger a timer and power problem,
on configurations that use the LAPIC timer (versus the TSC deadline timer).
Clock ticks scheduled with the LAPIC timer arrive a few usec
before the time they are expected (according to the slow TSC).
This causes Linux to poll-idle, when it should be in an idle
power saving state. The idle and clock code do not graciously
recover from this error, sometimes resulting in significant polling
and measurable power impact.

So stop using native_calibrate_tsc() for INTEL_FAM6_SKYLAKE_X.
native_calibrate_tsc() will return 0, boot will run with
tsc_khz = cpu_khz, and the TSC refined calibration will
update tsc_khz to correct for the difference.

This patch restores correctness. Without it, all three of the
issues above occur.

Signed-off-by: Len Brown <[email protected]>
Cc: Prarit Bhargava <[email protected]>
Cc: <[email protected]> # v4.9+
---
arch/x86/kernel/tsc.c | 1 -
1 file changed, 1 deletion(-)

diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index ce4b71119c36..3bf4df7f52d7 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -602,7 +602,6 @@ unsigned long native_calibrate_tsc(void)
case INTEL_FAM6_KABYLAKE_DESKTOP:
crystal_khz = 24000; /* 24.0 MHz */
break;
- case INTEL_FAM6_SKYLAKE_X:
case INTEL_FAM6_ATOM_DENVERTON:
crystal_khz = 25000; /* 25.0 MHz */
break;
--
2.14.0-rc0

2017-12-22 05:28:14

by Len Brown

[permalink] [raw]
Subject: [PATCH 3/3] x86/tsc: Print tsc_khz, when it differs from cpu_khz

From: Len Brown <[email protected]>

When Linux detects the CPU and TSC rate are the same

tsc: Detected 2900.000 MHz processor

tells us both. But if Linux detects a TSC rate that differs
from the CPU rate, print that also:

tsc: Detected 2904.000 MHz TSC

Signed-off-by: Len Brown <[email protected]>
---
arch/x86/kernel/tsc.c | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 3bf4df7f52d7..6077ef5354d5 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -1316,6 +1316,11 @@ void __init tsc_init(void)
(unsigned long)cpu_khz / 1000,
(unsigned long)cpu_khz % 1000);

+ if (cpu_khz != tsc_khz)
+ pr_info("Detected %lu.%03lu MHz TSC",
+ (unsigned long)tsc_khz / 1000,
+ (unsigned long)tsc_khz % 1000);
+
/* Sanitize TSC ADJUST before cyc2ns gets initialized */
tsc_store_and_check_tsc_adjust(true);

--
2.14.0-rc0

2018-01-02 15:36:37

by Prarit Bhargava

[permalink] [raw]
Subject: Re: [PATCH 2/3] x86/tsc: Fix erroneous TSC rate on Skylake Xeon



On 12/22/2017 12:27 AM, Len Brown wrote:
> From: Len Brown <[email protected]>
>
> Linux-4.9 added INTEL_FAM6_SKYLAKE_X to native_calibrate_tsc():
>
> commit 6baf3d61821f
> ("x86/tsc: Add additional Intel CPU models to the crystal quirk list")
>
> There are several problems with doing this.
>
> The first is that while SKX servers use a 25 MHz crystal,
> SKX workstations (with same model #) use a 24 MHz crystal.
> This results in a -4.0% time drift rate on SKX workstations.
>
> While SKX servers do have a 25 MHz crystal, but they too have a problem.
> All SKX subject the crystal to an EMI reduction circuit that
> reduces its actual frequency by (approximately) -0.25%.
> This results in -1 second per 10 minute time drift
> as compared to network time.
>
> This issue can also trigger a timer and power problem,
> on configurations that use the LAPIC timer (versus the TSC deadline timer).
> Clock ticks scheduled with the LAPIC timer arrive a few usec
> before the time they are expected (according to the slow TSC).
> This causes Linux to poll-idle, when it should be in an idle
> power saving state. The idle and clock code do not graciously
> recover from this error, sometimes resulting in significant polling
> and measurable power impact.
>
> So stop using native_calibrate_tsc() for INTEL_FAM6_SKYLAKE_X.
> native_calibrate_tsc() will return 0, boot will run with
> tsc_khz = cpu_khz, and the TSC refined calibration will
> update tsc_khz to correct for the difference.
>
> This patch restores correctness. Without it, all three of the
> issues above occur.
>
> Signed-off-by: Len Brown <[email protected]>
> Cc: Prarit Bhargava <[email protected]>
> Cc: <[email protected]> # v4.9+
> ---
> arch/x86/kernel/tsc.c | 1 -
> 1 file changed, 1 deletion(-)
>
> diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
> index ce4b71119c36..3bf4df7f52d7 100644
> --- a/arch/x86/kernel/tsc.c
> +++ b/arch/x86/kernel/tsc.c
> @@ -602,7 +602,6 @@ unsigned long native_calibrate_tsc(void)
> case INTEL_FAM6_KABYLAKE_DESKTOP:
> crystal_khz = 24000; /* 24.0 MHz */
> break;
> - case INTEL_FAM6_SKYLAKE_X:
> case INTEL_FAM6_ATOM_DENVERTON:
> crystal_khz = 25000; /* 25.0 MHz */
> break;
>

Sorry, Len I didn't realize you had sent this multiple times.

I lifted this code directly from tools/power/x86/turbostat/turbostat.c:4155
-- is that code incorrect too?

P.

2018-01-02 21:09:41

by Len Brown

[permalink] [raw]
Subject: Re: [PATCH 2/3] x86/tsc: Fix erroneous TSC rate on Skylake Xeon

On Tue, Jan 2, 2018 at 10:36 AM, Prarit Bhargava <[email protected]> wrote:
>
>
> On 12/22/2017 12:27 AM, Len Brown wrote:
>> From: Len Brown <[email protected]>
>>
>> Linux-4.9 added INTEL_FAM6_SKYLAKE_X to native_calibrate_tsc():

>
> Sorry, Len I didn't realize you had sent this multiple times.
>
> I lifted this code directly from tools/power/x86/turbostat/turbostat.c:4155
> -- is that code incorrect too?

yes, also a bug in turbostat -- fix is in the turbostat queue.

--
Len Brown, Intel Open Source Technology Center

Subject: [tip:x86/urgent] x86/tsc: Future-proof native_calibrate_tsc()

Commit-ID: da4ae6c4a0b8dee5a5377a385545d2250fa8cddb
Gitweb: https://git.kernel.org/tip/da4ae6c4a0b8dee5a5377a385545d2250fa8cddb
Author: Len Brown <[email protected]>
AuthorDate: Fri, 22 Dec 2017 00:27:54 -0500
Committer: Thomas Gleixner <[email protected]>
CommitDate: Sun, 14 Jan 2018 12:14:50 +0100

x86/tsc: Future-proof native_calibrate_tsc()

If the crystal frequency cannot be determined via CPUID(15).crystal_khz or
the built-in table then native_calibrate_tsc() will still set the
X86_FEATURE_TSC_KNOWN_FREQ flag which prevents the refined TSC calibration.

As a consequence such systems use cpu_khz for the TSC frequency which is
incorrect when cpu_khz != tsc_khz resulting in time drift.

Return early when the crystal frequency cannot be retrieved without setting
the X86_FEATURE_TSC_KNOWN_FREQ flag. This ensures that the refined TSC
calibration is invoked.

[ tglx: Steam-blastered changelog. Sigh ]

Fixes: 4ca4df0b7eb0 ("x86/tsc: Mark TSC frequency determined by CPUID as known")
Signed-off-by: Len Brown <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: Bin Gao <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/0fe2503aa7d7fc69137141fc705541a78101d2b9.1513920414.git.len.brown@intel.com

---
arch/x86/kernel/tsc.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 8ea117f..ce4b711 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -612,6 +612,8 @@ unsigned long native_calibrate_tsc(void)
}
}

+ if (crystal_khz == 0)
+ return 0;
/*
* TSC frequency determined by CPUID is a "hardware reported"
* frequency and is the most accurate one so far we have. This

Subject: [tip:x86/urgent] x86/tsc: Fix erroneous TSC rate on Skylake Xeon

Commit-ID: b511203093489eb1829cb4de86e8214752205ac6
Gitweb: https://git.kernel.org/tip/b511203093489eb1829cb4de86e8214752205ac6
Author: Len Brown <[email protected]>
AuthorDate: Fri, 22 Dec 2017 00:27:55 -0500
Committer: Thomas Gleixner <[email protected]>
CommitDate: Sun, 14 Jan 2018 12:14:50 +0100

x86/tsc: Fix erroneous TSC rate on Skylake Xeon

The INTEL_FAM6_SKYLAKE_X hardcoded crystal_khz value of 25MHZ is
problematic:

- SKX workstations (with same model # as server variants) use a 24 MHz
crystal. This results in a -4.0% time drift rate on SKX workstations.

- SKX servers subject the crystal to an EMI reduction circuit that reduces its
actual frequency by (approximately) -0.25%. This results in -1 second per
10 minute time drift as compared to network time.

This issue can also trigger a timer and power problem, on configurations
that use the LAPIC timer (versus the TSC deadline timer). Clock ticks
scheduled with the LAPIC timer arrive a few usec before the time they are
expected (according to the slow TSC). This causes Linux to poll-idle, when
it should be in an idle power saving state. The idle and clock code do not
graciously recover from this error, sometimes resulting in significant
polling and measurable power impact.

Stop using native_calibrate_tsc() for INTEL_FAM6_SKYLAKE_X.
native_calibrate_tsc() will return 0, boot will run with tsc_khz = cpu_khz,
and the TSC refined calibration will update tsc_khz to correct for the
difference.

[ tglx: Sanitized change log ]

Fixes: 6baf3d61821f ("x86/tsc: Add additional Intel CPU models to the crystal quirk list")
Signed-off-by: Len Brown <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: Prarit Bhargava <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/ff6dcea166e8ff8f2f6a03c17beab2cb436aa779.1513920414.git.len.brown@intel.com

---
arch/x86/kernel/tsc.c | 1 -
1 file changed, 1 deletion(-)

diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index ce4b711..3bf4df7 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -602,7 +602,6 @@ unsigned long native_calibrate_tsc(void)
case INTEL_FAM6_KABYLAKE_DESKTOP:
crystal_khz = 24000; /* 24.0 MHz */
break;
- case INTEL_FAM6_SKYLAKE_X:
case INTEL_FAM6_ATOM_DENVERTON:
crystal_khz = 25000; /* 25.0 MHz */
break;

Subject: [tip:x86/urgent] x86/tsc: Print tsc_khz, when it differs from cpu_khz

Commit-ID: 4b5b2127238e689ee18aa6752959751dd61c4c73
Gitweb: https://git.kernel.org/tip/4b5b2127238e689ee18aa6752959751dd61c4c73
Author: Len Brown <[email protected]>
AuthorDate: Fri, 22 Dec 2017 00:27:56 -0500
Committer: Thomas Gleixner <[email protected]>
CommitDate: Sun, 14 Jan 2018 12:14:50 +0100

x86/tsc: Print tsc_khz, when it differs from cpu_khz

If CPU and TSC frequency are the same the printout of the CPU frequency is
valid for the TSC as well:

tsc: Detected 2900.000 MHz processor

If the TSC frequency is different there is no information in dmesg. Add a
conditional printout:

tsc: Detected 2904.000 MHz TSC

Signed-off-by: Len Brown <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/537b342debcd8e8aebc8d631015dcdf9f9ba8a26.1513920414.git.len.brown@intel.com

---
arch/x86/kernel/tsc.c | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 3bf4df7..e169e85 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -1316,6 +1316,12 @@ void __init tsc_init(void)
(unsigned long)cpu_khz / 1000,
(unsigned long)cpu_khz % 1000);

+ if (cpu_khz != tsc_khz) {
+ pr_info("Detected %lu.%03lu MHz TSC",
+ (unsigned long)tsc_khz / 1000,
+ (unsigned long)tsc_khz % 1000);
+ }
+
/* Sanitize TSC ADJUST before cyc2ns gets initialized */
tsc_store_and_check_tsc_adjust(true);