2016-10-11 21:05:44

by Bin Gao

Subject: Re: [PATCH v2] x86/tsc: Set X86_FEATURE_TSC_RELIABLE to skip refined calibration

On Fri, Aug 26, 2016 at 12:14:58PM +0200, Thomas Gleixner wrote:
> On Fri, 26 Aug 2016, Thomas Gleixner wrote:
> > On Thu, 25 Aug 2016, Bin Gao wrote:
> > > On Wed, Aug 24, 2016 at 10:51:20AM +0200, Thomas Gleixner wrote:
> > > > On Tue, 16 Aug 2016, Bin Gao wrote:
> > > > > On some newer Intel x86 processors/SoCs the TSC frequency can be directly
> > > > > calculated by factors read from specific MSR registers or from a cpuid
> > > > > leaf (0x15). TSC frequency calculated by native msr/cpuid is absolutely
> > > > > accurate, so we should always skip calibrating TSC against another clock,
> > > > > e.g. PIT, HPET, etc. So we want to skip the refined calibration by setting
> > > > > the X86_FEATURE_TSC_RELIABLE flag. Existing code setting the flag by
> > > > > set_cpu_cap() doesn't work as the flag is cleared later in identify_cpu().
> > > > > A CPU caps flag is only preserved if it's set by setup_force_cpu_cap().
> > > > > This patch converts set_cpu_cap() to setup_force_cpu_cap() to ensure
> > > > > the refined calibration is skipped.
> > > > >
> > > > > We ran a test on an Intel CherryTrail platform: the 24-hour time drift is
> > > > > 3.6 seconds if the refined calibration is not skipped, while the drift is
> > > > > less than 0.6 seconds when the refined calibration is skipped.
> > > > >
> > > > > Correctly setting the X86_FEATURE_TSC_RELIABLE flag also guarantees TSC is
> > > > > not monitored by the timekeeping watchdog, because on most of these systems
> > > > > TSC is the only reliable clocksource. HPET, for instance, works but may not
> > > > > be reliable. So the kernel may report that a physically reliable TSC is not
> > > > > reliable just because a physically unreliable HPET is acting as the
> > > > > timekeeping watchdog.
> > > >
> > > > What about non SoC systems where the MSR is available, but we still see that
> > > > cross socket TSC wreckage? This change will prevent the watchdog from
> > > > detecting that.
> > >
> > > The MSR is only available on Intel Atom SoCs. There is no such multi-socket system.
> >
> > Fair enough.
>
> Second thoughts. We should separate the calibration aspect from the reliability
> aspect.
>
> If a MSR/CPUID readout provides reliable calibration then this does not tell
> us about the reliability (i.e. no watchdog required). So having two flags for
> this - and sure, you can set both on those SoCs - is the proper solution.
>
> Thanks,
>
> tglx


Hi Thomas,

The Linux kernel does assume that a reliable calibration implies reliability
(i.e. no watchdog required). I'm posting some code pieces to explain.

X86_FEATURE_TSC_RELIABLE is referenced in only two places, as shown below.

As you can see from init_tsc_clocksource(), X86_FEATURE_TSC_RELIABLE acts
as a switch for the delayed calibration work: the delayed calibration is
skipped if X86_FEATURE_TSC_RELIABLE is set, and performed otherwise.

In check_system_tsc_reliable(), X86_FEATURE_TSC_RELIABLE sets
tsc_clocksource_reliable, which in turn lets TSC act as a clocksource
watchdog, i.e. watch other clocksources instead of being watched by them.

So X86_FEATURE_TSC_RELIABLE really means two things:
1) The calibrated (or directly calculated) result is trustworthy, so the
delayed calibration is skipped.
2) TSC is reliable, so the clocksource framework won't monitor it; instead
TSC acts as a watchdog monitoring other clocksources.

X86_FEATURE_TSC_RELIABLE is also set from only two places:
arch/x86/platform/intel-mid/mfld.c and arch/x86/platform/intel-mid/mrfld.c,
which are for Intel Atom SoCs with MSR based TSC frequency calculation.

The patch I'm proposing sets this flag (X86_FEATURE_TSC_RELIABLE) for
another case: Intel processors/SoCs with CPUID based TSC frequency calculation.

arch/x86/kernel/tsc.c:
static void __init check_system_tsc_reliable(void)
{
	...... (lines ignored)

	if (boot_cpu_has(X86_FEATURE_TSC_RELIABLE))
		/* This flag is used by init_tsc_clocksource(), see below. */
		tsc_clocksource_reliable = 1;
}

arch/x86/kernel/tsc.c:
static int __init init_tsc_clocksource(void)
{
	...... (lines ignored)

	if (tsc_clocksource_reliable)
		clocksource_tsc.flags &= ~CLOCK_SOURCE_MUST_VERIFY;

	...... (lines ignored)

	/*
	 * Trust the results of the earlier calibration on systems
	 * exporting a reliable TSC.
	 */
	if (boot_cpu_has(X86_FEATURE_TSC_RELIABLE)) {
		clocksource_register_khz(&clocksource_tsc, tsc_khz);
		return 0;
	}
	schedule_delayed_work(&tsc_irqwork, 0);
	return 0;
}

Thanks,
Bin



2016-10-12 07:56:38

by Thomas Gleixner

Subject: Re: [PATCH v2] x86/tsc: Set X86_FEATURE_TSC_RELIABLE to skip refined calibration

On Tue, 11 Oct 2016, Bin Gao wrote:
> On Fri, Aug 26, 2016 at 12:14:58PM +0200, Thomas Gleixner wrote:
>
> The Linux kernel does assume that a reliable calibration implies reliability
> (i.e. no watchdog required). I'm posting some code pieces to explain.

I know that and I know exactly how all that works. And I certainly did not
ask for an explanation of the current state of affairs. Here is what I
wrote:

> > Second thoughts. We should separate the calibration aspect from the reliability
> > aspect.
> >
> > If a MSR/CPUID readout provides reliable calibration then this does not tell
> > us about the reliability (i.e. no watchdog required). So having two flags for
> > this - and sure, you can set both on those SoCs - is the proper solution.

In other words: I want to have two separate flags:

1) FEATURE_KNOWN_FREQUENCY - Grab the frequency from CPUID/MSR or whatever
and skip the whole calibration thing

2) FEATURE_RELIABLE - Do not invoke the watchdog

Thanks,

tglx






2016-10-13 23:10:58

by Bin Gao

Subject: [PATCH v3] x86/tsc: add X86_FEATURE_TSC_KNOWN_FREQ flag

The X86_FEATURE_TSC_RELIABLE flag in the Linux kernel implies both reliable
(at runtime) and trustworthy (at calibration). But reliable running and
trustworthy calibration are logically independent. Per Thomas Gleixner's
suggestion we would like to split this flag into two separate flags:
X86_FEATURE_TSC_RELIABLE - running reliably
X86_FEATURE_TSC_KNOWN_FREQ - frequency is known (no calibration required)
These two flags allow the Linux kernel to act differently based on the
processor/SoC's capability, i.e. no watchdog on TSC if TSC is reliable,
and no calibration if the TSC frequency is known.

The current Linux kernel already guarantees that calibration is skipped for
processors that can report the TSC frequency by CPUID or MSR. However, the
delayed calibration is still not skipped for these CPUID/MSR capable
processors. The new flag X86_FEATURE_TSC_KNOWN_FREQ added by this patch
guarantees that the delayed calibration is skipped.

Signed-off-by: Bin Gao <[email protected]>
---
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/kernel/tsc.c | 11 ++++++++++-
arch/x86/kernel/tsc_msr.c | 6 ++++++
arch/x86/platform/intel-mid/mfld.c | 7 +++++--
arch/x86/platform/intel-mid/mrfld.c | 6 ++++--
5 files changed, 26 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 1188bc8..2df6e86 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -106,6 +106,7 @@
#define X86_FEATURE_APERFMPERF ( 3*32+28) /* APERFMPERF */
#define X86_FEATURE_EAGER_FPU ( 3*32+29) /* "eagerfpu" Non lazy FPU restore */
#define X86_FEATURE_NONSTOP_TSC_S3 ( 3*32+30) /* TSC doesn't stop in S3 state */
+#define X86_FEATURE_TSC_KNOWN_FREQ ( 3*32+31) /* TSC has known frequency */

/* Intel-defined CPU features, CPUID level 0x00000001 (ecx), word 4 */
#define X86_FEATURE_XMM3 ( 4*32+ 0) /* "pni" SSE-3 */
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 46b2f41..aed2dc3 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -702,6 +702,15 @@ unsigned long native_calibrate_tsc(void)
}
}

+ setup_force_cpu_cap(X86_FEATURE_TSC_KNOWN_FREQ);
+
+ /*
+ * For Atom SoCs TSC is the only reliable clocksource.
+ * Mark TSC reliable so no watchdog on it.
+ */
+ if (boot_cpu_data.x86_model == INTEL_FAM6_ATOM_GOLDMONT)
+ setup_force_cpu_cap(X86_FEATURE_TSC_RELIABLE);
+
return crystal_khz * ebx_numerator / eax_denominator;
}

@@ -1286,7 +1295,7 @@ static int __init init_tsc_clocksource(void)
* Trust the results of the earlier calibration on systems
* exporting a reliable TSC.
*/
- if (boot_cpu_has(X86_FEATURE_TSC_RELIABLE)) {
+ if (boot_cpu_has(X86_FEATURE_TSC_KNOWN_FREQ)) {
clocksource_register_khz(&clocksource_tsc, tsc_khz);
return 0;
}
diff --git a/arch/x86/kernel/tsc_msr.c b/arch/x86/kernel/tsc_msr.c
index 0fe720d..8c33292 100644
--- a/arch/x86/kernel/tsc_msr.c
+++ b/arch/x86/kernel/tsc_msr.c
@@ -100,5 +100,11 @@ unsigned long cpu_khz_from_msr(void)
#ifdef CONFIG_X86_LOCAL_APIC
lapic_timer_frequency = (freq * 1000) / HZ;
#endif
+
+ setup_force_cpu_cap(X86_FEATURE_TSC_KNOWN_FREQ);
+
+ /* Mark TSC reliable */
+ setup_force_cpu_cap(X86_FEATURE_TSC_RELIABLE);
+
return res;
}
diff --git a/arch/x86/platform/intel-mid/mfld.c b/arch/x86/platform/intel-mid/mfld.c
index 1eb47b6..6724ab9b 100644
--- a/arch/x86/platform/intel-mid/mfld.c
+++ b/arch/x86/platform/intel-mid/mfld.c
@@ -49,8 +49,11 @@ static unsigned long __init mfld_calibrate_tsc(void)
fast_calibrate = ratio * fsb;
pr_debug("read penwell tsc %lu khz\n", fast_calibrate);
lapic_timer_frequency = fsb * 1000 / HZ;
- /* mark tsc clocksource as reliable */
- set_cpu_cap(&boot_cpu_data, X86_FEATURE_TSC_RELIABLE);
+
+ setup_force_cpu_cap(X86_FEATURE_TSC_KNOWN_FREQ);
+
+ /* Mark TSC reliable */
+ setup_force_cpu_cap(X86_FEATURE_TSC_RELIABLE);

return fast_calibrate;
}
diff --git a/arch/x86/platform/intel-mid/mrfld.c b/arch/x86/platform/intel-mid/mrfld.c
index 59253db..c8b9870 100644
--- a/arch/x86/platform/intel-mid/mrfld.c
+++ b/arch/x86/platform/intel-mid/mrfld.c
@@ -78,8 +78,10 @@ static unsigned long __init tangier_calibrate_tsc(void)
pr_debug("Setting lapic_timer_frequency = %d\n",
lapic_timer_frequency);

- /* mark tsc clocksource as reliable */
- set_cpu_cap(&boot_cpu_data, X86_FEATURE_TSC_RELIABLE);
+ setup_force_cpu_cap(X86_FEATURE_TSC_KNOWN_FREQ);
+
+ /* Mark TSC reliable */
+ setup_force_cpu_cap(X86_FEATURE_TSC_RELIABLE);

return fast_calibrate;
}
--
1.9.1

2016-10-20 10:01:08

by Thomas Gleixner

Subject: Re: [PATCH v3] x86/tsc: add X86_FEATURE_TSC_KNOWN_FREQ flag

On Thu, 13 Oct 2016, Bin Gao wrote:
> @@ -702,6 +702,15 @@ unsigned long native_calibrate_tsc(void)
> }
> }
>
> + setup_force_cpu_cap(X86_FEATURE_TSC_KNOWN_FREQ);
> +
> + /*
> + * For Atom SoCs TSC is the only reliable clocksource.
> + * Mark TSC reliable so no watchdog on it.
> + */
> + if (boot_cpu_data.x86_model == INTEL_FAM6_ATOM_GOLDMONT)
> + setup_force_cpu_cap(X86_FEATURE_TSC_RELIABLE);
> +

Right. That's what I wanted to see, but please split this into two patches:

#1 Split the TSC flags
#2 Set the flag for Goldmont

We do not mix design changes with hw support changes.

Thanks,

tglx

2016-10-20 10:17:52

by Peter Zijlstra

Subject: Re: [PATCH v3] x86/tsc: add X86_FEATURE_TSC_KNOWN_FREQ flag

On Thu, Oct 20, 2016 at 11:57:03AM +0200, Thomas Gleixner wrote:
> On Thu, 13 Oct 2016, Bin Gao wrote:
> > @@ -702,6 +702,15 @@ unsigned long native_calibrate_tsc(void)
> > }
> > }
> >
> > + setup_force_cpu_cap(X86_FEATURE_TSC_KNOWN_FREQ);
> > +
> > + /*
> > + * For Atom SoCs TSC is the only reliable clocksource.
> > + * Mark TSC reliable so no watchdog on it.
> > + */
> > + if (boot_cpu_data.x86_model == INTEL_FAM6_ATOM_GOLDMONT)
> > + setup_force_cpu_cap(X86_FEATURE_TSC_RELIABLE);
> > +

AFAICT setting TSC_RELIABLE also skips the check_tsc_warp() tests in
tsc_sync.c.

This means that if someone does a Goldmont BIOS with 'features', we'll
never detect the wreckage :-/

2016-10-20 19:40:47

by Thomas Gleixner

Subject: Re: [PATCH v3] x86/tsc: add X86_FEATURE_TSC_KNOWN_FREQ flag

On Thu, 20 Oct 2016, Peter Zijlstra wrote:
> On Thu, Oct 20, 2016 at 11:57:03AM +0200, Thomas Gleixner wrote:
> > On Thu, 13 Oct 2016, Bin Gao wrote:
> > > @@ -702,6 +702,15 @@ unsigned long native_calibrate_tsc(void)
> > > }
> > > }
> > >
> > > + setup_force_cpu_cap(X86_FEATURE_TSC_KNOWN_FREQ);
> > > +
> > > + /*
> > > + * For Atom SoCs TSC is the only reliable clocksource.
> > > + * Mark TSC reliable so no watchdog on it.
> > > + */
> > > + if (boot_cpu_data.x86_model == INTEL_FAM6_ATOM_GOLDMONT)
> > > + setup_force_cpu_cap(X86_FEATURE_TSC_RELIABLE);
> > > +
>
> AFAICT setting TSC_RELIABLE also skips the check_tsc_warp() tests in
> tsc_sync.c.
>
> This means that if someone does a Goldmont BIOS with 'features', we'll
> never detect the wreckage :-/

Well, we have the same issue on other platforms/models which set the
reliable flag.

So one sanity check we can do is to read the IA32_TSC_ADJUST MSR on all
cores. They should all have the same value (usually 0) or at least have a
very minimal delta. If that's off by more than 1us then something is fishy,
especially on single-socket systems. We could at least WARN about it.

We could do this in idle occasionally as well, so we can detect the dreaded
"SMI wants to hide the cycles" crapola.

Thanks,

tglx

2016-10-21 05:47:24

by Peter Zijlstra

Subject: Re: [PATCH v3] x86/tsc: add X86_FEATURE_TSC_KNOWN_FREQ flag

On Thu, Oct 20, 2016 at 09:37:50PM +0200, Thomas Gleixner wrote:

> Well, we have the same issue on other platforms/models which set the
> reliable flag.

I was not aware we had other platforms doing this, git grep tells me
intel-mid does this as well..

> So one sanity check we can do is to read the IA32_TSC_ADJUST MSR on all
> cores. They should all have the same value (usually 0) or at least have a
> very minimal delta. If that's off by more than 1us then something is fishy
> especially on single socket systems. We could at least WARN about it.
>
> We could do this in idle occasionally as well, so we can detect the dreaded
> "SMI wants to hide the cycles" crapola.

Indeed, that sounds like the best we can; and probably should; do.

2016-10-21 08:08:47

by Thomas Gleixner

Subject: Re: [PATCH v3] x86/tsc: add X86_FEATURE_TSC_KNOWN_FREQ flag

On Fri, 21 Oct 2016, Peter Zijlstra wrote:
> On Thu, Oct 20, 2016 at 09:37:50PM +0200, Thomas Gleixner wrote:
>
> > Well, we have the same issue on other platforms/models which set the
> > reliable flag.
>
> I was not aware we had other platforms doing this, git grep tells me
> intel-mid does this as well..
>
> > So one sanity check we can do is to read the IA32_TSC_ADJUST MSR on all
> > cores. They should all have the same value (usually 0) or at least have a
> > very minimal delta. If that's off by more than 1us then something is fishy
> > especially on single socket systems. We could at least WARN about it.
> >
> > We could do this in idle occasionally as well, so we can detect the dreaded
> > "SMI wants to hide the cycles" crapola.
>
> Indeed, that sounds like the best we can; and probably should; do.

I'll have a look at that in the next days.

Thanks,

tglx