2020-02-26 16:48:43

by Prarit Bhargava

[permalink] [raw]
Subject: [PATCH] x86/tsc: Add kernel options to disable CPUID and MSR calibrations

I have had to debug a few unstable x86 systems that required me to block
the CPUID and MSR calibrations and only use the PIT calibration. After
blocking the calibrations I was able to debug and find bugs in firmware
that were eventually fixed and resulted in stable systems.

Patrick Geary also posted a similar patch (see link below) that would have
allowed him to boot overclocked CPUs by skipping the CPUID calibration
which resulted in unstable boots due to timing issues.

Add kernel options to disable the CPUID and MSR calibrations.

Also allow for comma-separated TSC options, for example,

tsc=no_cpuid_calibration,no_msr_calibration,reliable

Link: https://lore.kernel.org/lkml/[email protected]/#r
Co-developed-by: Patrick Geary <[email protected]>
Signed-off-by: Patrick Geary <[email protected]>
Signed-off-by: Prarit Bhargava <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: [email protected]
Cc: Mauro Carvalho Chehab <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Pawan Gupta <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Viresh Kumar <[email protected]>
Cc: Daniel Drake <[email protected]>
Cc: Michael Zhivich <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: [email protected]
---
.../admin-guide/kernel-parameters.txt | 8 +++-
arch/x86/kernel/tsc.c | 44 ++++++++++++++-----
2 files changed, 41 insertions(+), 11 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index dbc22d684627..0316aadfff08 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -4942,7 +4942,7 @@
See Documentation/admin-guide/mm/transhuge.rst
for more details.

- tsc= Disable clocksource stability checks for TSC.
+ tsc=option[,option...] Various TSC options.
Format: <string>
[x86] reliable: mark tsc clocksource as reliable, this
disables clocksource verification at runtime, as well
@@ -4960,6 +4960,12 @@
in situations with strict latency requirements (where
interruptions from clocksource watchdog are not
acceptable).
+ [x86] no_cpuid_calibration: Disable the CPUID TSC
+ calibration. Used in situations where the CPUID
+ TSC khz does not match the actual CPU TSC khz
+ [x86] no_msr_calibration: Disable the MSR TSC
+ calibration. Used in situations where the MSR
+ TSC khz does not match the actual CPU TSC khz.

tsx= [X86] Control Transactional Synchronization
Extensions (TSX) feature in Intel processors that
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 7e322e2daaf5..c949cb833d05 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -45,6 +45,11 @@ static int __read_mostly tsc_unstable;
static DEFINE_STATIC_KEY_FALSE(__use_tsc);

int tsc_clocksource_reliable;
+/*
+ * TSC calibration sequence disablement
+ */
+int calibrate_cpuid_khz_enabled = 1;
+int calibrate_msr_enabled = 1;

static u32 art_to_tsc_numerator;
static u32 art_to_tsc_denominator;
@@ -287,14 +292,30 @@ static int no_tsc_watchdog;

static int __init tsc_setup(char *str)
{
- if (!strcmp(str, "reliable"))
- tsc_clocksource_reliable = 1;
- if (!strncmp(str, "noirqtime", 9))
- no_sched_irq_time = 1;
- if (!strcmp(str, "unstable"))
- mark_tsc_unstable("boot parameter");
- if (!strcmp(str, "nowatchdog"))
- no_tsc_watchdog = 1;
+ while (str) {
+ char *k = strchr(str, ',');
+
+ if (k)
+ *k++ = 0;
+
+ if (!strcmp(str, "reliable"))
+ tsc_clocksource_reliable = 1;
+ if (!strcmp(str, "noirqtime"))
+ no_sched_irq_time = 1;
+ if (!strcmp(str, "unstable"))
+ mark_tsc_unstable("boot parameter");
+ if (!strcmp(str, "nowatchdog"))
+ no_tsc_watchdog = 1;
+ if (!strcmp(str, "no_cpuid_calibration")) {
+ calibrate_cpuid_khz_enabled = 0;
+ pr_info("CPUID khz calibration disabled\n");
+ }
+ if (!strcmp(str, "no_msr_calibration")) {
+ calibrate_cpuid_khz_enabled = 0;
+ pr_info("msr calibration disabled\n");
+ }
+ str = k;
+ }
return 1;
}

@@ -860,9 +881,12 @@ static unsigned long pit_hpet_ptimer_calibrate_cpu(void)
*/
unsigned long native_calibrate_cpu_early(void)
{
- unsigned long flags, fast_calibrate = cpu_khz_from_cpuid();
+ unsigned long flags, fast_calibrate = 0;
+
+ if (calibrate_cpuid_khz_enabled)
+ fast_calibrate = cpu_khz_from_cpuid();

- if (!fast_calibrate)
+ if (!fast_calibrate && calibrate_msr_enabled)
fast_calibrate = cpu_khz_from_msr();
if (!fast_calibrate) {
local_irq_save(flags);
--
2.21.1


2020-02-26 16:55:03

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] x86/tsc: Add kernel options to disable CPUID and MSR calibrations

On Wed, Feb 26, 2020 at 11:43:08AM -0500, Prarit Bhargava wrote:
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index dbc22d684627..0316aadfff08 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -4942,7 +4942,7 @@
> See Documentation/admin-guide/mm/transhuge.rst
> for more details.
>
> - tsc= Disable clocksource stability checks for TSC.
> + tsc=option[,option...] Various TSC options.
> Format: <string>
> [x86] reliable: mark tsc clocksource as reliable, this
> disables clocksource verification at runtime, as well
> @@ -4960,6 +4960,12 @@
> in situations with strict latency requirements (where
> interruptions from clocksource watchdog are not
> acceptable).
> + [x86] no_cpuid_calibration: Disable the CPUID TSC
> + calibration. Used in situations where the CPUID
> + TSC khz does not match the actual CPU TSC khz
> + [x86] no_msr_calibration: Disable the MSR TSC
> + calibration. Used in situations where the MSR
> + TSC khz does not match the actual CPU TSC khz.

Do we want to mention that these situations are mostly broken firmware?
Also do mention that if you disable these you might not boot due to not
having a PIT/HPET at all?

As it stands, I find this text a little too encouraging.

> tsx= [X86] Control Transactional Synchronization
> Extensions (TSX) feature in Intel processors that

2020-02-26 23:28:07

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH] x86/tsc: Add kernel options to disable CPUID and MSR calibrations

Peter Zijlstra <[email protected]> writes:
> On Wed, Feb 26, 2020 at 11:43:08AM -0500, Prarit Bhargava wrote:
>> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
>> index dbc22d684627..0316aadfff08 100644
>> --- a/Documentation/admin-guide/kernel-parameters.txt
>> +++ b/Documentation/admin-guide/kernel-parameters.txt
>> @@ -4942,7 +4942,7 @@
>> See Documentation/admin-guide/mm/transhuge.rst
>> for more details.
>>
>> - tsc= Disable clocksource stability checks for TSC.
>> + tsc=option[,option...] Various TSC options.
>> Format: <string>
>> [x86] reliable: mark tsc clocksource as reliable, this
>> disables clocksource verification at runtime, as well
>> @@ -4960,6 +4960,12 @@
>> in situations with strict latency requirements (where
>> interruptions from clocksource watchdog are not
>> acceptable).
>> + [x86] no_cpuid_calibration: Disable the CPUID TSC
>> + calibration. Used in situations where the CPUID
>> + TSC khz does not match the actual CPU TSC khz
>> + [x86] no_msr_calibration: Disable the MSR TSC
>> + calibration. Used in situations where the MSR
>> + TSC khz does not match the actual CPU TSC khz.
>
> Do we want to mention that these situations are mostly broken firmware?
> Also do mention that if you disable these you might not boot due to not
> having a PIT/HPET at all?

Right. Same discussion as before.

Also why do we want no_cpuid_calibration and no_msr_calibration? How
should Joe User figure out which one to use? This does not make
sense. The point is that the BIOS/Firmware supplied value in system
registers is bogus. So something like "skip_firmware_calibration" might
be better suitable.

Aside of that this really wants to be combined with the ability to
supply the actual frequency on the command line as I suggested in the
other thread to cope with machines which do not expose PIT/HPET or have
broken variants of them.

Thanks,

tglx


2020-02-27 13:00:15

by Prarit Bhargava

[permalink] [raw]
Subject: Re: [PATCH] x86/tsc: Add kernel options to disable CPUID and MSR calibrations



On 2/26/20 6:27 PM, Thomas Gleixner wrote:
> Peter Zijlstra <[email protected]> writes:
>> On Wed, Feb 26, 2020 at 11:43:08AM -0500, Prarit Bhargava wrote:
>>> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
>>> index dbc22d684627..0316aadfff08 100644
>>> --- a/Documentation/admin-guide/kernel-parameters.txt
>>> +++ b/Documentation/admin-guide/kernel-parameters.txt
>>> @@ -4942,7 +4942,7 @@
>>> See Documentation/admin-guide/mm/transhuge.rst
>>> for more details.
>>>
>>> - tsc= Disable clocksource stability checks for TSC.
>>> + tsc=option[,option...] Various TSC options.
>>> Format: <string>
>>> [x86] reliable: mark tsc clocksource as reliable, this
>>> disables clocksource verification at runtime, as well
>>> @@ -4960,6 +4960,12 @@
>>> in situations with strict latency requirements (where
>>> interruptions from clocksource watchdog are not
>>> acceptable).
>>> + [x86] no_cpuid_calibration: Disable the CPUID TSC
>>> + calibration. Used in situations where the CPUID
>>> + TSC khz does not match the actual CPU TSC khz
>>> + [x86] no_msr_calibration: Disable the MSR TSC
>>> + calibration. Used in situations where the MSR
>>> + TSC khz does not match the actual CPU TSC khz.
>>
>> Do we want to mention that these situations are mostly broken firmware?
>> Also do mention that if you disable these you might not boot due to not
>> having a PIT/HPET at all?
>
> Right. Same discussion as before.
>
> Also why do we want no_cpuid_calibration and no_msr_calibration? How


> should Joe User figure out which one to use? This does not make
> sense. The point is that the BIOS/Firmware supplied value in system
> registers is bogus. So something like "skip_firmware_calibration" might
> be better suitable.


no_cpuid_calibration was required for Patrick's case where the CPU was
overclocked and therefore the CPUID khz value was invalid, but the MSR value is
good. I had to skip both to get to the PIT calibration because I had broken FW.
I don't see how a single skip_firmware_calibration covers these cases.

>
> Aside of that this really wants to be combined with the ability to
> supply the actual frequency on the command line as I suggested in the
> other thread to cope with machines which do not expose PIT/HPET or have
> broken variants of them.

tglx, can you give a lore link to the thread?

Thanks,

P.
>
> Thanks,
>
> tglx
>
>