Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753787AbcLIXC3 (ORCPT ); Fri, 9 Dec 2016 18:02:29 -0500 Received: from Galois.linutronix.de ([146.0.238.70]:36163 "EHLO Galois.linutronix.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751064AbcLIXC1 (ORCPT ); Fri, 9 Dec 2016 18:02:27 -0500 Date: Fri, 9 Dec 2016 23:59:21 +0100 (CET) From: Thomas Gleixner To: Roland Scheidegger cc: Ingo Molnar , "H. Peter Anvin" , x86@kernel.org, LKML , Allen Hung Subject: Re: [PATCH] x86/tsc: RFC: re-synchronize TSCs to boot cpu TSC In-Reply-To: <50876e71-ade8-3c3b-db8b-8f5ae136157e@hispeed.ch> Message-ID: References: <1481179042-968-1-git-send-email-sroland@vmware.com> <6194ea73-99dd-ceca-6f8d-e945ca717c75@hispeed.ch> <38f3ca77-ce32-d180-106a-e7242c635394@hispeed.ch> <50876e71-ade8-3c3b-db8b-8f5ae136157e@hispeed.ch> User-Agent: Alpine 2.20 (DEB 67 2015-01-07) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 7379 Lines: 223 On Fri, 9 Dec 2016, Roland Scheidegger wrote: Cc'ed someone from Dell. > Am 09.12.2016 um 18:33 schrieb Thomas Gleixner: > > Can you add the patch below to gather more information? There is a hunk in > > there with an '#if 0' which sets the TSC ADJUST to 0 on boot, which you can > > turn on as second step. > > Ok, here's the results: > ... > TSC ADJUST synchronize: Reference CPU0: -2820267100 CPU1: -2822498296 > TSC target sync skipped > smpboot: Vector locked > smpboot: Vector setup done > smpboot: Clock setup > TSC source sync skipped > smpboot: Target CPU is online I did not expect that to happen. Now I'm puzzled and curious where the machine gets lost after that. See below. > With the #if 0 block activated, it boots up fine, the output was: That does not make any sense at all, but yes, nothing in this context makes sense. > [ 1.038892] x86: Booting SMP configuration: > [ 1.038930] .... node #0, CPUs: #1 > [ 0.171851] TSC ADJUST: CPU1: -2830353064 218577682002 > [ 1.117495] TSC source sync 0 -> 1 runs 3 > [ 0.171852] TSC ADJUST differs: Reference CPU0: -2828600940 CPU1: > -2830353064 > [ 0.171853] TSC ADJUST synchronize: Reference CPU0: 0 CPU1: -2830353064 > [ 1.117497] TSC target sync skip > (And fwiw with my quick hack the lockups disappear to when I change that > back to blast a zero into TSC_ADJ for all cpus.) Right, That's what that hunk does as well. Now what's interesting is that the adjustement of CPU1 in the non write to zero case results in the following: TSC ADJUST: CPU1: -2830353064 218577682002 <-- TSC value TSC ADJUST differs: Reference CPU0: -2828600940 CPU1: -2830353064 We write CPU1 adjust register to -2828600940 which makes the TSC on CPU1 jump forwards by -2828600940 - -2830353064 = 1752124 cycles. In the write to zero case the jump is forward as well, but this time it's huge, i.e. 2830353064 cycles. I tried to wreckage the TSC by writing similar values to the adjust MSR on early boot, but independent of the values and independent of the write to zero part the machine comes up happily all the time. The only difference is that my machine has a somewhat saner BIOS. So the thing might just die in the value add SMM crap, but who knows. In the patch below is another bunch of debug prints which emit the state information of CPU1 during bringup. Maybe that gives a hint where the system gets stuck when you disable the 'write to zero' magic again. The NMI watchdog does not catch anything, right? > The system also came back up fine from suspend with this (well - still > minus graphics...), however disabled tsc clocksource: > > [ 579.931739] Enabling non-boot CPUs ... > [ 579.943107] smpboot: Booting Node 0 Processor 1 APIC 0x2 > [ 579.943189] TSC ADJUST: CPU1: -1504429974 21601834126 Fun, yet another adjust value. Are they set by a random number generator? > [ 579.944093] CPU1 is up > [ 580.458983] clocksource: timekeeping watchdog on CPU1: Marking > clocksource 'tsc' as unstable because the skew is too large: > [ 580.458985] clocksource: 'hpet' wd_now: 587c1 > wd_last: 437c7 mask: ffffffff > [ 580.458986] clocksource: 'tsc' cs_now: > 563963cd8 cs_last: 508f5a02a mask: ffffffffffffffff Ok, that's caused by the fact that we do not sanitize the TSC adjust register on the boot CPU in the resume path. > [ 581.006760] [Firmware Bug]: TSC ADJUST differs: CPU0 0 --> > -1502494750. Restoring We only detect it later and correct it, but that's too late. The untested patch below should cure that. > > This BIOS seems to be rather strange ... > Don't tell me... > > No idea what it's doing, but I think it's safe to say whatever it's > trying to do, it's doing it wrong... Amen to that. I've seen a lot of Value Add BIOSes which broke things left and right, but this one takes it to a new level of insanity. Did you report that back to DELL already? Thanks, tglx 8<-------------------- --- a/arch/x86/include/asm/tsc.h +++ b/arch/x86/include/asm/tsc.h @@ -47,7 +47,7 @@ extern int tsc_clocksource_reliable; */ #ifdef CONFIG_X86_TSC extern bool tsc_store_and_check_tsc_adjust(void); -extern void tsc_verify_tsc_adjust(void); +extern void tsc_verify_tsc_adjust(bool resume); extern void check_tsc_sync_source(int cpu); extern void check_tsc_sync_target(void); #else --- a/arch/x86/kernel/process.c +++ b/arch/x86/kernel/process.c @@ -277,7 +277,7 @@ void exit_idle(void) void arch_cpu_idle_enter(void) { - tsc_verify_tsc_adjust(); + tsc_verify_tsc_adjust(false); local_touch_nmi(); enter_idle(); } --- a/arch/x86/kernel/tsc.c +++ b/arch/x86/kernel/tsc.c @@ -1080,6 +1080,11 @@ static void detect_art(void) static struct clocksource clocksource_tsc; +static void tsc_resume(struct clocksource *cs) +{ + tsc_verify_tsc_adjust(true); +} + /* * We used to compare the TSC to the cycle_last value in the clocksource * structure to avoid a nasty time-warp. This can be observed in a @@ -1112,6 +1117,7 @@ static struct clocksource clocksource_ts .flags = CLOCK_SOURCE_IS_CONTINUOUS | CLOCK_SOURCE_MUST_VERIFY, .archdata = { .vclock_mode = VCLOCK_TSC }, + .resume = tsc_resume, }; void mark_tsc_unstable(char *reason) --- a/arch/x86/kernel/tsc_sync.c +++ b/arch/x86/kernel/tsc_sync.c @@ -30,7 +30,7 @@ struct tsc_adjust { static DEFINE_PER_CPU(struct tsc_adjust, tsc_adjust); -void tsc_verify_tsc_adjust(void) +void tsc_verify_tsc_adjust(bool resume) { struct tsc_adjust *adj = this_cpu_ptr(&tsc_adjust); s64 curval; @@ -39,7 +39,7 @@ void tsc_verify_tsc_adjust(void) return; /* Rate limit the MSR check */ - if (time_before(jiffies, adj->nextcheck)) + if (!resume && time_before(jiffies, adj->nextcheck)) return; adj->nextcheck = jiffies + HZ; @@ -51,7 +51,7 @@ void tsc_verify_tsc_adjust(void) /* Restore the original value */ wrmsrl(MSR_IA32_TSC_ADJUST, adj->adjusted); - if (!adj->warned) { + if (!adj->warned || resume) { pr_warn(FW_BUG "TSC ADJUST differs: CPU%u %lld --> %lld. Restoring\n", smp_processor_id(), adj->adjusted, curval); adj->warned = true; --- a/arch/x86/power/cpu.c +++ b/arch/x86/power/cpu.c @@ -256,6 +256,7 @@ static void notrace __restore_processor_ mtrr_bp_restore(); perf_restore_debug_store(); msr_restore_context(ctxt); + tsc_verify_tsc_adjust(true); } /* Needed by apm.c */ --- a/kernel/cpu.c +++ b/kernel/cpu.c @@ -133,7 +133,9 @@ static int cpuhp_invoke_callback(unsigne if (!cb) return 0; trace_cpuhp_enter(cpu, st->target, state, cb); + pr_info("Invoking state %d CB %pF\n", state, cb); ret = cb(cpu); + pr_info("CB returned %d\n", ret); trace_cpuhp_exit(cpu, st->state, state, ret); return ret; } @@ -153,7 +155,10 @@ static int cpuhp_invoke_callback(unsigne cnt = 0; hlist_for_each(node, &step->list) { trace_cpuhp_multi_enter(cpu, st->target, state, cbm, node); + pr_info("Invoking state %d CB %pF instance %p\n", state, cbm, + node); ret = cbm(cpu, node); + pr_info("CB returned %d\n", ret); trace_cpuhp_exit(cpu, st->state, state, ret); if (ret) goto err; @@ -550,6 +555,7 @@ static void cpuhp_thread_fun(unsigned in st->should_run = false; + pr_info("CPUHP thread running\n"); /* Single callback invocation for [un]install ? */ if (st->single) { if (st->cb_state < CPUHP_AP_ONLINE) {