Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752041AbcLJBzW (ORCPT ); Fri, 9 Dec 2016 20:55:22 -0500 Received: from vie01a-dmta-ch01-2.mx.upcmail.net ([84.116.36.92]:17975 "EHLO vie01a-dmta-ch01-2.mx.upcmail.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751011AbcLJBzU (ORCPT ); Fri, 9 Dec 2016 20:55:20 -0500 X-SourceIP: 77.56.147.151 X-Authenticated-Sender: rscheidegger_lists@hispeed.ch Subject: Re: [PATCH] x86/tsc: RFC: re-synchronize TSCs to boot cpu TSC To: Thomas Gleixner References: <1481179042-968-1-git-send-email-sroland@vmware.com> <6194ea73-99dd-ceca-6f8d-e945ca717c75@hispeed.ch> <38f3ca77-ce32-d180-106a-e7242c635394@hispeed.ch> <50876e71-ade8-3c3b-db8b-8f5ae136157e@hispeed.ch> Cc: Ingo Molnar , "H. Peter Anvin" , x86@kernel.org, LKML , Allen Hung From: Roland Scheidegger Message-ID: Date: Sat, 10 Dec 2016 02:55:16 +0100 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.5.1 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 9642 Lines: 274 Am 09.12.2016 um 23:59 schrieb Thomas Gleixner: > On Fri, 9 Dec 2016, Roland Scheidegger wrote: > > Cc'ed someone from Dell. > >> Am 09.12.2016 um 18:33 schrieb Thomas Gleixner: >>> Can you add the patch below to gather more information? There is a hunk in >>> there with an '#if 0' which sets the TSC ADJUST to 0 on boot, which you can >>> turn on as second step. >> >> Ok, here's the results: >> ... >> TSC ADJUST synchronize: Reference CPU0: -2820267100 CPU1: -2822498296 >> TSC target sync skipped >> smpboot: Vector locked >> smpboot: Vector setup done >> smpboot: Clock setup >> TSC source sync skipped >> smpboot: Target CPU is online > > I did not expect that to happen. Now I'm puzzled and curious where the > machine gets lost after that. See below. > >> With the #if 0 block activated, it boots up fine, the output was: > > That does not make any sense at all, but yes, nothing in this context makes > sense. > >> [ 1.038892] x86: Booting SMP configuration: >> [ 1.038930] .... node #0, CPUs: #1 >> [ 0.171851] TSC ADJUST: CPU1: -2830353064 218577682002 >> [ 1.117495] TSC source sync 0 -> 1 runs 3 >> [ 0.171852] TSC ADJUST differs: Reference CPU0: -2828600940 CPU1: >> -2830353064 >> [ 0.171853] TSC ADJUST synchronize: Reference CPU0: 0 CPU1: -2830353064 >> [ 1.117497] TSC target sync skip > >> (And fwiw with my quick hack the lockups disappear to when I change that >> back to blast a zero into TSC_ADJ for all cpus.) > > Right, That's what that hunk does as well. > > Now what's interesting is that the adjustement of CPU1 in the non write to > zero case results in the following: > > TSC ADJUST: CPU1: -2830353064 218577682002 <-- TSC value > TSC ADJUST differs: Reference CPU0: -2828600940 CPU1: -2830353064 > > We write CPU1 adjust register to -2828600940 which makes the TSC on CPU1 > jump forwards by -2828600940 - -2830353064 = 1752124 cycles. > > In the write to zero case the jump is forward as well, but this time it's > huge, i.e. 2830353064 cycles. > > I tried to wreckage the TSC by writing similar values to the adjust MSR on > early boot, but independent of the values and independent of the write to > zero part the machine comes up happily all the time. > > The only difference is that my machine has a somewhat saner BIOS. So the > thing might just die in the value add SMM crap, but who knows. > > In the patch below is another bunch of debug prints which emit the state > information of CPU1 during bringup. Maybe that gives a hint where the > system gets stuck when you disable the 'write to zero' magic again. > > The NMI watchdog does not catch anything, right? Nope. (Though as mentioned earlier, with my hack when not writing zero it did - but the lockup there was later after all 16 cpus were online, and I only really tried that with the ubuntu 4.4 kernel. I never got to see the full output from that NMI though due to limited screen space, my attempts to try anything different than text mode were met with a blank screen, and from the parts I did see I didn't really see anything interesting albeit that's not saying much as I really have no idea about that code...) With the new patch here's the output (albeit the typing gets a bit annoying...) ... Invoking state 32 CB replay_prepare_cpu+0x0/0xe0 CB returned 0 Invoking state 35 CB rcutree_prepare_cpu+0x0/0x50 CB returned 0 Invoking state 41 CB notify_prepare+0x0/0xa0 CBreturned 0 Invoking state 48 CB bringup_cpu+0x0/0x90 x86: Booting SMP configuration: .... node #0, CPUs: #1 Invoking state 51 CB sched_cpu_starting+0x0/0x60 CB returned 0 Invoking state 62 CB x86_pmu_starting_cpu+0x0/0x20 CB returned 0 TSC ADJUST: CPU1: -2846131604 175264328618 TSC ADJUST differs: Reference CPU0: -2843967660 CPU1: -2846131604 TSC ADJUST synchronize: Reference CPU0: -2843967660 CPU1: -2846131604 TSC target sync skip smpboot: Vector locked TSC source sync 0 -> 1 runs 3 smpboot: Vector setup done smpboot: Clock setup TSC source sync skipped smpboot: Target CPU is online > >> The system also came back up fine from suspend with this (well - still >> minus graphics...), however disabled tsc clocksource: >> >> [ 579.931739] Enabling non-boot CPUs ... >> [ 579.943107] smpboot: Booting Node 0 Processor 1 APIC 0x2 >> [ 579.943189] TSC ADJUST: CPU1: -1504429974 21601834126 > > Fun, yet another adjust value. Are they set by a random number generator? > >> [ 579.944093] CPU1 is up > >> [ 580.458983] clocksource: timekeeping watchdog on CPU1: Marking >> clocksource 'tsc' as unstable because the skew is too large: >> [ 580.458985] clocksource: 'hpet' wd_now: 587c1 >> wd_last: 437c7 mask: ffffffff >> [ 580.458986] clocksource: 'tsc' cs_now: >> 563963cd8 cs_last: 508f5a02a mask: ffffffffffffffff > > Ok, that's caused by the fact that we do not sanitize the TSC adjust > register on the boot CPU in the resume path. Yep. (Which is why my hack hacked the restore path as well - I am still really confused about boot/resume boot cpu initialization...) > >> [ 581.006760] [Firmware Bug]: TSC ADJUST differs: CPU0 0 --> >> -1502494750. Restoring > > We only detect it later and correct it, but that's too late. The untested > patch below should cure that. > >>> This BIOS seems to be rather strange ... >> Don't tell me... >> >> No idea what it's doing, but I think it's safe to say whatever it's >> trying to do, it's doing it wrong... > > Amen to that. I've seen a lot of Value Add BIOSes which broke things left > and right, but this one takes it to a new level of insanity. > > Did you report that back to DELL already? No, I never did (even though the box actually shipped with ubuntu (albeit a really old version)). Since it was basically working - just without TSCs... I was quite happy with using my hack for quite some time, it just got annoying I'll either have to stick to one kernel or reapply it all the time :-). Roland > > Thanks, > > tglx > > 8<-------------------- > --- a/arch/x86/include/asm/tsc.h > +++ b/arch/x86/include/asm/tsc.h > @@ -47,7 +47,7 @@ extern int tsc_clocksource_reliable; > */ > #ifdef CONFIG_X86_TSC > extern bool tsc_store_and_check_tsc_adjust(void); > -extern void tsc_verify_tsc_adjust(void); > +extern void tsc_verify_tsc_adjust(bool resume); > extern void check_tsc_sync_source(int cpu); > extern void check_tsc_sync_target(void); > #else > --- a/arch/x86/kernel/process.c > +++ b/arch/x86/kernel/process.c > @@ -277,7 +277,7 @@ void exit_idle(void) > > void arch_cpu_idle_enter(void) > { > - tsc_verify_tsc_adjust(); > + tsc_verify_tsc_adjust(false); > local_touch_nmi(); > enter_idle(); > } > --- a/arch/x86/kernel/tsc.c > +++ b/arch/x86/kernel/tsc.c > @@ -1080,6 +1080,11 @@ static void detect_art(void) > > static struct clocksource clocksource_tsc; > > +static void tsc_resume(struct clocksource *cs) > +{ > + tsc_verify_tsc_adjust(true); > +} > + > /* > * We used to compare the TSC to the cycle_last value in the clocksource > * structure to avoid a nasty time-warp. This can be observed in a > @@ -1112,6 +1117,7 @@ static struct clocksource clocksource_ts > .flags = CLOCK_SOURCE_IS_CONTINUOUS | > CLOCK_SOURCE_MUST_VERIFY, > .archdata = { .vclock_mode = VCLOCK_TSC }, > + .resume = tsc_resume, > }; > > void mark_tsc_unstable(char *reason) > --- a/arch/x86/kernel/tsc_sync.c > +++ b/arch/x86/kernel/tsc_sync.c > @@ -30,7 +30,7 @@ struct tsc_adjust { > > static DEFINE_PER_CPU(struct tsc_adjust, tsc_adjust); > > -void tsc_verify_tsc_adjust(void) > +void tsc_verify_tsc_adjust(bool resume) > { > struct tsc_adjust *adj = this_cpu_ptr(&tsc_adjust); > s64 curval; > @@ -39,7 +39,7 @@ void tsc_verify_tsc_adjust(void) > return; > > /* Rate limit the MSR check */ > - if (time_before(jiffies, adj->nextcheck)) > + if (!resume && time_before(jiffies, adj->nextcheck)) > return; > > adj->nextcheck = jiffies + HZ; > @@ -51,7 +51,7 @@ void tsc_verify_tsc_adjust(void) > /* Restore the original value */ > wrmsrl(MSR_IA32_TSC_ADJUST, adj->adjusted); > > - if (!adj->warned) { > + if (!adj->warned || resume) { > pr_warn(FW_BUG "TSC ADJUST differs: CPU%u %lld --> %lld. Restoring\n", > smp_processor_id(), adj->adjusted, curval); > adj->warned = true; > --- a/arch/x86/power/cpu.c > +++ b/arch/x86/power/cpu.c > @@ -256,6 +256,7 @@ static void notrace __restore_processor_ > mtrr_bp_restore(); > perf_restore_debug_store(); > msr_restore_context(ctxt); > + tsc_verify_tsc_adjust(true); > } > > /* Needed by apm.c */ > --- a/kernel/cpu.c > +++ b/kernel/cpu.c > @@ -133,7 +133,9 @@ static int cpuhp_invoke_callback(unsigne > if (!cb) > return 0; > trace_cpuhp_enter(cpu, st->target, state, cb); > + pr_info("Invoking state %d CB %pF\n", state, cb); > ret = cb(cpu); > + pr_info("CB returned %d\n", ret); > trace_cpuhp_exit(cpu, st->state, state, ret); > return ret; > } > @@ -153,7 +155,10 @@ static int cpuhp_invoke_callback(unsigne > cnt = 0; > hlist_for_each(node, &step->list) { > trace_cpuhp_multi_enter(cpu, st->target, state, cbm, node); > + pr_info("Invoking state %d CB %pF instance %p\n", state, cbm, > + node); > ret = cbm(cpu, node); > + pr_info("CB returned %d\n", ret); > trace_cpuhp_exit(cpu, st->state, state, ret); > if (ret) > goto err; > @@ -550,6 +555,7 @@ static void cpuhp_thread_fun(unsigned in > > st->should_run = false; > > + pr_info("CPUHP thread running\n"); > /* Single callback invocation for [un]install ? */ > if (st->single) { > if (st->cb_state < CPUHP_AP_ONLINE) { >