Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S934210AbcLIRgQ (ORCPT ); Fri, 9 Dec 2016 12:36:16 -0500 Received: from Galois.linutronix.de ([146.0.238.70]:35764 "EHLO Galois.linutronix.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S933819AbcLIRgP (ORCPT ); Fri, 9 Dec 2016 12:36:15 -0500 Date: Fri, 9 Dec 2016 18:33:22 +0100 (CET) From: Thomas Gleixner To: Roland Scheidegger cc: Ingo Molnar , "H. Peter Anvin" , x86@kernel.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH] x86/tsc: RFC: re-synchronize TSCs to boot cpu TSC In-Reply-To: <38f3ca77-ce32-d180-106a-e7242c635394@hispeed.ch> Message-ID: References: <1481179042-968-1-git-send-email-sroland@vmware.com> <6194ea73-99dd-ceca-6f8d-e945ca717c75@hispeed.ch> <38f3ca77-ce32-d180-106a-e7242c635394@hispeed.ch> User-Agent: Alpine 2.20 (DEB 67 2015-01-07) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3514 Lines: 112 On Fri, 9 Dec 2016, Roland Scheidegger wrote: > Am 09.12.2016 um 10:59 schrieb Thomas Gleixner: > > On Fri, 9 Dec 2016, Roland Scheidegger wrote: > >> > >> I saw some system lockups though: > >> When doing a cold boot, this kernel never managed to boot up. The last > >> message seen is: > >> x86: Booting SMP configuration: > >> .... node #0, CPUs: #1 > > > > Weird. That really would be interesting to figure out what goes wrong > > there. What bothers me is that we don't see something like this: > > > >> [ 0.172334] TSC ADJUST differs: Reference CPU0: -577421768610 CPU1: > >> -577423766270 > > > > Can you please apply the debug patch below and provide the output ? > Ok, this is the output (minus some typos maybe...): > > x86: Booting SMP configuration: > .... node #0, CPUs: #1 > TSC ADJUST: CPU1: -2806491604 > TSC source sync 0 -> 1 runs 3 > TSC ADJUST differs: Reference CPU0: -2805503200 CPU1: -2806491604 > TSC ADJUST synchronize: Reference CPU0: -2805503200 CPU1: -2806491604 > TSC target sync skip > TSC source sync skipped > > And that's it. That's a cold boot (power on), right? That looks like I expected. As we know that the CPUs are in the same package simply resynchronizing the TSC ADJUST MSR is enough. And the sync test is skipped as we have resynced TSC ADJUST already to be the same value. What's unexpected is that the thing stops working :( I'm pretty sure the boot cpu hangs in that endless loop waiting for CPU1 to set the online bit, but for whatever reason this does not happen. Can you add the patch below to gather more information? There is a hunk in there with an '#if 0' which sets the TSC ADJUST to 0 on boot, which you can turn on as second step. This BIOS seems to be rather strange ... Thanks. tglx ---- --- a/arch/x86/kernel/tsc_sync.c +++ b/arch/x86/kernel/tsc_sync.c @@ -95,7 +95,7 @@ bool tsc_store_and_check_tsc_adjust(void cur->nextcheck = jiffies + HZ; cur->warned = false; - pr_info("TSC ADJUST: CPU%u: %lld\n", cpu, bootval); + pr_info("TSC ADJUST: CPU%u: %lld %lld\n", cpu, bootval, rdtsc()); /* * Check whether this CPU is the first in a package to come up. In @@ -114,8 +114,15 @@ bool tsc_store_and_check_tsc_adjust(void * the sync mechanism. If that fails we still can yell * about boot values not being consistent. */ - cur->adjusted = bootval; pr_info_once("TSC ADJUST: Boot CPU%u: %lld\n", cpu, bootval); +#if 0 + if (bootval) { + pr_warn("TSC ADJUST: CPU%u: Setting to 0\n", cpu); + bootval = 0; + wrmsrl(MSR_IA32_TSC_ADJUST, bootval); + } +#endif + cur->adjusted = bootval; return false; } --- a/arch/x86/kernel/smpboot.c +++ b/arch/x86/kernel/smpboot.c @@ -245,7 +245,9 @@ static void notrace start_secondary(void * from seeing a half valid vector space. */ lock_vector_lock(); + pr_info("Vector locked\n"); setup_vector_irq(smp_processor_id()); + pr_info("Vector setup done\n"); set_cpu_online(smp_processor_id(), true); unlock_vector_lock(); cpu_set_state_online(smp_processor_id()); @@ -257,8 +259,10 @@ static void notrace start_secondary(void /* to prevent fake stack check failure in clock setup */ boot_init_stack_canary(); + pr_info("Clock setup\n"); x86_cpuinit.setup_percpu_clockev(); + pr_info("Enter idle startup\n"); wmb(); cpu_startup_entry(CPUHP_AP_ONLINE_IDLE); } @@ -1133,7 +1137,7 @@ int native_cpu_up(unsigned int cpu, stru cpu_relax(); touch_nmi_watchdog(); } - + pr_info("Target CPU is online\n"); return 0; }