Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753676AbcLIKDB (ORCPT ); Fri, 9 Dec 2016 05:03:01 -0500 Received: from Galois.linutronix.de ([146.0.238.70]:48114 "EHLO Galois.linutronix.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753113AbcLIKC7 (ORCPT ); Fri, 9 Dec 2016 05:02:59 -0500 Date: Fri, 9 Dec 2016 10:59:51 +0100 (CET) From: Thomas Gleixner To: Roland Scheidegger cc: sroland@vmware.com, Ingo Molnar , "H. Peter Anvin" , x86@kernel.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH] x86/tsc: RFC: re-synchronize TSCs to boot cpu TSC In-Reply-To: <6194ea73-99dd-ceca-6f8d-e945ca717c75@hispeed.ch> Message-ID: References: <1481179042-968-1-git-send-email-sroland@vmware.com> <6194ea73-99dd-ceca-6f8d-e945ca717c75@hispeed.ch> User-Agent: Alpine 2.20 (DEB 67 2015-01-07) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3758 Lines: 124 On Fri, 9 Dec 2016, Roland Scheidegger wrote: > > I saw some system lockups though: > When doing a cold boot, this kernel never managed to boot up. The last > message seen is: > x86: Booting SMP configuration: > .... node #0, CPUs: #1 Weird. That really would be interesting to figure out what goes wrong there. What bothers me is that we don't see something like this: > [ 0.172334] TSC ADJUST differs: Reference CPU0: -577421768610 CPU1: > -577423766270 Can you please apply the debug patch below and provide the output ? > [ 0.094492] x86: Booting SMP configuration: > [ 0.094534] .... node #0, CPUs: #1 > [ 0.172334] TSC ADJUST differs: Reference CPU0: -577421768610 CPU1: > -577423766270 What on earth is this BIOS doing? That's a couple of minutes back in time. And the difference between the max and min adjust value is 2050932 cycles. > Without the patches on cold boot it just was as expected: > [ 0.093700] x86: Booting SMP configuration: > [ 0.093737] .... node #0, CPUs: #1 > [ 0.174304] TSC synchronization [CPU#0 -> CPU#1]: > [ 0.174375] Measured 1837188 cycles TSC warp between CPUs, turning > off TSC clock. Not surprising given the above numbers. Thanks, tglx 8<------------------------- --- a/arch/x86/kernel/tsc_sync.c +++ b/arch/x86/kernel/tsc_sync.c @@ -95,6 +95,8 @@ bool tsc_store_and_check_tsc_adjust(void cur->nextcheck = jiffies + HZ; cur->warned = false; + pr_info("TSC ADJUST: CPU%u: %lld\n", cpu, bootval); + /* * Check whether this CPU is the first in a package to come up. In * this case do not check the boot value against another package @@ -282,17 +284,22 @@ void check_tsc_sync_source(int cpu) else atomic_set(&test_runs, 3); retry: + pr_info("TSC source sync %d -> %d runs %d\n", smp_processor_id(), + cpu, atomic_read(&test_runs)); /* * Wait for the target to start or to skip the test: */ while (atomic_read(&start_count) != cpus - 1) { if (atomic_read(&skip_test) > 0) { + pr_info("TSC source sync skipped\n"); atomic_set(&skip_test, 0); return; } cpu_relax(); } + pr_info("TSC source sync starting\n"); + /* * Trigger the target to continue into the measurement too: */ @@ -300,6 +307,7 @@ void check_tsc_sync_source(int cpu) check_tsc_warp(loop_timeout(cpu)); + pr_info("TSC source sync wait for stop\n"); while (atomic_read(&stop_count) != cpus-1) cpu_relax(); @@ -311,7 +319,7 @@ void check_tsc_sync_source(int cpu) if (!nr_warps) { atomic_set(&test_runs, 0); - pr_debug("TSC synchronization [CPU#%d -> CPU#%d]: passed\n", + pr_info("TSC synchronization [CPU#%d -> CPU#%d]: passed\n", smp_processor_id(), cpu); } else if (atomic_dec_and_test(&test_runs) || random_warps) { @@ -367,6 +375,7 @@ void check_tsc_sync_target(void) * successful skip the test. */ if (tsc_store_and_check_tsc_adjust()) { + pr_info("TSC target sync skip\n"); atomic_inc(&skip_test); return; } @@ -376,6 +385,7 @@ void check_tsc_sync_target(void) * Register this CPU's participation and wait for the * source CPU to start the measurement: */ + pr_info("TSC target sync starting\n"); atomic_inc(&start_count); while (atomic_read(&start_count) != cpus) cpu_relax(); @@ -392,6 +402,7 @@ void check_tsc_sync_target(void) */ atomic_inc(&stop_count); + pr_info("TSC target sync stop\n"); /* * Wait for the source CPU to print stuff: */ @@ -403,6 +414,8 @@ void check_tsc_sync_target(void) */ atomic_set(&stop_count, 0); + pr_info("TSC target sync remaining runs %d\n", atomic_read(&test_runs)); + /* * Check the number of remaining test runs. If not zero, the test * failed and a retry with adjusted TSC is possible. If zero the