Date: Thu, 28 Jun 2018 12:43:59 +0200 (CEST)
From: Thomas Gleixner
To: Pavel Tatashin
cc: Steven Sistare, Daniel Jordan, linux@armlinux.org.uk,
    Martin Schwidefsky, Heiko Carstens, John Stultz, sboyd@codeaurora.org,
    x86@kernel.org, LKML, mingo@redhat.com, "H.
    Peter Anvin", douly.fnst@cn.fujitsu.com, Peter Zijlstra,
    Prarit Bhargava, feng.tang@intel.com, Petr Mladek,
    gnomes@lxorguk.ukuu.org.uk, linux-s390@vger.kernel.org,
    Andy Shevchenko, Boris Ostrovsky
Subject: Re: [PATCH v12 09/11] x86/tsc: prepare for early sched_clock
References: <20180621212518.19914-1-pasha.tatashin@oracle.com>
    <20180621212518.19914-10-pasha.tatashin@oracle.com>

On Thu, 28 Jun 2018, Thomas Gleixner wrote:
> I still want to document the unholy mess of what is initialized and
> available when. We have 5 hypervisors and 3 different points in early boot
> where the calibrate_* callbacks are overwritten. The XEN PV one is actually
> post tsc_init_early() for whatever reason.
>
> That's all completely obscure and any attempt of moving tsc_early_init()
> earlier than where it is now is just lottery.
>
> The other issue is that double calibration, e.g. doing the PIT thing twice
> is just consuming boot time for no value.
>
> All of that has been duct taped over time and we really don't want yet
> another thing glued to it just because we can.

So here is the full picture of the TSC/CPU calibration maze:

Compile time setup:

  native_calibrate_tsc
      CPUID based frequency read out with magic fixups for broken CPUID
      implementations

  native_calibrate_cpu
      Try the following:

        1) CPUID based (different leaf than the TSC one)
        2) MSR based
        3) Quick PIT calibration
        4) PIT/HPET/PMTIMER calibration (slow) and only available in
           tsc_init(). Could be made working post x86_dtb_init().
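The fallback order of native_calibrate_cpu listed above can be modelled in
user space roughly as follows. This is only a sketch under assumptions, not
the kernel code: every name in it (probe_fn, probe_*, quick_pit_works,
calibrate_cpu_sketch) is made up for illustration and the returned kHz
values are arbitrary. It only shows "first method that yields a non-zero
frequency wins, and the slow method is usable late only".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical probe type: returns a frequency in kHz, 0 means "failed". */
typedef unsigned long (*probe_fn)(void);

/* Toggle to model a machine where the quick PIT calibration fails. */
int quick_pit_works = 1;

/* Stand-in probes with arbitrary values; not the real kernel helpers. */
unsigned long probe_cpuid(void)     { return 0; }	/* leaf not present */
unsigned long probe_msr(void)       { return 0; }	/* MSR not supported */
unsigned long probe_quick_pit(void) { return quick_pit_works ? 2400000 : 0; }
unsigned long probe_slow(void)      { return 2400123; }	/* PIT/HPET/PMTIMER */

/* Return the result of the first probe which reports a non-zero value. */
unsigned long first_working_probe(const probe_fn *probes, size_t n)
{
	assert(probes);
	for (size_t i = 0; i < n; i++) {
		unsigned long khz = probes[i]();

		if (khz)
			return khz;
	}
	return 0;
}

/*
 * slow_allowed == 0 models the early calibration, slow_allowed != 0
 * models tsc_init(), where the slow method is available as a last resort.
 */
unsigned long calibrate_cpu_sketch(int slow_allowed)
{
	const probe_fn fast[] = { probe_cpuid, probe_msr, probe_quick_pit };
	unsigned long khz = first_working_probe(fast, sizeof(fast) / sizeof(fast[0]));

	if (!khz && slow_allowed)
		khz = probe_slow();
	return khz;
}
```

With quick_pit_works set to 0 the early variant returns 0 and only the late
variant comes back with a frequency, which is the one case where running the
calibration a second time buys anything.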
Boot sequence:

  start_kernel()

    INTEL_MID:
      x86_intel_mid_early_setup()
        calibrate_tsc = intel_mid_calibrate_tsc

        intel_mid_calibrate_tsc()
        {
                return 0;
        }

    setup_arch()

      x86_init.oem.arch_setup();
        INTEL_MID:
          intel_mid_arch_setup()
            PENWELL:
              x86_platform.calibrate_tsc = mfld_calibrate_tsc;
              MSR based magic. Value would be available right away.
            TANGIER:
              x86_platform.calibrate_tsc = tangier_calibrate_tsc;
              Different MSR based magic. Value would be available right
              away.

      ....

      init_hypervisor_platform()

        vmware:
          Retrieves the frequency and stores it for the calibration
          function

          khz = vmware_get_khz_magic()
          vmware_tsc_khz = khz
          calibrate_cpu = vmware_get_tsc_khz
          calibrate_tsc = vmware_get_tsc_khz
          preset_lpj(khz)

        hyperv:
          if special hyperv MSRs are available:
            calibrate_cpu = hv_get_tsc_khz
            calibrate_tsc = hv_get_tsc_khz

          MSR is readable already in this function

        jailhouse:
          Frequency is available in this function and stored in a
          variable for the calibration function

          calibrate_cpu = jailhouse_get_tsc
          calibrate_tsc = jailhouse_get_tsc

      ...

      kvmclock_init()
        if (magic_conditions)
          calibrate_tsc = kvm_get_tsc_khz
          calibrate_cpu = kvm_get_tsc_khz
          kvm_get_preset_lpj()
            khz = kvm_get_tsc_khz()
            preset_lpj(khz);

      tsc_early_delay_calibrate()
        tsc_khz = calibrate_tsc()
        cpu_khz = calibrate_cpu()
        ....
        set_lpj(tsc_khz);

      x86_init.paging.pagetable_init()
        xen_pagetable_init()
          xen_setup_shared_info()
            xen_hvm_init_time_ops()
              if (XENFEAT_hvm_safe_pvclock)
                calibrate_tsc = xen_tsc_khz

                PV clock based access

    tsc_init()
      tsc_khz = calibrate_tsc()
      cpu_khz = calibrate_cpu()

Putting this into a table:

Platform        tsc_early_delay_calibrate()     tsc_init()
-----------------------------------------------------------------------
Generic         native_calibrate_tsc()          native_calibrate_tsc()
                native_calibrate_cpu()          native_calibrate_cpu()
                (Cannot do HPET/PMTIMER)
-----------------------------------------------------------------------
INTEL_MID       intel_mid_calibrate_tsc()       intel_mid_calibrate_tsc()
Generic         native_calibrate_cpu()          native_calibrate_cpu()

INTEL_MID       mfld_calibrate_tsc()            mfld_calibrate_tsc()
PENWELL         native_calibrate_cpu()          native_calibrate_cpu()

INTEL_MID       tangier_calibrate_tsc()         tangier_calibrate_tsc()
TANGIER         native_calibrate_cpu()          native_calibrate_cpu()
-----------------------------------------------------------------------
VMWARE          vmware_get_tsc_khz()            vmware_get_tsc_khz()
                vmware_get_tsc_khz()            vmware_get_tsc_khz()

HYPERV          hv_get_tsc_khz()                hv_get_tsc_khz()
                hv_get_tsc_khz()                hv_get_tsc_khz()

JAILHOUSE       jailhouse_get_tsc()             jailhouse_get_tsc()
                jailhouse_get_tsc()             jailhouse_get_tsc()

KVM             kvm_get_tsc_khz()               kvm_get_tsc_khz()
                kvm_get_tsc_khz()               kvm_get_tsc_khz()
-----------------------------------------------------------------------
XEN             native_calibrate_tsc()          xen_tsc_khz()
                native_calibrate_cpu()          native_calibrate_cpu()
-----------------------------------------------------------------------

The only platform which cannot use the special TSC calibration routine in
the early calibration is XEN because it's initialized just _after_ the early
calibration runs.
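The ordering problem in the table can be modelled in a few lines of user
space C. Again this is a sketch under assumptions and not kernel code: the
*_sketch names, the platform enum and the kHz values are invented for
illustration. A single calibrate_tsc pointer is overwritten at the points
named in the boot sequence and then called twice, once where
tsc_early_delay_calibrate() runs and once where tsc_init() runs.

```c
#include <assert.h>

/* Hypothetical callback type: returns the TSC frequency in kHz. */
typedef unsigned long (*calibrate_fn)(void);

/* Stand-in calibration routines returning arbitrary, distinct values. */
unsigned long native_tsc_sketch(void) { return 1000; }
unsigned long kvm_tsc_sketch(void)    { return 2000; }
unsigned long xen_tsc_sketch(void)    { return 3000; }

enum platform_sketch { PLAT_GENERIC, PLAT_KVM, PLAT_XEN };

struct boot_result {
	unsigned long early_khz;	/* seen by the early calibration */
	unsigned long late_khz;		/* seen by tsc_init() */
};

struct boot_result boot_sketch(enum platform_sketch plat)
{
	/* Compile time default. */
	calibrate_fn calibrate_tsc = native_tsc_sketch;
	struct boot_result res;

	/* kvmclock_init() runs before the early calibration. */
	if (plat == PLAT_KVM)
		calibrate_tsc = kvm_tsc_sketch;

	/* tsc_early_delay_calibrate() */
	assert(calibrate_tsc);
	res.early_khz = calibrate_tsc();

	/* pagetable_init(): XEN installs its callback only at this point. */
	if (plat == PLAT_XEN)
		calibrate_tsc = xen_tsc_sketch;

	/* tsc_init() */
	res.late_khz = calibrate_tsc();
	return res;
}
```

For PLAT_KVM both calls hit the same routine. For PLAT_XEN the early call
still goes to the native routine and only the late one reaches the XEN
callback, which is the XEN row of the table.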

For enhanced fun the early calibration stuff was moved from right after
init_hypervisor_platform() to the place where it is now in commit
ccb64941f375a6 ("x86/timers: Move simple_udelay_calibration() past
kvmclock_init()") to speed up KVM boot time by avoiding the PIT
calibration. I have no idea why it wasn't just moved past the XEN
initialization a few lines further down, especially as the change was done
by a XEN maintainer :) Boris?

The other HV guests all do more or less the same thing and return the same
value for cpu_khz and tsc_khz via the calibration indirection despite the
value being known in the init_platform() function already.

The generic initialization does everything twice, which makes no sense,
except for the unlikely case where no fast functions are available and the
quick PIT calibration fails (PMTIMER/HPET are not available in early
calibration).

The INTEL MID stuff is weird and not really obvious. AFAIR those systems
don't have PIT or such, so they need to rely on the MSR/CPUID mechanisms to
work, but that just happens to work and not for obvious reasons. Andy, can
you shed some light on that stuff?

So some of this just works by chance, things are done twice and pointlessly
(XEN). This really wants to be cleaned up and the requirements of each
platform want to be well documented; especially the Intel-MID stuff needs
that.

Thanks,

	tglx