Date: Tue, 26 Jun 2018 11:00:03 +0200
From: Peter Zijlstra
To: Pavel Tatashin
Cc: steven.sistare@oracle.com, daniel.m.jordan@oracle.com,
	linux@armlinux.org.uk, schwidefsky@de.ibm.com,
	heiko.carstens@de.ibm.com, john.stultz@linaro.org,
	sboyd@codeaurora.org, x86@kernel.org, linux-kernel@vger.kernel.org,
	mingo@redhat.com, tglx@linutronix.de, hpa@zytor.com,
	douly.fnst@cn.fujitsu.com, prarit@redhat.com, feng.tang@intel.com,
	pmladek@suse.com, gnomes@lxorguk.ukuu.org.uk,
	linux-s390@vger.kernel.org
Subject: Re: [PATCH v12 10/11] sched: early boot clock
Message-ID: <20180626090003.GA2458@hirez.programming.kicks-ass.net>
References: <20180621212518.19914-1-pasha.tatashin@oracle.com>
 <20180621212518.19914-11-pasha.tatashin@oracle.com>
 <20180625085543.GT2494@hirez.programming.kicks-ass.net>
 <20180625192320.kzmqkvmfh5aeuhhx@xakep.localdomain>
In-Reply-To: <20180625192320.kzmqkvmfh5aeuhhx@xakep.localdomain>

On Mon, Jun 25, 2018 at 03:23:20PM -0400, Pavel Tatashin wrote:
> Unfortunately, the above suggestion won't work, and here is why.
> 
> We have a call sequence like this:
> 
>   start_kernel()
>     sched_init()
>       sched_clock_init()
>         Here sched_clock_running is set to 1, which means that
>         sched_clock_cpu() starts doing the following sequence:
> 
>           scd = cpu_sdc(cpu);
>           clock = sched_clock_local(scd);
> 
>         where we try to filter the output of sched_clock() based on the
>         value of scd. But that won't work, because for this filtering we
>         need a timer initialized that wakes up and updates scd, and we
>         need timekeeping initialized so that we can call ktime_get_ns().
> Both of which are set up later:
> 
>   ...
>     timekeeping_init()  After this we can call ktime_get_ns().
>     time_init()         Here we configure the x86_late_time_init pointer.
>   ...
>     late_time_init()
>       x86_late_time_init()
>         x86_init.timers.timer_init()
>           hpet_time_init()  Only after this call do we finally start
>                             getting clock interrupts and can get precise
>                             output from sched_clock_local().
> 
> The way I solved the above: I changed sched_clock() to keep outputting
> time based on the early boot sched_clock() until sched_clock_init_late(),
> at which point everything is configured and we can switch to the
> permanent clock, even though this happens after SMP init.
> 
> If you have a better solution, please let me know.

How's something like this? It moves sched_clock_init() to right before we
enable IRQs for the first time (which is after we've started the whole
timekeeping business). The thing is, sched_clock_init_late() really is far
too late; we need to switch to the unstable clock before we bring up SMP.

---
 include/linux/sched_clock.h |  5 -----
 init/main.c                 |  4 ++--
 kernel/sched/clock.c        | 49 +++++++++++++++++++++++++++++++++----------
 kernel/sched/core.c         |  1 -
 kernel/time/sched_clock.c   |  2 +-
 5 files changed, 41 insertions(+), 20 deletions(-)

diff --git a/include/linux/sched_clock.h b/include/linux/sched_clock.h
index 411b52e424e1..2d223677740f 100644
--- a/include/linux/sched_clock.h
+++ b/include/linux/sched_clock.h
@@ -9,17 +9,12 @@
 #define LINUX_SCHED_CLOCK
 
 #ifdef CONFIG_GENERIC_SCHED_CLOCK
-extern void sched_clock_postinit(void);
-
 extern void sched_clock_register(u64 (*read)(void), int bits,
 				 unsigned long rate);
 #else
-static inline void sched_clock_postinit(void) { }
-
 static inline void sched_clock_register(u64 (*read)(void), int bits,
 					unsigned long rate)
 {
-	;
 }
 #endif
 
diff --git a/init/main.c b/init/main.c
index 3b4ada11ed52..162d931c9511 100644
--- a/init/main.c
+++ b/init/main.c
@@ -79,7 +79,7 @@
 #include
 #include
 #include
-#include
+#include
 #include
 #include
 #include
@@ -642,7 +642,7 @@ asmlinkage __visible void __init start_kernel(void)
 	softirq_init();
 	timekeeping_init();
 	time_init();
-	sched_clock_postinit();
+	sched_clock_init();
 	printk_safe_init();
 	perf_event_init();
 	profile_init();
diff --git a/kernel/sched/clock.c b/kernel/sched/clock.c
index 10c83e73837a..c8286b9fc593 100644
--- a/kernel/sched/clock.c
+++ b/kernel/sched/clock.c
@@ -68,11 +68,6 @@ EXPORT_SYMBOL_GPL(sched_clock);
 
 __read_mostly int sched_clock_running;
 
-void sched_clock_init(void)
-{
-	sched_clock_running = 1;
-}
-
 #ifdef CONFIG_HAVE_UNSTABLE_SCHED_CLOCK
 /*
  * We must start with !__sched_clock_stable because the unstable -> stable
@@ -199,6 +194,23 @@ void clear_sched_clock_stable(void)
 		__clear_sched_clock_stable();
 }
 
+static void __sched_clock_gtod_offset(void)
+{
+	__gtod_offset = (sched_clock() + __sched_clock_offset) - ktime_get_ns();
+}
+
+void __init sched_clock_init(void)
+{
+	/*
+	 * Set __gtod_offset such that once we mark sched_clock_running,
+	 * sched_clock_tick() continues where sched_clock() left off.
+	 *
+	 * Even if TSC is buggered, we're still UP at this point so it
+	 * can't really be out of sync.
+	 */
+	__sched_clock_gtod_offset();
+	sched_clock_running = 1;
+}
 /*
  * We run this as late_initcall() such that it runs after all built-in drivers,
  * notably: acpi_processor and intel_idle, which can mark the TSC as unstable.
@@ -351,7 +363,7 @@ u64 sched_clock_cpu(int cpu)
 		return sched_clock() + __sched_clock_offset;
 
 	if (unlikely(!sched_clock_running))
-		return 0ull;
+		return sched_clock(); /* __sched_clock_offset == 0 */
 
 	preempt_disable_notrace();
 	scd = cpu_sdc(cpu);
@@ -385,8 +397,6 @@ void sched_clock_tick(void)
 
 void sched_clock_tick_stable(void)
 {
-	u64 gtod, clock;
-
 	if (!sched_clock_stable())
 		return;
 
@@ -398,9 +408,7 @@ void sched_clock_tick_stable(void)
 	 * TSC to be unstable, any computation will be computing crap.
 	 */
 	local_irq_disable();
-	gtod = ktime_get_ns();
-	clock = sched_clock();
-	__gtod_offset = (clock + __sched_clock_offset) - gtod;
+	__sched_clock_gtod_offset();
 	local_irq_enable();
 }
 
@@ -434,6 +442,24 @@ EXPORT_SYMBOL_GPL(sched_clock_idle_wakeup_event);
 
 #else /* CONFIG_HAVE_UNSTABLE_SCHED_CLOCK */
 
+#ifdef CONFIG_GENERIC_SCHED_CLOCK
+
+/*
+ * kernel/time/sched_clock.c:sched_clock_init()
+ */
+
+u64 sched_clock_cpu(int cpu)
+{
+	return sched_clock();
+}
+
+#else /* CONFIG_GENERIC_SCHED_CLOCK */
+
+void __init sched_clock_init(void)
+{
+	sched_clock_running = 1;
+}
+
 u64 sched_clock_cpu(int cpu)
 {
 	if (unlikely(!sched_clock_running))
@@ -442,6 +468,7 @@ u64 sched_clock_cpu(int cpu)
 	return sched_clock();
 }
 
+#endif /* CONFIG_GENERIC_SCHED_CLOCK */
 #endif /* CONFIG_HAVE_UNSTABLE_SCHED_CLOCK */
 
 /*
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index a98d54cd5535..b27d034ef4a7 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5953,7 +5953,6 @@ void __init sched_init(void)
 	int i, j;
 	unsigned long alloc_size = 0, ptr;
 
-	sched_clock_init();
 	wait_bit_init();
 
 #ifdef CONFIG_FAIR_GROUP_SCHED
diff --git a/kernel/time/sched_clock.c b/kernel/time/sched_clock.c
index 2d8f05aad442..b4fedf312979 100644
--- a/kernel/time/sched_clock.c
+++ b/kernel/time/sched_clock.c
@@ -237,7 +237,7 @@ sched_clock_register(u64 (*read)(void), int bits, unsigned long rate)
 	pr_debug("Registered %pF as sched_clock source\n", read);
 }
 
-void __init sched_clock_postinit(void)
+void __init sched_clock_init(void)
 {
 	/*
	 * If no sched_clock() function has been provided at that point,