Received: by 2002:ac0:a581:0:0:0:0:0 with SMTP id m1-v6csp59076imm; Tue, 19 Jun 2018 14:16:29 -0700 (PDT) X-Google-Smtp-Source: ADUXVKK1GyKEJGolr1kmTWoB0k3Crov5voM+uIdj+pbx5cePkyw9tFcvWXiy0m0D57BSkK2Ym/jM X-Received: by 2002:a17:902:a581:: with SMTP id az1-v6mr20840961plb.61.1529442989116; Tue, 19 Jun 2018 14:16:29 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1529442989; cv=none; d=google.com; s=arc-20160816; b=ncKkdqqVmlw3PudawYiWJrEcB6os+iCi0jPqlWrL8X7PluOEFTNWkwdQK1eZmBK6ig MYirXPgA+HALnMD/m2wO58YBwTZi9xCS7s6O9uELhKBzOIgKA6QiY7VNa+ItBOty98Nq k9B3afuGweuaE/XKMW76Q00IiFgM66rdgJa+Is4STXNrxP5zjfajVot1qpaaYs7YD3f1 Q4pUGTo62Z/zWRllx0vdsEhHpGezVNKBMc3ZLC+6k1PSr08tZc4+rAAa6aADapmG8R6X ygwRblpe/z3MvI9XNE7PzwIdug/3oHfHPfakntcyeR+d0p5sX4gDFOXiNHZ1BREtfbOB DTFQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:mime-version:user-agent:references :message-id:in-reply-to:subject:cc:to:from:date :arc-authentication-results; bh=8JwusuU9Tntf8CN16UtytPPXc+YaXWmQlTDzk7VjCRw=; b=m5cCt3dFQ7pa1QNllUni94XsmRnyKKJyCN9XuYOv3hCwGIvStrLjERQryEwH3NfwRe cN1wrYmMS2zW6LOYUI5r9ylB/sm3bopVcsivOS9HiJICMzSbxiDgjNPSnwyP6NdDLeSQ N84Nt95fa/kNXHNUgbVl5htkZYEVyo12fqDCYJsmeB6RMv7DPmpJz30n4/0eGVp2QlfX PYbKFxfCjnmDIGrLk9shb76aiELiMxarx4b1Ld6aUIvYgSHhaP3Gr4RzlUIEvk0A8FdF iV7S/+2w4RnpJ5IN1V01Gkn6oijGPNvNEJ/mXPWj9NgRiiAXEzypfwO+2z3i0neXV8sy bVWQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id 59-v6si571585plp.496.2018.06.19.14.16.10; Tue, 19 Jun 2018 14:16:29 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757347AbeFSVPD (ORCPT + 99 others); Tue, 19 Jun 2018 17:15:03 -0400 Received: from Galois.linutronix.de ([146.0.238.70]:58312 "EHLO Galois.linutronix.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755424AbeFSVPB (ORCPT ); Tue, 19 Jun 2018 17:15:01 -0400 Received: from p4fea482e.dip0.t-ipconnect.de ([79.234.72.46] helo=nanos.glx-home) by Galois.linutronix.de with esmtpsa (TLS1.2:DHE_RSA_AES_256_CBC_SHA256:256) (Exim 4.80) (envelope-from ) id 1fVNxj-0003va-6z; Tue, 19 Jun 2018 23:14:51 +0200 Date: Tue, 19 Jun 2018 23:14:50 +0200 (CEST) From: Thomas Gleixner To: Pavel Tatashin cc: steven.sistare@oracle.com, daniel.m.jordan@oracle.com, linux@armlinux.org.uk, schwidefsky@de.ibm.com, heiko.carstens@de.ibm.com, john.stultz@linaro.org, sboyd@codeaurora.org, x86@kernel.org, linux-kernel@vger.kernel.org, mingo@redhat.com, hpa@zytor.com, douly.fnst@cn.fujitsu.com, peterz@infradead.org, prarit@redhat.com, feng.tang@intel.com, pmladek@suse.com, gnomes@lxorguk.ukuu.org.uk Subject: Re: [PATCH v10 2/7] time: sync read_boot_clock64() with persistent clock In-Reply-To: <20180615174204.30581-3-pasha.tatashin@oracle.com> Message-ID: References: <20180615174204.30581-1-pasha.tatashin@oracle.com> <20180615174204.30581-3-pasha.tatashin@oracle.com> User-Agent: Alpine 2.21 (DEB 202 2017-01-01) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII X-Linutronix-Spam-Score: -1.0 X-Linutronix-Spam-Level: - X-Linutronix-Spam-Status: No , -1.0 points, 5.0 required, ALL_TRUSTED=-1,SHORTCIRCUIT=-0.0001 Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, 15 Jun 2018, Pavel Tatashin wrote: > read_boot_clock64() returns a boot start timestamp from epoch. Some arches > may need to access the persistent clock interface in order to calculate the > epoch offset. The resolution of the persistent clock, however, might be > low. Therefore, in order to avoid time discrepancies a new argument 'now' > is added to read_boot_clock64() parameters. Arch may decide to use it > instead of accessing persistent clock again. I kinda know what you are trying to say, but it's all but obvious. > diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c > index 4786df904c22..fb6898fab374 100644 > --- a/kernel/time/timekeeping.c > +++ b/kernel/time/timekeeping.c > @@ -1502,9 +1502,13 @@ void __weak read_persistent_clock64(struct timespec64 *ts64) > * Function to read the exact time the system has been started. > * Returns a timespec64 with tv_sec=0 and tv_nsec=0 if unsupported. > * > + * Argument 'now' contains time from persistent clock to calculate offset from > + * epoch. May contain zeros if persist ant clock is not available. What's a 'persist ant' ? > + * > * XXX - Do be sure to remove it once all arches implement it. > */ > -void __weak read_boot_clock64(struct timespec64 *ts) > +void __weak __init read_boot_clock64(struct timespec64 *now, > + struct timespec64 *ts) > { > ts->tv_sec = 0; > ts->tv_nsec = 0; > @@ -1535,7 +1539,7 @@ void __init timekeeping_init(void) > } else if (now.tv_sec || now.tv_nsec) > persistent_clock_exists = true; > > - read_boot_clock64(&boot); > + read_boot_clock64(&now, &boot); This interface was already bolted on and this extension just makes it worse. If 'now' is invalid then you need the same checks as after read_persistent_clock() replicated along with conditionals of some sort. 'boot' time is used to adjust the wall to monotonic offset. So looking at the math behind all of that: read_persistent_clock(&now); read_boot_clock(&boot); tk_set_xtime(tk, now) tk->xtime_sec = now.sec; tk->xtime_nsec = now.nsec; tk_set_wall_to_mono(tk, -boot); tk->wall_to_mono = boot; tk->offs_real = -boot; timekeeping_update(tk) tk->tkr_mono = tk->xtime + tk->wall_to_mono; tkr_mono.base is guaranteed to be >= 0. So you need to make sure that tk->xtime + tk->wall_to_mono >= 0 Yet another check which you need to do in that magic function. That's just wrong. If this grows more users then they all have to copy the same sanity checks over and over and half of them will be wrong. Fortunately there is only a single real user of read_boot_clock() in the tree: S390. By virtue of being S390 there is no worry about any sanity checks. It just works. ARM has the function, but there is no single subarch which actually implements the thing, so this is easy to fix by removing it. So the right thing to do is the following: Provide a new function: void __weak read_persistent_wall_and_boot_offset(struct timespec64 *wall_time, ktime_t *boot_offset) { read_persistent_clock(wall_time); } Then add the new function to S390: void read_persistent_wall_and_boot_offset(struct timespec64 *wall_time, ktime_t *boot_offset) { unsigned char clk[STORE_CLOCK_EXT_SIZE]; struct timespec64 boot_time; __u64 delta; read_persistent_clock(wall_time); delta = initial_leap_seconds + TOD_UNIX_EPOCH; memcpy(clk, tod_clock_base, 16); *(__u64 *) &clk[1] -= delta; if (*(__u64 *) &clk[1] > delta) clk[0]--; ext_to_timespec64(clk, boot_time); *boot_offset = timespec64_to_ns(timespec64_sub(wall_time, boot_time)); } Then rework timekeeping_init(): timekeeping_init() struct timespec64 wall_time, wall_to_mono, offs; ktime_t boot_offset = 0; read_persistent_wall_and_boot_offset(&walltime, &boot_offset); if (!valid(walltime)) boottime = wall_time.tv_sec = wall_time.tv_nsec = 0; else persistent_clock = true; if (boot_offset > timespec64_to_nsec(wall_time)) offs.tv_sec = offs.tv_nsec = 0; else offs = ns_to_timespec64(boot_offset); wall_to_mono = timespec64_sub(offs, wall_time); tk_set_wall_to_mono(tk, wall_time); After that remove the read_boot_time() leftovers all over the place. And then the x86 implementation boils down to: void read_persistent_wall_and_boot_offset(struct timespec64 *wall_time, ktime_t *boot_offset) { x86_platform.get_wallclock(ts); *boot_offset = sched_clock(); } And doing it this way - by adding the extra math to the only architecture on the planet which has sane timers - is the right thing to do because all other architectures will have to fall back to tricks similar to x86 by doing clocksource/schedclock based guestimation of the time where the machine actually reached the kernel. Hmm? Thanks, tglx