Date: Mon, 21 Oct 2019 11:32:29 +0200 (CEST)
From: Thomas Gleixner
To: Andy Lutomirski
cc: Huacai Chen, Vincenzo Frascino, LKML, stable, Arnd Bergmann,
    Paul Burton, "open list:MIPS", linux-arm-kernel
Subject: Re: [PATCH] lib/vdso: Use __arch_use_vsyscall() to indicate fallback
References: <1571367619-13573-1-git-send-email-chenhc@lemote.com>
User-Agent: Alpine 2.21 (DEB 202 2017-01-01)
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII

On Sun, 20 Oct 2019, Andy Lutomirski wrote:
> On Sat, Oct 19, 2019 at 3:01 AM Thomas Gleixner wrote:
> > __arch_use_vsyscall() is a pointless exercise TBH. The VDSO data should
> > be updated unconditionally so that all the trivial interfaces like
> > time() and getres() just work independently of the functions which
> > depend on the underlying clocksource.
> >
> > These functions have a fallback operation already:
> >
> > Let __arch_get_hw_counter() return U64_MAX and the syscall fallback is
> > invoked.
>
> My thought was that __arch_get_hw_counter() could return last-1 to
> indicate failure, which would allow the two checks to be folded into
> one check. Or we could continue to use U64_MAX and rely on the fact
> that (s64)U64_MAX < 0, not worry about the cycle counter overflowing,
> and letting cycles < last catch it.

This is not an overflow catch. It's solely there to deal with the fact
that on x86 you can observe (cycles < last) on multi-socket systems under
rare circumstances. No other architecture has that issue AFAIK.

The wraparound of clocksources with a width smaller than 64bit is handled
by:

	delta = (cycles - last) & mask;

which operates on unsigned values for obvious reasons.
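To make the masking concrete, here is a minimal user-space sketch (made-up
values, not kernel code) of why the masked unsigned subtraction yields the
correct delta across a wrap of a narrow counter:

	#include <stdint.h>
	#include <stdio.h>

	/* Generic delta calculation as in lib/vdso/gettimeofday.c:
	 * unsigned subtraction wraps modulo 2^64, the mask then cuts
	 * the result down to the width of the clocksource.
	 */
	static uint64_t calc_delta(uint64_t cycles, uint64_t last,
				   uint64_t mask)
	{
		return (cycles - last) & mask;
	}

	int main(void)
	{
		uint64_t mask   = 0xffffffffULL; /* 32bit wide clocksource */
		uint64_t last   = 0xfffffff0ULL; /* read before the wrap   */
		uint64_t cycles = 0x00000010ULL; /* read after the wrap    */

		/* 0x10 - 0xfffffff0 wraps to 0xffffffff00000020; the
		 * mask reduces that to 0x20, the real number of
		 * elapsed cycles.
		 */
		printf("delta = 0x%llx\n",
		       (unsigned long long)calc_delta(cycles, last, mask));
		return 0;
	}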
> (And we should change it to return s64 at some point regardless -- all
> the math is signed, so the underlying types should be too IMO.)

See above. delta is guaranteed to be >= 0 and the mult/shift conversion is
not signed either. All the base values in the VDSO data are unsigned as
well.

The weird typecast there:

	if ((s64)cycles < 0)

could as well be:

	if (cycles == U64_MAX)

but the typecast version generates better code.
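The two checks are only interchangeable because a real counter value never
gets anywhere near bit 63, so treating the whole upper half of the range
as invalid is safe. A minimal user-space sketch (illustrative only,
assuming the usual two's complement u64 to s64 conversion gcc guarantees):

	#include <stdint.h>
	#include <stdio.h>

	#define U64_MAX (~0ULL)

	/* U64_MAX interpreted as signed is -1, so a simple sign check
	 * catches the error marker. The check is wider than the
	 * equality test: it fires for any value with bit 63 set, but a
	 * real cycle counter never reaches 2^63, so the two checks are
	 * equivalent in practice.
	 */
	static int counter_invalid(uint64_t cycles)
	{
		return (int64_t)cycles < 0;
	}

	int main(void)
	{
		printf("%d\n", counter_invalid(U64_MAX));    /* 1: marker */
		printf("%d\n", counter_invalid(0x1234ULL));  /* 0: valid  */
		printf("%d\n", counter_invalid(1ULL << 63)); /* 1: caught */
		return 0;
	}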
I tried to fold the two operations (see the patch below) and on all
machines I tested on (various generations of Intel and AMD) the result is
slower than what we have now by a couple of cycles, which is a lot for
these functions (i.e. between 3-5%).

I'm sure I tried that before and ended up with the existing code as the
fastest variant. Why? That's subject to speculation :)

Thanks,

	tglx

8<----------
 arch/x86/include/asm/vdso/gettimeofday.h |   39 ++++++-------------------------
 lib/vdso/gettimeofday.c                  |   23 +++---------------
 2 files changed, 13 insertions(+), 49 deletions(-)

--- a/arch/x86/include/asm/vdso/gettimeofday.h
+++ b/arch/x86/include/asm/vdso/gettimeofday.h
@@ -235,10 +235,14 @@ static u64 vread_hvclock(void)
 }
 #endif
 
-static inline u64 __arch_get_hw_counter(s32 clock_mode)
+static inline u64 __arch_get_hw_counter(s32 clock_mode, u64 last, u64 mask)
 {
+	/*
+	 * Mask operation is not required as all VDSO clocksources are
+	 * 64bit wide.
+	 */
 	if (clock_mode == VCLOCK_TSC)
-		return (u64)rdtsc_ordered();
+		return (u64)rdtsc_ordered() - last;
 	/*
 	 * For any memory-mapped vclock type, we need to make sure that gcc
 	 * doesn't cleverly hoist a load before the mode check. Otherwise we
@@ -248,13 +252,13 @@ static inline u64 __arch_get_hw_counter(
 #ifdef CONFIG_PARAVIRT_CLOCK
 	if (clock_mode == VCLOCK_PVCLOCK) {
 		barrier();
-		return vread_pvclock();
+		return vread_pvclock() - last;
 	}
 #endif
 #ifdef CONFIG_HYPERV_TIMER
 	if (clock_mode == VCLOCK_HVCLOCK) {
 		barrier();
-		return vread_hvclock();
+		return vread_hvclock() - last;
 	}
 #endif
 	return U64_MAX;
@@ -265,33 +269,6 @@ static __always_inline const struct vdso
 	return __vdso_data;
 }
 
-/*
- * x86 specific delta calculation.
- *
- * The regular implementation assumes that clocksource reads are globally
- * monotonic. The TSC can be slightly off across sockets which can cause
- * the regular delta calculation (@cycles - @last) to return a huge time
- * jump. Therefore it needs to be verified that @cycles are greater than
- * @last. If not then use @last, which is the base time of the current
- * conversion period.
- *
- * This variant also removes the masking of the subtraction because the
- * clocksource mask of all VDSO capable clocksources on x86 is U64_MAX
- * which would result in a pointless operation. The compiler cannot
- * optimize it away as the mask comes from the vdso data and is not compile
- * time constant.
- */
-static __always_inline
-u64 vdso_calc_delta(u64 cycles, u64 last, u64 mask, u32 mult)
-{
-	if (cycles > last)
-		return (cycles - last) * mult;
-	return 0;
-}
-#define vdso_calc_delta vdso_calc_delta
-
 #endif /* !__ASSEMBLY__ */
 #endif /* __ASM_VDSO_GETTIMEOFDAY_H */
--- a/lib/vdso/gettimeofday.c
+++ b/lib/vdso/gettimeofday.c
@@ -26,34 +26,21 @@
 #include <asm/vdso/gettimeofday.h>
 #endif /* ENABLE_COMPAT_VDSO */
 
-#ifndef vdso_calc_delta
-/*
- * Default implementation which works for all sane clocksources. That
- * obviously excludes x86/TSC.
- */
-static __always_inline
-u64 vdso_calc_delta(u64 cycles, u64 last, u64 mask, u32 mult)
-{
-	return ((cycles - last) & mask) * mult;
-}
-#endif
-
 static int do_hres(const struct vdso_data *vd, clockid_t clk,
 		   struct __kernel_timespec *ts)
 {
 	const struct vdso_timestamp *vdso_ts = &vd->basetime[clk];
-	u64 cycles, last, sec, ns;
+	u64 delta, sec, ns;
 	u32 seq;
 
 	do {
 		seq = vdso_read_begin(vd);
-		cycles = __arch_get_hw_counter(vd->clock_mode);
-		ns = vdso_ts->nsec;
-		last = vd->cycle_last;
-		if (unlikely((s64)cycles < 0))
+		delta = __arch_get_hw_counter(vd->clock_mode, vd->cycle_last,
+					      vd->mask);
+		if (unlikely((s64)delta < 0))
 			return -1;
-		ns += vdso_calc_delta(cycles, last, vd->mask, vd->mult);
+		ns = vdso_ts->nsec + delta * vd->mult;
 		ns >>= vd->shift;
 		sec = vdso_ts->sec;
 	} while (unlikely(vdso_read_retry(vd, seq)));
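P.S. For readers not familiar with the mult/shift pair used in do_hres()
above: it is plain unsigned fixed-point arithmetic. A sketch with made-up
values (the real ones are computed at clocksource registration, see
clocks_calc_mult_shift()):

	#include <stdint.h>
	#include <stdio.h>

	int main(void)
	{
		/* Made-up values for a 1 GHz counter: 1 cycle == 1 ns
		 * exactly, so mult = 1 << shift.
		 */
		uint32_t mult  = 1u << 24;
		uint32_t shift = 24;

		/* vdso_ts->nsec is stored left-shifted by 'shift',
		 * which is why it is added before the final right
		 * shift.
		 */
		uint64_t base_ns = 500;
		uint64_t delta   = 1000; /* cycles since vd->cycle_last */

		/* The do_hres() inner calculation. */
		uint64_t ns = (base_ns + delta * mult) >> shift;

		printf("%llu ns\n", (unsigned long long)ns); /* 1000 ns */
		return 0;
	}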