From: Jason Vas Dias
To: x86@kernel.org, LKML, Thomas Gleixner, andi, Peter Zijlstra
Subject: [PATCH v4.16-rc4 1/2] x86/vdso: on Intel, VDSO should handle CLOCK_MONOTONIC_RAW
Date: Mon, 12 Mar 2018 09:12:15 +0000
Message-ID: <3715iro0.fsf@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Currently the VDSO does not handle
   clock_gettime(CLOCK_MONOTONIC_RAW, &ts)
on Intel / AMD: it calls vdso_fallback_gettime() for this clock, which
issues a syscall. That path has an unacceptably high latency (minimum
measurable time, or time between measurements) of 300-700ns on two
2.8-3.9GHz Haswell x86_64 machines (Family_Model 06_3C) under various
versions of Linux.

Sometimes, particularly when correlating elapsed time to performance
counter values, code needs to know elapsed time from the perspective
of the CPU, no matter how "hot" / fast or "cold" / slow it might be
running with respect to NTP / PTP. When code needs this, the latency
of a syscall is often unacceptably high.

I reported this as Bug #198961:
   https://bugzilla.kernel.org/show_bug.cgi?id=198961
and in previous posts with subjects matching 'CLOCK_MONOTONIC_RAW'.

This patch handles CLOCK_MONOTONIC_RAW clock_gettime() in the VDSO
by exporting the raw clock calibration, last cycles, last xtime_nsec,
and last raw_sec values in the vsyscall_gtod_data during
update_vsyscall().

Now the new do_monotonic_raw() function in the VDSO has a latency of
about 24ns on average, and the test program:
   tools/testing/selftests/timers/inconsistency-check.c
succeeds with arguments '-c 4 -t 120', or any arbitrary -t value.

The patch is against Linus' latest 4.16-rc5 tree, current HEAD of:
   git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

This patch affects only these files:
   arch/x86/include/asm/vgtod.h
   arch/x86/entry/vdso/vclock_gettime.c
   arch/x86/entry/vsyscall/vsyscall_gtod.c

There are 2 patches in the series: this first one handles
CLOCK_MONOTONIC_RAW in the VDSO using the existing rdtsc_ordered(),
and the second uses a new rdtscp() function which avoids use of an
explicit barrier.

Best Regards,
   Jason Vas Dias

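The latency figures quoted above can be checked from userspace with a
sketch along the following lines. It is not part of the patch: the
helper names ts_to_ns() and avg_latency_ns() are hypothetical, and the
loop reports the average cost of back-to-back clock_gettime() calls,
which only approximates the "minimum time between measurements"
described in the message.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Convert a struct timespec to a single nanosecond count. */
static int64_t ts_to_ns(const struct timespec *ts)
{
	return (int64_t)ts->tv_sec * 1000000000LL + ts->tv_nsec;
}

/*
 * Hypothetical helper (not from the patch): average cost, in ns, of one
 * clock_gettime(clk, ...) call over `iters` back-to-back calls.
 * Returns -1 if the clock cannot be read.
 */
static int64_t avg_latency_ns(clockid_t clk, long iters)
{
	struct timespec start, end, scratch;
	long i;

	assert(iters > 0);
	if (clock_gettime(clk, &start) != 0)
		return -1;
	for (i = 0; i < iters; i++)
		(void)clock_gettime(clk, &scratch);
	if (clock_gettime(clk, &end) != 0)
		return -1;
	return (ts_to_ns(&end) - ts_to_ns(&start)) / iters;
}
```

Comparing avg_latency_ns(CLOCK_MONOTONIC_RAW, n) with
avg_latency_ns(CLOCK_MONOTONIC, n) on the same machine should show
whether the raw clock is still taking the syscall fallback path.
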
---
diff -up linux-4.16-rc5.1/arch/x86/entry/vdso/vclock_gettime.c.4.16-rc5 linux-4.16-rc5.1/arch/x86/entry/vdso/vclock_gettime.c
--- linux-4.16-rc5.1/arch/x86/entry/vdso/vclock_gettime.c.4.16-rc5	2018-03-12 00:25:09.000000000 +0000
+++ linux-4.16-rc5.1/arch/x86/entry/vdso/vclock_gettime.c	2018-03-12 08:12:17.110120433 +0000
@@ -182,6 +182,18 @@ notrace static u64 vread_tsc(void)
 	return last;
 }
 
+notrace static u64 vread_tsc_raw(void)
+{
+	u64 tsc
+	  , last = gtod->raw_cycle_last;
+
+	tsc = rdtsc_ordered();
+	if (likely(tsc >= last))
+		return tsc;
+	asm volatile ("");
+	return last;
+}
+
 notrace static inline u64 vgetsns(int *mode)
 {
 	u64 v;
@@ -203,6 +215,27 @@ notrace static inline u64 vgetsns(int *m
 	return v * gtod->mult;
 }
 
+notrace static inline u64 vgetsns_raw(int *mode)
+{
+	u64 v;
+	cycles_t cycles;
+
+	if (gtod->vclock_mode == VCLOCK_TSC)
+		cycles = vread_tsc_raw();
+#ifdef CONFIG_PARAVIRT_CLOCK
+	else if (gtod->vclock_mode == VCLOCK_PVCLOCK)
+		cycles = vread_pvclock(mode);
+#endif
+#ifdef CONFIG_HYPERV_TSCPAGE
+	else if (gtod->vclock_mode == VCLOCK_HVCLOCK)
+		cycles = vread_hvclock(mode);
+#endif
+	else
+		return 0;
+	v = (cycles - gtod->raw_cycle_last) & gtod->raw_mask;
+	return v * gtod->raw_mult;
+}
+
 /* Code size doesn't matter (vdso is 4k anyway) and this is faster. */
 notrace static int __always_inline do_realtime(struct timespec *ts)
 {
@@ -246,6 +279,27 @@ notrace static int __always_inline do_mo
 	return mode;
 }
 
+notrace static __always_inline int do_monotonic_raw(struct timespec *ts)
+{
+	unsigned long seq;
+	u64 ns;
+	int mode;
+
+	do {
+		seq = gtod_read_begin(gtod);
+		mode = gtod->vclock_mode;
+		ts->tv_sec = gtod->monotonic_time_raw_sec;
+		ns = gtod->monotonic_time_raw_nsec;
+		ns += vgetsns_raw(&mode);
+		ns >>= gtod->raw_shift;
+	} while (unlikely(gtod_read_retry(gtod, seq)));
+
+	ts->tv_sec += __iter_div_u64_rem(ns, NSEC_PER_SEC, &ns);
+	ts->tv_nsec = ns;
+
+	return mode;
+}
+
 notrace static void do_realtime_coarse(struct timespec *ts)
 {
 	unsigned long seq;
@@ -277,6 +331,10 @@ notrace int __vdso_clock_gettime(clockid
 		if (do_monotonic(ts) == VCLOCK_NONE)
 			goto fallback;
 		break;
+	case CLOCK_MONOTONIC_RAW:
+		if (do_monotonic_raw(ts) == VCLOCK_NONE)
+			goto fallback;
+		break;
 	case CLOCK_REALTIME_COARSE:
 		do_realtime_coarse(ts);
 		break;
diff -up linux-4.16-rc5.1/arch/x86/entry/vsyscall/vsyscall_gtod.c.4.16-rc5 linux-4.16-rc5.1/arch/x86/entry/vsyscall/vsyscall_gtod.c
--- linux-4.16-rc5.1/arch/x86/entry/vsyscall/vsyscall_gtod.c.4.16-rc5	2018-03-12 00:25:09.000000000 +0000
+++ linux-4.16-rc5.1/arch/x86/entry/vsyscall/vsyscall_gtod.c	2018-03-12 07:58:07.974214168 +0000
@@ -45,6 +45,11 @@ void update_vsyscall(struct timekeeper *
 	vdata->mult		= tk->tkr_mono.mult;
 	vdata->shift		= tk->tkr_mono.shift;
 
+	vdata->raw_cycle_last	= tk->tkr_raw.cycle_last;
+	vdata->raw_mask		= tk->tkr_raw.mask;
+	vdata->raw_mult		= tk->tkr_raw.mult;
+	vdata->raw_shift	= tk->tkr_raw.shift;
+
 	vdata->wall_time_sec		= tk->xtime_sec;
 	vdata->wall_time_snsec		= tk->tkr_mono.xtime_nsec;
 
@@ -74,5 +79,8 @@ void update_vsyscall(struct timekeeper *
 		vdata->monotonic_time_coarse_sec++;
 	}
 
+	vdata->monotonic_time_raw_sec  = tk->raw_sec;
+	vdata->monotonic_time_raw_nsec = tk->tkr_raw.xtime_nsec;
+
 	gtod_write_end(vdata);
 }
--- linux-4.16-rc5.1/arch/x86/include/asm/vgtod.h.4.16-rc5	2018-03-12 00:25:09.000000000 +0000
+++ linux-4.16-rc5.1/arch/x86/include/asm/vgtod.h	2018-03-12 07:44:17.910539760 +0000
@@ -22,6 +22,10 @@ struct vsyscall_gtod_data {
 	u64	mask;
 	u32	mult;
 	u32	shift;
+	u64	raw_cycle_last;
+	u64	raw_mask;
+	u32	raw_mult;
+	u32	raw_shift;
 
 	/* open coded 'struct timespec' */
 	u64		wall_time_snsec;
@@ -32,6 +36,8 @@ struct vsyscall_gtod_data {
 	gtod_long_t	wall_time_coarse_nsec;
 	gtod_long_t	monotonic_time_coarse_sec;
 	gtod_long_t	monotonic_time_coarse_nsec;
+	gtod_long_t	monotonic_time_raw_sec;
+	gtod_long_t	monotonic_time_raw_nsec;
 
 	int		tz_minuteswest;
 	int		tz_dsttime;
---
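
For readers following the arithmetic in vgetsns_raw() and
do_monotonic_raw(), here is a minimal userspace model of the
cycles-to-time conversion. It is only a sketch: struct
raw_clock_snapshot, struct raw_time and raw_time_from_cycles() are
hypothetical names standing in for the raw_* fields the patch adds to
vsyscall_gtod_data, and the seqcount retry loop and the
paravirt / Hyper-V clock modes are left out. It assumes, as the patch
does, that the stored nanosecond value is already left-shifted by
raw_shift.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Hypothetical stand-in for the raw fields exported by update_vsyscall(). */
struct raw_clock_snapshot {
	uint64_t raw_cycle_last;   /* counter value at the last update */
	uint64_t raw_mask;         /* valid bits of the counter */
	uint32_t raw_mult;         /* ns per cycle, scaled by 2^raw_shift */
	uint32_t raw_shift;
	uint64_t raw_sec;          /* seconds at the last update */
	uint64_t raw_nsec_shifted; /* ns at the last update, << raw_shift */
};

struct raw_time {
	uint64_t sec;
	uint64_t nsec;
};

/* Model of the patch's arithmetic for one counter reading `cycles`. */
static struct raw_time raw_time_from_cycles(const struct raw_clock_snapshot *s,
					    uint64_t cycles)
{
	struct raw_time t;
	uint64_t ns;

	assert(s->raw_shift < 64);

	/* Like vread_tsc_raw(): never report a counter behind the snapshot. */
	if (cycles < s->raw_cycle_last)
		cycles = s->raw_cycle_last;

	/* Like vgetsns_raw(): masked cycle delta times mult, still shifted. */
	ns = s->raw_nsec_shifted;
	ns += ((cycles - s->raw_cycle_last) & s->raw_mask) * s->raw_mult;
	ns >>= s->raw_shift;

	/* Carry whole seconds out of the nanosecond count. */
	t.sec = s->raw_sec + ns / NSEC_PER_SEC;
	t.nsec = ns % NSEC_PER_SEC;
	return t;
}
```

With raw_mult = 1 << 23 and raw_shift = 24, each cycle is worth 0.5ns,
so a reading 40 cycles past raw_cycle_last advances the snapshot time
by 20ns.
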