From: Jason Vas Dias
To: x86@kernel.org, LKML, Thomas Gleixner, andi, Peter Zijlstra, Ingo Molnar
Subject: [PATCH v4.16-rc4 1/3] x86/vdso: on Intel, VDSO should handle CLOCK_MONOTONIC_RAW
Date: Sun, 11 Mar 2018 06:25:56 +0000
Message-ID: <7eqhiub5.fsf@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Currently the vDSO does not handle
clock_gettime(CLOCK_MONOTONIC_RAW, &ts) on Intel / AMD. It calls
vdso_fallback_gettime() for this clock, which issues a syscall with
an unacceptably high latency (minimum measurable time, or time
between measurements) of 300-700ns on two 2.8-3.9GHz Haswell x86_64
(Family_Model 06_3C) machines under various versions of Linux.

Sometimes, particularly when correlating elapsed time to performance
counter values, code needs to know elapsed time from the perspective
of the CPU, no matter how "hot" / fast or "cold" / slow it might be
running with respect to NTP / PTP. When code needs this, the latency
of a syscall is often unacceptably high.

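As a rough illustration of the "minimum measurable time" figure quoted above, one way to estimate it from userspace is to read the clock twice in a row and keep the smallest non-zero difference. This is only a sketch: the helper names ts_to_ns() and min_clock_delta_ns() are made up for this example and are not part of the patch or of any library.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

#ifndef CLOCK_MONOTONIC_RAW
#define CLOCK_MONOTONIC_RAW 4	/* Linux clock id, in case the libc headers hide it */
#endif

/* Convert a timespec to a single nanosecond count. */
static int64_t ts_to_ns(const struct timespec *ts)
{
	return (int64_t)ts->tv_sec * 1000000000LL + ts->tv_nsec;
}

/*
 * Hypothetical helper: read the clock back-to-back 'samples' times and
 * return the smallest non-zero difference in nanoseconds.
 * Returns -1 if clock_gettime() fails or no non-zero delta was seen.
 */
static int64_t min_clock_delta_ns(clockid_t clk, int samples)
{
	int64_t best = -1;

	assert(samples > 0);
	for (int i = 0; i < samples; i++) {
		struct timespec a, b;
		int64_t d;

		if (clock_gettime(clk, &a) || clock_gettime(clk, &b))
			return -1;
		d = ts_to_ns(&b) - ts_to_ns(&a);
		if (d > 0 && (best < 0 || d < best))
			best = d;
	}
	return best;
}
```

Calling min_clock_delta_ns(CLOCK_MONOTONIC_RAW, 100000) and min_clock_delta_ns(CLOCK_MONOTONIC, 100000) on the same machine gives a quick comparison of the syscall path against the vDSO path; the absolute numbers will vary with CPU, kernel version and clocksource.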
I reported this as Bug #198161:
  'https://bugzilla.kernel.org/show_bug.cgi?id=198961'
and in previous posts with subjects matching 'CLOCK_MONOTONIC_RAW'.

This patch handles CLOCK_MONOTONIC_RAW clock_gettime() in the vDSO
by exporting the raw clock calibration, last cycles, last xtime_nsec,
and last raw_sec values in the vsyscall_gtod_data during
update_vsyscall().

Now the new do_monotonic_raw() function in the vDSO has a latency of
about 24ns on average, about the same as do_monotonic(), and the
test program:
  tools/testing/selftests/timers/inconsistency-check.c
succeeds with arguments '-c 4 -t 120' or any arbitrary -t value.

The patch is against Linus' latest 4.16-rc5 tree, current HEAD of:
  git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

The patch affects only files:
  arch/x86/include/asm/vgtod.h
  arch/x86/entry/vdso/vclock_gettime.c
  arch/x86/entry/vsyscall/vsyscall_gtod.c

This is a resend of the original patch fixing review issues. The
next patch will add the rdtscp() function.

The patch passes the checkpatch.pl script.

Best Regards,
   Jason Vas Dias
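The arithmetic that the exported calibration values make possible can be sketched in plain userspace C. The struct and function names below (raw_clock_snapshot, raw_time, raw_time_at) are hypothetical stand-ins for this example, not the kernel's types; only the formula (masked cycle delta, multiply, add the stored shifted nanoseconds, shift right, carry whole seconds) follows the description above.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Hypothetical stand-in for the raw-clock fields exported by the patch. */
struct raw_clock_snapshot {
	uint64_t cycle_last;	/* counter value at the last timekeeper update */
	uint64_t mask;		/* valid bits of the counter */
	uint32_t mult;		/* cycles -> (nanoseconds << shift) multiplier */
	uint32_t shift;
	uint64_t sec;		/* raw seconds at the last update */
	uint64_t shifted_nsec;	/* raw nanoseconds at the last update, << shift */
};

struct raw_time {
	uint64_t sec;
	uint64_t nsec;
};

/*
 * Compute the raw monotonic time for a given counter reading.
 * Note: delta * mult can overflow 64 bits for very large deltas; this
 * sketch assumes the snapshot is refreshed often enough to avoid that.
 */
static struct raw_time raw_time_at(const struct raw_clock_snapshot *s,
				   uint64_t cycles)
{
	struct raw_time t = { .sec = s->sec, .nsec = 0 };
	uint64_t delta, ns;

	assert(s->shift < 64);
	delta = (cycles - s->cycle_last) & s->mask;
	ns = (s->shifted_nsec + delta * s->mult) >> s->shift;

	/* Carry whole seconds out of the nanosecond count. */
	while (ns >= NSEC_PER_SEC) {
		ns -= NSEC_PER_SEC;
		t.sec++;
	}
	t.nsec = ns;
	return t;
}
```

With mult = 1 << 23 and shift = 24, each cycle counts as 0.5ns (a 2GHz counter), so 4000 cycles after a snapshot taken at 10s + 999999000ns the result is 11s + 1000ns. Keeping the stored nanoseconds in shifted form means the fractional part of the conversion is not lost between updates.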
---
diff -up linux-4.16-rc5.1/arch/x86/entry/vdso/vclock_gettime.c.4.16-rc5 linux-4.16-rc5.1/arch/x86/entry/vdso/vclock_gettime.c
--- linux-4.16-rc5.1/arch/x86/entry/vdso/vclock_gettime.c.4.16-rc5	2018-03-12 00:25:09.000000000 +0000
+++ linux-4.16-rc5.1/arch/x86/entry/vdso/vclock_gettime.c	2018-03-12 08:12:17.110120433 +0000
@@ -182,6 +182,18 @@ notrace static u64 vread_tsc(void)
 	return last;
 }
 
+notrace static u64 vread_tsc_raw(void)
+{
+	u64 tsc
+	  , last = gtod->raw_cycle_last;
+
+	tsc = rdtsc_ordered();
+	if (likely(tsc >= last))
+		return tsc;
+	asm volatile ("");
+	return last;
+}
+
 notrace static inline u64 vgetsns(int *mode)
 {
 	u64 v;
@@ -203,6 +215,27 @@ notrace static inline u64 vgetsns(int *m
 	return v * gtod->mult;
 }
 
+notrace static inline u64 vgetsns_raw(int *mode)
+{
+	u64 v;
+	cycles_t cycles;
+
+	if (gtod->vclock_mode == VCLOCK_TSC)
+		cycles = vread_tsc_raw();
+#ifdef CONFIG_PARAVIRT_CLOCK
+	else if (gtod->vclock_mode == VCLOCK_PVCLOCK)
+		cycles = vread_pvclock(mode);
+#endif
+#ifdef CONFIG_HYPERV_TSCPAGE
+	else if (gtod->vclock_mode == VCLOCK_HVCLOCK)
+		cycles = vread_hvclock(mode);
+#endif
+	else
+		return 0;
+	v = (cycles - gtod->raw_cycle_last) & gtod->raw_mask;
+	return v * gtod->raw_mult;
+}
+
 /* Code size doesn't matter (vdso is 4k anyway) and this is faster. */
 notrace static int __always_inline do_realtime(struct timespec *ts)
 {
@@ -246,6 +279,27 @@ notrace static int __always_inline do_mo
 	return mode;
 }
 
+notrace static __always_inline int do_monotonic_raw(struct timespec *ts)
+{
+	unsigned long seq;
+	u64 ns;
+	int mode;
+
+	do {
+		seq = gtod_read_begin(gtod);
+		mode = gtod->vclock_mode;
+		ts->tv_sec = gtod->monotonic_time_raw_sec;
+		ns = gtod->monotonic_time_raw_nsec;
+		ns += vgetsns_raw(&mode);
+		ns >>= gtod->raw_shift;
+	} while (unlikely(gtod_read_retry(gtod, seq)));
+
+	ts->tv_sec += __iter_div_u64_rem(ns, NSEC_PER_SEC, &ns);
+	ts->tv_nsec = ns;
+
+	return mode;
+}
+
 notrace static void do_realtime_coarse(struct timespec *ts)
 {
 	unsigned long seq;
@@ -277,6 +331,10 @@ notrace int __vdso_clock_gettime(clockid
 		if (do_monotonic(ts) == VCLOCK_NONE)
 			goto fallback;
 		break;
+	case CLOCK_MONOTONIC_RAW:
+		if (do_monotonic_raw(ts) == VCLOCK_NONE)
+			goto fallback;
+		break;
 	case CLOCK_REALTIME_COARSE:
 		do_realtime_coarse(ts);
 		break;
diff -up linux-4.16-rc5.1/arch/x86/entry/vsyscall/vsyscall_gtod.c.4.16-rc5 linux-4.16-rc5.1/arch/x86/entry/vsyscall/vsyscall_gtod.c
--- linux-4.16-rc5.1/arch/x86/entry/vsyscall/vsyscall_gtod.c.4.16-rc5	2018-03-12 00:25:09.000000000 +0000
+++ linux-4.16-rc5.1/arch/x86/entry/vsyscall/vsyscall_gtod.c	2018-03-12 07:58:07.974214168 +0000
@@ -45,6 +45,11 @@ void update_vsyscall(struct timekeeper *
 	vdata->mult		= tk->tkr_mono.mult;
 	vdata->shift		= tk->tkr_mono.shift;
 
+	vdata->raw_cycle_last	= tk->tkr_raw.cycle_last;
+	vdata->raw_mask		= tk->tkr_raw.mask;
+	vdata->raw_mult		= tk->tkr_raw.mult;
+	vdata->raw_shift	= tk->tkr_raw.shift;
+
 	vdata->wall_time_sec		= tk->xtime_sec;
 	vdata->wall_time_snsec		= tk->tkr_mono.xtime_nsec;
 
@@ -74,5 +79,8 @@ void update_vsyscall(struct timekeeper *
 		vdata->monotonic_time_coarse_sec++;
 	}
 
+	vdata->monotonic_time_raw_sec	= tk->raw_sec;
+	vdata->monotonic_time_raw_nsec	= tk->tkr_raw.xtime_nsec;
+
 	gtod_write_end(vdata);
 }
diff -up linux-4.16-rc5.1/arch/x86/include/asm/vgtod.h.4.16-rc5 linux-4.16-rc5.1/arch/x86/include/asm/vgtod.h
--- linux-4.16-rc5.1/arch/x86/include/asm/vgtod.h.4.16-rc5	2018-03-12 00:25:09.000000000 +0000
+++ linux-4.16-rc5.1/arch/x86/include/asm/vgtod.h	2018-03-12 07:44:17.910539760 +0000
@@ -22,6 +22,10 @@ struct vsyscall_gtod_data {
 	u64	mask;
 	u32	mult;
 	u32	shift;
+	u64	raw_cycle_last;
+	u64	raw_mask;
+	u32	raw_mult;
+	u32	raw_shift;
 
 	/* open coded 'struct timespec' */
 	u64		wall_time_snsec;
@@ -32,6 +36,8 @@ struct vsyscall_gtod_data {
 	gtod_long_t	wall_time_coarse_nsec;
 	gtod_long_t	monotonic_time_coarse_sec;
 	gtod_long_t	monotonic_time_coarse_nsec;
+	gtod_long_t	monotonic_time_raw_sec;
+	gtod_long_t	monotonic_time_raw_nsec;
 
 	int		tz_minuteswest;
 	int		tz_dsttime;
---
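The retry loop in do_monotonic_raw() relies on a sequence counter so that a reader never returns values taken from a half-finished update. A minimal userspace analogue using C11 atomics might look like the following. It is not the kernel's gtod_read_begin() / gtod_read_retry() implementation; seq_data, seq_write() and seq_read() are names invented for this sketch, and it assumes a single writer.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical shared record protected by a sequence counter. */
struct seq_data {
	atomic_uint seq;	/* odd while the writer is mid-update */
	_Atomic uint64_t sec;
	_Atomic uint64_t nsec;
};

/* Single writer: bump seq to odd, publish the fields, bump seq to even. */
static void seq_write(struct seq_data *d, uint64_t sec, uint64_t nsec)
{
	unsigned int s = atomic_load_explicit(&d->seq, memory_order_relaxed);

	atomic_store_explicit(&d->seq, s + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&d->sec, sec, memory_order_relaxed);
	atomic_store_explicit(&d->nsec, nsec, memory_order_relaxed);
	atomic_store_explicit(&d->seq, s + 2, memory_order_release);
}

/* Reader: retry until the same even sequence is seen before and after. */
static void seq_read(struct seq_data *d, uint64_t *sec, uint64_t *nsec)
{
	unsigned int s;

	do {
		/* Wait out any update that is currently in progress. */
		while ((s = atomic_load_explicit(&d->seq,
						 memory_order_acquire)) & 1)
			;
		*sec = atomic_load_explicit(&d->sec, memory_order_relaxed);
		*nsec = atomic_load_explicit(&d->nsec, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
	} while (atomic_load_explicit(&d->seq, memory_order_relaxed) != s);
}
```

The reader never takes a lock and never blocks the writer, which is why this pattern suits a clock that is read far more often than it is updated. The cost is that a reader may have to repeat its loads if an update lands in the middle of a read.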