2015-07-28 00:46:04

by Hall, Christopher S

[permalink] [raw]
Subject: [PATCH 0/5] Patchset enabling hardware based cross-timestamps for next gen Intel platforms

Next generation Intel platforms will have an Always Running
Timer (ART) that always runs when the system is powered and
is available to both the CPU and various on-board devices.
Initially, those devices include audio and network. The
ART will give these devices the capability of precisely
cross timestamping their local device clock with the system
clock.

A system clock value like TSC or ART is not useful
unless translated to system time. The first three patches
enable this by changing the timekeeping code to return a
system time given a system clock value and translating ART
to TSC.

The last two patches modify the PTP driver to call a
cross timestamp function in the driver when available and
perform the cross timestamp in the e1000e driver.

Given the precise relationship between the network device
clock and system time enables better synchronization of
events on multiple network connected devices.

Changelog:

* The PTP portion of the patch set was posted 7/8/2015 (v3)
and rejected because of there wasn't a driver that
implemented the new API. Now, the driver patch is added
and the PTP patch operation is modified to revert to
previous behavior when cross timestamp can't be
completed. This is indicated by the driver returning a
non-zero value.

Christopher Hall (5):
Add functions producing system time given a backing counter value
Added functions mapping TSC value to system time
Add calls to translate Always Running Timer (ART) to system time
Add support for driver cross-timestamp to PTP_SYS_OFFSET ioctl
Enables cross timestamping in the e1000e driver

Documentation/ptp/testptp.c | 6 +-
arch/x86/Kconfig | 12 ++
arch/x86/include/asm/art.h | 42 ++++++
arch/x86/include/asm/tsc.h | 5 +
arch/x86/kernel/Makefile | 1 +
arch/x86/kernel/art.c | 134 ++++++++++++++++++
arch/x86/kernel/tsc.c | 22 +++
drivers/net/ethernet/intel/e1000e/defines.h | 7 +
drivers/net/ethernet/intel/e1000e/ptp.c | 77 ++++++++++
drivers/net/ethernet/intel/e1000e/regs.h | 4 +
drivers/ptp/ptp_chardev.c | 29 ++--
include/linux/ptp_clock_kernel.h | 7 +
include/linux/timekeeping.h | 8 ++
include/uapi/linux/ptp_clock.h | 4 +-
kernel/time/timekeeping.c | 211 +++++++++++++++++++++++++---
15 files changed, 538 insertions(+), 31 deletions(-)
create mode 100644 arch/x86/include/asm/art.h
create mode 100644 arch/x86/kernel/art.c

--
1.9.1


2015-07-28 00:47:23

by Hall, Christopher S

[permalink] [raw]
Subject: [PATCH 1/5] Add functions producing system time given a backing counter value

* counter_to_rawmono64
* counter_to_mono64
* counter_to_realtime64

Enables drivers to translate a captured system clock counter to system
time. This is useful for network and audio devices that capture timestamps
in terms of both the system clock and device clock.

Signed-off-by: Christopher Hall <[email protected]>
---
include/linux/timekeeping.h | 8 ++
kernel/time/timekeeping.c | 211 +++++++++++++++++++++++++++++++++++++++-----
2 files changed, 199 insertions(+), 20 deletions(-)

diff --git a/include/linux/timekeeping.h b/include/linux/timekeeping.h
index 6e191e4..04e6455 100644
--- a/include/linux/timekeeping.h
+++ b/include/linux/timekeeping.h
@@ -1,6 +1,8 @@
#ifndef _LINUX_TIMEKEEPING_H
#define _LINUX_TIMEKEEPING_H

+struct clocksource; /* Defined in clocksource.h */
+
/* Included from linux/ktime.h */

void timekeeping_init(void);
@@ -27,6 +29,12 @@ struct timespec __current_kernel_time(void);
*/
struct timespec64 get_monotonic_coarse64(void);
extern void getrawmonotonic64(struct timespec64 *ts);
+extern int counter_to_rawmono64
+(struct timespec64 *rawmono, cycle_t counterval, struct clocksource *cs);
+extern int counter_to_mono64
+(struct timespec64 *mono, cycle_t counterval, struct clocksource *cs);
+extern int counter_to_realtime64
+(struct timespec64 *realtime, cycle_t counterval, struct clocksource *cs);
extern void ktime_get_ts64(struct timespec64 *ts);
extern time64_t ktime_get_seconds(void);
extern time64_t ktime_get_real_seconds(void);
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index bca3667..1ca4fe0 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -116,6 +116,8 @@ static inline void tk_update_sleep_time(struct timekeeper *tk, ktime_t delta)
tk->offs_boot = ktime_add(tk->offs_boot, delta);
}

+#define ROLLOVER_THRESHOLD (2ULL << 39)
+
#ifdef CONFIG_DEBUG_TIMEKEEPING
#define WARNING_FREQ (HZ*300) /* 5 minute rate-limiting */

@@ -158,10 +160,11 @@ static void timekeeping_check_update(struct timekeeper *tk, cycle_t offset)
}
}

-static inline cycle_t timekeeping_get_delta(struct tk_read_base *tkr)
+static inline int timekeeping_get_delta
+(cycle_t *delta, struct tk_read_base *tkr, cycle_t *counterval)
{
struct timekeeper *tk = &tk_core.timekeeper;
- cycle_t now, last, mask, max, delta;
+ cycle_t now = *counterval, last, mask, max, delta;
unsigned int seq;

/*
@@ -173,46 +176,61 @@ static inline cycle_t timekeeping_get_delta(struct tk_read_base *tkr)
*/
do {
seq = read_seqcount_begin(&tk_core.seq);
- now = tkr->read(tkr->clock);
+ if (counterval == NULL) {
+ now = tkr->read(tkr->clock);
+ } else {
+ if (now < tkr->cycle_last &&
+ tkr->cycle_last - now < ROLLOVER_THRESHOLD)
+ return -EAGAIN;
+ }
last = tkr->cycle_last;
mask = tkr->mask;
max = tkr->clock->max_cycles;
} while (read_seqcount_retry(&tk_core.seq, seq));

- delta = clocksource_delta(now, last, mask);
+ *delta = clocksource_delta(now, last, mask);

/*
* Try to catch underflows by checking if we are seeing small
* mask-relative negative values.
*/
- if (unlikely((~delta & mask) < (mask >> 3))) {
+ if (unlikely((~*delta & mask) < (mask >> 3))) {
tk->underflow_seen = 1;
- delta = 0;
+ *delta = 0;
}

/* Cap delta value to the max_cycles values to avoid mult overflows */
- if (unlikely(delta > max)) {
- tk->overflow_seen = 1;
- delta = tkr->clock->max_cycles;
+ if (unlikely(*delta > max)) {
+ tk->underflow_seen = 1;
+ *delta = tkr->clock->max_cycles;
}

- return delta;
+ return 0;
}
#else
-static inline void timekeeping_check_update(struct timekeeper *tk, cycle_t offset)
+static inline void timekeeping_check_update
+(struct timekeeper *tk, cycle_t offset)
{
}
-static inline cycle_t timekeeping_get_delta(struct tk_read_base *tkr)
+static inline cycle_t timekeeping_get_delta
+(cycle_t *delta, struct tk_read_base *tkr, cycle_t *counterval)
{
- cycle_t cycle_now, delta;
+ cycle_t cycle_now;

/* read clocksource */
- cycle_now = tkr->read(tkr->clock);
+ if (counterval == NULL) {
+ cycle_now = tkr->read(tkr->clock);
+ } else {
+ cycle_now = *counterval;
+ if (cycle_now < tkr->cycle_last &&
+ tkr->cycle_last - cycle_now < ROLLOVER_THRESHOLD)
+ return -EAGAIN;
+ }

/* calculate the delta since the last update_wall_time */
- delta = clocksource_delta(cycle_now, tkr->cycle_last, tkr->mask);
+ *delta = clocksource_delta(cycle_now, tkr->cycle_last, tkr->mask);

- return delta;
+ return 0;
}
#endif

@@ -298,13 +316,11 @@ u32 (*arch_gettimeoffset)(void) = default_arch_gettimeoffset;
static inline u32 arch_gettimeoffset(void) { return 0; }
#endif

-static inline s64 timekeeping_get_ns(struct tk_read_base *tkr)
+static inline s64 timekeeping_delta_to_ns
+(struct tk_read_base *tkr, cycle_t delta)
{
- cycle_t delta;
s64 nsec;

- delta = timekeeping_get_delta(tkr);
-
nsec = delta * tkr->mult + tkr->xtime_nsec;
nsec >>= tkr->shift;

@@ -312,6 +328,28 @@ static inline s64 timekeeping_get_ns(struct tk_read_base *tkr)
return nsec + arch_gettimeoffset();
}

+static inline s64 timekeeping_get_ns(struct tk_read_base *tkr)
+{
+ cycle_t delta;
+
+ timekeeping_get_delta(&delta, tkr, NULL);
+ return timekeeping_delta_to_ns(tkr, delta);
+}
+
+static inline int timekeeping_counter_to_ns
+(s64 *nsec, struct tk_read_base *tkr, cycle_t counterval)
+{
+ cycle_t delta;
+ int err;
+
+ err = timekeeping_get_delta(&delta, tkr, &counterval);
+ if (err != 0)
+ return err;
+
+ *nsec = timekeeping_delta_to_ns(tkr, delta);
+ return 0;
+}
+
/**
* update_fast_timekeeper - Update the fast and NMI safe monotonic timekeeper.
* @tkr: Timekeeping readout base from which we take the update
@@ -1117,6 +1155,139 @@ void getrawmonotonic64(struct timespec64 *ts)
EXPORT_SYMBOL(getrawmonotonic64);


+ /**
+ * counterval_to_rawmono64 - Returns the raw monotonic time given a counter
+ * value
+ * @counterval: counter value (input)
+ * @rawmono: raw monotonic clock (output)
+ * @cs: clocksource from which clockvalue is derived
+ *
+ * Returns:
+ * 0 - Success
+ * -EAGAIN - Clock value is in the past, try again
+ * -ENXIO - Clocksource 'cs' doesn't match the current clocksource
+ *
+ */
+int counter_to_rawmono64(struct timespec64 *rawmono, cycle_t counterval,
+ struct clocksource *cs)
+{
+ struct timekeeper *tk = &tk_core.timekeeper;
+ struct clocksource *curr_clock;
+ struct timespec64 ts64;
+ unsigned long seq;
+ s64 nsecs;
+ int err;
+
+ do {
+ seq = read_seqcount_begin(&tk_core.seq);
+ err = timekeeping_counter_to_ns
+ (&nsecs, &tk->tkr_raw, counterval);
+ if (err != 0)
+ return err;
+ ts64 = tk->raw_time;
+ curr_clock = tk->tkr_raw.clock;
+ } while (read_seqcount_retry(&tk_core.seq, seq));
+
+ if (curr_clock != cs)
+ return -ENXIO;
+
+ timespec64_add_ns(&ts64, nsecs);
+ *rawmono = ts64;
+
+ return 0;
+}
+EXPORT_SYMBOL(counter_to_rawmono64);
+
+ /**
+ * counterval_to_realtime64 - Returns the real time given a counter
+ * value
+ * @counterval: counter value (input)
+ * @realtime: realtime clock (output)
+ * @cs: clocksource from which clockvalue is derived
+ *
+ * Returns:
+ * 0 - Success
+ * -EAGAIN - Clock value is in the past, try again
+ * -ENXIO - Clocksource 'cs' doesn't match the current clocksource
+ *
+ */
+int counter_to_realtime64(struct timespec64 *realtime, cycle_t counterval,
+ struct clocksource *cs)
+{
+ struct timekeeper *tk = &tk_core.timekeeper;
+ struct clocksource *curr_clock;
+ struct timespec64 ts64;
+ unsigned long seq;
+ s64 nsecs;
+ int err;
+
+ do {
+ seq = read_seqcount_begin(&tk_core.seq);
+ err = timekeeping_counter_to_ns
+ (&nsecs, &tk->tkr_mono, counterval);
+ if (err != 0)
+ return err;
+ ts64.tv_sec = tk->xtime_sec;
+ curr_clock = tk->tkr_mono.clock;
+ } while (read_seqcount_retry(&tk_core.seq, seq));
+
+ if (curr_clock != cs)
+ return -ENXIO;
+
+ ts64.tv_nsec = 0;
+ timespec64_add_ns(&ts64, nsecs);
+ *realtime = ts64;
+
+ return 0;
+}
+EXPORT_SYMBOL(counter_to_realtime64);
+
+ /**
+ * counterval_to_mono64 - Returns the real time given a counter
+ * value
+ * @counterval: counter value (input)
+ * @realtime: realtime clock (output)
+ * @cs: clocksource from which clockvalue is derived
+ *
+ * Returns:
+ * 0 - Success
+ * -EAGAIN - Clock value is in the past, try again
+ * -ENXIO - Clocksource 'cs' doesn't match the current clocksource
+ *
+ */
+int counter_to_mono64(struct timespec64 *mono, cycle_t counterval,
+ struct clocksource *cs)
+{
+ struct timekeeper *tk = &tk_core.timekeeper;
+ struct clocksource *curr_clock;
+ struct timespec64 tomono;
+ struct timespec64 ts64;
+ unsigned long seq;
+ s64 nsecs;
+ int err;
+
+ do {
+ seq = read_seqcount_begin(&tk_core.seq);
+ err = timekeeping_counter_to_ns
+ (&nsecs, &tk->tkr_mono, counterval);
+ if (err != 0)
+ return err;
+ ts64.tv_sec = tk->xtime_sec;
+ tomono = tk->wall_to_monotonic;
+ curr_clock = tk->tkr_mono.clock;
+ } while (read_seqcount_retry(&tk_core.seq, seq));
+
+ if (curr_clock != cs)
+ return -ENXIO;
+
+ ts64.tv_nsec = 0;
+ timespec64_add_ns(&ts64, nsecs);
+ *mono = timespec64_add(ts64, tomono);
+
+ return 0;
+}
+EXPORT_SYMBOL(counter_to_mono64);
+
/**
* timekeeping_valid_for_hres - Check if timekeeping is suitable for hres
*/
--
1.9.1

2015-07-28 00:47:21

by Hall, Christopher S

[permalink] [raw]
Subject: [PATCH 2/5] Added functions mapping TSC value to system time

* tsc_to_rawmono64
* tsc_to_realtime64
* tsc_to_mono64

These function build on the counter_to_* time translation function
specifically for TSC based system clock

Signed-off-by: Christopher Hall <[email protected]>
---
arch/x86/include/asm/tsc.h | 5 +++++
arch/x86/kernel/tsc.c | 18 ++++++++++++++++++
2 files changed, 23 insertions(+)

diff --git a/arch/x86/include/asm/tsc.h b/arch/x86/include/asm/tsc.h
index 94605c0..3fdf8b1 100644
--- a/arch/x86/include/asm/tsc.h
+++ b/arch/x86/include/asm/tsc.h
@@ -5,6 +5,7 @@
#define _ASM_X86_TSC_H

#include <asm/processor.h>
+#include <linux/time64.h>

#define NS_SCALE 10 /* 2^10, carefully chosen */
#define US_SCALE 32 /* 2^32, arbitralrily chosen */
@@ -52,6 +53,10 @@ extern int check_tsc_unstable(void);
extern int check_tsc_disabled(void);
extern unsigned long native_calibrate_tsc(void);

+extern int tsc_to_rawmono64(struct timespec64 *rawmono, cycle_t counterval);
+extern int tsc_to_realtime64(struct timespec64 *realtime, cycle_t counterval);
+extern int tsc_to_mono64(struct timespec64 *mono, cycle_t counterval);
+
extern int tsc_clocksource_reliable;

/*
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 7437b41..a192271 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -979,6 +979,24 @@ static cycle_t read_tsc(struct clocksource *cs)
return (cycle_t)get_cycles();
}

+int tsc_to_rawmono64(struct timespec64 *rawmono, cycle_t counterval)
+{
+ return counter_to_rawmono64(rawmono, counterval, &clocksource_tsc);
+}
+EXPORT_SYMBOL(tsc_to_rawmono64);
+
+int tsc_to_realtime64(struct timespec64 *realtime, cycle_t counterval)
+{
+ return counter_to_realtime64(realtime, counterval, &clocksource_tsc);
+}
+EXPORT_SYMBOL(tsc_to_realtime64);
+
+int tsc_to_mono64(struct timespec64 *mono, cycle_t counterval)
+{
+ return counter_to_mono64(mono, counterval, &clocksource_tsc);
+}
+EXPORT_SYMBOL(tsc_to_mono64);
+
/*
* .mask MUST be CLOCKSOURCE_MASK(64). See comment above read_tsc()
*/
--
1.9.1

2015-07-28 00:46:13

by Hall, Christopher S

[permalink] [raw]
Subject: [PATCH 3/5] Add calls to translate Always Running Timer (ART) to system time

* art_to_mono64
* art_to_rawmono64
* art_to_realtime64

Intel audio and PCH ethernet devices use the Always Running Timer (ART) to
relate their device clock to system time

Signed-off-by: Christopher Hall <[email protected]>
---
arch/x86/Kconfig | 12 ++++
arch/x86/include/asm/art.h | 42 ++++++++++++++
arch/x86/kernel/Makefile | 1 +
arch/x86/kernel/art.c | 134 +++++++++++++++++++++++++++++++++++++++++++++
arch/x86/kernel/tsc.c | 4 ++
5 files changed, 193 insertions(+)
create mode 100644 arch/x86/include/asm/art.h
create mode 100644 arch/x86/kernel/art.c

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index b3a1a5d..1ef9985 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1175,6 +1175,18 @@ config X86_CPUID
with major 203 and minors 0 to 31 for /dev/cpu/0/cpuid to
/dev/cpu/31/cpuid.

+config X86_ART
+ bool "Always Running Timer"
+ default y
+ depends on X86_TSC
+ ---help---
+ This option provides functionality to drivers and devices that use
+ the always-running-timer (ART) to correlate their device clock
+ counter with the system clock counter. The TSC is *exactly* related
+ to the ART by a ratio m/n specified by CPUID leaf 0x15
+ (n=EAX,m=EBX). If ART is unused or unavailable there isn't any
+ performance impact. It's safe to say Y.
+
choice
prompt "High Memory Support"
default HIGHMEM4G
diff --git a/arch/x86/include/asm/art.h b/arch/x86/include/asm/art.h
new file mode 100644
index 0000000..da58ce4
--- /dev/null
+++ b/arch/x86/include/asm/art.h
@@ -0,0 +1,42 @@
+/*
+ * x86 ART related functions
+ */
+#ifndef _ASM_X86_ART_H
+#define _ASM_X86_ART_H
+
+#ifndef CONFIG_X86_ART
+
+static inline int setup_art(void)
+{
+ return 0;
+}
+
+static inline bool has_art(void)
+{
+ return false;
+}
+
+static inline int art_to_rawmono64(struct timespec64 *rawmono, cycle_t art)
+{
+ return -ENXIO;
+}
+static inline int art_to_realtime64(struct timespec64 *realtime, cycle_t art)
+{
+ return -ENXIO;
+}
+static inline int art_to_mono64(struct timespec64 *mono, cycle_t art)
+{
+ return -ENXIO;
+}
+
+#else
+
+extern int setup_art(void);
+extern bool has_art(void);
+extern int art_to_rawmono64(struct timespec64 *rawmono, cycle_t art);
+extern int art_to_realtime64(struct timespec64 *realtime, cycle_t art);
+extern int art_to_mono64(struct timespec64 *mono, cycle_t art);
+
+#endif
+
+#endif/*_ASM_X86_ART_H*/
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 0f15af4..0908311 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -109,6 +109,7 @@ obj-$(CONFIG_PERF_EVENTS) += perf_regs.o
obj-$(CONFIG_TRACING) += tracepoint.o
obj-$(CONFIG_IOSF_MBI) += iosf_mbi.o
obj-$(CONFIG_PMC_ATOM) += pmc_atom.o
+obj-$(CONFIG_X86_ART) += art.o

###
# 64 bit specific files
diff --git a/arch/x86/kernel/art.c b/arch/x86/kernel/art.c
new file mode 100644
index 0000000..1906cf0
--- /dev/null
+++ b/arch/x86/kernel/art.c
@@ -0,0 +1,134 @@
+#include <asm/tsc.h>
+#include <asm/cpufeature.h>
+#include <asm/processor.h>
+#include <linux/spinlock.h>
+#include <linux/seqlock.h>
+
+#define CPUID_ART_LEAF 0x15
+
+static bool art_present;
+
+static struct art_state {
+ seqcount_t seq;
+ u32 art_ratio_numerator;
+ u32 art_ratio_denominator;
+
+ cycle_t prev_art;
+ cycle_t prev_tsc_corr_art; /*This is the TSC value corresponding to
+ prev_art */
+ u32 tsc_remainder;
+} art_state ____cacheline_aligned;
+
+static DEFINE_RAW_SPINLOCK(art_lock);
+
+#define MIN_DENOMINATOR 2
+int setup_art(void)
+{
+ if (boot_cpu_data.cpuid_level < CPUID_ART_LEAF)
+ return 0;
+ art_state.art_ratio_denominator = cpuid_eax(CPUID_ART_LEAF);
+ if (art_state.art_ratio_denominator < MIN_DENOMINATOR)
+ return 0;
+ art_state.art_ratio_numerator = cpuid_ebx(CPUID_ART_LEAF);
+
+ art_present = true;
+ return 0;
+}
+
+static bool has_art(void)
+{
+ return art_present;
+}
+EXPORT_SYMBOL(has_art);
+
+#define ROLLOVER_THRESHOLD (2ULL << 23)
+
+static u32 art_scale(struct art_state *art_state, cycle_t *art)
+{
+ u32 rem;
+
+ *art *= art_state->art_ratio_numerator;
+
+ switch (art_state->art_ratio_denominator) {
+ default:
+ rem = do_div(*art, art_state->art_ratio_denominator);
+ case 2:
+ rem = *art & 0x1;
+ *art >>= 1;
+ }
+ return rem + art_state->tsc_remainder;
+}
+
+static cycle_t art_to_tsc(cycle_t art)
+{
+ unsigned seq;
+ cycle_t tsc_next;
+ u32 rem_next;
+ bool backward = false;
+ unsigned long flags;
+
+ do {
+ seq = read_seqcount_begin(&art_state.seq);
+
+ if (art < art_state.prev_art &&
+ art_state.prev_art - art < ROLLOVER_THRESHOLD) {
+ tsc_next = (art_state.prev_art-art);
+ art_scale(&art_state, &tsc_next);
+ tsc_next = art_state.prev_tsc_corr_art - tsc_next;
+ backward = true;
+ } else {
+ tsc_next = art - art_state.prev_art;
+ rem_next = art_scale(&art_state, &tsc_next);
+ tsc_next += art_state.prev_tsc_corr_art;
+
+ tsc_next += rem_next /
+ art_state.art_ratio_denominator;
+ rem_next %= art_state.art_ratio_denominator;
+ }
+ } while (read_seqcount_retry(&art_state.seq, seq));
+
+ /* There's no need to update after every read, if an update is
+ already in progress by someone else just exit */
+ if (!backward && raw_spin_trylock_irqsave(&art_lock, flags)) {
+ write_seqcount_begin(&art_state.seq);
+ art_state.prev_art = art;
+ art_state.prev_tsc_corr_art = tsc_next;
+ art_state.tsc_remainder = rem_next;
+ write_seqcount_end(&art_state.seq);
+ raw_spin_unlock_irqrestore(&art_lock, flags);
+ }
+
+ return tsc_next;
+}
+
+static bool checked_art_to_tsc(cycle_t *tsc)
+{
+ if (!has_art())
+ return false;
+ *tsc = art_to_tsc(*tsc);
+ return true;
+}
+
+static int art_to_rawmono64(struct timespec64 *rawmono, cycle_t art)
+{
+ if (!checked_art_to_tsc(&art))
+ return -ENXIO;
+ return tsc_to_rawmono64(rawmono, art);
+}
+EXPORT_SYMBOL(art_to_rawmono64);
+
+static int art_to_realtime64(struct timespec64 *realtime, cycle_t art)
+{
+ if (!checked_art_to_tsc(&art))
+ return -ENXIO;
+ return tsc_to_realtime64(realtime, art);
+}
+EXPORT_SYMBOL(art_to_realtime64);
+
+static int art_to_mono64(struct timespec64 *mono, cycle_t art)
+{
+ if (!checked_art_to_tsc(&art))
+ return -ENXIO;
+ return tsc_to_mono64(mono, art);
+}
+EXPORT_SYMBOL(art_to_mono64);
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index a192271..828c4b3 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -22,6 +22,8 @@
#include <asm/nmi.h>
#include <asm/x86_init.h>

+#include <asm/art.h>
+
unsigned int __read_mostly cpu_khz; /* TSC clocks / usec, not used here */
EXPORT_SYMBOL(cpu_khz);

@@ -1177,6 +1179,8 @@ static int __init init_tsc_clocksource(void)
return 0;
}

+ setup_art();
+
schedule_delayed_work(&tsc_irqwork, 0);
return 0;
}
--
1.9.1

2015-07-28 00:46:11

by Hall, Christopher S

[permalink] [raw]
Subject: [PATCH 4/5] Add support for driver cross-timestamp to PTP_SYS_OFFSET ioctl

This patch allows system and device time ("cross-timestamp") to be
performed by the driver. Currently, the cross-timestamping is performed
in the PTP_SYS_OFFSET ioctl. The PTP clock driver reads gettimeofday()
and the gettime64() callback provided by the driver. The cross-timestamp
is best effort where the latency between the capture of system time
(getnstimeofday()) and the device time (driver callback) may be
significant.

This patch adds an additional callback getsynctime64(). Which will be
called when the driver is able to perform a more accurate, implementation
specific cross-timestamping. For example, future network devices that
implement PCIE PTM will be able to precisely correlate the device clock
with the system clock with virtually zero latency between captures.
This added callback can be used by the driver to expose this functionality.

The callback, getsynctime64(), will only be called when defined and
n_samples == 1 because the driver returns only 1 cross-timestamp where
multiple samples cannot be chained together.

This patch also adds to the capabilities ioctl (PTP_CLOCK_GETCAPS),
allowing applications to query whether or not drivers implement the
getsynctime callback, providing more precise cross timestamping.

Commit Details:

Added additional callback to ptp_clock_info:

* getsynctime64()

This takes 2 arguments referring to system and device time

With this callback drivers may provide both system time and device time
to ensure precise correlation

Modified PTP_SYS_OFFSET ioctl in PTP clock driver to use the above
callback if it's available

Added capability (PTP_CLOCK_GETCAPS) for checking whether driver supports
cross timestamping

Added check for cross timestamping flag to testptp.c

Signed-off-by: Christopher Hall <[email protected]>
---
Documentation/ptp/testptp.c | 6 ++++--
drivers/ptp/ptp_chardev.c | 29 +++++++++++++++++++++--------
include/linux/ptp_clock_kernel.h | 7 +++++++
include/uapi/linux/ptp_clock.h | 4 +++-
4 files changed, 35 insertions(+), 11 deletions(-)

diff --git a/Documentation/ptp/testptp.c b/Documentation/ptp/testptp.c
index 2bc8abc..8004efd 100644
--- a/Documentation/ptp/testptp.c
+++ b/Documentation/ptp/testptp.c
@@ -276,13 +276,15 @@ int main(int argc, char *argv[])
" %d external time stamp channels\n"
" %d programmable periodic signals\n"
" %d pulse per second\n"
- " %d programmable pins\n",
+ " %d programmable pins\n"
+ " %d cross timestamping\n",
caps.max_adj,
caps.n_alarm,
caps.n_ext_ts,
caps.n_per_out,
caps.pps,
- caps.n_pins);
+ caps.n_pins,
+ caps.cross_timestamping);
}
}

diff --git a/drivers/ptp/ptp_chardev.c b/drivers/ptp/ptp_chardev.c
index da7bae9..392ccfa 100644
--- a/drivers/ptp/ptp_chardev.c
+++ b/drivers/ptp/ptp_chardev.c
@@ -124,7 +124,7 @@ long ptp_ioctl(struct posix_clock *pc, unsigned int cmd, unsigned long arg)
struct ptp_clock *ptp = container_of(pc, struct ptp_clock, clock);
struct ptp_clock_info *ops = ptp->info;
struct ptp_clock_time *pct;
- struct timespec64 ts;
+ struct timespec64 ts, systs;
int enable, err = 0;
unsigned int i, pin_index;

@@ -138,6 +138,7 @@ long ptp_ioctl(struct posix_clock *pc, unsigned int cmd, unsigned long arg)
caps.n_per_out = ptp->info->n_per_out;
caps.pps = ptp->info->pps;
caps.n_pins = ptp->info->n_pins;
+ caps.cross_timestamping = ptp->info->getsynctime64 != NULL;
if (copy_to_user((void __user *)arg, &caps, sizeof(caps)))
err = -EFAULT;
break;
@@ -196,19 +197,31 @@ long ptp_ioctl(struct posix_clock *pc, unsigned int cmd, unsigned long arg)
break;
}
pct = &sysoff->ts[0];
- for (i = 0; i < sysoff->n_samples; i++) {
- getnstimeofday64(&ts);
+ if (ptp->info->getsynctime64 && sysoff->n_samples == 1 &&
+ ptp->info->getsynctime64(ptp->info, &ts, &systs) == 0) {
+ pct->sec = systs.tv_sec;
+ pct->nsec = systs.tv_nsec;
+ pct++;
pct->sec = ts.tv_sec;
pct->nsec = ts.tv_nsec;
pct++;
- ptp->info->gettime64(ptp->info, &ts);
+ pct->sec = systs.tv_sec;
+ pct->nsec = systs.tv_nsec;
+ } else {
+ for (i = 0; i < sysoff->n_samples; i++) {
+ getnstimeofday64(&ts);
+ pct->sec = ts.tv_sec;
+ pct->nsec = ts.tv_nsec;
+ pct++;
+ ptp->info->gettime64(ptp->info, &ts);
+ pct->sec = ts.tv_sec;
+ pct->nsec = ts.tv_nsec;
+ pct++;
+ }
+ getnstimeofday64(&ts);
pct->sec = ts.tv_sec;
pct->nsec = ts.tv_nsec;
- pct++;
}
- getnstimeofday64(&ts);
- pct->sec = ts.tv_sec;
- pct->nsec = ts.tv_nsec;
if (copy_to_user((void __user *)arg, sysoff, sizeof(*sysoff)))
err = -EFAULT;
break;
diff --git a/include/linux/ptp_clock_kernel.h b/include/linux/ptp_clock_kernel.h
index b8b7306..8c43345 100644
--- a/include/linux/ptp_clock_kernel.h
+++ b/include/linux/ptp_clock_kernel.h
@@ -67,6 +67,11 @@ struct ptp_clock_request {
* @gettime64: Reads the current time from the hardware clock.
* parameter ts: Holds the result.
*
+ * @getsynctime64: Reads the current time from the hardware clock and system
+ * clock simultaneously.
+ * parameter dev: Holds the device time
+ * parameter sys: Holds the system time
+ *
* @settime64: Set the current time on the hardware clock.
* parameter ts: Time value to set.
*
@@ -105,6 +110,8 @@ struct ptp_clock_info {
int (*adjfreq)(struct ptp_clock_info *ptp, s32 delta);
int (*adjtime)(struct ptp_clock_info *ptp, s64 delta);
int (*gettime64)(struct ptp_clock_info *ptp, struct timespec64 *ts);
+ int (*getsynctime64)(struct ptp_clock_info *ptp, struct timespec64 *dev,
+ struct timespec64 *sys);
int (*settime64)(struct ptp_clock_info *p, const struct timespec64 *ts);
int (*enable)(struct ptp_clock_info *ptp,
struct ptp_clock_request *request, int on);
diff --git a/include/uapi/linux/ptp_clock.h b/include/uapi/linux/ptp_clock.h
index f0b7bfe..ffb2635 100644
--- a/include/uapi/linux/ptp_clock.h
+++ b/include/uapi/linux/ptp_clock.h
@@ -51,7 +51,9 @@ struct ptp_clock_caps {
int n_per_out; /* Number of programmable periodic signals. */
int pps; /* Whether the clock supports a PPS callback. */
int n_pins; /* Number of input/output pins. */
- int rsv[14]; /* Reserved for future use. */
+ /* Whether the clock supports precise system-device cross timestamps */
+ int cross_timestamping;
+ int rsv[13]; /* Reserved for future use. */
};

struct ptp_extts_request {
--
1.9.1

2015-07-28 00:46:09

by Hall, Christopher S

[permalink] [raw]
Subject: [PATCH 5/5] Enables cross timestamping in the e1000e driver

Adds getsynctime64() callback which captures "raw" cross timestamp
between ART and Ethernet device clock
Translates captured ART to REALTIME clock

Signed-off-by: Christopher Hall <[email protected]>
---
drivers/net/ethernet/intel/e1000e/defines.h | 7 +++
drivers/net/ethernet/intel/e1000e/ptp.c | 83 +++++++++++++++++++++++++++++
drivers/net/ethernet/intel/e1000e/regs.h | 4 ++
3 files changed, 94 insertions(+)

diff --git a/drivers/net/ethernet/intel/e1000e/defines.h b/drivers/net/ethernet/intel/e1000e/defines.h
index 133d407..9f16269 100644
--- a/drivers/net/ethernet/intel/e1000e/defines.h
+++ b/drivers/net/ethernet/intel/e1000e/defines.h
@@ -527,6 +527,13 @@
#define E1000_RXCW_C 0x20000000 /* Receive config */
#define E1000_RXCW_SYNCH 0x40000000 /* Receive config synch */

+/* HH Time Sync */
+#define E1000_TSYNCTXCTL_MAX_ALLOWED_DLY_MASK 0x0000F000 /* max delay */
+#define E1000_TSYNCTXCTL_SYNC_COMP 0x40000000 /* sync complete
+ */
+#define E1000_TSYNCTXCTL_START_SYNC 0x80000000 /* initiate sync
+ */
+
#define E1000_TSYNCTXCTL_VALID 0x00000001 /* Tx timestamp valid */
#define E1000_TSYNCTXCTL_ENABLED 0x00000010 /* enable Tx timestamping */

diff --git a/drivers/net/ethernet/intel/e1000e/ptp.c b/drivers/net/ethernet/intel/e1000e/ptp.c
index 25a0ad5..bd8b888 100644
--- a/drivers/net/ethernet/intel/e1000e/ptp.c
+++ b/drivers/net/ethernet/intel/e1000e/ptp.c
@@ -25,6 +25,7 @@
*/

#include "e1000.h"
+#include <asm/art.h>

/**
* e1000e_phc_adjfreq - adjust the frequency of the hardware clock
@@ -98,6 +99,87 @@ static int e1000e_phc_adjtime(struct ptp_clock_info *ptp, s64 delta)
return 0;
}

+#define HW_WAIT_COUNT (2)
+#define HW_RETRY_COUNT (2)
+
+static int e1000e_phc_read_timestamp_pair(struct e1000_adapter *adapter,
+ u64 *systim, u64 *plttim)
+{
+ struct e1000_hw *hw = &adapter->hw;
+ int i, j;
+ u32 tsync_ctrl;
+ int ret;
+
+ if (hw->mac.type < e1000_pch_spt)
+ return -ENXIO;
+
+ for (j = 0; j < HW_RETRY_COUNT; ++j) {
+ tsync_ctrl = er32(TSYNCTXCTL);
+ tsync_ctrl |= E1000_TSYNCTXCTL_START_SYNC |
+ E1000_TSYNCTXCTL_MAX_ALLOWED_DLY_MASK;
+ ew32(TSYNCTXCTL, tsync_ctrl);
+ ret = 0;
+ for (i = 0; i < HW_WAIT_COUNT; ++i) {
+ udelay(2);
+ tsync_ctrl = er32(TSYNCTXCTL);
+ if (tsync_ctrl & E1000_TSYNCTXCTL_SYNC_COMP)
+ break;
+ }
+
+ if (i == HW_WAIT_COUNT) {
+ ret = -ETIMEDOUT;
+ } else if (ret == 0) {
+ *plttim = er32(PLTSTMPH);
+ *plttim <<= 32;
+ *plttim |= er32(PLTSTMPL);
+ *systim = er32(SYSSTMPH);
+ *systim <<= 32;
+ *systim |= er32(SYSSTMPL);
+ break;
+ }
+ }
+
+ return ret;
+}
+
+#define SYNCTIME_RETRY_COUNT (2)
+
+static int e1000e_phc_getsynctime(struct ptp_clock_info *ptp,
+ struct timespec64 *devts,
+ struct timespec64 *systs)
+{
+ struct e1000_adapter *adapter = container_of(ptp, struct e1000_adapter,
+ ptp_clock_info);
+ unsigned long flags;
+ u32 remainder;
+ u64 plttim, systim;
+
+ int i, ret;
+
+ if (!has_art())
+ return -ENXIO;
+
+ for (i = 0; i < SYNCTIME_RETRY_COUNT; ++i) {
+ ret = e1000e_phc_read_timestamp_pair
+ (adapter, &systim, &plttim);
+ if (ret != 0)
+ continue;
+
+ ret = art_to_realtime64(systs, plttim);
+ if (ret == 0) {
+ spin_lock_irqsave(&adapter->systim_lock, flags);
+ systim = timecounter_cyc2time(&adapter->tc, systim);
+ spin_unlock_irqrestore(&adapter->systim_lock, flags);
+ devts->tv_sec =
+ div_u64_rem(systim, NSEC_PER_SEC, &remainder);
+ devts->tv_nsec = remainder;
+ break;
+ }
+ }
+
+ return ret;
+}
+
/**
* e1000e_phc_gettime - Reads the current time from the hardware clock
* @ptp: ptp clock structure
@@ -190,6 +272,7 @@ static const struct ptp_clock_info e1000e_ptp_clock_info = {
.adjfreq = e1000e_phc_adjfreq,
.adjtime = e1000e_phc_adjtime,
.gettime64 = e1000e_phc_gettime,
+ .getsynctime64 = e1000e_phc_getsynctime,
.settime64 = e1000e_phc_settime,
.enable = e1000e_phc_enable,
};
diff --git a/drivers/net/ethernet/intel/e1000e/regs.h b/drivers/net/ethernet/intel/e1000e/regs.h
index b24e5fe..4dd5b54 100644
--- a/drivers/net/ethernet/intel/e1000e/regs.h
+++ b/drivers/net/ethernet/intel/e1000e/regs.h
@@ -246,6 +246,10 @@
#define E1000_SYSTIML 0x0B600 /* System time register Low - RO */
#define E1000_SYSTIMH 0x0B604 /* System time register High - RO */
#define E1000_TIMINCA 0x0B608 /* Increment attributes register - RW */
+#define E1000_SYSSTMPL 0x0B648 /* HH Timesync system stamp low register */
+#define E1000_SYSSTMPH 0x0B64C /* HH Timesync system stamp hi register */
+#define E1000_PLTSTMPL 0x0B640 /* HH Timesync platform stamp low register */
+#define E1000_PLTSTMPH 0x0B644 /* HH Timesync platform stamp hi register */
#define E1000_RXMTRL 0x0B634 /* Time sync Rx EtherType and Msg Type - RW */
#define E1000_RXUDP 0x0B638 /* Time Sync Rx UDP Port - RW */

--
1.9.1

2015-07-28 01:32:31

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH 3/5] Add calls to translate Always Running Timer (ART) to system time

On 07/27/2015 05:46 PM, Christopher Hall wrote:
> * art_to_mono64
> * art_to_rawmono64
> * art_to_realtime64
>
> Intel audio and PCH ethernet devices use the Always Running Timer (ART) to
> relate their device clock to system time
>
> Signed-off-by: Christopher Hall <[email protected]>
> ---
> arch/x86/Kconfig | 12 ++++
> arch/x86/include/asm/art.h | 42 ++++++++++++++
> arch/x86/kernel/Makefile | 1 +
> arch/x86/kernel/art.c | 134 +++++++++++++++++++++++++++++++++++++++++++++
> arch/x86/kernel/tsc.c | 4 ++
> 5 files changed, 193 insertions(+)
> create mode 100644 arch/x86/include/asm/art.h
> create mode 100644 arch/x86/kernel/art.c
>
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index b3a1a5d..1ef9985 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -1175,6 +1175,18 @@ config X86_CPUID
> with major 203 and minors 0 to 31 for /dev/cpu/0/cpuid to
> /dev/cpu/31/cpuid.
>
> +config X86_ART
> + bool "Always Running Timer"
> + default y
> + depends on X86_TSC
> + ---help---
> + This option provides functionality to drivers and devices that use
> + the always-running-timer (ART) to correlate their device clock
> + counter with the system clock counter. The TSC is *exactly* related
> + to the ART by a ratio m/n specified by CPUID leaf 0x15
> + (n=EAX,m=EBX). If ART is unused or unavailable there isn't any
> + performance impact. It's safe to say Y.
> +

Is there a good reason to make this optional?

Also, is there *still* no way to ask the thing for its nominal
frequnency? Or can we expect CPUID leaf 16H to work on CPUs that
support this and can we expect it to actually work? The SDM says "The
returned information should not be used for any other purpose as the
returned information does not accurately correlate to information /
counters returned by other processor interfaces."

Also, does this thing let us learn the real time base? SDM 17.14.4
suggests that the ART value isn't affected by "privileged software" (aka
buggy/malicious firmware). Or, alternatively, how do we learn the
offset K between ART and scaled TSC?

> choice
> prompt "High Memory Support"
> default HIGHMEM4G
> diff --git a/arch/x86/include/asm/art.h b/arch/x86/include/asm/art.h
> new file mode 100644
> index 0000000..da58ce4
> --- /dev/null
> +++ b/arch/x86/include/asm/art.h
> @@ -0,0 +1,42 @@
> +/*
> + * x86 ART related functions
> + */
> +#ifndef _ASM_X86_ART_H
> +#define _ASM_X86_ART_H
> +
> +#ifndef CONFIG_X86_ART
> +
> +static inline int setup_art(void)
> +{
> + return 0;
> +}
> +
> +static inline bool has_art(void)
> +{
> + return false;
> +}
> +
> +static inline int art_to_rawmono64(struct timespec64 *rawmono, cycle_t art)
> +{
> + return -ENXIO;
> +}
> +static inline int art_to_realtime64(struct timespec64 *realtime, cycle_t art)
> +{
> + return -ENXIO;
> +}
> +static inline int art_to_mono64(struct timespec64 *mono, cycle_t art)
> +{
> + return -ENXIO;
> +}
> +
> +#else
> +
> +extern int setup_art(void);
> +extern bool has_art(void);
> +extern int art_to_rawmono64(struct timespec64 *rawmono, cycle_t art);
> +extern int art_to_realtime64(struct timespec64 *realtime, cycle_t art);
> +extern int art_to_mono64(struct timespec64 *mono, cycle_t art);
> +
> +#endif
> +
> +#endif/*_ASM_X86_ART_H*/
> diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
> index 0f15af4..0908311 100644
> --- a/arch/x86/kernel/Makefile
> +++ b/arch/x86/kernel/Makefile
> @@ -109,6 +109,7 @@ obj-$(CONFIG_PERF_EVENTS) += perf_regs.o
> obj-$(CONFIG_TRACING) += tracepoint.o
> obj-$(CONFIG_IOSF_MBI) += iosf_mbi.o
> obj-$(CONFIG_PMC_ATOM) += pmc_atom.o
> +obj-$(CONFIG_X86_ART) += art.o
>
> ###
> # 64 bit specific files
> diff --git a/arch/x86/kernel/art.c b/arch/x86/kernel/art.c
> new file mode 100644
> index 0000000..1906cf0
> --- /dev/null
> +++ b/arch/x86/kernel/art.c
> @@ -0,0 +1,134 @@
> +#include <asm/tsc.h>
> +#include <asm/cpufeature.h>
> +#include <asm/processor.h>
> +#include <linux/spinlock.h>
> +#include <linux/seqlock.h>
> +
> +#define CPUID_ART_LEAF 0x15
> +
> +static bool art_present;
> +
> +static struct art_state {
> + seqcount_t seq;
> + u32 art_ratio_numerator;
> + u32 art_ratio_denominator;
> +

This needs much better comments, IMO, including some discussion of how
the locking works.

> + cycle_t prev_art;
> + cycle_t prev_tsc_corr_art; /*This is the TSC value corresponding to
> + prev_art */
> + u32 tsc_remainder;
> +} art_state ____cacheline_aligned;
> +
> +static DEFINE_RAW_SPINLOCK(art_lock);
> +
> +#define MIN_DENOMINATOR 2
> +int setup_art(void)
> +{
> + if (boot_cpu_data.cpuid_level < CPUID_ART_LEAF)
> + return 0;
> + art_state.art_ratio_denominator = cpuid_eax(CPUID_ART_LEAF);
> + if (art_state.art_ratio_denominator < MIN_DENOMINATOR)
> + return 0;
> + art_state.art_ratio_numerator = cpuid_ebx(CPUID_ART_LEAF);
> +
> + art_present = true;
> + return 0;
> +}
> +
> +static bool has_art(void)
> +{
> + return art_present;
> +}
> +EXPORT_SYMBOL(has_art);

IMO this should be a pseudo-feature X86_FEATURE_ART.

> +
> +#define ROLLOVER_THRESHOLD (2ULL << 23)
> +
> +static u32 art_scale(struct art_state *art_state, cycle_t *art)
> +{
> + u32 rem;
> +
> + *art *= art_state->art_ratio_numerator;

This seems very likely to overflow. I think you should be using the
equivalent of mul_u128_u64_shr (which probably doesn't exist, but
mul_u64_u32_shr does) or just open-code it using __uint128_t if this
thing is x86_64-only.

> +static cycle_t art_to_tsc(cycle_t art)
> +{

This function needs some kind of description of what it does.

Also, I don't think it's okay to have a pure-sounding thing like
art_to_xyz actually be stateful and rely on a current value being passed.

Also, how does this code account for the unspecified ART-to-TSC offset?
I don't see it in the code on a quick reading.

--Andy

2015-07-28 03:44:09

by John Stultz

[permalink] [raw]
Subject: Re: [PATCH 1/5] Add functions producing system time given a backing counter value

On Mon, Jul 27, 2015 at 5:46 PM, Christopher Hall
<[email protected]> wrote:
> * counter_to_rawmono64
> * counter_to_mono64
> * counter_to_realtime64
>
> Enables drivers to translate a captured system clock counter to system
> time. This is useful for network and audio devices that capture timestamps
> in terms of both the system clock and device clock.

Huh. So for counter_to_realtime64 & mono64, this seems to ignore the
fact that the multiplier is constantly adjusted and corrected. So that
calling the function twice with the same counter value may result in
different returned values.

I've not yet groked the whole patchset, but it seems like there needs
to be some mechanism that ensures the counter value is captured and
used in the same (or at least close) interval that the timekeeper data
is valid for.

thanks
-john

2015-07-28 04:05:29

by John Stultz

[permalink] [raw]
Subject: Re: [PATCH 1/5] Add functions producing system time given a backing counter value

On Mon, Jul 27, 2015 at 8:44 PM, John Stultz <[email protected]> wrote:
> On Mon, Jul 27, 2015 at 5:46 PM, Christopher Hall
> <[email protected]> wrote:
>> * counter_to_rawmono64
>> * counter_to_mono64
>> * counter_to_realtime64
>>
>> Enables drivers to translate a captured system clock counter to system
>> time. This is useful for network and audio devices that capture timestamps
>> in terms of both the system clock and device clock.
>
> Huh. So for counter_to_realtime64 & mono64, this seems to ignore the
> fact that the multiplier is constantly adjusted and corrected. So that
> calling the function twice with the same counter value may result in
> different returned values.
>
> I've not yet groked the whole patchset, but it seems like there needs
> to be some mechanism that ensures the counter value is captured and
> used in the same (or at least close) interval that the timekeeper data
> is valid for.


So reading through. It looks like you only use art_to_realtime(), right?

So again, since CLOCK_MONOTONIC and CLOCK_REALTIME are constaly being
frequency adjusted, it might be best to construct this delta in the
following way.


Add counter_to_rawmono64(), which should be able to safely calculate
the corresponding CLOCK_MONOTONIC_RAW time from any given cycle value.

Use getnstime_raw_and_real() to get a immediate snapshot of current
MONOTONIC_RAW and REALTIME clocks.

Then calculate the delta between the snapshotted counter raw time, and
the current raw time. Then apply that offset to the current realtime.

The larger the raw-time delta, the larger the possible realtime error.
But I think that will be as good as it gets.

This should simplify the interfaces you're adding to the timekeeping core.

thanks
-john

2015-07-28 04:11:30

by John Stultz

[permalink] [raw]
Subject: Re: [PATCH 3/5] Add calls to translate Always Running Timer (ART) to system time

On Mon, Jul 27, 2015 at 5:46 PM, Christopher Hall
<[email protected]> wrote:
> +static bool checked_art_to_tsc(cycle_t *tsc)
> +{
> + if (!has_art())
> + return false;
> + *tsc = art_to_tsc(*tsc);
> + return true;
> +}
> +
> +static int art_to_rawmono64(struct timespec64 *rawmono, cycle_t art)
> +{
> + if (!checked_art_to_tsc(&art))
> + return -ENXIO;
> + return tsc_to_rawmono64(rawmono, art);
> +}
> +EXPORT_SYMBOL(art_to_rawmono64);

This all seems to assume the TSC is the current clocksource, which it
may not be if the user has overridden it.

If instead there were a counter_to_rawmono64() which took the counter
value and maybe the name of the clocksource (if the strncmp is
affordable for your use), it might be easier for the core to provide
an error if the current timekeeping clocksource isn't the one the
counter value is based on. This would also allow the tsc_to_*()
midlayers to be dropped (since they don't seem to do much).

thanks
-john

2015-07-29 01:18:48

by Hall, Christopher S

[permalink] [raw]
Subject: RE: [PATCH 3/5] Add calls to translate Always Running Timer (ART) to system time



> -----Original Message-----
> From: Andy Lutomirski [mailto:[email protected]]
> Sent: Monday, July 27, 2015 6:32 PM
> To: Hall, Christopher S; [email protected]; [email protected];
> [email protected]; [email protected]; Kirsher, Jeffrey T; Ronciak,
> John; [email protected]; [email protected]
> Cc: [email protected]; [email protected]; Borislav
> Petkov
> Subject: Re: [PATCH 3/5] Add calls to translate Always Running Timer
> (ART) to system time
>
> On 07/27/2015 05:46 PM, Christopher Hall wrote:
> > * art_to_mono64
> > * art_to_rawmono64
> > * art_to_realtime64
> >
> > Intel audio and PCH ethernet devices use the Always Running Timer
> (ART) to
> > relate their device clock to system time
> >
> > Signed-off-by: Christopher Hall <[email protected]>
> > ---
> > arch/x86/Kconfig | 12 ++++
> > arch/x86/include/asm/art.h | 42 ++++++++++++++
> > arch/x86/kernel/Makefile | 1 +
> > arch/x86/kernel/art.c | 134
> +++++++++++++++++++++++++++++++++++++++++++++
> > arch/x86/kernel/tsc.c | 4 ++
> > 5 files changed, 193 insertions(+)
> > create mode 100644 arch/x86/include/asm/art.h
> > create mode 100644 arch/x86/kernel/art.c
> >
> > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> > index b3a1a5d..1ef9985 100644
> > --- a/arch/x86/Kconfig
> > +++ b/arch/x86/Kconfig
> > @@ -1175,6 +1175,18 @@ config X86_CPUID
> > with major 203 and minors 0 to 31 for /dev/cpu/0/cpuid to
> > /dev/cpu/31/cpuid.
> >
> > +config X86_ART
> > + bool "Always Running Timer"
> > + default y
> > + depends on X86_TSC
> > + ---help---
> > + This option provides functionality to drivers and devices that
> use
> > + the always-running-timer (ART) to correlate their device clock
> > + counter with the system clock counter. The TSC is *exactly*
> related
> > + to the ART by a ratio m/n specified by CPUID leaf 0x15
> > + (n=EAX,m=EBX). If ART is unused or unavailable there isn't any
> > + performance impact. It's safe to say Y.
> > +
>
> Is there a good reason to make this optional?

If there aren't any objections, it sound OK to me. So no, I don't know
of any good reasons.

>
> Also, is there *still* no way to ask the thing for its nominal
> frequnency? Or can we expect CPUID leaf 16H to work on CPUs that
> support this and can we expect it to actually work?

There isn't any way to query nominal frequency. CPUID leaf 0x15 only
exposes the relationship between ART and TSC. CPUID leaf 0x16 stays
the more or less the same and isn't related to ART.

The SDM says "The
> returned information should not be used for any other purpose as the
> returned information does not accurately correlate to information /
> counters returned by other processor interfaces."
>
> Also, does this thing let us learn the real time base? SDM 17.14.4
> suggests that the ART value isn't affected by "privileged software" (aka
> buggy/malicious firmware). Or, alternatively, how do we learn the
> offset K between ART and scaled TSC?

ART isn't affected by software. The determination of K used to convert ART to
TSC is in a footnote (2) in that section of the SDM. I'm not going to risk
repeating it here and possibly altering its meaning.

>
> > choice
> > prompt "High Memory Support"
> > default HIGHMEM4G
> > diff --git a/arch/x86/include/asm/art.h b/arch/x86/include/asm/art.h
> > new file mode 100644
> > index 0000000..da58ce4
> > --- /dev/null
> > +++ b/arch/x86/include/asm/art.h
> > @@ -0,0 +1,42 @@
> > +/*
> > + * x86 ART related functions
> > + */
> > +#ifndef _ASM_X86_ART_H
> > +#define _ASM_X86_ART_H
> > +
> > +#ifndef CONFIG_X86_ART
> > +
> > +static inline int setup_art(void)
> > +{
> > + return 0;
> > +}
> > +
> > +static inline bool has_art(void)
> > +{
> > + return false;
> > +}
> > +
> > +static inline int art_to_rawmono64(struct timespec64 *rawmono,
> cycle_t art)
> > +{
> > + return -ENXIO;
> > +}
> > +static inline int art_to_realtime64(struct timespec64 *realtime,
> cycle_t art)
> > +{
> > + return -ENXIO;
> > +}
> > +static inline int art_to_mono64(struct timespec64 *mono, cycle_t art)
> > +{
> > + return -ENXIO;
> > +}
> > +
> > +#else
> > +
> > +extern int setup_art(void);
> > +extern bool has_art(void);
> > +extern int art_to_rawmono64(struct timespec64 *rawmono, cycle_t art);
> > +extern int art_to_realtime64(struct timespec64 *realtime, cycle_t
> art);
> > +extern int art_to_mono64(struct timespec64 *mono, cycle_t art);
> > +
> > +#endif
> > +
> > +#endif/*_ASM_X86_ART_H*/
> > diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
> > index 0f15af4..0908311 100644
> > --- a/arch/x86/kernel/Makefile
> > +++ b/arch/x86/kernel/Makefile
> > @@ -109,6 +109,7 @@ obj-$(CONFIG_PERF_EVENTS) +=
> perf_regs.o
> > obj-$(CONFIG_TRACING) += tracepoint.o
> > obj-$(CONFIG_IOSF_MBI) += iosf_mbi.o
> > obj-$(CONFIG_PMC_ATOM) += pmc_atom.o
> > +obj-$(CONFIG_X86_ART) += art.o
> >
> > ###
> > # 64 bit specific files
> > diff --git a/arch/x86/kernel/art.c b/arch/x86/kernel/art.c
> > new file mode 100644
> > index 0000000..1906cf0
> > --- /dev/null
> > +++ b/arch/x86/kernel/art.c
> > @@ -0,0 +1,134 @@
> > +#include <asm/tsc.h>
> > +#include <asm/cpufeature.h>
> > +#include <asm/processor.h>
> > +#include <linux/spinlock.h>
> > +#include <linux/seqlock.h>
> > +
> > +#define CPUID_ART_LEAF 0x15
> > +
> > +static bool art_present;
> > +
> > +static struct art_state {
> > + seqcount_t seq;
> > + u32 art_ratio_numerator;
> > + u32 art_ratio_denominator;
> > +
>
> This needs much better comments, IMO, including some discussion of how
> the locking works.

I'll work on the comments for version 2.

>
> > + cycle_t prev_art;
> > + cycle_t prev_tsc_corr_art; /*This is the TSC value corresponding
> to
> > + prev_art */
> > + u32 tsc_remainder;
> > +} art_state ____cacheline_aligned;
> > +
> > +static DEFINE_RAW_SPINLOCK(art_lock);
> > +
> > +#define MIN_DENOMINATOR 2
> > +int setup_art(void)
> > +{
> > + if (boot_cpu_data.cpuid_level < CPUID_ART_LEAF)
> > + return 0;
> > + art_state.art_ratio_denominator = cpuid_eax(CPUID_ART_LEAF);
> > + if (art_state.art_ratio_denominator < MIN_DENOMINATOR)
> > + return 0;
> > + art_state.art_ratio_numerator = cpuid_ebx(CPUID_ART_LEAF);
> > +
> > + art_present = true;
> > + return 0;
> > +}
> > +
> > +static bool has_art(void)
> > +{
> > + return art_present;
> > +}
> > +EXPORT_SYMBOL(has_art);
>
> IMO this should be a pseudo-feature X86_FEATURE_ART.

Good idea.

>
> > +
> > +#define ROLLOVER_THRESHOLD (2ULL << 23)
> > +
> > +static u32 art_scale(struct art_state *art_state, cycle_t *art)
> > +{
> > + u32 rem;
> > +
> > + *art *= art_state->art_ratio_numerator;
>
> This seems very likely to overflow.

I don't think it would. The art (u64) value is a delta:

tsc_next = art - art_state.prev_art;
rem_next = art_scale(&art_state, &tsc_next);

The time to overflow would depend upon the TSC frequency and m, n
values, but it should handle 10+ years of delta.

I think you should be using the
> equivalent of mul_u128_u64_shr (which probably doesn't exist, but
> mul_u64_u32_shr does) or just open-code it using __uint128_t if this
> thing is x86_64-only.
>
> > +static cycle_t art_to_tsc(cycle_t art)
> > +{
>
> This function needs some kind of description of what it does.

More descriptive comments for v2.

>
> Also, I don't think it's okay to have a pure-sounding thing like
> art_to_xyz actually be stateful and rely on a current value being
> passed.

I guess I see your point, but I'm not sure how to change it. Would
a more descriptive verb (e.g. "convert") address this?

>
> Also, how does this code account for the unspecified ART-to-TSC offset?
> I don't see it in the code on a quick reading.

It's not there. The assumption is that the TSC adjust registers is 0.

Thanks for your comments.

Chris

>
> --Andy

2015-07-29 01:41:39

by Hall, Christopher S

[permalink] [raw]
Subject: RE: [PATCH 1/5] Add functions producing system time given a backing counter value


> -----Original Message-----
> From: John Stultz [mailto:[email protected]]
> Sent: Monday, July 27, 2015 8:44 PM
> To: Hall, Christopher S
> Cc: Thomas Gleixner; Richard Cochran; Ingo Molnar; Kirsher, Jeffrey T;
> Ronciak, John; H. Peter Anvin; [email protected]; lkml;
> [email protected]
> Subject: Re: [PATCH 1/5] Add functions producing system time given a
> backing counter value
>
> On Mon, Jul 27, 2015 at 5:46 PM, Christopher Hall
> <[email protected]> wrote:
> > * counter_to_rawmono64
> > * counter_to_mono64
> > * counter_to_realtime64
> >
> > Enables drivers to translate a captured system clock counter to system
> > time. This is useful for network and audio devices that capture
> timestamps
> > in terms of both the system clock and device clock.
>
> Huh. So for counter_to_realtime64 & mono64, this seems to ignore the
> fact that the multiplier is constantly adjusted and corrected. So that
> calling the function twice with the same counter value may result in
> different returned values.
>
> I've not yet groked the whole patchset, but it seems like there needs
> to be some mechanism that ensures the counter value is captured and
> used in the same (or at least close) interval that the timekeeper data
> is valid for.

The ART (and derived TSC) values are always in the past. There's no
chance that we could exceed the interval. I don't think any similar
usage would be a problem either.

Are you suggesting that, for completeness, this be enforced by the
conversion function?

I do a check here to make sure that the current counter value isn't before
the beginning of the current interval:

timekeeping_get_delta()
...
if (cycle_now < tkr->cycle_last &&
tkr->cycle_last - cycle_now < ROLLOVER_THRESHOLD)
return -EAGAIN;

If tkr->cycle_last - cycle_now is large, the assumption is that
rollover occurred. Otherwise, the caller should re-read the counter
so that it falls within the current interval. In my "normal use"
testing, re-read never occurred.

Thanks for your input.

Chris

>
> thanks
> -john
????{.n?+???????+%?????ݶ??w??{.n?+????{??G?????{ay?ʇڙ?,j??f???h?????????z_??(?階?ݢj"???m??????G????????????&???~???iO???z??v?^?m???? ????????I?

2015-07-29 01:47:49

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH 3/5] Add calls to translate Always Running Timer (ART) to system time

On Tue, Jul 28, 2015 at 6:18 PM, Hall, Christopher S
<[email protected]> wrote:
>
>
>> -----Original Message-----
>> From: Andy Lutomirski [mailto:[email protected]]
>> Sent: Monday, July 27, 2015 6:32 PM
>> To: Hall, Christopher S; [email protected]; [email protected];
>> [email protected]; [email protected]; Kirsher, Jeffrey T; Ronciak,
>> John; [email protected]; [email protected]
>> Cc: [email protected]; [email protected]; Borislav
>> Petkov
>> Subject: Re: [PATCH 3/5] Add calls to translate Always Running Timer
>> (ART) to system time
>>
>> On 07/27/2015 05:46 PM, Christopher Hall wrote:
>> > * art_to_mono64
>> > * art_to_rawmono64
>> > * art_to_realtime64
>> >
>> > Intel audio and PCH ethernet devices use the Always Running Timer
>> (ART) to
>> > relate their device clock to system time
>> >
>> > Signed-off-by: Christopher Hall <[email protected]>
>> > ---
>> > arch/x86/Kconfig | 12 ++++
>> > arch/x86/include/asm/art.h | 42 ++++++++++++++
>> > arch/x86/kernel/Makefile | 1 +
>> > arch/x86/kernel/art.c | 134
>> +++++++++++++++++++++++++++++++++++++++++++++
>> > arch/x86/kernel/tsc.c | 4 ++
>> > 5 files changed, 193 insertions(+)
>> > create mode 100644 arch/x86/include/asm/art.h
>> > create mode 100644 arch/x86/kernel/art.c
>> >
>> > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
>> > index b3a1a5d..1ef9985 100644
>> > --- a/arch/x86/Kconfig
>> > +++ b/arch/x86/Kconfig
>> > @@ -1175,6 +1175,18 @@ config X86_CPUID
>> > with major 203 and minors 0 to 31 for /dev/cpu/0/cpuid to
>> > /dev/cpu/31/cpuid.
>> >
>> > +config X86_ART
>> > + bool "Always Running Timer"
>> > + default y
>> > + depends on X86_TSC
>> > + ---help---
>> > + This option provides functionality to drivers and devices that
>> use
>> > + the always-running-timer (ART) to correlate their device clock
>> > + counter with the system clock counter. The TSC is *exactly*
>> related
>> > + to the ART by a ratio m/n specified by CPUID leaf 0x15
>> > + (n=EAX,m=EBX). If ART is unused or unavailable there isn't any
>> > + performance impact. It's safe to say Y.
>> > +
>>
>> Is there a good reason to make this optional?
>
> If there aren't any objections, it sound OK to me. So no, I don't know
> of any good reasons.
>
>>
>> Also, is there *still* no way to ask the thing for its nominal
>> frequnency? Or can we expect CPUID leaf 16H to work on CPUs that
>> support this and can we expect it to actually work?
>
> There isn't any way to query nominal frequency. CPUID leaf 0x15 only
> exposes the relationship between ART and TSC. CPUID leaf 0x16 stays
> the more or less the same and isn't related to ART.
>
> The SDM says "The
>> returned information should not be used for any other purpose as the
>> returned information does not accurately correlate to information /
>> counters returned by other processor interfaces."
>>
>> Also, does this thing let us learn the real time base? SDM 17.14.4
>> suggests that the ART value isn't affected by "privileged software" (aka
>> buggy/malicious firmware). Or, alternatively, how do we learn the
>> offset K between ART and scaled TSC?
>
> ART isn't affected by software. The determination of K used to convert ART to
> TSC is in a footnote (2) in that section of the SDM. I'm not going to risk
> repeating it here and possibly altering its meaning.
>
>>
>> > choice
>> > prompt "High Memory Support"
>> > default HIGHMEM4G
>> > diff --git a/arch/x86/include/asm/art.h b/arch/x86/include/asm/art.h
>> > new file mode 100644
>> > index 0000000..da58ce4
>> > --- /dev/null
>> > +++ b/arch/x86/include/asm/art.h
>> > @@ -0,0 +1,42 @@
>> > +/*
>> > + * x86 ART related functions
>> > + */
>> > +#ifndef _ASM_X86_ART_H
>> > +#define _ASM_X86_ART_H
>> > +
>> > +#ifndef CONFIG_X86_ART
>> > +
>> > +static inline int setup_art(void)
>> > +{
>> > + return 0;
>> > +}
>> > +
>> > +static inline bool has_art(void)
>> > +{
>> > + return false;
>> > +}
>> > +
>> > +static inline int art_to_rawmono64(struct timespec64 *rawmono,
>> cycle_t art)
>> > +{
>> > + return -ENXIO;
>> > +}
>> > +static inline int art_to_realtime64(struct timespec64 *realtime,
>> cycle_t art)
>> > +{
>> > + return -ENXIO;
>> > +}
>> > +static inline int art_to_mono64(struct timespec64 *mono, cycle_t art)
>> > +{
>> > + return -ENXIO;
>> > +}
>> > +
>> > +#else
>> > +
>> > +extern int setup_art(void);
>> > +extern bool has_art(void);
>> > +extern int art_to_rawmono64(struct timespec64 *rawmono, cycle_t art);
>> > +extern int art_to_realtime64(struct timespec64 *realtime, cycle_t
>> art);
>> > +extern int art_to_mono64(struct timespec64 *mono, cycle_t art);
>> > +
>> > +#endif
>> > +
>> > +#endif/*_ASM_X86_ART_H*/
>> > diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
>> > index 0f15af4..0908311 100644
>> > --- a/arch/x86/kernel/Makefile
>> > +++ b/arch/x86/kernel/Makefile
>> > @@ -109,6 +109,7 @@ obj-$(CONFIG_PERF_EVENTS) +=
>> perf_regs.o
>> > obj-$(CONFIG_TRACING) += tracepoint.o
>> > obj-$(CONFIG_IOSF_MBI) += iosf_mbi.o
>> > obj-$(CONFIG_PMC_ATOM) += pmc_atom.o
>> > +obj-$(CONFIG_X86_ART) += art.o
>> >
>> > ###
>> > # 64 bit specific files
>> > diff --git a/arch/x86/kernel/art.c b/arch/x86/kernel/art.c
>> > new file mode 100644
>> > index 0000000..1906cf0
>> > --- /dev/null
>> > +++ b/arch/x86/kernel/art.c
>> > @@ -0,0 +1,134 @@
>> > +#include <asm/tsc.h>
>> > +#include <asm/cpufeature.h>
>> > +#include <asm/processor.h>
>> > +#include <linux/spinlock.h>
>> > +#include <linux/seqlock.h>
>> > +
>> > +#define CPUID_ART_LEAF 0x15
>> > +
>> > +static bool art_present;
>> > +
>> > +static struct art_state {
>> > + seqcount_t seq;
>> > + u32 art_ratio_numerator;
>> > + u32 art_ratio_denominator;
>> > +
>>
>> This needs much better comments, IMO, including some discussion of how
>> the locking works.
>
> I'll work on the comments for version 2.
>
>>
>> > + cycle_t prev_art;
>> > + cycle_t prev_tsc_corr_art; /*This is the TSC value corresponding
>> to
>> > + prev_art */
>> > + u32 tsc_remainder;
>> > +} art_state ____cacheline_aligned;
>> > +
>> > +static DEFINE_RAW_SPINLOCK(art_lock);
>> > +
>> > +#define MIN_DENOMINATOR 2
>> > +int setup_art(void)
>> > +{
>> > + if (boot_cpu_data.cpuid_level < CPUID_ART_LEAF)
>> > + return 0;
>> > + art_state.art_ratio_denominator = cpuid_eax(CPUID_ART_LEAF);
>> > + if (art_state.art_ratio_denominator < MIN_DENOMINATOR)
>> > + return 0;
>> > + art_state.art_ratio_numerator = cpuid_ebx(CPUID_ART_LEAF);
>> > +
>> > + art_present = true;
>> > + return 0;
>> > +}
>> > +
>> > +static bool has_art(void)
>> > +{
>> > + return art_present;
>> > +}
>> > +EXPORT_SYMBOL(has_art);
>>
>> IMO this should be a pseudo-feature X86_FEATURE_ART.
>
> Good idea.
>
>>
>> > +
>> > +#define ROLLOVER_THRESHOLD (2ULL << 23)
>> > +
>> > +static u32 art_scale(struct art_state *art_state, cycle_t *art)
>> > +{
>> > + u32 rem;
>> > +
>> > + *art *= art_state->art_ratio_numerator;
>>
>> This seems very likely to overflow.
>
> I don't think it would. The art (u64) value is a delta:
>
> tsc_next = art - art_state.prev_art;
> rem_next = art_scale(&art_state, &tsc_next);
>
> The time to overflow would depend upon the TSC frequency and m, n
> values, but it should handle 10+ years of delta.

Since I don't work at Intel and I don't have access to prerelease
stuff, I'm just guessing. But suppose that the numerator and
denominator are both around 2^31 and are similar to each other so the
ART counts at around TSC rate, i.e. ~2 GHz. Then we overflow in 8
seconds, I think.

>
> I think you should be using the
>> equivalent of mul_u128_u64_shr (which probably doesn't exist, but
>> mul_u64_u32_shr does) or just open-code it using __uint128_t if this
>> thing is x86_64-only.
>>
>> > +static cycle_t art_to_tsc(cycle_t art)
>> > +{
>>
>> This function needs some kind of description of what it does.
>
> More descriptive comments for v2.
>
>>
>> Also, I don't think it's okay to have a pure-sounding thing like
>> art_to_xyz actually be stateful and rely on a current value being
>> passed.
>
> I guess I see your point, but I'm not sure how to change it. Would
> a more descriptive verb (e.g. "convert") address this?

I can't really comment on what to call it since I still don't really
know what it does. Maybe update_art_state? Why is it stateful,
anyway?

>
>>
>> Also, how does this code account for the unspecified ART-to-TSC offset?
>> I don't see it in the code on a quick reading.
>
> It's not there. The assumption is that the TSC adjust registers is 0.

And the VMCS adjustment is zero, too? That's not true in practice, I
think, so a passed-through PTM NIC won't work right.

FWIW, I have a shiny BIOS that intentionally changes my TSC offset on
boot. I'm typing this email on that piece of crap right now. Unless
the SDM says that this kind of tomfoolery is impossible, we should
assume that it's not only possible but that BIOS vendors will do it
out of sheer spite.

--Andy

2015-07-29 02:05:15

by Hall, Christopher S

[permalink] [raw]
Subject: RE: [PATCH 3/5] Add calls to translate Always Running Timer (ART) to system time



> -----Original Message-----
> From: John Stultz [mailto:[email protected]]
> Sent: Monday, July 27, 2015 9:11 PM
> To: Hall, Christopher S
> Cc: Thomas Gleixner; Richard Cochran; Ingo Molnar; Kirsher, Jeffrey T;
> Ronciak, John; H. Peter Anvin; [email protected]; lkml;
> [email protected]
> Subject: Re: [PATCH 3/5] Add calls to translate Always Running Timer
> (ART) to system time
>
> On Mon, Jul 27, 2015 at 5:46 PM, Christopher Hall
> <[email protected]> wrote:
> > +static bool checked_art_to_tsc(cycle_t *tsc)
> > +{
> > + if (!has_art())
> > + return false;
> > + *tsc = art_to_tsc(*tsc);
> > + return true;
> > +}
> > +
> > +static int art_to_rawmono64(struct timespec64 *rawmono, cycle_t art)
> > +{
> > + if (!checked_art_to_tsc(&art))
> > + return -ENXIO;
> > + return tsc_to_rawmono64(rawmono, art);
> > +}
> > +EXPORT_SYMBOL(art_to_rawmono64);
>
> This all seems to assume the TSC is the current clocksource, which it
> may not be if the user has overridden it.

I don't make that assumption. The counter_to_* functions take a
pointer to a clocksource struct. They return -ENXIO if that clocksource
doesn’t match the current clocksource.

The tsc_to_* functions pass the tsc clocksource pointer to the counter_to_*
functions. These tsc conversion functions are called by the art_to_*
functions.

>
> If instead there were a counter_to_rawmono64() which took the counter
> value and maybe the name of the clocksource (if the strncmp is
> affordable for your use), it might be easier for the core to provide
> an error if the current timekeeping clocksource isn't the one the
> counter value is based on. This would also allow the tsc_to_*()
> midlayers to be dropped (since they don't seem to do much).
>
> thanks
> -john

Again, thanks for your input.

Chris
????{.n?+???????+%?????ݶ??w??{.n?+????{??G?????{ay?ʇڙ?,j??f???h?????????z_??(?階?ݢj"???m??????G????????????&???~???iO???z??v?^?m???? ????????I?

2015-07-29 08:25:18

by Paul Bolle

[permalink] [raw]
Subject: Re: [PATCH 3/5] Add calls to translate Always Running Timer (ART) to system time

On ma, 2015-07-27 at 17:46 -0700, Christopher Hall wrote:
> --- /dev/null
> +++ b/arch/x86/include/asm/art.h

> +#ifndef CONFIG_X86_ART
> +
> +static inline int setup_art(void)
> +{
> + return 0;
> +}
> +
> +static inline bool has_art(void)
> +{
> + return false;
> +}
> +
> +static inline int art_to_rawmono64(struct timespec64 *rawmono, cycle_t art)
> +{
> + return -ENXIO;
> +}
> +static inline int art_to_realtime64(struct timespec64 *realtime, cycle_t art)
> +{
> + return -ENXIO;
> +}
> +static inline int art_to_mono64(struct timespec64 *mono, cycle_t art)
> +{
> + return -ENXIO;
> +}
> +
> +#else
> +
> +extern int setup_art(void);
> +extern bool has_art(void);
> +extern int art_to_rawmono64(struct timespec64 *rawmono, cycle_t art);
> +extern int art_to_realtime64(struct timespec64 *realtime, cycle_t art);
> +extern int art_to_mono64(struct timespec64 *mono, cycle_t art);
> +
> +#endif

> --- /dev/null
> +++ b/arch/x86/kernel/art.c

> +static bool has_art(void)
> +{
> + return art_present;
> +}
> +EXPORT_SYMBOL(has_art);

This exports a static function. Does that work?

> +static int art_to_rawmono64(struct timespec64 *rawmono, cycle_t art)
> +{
> + if (!checked_art_to_tsc(&art))
> + return -ENXIO;
> + return tsc_to_rawmono64(rawmono, art);
> +}
> +EXPORT_SYMBOL(art_to_rawmono64);
> +
> +static int art_to_realtime64(struct timespec64 *realtime, cycle_t art)
> +{
> + if (!checked_art_to_tsc(&art))
> + return -ENXIO;
> + return tsc_to_realtime64(realtime, art);
> +}
> +EXPORT_SYMBOL(art_to_realtime64);
> +
> +static int art_to_mono64(struct timespec64 *mono, cycle_t art)
> +{
> + if (!checked_art_to_tsc(&art))
> + return -ENXIO;
> + return tsc_to_mono64(mono, art);
> +}
> +EXPORT_SYMBOL(art_to_mono64);

Ditto (three times).

By the way, this series doesn't add users for art_to_mono64() and
art_to_mono64(), right?

Thanks,


Paul Bolle

2015-07-29 10:19:35

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH 1/5] Add functions producing system time given a backing counter value

On Mon, 27 Jul 2015, John Stultz wrote:
> On Mon, Jul 27, 2015 at 8:44 PM, John Stultz <[email protected]> wrote:
> > On Mon, Jul 27, 2015 at 5:46 PM, Christopher Hall
> > <[email protected]> wrote:
> >> * counter_to_rawmono64
> >> * counter_to_mono64
> >> * counter_to_realtime64
> >>
> >> Enables drivers to translate a captured system clock counter to system
> >> time. This is useful for network and audio devices that capture timestamps
> >> in terms of both the system clock and device clock.
> >
> > Huh. So for counter_to_realtime64 & mono64, this seems to ignore the
> > fact that the multiplier is constantly adjusted and corrected. So that
> > calling the function twice with the same counter value may result in
> > different returned values.
> >
> > I've not yet groked the whole patchset, but it seems like there needs
> > to be some mechanism that ensures the counter value is captured and
> > used in the same (or at least close) interval that the timekeeper data
> > is valid for.
>
>
> So reading through. It looks like you only use art_to_realtime(), right?
>
> So again, since CLOCK_MONOTONIC and CLOCK_REALTIME are constaly being
> frequency adjusted, it might be best to construct this delta in the
> following way.
>
>
> Add counter_to_rawmono64(), which should be able to safely calculate
> the corresponding CLOCK_MONOTONIC_RAW time from any given cycle value.
>
> Use getnstime_raw_and_real() to get a immediate snapshot of current
> MONOTONIC_RAW and REALTIME clocks.
>
> Then calculate the delta between the snapshotted counter raw time, and
> the current raw time. Then apply that offset to the current realtime.
>
> The larger the raw-time delta, the larger the possible realtime error.
> But I think that will be as good as it gets.

I think that's still not the right approach. The whole purpose of this
is to get a precise snapshot of

- PTP time from the ETH device
and
- current system time

Right now this does

ktime_get_real();
read_ptp_time();

which is obviously not precise.

The new hardware allows you to latch PTP time and ART time atomically
in the ETH device and read them out.

ART is the base clock of the TSC where

TSC = K + (ART * n) / d;

So for this to work proper, we need a function which converts ART to
TSC. This is obviously x86/TSC specific code.

Now on the PTP side we need a callback provided by the device driver
to get the snapshot of the PTP and the ART.

So the proposed implementation merily calls that callback from the PTP
ioctl and then tries to do a magic CLOCK_REALTIME conversion of the
ART value. But that's just wrong as it does not guarantee a proper
correlation to the core timekeeping.

So what we really need is a function in the timekeeper core which gets
the PTP/ART timestamps from the device under the timekeeper sequence
counter and converts to clock realtime and raw monotonic.

That function is then called from the PTP ioctl.

Anything else is just 'lets hope it works and is precise enough'
voodoo.

Something like the below untested patch should be all we need for PTP
to be as precise as possible.

I don't know whether we need functionality to convert arbitrary
timestamps at all, but if we really need them then they are going to
be pretty simple and explicitely not precise for anything else than
clock monotonic raw. But that's a different story.

Lets concentrate on PTP first and talk about the other stuff once we
settled the use case which actually has a precision requirement.

Thanks,

tglx

----------------------------------------->
Subject: ptp: Get sync timestamps
From: Thomas Gleixner <[email protected]>
Date: Wed, 29 Jul 2015 10:52:06 +0200

The ART stuff wants to be splitted out.

Not-Signed-off-by: Thomas Gleixner <[email protected]>
---
arch/x86/kernel/tsc.c | 27 ++++++++++++++++++
include/linux/clocksource.h | 30 ++++++++++++++++++++
include/linux/timekeeping.h | 4 ++
kernel/time/timekeeping.c | 63 ++++++++++++++++++++++++++++++++++++++++++++
4 files changed, 124 insertions(+)

Index: linux/arch/x86/kernel/tsc.c
===================================================================
--- linux.orig/arch/x86/kernel/tsc.c
+++ linux/arch/x86/kernel/tsc.c
@@ -1059,6 +1059,23 @@ int unsynchronized_tsc(void)
return 0;
}

+static u32 tsc_numerator;
+static u32 tsc_denominator;
+/*
+ * CHECKME: Do we need the adjust value? It should be 0, but if we run
+ * in a VM this might be a different story.
+ */
+static u64 tsc_adjust;
+
+static u64 art_to_tsc(u64 cycles)
+{
+ /* FIXME: This needs 128bit math to work proper */
+ return tsc_adjust + (cycles * tsc_numerator) / tsc_denominator;
+}
+
+struct correlated_cs art_timestamper = {
+ .convert = art_to_tsc,
+};

static void tsc_refine_calibration_work(struct work_struct *work);
static DECLARE_DELAYED_WORK(tsc_irqwork, tsc_refine_calibration_work);
@@ -1129,6 +1146,16 @@ static void tsc_refine_calibration_work(
(unsigned long)tsc_khz / 1000,
(unsigned long)tsc_khz % 1000);

+ /*
+ * TODO:
+ *
+ * If the system has ART, initialize the art_to_tsc conversion
+ * and set: art_timestamp.related_cs = &tsc_clocksource.
+ *
+ * Before that point a call to get_correlated_timestamp will
+ * fail the clocksource match check and return -ENODEV
+ */
+
out:
clocksource_register_khz(&clocksource_tsc, tsc_khz);
}
Index: linux/include/linux/clocksource.h
===================================================================
--- linux.orig/include/linux/clocksource.h
+++ linux/include/linux/clocksource.h
@@ -258,4 +258,34 @@ void acpi_generic_timer_init(void);
static inline void acpi_generic_timer_init(void) { }
#endif

+/**
+ * struct correlated_cs - Descriptor for a clocksource correlated to another clocksource
+ * @related_cs: Pointer to the related timekeeping clocksource
+ * @convert: Conversion function to convert a timestamp from
+ * the correlated clocksource to cycles of the related
+ * timekeeping clocksource
+ */
+struct correlated_cs {
+ struct clocksource *related_cs;
+ u64 (*convert)(u64 cycles);
+};
+
+struct correlated_ts;
+
+/**
+ * struct correlated_ts - Descriptor for taking a correlated time stamp
+ * @get_ts: Function to read out a synced system and device
+ * timestamp
+ * @system_ts: The raw system clock timestamp
+ * @device_ts: The raw device timestamp
+ * @system_real: @system_ts converted to CLOCK_REALTIME
+ * @system_raw: @system_ts converted to CLOCK_MONOTONIC_RAW
+ */
+struct correlated_ts {
+ int (*get_ts)(struct correlated_ts *ts);
+ u64 system_ts;
+ u64 device_ts;
+ u64 system_real;
+ u64 system_raw;
+};
#endif /* _LINUX_CLOCKSOURCE_H */
Index: linux/include/linux/timekeeping.h
===================================================================
--- linux.orig/include/linux/timekeeping.h
+++ linux/include/linux/timekeeping.h
@@ -258,6 +258,10 @@ extern void timekeeping_inject_sleeptime
*/
extern void getnstime_raw_and_real(struct timespec *ts_raw,
struct timespec *ts_real);
+struct correlated_ts;
+struct correlated_cs;
+extern int get_correlated_timestamp(struct correlated_ts *crt,
+ struct correlated_cs *crs);

/*
* Persistent clock related interfaces
Index: linux/kernel/time/timekeeping.c
===================================================================
--- linux.orig/kernel/time/timekeeping.c
+++ linux/kernel/time/timekeeping.c
@@ -312,6 +312,19 @@ static inline s64 timekeeping_get_ns(str
return nsec + arch_gettimeoffset();
}

+static inline s64 timekeeping_convert_to_ns(struct tk_read_base *tkr,
+ cycle_t cycles)
+{
+ cycle_t delta;
+ s64 nsec;
+
+ /* calculate the delta since the last update_wall_time */
+ delta = clocksource_delta(cycles, tkr->cycle_last, tkr->mask);
+
+ nsec = delta * tkr->mult + tkr->xtime_nsec;
+ return nsec >> tkr->shift;
+}
+
/**
* update_fast_timekeeper - Update the fast and NMI safe monotonic timekeeper.
* @tkr: Timekeeping readout base from which we take the update
@@ -885,6 +898,56 @@ EXPORT_SYMBOL(getnstime_raw_and_real);
#endif /* CONFIG_NTP_PPS */

/**
+ * get_correlated_timestamp - Get a correlated timestamp
+ *
+ * Reads a timestamp from a device and correlates it to system time
+ */
+int get_correlated_timestamp(struct correlated_ts *crt,
+ struct correlated_cs *crs)
+{
+ struct timekeeper *tk = &tk_core.timekeeper;
+ unsigned long seq;
+ cycles_t cycles;
+ ktime_t base;
+ s64 nsecs;
+ int ret;
+
+ do {
+ seq = read_seqcount_begin(&tk_core.seq);
+ /*
+ * Verify that the correlated clocksoure is related to
+ * the currently installed timekeeper clocksoure
+ */
+ if (tk->tkr_mono.clock != crs->related_cs)
+ return -ENODEV;
+
+ /*
+ * Try to get a timestamp from the device.
+ */
+ ret = crt->get_ts(crt);
+ if (ret)
+ return ret;
+
+ /*
+ * Convert the timestamp to timekeeper clock cycles
+ */
+ cycles = crs->convert(crs, crt->system_ts);
+
+ /* Convert to clock realtime */
+ base = ktime_add(tk->tkr_mono.base, tk_core.timekeeper.offs_real);
+ nsecs = timekeeping_convert_to_ns(&tk->tkr_mono, cycles);
+ crt->system_real = ktime_add_ns(base, nsecs);
+
+ /* Convert to clock raw monotonic */
+ base = tk->tkr_raw.base;
+ nsecs = timekeeping_convert_to_ns(&tk->tkr_raw, cycles);
+ crt->system_raw = ktime_add_ns(base, nsecs);
+
+ } while (read_seqcount_retry(&tk_core.seq, seq));
+ return 0;
+}
+
+/**
* do_gettimeofday - Returns the time of day in a timeval
* @tv: pointer to the timeval to be set
*

2015-07-29 10:23:52

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH 1/5] Add functions producing system time given a backing counter value

On Wed, 29 Jul 2015, Thomas Gleixner wrote:
> On Mon, 27 Jul 2015, John Stultz wrote:
> > On Mon, Jul 27, 2015 at 8:44 PM, John Stultz <[email protected]> wrote:
> > > On Mon, Jul 27, 2015 at 5:46 PM, Christopher Hall
> > > <[email protected]> wrote:
> > >> * counter_to_rawmono64
> > >> * counter_to_mono64
> > >> * counter_to_realtime64
> > >>
> > >> Enables drivers to translate a captured system clock counter to system
> > >> time. This is useful for network and audio devices that capture timestamps
> > >> in terms of both the system clock and device clock.
> > >
> > > Huh. So for counter_to_realtime64 & mono64, this seems to ignore the
> > > fact that the multiplier is constantly adjusted and corrected. So that
> > > calling the function twice with the same counter value may result in
> > > different returned values.
> > >
> > > I've not yet groked the whole patchset, but it seems like there needs
> > > to be some mechanism that ensures the counter value is captured and
> > > used in the same (or at least close) interval that the timekeeper data
> > > is valid for.
> >
> >
> > So reading through. It looks like you only use art_to_realtime(), right?
> >
> > So again, since CLOCK_MONOTONIC and CLOCK_REALTIME are constaly being
> > frequency adjusted, it might be best to construct this delta in the
> > following way.
> >
> >
> > Add counter_to_rawmono64(), which should be able to safely calculate
> > the corresponding CLOCK_MONOTONIC_RAW time from any given cycle value.
> >
> > Use getnstime_raw_and_real() to get a immediate snapshot of current
> > MONOTONIC_RAW and REALTIME clocks.
> >
> > Then calculate the delta between the snapshotted counter raw time, and
> > the current raw time. Then apply that offset to the current realtime.
> >
> > The larger the raw-time delta, the larger the possible realtime error.
> > But I think that will be as good as it gets.
>
> I think that's still not the right approach. The whole purpose of this
> is to get a precise snapshot of
>
> - PTP time from the ETH device
> and
> - current system time
>
> Right now this does
>
> ktime_get_real();
> read_ptp_time();
>
> which is obviously not precise.
>
> The new hardware allows you to latch PTP time and ART time atomically
> in the ETH device and read them out.
>
> ART is the base clock of the TSC where
>
> TSC = K + (ART * n) / d;
>
> So for this to work proper, we need a function which converts ART to
> TSC. This is obviously x86/TSC specific code.
>
> Now on the PTP side we need a callback provided by the device driver
> to get the snapshot of the PTP and the ART.
>
> So the proposed implementation merily calls that callback from the PTP
> ioctl and then tries to do a magic CLOCK_REALTIME conversion of the
> ART value. But that's just wrong as it does not guarantee a proper
> correlation to the core timekeeping.
>
> So what we really need is a function in the timekeeper core which gets
> the PTP/ART timestamps from the device under the timekeeper sequence
> counter and converts to clock realtime and raw monotonic.
>
> That function is then called from the PTP ioctl.
>
> Anything else is just 'lets hope it works and is precise enough'
> voodoo.
>
> Something like the below untested patch should be all we need for PTP
> to be as precise as possible.
>
> I don't know whether we need functionality to convert arbitrary
> timestamps at all, but if we really need them then they are going to
> be pretty simple and explicitely not precise for anything else than
> clock monotonic raw. But that's a different story.
>
> Lets concentrate on PTP first and talk about the other stuff once we
> settled the use case which actually has a precision requirement.

Gah crap. Picked a stale version of the patch.

Thanks,

tglx

----------------------------------------->
Subject: ptp: Get sync timestamps
From: Thomas Gleixner <[email protected]>
Date: Wed, 29 Jul 2015 10:52:06 +0200

The ART stuff wants to be splitted out.

Not-Signed-off-by: Thomas Gleixner <[email protected]>
---
arch/x86/kernel/tsc.c | 31 +++++++++++++++++++++
include/linux/clocksource.h | 30 ++++++++++++++++++++
include/linux/timekeeping.h | 4 ++
kernel/time/timekeeping.c | 63 ++++++++++++++++++++++++++++++++++++++++++++
4 files changed, 128 insertions(+)

Index: linux/arch/x86/kernel/tsc.c
===================================================================
--- linux.orig/arch/x86/kernel/tsc.c
+++ linux/arch/x86/kernel/tsc.c
@@ -1059,6 +1059,27 @@ int unsynchronized_tsc(void)
return 0;
}

+static u32 tsc_numerator;
+static u32 tsc_denominator;
+/*
+ * CHECKME: Do we need the adjust value? It should be 0, but if we run
+ * in a VM this might be a different story.
+ */
+static u64 tsc_adjust;
+
+static u64 art_to_tsc(u64 cycles)
+{
+ u64 tmp, res = tsc_adjust;
+
+ res += (cycles / tsc_denominator) * tsc_numerator;
+ tmp = (cycles % tsc_denominator) * tsc_numerator;
+ res += tmp / tsc_denominator;
+ return res;
+}
+
+struct correlated_cs art_timestamper = {
+ .convert = art_to_tsc,
+};

static void tsc_refine_calibration_work(struct work_struct *work);
static DECLARE_DELAYED_WORK(tsc_irqwork, tsc_refine_calibration_work);
@@ -1129,6 +1150,16 @@ static void tsc_refine_calibration_work(
(unsigned long)tsc_khz / 1000,
(unsigned long)tsc_khz % 1000);

+ /*
+ * TODO:
+ *
+ * If the system has ART, initialize the art_to_tsc conversion
+ * and set: art_timestamp.related_cs = &tsc_clocksource.
+ *
+ * Before that point a call to get_correlated_timestamp will
+ * fail the clocksource match check.
+ */
+
out:
clocksource_register_khz(&clocksource_tsc, tsc_khz);
}
Index: linux/include/linux/clocksource.h
===================================================================
--- linux.orig/include/linux/clocksource.h
+++ linux/include/linux/clocksource.h
@@ -258,4 +258,34 @@ void acpi_generic_timer_init(void);
static inline void acpi_generic_timer_init(void) { }
#endif

+/**
+ * struct correlated_cs - Descriptor for a clocksource correlated to another clocksource
+ * @related_cs: Pointer to the related timekeeping clocksource
+ * @convert: Conversion function to convert a timestamp from
+ * the correlated clocksource to cycles of the related
+ * timekeeping clocksource
+ */
+struct correlated_cs {
+ struct clocksource *related_cs;
+ u64 (*convert)(u64 cycles);
+};
+
+struct correlated_ts;
+
+/**
+ * struct correlated_ts - Descriptor for taking a correlated time stamp
+ * @get_ts: Function to read out a synced system and device
+ * timestamp
+ * @system_ts: The raw system clock timestamp
+ * @device_ts: The raw device timestamp
+ * @system_real: @system_ts converted to CLOCK_REALTIME
+ * @system_raw: @system_ts converted to CLOCK_MONOTONIC_RAW
+ */
+struct correlated_ts {
+ int (*get_ts)(struct correlated_ts *ts);
+ u64 system_ts;
+ u64 device_ts;
+ u64 system_real;
+ u64 system_raw;
+};
#endif /* _LINUX_CLOCKSOURCE_H */
Index: linux/include/linux/timekeeping.h
===================================================================
--- linux.orig/include/linux/timekeeping.h
+++ linux/include/linux/timekeeping.h
@@ -258,6 +258,10 @@ extern void timekeeping_inject_sleeptime
*/
extern void getnstime_raw_and_real(struct timespec *ts_raw,
struct timespec *ts_real);
+struct correlated_ts;
+struct correlated_cs;
+extern int get_correlated_timestamp(struct correlated_ts *crt,
+ struct correlated_cs *crs);

/*
* Persistent clock related interfaces
Index: linux/kernel/time/timekeeping.c
===================================================================
--- linux.orig/kernel/time/timekeeping.c
+++ linux/kernel/time/timekeeping.c
@@ -312,6 +312,19 @@ static inline s64 timekeeping_get_ns(str
return nsec + arch_gettimeoffset();
}

+static inline s64 timekeeping_convert_to_ns(struct tk_read_base *tkr,
+ cycle_t cycles)
+{
+ cycle_t delta;
+ s64 nsec;
+
+ /* calculate the delta since the last update_wall_time */
+ delta = clocksource_delta(cycles, tkr->cycle_last, tkr->mask);
+
+ nsec = delta * tkr->mult + tkr->xtime_nsec;
+ return nsec >> tkr->shift;
+}
+
/**
* update_fast_timekeeper - Update the fast and NMI safe monotonic timekeeper.
* @tkr: Timekeeping readout base from which we take the update
@@ -885,6 +898,56 @@ EXPORT_SYMBOL(getnstime_raw_and_real);
#endif /* CONFIG_NTP_PPS */

/**
+ * get_correlated_timestamp - Get a correlated timestamp
+ *
+ * Reads a timestamp from a device and correlates it to system time
+ */
+int get_correlated_timestamp(struct correlated_ts *crt,
+ struct correlated_cs *crs)
+{
+ struct timekeeper *tk = &tk_core.timekeeper;
+ unsigned long seq;
+ cycles_t cycles;
+ ktime_t base;
+ s64 nsecs;
+ int ret;
+
+ do {
+ seq = read_seqcount_begin(&tk_core.seq);
+ /*
+ * Verify that the correlated clocksoure is related to
+ * the currently installed timekeeper clocksoure
+ */
+ if (tk->tkr_mono.clock != crs->related_cs)
+ return -ENODEV;
+
+ /*
+ * Try to get a timestamp from the device.
+ */
+ ret = crt->get_ts(crt);
+ if (ret)
+ return ret;
+
+ /*
+ * Convert the timestamp to timekeeper clock cycles
+ */
+ cycles = crs->convert(crs, crt->system_ts);
+
+ /* Convert to clock realtime */
+ base = ktime_add(tk->tkr_mono.base, tk_core.timekeeper.offs_real);
+ nsecs = timekeeping_convert_to_ns(&tk->tkr_mono, cycles);
+ crt->system_real = ktime_add_ns(base, nsecs);
+
+ /* Convert to clock raw monotonic */
+ base = tk->tkr_raw.base;
+ nsecs = timekeeping_convert_to_ns(&tk->tkr_raw, cycles);
+ crt->system_raw = ktime_add_ns(base, nsecs);
+
+ } while (read_seqcount_retry(&tk_core.seq, seq));
+ return 0;
+}
+
+/**
* do_gettimeofday - Returns the time of day in a timeval
* @tv: pointer to the timeval to be set
*

2015-07-29 10:49:55

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 1/5] Add functions producing system time given a backing counter value

On Wed, Jul 29, 2015 at 12:19:06PM +0200, Thomas Gleixner wrote:
> I don't know whether we need functionality to convert arbitrary
> timestamps at all, but if we really need them then they are going to
> be pretty simple and explicitely not precise for anything else than
> clock monotonic raw. But that's a different story.

I think we need that too, and agreed, given NTP anything other than
MONO_RAW is going to be fuzzy at best.

> Index: linux/arch/x86/kernel/tsc.c
> ===================================================================
> --- linux.orig/arch/x86/kernel/tsc.c
> +++ linux/arch/x86/kernel/tsc.c
> @@ -1059,6 +1059,23 @@ int unsynchronized_tsc(void)
> return 0;
> }
>
> +static u32 tsc_numerator;
> +static u32 tsc_denominator;
> +/*
> + * CHECKME: Do we need the adjust value? It should be 0, but if we run
> + * in a VM this might be a different story.
> + */
> +static u64 tsc_adjust;
> +
> +static u64 art_to_tsc(u64 cycles)
> +{
> + /* FIXME: This needs 128bit math to work proper */
> + return tsc_adjust + (cycles * tsc_numerator) / tsc_denominator;

Yeah, its really unfortunate its given as a divisor and not a shift.
That said I think, at least on the initial hardware, its 2, so:

return mul_u64_u32_shr(cycles, tsc_numerator, 1);

Should do, given that TSC_ADJUST had better be 0.

> +}



> + * get_correlated_timestamp - Get a correlated timestamp
> + *
> + * Reads a timestamp from a device and correlates it to system time
> + */
> +int get_correlated_timestamp(struct correlated_ts *crt,
> + struct correlated_cs *crs)
> +{
> + struct timekeeper *tk = &tk_core.timekeeper;
> + unsigned long seq;
> + cycles_t cycles;
> + ktime_t base;
> + s64 nsecs;
> + int ret;
> +
> + do {
> + seq = read_seqcount_begin(&tk_core.seq);
> + /*
> + * Verify that the correlated clocksoure is related to
> + * the currently installed timekeeper clocksoure
> + */
> + if (tk->tkr_mono.clock != crs->related_cs)
> + return -ENODEV;
> +
> + /*
> + * Try to get a timestamp from the device.
> + */
> + ret = crt->get_ts(crt);
> + if (ret)
> + return ret;
> +
> + /*
> + * Convert the timestamp to timekeeper clock cycles
> + */
> + cycles = crs->convert(crs, crt->system_ts);
> +
> + /* Convert to clock realtime */
> + base = ktime_add(tk->tkr_mono.base, tk_core.timekeeper.offs_real);
> + nsecs = timekeeping_convert_to_ns(&tk->tkr_mono, cycles);
> + crt->system_real = ktime_add_ns(base, nsecs);
> +
> + /* Convert to clock raw monotonic */
> + base = tk->tkr_raw.base;
> + nsecs = timekeeping_convert_to_ns(&tk->tkr_raw, cycles);
> + crt->system_raw = ktime_add_ns(base, nsecs);
> +
> + } while (read_seqcount_retry(&tk_core.seq, seq));
> + return 0;
> +}

This is still fuzzy, right? The hardware ART timestamp could be from
_before_ the NTP adjust; or the NTP adjust could happen while we do this
conversion and we'll take the retry loop.

In both cases, the resulting value is computed using a different slope
than was in effect at the time of the stamp. With the end result being
slightly off from what it should be.

With the PTP case the ART timestamp is very recent, so the fuzz is
smallest, but its still there.

Any other ART users (buffered ETH frames) the delay will only get
bigger.

2015-07-29 10:51:16

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 1/5] Add functions producing system time given a backing counter value

On Wed, Jul 29, 2015 at 12:23:27PM +0200, Thomas Gleixner wrote:
> +/*
> + * CHECKME: Do we need the adjust value? It should be 0, but if we run
> + * in a VM this might be a different story.
> + */
> +static u64 tsc_adjust;

Urgh, VMs have !0 values here?

2015-07-29 11:48:48

by Richard Cochran

[permalink] [raw]
Subject: Re: [PATCH 1/5] Add functions producing system time given a backing counter value

On Wed, Jul 29, 2015 at 12:49:44PM +0200, Peter Zijlstra wrote:
> This is still fuzzy, right? The hardware ART timestamp could be from
> _before_ the NTP adjust; or the NTP adjust could happen while we do this
> conversion and we'll take the retry loop.

In the original series, yes.

> In both cases, the resulting value is computed using a different slope
> than was in effect at the time of the stamp. With the end result being
> slightly off from what it should be.

In Thomas' patch the get_ts() is meant to read fresh pairs of time
stamps from within the loop.


Thanks,
Richard

2015-07-29 12:30:13

by Richard Cochran

[permalink] [raw]
Subject: Re: [PATCH 1/5] Add functions producing system time given a backing counter value

On Wed, Jul 29, 2015 at 12:23:27PM +0200, Thomas Gleixner wrote:
> > Something like the below untested patch should be all we need for PTP
> > to be as precise as possible.

Yes, that is as good as it can be. The code protects against
concurrent NTP adjustments, and the PTP driver will have to block
changes to its clock during the ioctl.

> > I don't know whether we need functionality to convert arbitrary
> > timestamps at all, but if we really need them then they are going to
> > be pretty simple and explicitely not precise for anything else than
> > clock monotonic raw. But that's a different story.
> >
> > Lets concentrate on PTP first and talk about the other stuff once we
> > settled the use case which actually has a precision requirement.

The PTP ioctl only needs the REALTIME value, and so the MONO-RAW bit
could be dropped for now.

Thanks,
Richard


2015-07-29 12:36:06

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 1/5] Add functions producing system time given a backing counter value

On Wed, Jul 29, 2015 at 01:48:41PM +0200, Richard Cochran wrote:
> On Wed, Jul 29, 2015 at 12:49:44PM +0200, Peter Zijlstra wrote:
> > This is still fuzzy, right? The hardware ART timestamp could be from
> > _before_ the NTP adjust; or the NTP adjust could happen while we do this
> > conversion and we'll take the retry loop.
>
> In the original series, yes.
>
> > In both cases, the resulting value is computed using a different slope
> > than was in effect at the time of the stamp. With the end result being
> > slightly off from what it should be.
>
> In Thomas' patch the get_ts() is meant to read fresh pairs of time
> stamps from within the loop.

Ah, indeed, I missed that. Yes if that's possible we get it right.

2015-07-29 12:52:55

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH 1/5] Add functions producing system time given a backing counter value

On Wed, 29 Jul 2015, Peter Zijlstra wrote:
> On Wed, Jul 29, 2015 at 12:19:06PM +0200, Thomas Gleixner wrote:
> > I don't know whether we need functionality to convert arbitrary
> > timestamps at all, but if we really need them then they are going to
> > be pretty simple and explicitely not precise for anything else than
> > clock monotonic raw. But that's a different story.
>
> I think we need that too, and agreed, given NTP anything other than
> MONO_RAW is going to be fuzzy at best.

Yes, but that's a trivial case.

> > +static u64 art_to_tsc(u64 cycles)
> > +{
> > + /* FIXME: This needs 128bit math to work proper */
> > + return tsc_adjust + (cycles * tsc_numerator) / tsc_denominator;
>
> Yeah, its really unfortunate its given as a divisor and not a shift.
> That said I think, at least on the initial hardware, its 2, so:
>
> return mul_u64_u32_shr(cycles, tsc_numerator, 1);
>
> Should do, given that TSC_ADJUST had better be 0.

I don't trust BIOS folks :)

+ u64 tmp, res = tsc_adjust;
+
+ res += (cycles / tsc_denominator) * tsc_numerator;
+ tmp = (cycles % tsc_denominator) * tsc_numerator;
+ res += tmp / tsc_denominator;
+ return res;

That's what I have in my final patch.

> > + do {
> > + seq = read_seqcount_begin(&tk_core.seq);
> > + /*
> > + * Verify that the correlated clocksoure is related to
> > + * the currently installed timekeeper clocksoure
> > + */
> > + if (tk->tkr_mono.clock != crs->related_cs)
> > + return -ENODEV;
> > +
> > + /*
> > + * Try to get a timestamp from the device.
> > + */
> > + ret = crt->get_ts(crt);
> > + if (ret)
> > + return ret;
> > +
> > + /*
> > + * Convert the timestamp to timekeeper clock cycles
> > + */
> > + cycles = crs->convert(crs, crt->system_ts);
> > +
> > + /* Convert to clock realtime */
> > + base = ktime_add(tk->tkr_mono.base, tk_core.timekeeper.offs_real);
> > + nsecs = timekeeping_convert_to_ns(&tk->tkr_mono, cycles);
> > + crt->system_real = ktime_add_ns(base, nsecs);
> > +
> > + /* Convert to clock raw monotonic */
> > + base = tk->tkr_raw.base;
> > + nsecs = timekeeping_convert_to_ns(&tk->tkr_raw, cycles);
> > + crt->system_raw = ktime_add_ns(base, nsecs);
> > +
> > + } while (read_seqcount_retry(&tk_core.seq, seq));
> > + return 0;
> > +}
>
> This is still fuzzy, right? The hardware ART timestamp could be from
> _before_ the NTP adjust; or the NTP adjust could happen while we do this
> conversion and we'll take the retry loop.

I read the timestamp pair in the loop, so its always consistent.

> Any other ART users (buffered ETH frames) the delay will only get
> bigger.

That's a different story and we really can only convert this to
monotonic raw.

Thanks,

tglx

2015-07-29 14:05:43

by Thomas Gleixner

[permalink] [raw]
Subject: RE: [PATCH 1/5] Add functions producing system time given a backing counter value

On Wed, 29 Jul 2015, Hall, Christopher S wrote:
>
> > -----Original Message-----
> > From: John Stultz [mailto:[email protected]]
> > Sent: Monday, July 27, 2015 8:44 PM
> > To: Hall, Christopher S
> > Cc: Thomas Gleixner; Richard Cochran; Ingo Molnar; Kirsher, Jeffrey T;
> > Ronciak, John; H. Peter Anvin; [email protected]; lkml;
> > [email protected]
> > Subject: Re: [PATCH 1/5] Add functions producing system time given a
> > backing counter value
> >
> > On Mon, Jul 27, 2015 at 5:46 PM, Christopher Hall
> > <[email protected]> wrote:
> > > * counter_to_rawmono64
> > > * counter_to_mono64
> > > * counter_to_realtime64
> > >
> > > Enables drivers to translate a captured system clock counter to system
> > > time. This is useful for network and audio devices that capture
> > timestamps
> > > in terms of both the system clock and device clock.
> >
> > Huh. So for counter_to_realtime64 & mono64, this seems to ignore the
> > fact that the multiplier is constantly adjusted and corrected. So that
> > calling the function twice with the same counter value may result in
> > different returned values.
> >
> > I've not yet groked the whole patchset, but it seems like there needs
> > to be some mechanism that ensures the counter value is captured and
> > used in the same (or at least close) interval that the timekeeper data
> > is valid for.
>
> The ART (and derived TSC) values are always in the past. There's no
> chance that we could exceed the interval. I don't think any similar
> usage would be a problem either.
>
> Are you suggesting that, for completeness, this be enforced by the
> conversion function?
>
> I do a check here to make sure that the current counter value isn't before
> the beginning of the current interval:
>
> timekeeping_get_delta()
> ...
> if (cycle_now < tkr->cycle_last &&
> tkr->cycle_last - cycle_now < ROLLOVER_THRESHOLD)
> return -EAGAIN;
>
> If tkr->cycle_last - cycle_now is large, the assumption is that
> rollover occurred. Otherwise, the caller should re-read the counter
> so that it falls within the current interval. In my "normal use"
> testing, re-read never occurred.

Sure that never happens, because your rollover value is 2 << 39 for
whatever reasons.

So on a 1GHz machine that is (2 << 39) / 1e9 ~= 1099.51 seconds.

Oh well.