2006-11-14 01:06:54

by Suleiman Souhlal

Subject: [PATCH 0/2] Make the TSC safe to be used by gettimeofday().

I've had a proof-of-concept for this since August, and finally got around to
somewhat cleaning it up.

It can certainly still be improved, namely by using vgetcpu() instead of CPUID
to find the cpu number (but I couldn't get it to work when I tried).
Another possible improvement would be to use RDTSCP when available.
There's also a small race in do_gettimeofday(), vgettimeofday() and
vmonotonic_clock(), but I've never seen it happen.

Suggestions are welcome.

-- Suleiman



2006-11-14 01:08:33

by Suleiman Souhlal

Subject: [PATCH 1/2] Make the TSC safe to be used by gettimeofday().

This is done by a per-cpu vxtime structure that stores the last TSC and HPET
values.

Whenever we switch to a userland process after a HLT instruction has been
executed or after the CPU frequency has changed, we force a new read of the
TSC, HPET and xtime so that we know the correct frequency we have to deal
with.

With this, we can safely use RDTSC in gettimeofday() on CPUs where the
TSCs are not synchronized, such as Opterons, instead of doing a very expensive
HPET read.

Signed-off-by: Suleiman Souhlal <[email protected]>

---
arch/x86_64/kernel/process.c | 23 ++++++-
arch/x86_64/kernel/time.c | 38 ++++++++++--
arch/x86_64/kernel/vmlinux.lds.S | 6 +-
arch/x86_64/kernel/vsyscall.c | 121 +++++++++++++++++++++++++++++++-------
include/asm-x86_64/timex.h | 2 +
include/asm-x86_64/vsyscall.h | 13 ++++
include/linux/hrtimer.h | 2 +
7 files changed, 170 insertions(+), 35 deletions(-)

diff --git a/arch/x86_64/kernel/process.c b/arch/x86_64/kernel/process.c
index 49f7fac..b80bc00 100644
--- a/arch/x86_64/kernel/process.c
+++ b/arch/x86_64/kernel/process.c
@@ -51,6 +51,7 @@ #include <asm/desc.h>
#include <asm/proto.h>
#include <asm/ia32.h>
#include <asm/idle.h>
+#include <asm/vsyscall.h>

asmlinkage extern void ret_from_fork(void);

@@ -109,15 +110,21 @@ void exit_idle(void)
*/
static void default_idle(void)
{
+ int cpu;
+
+ cpu = hard_smp_processor_id();
+
local_irq_enable();

current_thread_info()->status &= ~TS_POLLING;
smp_mb__after_clear_bit();
while (!need_resched()) {
local_irq_disable();
- if (!need_resched())
+ if (!need_resched()) {
+ if (vxtime.pcpu[cpu].need_update == 0)
+ vxtime.pcpu[cpu].need_update = 1;
safe_halt();
- else
+ } else
local_irq_enable();
}
current_thread_info()->status |= TS_POLLING;
@@ -560,7 +567,7 @@ __switch_to(struct task_struct *prev_p,
{
struct thread_struct *prev = &prev_p->thread,
*next = &next_p->thread;
- int cpu = smp_processor_id();
+ int apicid, cpu = smp_processor_id();
struct tss_struct *tss = &per_cpu(init_tss, cpu);

/* we're going to use this soon, after a few expensive things */
@@ -657,6 +664,16 @@ #endif
*/
if (next_p->fpu_counter>5)
math_state_restore();
+
+ apicid = hard_smp_processor_id();
+
+ /*
+ * We will need to also do this when switching to kernel tasks if we
+ * want to use the per-cpu monotonic_clock in the kernel
+ */
+ if (vxtime.pcpu[apicid].need_update == 1 && next_p->mm != NULL)
+ vxtime_update_pcpu();
+
return prev_p;
}

diff --git a/arch/x86_64/kernel/time.c b/arch/x86_64/kernel/time.c
index 88722f1..55fefa2 100644
--- a/arch/x86_64/kernel/time.c
+++ b/arch/x86_64/kernel/time.c
@@ -118,14 +118,17 @@ unsigned int (*do_gettimeoffset)(void) =

void do_gettimeofday(struct timeval *tv)
{
- unsigned long seq;
+ unsigned long seq, t, x;
unsigned int sec, usec;
+ int cpu;

do {
seq = read_seqbegin(&xtime_lock);
+ preempt_disable();
+ cpu = hard_smp_processor_id();

- sec = xtime.tv_sec;
- usec = xtime.tv_nsec / NSEC_PER_USEC;
+ sec = vxtime.pcpu[cpu].tv_sec;
+ usec = vxtime.pcpu[cpu].tv_usec;

/* i386 does some correction here to keep the clock
monotonous even when ntpd is fixing drift.
@@ -135,9 +138,13 @@ void do_gettimeofday(struct timeval *tv)
be found. Note when you fix it here you need to do the same
in arch/x86_64/kernel/vsyscall.c and export all needed
variables in vmlinux.lds. -AK */
- usec += do_gettimeoffset();
+ t = get_cycles_sync();
+ x = (((t - vxtime.pcpu[cpu].last_tsc) *
+ vxtime.pcpu[cpu].tsc_nsquot) >> NS_SCALE) / NSEC_PER_USEC;
+ usec += x;

} while (read_seqretry(&xtime_lock, seq));
+ preempt_enable();

tv->tv_sec = sec + usec / USEC_PER_SEC;
tv->tv_usec = usec % USEC_PER_SEC;
@@ -624,10 +631,20 @@ #endif
cpufreq_scale(loops_per_jiffy_ref, ref_freq, freq->new);

cpu_khz = cpufreq_scale(cpu_khz_ref, ref_freq, freq->new);
- if (!(freq->flags & CPUFREQ_CONST_LOOPS))
+ if (!(freq->flags & CPUFREQ_CONST_LOOPS)) {
+ int cpu;
+
+ cpu = cpu_physical_id(freq->cpu);
vxtime.tsc_quot = (USEC_PER_MSEC << US_SCALE) / cpu_khz;
+
+ vxtime.pcpu[cpu].tsc_nsquot = (NSEC_PER_SEC << NS_SCALE)
+ / (cpufreq_scale(cpu_khz_ref, ref_freq, freq->new) *
+ NSEC_PER_USEC);
+ if (vxtime.pcpu[cpu].need_update == 0)
+ vxtime.pcpu[cpu].need_update = 1;
+ }
}
-
+
set_cyc2ns_scale(cpu_khz_ref);

return 0;
@@ -887,6 +904,9 @@ time_cpu_notifier(struct notifier_block

void __init time_init(void)
{
+ char *timename;
+ int i;
+
if (nohpet)
vxtime.hpet_address = 0;

@@ -931,13 +951,17 @@ #endif
#ifndef CONFIG_SMP
time_init_gtod();
#endif
+
+ for (i = 0; i < NR_CPUS; i++)
+ vxtime.pcpu[i].tsc_nsquot = (NSEC_PER_SEC << NS_SCALE)
+ / (cpu_khz * NSEC_PER_USEC);
}

/*
* Make an educated guess if the TSC is trustworthy and synchronized
* over all CPUs.
*/
-__cpuinit int unsynchronized_tsc(void)
+int unsynchronized_tsc(void)
{
#ifdef CONFIG_SMP
if (apic_is_clustered_box())
diff --git a/arch/x86_64/kernel/vmlinux.lds.S b/arch/x86_64/kernel/vmlinux.lds.S
index edb24aa..b1a39d1 100644
--- a/arch/x86_64/kernel/vmlinux.lds.S
+++ b/arch/x86_64/kernel/vmlinux.lds.S
@@ -96,9 +96,6 @@ #define VVIRT(x) (ADDR(x) - VVIRT_OFFSET
.xtime_lock : AT(VLOAD(.xtime_lock)) { *(.xtime_lock) }
xtime_lock = VVIRT(.xtime_lock);

- .vxtime : AT(VLOAD(.vxtime)) { *(.vxtime) }
- vxtime = VVIRT(.vxtime);
-
.vgetcpu_mode : AT(VLOAD(.vgetcpu_mode)) { *(.vgetcpu_mode) }
vgetcpu_mode = VVIRT(.vgetcpu_mode);

@@ -119,6 +116,9 @@ #define VVIRT(x) (ADDR(x) - VVIRT_OFFSET
.vsyscall_2 ADDR(.vsyscall_0) + 2048: AT(VLOAD(.vsyscall_2)) { *(.vsyscall_2) }
.vsyscall_3 ADDR(.vsyscall_0) + 3072: AT(VLOAD(.vsyscall_3)) { *(.vsyscall_3) }

+ .vxtime : AT(VLOAD(.vxtime)) { *(.vxtime) }
+ vxtime = VVIRT(.vxtime);
+
. = VSYSCALL_VIRT_ADDR + 4096;

#undef VSYSCALL_ADDR
diff --git a/arch/x86_64/kernel/vsyscall.c b/arch/x86_64/kernel/vsyscall.c
index a98b460..9025699 100644
--- a/arch/x86_64/kernel/vsyscall.c
+++ b/arch/x86_64/kernel/vsyscall.c
@@ -27,6 +27,8 @@ #include <linux/seqlock.h>
#include <linux/jiffies.h>
#include <linux/sysctl.h>
#include <linux/getcpu.h>
+#include <linux/smp.h>
+#include <linux/kthread.h>

#include <asm/vsyscall.h>
#include <asm/pgtable.h>
@@ -37,6 +39,8 @@ #include <asm/io.h>
#include <asm/segment.h>
#include <asm/desc.h>
#include <asm/topology.h>
+#include <asm/smp.h>
+#include <asm/proto.h>

#define __vsyscall(nr) __attribute__ ((unused,__section__(".vsyscall_" #nr)))

@@ -46,6 +50,12 @@ int __vgetcpu_mode __section_vgetcpu_mod

#include <asm/unistd.h>

+#define NS_SCALE 10
+#define NSEC_PER_TICK (NSEC_PER_SEC / HZ)
+extern unsigned long hpet_tick;
+
+extern unsigned long vxtime_hz;
+
static __always_inline void timeval_normalize(struct timeval * tv)
{
time_t __sec;
@@ -57,35 +67,107 @@ static __always_inline void timeval_norm
}
}

+inline int apicid(void)
+{
+ int cpu;
+
+ __asm __volatile("cpuid" : "=b" (cpu) : "a" (1) : "cx", "dx");
+ return (cpu >> 24);
+}
+
static __always_inline void do_vgettimeofday(struct timeval * tv)
{
long sequence, t;
unsigned long sec, usec;
+ int cpu;

do {
sequence = read_seqbegin(&__xtime_lock);
-
- sec = __xtime.tv_sec;
- usec = __xtime.tv_nsec / 1000;
-
- if (__vxtime.mode != VXTIME_HPET) {
- t = get_cycles_sync();
- if (t < __vxtime.last_tsc)
- t = __vxtime.last_tsc;
- usec += ((t - __vxtime.last_tsc) *
- __vxtime.tsc_quot) >> 32;
- /* See comment in x86_64 do_gettimeofday. */
- } else {
- usec += ((readl((void __iomem *)
- fix_to_virt(VSYSCALL_HPET) + 0xf0) -
- __vxtime.last) * __vxtime.quot) >> 32;
- }
- } while (read_seqretry(&__xtime_lock, sequence));
+ cpu = apicid();
+
+ sec = __vxtime.pcpu[cpu].tv_sec;
+ usec = __vxtime.pcpu[cpu].tv_usec;
+ rdtscll(t);
+
+ usec += (((t - __vxtime.pcpu[cpu].last_tsc) *
+ __vxtime.pcpu[cpu].tsc_nsquot) >> NS_SCALE) / NSEC_PER_USEC;
+ } while (read_seqretry(&__xtime_lock, sequence) || apicid() != cpu);

tv->tv_sec = sec + usec / 1000000;
tv->tv_usec = usec % 1000000;
}

+void vxtime_update_pcpu(void)
+{
+ unsigned long flags, offset, seq;
+ int cpu;
+
+ write_seqlock_irqsave(&vxtime.vx_seq, flags);
+
+ cpu = hard_smp_processor_id();
+
+ do {
+ seq = read_seqbegin(&xtime_lock);
+ vxtime.pcpu[cpu].tv_sec = xtime.tv_sec;
+ vxtime.pcpu[cpu].tv_usec = xtime.tv_nsec / 1000;
+ offset = hpet_readl(HPET_COUNTER) - vxtime.last;
+ } while (read_seqretry(&xtime_lock, seq));
+
+ vxtime.pcpu[cpu].tv_usec += (offset * vxtime.quot) >> 32;
+ vxtime.pcpu[cpu].last_tsc = get_cycles_sync();
+
+ if (vxtime.pcpu[cpu].need_update == 1)
+ vxtime.pcpu[cpu].need_update = 0;
+
+ write_sequnlock_irqrestore(&vxtime.vx_seq, flags);
+}
+
+static void _vxtime_update_pcpu(void *arg)
+{
+ vxtime_update_pcpu();
+}
+
+static int vxtime_periodic(void *arg)
+{
+ while (1) {
+ set_current_state(TASK_INTERRUPTIBLE);
+ schedule_timeout(msecs_to_jiffies(1000));
+
+ smp_call_function(_vxtime_update_pcpu, NULL, 1, 1);
+ _vxtime_update_pcpu(NULL);
+ }
+
+ return (0); /* NOTREACHED */
+}
+
+void clock_was_set(void)
+{
+ smp_call_function(_vxtime_update_pcpu, NULL, 1, 1);
+ _vxtime_update_pcpu(NULL);
+}
+
+
+static __init int vxtime_init_pcpu(void)
+{
+ int i;
+
+ seqlock_init(&vxtime.vx_seq);
+
+ /*
+ * Don't bother updating the per-cpu data after each HLT
+ * if we don't need to.
+ */
+ if (!unsynchronized_tsc())
+ for (i = 0; i < NR_CPUS; i++)
+ vxtime.pcpu[i].need_update = -1;
+
+ kthread_create(vxtime_periodic, NULL, "vxtime_periodic");
+
+ return (0);
+}
+
+core_initcall(vxtime_init_pcpu);
+
/* RED-PEN may want to readd seq locking, but then the variable should be write-once. */
static __always_inline void do_get_tz(struct timezone * tz)
{
@@ -174,11 +256,6 @@ vgetcpu(unsigned *cpu, unsigned *node, s
return 0;
}

-long __vsyscall(3) venosys_1(void)
-{
- return -ENOSYS;
-}
-
#ifdef CONFIG_SYSCTL

#define SYSCALL 0x050f
diff --git a/include/asm-x86_64/timex.h b/include/asm-x86_64/timex.h
index b9e5320..91bad25 100644
--- a/include/asm-x86_64/timex.h
+++ b/include/asm-x86_64/timex.h
@@ -46,4 +46,6 @@ #define ARCH_HAS_READ_CURRENT_TIMER 1

extern struct vxtime_data vxtime;

+void clock_was_set(void);
+
#endif
diff --git a/include/asm-x86_64/vsyscall.h b/include/asm-x86_64/vsyscall.h
index fd452fc..707353b 100644
--- a/include/asm-x86_64/vsyscall.h
+++ b/include/asm-x86_64/vsyscall.h
@@ -30,6 +30,14 @@ #define VXTIME_PMTMR 3
#define VGETCPU_RDTSCP 1
#define VGETCPU_LSL 2

+struct vxtime_pcpu {
+ time_t tv_sec;
+ long tv_usec;
+ unsigned long tsc_nsquot;
+ unsigned long last_tsc;
+ int need_update;
+};
+
struct vxtime_data {
long hpet_address; /* HPET base address */
int last;
@@ -37,6 +45,8 @@ struct vxtime_data {
long quot;
long tsc_quot;
int mode;
+ seqlock_t vx_seq;
+ struct vxtime_pcpu pcpu[NR_CPUS] ____cacheline_aligned;
};

#define hpet_readl(a) readl((const void __iomem *)fix_to_virt(FIX_HPET_BASE) + a)
@@ -61,7 +71,10 @@ extern int sysctl_vsyscall;

extern void vsyscall_set_cpu(int cpu);

+void vxtime_update_pcpu(void);
+
#define ARCH_HAVE_XTIME_LOCK 1
+#define ARCH_HAVE_CLOCK_WAS_SET 1

#endif /* __KERNEL__ */

diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h
index fca9302..7f59619 100644
--- a/include/linux/hrtimer.h
+++ b/include/linux/hrtimer.h
@@ -95,12 +95,14 @@ struct hrtimer_base {
struct lock_class_key lock_key;
};

+#ifndef ARCH_HAVE_CLOCK_WAS_SET
/*
* clock_was_set() is a NOP for non- high-resolution systems. The
* time-sorted order guarantees that a timer does not expire early and
* is expired in the next softirq when the clock was advanced.
*/
#define clock_was_set() do { } while (0)
+#endif

/* Exported timer functions: */


2006-11-14 01:09:39

by Suleiman Souhlal

Subject: [PATCH 2/2] Introduce a vmonotonic_clock() vsyscall.

This is the vsyscall equivalent of monotonic_clock(), using the per-cpu vxtime.
It returns the number of nanoseconds since boot.

It's intended to be used by processes that need a cheap and nanosecond
resolution time counter.
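
As an illustration, a userland caller could look like the sketch below. This
is not part of the patch: the address is an assumption derived from the linker
script change in patch 1/2, which places .vsyscall_3 at the vsyscall base
(0xffffffffff600000 on x86-64) plus 3072 bytes.

#include <stdio.h>

typedef unsigned long (*vmonotonic_clock_t)(void);

int main(void)
{
	/* Assumed entry point: VSYSCALL_START + 3 * 1024. */
	vmonotonic_clock_t vmono = (vmonotonic_clock_t)0xffffffffff600c00UL;
	unsigned long t0, t1;

	t0 = vmono();
	t1 = vmono();
	printf("elapsed: %lu ns\n", t1 - t0);
	return 0;
}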

Signed-off-by: Suleiman Souhlal <[email protected]>

---
arch/x86_64/kernel/vsyscall.c | 21 +++++++++++++++++++++
include/asm-x86_64/vsyscall.h | 1 +
2 files changed, 22 insertions(+), 0 deletions(-)

diff --git a/arch/x86_64/kernel/vsyscall.c b/arch/x86_64/kernel/vsyscall.c
index 9025699..f124cc6 100644
--- a/arch/x86_64/kernel/vsyscall.c
+++ b/arch/x86_64/kernel/vsyscall.c
@@ -52,6 +52,7 @@ #include <asm/unistd.h>

#define NS_SCALE 10
#define NSEC_PER_TICK (NSEC_PER_SEC / HZ)
+extern unsigned long long monotonic_base;
extern unsigned long hpet_tick;

extern unsigned long vxtime_hz;
@@ -108,11 +109,13 @@ void vxtime_update_pcpu(void)

do {
seq = read_seqbegin(&xtime_lock);
+ vxtime.pcpu[cpu].monotonic_base = monotonic_base;
vxtime.pcpu[cpu].tv_sec = xtime.tv_sec;
vxtime.pcpu[cpu].tv_usec = xtime.tv_nsec / 1000;
offset = hpet_readl(HPET_COUNTER) - vxtime.last;
} while (read_seqretry(&xtime_lock, seq));

+ vxtime.pcpu[cpu].monotonic_base += offset * NSEC_PER_TICK / hpet_tick;
vxtime.pcpu[cpu].tv_usec += (offset * vxtime.quot) >> 32;
vxtime.pcpu[cpu].last_tsc = get_cycles_sync();

@@ -256,6 +259,24 @@ vgetcpu(unsigned *cpu, unsigned *node, s
return 0;
}

+unsigned long __vsyscall(3) vmonotonic_clock(void)
+{
+ unsigned long nsec, seq, t;
+ int cpu;
+
+ do {
+ seq = read_seqbegin(&__vxtime.vx_seq);
+ cpu = apicid();
+ nsec = __vxtime.pcpu[cpu].monotonic_base;
+
+ rdtscll(t);
+ nsec += ((t - __vxtime.pcpu[cpu].last_tsc)
+ * __vxtime.pcpu[cpu].tsc_nsquot) >> NS_SCALE;
+ } while (read_seqretry(&__vxtime.vx_seq, seq) || cpu != apicid());
+
+ return (nsec);
+}
+
#ifdef CONFIG_SYSCTL

#define SYSCALL 0x050f
diff --git a/include/asm-x86_64/vsyscall.h b/include/asm-x86_64/vsyscall.h
index 707353b..5a10fd0 100644
--- a/include/asm-x86_64/vsyscall.h
+++ b/include/asm-x86_64/vsyscall.h
@@ -31,6 +31,7 @@ #define VGETCPU_RDTSCP 1
#define VGETCPU_LSL 2

struct vxtime_pcpu {
+ unsigned long monotonic_base;
time_t tv_sec;
long tv_usec;
unsigned long tsc_nsquot;


2006-11-14 01:51:06

by Andi Kleen

Subject: Re: [PATCH 0/2] Make the TSC safe to be used by gettimeofday().

On Tuesday 14 November 2006 02:06, Suleiman Souhlal wrote:
> I've had a proof-of-concept for this since August, and finally got around to
> somewhat cleaning it up.

Thanks.

I got a competing implementation for this unfortunately now from Vojtech & Jiri

Yours is simpler, but I'm not sure it's as complete. What are your assurances
against non-monotonicity, for example?

>
> It can certainly still be improved, namely by using vgetcpu() instead of CPUID
> to find the cpu number (but I couldn't get it to work, when I tried).

What did not work?

> Another possible improvement would be to use RDTSCP when available.
> There's also a small race in do_gettimeofday(), vgettimeofday() and
> vmonotonic_clock() but I've never seen it happen.

I already did a vposix_getclock with monotonic clock support on my own; it
was about to be merged with the vDSO. It still used a globally synchronized
TSC, though.

-Andi

2006-11-14 02:05:08

by Andi Kleen

Subject: Re: [PATCH 1/2] Make the TSC safe to be used by gettimeofday().

On Tuesday 14 November 2006 02:08, Suleiman Souhlal wrote:
> This is done by a per-cpu vxtime structure that stores the last TSC and HPET
> values.
>
> Whenever we switch to a userland process after a HLT instruction has been
> executed or after the CPU frequency has changed, we force a new read of the
> TSC, HPET and xtime so that we know the correct frequency we have to deal
> with.

Hmm, interesting approach.

>
> With this, we can safely use RDTSC in gettimeofday() in CPUs where the
> TSCs are not synchronized, such as Opterons, instead of doing a very expensive
> HPET read.

> + cpu = hard_smp_processor_id();

Why not smp_processor_id()?


> +
> + apicid = hard_smp_processor_id();

The apicid <-> linux cpuid mapping should be 1:1, so I don't know
why you do this separately. All the uses of hard_smp_processor_id()
are bogus.

> +
> + /*
> + * We will need to also do this when switching to kernel tasks if we
> + * want to use the per-cpu monotonic_clock in the kernel
> + */
> + if (vxtime.pcpu[apicid].need_update == 1 && next_p->mm != NULL)
> + vxtime_update_pcpu();

Why? It should be really only needed on HLT/cpufreq change, or?

Anyways, I'm not very fond of adding code to the context switch critical
path.


> seq = read_seqbegin(&xtime_lock);
> + preempt_disable();
> + cpu = hard_smp_processor_id();

Again, shouldn't use hard

>
> - sec = xtime.tv_sec;
> - usec = xtime.tv_nsec / NSEC_PER_USEC;
> + sec = vxtime.pcpu[cpu].tv_sec;
> + usec = vxtime.pcpu[cpu].tv_usec;
>
> /* i386 does some correction here to keep the clock
> monotonous even when ntpd is fixing drift.
> @@ -135,9 +138,13 @@ void do_gettimeofday(struct timeval *tv)
> be found. Note when you fix it here you need to do the same
> in arch/x86_64/kernel/vsyscall.c and export all needed
> variables in vmlinux.lds. -AK */
> - usec += do_gettimeoffset();
> + t = get_cycles_sync();
> + x = (((t - vxtime.pcpu[cpu].last_tsc) *
> + vxtime.pcpu[cpu].tsc_nsquot) >> NS_SCALE) / NSEC_PER_USEC;
> + usec += x;
>
> } while (read_seqretry(&xtime_lock, seq));
> + preempt_enable();

If it can be implemented without races in user space, why does
it need preempt disable in kernel space?

The faster way to access all the vxtime code would be to stick
a pointer to the per CPU vxtime data into the PDA


> +#define NSEC_PER_TICK (NSEC_PER_SEC / HZ)
> +extern unsigned long hpet_tick;
> +
> +extern unsigned long vxtime_hz;

No externs in .c files. This happened earlier too, I think.

> +
> static __always_inline void timeval_normalize(struct timeval * tv)
> {
> time_t __sec;
> @@ -57,35 +67,107 @@ static __always_inline void timeval_norm
> }
> }
>
> +inline int apicid(void)
> +{
> + int cpu;
> +
> + __asm __volatile("cpuid" : "=b" (cpu) : "a" (1) : "cx", "dx");
> + return (cpu >> 24);

The faster way to do this is to use LSL from a magic GDT entry.
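
A minimal sketch of that approach, assuming a per-cpu GDT descriptor whose
segment limit encodes the cpu number (the selector below is hypothetical,
not something the current kernel provides):

#define MAGIC_GDT_SEL 0x7b	/* hypothetical per-cpu selector */

static inline unsigned int getcpu_lsl(void)
{
	unsigned int p;

	/* LSL loads the segment limit of the descriptor named by the
	 * selector; it is unprivileged, so it also works from the
	 * vsyscall page. */
	asm volatile("lsl %1, %0" : "=r" (p) : "r" (MAGIC_GDT_SEL));
	return p & 0xfff;
}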

> +}
> +
> static __always_inline void do_vgettimeofday(struct timeval * tv)
> {
> long sequence, t;
> unsigned long sec, usec;
> + int cpu;
>
> do {
> sequence = read_seqbegin(&__xtime_lock);
> -
> - sec = __xtime.tv_sec;
> - usec = __xtime.tv_nsec / 1000;
> -
> - if (__vxtime.mode != VXTIME_HPET) {
> - t = get_cycles_sync();
> - if (t < __vxtime.last_tsc)
> - t = __vxtime.last_tsc;
> - usec += ((t - __vxtime.last_tsc) *
> - __vxtime.tsc_quot) >> 32;
> - /* See comment in x86_64 do_gettimeofday. */
> - } else {
> - usec += ((readl((void __iomem *)
> - fix_to_virt(VSYSCALL_HPET) + 0xf0) -
> - __vxtime.last) * __vxtime.quot) >> 32;
> - }
> - } while (read_seqretry(&__xtime_lock, sequence));
> + cpu = apicid();
> +
> + sec = __vxtime.pcpu[cpu].tv_sec;
> + usec = __vxtime.pcpu[cpu].tv_usec;
> + rdtscll(t);
> +
> + usec += (((t - __vxtime.pcpu[cpu].last_tsc) *
> + __vxtime.pcpu[cpu].tsc_nsquot) >> NS_SCALE) / NSEC_PER_USEC;
> + } while (read_seqretry(&__xtime_lock, sequence) || apicid() != cpu);

Hmm, isn't there still an (unlikely, but possible) race:

CPU #0 CPU #1
read cpu number
switch to CPU #1
read TSC
switch back to CPU #0
check cpu number
check succeeds, but you got wrong TSC data

How do you prevent that? I suppose you could force the seqlock to retry
in this case in the context switch, but I don't think you do that?

Also I'm not sure what your context switch code is good for.

-Andi

2006-11-14 02:06:25

by Suleiman Souhlal

Subject: Re: [PATCH 0/2] Make the TSC safe to be used by gettimeofday().

Andi Kleen wrote:
> On Tuesday 14 November 2006 02:06, Suleiman Souhlal wrote:
>
>>I've had a proof-of-concept for this since August, and finally got around to
>>somewhat cleaning it up.
>
>
> Thanks.
>
> I got a competing implementation for this unfortunately now from Vojtech & Jiri
>
> Yours is simpler, but I'm not sure it's as complete. What are your assurances
> against non-monotonicity, for example?

I believe that the results returned will always be monotonic, as long as
the frequency of the TSC does not change from under us (that is, without
the kernel knowing). This is because we "synchronize" each CPU's vxtime
with a global time source (HPET) every time we know the TSC rate changes.

>>It can certainly still be improved, namely by using vgetcpu() instead of CPUID
>>to find the cpu number (but I couldn't get it to work, when I tried).
>
>
> What did not work?

I was not able to make vgettimeofday() use vgetcpu(). It seemed like vgetcpu()
was not returning the same value as smp_processor_id() would, so the
values I'd get from vgettimeofday() did not completely agree with the
ones from gettimeofday(2). I didn't have a chance to investigate further.

>>Another possible improvement would be to use RDTSCP when available.
>>There's also a small race in do_gettimeofday(), vgettimeofday() and
>>vmonotonic_clock() but I've never seen it happen.
>
>
> I already did a vposix_getclock with monotonic clock support on my own; it
> was about to be merged with the vDSO. It still used a globally synchronized
> TSC, though.

-- Suleiman

2006-11-14 02:25:56

by Suleiman Souhlal

Subject: Re: [PATCH 1/2] Make the TSC safe to be used by gettimeofday().

Andi Kleen wrote:
> On Tuesday 14 November 2006 02:08, Suleiman Souhlal wrote:
>
>>This is done by a per-cpu vxtime structure that stores the last TSC and HPET
>>values.
>>
>>Whenever we switch to a userland process after a HLT instruction has been
>>executed or after the CPU frequency has changed, we force a new read of the
>>TSC, HPET and xtime so that we know the correct frequency we have to deal
>>with.
>
>
> Hmm, interesting approach.
>
>
>>With this, we can safely use RDTSC in gettimeofday() in CPUs where the
>>TSCs are not synchronized, such as Opterons, instead of doing a very expensive
>>HPET read.
>
>
>>+ cpu = hard_smp_processor_id();
>
>
> Why not smp_processor_id ?

Because CPUID returns the APIC ID, and I was under the impression that the
cpu numbers returned by smp_processor_id() are dynamically allocated and
don't necessarily match the APIC ID.

>>+
>>+ apicid = hard_smp_processor_id();
>
>
> The apicid <-> linux cpuid mapping should be 1:1, so I don't know
> why you do this separately. All the uses of hard_smp_processor_id()
> are bogus.
>
>
>>+
>>+ /*
>>+ * We will need to also do this when switching to kernel tasks if we
>>+ * want to use the per-cpu monotonic_clock in the kernel
>>+ */
>>+ if (vxtime.pcpu[apicid].need_update == 1 && next_p->mm != NULL)
>>+ vxtime_update_pcpu();
>
>
> Why? It should be really only needed on HLT/cpufreq change, or?
>
> Anyways, I'm not very fond of adding code to the context switch critical
> path.

Yes, it's only needed on HLT and cpufreq change.
The code here is to force a "resynch" with the HPET if we've done a HLT.
It has to be done before we switch to any userland thread that might
use the per-cpu vxtime. switch_to() seemed like the most natural place
to put this.

>
>
>> seq = read_seqbegin(&xtime_lock);
>>+ preempt_disable();
>>+ cpu = hard_smp_processor_id();
>
>
> Again, shouldn't use hard
>
>
>>
>>- sec = xtime.tv_sec;
>>- usec = xtime.tv_nsec / NSEC_PER_USEC;
>>+ sec = vxtime.pcpu[cpu].tv_sec;
>>+ usec = vxtime.pcpu[cpu].tv_usec;
>>
>> /* i386 does some correction here to keep the clock
>> monotonous even when ntpd is fixing drift.
>>@@ -135,9 +138,13 @@ void do_gettimeofday(struct timeval *tv)
>> be found. Note when you fix it here you need to do the same
>> in arch/x86_64/kernel/vsyscall.c and export all needed
>> variables in vmlinux.lds. -AK */
>>- usec += do_gettimeoffset();
>>+ t = get_cycles_sync();
>>+ x = (((t - vxtime.pcpu[cpu].last_tsc) *
>>+ vxtime.pcpu[cpu].tsc_nsquot) >> NS_SCALE) / NSEC_PER_USEC;
>>+ usec += x;
>>
>> } while (read_seqretry(&xtime_lock, seq));
>>+ preempt_enable();
>
>
> If it can be implemented without races in user space, why does
> it need preempt disable in kernel space?

Because there is indeed a small race in user space (as you noted below)
that I haven't yet figured out how to address.

> The faster way to access all the vxtime code would be to stick
> a pointer to the per CPU vxtime data into the PDA
>
>
>
>>+#define NSEC_PER_TICK (NSEC_PER_SEC / HZ)
>>+extern unsigned long hpet_tick;
>>+
>>+extern unsigned long vxtime_hz;
>
>
> No externs in .c files. Happened earlier too I think
>
>
>>+
>> static __always_inline void timeval_normalize(struct timeval * tv)
>> {
>> time_t __sec;
>>@@ -57,35 +67,107 @@ static __always_inline void timeval_norm
>> }
>> }
>>
>>+inline int apicid(void)
>>+{
>>+ int cpu;
>>+
>>+ __asm __volatile("cpuid" : "=b" (cpu) : "a" (1) : "cx", "dx");
>>+ return (cpu >> 24);
>
>
> The faster way to do this is to use LSL from a magic GDT entry.

A cow-orker suggested that we use SIDT and encode the CPU number in the
limit of the IDT, which should be even faster than LSL.
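
A sketch of what the userland side of that could look like, assuming the
kernel biases each CPU's IDT limit by the cpu number (e.g. limit =
0xfff + cpu, which still covers all 256 vectors). This is only the
suggestion, not code from the patch:

static inline unsigned int getcpu_sidt(void)
{
	struct {
		unsigned short limit;
		unsigned long base;
	} __attribute__((packed)) idtr;

	/* SIDT stores the IDT register (limit and base) to memory and
	 * is not privileged, so it can run from the vsyscall page. */
	asm volatile("sidt %0" : "=m" (idtr));
	return idtr.limit - 0xfff;
}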

>
>>+}
>>+
>> static __always_inline void do_vgettimeofday(struct timeval * tv)
>> {
>> long sequence, t;
>> unsigned long sec, usec;
>>+ int cpu;
>>
>> do {
>> sequence = read_seqbegin(&__xtime_lock);
>>-
>>- sec = __xtime.tv_sec;
>>- usec = __xtime.tv_nsec / 1000;
>>-
>>- if (__vxtime.mode != VXTIME_HPET) {
>>- t = get_cycles_sync();
>>- if (t < __vxtime.last_tsc)
>>- t = __vxtime.last_tsc;
>>- usec += ((t - __vxtime.last_tsc) *
>>- __vxtime.tsc_quot) >> 32;
>>- /* See comment in x86_64 do_gettimeofday. */
>>- } else {
>>- usec += ((readl((void __iomem *)
>>- fix_to_virt(VSYSCALL_HPET) + 0xf0) -
>>- __vxtime.last) * __vxtime.quot) >> 32;
>>- }
>>- } while (read_seqretry(&__xtime_lock, sequence));
>>+ cpu = apicid();
>>+
>>+ sec = __vxtime.pcpu[cpu].tv_sec;
>>+ usec = __vxtime.pcpu[cpu].tv_usec;
>>+ rdtscll(t);
>>+
>>+ usec += (((t - __vxtime.pcpu[cpu].last_tsc) *
>>+ __vxtime.pcpu[cpu].tsc_nsquot) >> NS_SCALE) / NSEC_PER_USEC;
>>+ } while (read_seqretry(&__xtime_lock, sequence) || apicid() != cpu);
>
>
> Hmm, isn't there still a (unlikely, but possible) race:
>
> CPU #0 CPU #1
> read cpu number
> switch to CPU #1
> read TSC
> switch back to CPU #0
> check cpu number
> check succeeds, but you got wrong TSC data
>
> How do you prevent that? I suppose you could force the seqlock to retry
> in this case in the context switch, but I don't think you do that?

I couldn't figure out how to tell from userland whether a context switch has
happened. I tried adding a per-cpu context switch count, but I couldn't
figure out how to read it atomically along with the CPU number.

> Also I'm not sure what your context switch code is good for.
>
> -Andi

Thanks for the feedback.

-- Suleiman

2006-11-14 02:44:08

by Andi Kleen

Subject: Re: [PATCH 1/2] Make the TSC safe to be used by gettimeofday().


>
> Because CPUID returns the APIC ID, and I was under the impression that the
> cpu numbers returned by smp_processor_id() are dynamically allocated and
> don't necessarily match the APIC ID.

Correct, but one can use a mapping table.


> Yes, it's only needed on HLT and cpufreq change.
> The code here is to force a "resynch" with the HPET if we've done a HLT.
> It has to be done before we switch to any userland thread that might
> use the per-cpu vxtime. switch_to() seemed like the most natural place
> to put this.

I don't think so. The natural place after HLT is in the idle loop, or better
in the idle notifiers [1]; after a cpufreq change it is in the appropriate
cpufreq notifiers.


[1] unfortunately they are still subtly broken in .19, but will be fixed
in .20

> A cow-orker suggested that we use SIDT and encode the CPU number in the
> limit of the IDT, which should be even faster than LSL.

Possible, yes. Did you time it?

But then we would make the IDT variable-length in memory? While the CPUs
probably won't care, some hypervisors seem to be picky about these limits.
LSL still seems somewhat safer.

> I couldn't figure out how to tell from userland whether a context switch has
> happened. I tried adding a per-cpu context switch count, but I couldn't
> figure out how to read it atomically along with the CPU number.

It's tricky. That is why we asked for RDTSCP.

-Andi

2006-11-14 03:35:13

by dean gaudet

Subject: Re: [PATCH 1/2] Make the TSC safe to be used by gettimeofday().

On Tue, 14 Nov 2006, Andi Kleen wrote:

> > A cow-orker suggested that we use SIDT and encode the CPU number in the
> > limit of the IDT, which should be even faster than LSL.
>
> Possible yes. Did you time it?
>
> But then we would make the IDT variable-length in memory? While the CPUs
> probably won't care, some hypervisors seem to be picky about these limits.
> LSL still seems somewhat safer.

i'm one of the coworkers suleiman is referring to... below is the README
from <http://arctic.org/~dean/vtime64.tar.gz>. see the tarball if you
want to peruse the code.

the nomenclature in this benchmark doesn't line up with the patch suleiman
posted, but the concept is similar.

in this code i mock-up an implementation of a "uint64_t vtime64(void)"
vsyscall which return 64-bit ns since the epoch. i think this is a
much more useful syscall than gettimeofday() because it doesn't require
extra multiply/divide to break the data into two pieces (which most
folks then recombine back into a uint64_t). the concepts are the same
for a vgettimeofday.

note that fundamentally the same code as vtime64() can be used to
provide clock_gettime(CLOCK_MONOTONIC) (akin to gethrtime() on slowaris).
it just needs a different epoch (one which causes the values to remain
monotonic across adjtime()).

i mock up three new methods of implementing vgetcpu() suggested by Nathan
Laredo -- using sidt/sgdt/sldt, and present a comparison vs. the existing
kernel lsl code.

the s*dt instructions have varying degrees of complexity in their use
in a vgetcpu() implementation. sldt is clearly the fastest but has
conflicts with code such as wine. note that the sidt limit is essentially
"infinity" if it's >= 0xfff (64-bit) or 0x7ff (32-bit) ... because there
are only 256 software interrupts.

the s*dt instructions are faster than lsl everywhere simply because lsl is
microcoded, involves protection tests, and extra memory references.

note that i don't present the data, but sidt is faster than rdtscp on
rev F opteron especially if all you want is the cpuid.
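
for comparison, a sketch of the rdtscp variant -- assuming the kernel
stores the cpu number in IA32_TSC_AUX; this is not part of the tarball:

static inline unsigned int getcpu_rdtscp(void)
{
	unsigned int aux;

	/* rdtscp returns the tsc in edx:eax and IA32_TSC_AUX in ecx */
	asm volatile("rdtscp" : "=c" (aux) : : "eax", "edx");
	return aux;
}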

first -- an implementation where the userland code handles restarting
the vsyscall.

guide to the table:

ff = family
mm = model
ss = stepping
lm = long mode or not
(note the 32-bit code was timed on 64-bit boxes in long mode... it might
be different if the kernel itself was also in 32-bit mode)

vendor name ffmmss lm |----------- timings all in cycles ----------| note

GenuineIntel 060f05 32 sgdt 112.0 sidt 111.1 sldt 107.1 lsl 196.1 core2
GenuineIntel 060f05 64 sgdt 102.1 sidt 104.1 sldt 96.2 lsl 178.1 core2

AuthenticAMD 0f4102 32 sgdt 80.1 sidt 80.1 sldt 58.1 lsl 156.0 revF opteron
AuthenticAMD 0f4102 64 sgdt 67.1 sidt 65.1 sldt 41.0 lsl 136.0 revF opteron

AuthenticAMD 0f2102 32 sgdt 77.0 sidt 77.7 sldt 56.0 lsl 154.3 revE opteron
AuthenticAMD 0f2102 64 sgdt 65.0 sidt 63.1 sldt 40.0 lsl 137.6 revE opteron

GenuineIntel 0f0401 32 sgdt 231.7 sidt 225.9 sldt 218.0 lsl 421.5 nocona
GenuineIntel 0f0401 64 sgdt 212.3 sidt 210.0 sldt 200.7 lsl 449.9 nocona

GenuineIntel 0f0403 32 sgdt 232.1 sidt 244.1 sldt 221.4 lsl 420.1 p4 desktop
GenuineIntel 0f0403 64 sgdt 216.1 sidt 216.8 sldt 204.1 lsl 396.1 p4 desktop

GenuineIntel 0f0209 32 sgdt 240.1 sidt 232.1 sldt 224.4 lsl 384.1 xeon


next an implementation which relies on the kernel restarting the computation when
necessary. this would be achieved by testing to see when the task to be restarted
is on the vsyscall page and backtracking the task to the vsyscall entry point.

this is challenging when the vsyscall is implemented in C -- because of potential
stack usage. there are ways to get this to work though, even without resorting to
assembly. i'm presenting this only as a best case scenario should such an effort
be undertaken. (i have a crazy idea involving the direction flag which i need to
mock up.)

vendor name ffmmss lm |----------- timings all in cycles ----------| note

GenuineIntel 060f05 32 sgdt 90.0 sidt 91.0 sldt 86.0 lsl 128.0 core2
GenuineIntel 060f05 64 sgdt 76.0 sidt 78.1 sldt 76.5 lsl 113.0 core2

AuthenticAMD 0f4102 32 sgdt 43.0 sidt 43.1 sldt 28.0 lsl 82.0 revF opteron
AuthenticAMD 0f4102 64 sgdt 29.0 sidt 28.6 sldt 16.0 lsl 72.0 revF opteron

AuthenticAMD 0f2102 32 sgdt 44.0 sidt 42.8 sldt 28.0 lsl 82.6 revE opteron
AuthenticAMD 0f2102 64 sgdt 27.0 sidt 25.6 sldt 14.5 lsl 72.0 revE opteron

GenuineIntel 0f0401 32 sgdt 111.9 sidt 120.1 sldt 108.6 lsl 225.0 nocona
GenuineIntel 0f0401 64 sgdt 100.9 sidt 100.0 sldt 100.0 lsl 158.0 nocona

GenuineIntel 0f0403 32 sgdt 129.5 sidt 116.1 sldt 112.0 lsl 228.0 p4 desktop
GenuineIntel 0f0403 64 sgdt 104.9 sidt 102.2 sldt 100.0 lsl 138.0 p4 desktop

GenuineIntel 0f0209 32 sgdt 136.0 sidt 136.1 sldt 132.5 lsl 200.0 xeon

-dean

p.s. i work at google, and google paid for this experiment.

2006-11-14 03:54:12

by Suleiman Souhlal

Subject: Re: [PATCH 1/2] Make the TSC safe to be used by gettimeofday().

Andi Kleen wrote:
>>Because CPUID returns the APIC ID, and I was under the impression that the
>>cpu numbers returned by smp_processor_id() are dynamically allocated and
>>don't necessarily match the APIC ID.
>
>
> Correct, but one can use a mapping table.
>
>>Yes, it's only needed on HLT and cpufreq change.
>>The code here is to force a "resynch" with the HPET if we've done a HLT.
>> It has to be done before we switch to any userland thread that might
>>use the per-cpu vxtime. switch_to() seemed like the most natural place
>>to put this.
>
>
> I don't think so. The natural place after HLT is in the idle loop, or better
> in the idle notifiers [1]; after a cpufreq change it is in the appropriate
> cpufreq notifiers.
>
>
> [1] unfortunately they are still subtly broken in .19, but will be fixed
> in .20

I initially had it after the HLT in the idle loop, but realized there
would be a small race: we could get switched away from the idle thread
after the HLT but before doing the resynch.

It seems like idle notifiers would do the trick, except that they don't
appear to include the id of the CPU that went/exited from idle.

Do you want me to also send a patch that addresses this?

>>A cow-orker suggested that we use SIDT and encode the CPU number in the
>>limit of the IDT, which should be even faster than LSL.
>
>
> Possible, yes. Did you time it?
>
> But then we would make the IDT variable-length in memory? While the CPUs
> probably won't care, some hypervisors seem to be picky about these limits.
> LSL still seems somewhat safer.

Does the LSL from the magic GDT entry return the APIC ID or the
smp_processor_id()? If it's the latter, I can avoid using the
smp_processor_id -> APIC ID mapping table completely.

>
>>I couldn't figure out how to tell from userland whether a context switch has
>>happened. I tried adding a per-cpu context switch count, but I couldn't
>>figure out how to read it atomically along with the CPU number.
>
>
> It's tricky. That is why we asked for RDTSCP.

Another way I could fix this would be to just touch the seqlock in
switch_to() if we happen to be in vgettimeofday(). Would this be acceptable?
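
A hedged sketch of that idea (not code from the patch; it assumes the
outgoing task's user registers are reachable via task_pt_regs()):

/* In __switch_to(): if the outgoing task was executing inside the
 * vsyscall page, bump the per-cpu vxtime seqlock so that an in-flight
 * vmonotonic_clock() fails read_seqretry() and restarts; vgettimeofday()
 * would need the same treatment for the lock it reads. */
if (task_pt_regs(prev_p)->rip >= VSYSCALL_START &&
    task_pt_regs(prev_p)->rip < VSYSCALL_END) {
	write_seqlock(&vxtime.vx_seq);
	write_sequnlock(&vxtime.vx_seq);
}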

-- Suleiman

2006-11-14 04:22:26

by dean gaudet

Subject: Re: [PATCH 1/2] Make the TSC safe to be used by gettimeofday().

On Mon, 13 Nov 2006, dean gaudet wrote:

> next an implementation which relies on the kernel restarting the computation when
> necessary. this would be achieved by testing to see when the task to be restarted
> is on the vsyscall page and backtracking the task to the vsyscall entry point.
>
> this is challenging when the vsyscall is implemented in C -- because of potential
> stack usage. there are ways to get this to work though, even without resorting to
> assembly. i'm presenting this only as a best case scenario should such an effort
> be undertaken. (i have a crazy idea involving the direction flag which i need to
> mock up.)

nevermind the crazy idea using DF... i was hoping to use DF as a generic
"restart a vsyscall" indicator -- switch_to() would note the task is on
the vsyscall page and unilaterally clear DF before restoring eflags.

then a vsyscall critical section could be surrounded like so:

unsigned long tmp;
do {
asm volatile("std");

critical section

asm volatile(
"\n pushf"
"\n pop %0"
"\n cld"
: "=r" (tmp));
} while ((tmp & 0x400) == 0);

it works great on k8 ... but DF manipulation hurts way too much on core2 and p4.

i even tried reading DF using a string instruction:

long tmp;
do {
asm volatile("std");

critical section

asm volatile(
"\n mov %%rsp,%%rsi"
"\n lodsl"
"\n sub %%rsp,%%rsi"
"\n cld"
: "=S" (tmp));
} while (tmp > 0);

it's no better.

i've also tried similar tricks setting the EFLAGS.ID bit... but the popf
hurts in that case.

i think a general vsyscall restart mechanism would be useful (for more
than just the time functions), but still haven't found one which is
cheap enough.

-dean

2006-11-14 07:42:40

by Arjan van de Ven

Subject: Re: [PATCH 0/2] Make the TSC safe to be used by gettimeofday().

On Tue, 2006-11-14 at 02:50 +0100, Andi Kleen wrote:
> On Tuesday 14 November 2006 02:06, Suleiman Souhlal wrote:
> > I've had a proof-of-concept for this since August, and finally got around to
> > somewhat cleaning it up.
>
> Thanks.
>
> I got a competing implementation for this unfortunately now from Vojtech & Jiri


Where is this posted?
Since last week's discussion on tickless, several people started
implementing alternatives, since nothing existed in public ...


2006-11-14 11:13:01

by Andi Kleen

Subject: Re: [PATCH 0/2] Make the TSC safe to be used by gettimeofday() II


> I was not able to make vgettimeofday() use vgetcpu(). It seemed like vgetcpu()
> was not returning the same value as smp_processor_id() would, so the
> values I'd get from vgettimeofday() did not completely agree with the
> ones from gettimeofday(2). I didn't have a chance to investigate further.

I investigated now. You have cpu hotplug disabled, right?
The code currently only works with cpu hotplug enabled. Will fix.

-Andi

2006-11-14 11:13:04

by Andi Kleen

Subject: Re: [PATCH 1/2] Make the TSC safe to be used by gettimeofday().


> I initially had it after the HLT in the idle loop, but realized there
> would be a small race: We could get switched away from the idle thread
> after the HLT but before doing the resynch.

Idle threads are pinned to their CPUs, of course.

> It seems like idle notifiers would do the trick, except that they don't
> appear to include the id of the CPU that went/exited from idle.

Just use smp_processor_id().

> Does the LSL from the magic GDT entry return the APIC ID or the
> smp_processor_id?

smp_processor_id

> Another way I could fix this would be to just touch the seqlock in
> switch_to() if we happen to be in vgettimeofday(). Would this be acceptable?

Hmm, I'm not sure that would cover all cases with multiple cpu switches.
You would need to do it for all CPUs, which might get tricky.
I would also prefer not to do anything complicated in switch_to().

-Andi

2006-11-14 12:30:41

by Shem Multinymous

Subject: Re: [PATCH 0/2] Make the TSC safe to be used by gettimeofday().

On 11/14/06, Suleiman Souhlal <[email protected]> wrote:
> I believe that the results returned will always be monotonic, as long as
> the frequency of the TSC does not change from under us (that is, without
> the kernel knowing). This is because we "synchronize" each CPU's vxtime
> with a global time source (HPET) every time we know the TSC rate changes.

Does this hold after a suspend/resume cycle? You could be resuming
with a different CPU clock than what you suspended with, and I'm not
sure anything guarantees an early enough re-sync.

Shem

2006-11-14 17:06:20

by Suleiman Souhlal

Subject: Re: [PATCH 0/2] Make the TSC safe to be used by gettimeofday().

Shem Multinymous wrote:
> On 11/14/06, Suleiman Souhlal <[email protected]> wrote:
>
>> I believe that the results returned will always be monotonic, as long as
>> the frequency of the TSC does not change from under us (that is, without
>> the kernel knowing). This is because we "synchronize" each CPU's vxtime
>> with a global time source (HPET) every time we know the TSC rate changes.
>
>
> Does this hold after a suspend/resume cycle? You could be resuming
> with a different CPU clock than what you suspended with, and I'm not
> sure anything guarantees an early enough re-sync.

I have to admit that I didn't really design this patch with
suspend/resume in mind, and that I'm not too familiar with how they work.

Do the HPET/ACPI timers still run while the system is suspended?
If not, there shouldn't be any problem, as long as the kernel is
informed of the resume early enough.

-- Suleiman

2006-11-14 18:30:36

by Andi Kleen

Subject: Re: [PATCH 0/2] Make the TSC safe to be used by gettimeofday().


> I have to admit that I didn't really design this patch with
> suspend/resume in mind, and that I'm not too familiar with how they work.
>
> Do the HPET/ACPI timers still run while the system is suspended?

No. In suspend-to-RAM only the RAM is still powered; in suspend-to-disk
everything is off. But you can do all needed synchronization on resume.

-Andi

2006-11-14 21:28:15

by Christoph Lameter

Subject: Re: [PATCH 0/2] Make the TSC safe to be used by gettimeofday().

On Mon, 13 Nov 2006, Suleiman Souhlal wrote:

> I believe that the results returned will always be monotonic, as long as the
> frequency of the TSC does not change from under us (that is, without the
> kernel knowing). This is because we "synchronize" each CPU's vxtime with a
> global time source (HPET) every time we know the TSC rate changes.

The TSC frequencies of the separate clocks that come with multiple
processors may vary, which creates a differential even if the TSCs are
initially synchronized.