2005-03-12 01:27:53

by john stultz

Subject: [RFC][PATCH] new timeofday core subsystem (v. A3)

All,
This patch implements the architecture independent portion of the time
of day subsystem. For a brief description of the rework, see here:
http://lwn.net/Articles/120850/ (Many thanks to the LWN team for that
clear writeup!)

The exciting new changes are that ntp_scale() has been removed, speeding
up gettimeofday(), and that the periodically run timekeeping code is now
called via soft-timer rather than at interrupt time. So timekeeping can
now be done every tick or every 10 ticks or whatever!
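
To illustrate why dropping ntp_scale() helps the fast path: once the
adjustment is pre-computed, it can be folded into the cycle-to-nanosecond
multiplier, so a read costs one multiply and one shift. The following is
only a userspace sketch that mirrors the arithmetic of cyc2ns() from the
patch below; the demo_* names and the sample values are made up for
illustration and are not part of the patch.

```c
#include <stdint.h>

/* Hypothetical stand-in for the patch's struct timesource_t
 * (only the fields needed for the conversion). */
struct demo_timesource {
	uint32_t mult;   /* cycle -> ns multiplier */
	uint32_t shift;  /* cycle -> ns divisor, as a power of two */
};

/* Same arithmetic as cyc2ns() in the patch, minus the remainder path:
 *   ns = (cycles * (mult + ntp_adj)) >> shift
 * ntp_adj is a signed delta on the multiplier, computed ahead of time. */
static uint64_t demo_cyc2ns(const struct demo_timesource *ts,
			    int32_t ntp_adj, uint64_t cycles)
{
	uint64_t scaled = cycles * (uint64_t)((int64_t)ts->mult + ntp_adj);
	return scaled >> ts->shift;
}
```

With mult = 1 << 22 and shift = 22 (1 ns per cycle), an adjustment of
+4194 (roughly +1000 ppm) turns 1000000 cycles into 1000999 ns, and -4194
turns them into 999000 ns.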

Included below is timeofday.c (which includes all the time of day
management and accessor functions), ntp.c (which includes the ntp
scaling calculation code, leapsecond processing, and ntp kernel state
machine code), timesource.c (for timesource specific management
functions), interface definition .h files, the example jiffies
timesource (lowest common denominator time source, mainly for use as
example code) and minimal hooks into arch independent code.
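
For anyone writing a timesource, the mult/shift pair is the main thing to
get right. Below is a small userspace sketch of how the pair could be
derived from a counter frequency, mirroring timesource_khz2mult() from
timesource.h in the patch. The demo_* names are hypothetical and the
frequencies are just examples.

```c
#include <stdint.h>

/* Userspace mirror of timesource_khz2mult() from the patch:
 *   mult = (1000000 << shift) / khz */
static uint32_t demo_khz2mult(uint32_t khz, uint32_t shift)
{
	uint64_t tmp = ((uint64_t)1000000) << shift;
	return (uint32_t)(tmp / khz);
}

/* Convert a cycle count to nanoseconds with a given mult/shift pair. */
static uint64_t demo_cycles_to_ns(uint64_t cycles, uint32_t mult,
				  uint32_t shift)
{
	return (cycles * mult) >> shift;
}
```

A 1 MHz counter with shift = 10 gets mult = 1024000, so 5 cycles convert
to 5000 ns. The jiffies timesource in the patch uses shift = 0 and
mult = NSEC_PER_SEC/HZ, which for HZ = 1000 makes 3 jiffies equal to
3000000 ns.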

The patch does not function without minimal architecture specific hooks
(i386, x86-64, ppc32, ppc64, and ia64 examples to follow), but it should
apply to a tree without affecting the existing code.

New in this version:
o ntp_scale has been removed from the gettimeofday fastpath
o ntp_advance now pre-calculates the ntp scaling factor for the next
interval
o timeofday_periodic_hook is now called by a soft-timer and runs outside
of interrupt context
o comment cleanups
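
As a rough illustration of the pre-calculation: ntp_advance() returns a
signed ppm value for the next interval, and that value has to become a
delta on the timesource multiplier before cyc2ns() can use it. The helper
below is one plausible way to do that conversion; it is an assumption for
illustration (the name demo_ppm_to_mult_adj is hypothetical) and not
necessarily what the periodic hook in this patch does.

```c
#include <stdint.h>

/* Hypothetical helper: turn a signed ppm adjustment into a signed
 * delta on the timesource multiplier.
 *   adj = mult * ppm / 1000000   (truncates toward zero) */
static int32_t demo_ppm_to_mult_adj(uint32_t mult, long ppm)
{
	int64_t adj = (int64_t)mult * ppm;
	return (int32_t)(adj / 1000000);
}
```

With mult = 4194304 (1 << 22), +500 ppm maps to +2097 and -500 ppm maps to
-2097. Since 1 ppm is about 4 multiplier units at that scale, the
multiplier has room for adjustments finer than 1 ppm.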

Items still on the TODO list:
o cyc2ns needs better remainder code
o Infrastructure for exporting time values for vsyscall/fsyscall
o finer grained ntp adjustments in ppb instead of ppm
o more flexible timesource management interface
o Testing, performance and cleanup work
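
On the cyc2ns remainder item: the remainder is the number of cycles that
did not add up to a whole nanosecond and should be carried into the next
interval. A userspace sketch of the current approach is below, ignoring
the ntp adjustment for simplicity. The name demo_cyc2ns_rem is
hypothetical, and this only restates the existing arithmetic rather than
proposing the final fix.

```c
#include <stdint.h>

/* Convert cycles to nanoseconds and report the leftover cycles.
 * The low 'shift' bits of (cycles * mult) are the sub-nanosecond
 * fraction; dividing them by mult converts them back to cycles.
 * A 64-bit one is used for the mask so large shifts do not overflow. */
static uint64_t demo_cyc2ns_rem(uint32_t mult, uint32_t shift,
				uint64_t cycles, uint64_t *rem_cycles)
{
	uint64_t scaled = cycles * mult;
	uint64_t frac = scaled & (((uint64_t)1 << shift) - 1);

	*rem_cycles = frac / mult;
	return scaled >> shift;
}
```

For a counter at 4 cycles per nanosecond (mult = 1, shift = 2), 7 cycles
give 1 ns with 3 cycles left over.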

I look forward to your comments and feedback.

thanks
-john

linux-2.6.11_timeofday-core_A3.patch
========================================
diff -Nru a/drivers/Makefile b/drivers/Makefile
--- a/drivers/Makefile 2005-03-11 17:00:02 -08:00
+++ b/drivers/Makefile 2005-03-11 17:00:02 -08:00
@@ -64,3 +64,4 @@
obj-$(CONFIG_BLK_DEV_SGIIOC4) += sn/
obj-y += firmware/
obj-$(CONFIG_CRYPTO) += crypto/
+obj-$(CONFIG_NEWTOD) += timesource/
diff -Nru a/drivers/timesource/Makefile b/drivers/timesource/Makefile
--- /dev/null Wed Dec 31 16:00:00 196900
+++ b/drivers/timesource/Makefile 2005-03-11 17:00:02 -08:00
@@ -0,0 +1 @@
+obj-y += jiffies.o
diff -Nru a/drivers/timesource/jiffies.c b/drivers/timesource/jiffies.c
--- /dev/null Wed Dec 31 16:00:00 196900
+++ b/drivers/timesource/jiffies.c 2005-03-11 17:00:02 -08:00
@@ -0,0 +1,45 @@
+/*
+ * linux/drivers/timesource/jiffies.c
+ *
+ * Copyright (C) 2004 IBM
+ *
+ * This file contains the jiffies based time source.
+ *
+ */
+#include <linux/timesource.h>
+#include <linux/jiffies.h>
+#include <linux/init.h>
+
+/* The Jiffies based timesource is the lowest common
+ * denominator time source which should function on
+ * all systems. It has the same coarse resolution as
+ * the timer interrupt frequency HZ and it suffers
+ * inaccuracies caused by missed or lost timer
+ * interrupts and the inability for the timer
+ * interrupt hardware to accurately tick at the
+ * requested HZ value. It is also not recommended
+ * for "tick-less" systems.
+ */
+
+static cycle_t jiffies_read(void)
+{
+ cycle_t ret = get_jiffies_64();
+ return ret;
+}
+
+struct timesource_t timesource_jiffies = {
+ .name = "jiffies",
+ .priority = 0, /* lowest priority*/
+ .type = TIMESOURCE_FUNCTION,
+ .read_fnct = jiffies_read,
+ .mask = (cycle_t)~0,
+ .mult = NSEC_PER_SEC/HZ,
+ .shift = 0,
+};
+
+static int init_jiffies_timesource(void)
+{
+ register_timesource(&timesource_jiffies);
+ return 0;
+}
+module_init(init_jiffies_timesource);
diff -Nru a/include/linux/ntp.h b/include/linux/ntp.h
--- /dev/null Wed Dec 31 16:00:00 196900
+++ b/include/linux/ntp.h 2005-03-11 17:00:02 -08:00
@@ -0,0 +1,22 @@
+/* linux/include/linux/ntp.h
+ *
+ * Copyright (C) 2003, 2004 IBM, John Stultz ([email protected])
+ *
+ * This file contains time of day helper functions
+ */
+
+#ifndef _LINUX_NTP_H
+#define _LINUX_NTP_H
+#include <linux/types.h>
+#include <linux/time.h>
+#include <linux/timex.h>
+
+/* timeofday interfaces */
+nsec_t ntp_scale(nsec_t value);
+int ntp_advance(nsec_t value);
+int ntp_adjtimex(struct timex*);
+int ntp_leapsecond(struct timespec now);
+void ntp_clear(void);
+int get_ntp_status(void);
+
+#endif
diff -Nru a/include/linux/time.h b/include/linux/time.h
--- a/include/linux/time.h 2005-03-11 17:00:02 -08:00
+++ b/include/linux/time.h 2005-03-11 17:00:02 -08:00
@@ -27,6 +27,10 @@

#ifdef __KERNEL__

+/* timeofday base types */
+typedef u64 nsec_t;
+typedef u64 cycle_t;
+
/* Parameters used to convert the timespec values */
#ifndef USEC_PER_SEC
#define USEC_PER_SEC (1000000L)
diff -Nru a/include/linux/timeofday.h b/include/linux/timeofday.h
--- /dev/null Wed Dec 31 16:00:00 196900
+++ b/include/linux/timeofday.h 2005-03-11 17:00:02 -08:00
@@ -0,0 +1,65 @@
+/* linux/include/linux/timeofday.h
+ *
+ * Copyright (C) 2003, 2004 IBM, John Stultz ([email protected])
+ *
+ * This file contains the interface to the time of day subsystem
+ */
+#ifndef _LINUX_TIMEOFDAY_H
+#define _LINUX_TIMEOFDAY_H
+#include <linux/types.h>
+#include <linux/time.h>
+#include <linux/timex.h>
+#include <asm/div64.h>
+
+#ifdef CONFIG_NEWTOD
+nsec_t get_lowres_timestamp(void);
+nsec_t get_lowres_timeofday(void);
+nsec_t do_monotonic_clock(void);
+
+void do_gettimeofday(struct timeval *tv);
+int do_settimeofday(struct timespec *tv);
+int do_adjtimex(struct timex *tx);
+
+void timeofday_suspend_hook(void);
+void timeofday_resume_hook(void);
+
+void timeofday_init(void);
+
+
+/* Helper functions */
+static inline struct timeval ns2timeval(nsec_t ns)
+{
+ struct timeval tv;
+ tv.tv_sec = div_long_long_rem(ns, NSEC_PER_SEC, &tv.tv_usec);
+ tv.tv_usec /= NSEC_PER_USEC;
+ return tv;
+}
+
+static inline struct timespec ns2timespec(nsec_t ns)
+{
+ struct timespec ts;
+ ts.tv_sec = div_long_long_rem(ns, NSEC_PER_SEC, &ts.tv_nsec);
+ return ts;
+}
+
+static inline nsec_t timespec2ns(struct timespec* ts)
+{
+ nsec_t ret;
+ ret = ((nsec_t)ts->tv_sec) * NSEC_PER_SEC;
+ ret += ts->tv_nsec;
+ return ret;
+}
+
+static inline nsec_t timeval2ns(struct timeval* tv)
+{
+ nsec_t ret;
+ ret = ((nsec_t)tv->tv_sec) * NSEC_PER_SEC;
+ ret += tv->tv_usec*NSEC_PER_USEC;
+ return ret;
+}
+#else /* CONFIG_NEWTOD */
+#define timeofday_suspend_hook()
+#define timeofday_resume_hook()
+#define timeofday_init()
+#endif /* CONFIG_NEWTOD */
+#endif /* _LINUX_TIMEOFDAY_H */
diff -Nru a/include/linux/timesource.h b/include/linux/timesource.h
--- /dev/null Wed Dec 31 16:00:00 196900
+++ b/include/linux/timesource.h 2005-03-11 17:00:02 -08:00
@@ -0,0 +1,150 @@
+/* linux/include/linux/timesource.h
+ *
+ * Copyright (C) 2003, 2004 IBM, John Stultz ([email protected])
+ *
+ * This file contains the structure definitions for timesources.
+ *
+ * If you are not a timesource, or the time of day code, you should
+ * not be including this file!
+ */
+#ifndef _LINUX_TIMESOURCE_H
+#define _LINUX_TIMESOURCE_H
+
+#include <linux/types.h>
+#include <linux/time.h>
+#include <linux/timex.h>
+#include <asm/io.h>
+#include <asm/div64.h>
+
+/* struct timesource_t:
+ * Provides mostly state-free accessors to the underlying
+ * hardware.
+ * name: ptr to timesource name
+ * priority: priority value for selection (higher is better)
+ * type: defines timesource type
+ * read_fnct: returns a cycle value
+ * mmio_ptr: ptr to MMIO'ed counter
+ * mask: bitmask for two's complement
+ * subtraction of non 64 bit counters
+ * mult: cycle to nanosecond multiplier
+ * shift: cycle to nanosecond divisor (power of two)
+ * update_callback: called when safe to alter timesource values
+ */
+struct timesource_t {
+ char* name;
+ int priority;
+ enum {
+ TIMESOURCE_FUNCTION,
+ TIMESOURCE_CYCLES,
+ TIMESOURCE_MMIO_32,
+ TIMESOURCE_MMIO_64
+ } type;
+ cycle_t (*read_fnct)(void);
+ void __iomem* mmio_ptr;
+ cycle_t mask;
+ u32 mult;
+ u32 shift;
+ void (*update_callback)(void);
+};
+
+
+/* Helper function that converts a khz counter
+ * frequency to a timesource multiplier, given the
+ * timesource shift value
+ */
+static inline u32 timesource_khz2mult(u32 khz, u32 shift_constant)
+{
+ /* khz = cyc/(Million ns)
+ * mult/2^shift = ns/cyc
+ * mult = ns/cyc * 2^shift
+ * mult = 1Million/khz * 2^shift
+ * mult = 1000000 * 2^shift / khz
+ * mult = (1000000<<shift) / khz
+ */
+ u64 tmp = ((u64)1000000) << shift_constant;
+ do_div(tmp, khz);
+ return (u32)tmp;
+}
+
+/* Helper function that converts a hz counter
+ * frequency to a timesource multiplier, given the
+ * timesource shift value
+ */
+static inline u32 timesource_hz2mult(u32 hz, u32 shift_constant)
+{
+ /* hz = cyc/(Billion ns)
+ * mult/2^shift = ns/cyc
+ * mult = ns/cyc * 2^shift
+ * mult = 1Billion/hz * 2^shift
+ * mult = 1000000000 * 2^shift / hz
+ * mult = (1000000000<<shift) / hz
+ */
+ u64 tmp = ((u64)1000000000) << shift_constant;
+ do_div(tmp, hz);
+ return (u32)tmp;
+}
+
+
+/* XXX - this should go somewhere better! */
+#ifndef readq
+static inline unsigned long long readq(void __iomem *addr)
+{
+ u32 low, high;
+ /* loop is required to make sure we get an atomic read */
+ do {
+ high = readl(addr+4);
+ low = readl(addr);
+ } while (high != readl(addr+4));
+
+ return low | (((unsigned long long)high) << 32LL);
+}
+#endif
+
+
+/* read_timesource():
+ * Uses the timesource to return the current cycle_t value
+ */
+static inline cycle_t read_timesource(struct timesource_t* ts)
+{
+ switch (ts->type) {
+ case TIMESOURCE_MMIO_32:
+ return (cycle_t)readl(ts->mmio_ptr);
+ case TIMESOURCE_MMIO_64:
+ return (cycle_t)readq(ts->mmio_ptr);
+ case TIMESOURCE_CYCLES:
+ return (cycle_t)get_cycles();
+ default:/* case: TIMESOURCE_FUNCTION */
+ return ts->read_fnct();
+ }
+}
+
+/* cyc2ns():
+ * Uses the timesource and ntp adjustment interval to
+ * convert cycle_ts to nanoseconds.
+ * If rem is not null, it stores the remainder of the
+ * calculation there.
+ *
+ */
+static inline nsec_t cyc2ns(struct timesource_t* ts, int ntp_adj, cycle_t cycles, cycle_t* rem)
+{
+ u64 ret;
+ ret = (u64)cycles;
+ ret *= (ts->mult + ntp_adj);
+ if (unlikely(rem)) {
+ /* XXX clean this up later!
+ * but for now relax, we only calc
+ * remainders at interrupt time
+ */
+ u64 remainder = ret & (((u64)1 << ts->shift) - 1);
+ do_div(remainder, ts->mult);
+ *rem = remainder;
+ }
+ ret >>= ts->shift;
+ return (nsec_t)ret;
+}
+
+/* used to install a new time source */
+void register_timesource(struct timesource_t*);
+struct timesource_t* get_next_timesource(void);
+
+#endif
diff -Nru a/init/main.c b/init/main.c
--- a/init/main.c 2005-03-11 17:00:02 -08:00
+++ b/init/main.c 2005-03-11 17:00:02 -08:00
@@ -47,6 +47,7 @@
#include <linux/rmap.h>
#include <linux/mempolicy.h>
#include <linux/key.h>
+#include <linux/timeofday.h>

#include <asm/io.h>
#include <asm/bugs.h>
@@ -467,6 +468,7 @@
pidhash_init();
init_timers();
softirq_init();
+ timeofday_init();
time_init();

/*
diff -Nru a/kernel/Makefile b/kernel/Makefile
--- a/kernel/Makefile 2005-03-11 17:00:02 -08:00
+++ b/kernel/Makefile 2005-03-11 17:00:02 -08:00
@@ -9,6 +9,7 @@
rcupdate.o intermodule.o extable.o params.o posix-timers.o \
kthread.o wait.o kfifo.o sys_ni.o posix-cpu-timers.o

+obj-$(CONFIG_NEWTOD) += timeofday.o timesource.o ntp.o
obj-$(CONFIG_FUTEX) += futex.o
obj-$(CONFIG_GENERIC_ISA_DMA) += dma.o
obj-$(CONFIG_SMP) += cpu.o spinlock.o
diff -Nru a/kernel/ntp.c b/kernel/ntp.c
--- /dev/null Wed Dec 31 16:00:00 196900
+++ b/kernel/ntp.c 2005-03-11 17:00:02 -08:00
@@ -0,0 +1,471 @@
+/********************************************************************
+* linux/kernel/ntp.c
+*
+* NTP state machine and time scaling code.
+*
+* Copyright (C) 2004 IBM, John Stultz ([email protected])
+*
+* Portions rewritten from kernel/time.c and kernel/timer.c
+* Please see those files for original copyrights.
+*
+* Hopefully you should never have to understand or touch
+* any of the code below, but don't let that keep you from trying!
+*
+* This code is loosely based on David Mills' RFC 1589 and its
+* updates. Please see the following for more details:
+* http://www.eecis.udel.edu/~mills/database/rfc/rfc1589.txt
+* http://www.eecis.udel.edu/~mills/database/reports/kern/kernb.pdf
+*
+* NOTE: To simplify the code, we do not implement any of
+* the PPS code, as the code that uses it never was merged.
+* [email protected]
+*
+* Revision History:
+* 2004-09-02: A0
+* o First pass sent to lkml for review.
+* 2004-12-07: A1
+* o No changes, sent to lkml for review.
+*
+* TODO List:
+* o More documentation
+* o More testing
+* o More optimization
+*********************************************************************/
+
+#include <linux/ntp.h>
+#include <linux/errno.h>
+#include <linux/sched.h> /* Needed for capable() */
+
+/* NTP scaling code
+ * Functions:
+ * ----------
+ * nsec_t ntp_scale(nsec_t value):
+ * Scales the nsec_t value using ntp kernel state
+ * int ntp_advance(nsec_t interval):
+ * Increments the NTP state machine by interval time
+ * static int ntp_hardupdate(long offset, struct timeval tv)
+ * ntp_adjtimex helper function
+ * int ntp_adjtimex(struct timex* tx):
+ * Interface to adjust NTP state machine
+ * int ntp_leapsecond(struct timespec now)
+ * Does NTP leapsecond processing. Returns number of
+ * seconds current time should be adjusted by.
+ * void ntp_clear(void):
+ * Clears the ntp kernel state
+ * int get_ntp_status(void):
+ * returns ntp_status value
+ *
+ * Variables:
+ * ----------
+ * ntp kernel state variables:
+ * See below for full list.
+ * ntp_lock:
+ * Protects ntp kernel state variables
+ */
+
+
+
+/* Chapter 5: Kernel Variables [RFC 1589 pg. 28] */
+/* 5.1 Interface Variables */
+static int ntp_status = STA_UNSYNC; /* status */
+static long ntp_offset; /* usec */
+static long ntp_constant = 2; /* ntp magic? */
+static long ntp_maxerror = NTP_PHASE_LIMIT; /* usec */
+static long ntp_esterror = NTP_PHASE_LIMIT; /* usec */
+static const long ntp_tolerance = MAXFREQ; /* shifted ppm */
+static const long ntp_precision = 1; /* constant */
+
+/* 5.2 Phase-Lock Loop Variables */
+static long ntp_freq; /* shifted ppm */
+static long ntp_reftime; /* sec */
+
+/* Extra values */
+static int ntp_state = TIME_OK; /* leapsecond state */
+static long ntp_tick = USEC_PER_SEC/USER_HZ; /* tick length */
+
+static s64 ss_offset_len; /* SINGLESHOT offset adj interval (nsec)*/
+static long singleshot_adj; /* +/- MAX_SINGLESHOT_ADJ (ppm)*/
+static long tick_adj; /* tx->tick adjustment (ppm) */
+static long offset_adj; /* offset adjustment (ppm) */
+
+
+/* lock for the above variables */
+static seqlock_t ntp_lock = SEQLOCK_UNLOCKED;
+
+#define MILLION 1000000
+#define MAX_SINGLESHOT_ADJ 500 /* (ppm) */
+#define SEC_PER_DAY 86400
+
+/* Required to safely shift negative values */
+#define shiftR(x,s) (((x) < 0) ? (-((-(x)) >> (s))) : ((x) >> (s)))
+
+/* int ntp_advance(nsec_t interval):
+ * Periodic hook which increments NTP state machine by interval.
+ * Returns the signed PPM adjustment to be used for the next interval.
+ * This is ntp_hardclock in the RFC.
+ */
+int ntp_advance(nsec_t interval)
+{
+ static u64 interval_sum=0;
+ static long ss_adj=0;
+ unsigned long flags;
+ long ppm_sum;
+
+ /* inc interval sum */
+ interval_sum += interval;
+
+ write_seqlock_irqsave(&ntp_lock, flags);
+
+ /* decrement singleshot offset interval */
+ ss_offset_len -= interval;
+ if(ss_offset_len < 0) /* make sure it doesn't go negative */
+ ss_offset_len=0;
+
+ /* Do second overflow code */
+ while (interval_sum > NSEC_PER_SEC) {
+ /* XXX - I'd prefer to smoothly apply this math
+ * at each call to ntp_advance() rather than each
+ * second.
+ */
+ long tmp;
+
+ /* Bump maxerror by ntp_tolerance */
+ ntp_maxerror += shiftR(ntp_tolerance, SHIFT_USEC);
+ if (ntp_maxerror > NTP_PHASE_LIMIT) {
+ ntp_maxerror = NTP_PHASE_LIMIT;
+ ntp_status |= STA_UNSYNC;
+ }
+
+ /* Calculate offset_adj for the next second */
+ tmp = ntp_offset;
+ if (!(ntp_status & STA_FLL))
+ tmp = shiftR(tmp, SHIFT_KG + ntp_constant);
+
+ /* bound the adjustment to MAXPHASE/MINSEC */
+ if (tmp > (MAXPHASE / MINSEC) << SHIFT_UPDATE)
+ tmp = (MAXPHASE / MINSEC) << SHIFT_UPDATE;
+ if (tmp < -(MAXPHASE / MINSEC) << SHIFT_UPDATE)
+ tmp = -(MAXPHASE / MINSEC) << SHIFT_UPDATE;
+
+ offset_adj = shiftR(tmp, SHIFT_UPDATE); /* (usec/sec) = ppm */
+ ntp_offset -= tmp;
+
+ interval_sum -= NSEC_PER_SEC;
+
+ /* calculate a singleshot approximation ppm for the next second */
+ ss_adj = singleshot_adj;
+ singleshot_adj = 0;
+ }
+
+ /* calculate total ppm adjustment for the next interval */
+ ppm_sum = tick_adj;
+ ppm_sum += offset_adj;
+ ppm_sum += shiftR(ntp_freq,SHIFT_USEC);
+ ppm_sum += ss_adj;
+
+{ /*XXX - yank me! just for debug */
+ static int dbg=0;
+ if(!(dbg++%300000))
+ printk("tick_adj(%ld) + offset_adj(%ld) + ntp_freq(%ld) + ss_adj(%ld) = ppm_sum(%ld)\n", tick_adj, offset_adj, shiftR(ntp_freq,SHIFT_USEC), ss_adj, ppm_sum);
+}
+
+ write_sequnlock_irqrestore(&ntp_lock, flags);
+
+ return ppm_sum;
+}
+
+/* XXX - This function needs more explanation */
+/* called only by ntp_adjtimex while holding ntp_lock */
+static int ntp_hardupdate(long offset, struct timeval tv)
+{
+ int ret;
+ long tmp, interval;
+
+ ret = 0;
+ if (!(ntp_status & STA_PLL))
+ return ret;
+
+ tmp = offset;
+ /* Make sure offset is bounded by MAXPHASE */
+ if (tmp > MAXPHASE)
+ tmp = MAXPHASE;
+ if (tmp < -MAXPHASE)
+ tmp = -MAXPHASE;
+
+ ntp_offset = tmp << SHIFT_UPDATE;
+
+ if ((ntp_status & STA_FREQHOLD) || (ntp_reftime == 0))
+ ntp_reftime = tv.tv_sec;
+
+ /* calculate seconds since last call to hardupdate */
+ interval = tv.tv_sec - ntp_reftime;
+ ntp_reftime = tv.tv_sec;
+
+ if ((ntp_status & STA_FLL) && (interval >= MINSEC)) {
+ long damping;
+ tmp = (offset / interval); /* ppm (usec/sec)*/
+
+ /* convert to shifted ppm, then apply damping factor */
+
+ /* calculate damping factor - XXX bigger comment!*/
+ damping = SHIFT_KH - SHIFT_USEC;
+
+ /* apply damping factor */
+ ntp_freq += shiftR(tmp,damping);
+
+ printk("ntp->freq change: %ld\n",shiftR(tmp,damping));
+
+ } else if ((ntp_status & STA_PLL) && (interval < MAXSEC)) {
+ long damping;
+ tmp = offset * interval; /* ppm XXX - not quite*/
+
+ /* calculate damping factor - XXX bigger comment!*/
+ damping = (2 * ntp_constant) + SHIFT_KF - SHIFT_USEC;
+
+ /* apply damping factor */
+ ntp_freq += shiftR(tmp,damping);
+
+ printk("ntp->freq change: %ld\n", shiftR(tmp,damping));
+
+ } else { /* interval out of bounds */
+ printk("ntp_hardupdate(): interval out of bounds: %ld\n", interval);
+ ret = -1; /* TIME_ERROR */
+ }
+
+ /* bound ntp_freq */
+ if (ntp_freq > ntp_tolerance)
+ ntp_freq = ntp_tolerance;
+ if (ntp_freq < -ntp_tolerance)
+ ntp_freq = -ntp_tolerance;
+
+ return ret;
+}
+
+/* int ntp_adjtimex(struct timex* tx)
+ * Interface to change NTP state machine
+ */
+int ntp_adjtimex(struct timex* tx)
+{
+ long save_offset;
+ int result;
+ unsigned long flags;
+
+/*=[Sanity checking]===============================*/
+ /* Check capabilities if we're trying to modify something */
+ if (tx->modes && !capable(CAP_SYS_TIME))
+ return -EPERM;
+
+ /* frequency adjustment limited to +/- MAXFREQ */
+ if ((tx->modes & ADJ_FREQUENCY)
+ && (abs(tx->freq) > MAXFREQ))
+ return -EINVAL;
+
+ /* maxerror adjustment limited to NTP_PHASE_LIMIT */
+ if ((tx->modes & ADJ_MAXERROR)
+ && (tx->maxerror < 0
+ || tx->maxerror >= NTP_PHASE_LIMIT))
+ return -EINVAL;
+
+ /* esterror adjustment limited to NTP_PHASE_LIMIT */
+ if ((tx->modes & ADJ_ESTERROR)
+ && (tx->esterror < 0
+ || tx->esterror >= NTP_PHASE_LIMIT))
+ return -EINVAL;
+
+ /* constant adjustment must be non-negative */
+ if ((tx->modes & ADJ_TIMECONST)
+ && (tx->constant < 0))
+ return -EINVAL;
+
+ /* Single shot mode can only be used by itself */
+ if (((tx->modes & ADJ_OFFSET_SINGLESHOT) == ADJ_OFFSET_SINGLESHOT)
+ && (tx->modes != ADJ_OFFSET_SINGLESHOT))
+ return -EINVAL;
+
+ /* offset adjustment limited to +/- MAXPHASE */
+ if ((tx->modes != ADJ_OFFSET_SINGLESHOT)
+ && (tx->modes & ADJ_OFFSET)
+ && (abs(tx->offset)>= MAXPHASE))
+ return -EINVAL;
+
+ /* tick adjustment limited to 10% */
+ if ((tx->modes & ADJ_TICK)
+ && ((tx->tick < 900000/USER_HZ)
+ ||(tx->tick > 1100000/USER_HZ)))
+ return -EINVAL;
+
+ /* dbg output XXX - yank me! */
+ if(tx->modes) {
+ printk("adjtimex: tx->offset: %ld tx->freq: %ld\n",
+ tx->offset, tx->freq);
+ }
+
+/*=[Kernel input bits]==========================*/
+ write_seqlock_irqsave(&ntp_lock, flags);
+
+ result = ntp_state;
+
+ /* For ADJ_OFFSET_SINGLESHOT we must return the old offset */
+ save_offset = shiftR(ntp_offset, SHIFT_UPDATE);
+
+ /* Process input parameters */
+ if (tx->modes & ADJ_STATUS) {
+ ntp_status &= STA_RONLY;
+ ntp_status |= tx->status & ~STA_RONLY;
+ }
+
+ if (tx->modes & ADJ_FREQUENCY)
+ ntp_freq = tx->freq;
+
+ if (tx->modes & ADJ_MAXERROR)
+ ntp_maxerror = tx->maxerror;
+
+ if (tx->modes & ADJ_ESTERROR)
+ ntp_esterror = tx->esterror;
+
+ if (tx->modes & ADJ_TIMECONST)
+ ntp_constant = tx->constant;
+
+ if (tx->modes & ADJ_OFFSET) {
+ /* check if we're doing a singleshot adjustment */
+ if (tx->modes == ADJ_OFFSET_SINGLESHOT)
+ singleshot_adj = tx->offset;
+ /* otherwise, call hardupdate() */
+ else if (ntp_hardupdate(tx->offset, tx->time))
+ result = TIME_ERROR;
+ }
+
+ if (tx->modes & ADJ_TICK) {
+ /* first calculate usec/user_tick offset */
+ tick_adj = (USEC_PER_SEC/USER_HZ) - tx->tick;
+ /* multiply by user_hz to get usec/sec => ppm */
+ tick_adj *= USER_HZ;
+ /* save tx->tick for future calls to adjtimex */
+ ntp_tick = tx->tick;
+ }
+
+ if ((ntp_status & (STA_UNSYNC|STA_CLOCKERR)) != 0 )
+ result = TIME_ERROR;
+
+/*=[Kernel output bits]================================*/
+ /* write kernel state to user timex values*/
+ if ((tx->modes & ADJ_OFFSET_SINGLESHOT) == ADJ_OFFSET_SINGLESHOT)
+ tx->offset = save_offset;
+ else
+ tx->offset = shiftR(ntp_offset, SHIFT_UPDATE);
+
+ tx->freq = ntp_freq;
+ tx->maxerror = ntp_maxerror;
+ tx->esterror = ntp_esterror;
+ tx->status = ntp_status;
+ tx->constant = ntp_constant;
+ tx->precision = ntp_precision;
+ tx->tolerance = ntp_tolerance;
+
+ /* PPS is not implemented, so these are zero */
+ tx->ppsfreq = /*XXX - Not Implemented!*/ 0;
+ tx->jitter = /*XXX - Not Implemented!*/ 0;
+ tx->shift = /*XXX - Not Implemented!*/ 0;
+ tx->stabil = /*XXX - Not Implemented!*/ 0;
+ tx->jitcnt = /*XXX - Not Implemented!*/ 0;
+ tx->calcnt = /*XXX - Not Implemented!*/ 0;
+ tx->errcnt = /*XXX - Not Implemented!*/ 0;
+ tx->stbcnt = /*XXX - Not Implemented!*/ 0;
+
+ write_sequnlock_irqrestore(&ntp_lock, flags);
+
+ return result;
+}
+
+
+/* int ntp_leapsecond(struct timespec now):
+ * NTP leapsecond processing code. Returns the number of
+ * seconds (-1, 0, or 1) that should be added to the current
+ * time to properly adjust for leapseconds.
+ */
+int ntp_leapsecond(struct timespec now)
+{
+ /*
+ * Leap second processing. If in leap-insert state at
+ * the end of the day, the system clock is set back one
+ * second; if in leap-delete state, the system clock is
+ * set ahead one second.
+ */
+ static time_t leaptime = 0;
+
+ switch (ntp_state) {
+ case TIME_OK:
+ if (ntp_status & STA_INS) {
+ ntp_state = TIME_INS;
+ /* calculate end of today (23:59:59)*/
+ leaptime = now.tv_sec + SEC_PER_DAY - (now.tv_sec % SEC_PER_DAY) - 1;
+ }
+ else if (ntp_status & STA_DEL) {
+ ntp_state = TIME_DEL;
+ /* calculate end of today (23:59:59)*/
+ leaptime = now.tv_sec + SEC_PER_DAY - (now.tv_sec % SEC_PER_DAY) - 1;
+ }
+ break;
+
+ case TIME_INS:
+ /* Once we are past leaptime, insert the second */
+ if (now.tv_sec > leaptime) {
+ ntp_state = TIME_OOP;
+ printk(KERN_NOTICE "Clock: inserting leap second 23:59:60 UTC\n");
+
+ return -1;
+ }
+ break;
+
+ case TIME_DEL:
+ /* Once we are at (or past) leaptime, delete the second */
+ if (now.tv_sec >= leaptime) {
+ ntp_state = TIME_WAIT;
+ printk(KERN_NOTICE "Clock: deleting leap second 23:59:59 UTC\n");
+
+ return 1;
+ }
+ break;
+
+ case TIME_OOP:
+ /* Wait for the end of the leap second*/
+ if (now.tv_sec > (leaptime + 1))
+ ntp_state = TIME_WAIT;
+ break;
+
+ case TIME_WAIT:
+ if (!(ntp_status & (STA_INS | STA_DEL)))
+ ntp_state = TIME_OK;
+ }
+
+ return 0;
+}
+
+/* void ntp_clear(void):
+ * Clears the NTP state machine.
+ */
+void ntp_clear(void)
+{
+ unsigned long flags;
+ write_seqlock_irqsave(&ntp_lock, flags);
+
+ /* clear everything */
+ ntp_status |= STA_UNSYNC;
+ ntp_maxerror = NTP_PHASE_LIMIT;
+ ntp_esterror = NTP_PHASE_LIMIT;
+ ss_offset_len=0;
+ singleshot_adj=0;
+ tick_adj=0;
+ offset_adj =0;
+
+ write_sequnlock_irqrestore(&ntp_lock, flags);
+}
+
+/* int get_ntp_status(void):
+ * Returns the NTP status.
+ */
+int get_ntp_status(void)
+{
+ return ntp_status;
+}
+
diff -Nru a/kernel/time.c b/kernel/time.c
--- a/kernel/time.c 2005-03-11 17:00:02 -08:00
+++ b/kernel/time.c 2005-03-11 17:00:02 -08:00
@@ -37,6 +37,7 @@

#include <asm/uaccess.h>
#include <asm/unistd.h>
+#include <linux/timeofday.h>

/*
* The timezone where the local system is located. Used as a default by some
@@ -218,6 +219,7 @@
/* adjtimex mainly allows reading (and writing, if superuser) of
* kernel time-keeping variables. used by xntpd.
*/
+#ifndef CONFIG_NEWTOD
int do_adjtimex(struct timex *txc)
{
long ltemp, mtemp, save_adjust;
@@ -400,6 +402,7 @@
do_gettimeofday(&txc->time);
return(result);
}
+#endif

asmlinkage long sys_adjtimex(struct timex __user *txc_p)
{
diff -Nru a/kernel/timeofday.c b/kernel/timeofday.c
--- /dev/null Wed Dec 31 16:00:00 196900
+++ b/kernel/timeofday.c 2005-03-11 17:00:02 -08:00
@@ -0,0 +1,464 @@
+/*********************************************************************
+* linux/kernel/timeofday.c
+*
+* Copyright (C) 2003, 2004 IBM, John Stultz ([email protected])
+*
+* This file contains the functions which access and manage
+* the system's time of day functionality.
+*
+* Revision History:
+* 2004-09-02: A0
+* o First pass sent to lkml for review.
+* 2004-12-07: A1
+* o Rework of timesource structure
+* o Sent to lkml for review
+* 2005-01-24: A2
+* o write_seqlock_irq -> write_seqlock_irqsave
+* o arch generic interface for get_cmos_time() equivalents
+* o suspend/resume hooks for sleep/hibernate (lightly tested)
+* o timesource adjust_callback hook
+* o Sent to lkml for review
+* 2005-03-11: A3
+* o periodic_hook (formerly interrupt_hook) now called by softtimer
+* o yanked ntp_scale(), ntp adjustments are done in cyc2ns now
+* TODO List:
+* o cyc2ns remainder code needs reworking
+* o vsyscall/fsyscall infrastructure
+**********************************************************************/
+
+#include <linux/timeofday.h>
+#include <linux/timesource.h>
+#include <linux/ntp.h>
+#include <linux/timex.h>
+#include <linux/timer.h>
+#include <linux/module.h>
+
+/*XXX - remove later */
+#define TIME_DBG 1
+#define TIME_DBG_FREQ 60000
+
+/*[Nanosecond based variables]----------------
+ * system_time:
+ * Monotonically increasing counter of the number of nanoseconds
+ * since boot.
+ * wall_time_offset:
+ * Offset added to system_time to provide accurate time-of-day
+ */
+static nsec_t system_time;
+static nsec_t wall_time_offset;
+
+
+/*[Cycle based variables]----------------
+ * offset_base:
+ * Value of the timesource at the last timeofday_periodic_hook()
+ * (adjusted only slightly to account for rounded off cycles)
+ */
+static cycle_t offset_base;
+
+/*[Time source data]-------------------
+ * timesource:
+ * current timesource pointer
+ */
+static struct timesource_t *timesource;
+
+/*[Locks]----------------------------
+ * system_time_lock:
+ * generic lock for all locally scoped time values
+ */
+static seqlock_t system_time_lock = SEQLOCK_UNLOCKED;
+
+
+/*[Suspend/Resume info]-------------------
+ * time_suspend_state:
+ * variable that keeps track of suspend state
+ * suspend_start:
+ * start of the suspend call
+ */
+static enum { TIME_RUNNING, TIME_SUSPENDED } time_suspend_state = TIME_RUNNING;
+static nsec_t suspend_start;
+
+
+/* [XXX - Hacks]--------------------
+ * Makes stuff compile
+ */
+extern nsec_t read_persistent_clock(void);
+extern void sync_persistent_clock(struct timespec ts);
+
+int ntp_adj;
+
+/* get_lowres_timestamp():
+ * Returns a low res timestamp.
+ * (ie: the value of system_time as calculated at
+ * the last invocation of timeofday_periodic_hook() )
+ */
+nsec_t get_lowres_timestamp(void)
+{
+ nsec_t ret;
+ unsigned long seq;
+ do {
+ seq = read_seqbegin(&system_time_lock);
+
+ /* quickly grab system_time*/
+ ret = system_time;
+
+ } while (read_seqretry(&system_time_lock, seq));
+
+ return ret;
+}
+
+
+/* get_lowres_timeofday():
+ * Returns a low res time of day, as calculated at the
+ * last invocation of timeofday_periodic_hook()
+ */
+nsec_t get_lowres_timeofday(void)
+{
+ nsec_t ret;
+ unsigned long seq;
+ do {
+ seq = read_seqbegin(&system_time_lock);
+
+ /* quickly calculate low-res time of day */
+ ret = system_time + wall_time_offset;
+
+ } while (read_seqretry(&system_time_lock, seq));
+
+ return ret;
+}
+
+
+/* __monotonic_clock():
+ * private function, must hold system_time_lock lock when being
+ * called. Returns the monotonically increasing number of
+ * nanoseconds since the system booted (adjusted by NTP scaling)
+ */
+static inline nsec_t __monotonic_clock(void)
+{
+ nsec_t ret, ns_offset;
+ cycle_t now, delta;
+
+ /* read timesource */
+ now = read_timesource(timesource);
+
+ /* calculate the delta since the last timeofday_periodic_hook */
+ delta = (now - offset_base) & timesource->mask;
+
+ /* convert to nanoseconds */
+ ns_offset = cyc2ns(timesource, ntp_adj, delta, NULL);
+
+ /* add result to system time */
+ ret = system_time + ns_offset;
+
+ return ret;
+}
+
+
+/* do_monotonic_clock():
+ * Returns the monotonically increasing number of nanoseconds
+ * since the system booted via __monotonic_clock()
+ */
+nsec_t do_monotonic_clock(void)
+{
+ nsec_t ret;
+ unsigned long seq;
+
+ /* atomically read __monotonic_clock() */
+ do {
+ seq = read_seqbegin(&system_time_lock);
+
+ ret = __monotonic_clock();
+
+ } while (read_seqretry(&system_time_lock, seq));
+
+ return ret;
+}
+
+
+/* do_gettimeofday():
+ * Returns the time of day
+ */
+void do_gettimeofday(struct timeval *tv)
+{
+ nsec_t wall, sys;
+ unsigned long seq;
+
+ /* atomically read wall and sys time */
+ do {
+ seq = read_seqbegin(&system_time_lock);
+
+ wall = wall_time_offset;
+ sys = __monotonic_clock();
+
+ } while (read_seqretry(&system_time_lock, seq));
+
+ /* add them and convert to timeval */
+ *tv = ns2timeval(wall+sys);
+}
+EXPORT_SYMBOL(do_gettimeofday);
+
+
+/* do_settimeofday():
+ * Sets the time of day
+ */
+int do_settimeofday(struct timespec *tv)
+{
+ unsigned long flags;
+ /* convert timespec to ns */
+ nsec_t newtime = timespec2ns(tv);
+
+ /* atomically adjust wall_time_offset to the desired value */
+ write_seqlock_irqsave(&system_time_lock, flags);
+
+ wall_time_offset = newtime - __monotonic_clock();
+
+ /* clear NTP settings */
+ ntp_clear();
+
+ write_sequnlock_irqrestore(&system_time_lock, flags);
+
+ return 0;
+}
+EXPORT_SYMBOL(do_settimeofday);
+
+
+/* do_adjtimex:
+ * Userspace NTP daemon's interface to the kernel NTP variables
+ */
+int do_adjtimex(struct timex *tx)
+{
+ do_gettimeofday(&tx->time); /* set timex->time*/
+ /* Note: We set tx->time first, */
+ /* because ntp_adjtimex uses it */
+ return ntp_adjtimex(tx); /* call out to NTP code */
+}
+
+
+/* timeofday_suspend_hook():
+ * This function allows the timeofday subsystem to
+ * be shut down for a period of time. Useful when
+ * going into suspend/hibernate mode. The code is
+ * very similar to the first half of
+ * timeofday_periodic_hook().
+ */
+void timeofday_suspend_hook(void)
+{
+ unsigned long flags;
+
+ write_seqlock_irqsave(&system_time_lock, flags);
+ if (time_suspend_state != TIME_RUNNING) {
+ printk(KERN_INFO "timeofday_suspend_hook: ACK! called while we're suspended!\n");
+ goto out;
+ }
+
+ /* First off, save suspend start time
+ * then quickly call __monotonic_clock.
+ * These two calls hopefully occur quickly
+ * because the difference between reads will
+ * accumulate as time drift on resume.
+ */
+ suspend_start = read_persistent_clock();
+ system_time = __monotonic_clock();
+
+ /* switch states */
+ time_suspend_state = TIME_SUSPENDED;
+
+out:
+ write_sequnlock_irqrestore(&system_time_lock, flags);
+}
+
+
+/* timeofday_resume_hook():
+ * This function resumes the timeofday subsystem
+ * from a previous call to timeofday_suspend_hook.
+ */
+void timeofday_resume_hook(void)
+{
+ nsec_t now, suspend_time;
+ unsigned long s_flags, x_flags;
+
+ write_seqlock_irqsave(&system_time_lock, s_flags);
+ if (time_suspend_state != TIME_SUSPENDED) {
+ printk(KERN_INFO "timeofday_resume_hook: ACK! called while we're not suspended!\n");
+ goto out;
+ }
+
+ /* Read persistent clock to mark the end of
+ * the suspend interval then rebase the
+ * offset_base to current timesource value.
+ * Again, time between these two calls will
+ * not be accounted for and will show up as
+ * time drift.
+ */
+ now = read_persistent_clock();
+ offset_base = read_timesource(timesource);
+
+ /* calculate how long we were out for */
+ suspend_time = now - suspend_start;
+
+ /* update system_time */
+ system_time += suspend_time;
+
+ /* clear NTP state machine */
+ ntp_clear();
+
+ /* Set us back to running */
+ time_suspend_state = TIME_RUNNING;
+
+
+ /* finally, update legacy time values */
+ write_seqlock_irqsave(&xtime_lock, x_flags);
+ xtime = ns2timespec(system_time + wall_time_offset);
+ wall_to_monotonic = ns2timespec(wall_time_offset);
+ wall_to_monotonic.tv_sec = -wall_to_monotonic.tv_sec;
+ wall_to_monotonic.tv_nsec = -wall_to_monotonic.tv_nsec;
+ /* XXX - should jiffies be updated here? */
+ write_sequnlock_irqrestore(&xtime_lock, x_flags);
+
+out:
+ write_sequnlock_irqrestore(&system_time_lock, s_flags);
+}
+
+struct timer_list timeofday_timer;
+
+/* timeofday_periodic_hook:
+ * Calculates the delta since the last call,
+ * updates system time and clears the offset.
+ * Called via timeofday_timer.
+ */
+static void timeofday_periodic_hook(unsigned long unused)
+{
+ cycle_t now, delta, remainder;
+ nsec_t ns, ns_ntp;
+ long leapsecond;
+ struct timesource_t* next;
+ unsigned long s_flags, x_flags;
+ u64 tmp;
+
+ write_seqlock_irqsave(&system_time_lock, s_flags);
+
+ /* read time source */
+ now = read_timesource(timesource);
+
+ /* calculate cycle delta */
+ delta = (now - offset_base) & timesource->mask;
+
+ /* convert cycles to ntp adjusted ns and save remainder */
+ ns_ntp = cyc2ns(timesource, ntp_adj, delta, &remainder);
+
+ /* convert cycles to raw ns for ntp advance */
+ ns = cyc2ns(timesource, 0, delta, NULL);
+
+
+
+#if TIME_DBG /* XXX - remove later*/
+{
+ static int dbg=0;
+ if(!(dbg++%TIME_DBG_FREQ)){
+ printk(KERN_INFO "now: %lluc - then: %lluc = delta: %lluc -> %llu ns + %llu cyc (ntp_adj: %i)\n",
+ (unsigned long long)now, (unsigned long long)offset_base,
+ (unsigned long long)delta, (unsigned long long)ns,
+ (unsigned long long)remainder, ntp_adj);
+ }
+}
+#endif
+ /* update system_time */
+ system_time += ns_ntp;
+
+ /* reset the offset_base */
+ offset_base = now;
+
+ /* subtract remainder to account for rounded off cycles */
+ offset_base = (offset_base - remainder) & timesource->mask;
+
+ /* advance the ntp state machine by ns*/
+ ntp_adj = ntp_advance(ns);
+
+ /* do ntp leap second processing*/
+ leapsecond = ntp_leapsecond(ns2timespec(system_time+wall_time_offset));
+ wall_time_offset += leapsecond * NSEC_PER_SEC;
+
+ /* sync the persistent clock */
+ if (!(get_ntp_status() & STA_UNSYNC))
+ sync_persistent_clock(ns2timespec(system_time + wall_time_offset));
+
+ /* if necessary, switch timesources */
+ next = get_next_timesource();
+ if (next != timesource) {
+ /* immediately set new offset_base */
+ offset_base = read_timesource(next);
+ /* swap timesources */
+ timesource = next;
+ printk(KERN_INFO "Time: %s timesource has been installed.\n",
+ timesource->name);
+ ntp_clear();
+ ntp_adj = 0;
+ }
+
+ /* now is a safe time, so allow timesource to adjust
+ * itself (for example: to make cpufreq changes).
+ */
+ if(timesource->update_callback)
+ timesource->update_callback();
+
+
+ /* convert the signed ppm to timesource multiplier adjustment */
+ tmp = abs(ntp_adj);
+ tmp = tmp * timesource->mult;
+ do_div(tmp, 1000000);
+ if (ntp_adj < 0)
+ ntp_adj = -(int)tmp;
+ else
+ ntp_adj = (int)tmp;
+
+ /* update legacy time values */
+ write_seqlock_irqsave(&xtime_lock, x_flags);
+ xtime = ns2timespec(system_time + wall_time_offset);
+ wall_to_monotonic = ns2timespec(wall_time_offset);
+ wall_to_monotonic.tv_sec = -wall_to_monotonic.tv_sec;
+ wall_to_monotonic.tv_nsec = -wall_to_monotonic.tv_nsec;
+ write_sequnlock_irqrestore(&xtime_lock, x_flags);
+
+ write_sequnlock_irqrestore(&system_time_lock, s_flags);
+
+ /* Set us up to go off on the next tick */
+ timeofday_timer.expires = jiffies + 1;
+ add_timer(&timeofday_timer);
+}
+
+
+/* timeofday_init():
+ * Initializes time variables
+ */
+void timeofday_init(void)
+{
+ unsigned long flags;
+#if TIME_DBG
+ printk(KERN_INFO "timeofday_init: Starting up!\n");
+#endif
+ write_seqlock_irqsave(&system_time_lock, flags);
+
+ /* initialize the timesource variable */
+ timesource = get_next_timesource();
+
+ /* clear and initialize offsets*/
+ offset_base = read_timesource(timesource);
+ wall_time_offset = read_persistent_clock();
+
+ /* clear NTP scaling factor & state machine */
+ ntp_adj = 0;
+ ntp_clear();
+
+ write_sequnlock_irqrestore(&system_time_lock, flags);
+
+ /* Install timeofday_periodic_hook timer */
+ init_timer(&timeofday_timer);
+ timeofday_timer.function = timeofday_periodic_hook;
+ timeofday_timer.expires = jiffies + 1;
+ add_timer(&timeofday_timer);
+
+
+#if TIME_DBG
+ printk(KERN_INFO "timeofday_init: finished!\n");
+#endif
+ return;
+}
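
To make the arithmetic in timeofday_periodic_hook() concrete, here is a small user-space sketch of two of its steps: the masked cycle delta and the conversion of a signed ppm value into a multiplier adjustment. It is an illustration only and not part of the patch. The sketch_ names are made up, and the 64-bit cycle_t and 32-bit mult types are assumptions.

```c
#include <stdint.h>

/* Assumption: cycle counters are handled as 64-bit values. */
typedef uint64_t cycle_t;

/* Cycles elapsed between two reads of a counter that is narrower
 * than cycle_t. The AND with the mask makes a wrapped counter come
 * out right, as in "(now - offset_base) & timesource->mask". */
static cycle_t sketch_cycle_delta(cycle_t now, cycle_t base, cycle_t mask)
{
	return (now - base) & mask;
}

/* Turn a signed ppm value into a signed adjustment of the timesource
 * multiplier: adj = mult * |ppm| / 1000000, truncated, sign restored. */
static int sketch_ppm_to_mult_adj(int ppm, uint32_t mult)
{
	uint64_t tmp = (uint64_t)(ppm < 0 ? -(int64_t)ppm : ppm);

	tmp = tmp * mult / 1000000;
	return ppm < 0 ? -(int)tmp : (int)tmp;
}
```

With a 32-bit counter (mask 0xffffffff), a base of 0xfffffffb and a later read of 5 give a delta of 10 cycles even though the counter wrapped in between.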
diff -Nru a/kernel/timer.c b/kernel/timer.c
--- a/kernel/timer.c 2005-03-11 17:00:02 -08:00
+++ b/kernel/timer.c 2005-03-11 17:00:02 -08:00
@@ -577,6 +577,7 @@
int tickadj = 500/HZ ? : 1; /* microsecs */


+#ifndef CONFIG_NEWTOD
/*
* phase-lock loop variables
*/
@@ -807,6 +808,9 @@
}
} while (ticks);
}
+#else /* CONFIG_NEWTOD */
+#define update_wall_time(x)
+#endif /* CONFIG_NEWTOD */

/*
* Called from the timer interrupt handler to charge one tick to the current
diff -Nru a/kernel/timesource.c b/kernel/timesource.c
--- /dev/null Wed Dec 31 16:00:00 196900
+++ b/kernel/timesource.c 2005-03-11 17:00:02 -08:00
@@ -0,0 +1,71 @@
+/*********************************************************************
+* linux/kernel/timesource.c
+*
+* Copyright (C) 2004 IBM, John Stultz ([email protected])
+*
+* This file contains the functions which manage
+* timesource drivers.
+*
+* Revision History:
+* 2004-12-07: A1
+* o Rework of timesource structure
+* o Sent to lkml for review
+*
+* TODO List:
+* o Allow timesource drivers to be registered and unregistered
+* o Keep list of all currently registered timesources
+* o Use "clock=xyz" boot option for selection overrides.
+* o sysfs interface for manually choosing timesources
+* o get rid of timesource_jiffies extern
+**********************************************************************/
+
+#include <linux/timesource.h>
+
+/*[Timesource internal variables]---------
+ * curr_timesource:
+ * currently selected timesource. Initialized to timesource_jiffies.
+ * next_timesource:
+ * pending next selected timesource.
+ * timesource_lock:
+ * protects manipulations to curr_timesource and next_timesource
+ */
+/* XXX - Need to have a better way for initializing curr_timesource */
+extern struct timesource_t timesource_jiffies;
+static struct timesource_t *curr_timesource = &timesource_jiffies;
+static struct timesource_t *next_timesource;
+static seqlock_t timesource_lock = SEQLOCK_UNLOCKED;
+
+
+/* register_timesource():
+ * Used to install a new timesource
+ */
+void register_timesource(struct timesource_t* t)
+{
+ write_seqlock(&timesource_lock);
+
+ /* XXX - check override */
+
+ /* if next_timesource has been set, make sure we beat that one too */
+ if (next_timesource) {
+ if (t->priority > next_timesource->priority)
+ next_timesource = t;
+ } else if(t->priority > curr_timesource->priority)
+ next_timesource = t;
+
+ write_sequnlock(&timesource_lock);
+}
+
+/* get_next_timesource():
+ * Returns the selected timesource
+ */
+struct timesource_t* get_next_timesource(void)
+{
+ write_seqlock(&timesource_lock);
+ if (next_timesource) {
+ curr_timesource = next_timesource;
+ next_timesource = NULL;
+ }
+ write_sequnlock(&timesource_lock);
+
+ return curr_timesource;
+}
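
For anyone reading along, the selection logic in timesource.c can be sketched in user space as below. This is only an illustration under stated assumptions: struct ts_sketch is a cut-down, hypothetical stand-in for struct timesource_t, the priority values are invented, and the seqlock is left out.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct timesource_t: only the two
 * fields this sketch needs. */
struct ts_sketch {
	const char *name;
	int priority;
};

/* Assumption: the jiffies fallback has the lowest useful priority. */
static struct ts_sketch sketch_jiffies = { "jiffies", 0 };
static struct ts_sketch *curr_ts = &sketch_jiffies;
static struct ts_sketch *next_ts;

/* Mirrors register_timesource(): remember t only if it beats the
 * pending candidate, or the current timesource if none is pending. */
static void sketch_register(struct ts_sketch *t)
{
	struct ts_sketch *best = next_ts ? next_ts : curr_ts;

	if (t->priority > best->priority)
		next_ts = t;
}

/* Mirrors get_next_timesource(): commit the pending candidate, if
 * any, and return the timesource now in use. */
static struct ts_sketch *sketch_get_next(void)
{
	if (next_ts) {
		curr_ts = next_ts;
		next_ts = NULL;
	}
	return curr_ts;
}
```

Note that a timesource which loses to the pending candidate is simply dropped, not remembered, which is why keeping a list of registered timesources is on the TODO list.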



2005-03-12 01:30:58

by john stultz

Subject: [RFC][PATCH] new timeofday arch specific hooks (v. A3)

All,
This patch implements the minimal architecture specific hooks to enable
the new time of day subsystem code for i386, x86-64, ia64, ppc32 and
ppc64. It applies on top of my linux-2.6.11_timeofday-core_A3 patch, and
with this patch applied you can test the new time of day subsystem.

Basically it configs in the NEWTOD code and cuts a lot of code out of the
build via #ifdefs. I know, I know, #ifdefs are ugly and bad, and the
final patch will just remove the old code. For now this allows us to be
flexible and easily switch between the two implementations with a single
define.
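
As a rough guide for anyone attempting a port, the two hooks an arch has to supply (read_persistent_clock() and sync_persistent_clock()) can be sketched in user space as below. This is a minimal sketch and not the code in the patch: the sketch_ names and the fake RTC variable are hypothetical, nsec_t is assumed to be a 64-bit nanosecond count, and the update window follows the x86-64 variant (at most once per 660 seconds, within half a tick of the half-second mark).

```c
#include <stdint.h>

/* Assumption: nsec_t is a 64-bit nanosecond count. */
typedef int64_t nsec_t;
#define SKETCH_NSEC_PER_SEC 1000000000LL

/* Hypothetical battery-backed clock with one-second resolution. */
static long sketch_rtc_seconds;
static long sketch_next_rtc_update;

/* Counterpart of read_persistent_clock(): RTC seconds as nanoseconds. */
static nsec_t sketch_read_persistent_clock(void)
{
	return (nsec_t)sketch_rtc_seconds * SKETCH_NSEC_PER_SEC;
}

/* Counterpart of sync_persistent_clock(): write the RTC only when the
 * 660 s hold-off has passed and the time is within half a tick of the
 * half-second mark. Returns 1 if the RTC was written, 0 otherwise. */
static int sketch_sync_persistent_clock(long sec, long nsec, long tick_nsec)
{
	long off = nsec - 500000000L;

	if (off < 0)
		off = -off;
	if (sec <= sketch_next_rtc_update || off > tick_nsec / 2)
		return 0;

	sketch_rtc_seconds = sec;
	sketch_next_rtc_update = sec + 660;
	return 1;
}
```

The core calls the sync hook from timeofday_periodic_hook() only while NTP reports the clock as synchronized, so the hook itself just has to decide whether now is a good moment to write the hardware.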

New in this version:
o ppc32 arch code (by Darrick Wong. Many thanks to him for this code!)
o ia64 arch code (by Max Asbock. Many thanks to him for this code!)
o minor cleanups moving code between the arch and timesource patches

Items still on the TODO list:
o s390 arch port (hey Martin: nudge, nudge :)
o arch specific vsyscall/fsyscall interface
o other arch ports (volunteers wanted!)

I look forward to your comments and feedback.

thanks
-john

linux-2.6.11_timeofday-arch_A3.patch
=======================================
diff -Nru a/arch/i386/Kconfig b/arch/i386/Kconfig
--- a/arch/i386/Kconfig 2005-03-11 17:02:30 -08:00
+++ b/arch/i386/Kconfig 2005-03-11 17:02:30 -08:00
@@ -14,6 +14,10 @@
486, 586, Pentiums, and various instruction-set-compatible chips by
AMD, Cyrix, and others.

+config NEWTOD
+ bool
+ default y
+
config MMU
bool
default y
diff -Nru a/arch/i386/kernel/apm.c b/arch/i386/kernel/apm.c
--- a/arch/i386/kernel/apm.c 2005-03-11 17:02:30 -08:00
+++ b/arch/i386/kernel/apm.c 2005-03-11 17:02:30 -08:00
@@ -224,6 +224,7 @@
#include <linux/smp_lock.h>
#include <linux/dmi.h>
#include <linux/suspend.h>
+#include <linux/timeofday.h>

#include <asm/system.h>
#include <asm/uaccess.h>
@@ -1204,6 +1205,7 @@
device_suspend(PMSG_SUSPEND);
device_power_down(PMSG_SUSPEND);

+ timeofday_suspend_hook();
/* serialize with the timer interrupt */
write_seqlock_irq(&xtime_lock);

@@ -1231,6 +1233,7 @@
spin_unlock(&i8253_lock);
write_sequnlock_irq(&xtime_lock);

+ timeofday_resume_hook();
if (err == APM_NO_ERROR)
err = APM_SUCCESS;
if (err != APM_SUCCESS)
diff -Nru a/arch/i386/kernel/time.c b/arch/i386/kernel/time.c
--- a/arch/i386/kernel/time.c 2005-03-11 17:02:30 -08:00
+++ b/arch/i386/kernel/time.c 2005-03-11 17:02:30 -08:00
@@ -68,6 +68,8 @@

#include "io_ports.h"

+#include <linux/timeofday.h>
+
extern spinlock_t i8259A_lock;
int pit_latch_buggy; /* extern */

@@ -117,6 +119,7 @@
}
EXPORT_SYMBOL(rtc_cmos_write);

+#ifndef CONFIG_NEWTOD
/*
* This version of gettimeofday has microsecond resolution
* and better than microsecond precision on fast x86 machines with TSC.
@@ -199,6 +202,7 @@
}

EXPORT_SYMBOL(do_settimeofday);
+#endif

static int set_rtc_mmss(unsigned long nowtime)
{
@@ -224,11 +228,13 @@
* Note: This function is required to return accurate
* time even in the absence of multiple timer ticks.
*/
+#ifndef CONFIG_NEWTOD
unsigned long long monotonic_clock(void)
{
return cur_timer->monotonic_clock();
}
EXPORT_SYMBOL(monotonic_clock);
+#endif

#if defined(CONFIG_SMP) && defined(CONFIG_FRAME_POINTER)
unsigned long profile_pc(struct pt_regs *regs)
@@ -268,6 +274,7 @@

do_timer_interrupt_hook(regs);

+#ifndef CONFIG_NEWTOD
/*
* If we have an externally synchronized Linux clock, then update
* CMOS clock accordingly every ~11 minutes. Set_rtc_mmss() has to be
@@ -286,6 +293,7 @@
} else if (set_rtc_mmss(xtime.tv_sec))
last_rtc_update -= 600;
}
+#endif

if (MCA_bus) {
/* The PS/2 uses level-triggered interrupts. You can't
@@ -318,7 +326,9 @@
*/
write_seqlock(&xtime_lock);

+#ifndef CONFIG_NEWTOD
cur_timer->mark_offset();
+#endif

do_timer_interrupt(irq, NULL, regs);

@@ -343,6 +353,40 @@
return retval;
}

+/* arch specific timeofday hooks */
+nsec_t read_persistent_clock(void)
+{
+ return (nsec_t)get_cmos_time() * NSEC_PER_SEC;
+}
+
+void sync_persistent_clock(struct timespec ts)
+{
+ /*
+ * If we have an externally synchronized Linux clock, then update
+ * CMOS clock accordingly every ~11 minutes. Set_rtc_mmss() has to be
+ * called as close as possible to 500 ms before the new second starts.
+ */
+ if (ts.tv_sec > last_rtc_update + 660 &&
+ (ts.tv_nsec / 1000)
+ >= USEC_AFTER - ((unsigned) TICK_SIZE) / 2 &&
+ (ts.tv_nsec / 1000)
+ <= USEC_BEFORE + ((unsigned) TICK_SIZE) / 2) {
+ /* horrible...FIXME */
+ if (efi_enabled) {
+ if (efi_set_rtc_mmss(ts.tv_sec) == 0)
+ last_rtc_update = ts.tv_sec;
+ else
+ last_rtc_update = ts.tv_sec - 600;
+ } else if (set_rtc_mmss(ts.tv_sec) == 0)
+ last_rtc_update = ts.tv_sec;
+ else
+ last_rtc_update = ts.tv_sec - 600; /* do it again in 60 s */
+ }
+}
+
+
+
+#ifndef CONFIG_NEWTOD
static long clock_cmos_diff, sleep_start;

static int timer_suspend(struct sys_device *dev, u32 state)
@@ -376,6 +420,23 @@
wall_jiffies += sleep_length;
return 0;
}
+#else /* CONFIG_NEWTOD */
+static int timer_suspend(struct sys_device *dev, u32 state)
+{
+ timeofday_suspend_hook();
+ return 0;
+}
+
+static int timer_resume(struct sys_device *dev)
+{
+#ifdef CONFIG_HPET_TIMER
+ if (is_hpet_enabled())
+ hpet_reenable();
+#endif
+ timeofday_resume_hook();
+ return 0;
+}
+#endif

static struct sysdev_class timer_sysclass = {
.resume = timer_resume,
@@ -405,17 +466,21 @@
/* Duplicate of time_init() below, with hpet_enable part added */
void __init hpet_time_init(void)
{
+#ifndef CONFIG_NEWTOD
xtime.tv_sec = get_cmos_time();
xtime.tv_nsec = (INITIAL_JIFFIES % HZ) * (NSEC_PER_SEC / HZ);
set_normalized_timespec(&wall_to_monotonic,
-xtime.tv_sec, -xtime.tv_nsec);
+#endif

if (hpet_enable() >= 0) {
printk("Using HPET for base-timer\n");
}

+#ifndef CONFIG_NEWTOD
cur_timer = select_timer();
printk(KERN_INFO "Using %s for high-res timesource\n",cur_timer->name);
+#endif

time_init_hook();
}
@@ -433,6 +498,7 @@
return;
}
#endif
+#ifndef CONFIG_NEWTOD
xtime.tv_sec = get_cmos_time();
xtime.tv_nsec = (INITIAL_JIFFIES % HZ) * (NSEC_PER_SEC / HZ);
set_normalized_timespec(&wall_to_monotonic,
@@ -440,6 +506,7 @@

cur_timer = select_timer();
printk(KERN_INFO "Using %s for high-res timesource\n",cur_timer->name);
+#endif

time_init_hook();
}
diff -Nru a/arch/i386/lib/delay.c b/arch/i386/lib/delay.c
--- a/arch/i386/lib/delay.c 2005-03-11 17:02:30 -08:00
+++ b/arch/i386/lib/delay.c 2005-03-11 17:02:30 -08:00
@@ -23,10 +23,29 @@

extern struct timer_opts* timer;

+#ifndef CONFIG_NEWTOD
void __delay(unsigned long loops)
{
cur_timer->delay(loops);
}
+#else
+#include <linux/timeofday.h>
+/* XXX - For now just use a simple loop delay
+ * This has cpufreq issues, but so did the old method.
+ */
+void __delay(unsigned long loops)
+{
+ int d0;
+ __asm__ __volatile__(
+ "\tjmp 1f\n"
+ ".align 16\n"
+ "1:\tjmp 2f\n"
+ ".align 16\n"
+ "2:\tdecl %0\n\tjns 2b"
+ :"=&a" (d0)
+ :"0" (loops));
+}
+#endif

inline void __const_udelay(unsigned long xloops)
{
diff -Nru a/arch/ia64/Kconfig b/arch/ia64/Kconfig
--- a/arch/ia64/Kconfig 2005-03-11 17:02:30 -08:00
+++ b/arch/ia64/Kconfig 2005-03-11 17:02:30 -08:00
@@ -18,6 +18,10 @@
page at <http://www.linuxia64.org/> and a mailing list at
<[email protected]>.

+config NEWTOD
+ bool
+ default y
+
config 64BIT
bool
default y
@@ -36,7 +40,7 @@

config TIME_INTERPOLATION
bool
- default y
+ default n

config EFI
bool
diff -Nru a/arch/ia64/kernel/asm-offsets.c b/arch/ia64/kernel/asm-offsets.c
--- a/arch/ia64/kernel/asm-offsets.c 2005-03-11 17:02:30 -08:00
+++ b/arch/ia64/kernel/asm-offsets.c 2005-03-11 17:02:30 -08:00
@@ -222,6 +222,7 @@
DEFINE(IA64_MCA_CPU_INIT_STACK_OFFSET,
offsetof (struct ia64_mca_cpu, init_stack));
BLANK();
+#ifndef CONFIG_NEWTOD
/* used by fsys_gettimeofday in arch/ia64/kernel/fsys.S */
DEFINE(IA64_TIME_INTERPOLATOR_ADDRESS_OFFSET, offsetof (struct time_interpolator, addr));
DEFINE(IA64_TIME_INTERPOLATOR_SOURCE_OFFSET, offsetof (struct time_interpolator, source));
@@ -235,5 +236,6 @@
DEFINE(IA64_TIME_SOURCE_CPU, TIME_SOURCE_CPU);
DEFINE(IA64_TIME_SOURCE_MMIO64, TIME_SOURCE_MMIO64);
DEFINE(IA64_TIME_SOURCE_MMIO32, TIME_SOURCE_MMIO32);
+#endif /* CONFIG_NEWTOD */
DEFINE(IA64_TIMESPEC_TV_NSEC_OFFSET, offsetof (struct timespec, tv_nsec));
}
diff -Nru a/arch/ia64/kernel/fsys.S b/arch/ia64/kernel/fsys.S
--- a/arch/ia64/kernel/fsys.S 2005-03-11 17:02:30 -08:00
+++ b/arch/ia64/kernel/fsys.S 2005-03-11 17:02:30 -08:00
@@ -145,6 +145,7 @@
FSYS_RETURN
END(fsys_set_tid_address)

+#ifndef CONFIG_NEWTOD
/*
* Ensure that the time interpolator structure is compatible with the asm code
*/
@@ -326,6 +327,7 @@
EX(.fail_efault, st8 [r31] = r9)
EX(.fail_efault, st8 [r23] = r21)
FSYS_RETURN
+#endif /* !CONFIG_NEWTOD */
.fail_einval:
mov r8 = EINVAL
mov r10 = -1
@@ -334,6 +336,7 @@
mov r8 = EFAULT
mov r10 = -1
FSYS_RETURN
+#ifndef CONFIG_NEWTOD
END(fsys_gettimeofday)

ENTRY(fsys_clock_gettime)
@@ -347,6 +350,7 @@
shl r30 = r32,15
br.many .gettime
END(fsys_clock_gettime)
+#endif /* !CONFIG_NEWTOD */

/*
* long fsys_rt_sigprocmask (int how, sigset_t *set, sigset_t *oset, size_t sigsetsize).
@@ -687,7 +691,11 @@
data8 0 // setrlimit
data8 0 // getrlimit // 1085
data8 0 // getrusage
+#ifdef CONFIG_NEWTOD
+ data8 0 // gettimeofday
+#else
data8 fsys_gettimeofday // gettimeofday
+#endif
data8 0 // settimeofday
data8 0 // select
data8 0 // poll // 1090
@@ -854,7 +862,11 @@
data8 0 // timer_getoverrun
data8 0 // timer_delete
data8 0 // clock_settime
+#ifdef CONFIG_NEWTOD
+ data8 0 // clock_gettime
+#else
data8 fsys_clock_gettime // clock_gettime
+#endif
data8 0 // clock_getres // 1255
data8 0 // clock_nanosleep
data8 0 // fstatfs64
diff -Nru a/arch/ia64/kernel/time.c b/arch/ia64/kernel/time.c
--- a/arch/ia64/kernel/time.c 2005-03-11 17:02:30 -08:00
+++ b/arch/ia64/kernel/time.c 2005-03-11 17:02:30 -08:00
@@ -21,6 +21,7 @@
#include <linux/efi.h>
#include <linux/profile.h>
#include <linux/timex.h>
+#include <linux/timeofday.h>

#include <asm/machvec.h>
#include <asm/delay.h>
@@ -45,11 +46,13 @@

#endif

+#ifndef CONFIG_NEWTOD
static struct time_interpolator itc_interpolator = {
.shift = 16,
.mask = 0xffffffffffffffffLL,
.source = TIME_SOURCE_CPU
};
+#endif /* CONFIG_NEWTOD */

static irqreturn_t
timer_interrupt (int irq, void *dev_id, struct pt_regs *regs)
@@ -211,6 +214,7 @@
local_cpu_data->nsec_per_cyc = ((NSEC_PER_SEC<<IA64_NSEC_PER_CYC_SHIFT)
+ itc_freq/2)/itc_freq;

+#ifndef CONFIG_NEWTOD
if (!(sal_platform_features & IA64_SAL_PLATFORM_FEATURE_ITC_DRIFT)) {
itc_interpolator.frequency = local_cpu_data->itc_freq;
itc_interpolator.drift = itc_drift;
@@ -229,6 +233,7 @@
#endif
register_time_interpolator(&itc_interpolator);
}
+#endif /* CONFIG_NEWTOD */

/* Setup the CPU local timer tick */
ia64_cpu_local_tick();
@@ -253,3 +258,17 @@
*/
set_normalized_timespec(&wall_to_monotonic, -xtime.tv_sec, -xtime.tv_nsec);
}
+
+/* arch specific timeofday hooks */
+nsec_t read_persistent_clock(void)
+{
+ struct timespec ts;
+ efi_gettimeofday(&ts);
+ return (nsec_t)(ts.tv_sec * NSEC_PER_SEC + ts.tv_nsec);
+}
+
+void sync_persistent_clock(struct timespec ts)
+{
+ /* XXX - Something should go here, no? */
+}
+
diff -Nru a/arch/ia64/sn/kernel/sn2/timer.c b/arch/ia64/sn/kernel/sn2/timer.c
--- a/arch/ia64/sn/kernel/sn2/timer.c 2005-03-11 17:02:30 -08:00
+++ b/arch/ia64/sn/kernel/sn2/timer.c 2005-03-11 17:02:30 -08:00
@@ -19,6 +19,7 @@
#include <asm/sn/shub_mmr.h>
#include <asm/sn/clksupport.h>

+#ifndef CONFIG_NEWTOD
extern unsigned long sn_rtc_cycles_per_second;

static struct time_interpolator sn2_interpolator = {
@@ -34,3 +35,8 @@
sn2_interpolator.addr = RTC_COUNTER_ADDR;
register_time_interpolator(&sn2_interpolator);
}
+#else
+void __init sn_timer_init(void)
+{
+}
+#endif
diff -Nru a/arch/ppc/Kconfig b/arch/ppc/Kconfig
--- a/arch/ppc/Kconfig 2005-03-11 17:02:30 -08:00
+++ b/arch/ppc/Kconfig 2005-03-11 17:02:30 -08:00
@@ -8,6 +8,10 @@
bool
default y

+config NEWTOD
+ bool
+ default y
+
config UID16
bool

diff -Nru a/arch/ppc/kernel/time.c b/arch/ppc/kernel/time.c
--- a/arch/ppc/kernel/time.c 2005-03-11 17:02:30 -08:00
+++ b/arch/ppc/kernel/time.c 2005-03-11 17:02:30 -08:00
@@ -57,6 +57,7 @@
#include <linux/time.h>
#include <linux/init.h>
#include <linux/profile.h>
+#include <linux/timeofday.h>

#include <asm/segment.h>
#include <asm/io.h>
@@ -95,6 +96,46 @@

EXPORT_SYMBOL(rtc_lock);

+#ifdef CONFIG_NEWTOD
+nsec_t read_persistent_clock(void)
+{
+ if (ppc_md.get_rtc_time) {
+ return (nsec_t)ppc_md.get_rtc_time() * NSEC_PER_SEC;
+ } else {
+ printk(KERN_ERR "ppc_md.get_rtc_time does not exist???\n");
+ return 0;
+ }
+}
+
+void sync_persistent_clock(struct timespec ts)
+{
+ /*
+ * update the rtc when needed, this should be performed on the
+ * right fraction of a second. Half or full second ?
+ * Full second works on mk48t59 clocks, others need testing.
+ * Note that this update is basically only used through
+ * the adjtimex system calls. Setting the HW clock in
+ * any other way is a /dev/rtc and userland business.
+ * This is still wrong by -0.5/+1.5 jiffies because of the
+ * timer interrupt resolution and possible delay, but here we
+ * hit a quantization limit which can only be solved by higher
+ * resolution timers and decoupling time management from timer
+ * interrupts. This is also wrong on the clocks
+ * which require being written at the half second boundary.
+ * We should have an rtc call that only sets the minutes and
+ * seconds like on Intel to avoid problems with non UTC clocks.
+ */
+ if ( ppc_md.set_rtc_time && ts.tv_sec - last_rtc_update >= 659 &&
+ abs((ts.tv_nsec/1000) - (1000000-1000000/HZ)) < 500000/HZ) {
+ if (ppc_md.set_rtc_time(ts.tv_sec + 1 + time_offset) == 0)
+ last_rtc_update = ts.tv_sec+1;
+ else
+ /* Try again one minute later */
+ last_rtc_update += 60;
+ }
+}
+#endif /* CONFIG_NEWTOD */
+
/* Timer interrupt helper function */
static inline int tb_delta(unsigned *jiffy_stamp) {
int delta;
@@ -152,6 +193,7 @@
tb_last_stamp = jiffy_stamp;
do_timer(regs);

+#ifndef CONFIG_NEWTOD
/*
* update the rtc when needed, this should be performed on the
* right fraction of a second. Half or full second ?
@@ -178,6 +220,7 @@
/* Try again one minute later */
last_rtc_update += 60;
}
+#endif
write_sequnlock(&xtime_lock);
}
if ( !disarm_decr[smp_processor_id()] )
@@ -193,6 +236,7 @@
/*
* This version of gettimeofday has microsecond resolution.
*/
+#ifndef CONFIG_NEWTOD
void do_gettimeofday(struct timeval *tv)
{
unsigned long flags;
@@ -281,6 +325,7 @@
}

EXPORT_SYMBOL(do_settimeofday);
+#endif

/* This function is only called on the boot processor */
void __init time_init(void)
diff -Nru a/arch/ppc/platforms/chrp_time.c b/arch/ppc/platforms/chrp_time.c
--- a/arch/ppc/platforms/chrp_time.c 2005-03-11 17:02:30 -08:00
+++ b/arch/ppc/platforms/chrp_time.c 2005-03-11 17:02:30 -08:00
@@ -115,8 +115,10 @@
chrp_cmos_clock_write(save_control, RTC_CONTROL);
chrp_cmos_clock_write(save_freq_select, RTC_FREQ_SELECT);

+#ifndef CONFIG_NEWTOD
if ( (time_state == TIME_ERROR) || (time_state == TIME_BAD) )
time_state = TIME_OK;
+#endif
spin_unlock(&rtc_lock);
return 0;
}
diff -Nru a/arch/ppc64/Kconfig b/arch/ppc64/Kconfig
--- a/arch/ppc64/Kconfig 2005-03-11 17:02:30 -08:00
+++ b/arch/ppc64/Kconfig 2005-03-11 17:02:30 -08:00
@@ -10,6 +10,10 @@
bool
default y

+config NEWTOD
+ bool
+ default y
+
config UID16
bool

diff -Nru a/arch/ppc64/kernel/sys_ppc32.c b/arch/ppc64/kernel/sys_ppc32.c
--- a/arch/ppc64/kernel/sys_ppc32.c 2005-03-11 17:02:30 -08:00
+++ b/arch/ppc64/kernel/sys_ppc32.c 2005-03-11 17:02:30 -08:00
@@ -322,8 +322,10 @@

ret = do_adjtimex(&txc);

+#ifndef CONFIG_NEWTOD
/* adjust the conversion of TB to time of day to track adjtimex */
ppc_adjtimex();
+#endif

if(put_user(txc.modes, &utp->modes) ||
__put_user(txc.offset, &utp->offset) ||
diff -Nru a/arch/ppc64/kernel/time.c b/arch/ppc64/kernel/time.c
--- a/arch/ppc64/kernel/time.c 2005-03-11 17:02:30 -08:00
+++ b/arch/ppc64/kernel/time.c 2005-03-11 17:02:30 -08:00
@@ -50,6 +50,7 @@
#include <linux/profile.h>
#include <linux/cpu.h>
#include <linux/security.h>
+#include <linux/timeofday.h>

#include <asm/segment.h>
#include <asm/io.h>
@@ -107,6 +108,7 @@

static unsigned adjusting_time = 0;

+#ifndef CONFIG_NEWTOD
static __inline__ void timer_check_rtc(void)
{
/*
@@ -140,6 +142,52 @@
last_rtc_update += 60;
}
}
+#else /* CONFIG_NEWTOD */
+nsec_t read_persistent_clock(void)
+{
+ struct rtc_time tm;
+ unsigned long sec;
+#ifdef CONFIG_PPC_ISERIES
+ if (!piranha_simulator)
+#endif
+ ppc_md.get_boot_time(&tm);
+
+ sec = mktime(tm.tm_year + 1900, tm.tm_mon + 1, tm.tm_mday,
+ tm.tm_hour, tm.tm_min, tm.tm_sec);
+ return (nsec_t)sec * NSEC_PER_SEC;
+}
+void sync_persistent_clock(struct timespec ts)
+{
+ /*
+ * update the rtc when needed, this should be performed on the
+ * right fraction of a second. Half or full second ?
+ * Full second works on mk48t59 clocks, others need testing.
+ * Note that this update is basically only used through
+ * the adjtimex system calls. Setting the HW clock in
+ * any other way is a /dev/rtc and userland business.
+ * This is still wrong by -0.5/+1.5 jiffies because of the
+ * timer interrupt resolution and possible delay, but here we
+ * hit a quantization limit which can only be solved by higher
+ * resolution timers and decoupling time management from timer
+ * interrupts. This is also wrong on the clocks
+ * which require being written at the half second boundary.
+ * We should have an rtc call that only sets the minutes and
+ * seconds like on Intel to avoid problems with non UTC clocks.
+ */
+ if ( ts.tv_sec - last_rtc_update >= 659 &&
+ abs((ts.tv_nsec/1000) - (1000000-1000000/HZ)) < 500000/HZ) {
+ struct rtc_time tm;
+ to_tm(ts.tv_sec+1, &tm);
+ tm.tm_year -= 1900;
+ tm.tm_mon -= 1;
+ if (ppc_md.set_rtc_time(&tm) == 0)
+ last_rtc_update = ts.tv_sec+1;
+ else
+ /* Try again one minute later */
+ last_rtc_update += 60;
+ }
+}
+#endif /* CONFIG_NEWTOD */

/*
* This version of gettimeofday has microsecond resolution.
@@ -171,12 +219,14 @@
tv->tv_usec = usec;
}

+#ifndef CONFIG_NEWTOD
void do_gettimeofday(struct timeval *tv)
{
__do_gettimeofday(tv, get_tb());
}

EXPORT_SYMBOL(do_gettimeofday);
+#endif

/* Synchronize xtime with do_gettimeofday */

@@ -350,11 +400,15 @@
tb_last_stamp = lpaca->next_jiffy_update_tb;
timer_recalc_offset(lpaca->next_jiffy_update_tb);
do_timer(regs);
+#ifndef CONFIG_NEWTOD
timer_sync_xtime(lpaca->next_jiffy_update_tb);
timer_check_rtc();
+#endif
write_sequnlock(&xtime_lock);
+#ifndef CONFIG_NEWTOD
if ( adjusting_time && (time_adjust == 0) )
ppc_adjtimex();
+#endif
}
lpaca->next_jiffy_update_tb += tb_ticks_per_jiffy;
}
@@ -396,6 +450,7 @@
{
return mulhdu(get_tb(), tb_to_ns_scale) << tb_to_ns_shift;
}
+#ifndef CONFIG_NEWTOD

int do_settimeofday(struct timespec *tv)
{
@@ -473,6 +528,7 @@
}

EXPORT_SYMBOL(do_settimeofday);
+#endif /* !CONFIG_NEWTOD */

void __init time_init(void)
{
@@ -525,7 +581,9 @@
systemcfg->stamp_xsec = xtime.tv_sec * XSEC_PER_SEC;
systemcfg->tb_to_xs = tb_to_xs;

+#ifndef CONFIG_NEWTOD
time_freq = 0;
+#endif

xtime.tv_nsec = 0;
last_rtc_update = xtime.tv_sec;
@@ -548,6 +606,7 @@

/* #define DEBUG_PPC_ADJTIMEX 1 */

+#ifndef CONFIG_NEWTOD
void ppc_adjtimex(void)
{
unsigned long den, new_tb_ticks_per_sec, tb_ticks, old_xsec, new_tb_to_xs, new_xsec, new_stamp_xsec;
@@ -671,6 +730,7 @@
write_sequnlock_irqrestore( &xtime_lock, flags );

}
+#endif /* !CONFIG_NEWTOD */


#define TICK_SIZE tick
diff -Nru a/arch/x86_64/Kconfig b/arch/x86_64/Kconfig
--- a/arch/x86_64/Kconfig 2005-03-11 17:02:30 -08:00
+++ b/arch/x86_64/Kconfig 2005-03-11 17:02:30 -08:00
@@ -24,6 +24,10 @@
bool
default y

+config NEWTOD
+ bool
+ default y
+
config MMU
bool
default y
diff -Nru a/arch/x86_64/kernel/time.c b/arch/x86_64/kernel/time.c
--- a/arch/x86_64/kernel/time.c 2005-03-11 17:02:30 -08:00
+++ b/arch/x86_64/kernel/time.c 2005-03-11 17:02:30 -08:00
@@ -35,6 +35,7 @@
#include <asm/sections.h>
#include <linux/cpufreq.h>
#include <linux/hpet.h>
+#include <linux/timeofday.h>
#ifdef CONFIG_X86_LOCAL_APIC
#include <asm/apic.h>
#endif
@@ -106,6 +107,7 @@

unsigned int (*do_gettimeoffset)(void) = do_gettimeoffset_tsc;

+#ifndef CONFIG_NEWTOD
/*
* This version of gettimeofday() has microsecond resolution and better than
* microsecond precision, as we're using at least a 10 MHz (usually 14.31818
@@ -180,6 +182,7 @@
}

EXPORT_SYMBOL(do_settimeofday);
+#endif /* CONFIG_NEWTOD */

unsigned long profile_pc(struct pt_regs *regs)
{
@@ -281,6 +284,7 @@
}


+#ifndef CONFIG_NEWTOD
/* monotonic_clock(): returns # of nanoseconds passed since time_init()
* Note: This function is required to return accurate
* time even in the absence of multiple timer ticks.
@@ -357,6 +361,8 @@
}
#endif
}
+#endif /* CONFIG_NEWTOD */
+

static irqreturn_t timer_interrupt(int irq, void *dev_id, struct pt_regs *regs)
{
@@ -373,6 +379,7 @@

write_seqlock(&xtime_lock);

+#ifndef CONFIG_NEWTOD
if (vxtime.hpet_address) {
offset = hpet_readl(HPET_T0_CMP) - hpet_tick;
delay = hpet_readl(HPET_COUNTER) - offset;
@@ -422,6 +429,7 @@
handle_lost_ticks(lost, regs);
jiffies += lost;
}
+#endif /* CONFIG_NEWTOD */

/*
* Do the timer stuff.
@@ -445,6 +453,7 @@
smp_local_timer_interrupt(regs);
#endif

+#ifndef CONFIG_NEWTOD
/*
* If we have an externally synchronized Linux clock, then update CMOS clock
* accordingly every ~11 minutes. set_rtc_mmss() will be called in the jiffy
@@ -458,7 +467,8 @@
set_rtc_mmss(xtime.tv_sec);
rtc_update = xtime.tv_sec + 660;
}
-
+#endif /* CONFIG_NEWTOD */
+
write_sequnlock(&xtime_lock);

return IRQ_HANDLED;
@@ -560,6 +570,30 @@
return mktime(year, mon, day, hour, min, sec);
}

+/* arch specific timeofday hooks */
+nsec_t read_persistent_clock(void)
+{
+ return (nsec_t)get_cmos_time() * NSEC_PER_SEC;
+}
+
+void sync_persistent_clock(struct timespec ts)
+{
+ static unsigned long rtc_update = 0;
+ /*
+ * If we have an externally synchronized Linux clock, then update
+ * CMOS clock accordingly every ~11 minutes. set_rtc_mmss() will
+ * be called in the jiffy closest to exactly 500 ms before the
+ * next second. If the update fails, we don't care, as it'll be
+ * updated on the next turn, and the problem (time way off) isn't
+ * likely to go away much sooner anyway.
+ */
+ if (ts.tv_sec > rtc_update &&
+ abs(ts.tv_nsec - 500000000) <= tick_nsec / 2) {
+ set_rtc_mmss(ts.tv_sec);
+ rtc_update = ts.tv_sec + 660;
+ }
+}
+
#ifdef CONFIG_CPU_FREQ

/* Frequency scaling support. Adjust the TSC based timer when the cpu frequency
@@ -955,6 +989,7 @@

__setup("report_lost_ticks", time_setup);

+#ifndef CONFIG_NEWTOD
static long clock_cmos_diff;
static unsigned long sleep_start;

@@ -990,6 +1025,21 @@
wall_jiffies += sleep_length;
return 0;
}
+#else /* CONFIG_NEWTOD */
+static int timer_suspend(struct sys_device *dev, u32 state)
+{
+ timeofday_suspend_hook();
+ return 0;
+}
+
+static int timer_resume(struct sys_device *dev)
+{
+ if (vxtime.hpet_address)
+ hpet_reenable();
+ timeofday_resume_hook();
+ return 0;
+}
+#endif

static struct sysdev_class timer_sysclass = {
.resume = timer_resume,
diff -Nru a/arch/x86_64/kernel/vsyscall.c b/arch/x86_64/kernel/vsyscall.c
--- a/arch/x86_64/kernel/vsyscall.c 2005-03-11 17:02:30 -08:00
+++ b/arch/x86_64/kernel/vsyscall.c 2005-03-11 17:02:30 -08:00
@@ -171,8 +171,12 @@
BUG_ON((unsigned long) &vtime != VSYSCALL_ADDR(__NR_vtime));
BUG_ON((VSYSCALL_ADDR(0) != __fix_to_virt(VSYSCALL_FIRST_PAGE)));
map_vsyscall();
+/* XXX - disable vsyscall gettimeofday for now */
+#ifndef CONFIG_NEWTOD
sysctl_vsyscall = 1;
-
+#else
+ sysctl_vsyscall = 0;
+#endif
return 0;
}

diff -Nru a/include/asm-generic/div64.h b/include/asm-generic/div64.h
--- a/include/asm-generic/div64.h 2005-03-11 17:02:30 -08:00
+++ b/include/asm-generic/div64.h 2005-03-11 17:02:30 -08:00
@@ -55,4 +55,13 @@

#endif /* BITS_PER_LONG */

+#ifndef div_long_long_rem
+#define div_long_long_rem(dividend,divisor,remainder) \
+({ \
+ u64 result = dividend; \
+ *remainder = do_div(result,divisor); \
+ result; \
+})
+#endif
+
#endif /* _ASM_GENERIC_DIV64_H */
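
The RTC update window used by sync_persistent_clock() in the patch above can be checked in user space. What follows is a minimal sketch and not the kernel code: struct rtc_sync_state and rtc_sync_due() are hypothetical names, and unlike the patch (which tests ts.tv_sec but records xtime.tv_sec) it uses a single time value throughout.

```c
#include <stdbool.h>
#include <stdlib.h>

#define NSEC_PER_SEC          1000000000L
#define RTC_SYNC_INTERVAL_SEC 660   /* ~11 minutes, as in the patch */

/* Hypothetical stand-in for the static rtc_update variable in the patch. */
struct rtc_sync_state {
	long next_update_sec;
};

/*
 * Return true when the RTC should be written: the interval has expired
 * and we are within half a tick of the 500 ms mark of the current second.
 */
static bool rtc_sync_due(struct rtc_sync_state *st, long tv_sec,
			 long tv_nsec, long tick_nsec)
{
	if (tv_sec > st->next_update_sec &&
	    labs(tv_nsec - NSEC_PER_SEC / 2) <= tick_nsec / 2) {
		st->next_update_sec = tv_sec + RTC_SYNC_INTERVAL_SEC;
		return true;
	}
	return false;
}
```

With a 1 ms tick the window is 500 ms plus or minus 0.5 ms, so at most one tick per second can trigger the update.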


2005-03-12 01:34:24

by john stultz

Subject: [RFC][PATCH] new timeofday arch specific timesource drivers (v. A3)

All,
This patch implements most of the time sources for i386, x86-64, ppc32
and ppc64 (tsc, pit, cyclone, acpi-pm, hpet and timebase). There are
also initial untested sketches for the ia64 itc and sn2_rtc timesources.
It applies on top of my linux-2.6.11_timeofday-arch_A3 patch. It provides
real hardware timesources (as opposed to the example jiffies timesource)
that can be used for more realistic testing.

This patch is the shabbiest of the three. It needs to be broken up and
cleaned up. The i386_pit.c timesource is broken. I have tried to clean up
the hpet and cyclone code so it can be shared between x86-64, i386 and
ia64, but it still needs testing. acpi_pm also needs to be made arch
generic, but for now it will get you going so you can test and play with
the core code.

New in this release:
o ppc32_timebase code (by Darrick Wong!)
o move cyclone code to TIMESOURCE_MMIO_32
o cleaned up hpet to work on i386 as well as x86-64
o untested/uncompiled ia64 timesources (these are mine, don't blame
Max!)
o other minor code cleanups

Items still on the TODO list:
o real ia64 timesources
o make cyclone/acpi_pm arch generic
o example interpolation timesource
o fix i386_pit timesource
o all other arch timesources (volunteers wanted!)
o lots of cleanups
o lots of testing

thanks
-john

linux-2.6.11_timeofday-timesources_A3.patch
===================================================
diff -Nru a/arch/i386/kernel/Makefile b/arch/i386/kernel/Makefile
--- a/arch/i386/kernel/Makefile 2005-03-11 17:04:48 -08:00
+++ b/arch/i386/kernel/Makefile 2005-03-11 17:04:48 -08:00
@@ -7,10 +7,10 @@
obj-y := process.o semaphore.o signal.o entry.o traps.o irq.o vm86.o \
ptrace.o time.o ioport.o ldt.o setup.o i8259.o sys_i386.o \
pci-dma.o i386_ksyms.o i387.o dmi_scan.o bootflag.o \
- doublefault.o quirks.o
+ doublefault.o quirks.o tsc.o

obj-y += cpu/
-obj-y += timers/
+obj-$(!CONFIG_NEWTOD) += timers/
obj-$(CONFIG_ACPI_BOOT) += acpi/
obj-$(CONFIG_X86_BIOS_REBOOT) += reboot.o
obj-$(CONFIG_MCA) += mca.o
diff -Nru a/arch/i386/kernel/acpi/boot.c b/arch/i386/kernel/acpi/boot.c
--- a/arch/i386/kernel/acpi/boot.c 2005-03-11 17:04:48 -08:00
+++ b/arch/i386/kernel/acpi/boot.c 2005-03-11 17:04:48 -08:00
@@ -547,7 +547,7 @@


#ifdef CONFIG_HPET_TIMER
-
+#include <asm/hpet.h>
static int __init acpi_parse_hpet(unsigned long phys, unsigned long size)
{
struct acpi_table_hpet *hpet_tbl;
@@ -570,18 +570,12 @@
#ifdef CONFIG_X86_64
vxtime.hpet_address = hpet_tbl->addr.addrl |
((long) hpet_tbl->addr.addrh << 32);
-
- printk(KERN_INFO PREFIX "HPET id: %#x base: %#lx\n",
- hpet_tbl->id, vxtime.hpet_address);
+ hpet_address = vxtime.hpet_address;
#else /* X86 */
- {
- extern unsigned long hpet_address;
-
hpet_address = hpet_tbl->addr.addrl;
+#endif /* X86 */
printk(KERN_INFO PREFIX "HPET id: %#x base: %#lx\n",
hpet_tbl->id, hpet_address);
- }
-#endif /* X86 */

return 0;
}
diff -Nru a/arch/i386/kernel/i8259.c b/arch/i386/kernel/i8259.c
--- a/arch/i386/kernel/i8259.c 2005-03-11 17:04:48 -08:00
+++ b/arch/i386/kernel/i8259.c 2005-03-11 17:04:48 -08:00
@@ -387,6 +387,48 @@
}
}

+#ifdef CONFIG_NEWTOD
+void setup_pit_timer(void)
+{
+ extern spinlock_t i8253_lock;
+ unsigned long flags;
+
+ spin_lock_irqsave(&i8253_lock, flags);
+ outb_p(0x34,PIT_MODE); /* binary, mode 2, LSB/MSB, ch 0 */
+ udelay(10);
+ outb_p(LATCH & 0xff , PIT_CH0); /* LSB */
+ udelay(10);
+ outb(LATCH >> 8 , PIT_CH0); /* MSB */
+ spin_unlock_irqrestore(&i8253_lock, flags);
+}
+
+static int timer_resume(struct sys_device *dev)
+{
+ setup_pit_timer();
+ return 0;
+}
+
+static struct sysdev_class timer_sysclass = {
+ set_kset_name("timer_pit"),
+ .resume = timer_resume,
+};
+
+static struct sys_device device_timer = {
+ .id = 0,
+ .cls = &timer_sysclass,
+};
+
+static int __init init_timer_sysfs(void)
+{
+ int error = sysdev_class_register(&timer_sysclass);
+ if (!error)
+ error = sysdev_register(&device_timer);
+ return error;
+}
+
+device_initcall(init_timer_sysfs);
+#endif
+
void __init init_IRQ(void)
{
int i;
diff -Nru a/arch/i386/kernel/setup.c b/arch/i386/kernel/setup.c
--- a/arch/i386/kernel/setup.c 2005-03-11 17:04:48 -08:00
+++ b/arch/i386/kernel/setup.c 2005-03-11 17:04:48 -08:00
@@ -50,6 +50,7 @@
#include <asm/io_apic.h>
#include <asm/ist.h>
#include <asm/io.h>
+#include <asm/tsc.h>
#include "setup_arch_pre.h"
#include <bios_ebda.h>

@@ -1527,6 +1528,7 @@
conswitchp = &dummy_con;
#endif
#endif
+ tsc_init();
}

#include "setup_arch_post.h"
diff -Nru a/arch/i386/kernel/time.c b/arch/i386/kernel/time.c
--- a/arch/i386/kernel/time.c 2005-03-11 17:04:48 -08:00
+++ b/arch/i386/kernel/time.c 2005-03-11 17:04:48 -08:00
@@ -88,7 +88,9 @@
DEFINE_SPINLOCK(i8253_lock);
EXPORT_SYMBOL(i8253_lock);

+#ifndef CONFIG_NEWTOD
struct timer_opts *cur_timer = &timer_none;
+#endif

/*
* This is a special lock that is owned by the CPU and holds the index
diff -Nru a/arch/i386/kernel/timers/common.c b/arch/i386/kernel/timers/common.c
--- a/arch/i386/kernel/timers/common.c 2005-03-11 17:04:48 -08:00
+++ b/arch/i386/kernel/timers/common.c 2005-03-11 17:04:48 -08:00
@@ -22,8 +22,6 @@
* device.
*/

-#define CALIBRATE_TIME (5 * 1000020/HZ)
-
unsigned long __init calibrate_tsc(void)
{
mach_prepare_counter();
diff -Nru a/arch/i386/kernel/tsc.c b/arch/i386/kernel/tsc.c
--- /dev/null Wed Dec 31 16:00:00 196900
+++ b/arch/i386/kernel/tsc.c 2005-03-11 17:04:48 -08:00
@@ -0,0 +1,111 @@
+#include <linux/init.h>
+#include <linux/timex.h>
+#include <linux/cpufreq.h>
+#include <asm/tsc.h>
+#include "mach_timer.h"
+
+unsigned long cpu_freq_khz;
+#ifdef CONFIG_NEWTOD
+int tsc_disable;
+#endif
+
+void tsc_init(void)
+{
+ unsigned long long start, end;
+ unsigned long count;
+ u64 delta64;
+ int i;
+
+ /* repeat 3 times to make sure the cache is warm */
+ for(i=0; i < 3; i++) {
+ mach_prepare_counter();
+ rdtscll(start);
+ mach_countup(&count);
+ rdtscll(end);
+ }
+ delta64 = end - start;
+
+ /* cpu freq too fast */
+ if(delta64 > (1ULL<<32))
+ return;
+ /* cpu freq too slow */
+ if (delta64 <= CALIBRATE_TIME)
+ return;
+
+ delta64 *= 1000;
+ do_div(delta64,CALIBRATE_TIME);
+ cpu_freq_khz = (unsigned long)delta64;
+
+ cpu_khz = cpu_freq_khz;
+
+ printk("Detected %lu.%03lu MHz processor.\n",
+ cpu_khz / 1000, cpu_khz % 1000);
+
+}
+
+
+/* All of the code below comes from arch/i386/kernel/timers/timer_tsc.c
+ * XXX: severely needs better comments and the ifdefs killed.
+ */
+
+#ifdef CONFIG_CPU_FREQ
+static unsigned int cpufreq_init = 0;
+
+/* If the CPU frequency is scaled, TSC-based delays will need a different
+ * loops_per_jiffy value to function properly.
+ */
+
+static unsigned int ref_freq = 0;
+static unsigned long loops_per_jiffy_ref = 0;
+
+#ifndef CONFIG_SMP
+static unsigned long cpu_khz_ref = 0;
+#endif
+
+static int time_cpufreq_notifier(struct notifier_block *nb,
+ unsigned long val, void *data)
+{
+ struct cpufreq_freqs *freq = data;
+
+ if (val != CPUFREQ_RESUMECHANGE)
+ write_seqlock_irq(&xtime_lock);
+ if (!ref_freq) {
+ ref_freq = freq->old;
+ loops_per_jiffy_ref = cpu_data[freq->cpu].loops_per_jiffy;
+#ifndef CONFIG_SMP
+ cpu_khz_ref = cpu_khz;
+#endif
+ }
+
+ if ((val == CPUFREQ_PRECHANGE && freq->old < freq->new) ||
+ (val == CPUFREQ_POSTCHANGE && freq->old > freq->new) ||
+ (val == CPUFREQ_RESUMECHANGE)) {
+ if (!(freq->flags & CPUFREQ_CONST_LOOPS))
+ cpu_data[freq->cpu].loops_per_jiffy = cpufreq_scale(loops_per_jiffy_ref, ref_freq, freq->new);
+#ifndef CONFIG_SMP
+ if (cpu_khz)
+ cpu_khz = cpufreq_scale(cpu_khz_ref, ref_freq, freq->new);
+#endif
+ }
+
+ if (val != CPUFREQ_RESUMECHANGE)
+ write_sequnlock_irq(&xtime_lock);
+
+ return 0;
+}
+
+static struct notifier_block time_cpufreq_notifier_block = {
+ .notifier_call = time_cpufreq_notifier
+};
+
+static int __init cpufreq_tsc(void)
+{
+ int ret;
+ ret = cpufreq_register_notifier(&time_cpufreq_notifier_block,
+ CPUFREQ_TRANSITION_NOTIFIER);
+ if (!ret)
+ cpufreq_init = 1;
+ return ret;
+}
+core_initcall(cpufreq_tsc);
+#endif /* CONFIG_CPU_FREQ */
diff -Nru a/arch/x86_64/kernel/time.c b/arch/x86_64/kernel/time.c
--- a/arch/x86_64/kernel/time.c 2005-03-11 17:04:48 -08:00
+++ b/arch/x86_64/kernel/time.c 2005-03-11 17:04:48 -08:00
@@ -59,6 +59,7 @@
#undef HPET_HACK_ENABLE_DANGEROUS

unsigned int cpu_khz; /* TSC clocks / usec, not used here */
+unsigned long hpet_address;
unsigned long hpet_period; /* fsecs / HPET clock */
unsigned long hpet_tick; /* HPET clocks / interrupt */
unsigned long vxtime_hz = PIT_TICK_RATE;
diff -Nru a/drivers/timesource/Makefile b/drivers/timesource/Makefile
--- a/drivers/timesource/Makefile 2005-03-11 17:04:48 -08:00
+++ b/drivers/timesource/Makefile 2005-03-11 17:04:48 -08:00
@@ -1 +1,14 @@
obj-y += jiffies.o
+obj-$(CONFIG_X86) += tsc.o
+obj-$(CONFIG_PPC64) += ppc64_timebase.o
+obj-$(CONFIG_PPC) += ppc_timebase.o
+obj-$(CONFIG_X86_CYCLONE_TIMER) += cyclone.o
+obj-$(CONFIG_X86_PM_TIMER) += acpi_pm.o
+obj-$(CONFIG_HPET_TIMER) += hpet.o
+
+# XXX - Known broken
+#obj-$(CONFIG_X86) += i386_pit.o
+
+# XXX - Untested/Uncompiled
+#obj-$(CONFIG_IA64) += itc.o
+#obj-$(CONFIG_IA64_SGI_SN2) += sn2_rtc.o
diff -Nru a/drivers/timesource/acpi_pm.c b/drivers/timesource/acpi_pm.c
--- /dev/null Wed Dec 31 16:00:00 196900
+++ b/drivers/timesource/acpi_pm.c 2005-03-11 17:04:48 -08:00
@@ -0,0 +1,116 @@
+#include <linux/timesource.h>
+#include <linux/errno.h>
+#include <linux/init.h>
+#include <asm/io.h>
+#include "mach_timer.h"
+
+/* Number of PMTMR ticks expected during calibration run */
+#define PMTMR_TICKS_PER_SEC 3579545
+#define PMTMR_EXPECTED_RATE \
+ ((CALIBRATE_LATCH * (PMTMR_TICKS_PER_SEC >> 10)) / (CLOCK_TICK_RATE>>10))
+
+
+/* The I/O port the PMTMR resides at.
+ * The location is detected during setup_arch(),
+ * in arch/i386/acpi/boot.c */
+u32 pmtmr_ioport = 0;
+
+#define ACPI_PM_MASK 0xFFFFFF /* limit it to 24 bits */
+
+static inline u32 read_pmtmr(void)
+{
+ u32 v1=0,v2=0,v3=0;
+ /* It has been reported that on various broken
+ * chipsets (ICH4, PIIX4 and PIIX4E) the ACPI PM time
+ * source is not latched, so you must read it multiple
+ * times to ensure a safe value is read.
+ */
+ do {
+ v1 = inl(pmtmr_ioport);
+ v2 = inl(pmtmr_ioport);
+ v3 = inl(pmtmr_ioport);
+ } while ((v1 > v2 && v1 < v3) || (v2 > v3 && v2 < v1)
+ || (v3 > v1 && v3 < v2));
+
+ /* mask the output to 24 bits */
+ return v2 & ACPI_PM_MASK;
+}
+
+
+static cycle_t acpi_pm_read(void)
+{
+ return (cycle_t)read_pmtmr();
+}
+
+struct timesource_t timesource_acpi_pm = {
+ .name = "acpi_pm",
+ .priority = 200,
+ .type = TIMESOURCE_FUNCTION,
+ .read_fnct = acpi_pm_read,
+ .mask = (cycle_t)ACPI_PM_MASK,
+ .mult = 0, /* to be calculated */
+ .shift = 22,
+};
+
+/*
+ * Some boards have the PMTMR running way too fast. We check
+ * the PMTMR rate against PIT channel 2 to catch these cases.
+ */
+static int verify_pmtmr_rate(void)
+{
+ u32 value1, value2;
+ unsigned long count, delta;
+
+ mach_prepare_counter();
+ value1 = read_pmtmr();
+ mach_countup(&count);
+ value2 = read_pmtmr();
+ delta = (value2 - value1) & ACPI_PM_MASK;
+
+ /* Check that the PMTMR delta is within 5% of what we expect */
+ if (delta < (PMTMR_EXPECTED_RATE * 19) / 20 ||
+ delta > (PMTMR_EXPECTED_RATE * 21) / 20) {
+ printk(KERN_INFO "PM-Timer running at invalid rate: %lu%% of normal - aborting.\n", 100UL * delta / PMTMR_EXPECTED_RATE);
+ return -1;
+ }
+
+ return 0;
+}
+
+
+static int init_acpi_pm_timesource(void)
+{
+ u32 value1, value2;
+ unsigned int i;
+
+ if (!pmtmr_ioport)
+ return -ENODEV;
+
+ timesource_acpi_pm.mult = timesource_hz2mult(PMTMR_TICKS_PER_SEC,
+ timesource_acpi_pm.shift);
+
+ /* "verify" this timing source */
+ value1 = read_pmtmr();
+ for (i = 0; i < 10000; i++) {
+ value2 = read_pmtmr();
+ if (value2 == value1)
+ continue;
+ if (value2 > value1)
+ goto pm_good;
+ if ((value2 < value1) && ((value2) < 0xFFF))
+ goto pm_good;
+ printk(KERN_INFO "PM-Timer had inconsistent results: %#x, %#x - aborting.\n", value1, value2);
+ return -EINVAL;
+ }
+ printk(KERN_INFO "PM-Timer had no reasonable result: %#x - aborting.\n", value1);
+ return -ENODEV;
+
+pm_good:
+ if (verify_pmtmr_rate() != 0)
+ return -ENODEV;
+
+ register_timesource(&timesource_acpi_pm);
+ return 0;
+}
+
+module_init(init_acpi_pm_timesource);
diff -Nru a/drivers/timesource/cyclone.c b/drivers/timesource/cyclone.c
--- /dev/null Wed Dec 31 16:00:00 196900
+++ b/drivers/timesource/cyclone.c 2005-03-11 17:04:48 -08:00
@@ -0,0 +1,137 @@
+#include <linux/timesource.h>
+#include <linux/errno.h>
+#include <linux/string.h>
+#include <linux/timex.h>
+#include <linux/init.h>
+
+#include <asm/io.h>
+#include <asm/pgtable.h>
+#include <asm/fixmap.h>
+#include "mach_timer.h"
+
+#define CYCLONE_CBAR_ADDR 0xFEB00CD0 /* base address ptr*/
+#define CYCLONE_PMCC_OFFSET 0x51A0 /* offset to control register */
+#define CYCLONE_MPCS_OFFSET 0x51A8 /* offset to select register */
+#define CYCLONE_MPMC_OFFSET 0x51D0 /* offset to count register */
+#define CYCLONE_TIMER_FREQ 100000000
+#define CYCLONE_TIMER_MASK (((u64)1<<32)-1) /* 32 bit mask */
+
+
+int use_cyclone = 0;
+
+struct timesource_t timesource_cyclone = {
+ .name = "cyclone",
+ .priority = 100,
+ .type = TIMESOURCE_MMIO_32,
+ .mmio_ptr = NULL, /* to be set */
+ .mask = (cycle_t)CYCLONE_TIMER_MASK,
+ .mult = 10,
+ .shift = 0,
+};
+
+static unsigned long calibrate_cyclone(void)
+{
+ unsigned long start, end, delta;
+ unsigned long i, count;
+ unsigned long cyclone_freq_khz;
+
+ /* repeat 3 times to make sure the cache is warm */
+ for(i=0; i < 3; i++) {
+ mach_prepare_counter();
+ start = readl(timesource_cyclone.mmio_ptr);
+ mach_countup(&count);
+ end = readl(timesource_cyclone.mmio_ptr);
+ }
+
+ delta = end - start;
+ printk("cyclone delta: %lu\n", delta);
+ delta *= (ACTHZ/1000)>>8;
+ printk("delta*hz = %lu\n", delta);
+ cyclone_freq_khz = delta/CALIBRATE_ITERATION;
+ printk("calculated cyclone_freq: %lu khz\n", cyclone_freq_khz);
+ return cyclone_freq_khz;
+}
+
+static int init_cyclone_timesource(void)
+{
+ unsigned long base; /* saved value from CBAR */
+ unsigned long offset;
+ u32 __iomem* reg;
+ u32 __iomem* volatile cyclone_timer; /* Cyclone MPMC0 register */
+ unsigned long khz;
+ int i;
+
+ /*make sure we're on a summit box*/
+ if (!use_cyclone) return -ENODEV;
+
+ printk(KERN_INFO "Summit chipset: Starting Cyclone Counter.\n");
+
+ /* find base address */
+ offset = CYCLONE_CBAR_ADDR;
+ reg = ioremap_nocache(offset, sizeof(reg));
+ if(!reg){
+ printk(KERN_ERR "Summit chipset: Could not find valid CBAR register.\n");
+ return -ENODEV;
+ }
+ /* even on 64bit systems, this is only 32bits */
+ base = readl(reg);
+ if(!base){
+ printk(KERN_ERR "Summit chipset: Could not find valid CBAR value.\n");
+ return -ENODEV;
+ }
+ iounmap(reg);
+
+ /* setup PMCC */
+ offset = base + CYCLONE_PMCC_OFFSET;
+ reg = ioremap_nocache(offset, sizeof(reg));
+ if(!reg){
+ printk(KERN_ERR "Summit chipset: Could not find valid PMCC register.\n");
+ return -ENODEV;
+ }
+ writel(0x00000001,reg);
+ iounmap(reg);
+
+ /* setup MPCS */
+ offset = base + CYCLONE_MPCS_OFFSET;
+ reg = ioremap_nocache(offset, sizeof(reg));
+ if(!reg){
+ printk(KERN_ERR "Summit chipset: Could not find valid MPCS register.\n");
+ return -ENODEV;
+ }
+ writel(0x00000001,reg);
+ iounmap(reg);
+
+ /* map in cyclone_timer */
+ offset = base + CYCLONE_MPMC_OFFSET;
+ cyclone_timer = ioremap_nocache(offset, sizeof(u64));
+ if(!cyclone_timer){
+ printk(KERN_ERR "Summit chipset: Could not find valid MPMC register.\n");
+ return -ENODEV;
+ }
+
+ /*quick test to make sure its ticking*/
+ for(i=0; i<3; i++){
+ u32 old = readl(cyclone_timer);
+ int stall = 100;
+ while(stall--) barrier();
+ if(readl(cyclone_timer) == old){
+ printk(KERN_ERR "Summit chipset: Counter not counting! DISABLED\n");
+ iounmap(cyclone_timer);
+ cyclone_timer = NULL;
+ return -ENODEV;
+ }
+ }
+ timesource_cyclone.mmio_ptr = cyclone_timer;
+
+ /* sort out mult/shift values */
+ khz = calibrate_cyclone();
+ timesource_cyclone.shift = 22;
+ timesource_cyclone.mult = timesource_khz2mult(khz,
+ timesource_cyclone.shift);
+
+ register_timesource(&timesource_cyclone);
+
+ return 0;
+}
+
+module_init(init_cyclone_timesource);
diff -Nru a/drivers/timesource/hpet.c b/drivers/timesource/hpet.c
--- /dev/null Wed Dec 31 16:00:00 196900
+++ b/drivers/timesource/hpet.c 2005-03-11 17:04:48 -08:00
@@ -0,0 +1,58 @@
+#include <linux/timesource.h>
+#include <linux/hpet.h>
+#include <linux/errno.h>
+#include <linux/init.h>
+#include <asm/io.h>
+#include <asm/hpet.h>
+
+#define HPET_MASK (0xFFFFFFFF)
+#define HPET_SHIFT 22
+
+/* FSEC = 10^-15 NSEC = 10^-9 */
+#define FSEC_PER_NSEC 1000000
+
+struct timesource_t timesource_hpet = {
+ .name = "hpet",
+ .priority = 300,
+ .type = TIMESOURCE_MMIO_32,
+ .mmio_ptr = NULL,
+ .mask = (cycle_t)HPET_MASK,
+ .mult = 0, /* set below */
+ .shift = HPET_SHIFT,
+};
+
+static int init_hpet_timesource(void)
+{
+ unsigned long hpet_period, hpet_hz;
+ u64 tmp;
+
+ if (!hpet_address)
+ return -ENODEV;
+
+ /* calculate the hpet address */
+ timesource_hpet.mmio_ptr =
+ (void __iomem*)ioremap_nocache(hpet_address, HPET_MMAP_SIZE)
+ + HPET_COUNTER;
+
+ /* calculate the frequency */
+ hpet_period = hpet_readl(HPET_PERIOD);
+
+
+ /* hpet period is in femtoseconds per cycle
+ * so we need to convert this to ns/cyc units
+ * approximated by mult/2^shift
+ *
+ * fsec/cyc * 1nsec/1000000fsec = nsec/cyc = mult/2^shift
+ * fsec/cyc * 1ns/1000000fsec * 2^shift = mult
+ * fsec/cyc * 2^shift * 1nsec/1000000fsec = mult
+ * (fsec/cyc << shift)/1000000 = mult
+ * (hpet_period << shift)/FSEC_PER_NSEC = mult
+ */
+ tmp = (u64)hpet_period << HPET_SHIFT;
+ do_div(tmp, FSEC_PER_NSEC);
+ timesource_hpet.mult = (u32)tmp;
+
+ register_timesource(&timesource_hpet);
+ return 0;
+}
+module_init(init_hpet_timesource);
diff -Nru a/drivers/timesource/i386_pit.c b/drivers/timesource/i386_pit.c
--- /dev/null Wed Dec 31 16:00:00 196900
+++ b/drivers/timesource/i386_pit.c 2005-03-11 17:04:48 -08:00
@@ -0,0 +1,100 @@
+/* pit timesource: XXX - broken!
+ */
+
+#include <linux/timesource.h>
+#include <linux/timex.h>
+#include <linux/init.h>
+
+#include <asm/io.h>
+#include <asm/timer.h>
+#include "io_ports.h"
+#include "do_timer.h"
+
+extern u64 jiffies_64;
+extern long jiffies;
+extern spinlock_t i8253_lock;
+
+/* Since the PIT overflows every tick, it's not very useful
+ * to read by itself. So throw jiffies into the mix
+ * and just return nanoseconds in pit_read().
+ */
+
+static cycle_t pit_read(void)
+{
+ unsigned long flags;
+ int count;
+ unsigned long jiffies_t;
+ static int count_p;
+ static unsigned long jiffies_p = 0;
+
+ spin_lock_irqsave(&i8253_lock, flags);
+
+ outb_p(0x00, PIT_MODE); /* latch the count ASAP */
+
+ count = inb_p(PIT_CH0); /* read the latched count */
+ jiffies_t = jiffies;
+ count |= inb_p(PIT_CH0) << 8;
+
+ /* VIA686a test code... reset the latch if count > max + 1 */
+ if (count > LATCH) {
+ outb_p(0x34, PIT_MODE);
+ outb_p(LATCH & 0xff, PIT_CH0);
+ outb(LATCH >> 8, PIT_CH0);
+ count = LATCH - 1;
+ }
+
+ /*
+ * avoiding timer inconsistencies (they are rare, but they happen)...
+ * there are two kinds of problems that must be avoided here:
+ * 1. the timer counter underflows
+ * 2. hardware problem with the timer, not giving us continuous time,
+ * the counter does small "jumps" upwards on some Pentium systems,
+ * (see c't 95/10 page 335 for Neptun bug.)
+ */
+
+ if( jiffies_t == jiffies_p ) {
+ if( count > count_p ) {
+ /* the nutcase */
+ count = do_timer_overflow(count);
+ }
+ } else
+ jiffies_p = jiffies_t;
+
+ count_p = count;
+
+ spin_unlock_irqrestore(&i8253_lock, flags);
+
+ count = ((LATCH-1) - count) * TICK_SIZE;
+ count = (count + LATCH/2) / LATCH;
+
+ count *= 1000; /* convert count from usec->nsec */
+
+ return (cycle_t)((jiffies_64 * TICK_NSEC) + count);
+}
+
+static cycle_t pit_delta(cycle_t now, cycle_t then)
+{
+ return now - then;
+}
+
+/* just return cyc, as it's already in ns */
+static nsec_t pit_cyc2ns(cycle_t cyc, cycle_t* remainder)
+{
+ return (nsec_t)cyc;
+}
+
+static struct timesource_t timesource_pit = {
+ .name = "pit",
+ .priority = 0,
+ .read = pit_read,
+ .delta = pit_delta,
+ .cyc2ns = pit_cyc2ns,
+};
+
+static int init_pit_timesource(void)
+{
+ register_timesource(&timesource_pit);
+ return 0;
+}
+
+module_init(init_pit_timesource);
diff -Nru a/drivers/timesource/itc.c b/drivers/timesource/itc.c
--- /dev/null Wed Dec 31 16:00:00 196900
+++ b/drivers/timesource/itc.c 2005-03-11 17:04:48 -08:00
@@ -0,0 +1,37 @@
+/* XXX - this is totally untested and uncompiled
+ * TODO:
+ * o cpufreq issues
+ * o unsynched ITCs ?
+ */
+#include <linux/timesource.h>
+
+/* XXX - Other includes needed for:
+ * sal_platform_features, IA64_SAL_PLATFORM_FEATURE_ITC_DRIFT,
+ * local_cpu_data->itc_freq
+ * See arch/ia64/kernel/time.c for ideas
+ */
+
+#define ITC_MASK (0xffffffffffffffffLL)
+
+static struct timesource_t timesource_itc = {
+ .name = "itc",
+ .priority = 25,
+ .type = TIMESOURCE_CYCLES,
+ .mask = (cycle_t)ITC_MASK,
+ .mult = 0, /* to be set */
+ .shift = 22,
+};
+
+static int init_itc_timesource(void)
+{
+ if (!(sal_platform_features & IA64_SAL_PLATFORM_FEATURE_ITC_DRIFT)) {
+ /* XXX - I'm not really sure if itc_freq is in cyc/sec */
+ timesource_itc.mult = timesource_hz2mult(local_cpu_data->itc_freq,
+ timesource_itc.shift);
+ register_timesource(&timesource_itc);
+ }
+ return 0;
+}
+
+module_init(init_itc_timesource);
+
diff -Nru a/drivers/timesource/ppc64_timebase.c b/drivers/timesource/ppc64_timebase.c
--- /dev/null Wed Dec 31 16:00:00 196900
+++ b/drivers/timesource/ppc64_timebase.c 2005-03-11 17:04:48 -08:00
@@ -0,0 +1,33 @@
+#include <linux/timesource.h>
+#include <asm/time.h>
+
+static cycle_t timebase_read(void)
+{
+ return (cycle_t)get_tb();
+}
+
+struct timesource_t timesource_timebase = {
+ .name = "timebase",
+ .priority = 200,
+ .type = TIMESOURCE_FUNCTION,
+ .read_fnct = timebase_read,
+ .mask = (cycle_t)-1UL,
+ .mult = 0,
+ .shift = 22,
+};
+
+
+/* XXX - this should be calculated or properly externed! */
+extern unsigned long tb_to_ns_scale;
+extern unsigned long tb_to_ns_shift;
+extern unsigned long tb_ticks_per_sec;
+
+static int init_timebase_timesource(void)
+{
+ timesource_timebase.mult = timesource_hz2mult(tb_ticks_per_sec,
+ timesource_timebase.shift);
+ register_timesource(&timesource_timebase);
+ return 0;
+}
+
+module_init(init_timebase_timesource);
diff -Nru a/drivers/timesource/ppc_timebase.c b/drivers/timesource/ppc_timebase.c
--- /dev/null Wed Dec 31 16:00:00 196900
+++ b/drivers/timesource/ppc_timebase.c 2005-03-11 17:04:48 -08:00
@@ -0,0 +1,55 @@
+#include <linux/timesource.h>
+#include <linux/init.h>
+#include <asm/time.h>
+
+/* XXX - this should be calculated or properly externed! */
+
+/* DJWONG: tb_to_ns_scale is supposed to be set in time_init.
+ * No idea if that actually _happens_ on a ppc601, though it
+ * seems to work on a B&W G3. :D */
+extern unsigned long tb_to_ns_scale;
+
+static cycle_t ppc_timebase_read(void)
+{
+ unsigned long lo, hi, hi2;
+ unsigned long long tb;
+
+ do {
+ hi = get_tbu();
+ lo = get_tbl();
+ hi2 = get_tbu();
+ } while (hi2 != hi);
+ tb = ((unsigned long long) hi << 32) | lo;
+
+ return (cycle_t)tb;
+}
+
+struct timesource_t timesource_ppc_timebase = {
+ .name = "ppc_timebase",
+ .priority = 200,
+ .type = TIMESOURCE_FUNCTION,
+ .read_fnct = ppc_timebase_read,
+ .mask = (cycle_t)-1UL,
+ .mult = 0,
+ .shift = 22,
+};
+
+static int init_ppc_timebase_timesource(void)
+{
+ /* DJWONG: Extrapolated from ppc64 code. */
+ unsigned long tb_ticks_per_sec;
+ unsigned long long x;
+
+ tb_ticks_per_sec = tb_ticks_per_jiffy * HZ;
+
+ timesource_ppc_timebase.mult = timesource_hz2mult(tb_ticks_per_sec,
+ timesource_ppc_timebase.shift);
+
+ printk(KERN_INFO "ppc_timebase: tb_ticks_per_sec = %lu, mult = %lu, tb_to_ns = %lu.\n",
+ tb_ticks_per_sec, timesource_ppc_timebase.mult , tb_to_ns_scale);
+
+ register_timesource(&timesource_ppc_timebase);
+ return 0;
+}
+
+module_init(init_ppc_timebase_timesource);
diff -Nru a/drivers/timesource/sn2_rtc.c b/drivers/timesource/sn2_rtc.c
--- /dev/null Wed Dec 31 16:00:00 196900
+++ b/drivers/timesource/sn2_rtc.c 2005-03-11 17:04:48 -08:00
@@ -0,0 +1,29 @@
+#include <linux/timesource.h>
+/* XXX this will need some includes
+ * to find: sn_rtc_cycles_per_second and RTC_COUNTER_ADDR
+ * See arch/ia64/sn/kernel/sn2/timer.c for likely suspects
+ */
+
+#define SN2_RTC_MASK ((1LL << 55) - 1)
+#define SN2_SHIFT 10
+
+struct timesource_t timesource_sn2_rtc = {
+ .name = "sn2_rtc",
+ .priority = 300, /* XXX - not sure what this should be */
+ .type = TIMESOURCE_MMIO_64,
+ .mmio_ptr = NULL,
+ .mask = (cycle_t)SN2_RTC_MASK,
+ .mult = 0, /* set below */
+ .shift = SN2_SHIFT,
+};
+
+static int init_sn2_timesource(void)
+{
+ timesource_sn2_rtc.mult = timesource_hz2mult(sn_rtc_cycles_per_second,
+ SN2_SHIFT);
+ timesource_sn2_rtc.mmio_ptr = RTC_COUNTER_ADDR;
+
+ register_timesource(&timesource_sn2_rtc);
+ return 0;
+}
+module_init(init_sn2_timesource);
diff -Nru a/drivers/timesource/tsc.c b/drivers/timesource/tsc.c
--- /dev/null Wed Dec 31 16:00:00 196900
+++ b/drivers/timesource/tsc.c 2005-03-11 17:04:48 -08:00
@@ -0,0 +1,46 @@
+/* TODO:
+ * o better calibration
+ */
+
+#include <linux/timesource.h>
+#include <linux/timex.h>
+#include <linux/init.h>
+
+static void tsc_update_callback(void);
+
+static struct timesource_t timesource_tsc = {
+ .name = "tsc",
+ .priority = 25,
+ .type = TIMESOURCE_CYCLES,
+ .mask = (cycle_t)~0,
+ .mult = 0, /* to be set */
+ .shift = 22,
+ .update_callback = tsc_update_callback,
+};
+
+static unsigned long current_cpu_khz = 0;
+
+static void tsc_update_callback(void)
+{
+ /* only update if cpu_khz has changed */
+ if (current_cpu_khz != cpu_khz){
+ current_cpu_khz = cpu_khz;
+ timesource_tsc.mult = timesource_khz2mult(current_cpu_khz,
+ timesource_tsc.shift);
+ }
+}
+
+static int init_tsc_timesource(void)
+{
+ /* TSC initialization is done in arch/i386/kernel/tsc.c */
+ if (cpu_has_tsc && cpu_khz) {
+ current_cpu_khz = cpu_khz;
+ timesource_tsc.mult = timesource_khz2mult(current_cpu_khz,
+ timesource_tsc.shift);
+ register_timesource(&timesource_tsc);
+ }
+ return 0;
+}
+
+module_init(init_tsc_timesource);
+
diff -Nru a/include/asm-i386/mach-default/mach_timer.h b/include/asm-i386/mach-default/mach_timer.h
--- a/include/asm-i386/mach-default/mach_timer.h 2005-03-11 17:04:48 -08:00
+++ b/include/asm-i386/mach-default/mach_timer.h 2005-03-11 17:04:48 -08:00
@@ -14,8 +14,12 @@
*/
#ifndef _MACH_TIMER_H
#define _MACH_TIMER_H
+#include <linux/jiffies.h>
+#include <asm/io.h>

-#define CALIBRATE_LATCH (5 * LATCH)
+#define CALIBRATE_ITERATION 50
+#define CALIBRATE_LATCH (CALIBRATE_ITERATION * LATCH)
+#define CALIBRATE_TIME (CALIBRATE_ITERATION * 1000020/HZ)

static inline void mach_prepare_counter(void)
{
diff -Nru a/include/asm-i386/timer.h b/include/asm-i386/timer.h
--- a/include/asm-i386/timer.h 2005-03-11 17:04:48 -08:00
+++ b/include/asm-i386/timer.h 2005-03-11 17:04:48 -08:00
@@ -2,6 +2,13 @@
#define _ASMi386_TIMER_H
#include <linux/init.h>

+#define TICK_SIZE (tick_nsec / 1000)
+void setup_pit_timer(void);
+/* Modifiers for buggy PIT handling */
+extern int pit_latch_buggy;
+extern int timer_ack;
+
+#ifndef CONFIG_NEWTOD
/**
* struct timer_ops - used to define a timer source
*
@@ -29,18 +36,10 @@
struct timer_opts *opts;
};

-#define TICK_SIZE (tick_nsec / 1000)
-
extern struct timer_opts* __init select_timer(void);
extern void clock_fallback(void);
-void setup_pit_timer(void);
-
-/* Modifiers for buggy PIT handling */
-
-extern int pit_latch_buggy;

extern struct timer_opts *cur_timer;
-extern int timer_ack;

/* list of externed timers */
extern struct timer_opts timer_none;
@@ -60,5 +59,6 @@

#ifdef CONFIG_X86_PM_TIMER
extern struct init_timer_opts timer_pmtmr_init;
+#endif
#endif
#endif
diff -Nru a/include/asm-i386/tsc.h b/include/asm-i386/tsc.h
--- /dev/null Wed Dec 31 16:00:00 196900
+++ b/include/asm-i386/tsc.h 2005-03-11 17:04:48 -08:00
@@ -0,0 +1,6 @@
+#ifndef _ASM_I386_TSC_H
+#define _ASM_I386_TSC_H
+extern unsigned long cpu_freq_khz;
+void tsc_init(void);
+
+#endif
diff -Nru a/include/asm-x86_64/hpet.h b/include/asm-x86_64/hpet.h
--- a/include/asm-x86_64/hpet.h 2005-03-11 17:04:48 -08:00
+++ b/include/asm-x86_64/hpet.h 2005-03-11 17:04:48 -08:00
@@ -44,6 +44,7 @@
#define HPET_TN_SETVAL 0x040
#define HPET_TN_32BIT 0x100

+extern unsigned long hpet_address; /* hpet memory map physical address */
extern int is_hpet_enabled(void);
extern int hpet_rtc_timer_init(void);
extern int oem_force_hpet_timer(void);
diff -Nru a/include/linux/sched.h b/include/linux/sched.h
--- a/include/linux/sched.h 2005-03-11 17:04:48 -08:00
+++ b/include/linux/sched.h 2005-03-11 17:04:48 -08:00
@@ -807,7 +807,11 @@
}
#endif

+#ifndef CONFIG_NEWTOD
extern unsigned long long sched_clock(void);
+#else
+#define sched_clock() 0
+#endif
extern unsigned long long current_sched_time(const task_t *current_task);

/* sched_exec is called by processes performing an exec */
diff -Nru a/kernel/sched.c b/kernel/sched.c
--- a/kernel/sched.c 2005-03-11 17:04:48 -08:00
+++ b/kernel/sched.c 2005-03-11 17:04:48 -08:00
@@ -176,6 +176,11 @@
#define task_hot(p, now, sd) ((long long) ((now) - (p)->last_ran) \
< (long long) (sd)->cache_hot_time)

+/* XXX - terrible hack just for now */
+#ifdef CONFIG_NEWTOD
+#define sched_clock() 0
+#endif
+
/*
* These are the runqueue data structures:
*/
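
The derivation in the hpet.c comment above (mult = (period in fsec << shift) / FSEC_PER_NSEC) can be tried out in user space. This is a sketch under assumptions: hpet_period_to_mult() and cyc2ns() are hypothetical helper names, the cyc2ns form ((cycles * mult) >> shift) is assumed from the mult/shift description, and the 69841279 fs period (roughly 14.318 MHz) is only an illustrative value.

```c
#include <assert.h>
#include <stdint.h>

#define FSEC_PER_NSEC 1000000ULL

/* mult such that mult / 2^shift approximates nanoseconds per cycle. */
static uint32_t hpet_period_to_mult(uint32_t period_fs, unsigned int shift)
{
	assert(shift < 32);	/* keep the 64-bit intermediate from overflowing */
	return (uint32_t)(((uint64_t)period_fs << shift) / FSEC_PER_NSEC);
}

/* Assumed conversion: ns = (cycles * mult) >> shift. */
static uint64_t cyc2ns(uint64_t cycles, uint32_t mult, unsigned int shift)
{
	return (cycles * mult) >> shift;
}
```

For a period of exactly 1 ns (1000000 fs) and shift 22, mult comes out as 1 << 22, so cycles map one to one onto nanoseconds. The truncating division means the result is always rounded down, which is one reason a larger shift gives a finer approximation.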


2005-03-12 05:58:42

by Benjamin Herrenschmidt

Subject: Re: [RFC][PATCH] new timeofday arch specific hooks (v. A3)

On Fri, 2005-03-11 at 17:25 -0800, john stultz wrote:
> All,
> This patch implements the minimal architecture specific hooks to enable
> the new time of day subsystem code for i386, x86-64, ia64, ppc32 and
> ppc64. It applies on top of my linux-2.6.11_timeofday-core_A3 patch and
> with this patch applied, you can test the new time of day subsystem.
>
> Basically it configs in the NEWTOD code and cuts a lot of code out of the
> build via #ifdefs. I know, I know, #ifdefs are ugly and bad, and the
> final patch will just remove the old code. For now this allows us to be
> flexible and easily switch between the two implementations with a single
> define.
>
> New in this version:
> o ppc32 arch code (by Darrick Wong. Many thanks to him for this code!)
> o ia64 arch code (by Max Asbock. Many thanks to him for this code!)
> o minor cleanups moving code between the arch and timesource patches
>
> Items still on the TODO list:
> o s390 arch port (hey Martin: nudge, nudge :)
> o arch specific vsyscall/fsyscall interface
> o other arch ports (volunteers wanted!)

I'm not sure what the impact will be on the vDSO implementation of
gettimeofday, which relies on the bits in systemcfg (tb_to_xs etc...).

Currently, the userland code uses the exact same bits as the kernel
code, and thus, we have a guarantee of getting the same results from
both. Also, our "special" ppc_adjtimex will also update our offset and
scale factor (with appropriate barriers) in a way that applies to both
the kernel/syscall gettimeofday and the vDSO implementation. I'm not
sure this is still true with your patch.

I suppose I'll have to dig into the details sometime next week..

Ben

2005-03-13 00:50:36

by Matt Mackall

Subject: Re: [RFC][PATCH] new timeofday core subsystem (v. A3)

On Fri, Mar 11, 2005 at 05:24:15PM -0800, john stultz wrote:
> +struct timesource_t timesource_jiffies = {
> + .name = "jiffies",
> + .priority = 0, /* lowest priority*/
> + .type = TIMESOURCE_FUNCTION,
> + .read_fnct = jiffies_read,
> + .mask = (cycle_t)~0,

Not sure this is right. The type of 0 is 'int' and the ~ will happen
before the cast to a potentially longer type.

> + .mult = NSEC_PER_SEC/HZ,

Does rounding matter here? Alpha has HZ of 1024, so this comes out to
976562.5.
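
To make the rounding question concrete, here is a small sketch of the
arithmetic. HZ = 1024 is the Alpha value mentioned above; the helper
names are hypothetical and do not come from the patch.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Truncating division, as in the quoted .mult initializer. */
static uint64_t mult_truncated(uint64_t hz)
{
	return NSEC_PER_SEC / hz;
}

/* Round-to-nearest variant (a hypothetical alternative). */
static uint64_t mult_rounded(uint64_t hz)
{
	return (NSEC_PER_SEC + hz / 2) / hz;
}

/*
 * Nanoseconds gained (+) or lost (-) per second when each of the
 * hz ticks in a second is counted as mult nanoseconds.
 */
static int64_t error_ns_per_sec(uint64_t mult, uint64_t hz)
{
	return (int64_t)(mult * hz) - (int64_t)NSEC_PER_SEC;
}
```

With HZ = 1024 both variants are off by 512 ns per second, in opposite
directions, so rounding to nearest alone does not remove the error. A
mult/shift pair carrying more fractional bits would presumably be needed.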

> +struct timesource_t {
> + char* name;
> + int priority;
> + enum {
> + TIMESOURCE_FUNCTION,
> + TIMESOURCE_CYCLES,
> + TIMESOURCE_MMIO_32,
> + TIMESOURCE_MMIO_64
> + } type;
> + cycle_t (*read_fnct)(void);
> + void __iomem* mmio_ptr;

Convention is * goes next to the variable name rather than the type.

> +/* XXX - this should go somewhere better! */
> +#ifndef readq
> +static inline unsigned long long readq(void __iomem *addr)

Somewhere in asm-generic..

> +static inline cycle_t read_timesource(struct timesource_t* ts)
> +{
> + switch (ts->type) {
> + case TIMESOURCE_MMIO_32:
> + return (cycle_t)readl(ts->mmio_ptr);
> + case TIMESOURCE_MMIO_64:
> + return (cycle_t)readq(ts->mmio_ptr);
> + case TIMESOURCE_CYCLES:
> + return (cycle_t)get_cycles();
> + default:/* case: TIMESOURCE_FUNCTION */
> + return ts->read_fnct();
> + }
> +}

Wouldn't it be better to change read_fnct to take a timesource * and
then change all the other guys to generic_timesource_<foo> helper
functions? This does away with the switch and makes it trivial to add
new generic sources. Change mmio_ptr to void *private.

> @@ -467,6 +468,7 @@
> pidhash_init();
> init_timers();
> softirq_init();
> + timeofday_init();
> time_init();

Can we push time_init inside of timeofday_init?

> +/* Chapter 5: Kernel Variables [RFC 1589 pg. 28] */
> +/* 5.1 Interface Variables */
> +static int ntp_status = STA_UNSYNC; /* status */
> +static long ntp_offset; /* usec */
> +static long ntp_constant = 2; /* ntp magic? */
> +static long ntp_maxerror = NTP_PHASE_LIMIT; /* usec */
> +static long ntp_esterror = NTP_PHASE_LIMIT; /* usec */
> +static const long ntp_tolerance = MAXFREQ; /* shifted ppm */
> +static const long ntp_precision = 1; /* constant */
> +
> +/* 5.2 Phase-Lock Loop Variables */
> +static long ntp_freq; /* shifted ppm */
> +static long ntp_reftime; /* sec */

You present a nice argument for not using tabs except at the beginning
of the line.

> +#define MILLION 1000000

Still a magic number despite being a define. Very meta. Unused.

> +/* int ntp_advance(nsec_t interval):
> + * Periodic hook which increments NTP state machine by interval.
> + * Returns the signed PPM adjustment to be used for the next interval.
> + * This is ntp_hardclock in the RFC.

Why is it not ntp_hardclock here?

> + */
> +int ntp_advance(nsec_t interval)
> +{
> + static u64 interval_sum=0;

Spaces.

> + /* decrement singleshot offset interval */
> + ss_offset_len =- interval;

Eh?

> + /* bound the adjustment to MAXPHASE/MINSEC */
> + if (tmp > (MAXPHASE / MINSEC) << SHIFT_UPDATE)
> + tmp = (MAXPHASE / MINSEC) << SHIFT_UPDATE;
> + if (tmp < -(MAXPHASE / MINSEC) << SHIFT_UPDATE)
> + tmp = -(MAXPHASE / MINSEC) << SHIFT_UPDATE;

max, min?

> + /* Make sure offset is bounded by MAXPHASE */
> + if (tmp > MAXPHASE)
> + tmp = MAXPHASE;
> + if (tmp < -MAXPHASE)
> + tmp = -MAXPHASE;

max, min.
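
The two "max, min" remarks above amount to replacing each pair of if
statements with a clamp. A minimal userspace sketch follows; clamp_long()
is a hypothetical helper (in the patch itself the kernel's min()/max()
macros would be the natural choice), and any limit passed to it here is
purely illustrative.

```c
/*
 * Clamp val into the range [-limit, limit].
 * limit is assumed to be non-negative.
 */
static long clamp_long(long val, long limit)
{
	if (val > limit)
		return limit;
	if (val < -limit)
		return -limit;
	return val;
}
```

Computing the positive limit once, e.g. (MAXPHASE / MINSEC) << SHIFT_UPDATE,
and negating the result also avoids the quoted
-(MAXPHASE / MINSEC) << SHIFT_UPDATE form, which left-shifts a negative
value and is undefined behaviour in ISO C.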

> + if ((ntp_status & STA_FLL) && (interval >= MINSEC)) {
> + long damping;
> + tmp = (offset / interval); /* ppm (usec/sec)*/

(unnecessary parens)

> +/* int ntp_adjtimex(struct timex* tx)
> + * Interface to change NTP state machine
> + */
> +int ntp_adjtimex(struct timex* tx)
> +{
> + long save_offset;
> + int result;
> + unsigned long flags;
> +
> +/*=[Sanity checking]===============================*/
> + /* Check capabilities if we're trying to modify something */
> + if (tx->modes && !capable(CAP_SYS_TIME))
> + return -EPERM;

This is already done in do_adjtimex.

> + /* clear everything */
> + ntp_status |= STA_UNSYNC;
> + ntp_maxerror = NTP_PHASE_LIMIT;
> + ntp_esterror = NTP_PHASE_LIMIT;
> + ss_offset_len=0;
> + singleshot_adj=0;
> + tick_adj=0;
> + offset_adj =0;

Spacing.

> +/*[Nanosecond based variables]----------------

This comment style is weird. Kill the trailing dashes at least.

> +static enum { TIME_RUNNING, TIME_SUSPENDED } time_suspend_state = TIME_RUNNING;

Insert some line breaks.

> + /* convert timespec to ns */
> + nsec_t newtime = timespec2ns(tv);

> + /* clear NTP settings */
> + ntp_clear();

Pointless comments.

> +int do_adjtimex(struct timex *tx)
> +{
> + do_gettimeofday(&tx->time); /* set timex->time*/

Oh. Move the cap check back here..

> + if (time_suspend_state != TIME_RUNNING) {
> + printk(KERN_INFO "timeofday_suspend_hook: ACK! called while we're suspended!");

Line length. Perhaps BUG_ON instead.

> + /* finally, update legacy time values */
> + write_seqlock_irqsave(&xtime_lock, x_flags);
> + xtime = ns2timespec(system_time + wall_time_offset);
> + wall_to_monotonic = ns2timespec(wall_time_offset);
> + wall_to_monotonic.tv_sec = -wall_to_monotonic.tv_sec;
> + wall_to_monotonic.tv_nsec = -wall_to_monotonic.tv_nsec;
> + /* XXX - should jiffies be updated here? */

Excellent question.

--
Mathematics is the supreme nostalgia of our time.

2005-03-13 01:46:25

by Andreas Schwab

Subject: Re: [RFC][PATCH] new timeofday core subsystem (v. A3)

Matt Mackall <[email protected]> writes:

> On Fri, Mar 11, 2005 at 05:24:15PM -0800, john stultz wrote:
>> +struct timesource_t timesource_jiffies = {
>> + .name = "jiffies",
>> + .priority = 0, /* lowest priority*/
>> + .type = TIMESOURCE_FUNCTION,
>> + .read_fnct = jiffies_read,
>> + .mask = (cycle_t)~0,
>
> Not sure this is right. The type of 0 is 'int' and the ~ will happen
> before the cast to a potentially longer type.

If you want an all-one value for any unsigned type then (type)-1 is the
most reliable way.
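
A small sketch of the difference, assuming cycle_t is a 64-bit unsigned
type and unsigned int is 32 bits wide (both are assumptions; the function
names are hypothetical):

```c
#include <stdint.h>

typedef uint64_t cycle_t;	/* assumption: 64-bit unsigned */

/*
 * ~0 has type int and value -1 on two's complement machines, so the
 * conversion to cycle_t happens to produce all ones.
 */
static cycle_t mask_from_signed_zero(void)
{
	return (cycle_t)~0;
}

/*
 * ~0U is an unsigned int with all bits set; widening it zero-extends,
 * so only the low 32 bits end up set.
 */
static cycle_t mask_from_unsigned_zero(void)
{
	return (cycle_t)~0U;
}

/*
 * Converting -1 to an unsigned type is defined as reduction modulo
 * 2^N, which yields all ones for any width.
 */
static cycle_t mask_from_minus_one(void)
{
	return (cycle_t)-1;
}
```

So (cycle_t)~0 works only because the operand is a signed int; the
(type)-1 form does not depend on the type of the literal.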

Andreas.

--
Andreas Schwab, SuSE Labs, [email protected]
SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."

2005-03-14 18:15:25

by john stultz

Subject: Re: [RFC][PATCH] new timeofday arch specific hooks (v. A3)

On Sat, 2005-03-12 at 16:52 +1100, Benjamin Herrenschmidt wrote:
> On Fri, 2005-03-11 at 17:25 -0800, john stultz wrote:
> > All,
> > This patch implements the minimal architecture specific hooks to enable
> > the new time of day subsystem code for i386, x86-64, ia64, ppc32 and
> > ppc64. It applies on top of my linux-2.6.11_timeofday-core_A3 patch and
> > with this patch applied, you can test the new time of day subsystem.
> >
> > Basically it configs in the NEWTOD code and cuts alot of code out of the
> > build via #ifdefs. I know, I know, #ifdefs' are ugly and bad, and the
> > final patch will just remove the old code. For now this allows us to be
> > flexible and easily switch between the two implementations with a single
> > define.
> >
> > New in this version:
> > o ppc32 arch code (by Darrick Wong. Many thanks to him for this code!)
> > o ia64 arch code (by Max Asbock. Many thanks to him for this code!)
> > o minor cleanups moving code between the arch and timesource patches
> >
> > Items still on the TODO list:
> > o s390 arch port (hey Martin: nudge, nudge :)
> > o arch specific vsyscall/fsyscall interface
> > o other arch ports (volunteers wanted!)
>
> I'm not what the impact will be with the vDSO implementation of
> gettimeofday which relies on the bits in systemcfg (tb_to_xs etc...).

Oh yea, the vDSO stuff slipped in after I last tested the ppc64 bits, so
I wouldn't be surprised if that's broken. For now I'm just disabling the
vsyscall/fsyscall/vDSO bits on the arches that support it until I get a
generic interface setup that would allow it to be consistent with the
new time infrastructure. Hopefully I'll have that done (I'm targeting
x86-64 at least) by the next release.

thanks
-john


2005-03-14 18:44:57

by john stultz

Subject: Re: [RFC][PATCH] new timeofday core subsystem (v. A3)

On Sat, 2005-03-12 at 16:49 -0800, Matt Mackall wrote:
> On Fri, Mar 11, 2005 at 05:24:15PM -0800, john stultz wrote:
> > +struct timesource_t timesource_jiffies = {
> > + .name = "jiffies",
> > + .priority = 0, /* lowest priority*/
> > + .type = TIMESOURCE_FUNCTION,
> > + .read_fnct = jiffies_read,
> > + .mask = (cycle_t)~0,
>
> Not sure this is right. The type of 0 is 'int' and the ~ will happen
> before the cast to a potentially longer type.

Good point. I'll change it to Andreas' suggestion of (type)-1.


> > + .mult = NSEC_PER_SEC/HZ,
>
> Does rounding matter here? Alpha has HZ of 1024, so this comes out to
> 976562.5.

Actually, there are probably a number of places where I need to be
better with rounding. It's a good idea there as well.

> > +static inline cycle_t read_timesource(struct timesource_t* ts)
> > +{
> > + switch (ts->type) {
> > + case TIMESOURCE_MMIO_32:
> > + return (cycle_t)readl(ts->mmio_ptr);
> > + case TIMESOURCE_MMIO_64:
> > + return (cycle_t)readq(ts->mmio_ptr);
> > + case TIMESOURCE_CYCLES:
> > + return (cycle_t)get_cycles();
> > + default:/* case: TIMESOURCE_FUNCTION */
> > + return ts->read_fnct();
> > + }
> > +}
>
> Wouldn't it be better to change read_fnct to take a timesource * and
> then change all the other guys to generic_timesource_<foo> helper
> functions? This does away with the switch and makes it trivial to add
> new generic sources. Change mmio_ptr to void *private.

Not sure if I totally understand this, but originally I just had a read
function, but to allow this framework to function w/ ia64 fsyscalls (and
likely other arches vsyscalls) we need to pass the raw mmio pointers.
Thus the timesource type and switch idea was taken from the time
interpolator code.


> > @@ -467,6 +468,7 @@
> > pidhash_init();
> > init_timers();
> > softirq_init();
> > + timeofday_init();
> > time_init();
>
> Can we push time_init inside of timeofday_init?

Ideally, yea, but this way is cleaner until all the arches are converted
to the new timeofday code.

> > +/* Chapter 5: Kernel Variables [RFC 1589 pg. 28] */
> > +/* 5.1 Interface Variables */
> > +static int ntp_status = STA_UNSYNC; /* status */
> > +static long ntp_offset; /* usec */
> > +static long ntp_constant = 2; /* ntp magic? */
> > +static long ntp_maxerror = NTP_PHASE_LIMIT; /* usec */
> > +static long ntp_esterror = NTP_PHASE_LIMIT; /* usec */
> > +static const long ntp_tolerance = MAXFREQ; /* shifted ppm */
> > +static const long ntp_precision = 1; /* constant */
> > +
> > +/* 5.2 Phase-Lock Loop Variables */
> > +static long ntp_freq; /* shifted ppm */
> > +static long ntp_reftime; /* sec */
>
> You present a nice argument for not using tabs except at the beginning
> of the line.

Yea, I should have caught that earlier. I have the tab length set to 4
in my editor. Sorry.


> > +#define MILLION 1000000
>
> Still a magic number despite being a define. Very meta. Unused.

Should have been yanked along with ntp_scale(). Good catch.

> > +/* int ntp_advance(nsec_t interval):
> > + * Periodic hook which increments NTP state machine by interval.
> > + * Returns the signed PPM adjustment to be used for the next interval.
> > + * This is ntp_hardclock in the RFC.
>
> Why is it not ntp_hardclock here?

I'm not sure if ntp_hardclock is a very good name. Since we're advancing
the state machine, ntp_advance() seems to be more clear to me. However
if the NTP folks care enough then I'll be fine with changing it.

> > + /* decrement singleshot offset interval */
> > + ss_offset_len =- interval;
>
> Eh?

Gah! Great catch! I would have never seen that terrible typo!
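
For readers who missed it: the quoted line uses "=-" where "-=" was
intended, so it assigns the negated interval instead of subtracting it.
A minimal sketch with hypothetical function names:

```c
/* What the quoted line does: len = -interval. */
static long decrement_buggy(long len, long interval)
{
	len =- interval;
	return len;
}

/* What the comment says it should do: len = len - interval. */
static long decrement_fixed(long len, long interval)
{
	len -= interval;
	return len;
}
```

Starting from len = 10 and interval = 3, the first returns -3 and the
second returns 7.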



> > + /* bound the adjustment to MAXPHASE/MINSEC */
> > + if (tmp > (MAXPHASE / MINSEC) << SHIFT_UPDATE)
> > + tmp = (MAXPHASE / MINSEC) << SHIFT_UPDATE;
> > + if (tmp < -(MAXPHASE / MINSEC) << SHIFT_UPDATE)
> > + tmp = -(MAXPHASE / MINSEC) << SHIFT_UPDATE;
>
> max, min?
>
> > + /* Make sure offset is bounded by MAXPHASE */
> > + if (tmp > MAXPHASE)
> > + tmp = MAXPHASE;
> > + if (tmp < -MAXPHASE)
> > + tmp = -MAXPHASE;
>
> max, min.

Good idea.


> > +int do_adjtimex(struct timex *tx)
> > +{
> > + do_gettimeofday(&tx->time); /* set timex->time*/
>
> Oh. Move the cap check back here..

Will do.

> > + if (time_suspend_state != TIME_RUNNING) {
> > + printk(KERN_INFO "timeofday_suspend_hook: ACK! called while we're suspended!");
>
> Line length. Perhaps BUG_ON instead.

Eh, it's not fatal, so BUG_ON seems a bit harsh. I'll fix the line length
though.


> > + /* finally, update legacy time values */
> > + write_seqlock_irqsave(&xtime_lock, x_flags);
> > + xtime = ns2timespec(system_time + wall_time_offset);
> > + wall_to_monotonic = ns2timespec(wall_time_offset);
> > + wall_to_monotonic.tv_sec = -wall_to_monotonic.tv_sec;
> > + wall_to_monotonic.tv_nsec = -wall_to_monotonic.tv_nsec;
> > + /* XXX - should jiffies be updated here? */
>
> Excellent question.

Indeed. Currently jiffies is used as both an interrupt counter and a
time unit, and I'm trying to make it just the former. If I emulate it then
it stops functioning as an interrupt counter, and if I don't then I'll
probably break assumptions about jiffies being a time unit. So I'm not
sure which is the easiest path to go until all the users of jiffies are
audited for intent.

As for the code style bits: Thanks, I'll try to clean those up.

I really appreciate the time you took to review my code!

Thanks again for the feedback!
-john

2005-03-14 19:31:36

by Matt Mackall

Subject: Re: [RFC][PATCH] new timeofday core subsystem (v. A3)

On Mon, Mar 14, 2005 at 10:42:45AM -0800, john stultz wrote:
>
> > > +static inline cycle_t read_timesource(struct timesource_t* ts)
> > > +{
> > > + switch (ts->type) {
> > > + case TIMESOURCE_MMIO_32:
> > > + return (cycle_t)readl(ts->mmio_ptr);
> > > + case TIMESOURCE_MMIO_64:
> > > + return (cycle_t)readq(ts->mmio_ptr);
> > > + case TIMESOURCE_CYCLES:
> > > + return (cycle_t)get_cycles();
> > > + default:/* case: TIMESOURCE_FUNCTION */
> > > + return ts->read_fnct();
> > > + }
> > > +}
> >
> > Wouldn't it be better to change read_fnct to take a timesource * and
> > then change all the other guys to generic_timesource_<foo> helper
> > functions? This does away with the switch and makes it trivial to add
> > new generic sources. Change mmio_ptr to void *private.
>
> Not sure if I totally understand this, but originally I just had a read
> function, but to allow this framework to function w/ ia64 fsyscalls (and
> likely other arches vsyscalls) we need to pass the raw mmio pointers.
> Thus the timesource type and switch idea was taken from the time
> interpolator code.

Well for vsyscall, we can leave the mmio_ptr and type. But in-kernel,
I think we'd rather always call read_fnct with generic helpers than hit this
switch every time.

> > > + if (time_suspend_state != TIME_RUNNING) {
> > > + printk(KERN_INFO "timeofday_suspend_hook: ACK! called while we're suspended!");
> >
> > Line length. Perhaps BUG_ON instead.
>
> Eh, its not fatal to BUG_ON seems a bit harsh. I'll fix the line length
> though.

Well there's a trade-off here. If it's something that should never
happen and you only printk, you may never get a failure report
(especially at KERN_INFO). It's good to be accommodating of external
errors, but catching internal should-never-happen errors is important.

> > Excellent question.
>
> Indeed. Currently jiffies is used as both a interrupt counter and a
> time unit, and I'm trying make it just the former. If I emulate it then
> it stops functioning as a interrupt counter, and if I don't then I'll
> probably break assumptions about jiffies being a time unit. So I'm not
> sure which is the easiest path to go until all the users of jiffies are
> audited for intent.

Post this as a separate thread. There are various thoughts floating
around on this already.

--
Mathematics is the supreme nostalgia of our time.

2005-03-14 19:47:30

by john stultz

Subject: Re: [RFC][PATCH] new timeofday core subsystem (v. A3)

On Mon, 2005-03-14 at 11:29 -0800, Matt Mackall wrote:
> On Mon, Mar 14, 2005 at 10:42:45AM -0800, john stultz wrote:
> >
> > > > +static inline cycle_t read_timesource(struct timesource_t* ts)
> > > > +{
> > > > + switch (ts->type) {
> > > > + case TIMESOURCE_MMIO_32:
> > > > + return (cycle_t)readl(ts->mmio_ptr);
> > > > + case TIMESOURCE_MMIO_64:
> > > > + return (cycle_t)readq(ts->mmio_ptr);
> > > > + case TIMESOURCE_CYCLES:
> > > > + return (cycle_t)get_cycles();
> > > > + default:/* case: TIMESOURCE_FUNCTION */
> > > > + return ts->read_fnct();
> > > > + }
> > > > +}
> > >
> > > Wouldn't it be better to change read_fnct to take a timesource * and
> > > then change all the other guys to generic_timesource_<foo> helper
> > > functions? This does away with the switch and makes it trivial to add
> > > new generic sources. Change mmio_ptr to void *private.
> >
> > Not sure if I totally understand this, but originally I just had a read
> > function, but to allow this framework to function w/ ia64 fsyscalls (and
> > likely other arches vsyscalls) we need to pass the raw mmio pointers.
> > Thus the timesource type and switch idea was taken from the time
> > interpolator code.
>
> Well for vsyscall, we can leave the mmio_ptr and type. But in-kernel,
> I think we'd rather always call read_fnct with generic helpers than hit this
> switch every time.

Huh. So if I understand you properly, all timesources should have valid
read_fnct pointers that return the cycle value, however we'll still
preserve the type and mmio_ptr so fsyscall/vsyscall bits can use them
externally?

Hmm. I'm a little cautious, as I really want to make the vsyscall
gettimeofday and regular do_gettimeofday be as similar as possible to
avoid some of the bugs we've seen between different gettimeofday
implementations. However I'm not completely against the idea.

Christoph: Do you have any thoughts on this?


> > > > + if (time_suspend_state != TIME_RUNNING) {
> > > > + printk(KERN_INFO "timeofday_suspend_hook: ACK! called while we're suspended!");
> > >
> > > Line length. Perhaps BUG_ON instead.
> >
> > Eh, its not fatal to BUG_ON seems a bit harsh. I'll fix the line length
> > though.
>
> Well there's a trade-off here. If it's something that should never
> happen and you only printk, you may never get a failure report
> (especially at KERN_INFO). It's good to be accomodating of external
> errors, but catching internal should-never-happen errors is important.

Fair enough.


> > > Excellent question.
> >
> > Indeed. Currently jiffies is used as both a interrupt counter and a
> > time unit, and I'm trying make it just the former. If I emulate it then
> > it stops functioning as a interrupt counter, and if I don't then I'll
> > probably break assumptions about jiffies being a time unit. So I'm not
> > sure which is the easiest path to go until all the users of jiffies are
> > audited for intent.
>
> Post this as a separate thread. There are various thoughts floating
> around on this already.

I'm a little busy with other things today, but I'll try to stir up a
discussion on this soon.

thanks
-john


2005-03-14 19:52:11

by Matt Mackall

Subject: Re: [RFC][PATCH] new timeofday core subsystem (v. A3)

On Mon, Mar 14, 2005 at 11:43:21AM -0800, john stultz wrote:
> On Mon, 2005-03-14 at 11:29 -0800, Matt Mackall wrote:
> > On Mon, Mar 14, 2005 at 10:42:45AM -0800, john stultz wrote:
> > >
> > > > > +static inline cycle_t read_timesource(struct timesource_t* ts)
> > > > > +{
> > > > > + switch (ts->type) {
> > > > > + case TIMESOURCE_MMIO_32:
> > > > > + return (cycle_t)readl(ts->mmio_ptr);
> > > > > + case TIMESOURCE_MMIO_64:
> > > > > + return (cycle_t)readq(ts->mmio_ptr);
> > > > > + case TIMESOURCE_CYCLES:
> > > > > + return (cycle_t)get_cycles();
> > > > > + default:/* case: TIMESOURCE_FUNCTION */
> > > > > + return ts->read_fnct();
> > > > > + }
> > > > > +}
> > > >
> > > > Wouldn't it be better to change read_fnct to take a timesource * and
> > > > then change all the other guys to generic_timesource_<foo> helper
> > > > functions? This does away with the switch and makes it trivial to add
> > > > new generic sources. Change mmio_ptr to void *private.
> > >
> > > Not sure if I totally understand this, but originally I just had a read
> > > function, but to allow this framework to function w/ ia64 fsyscalls (and
> > > likely other arches vsyscalls) we need to pass the raw mmio pointers.
> > > Thus the timesource type and switch idea was taken from the time
> > > interpolator code.
> >
> > Well for vsyscall, we can leave the mmio_ptr and type. But in-kernel,
> > I think we'd rather always call read_fnct with generic helpers than hit this
> > switch every time.
>
> Huh. So if I understand you properly, all timesources should have valid
> read_fnct pointers that return the cycle value, however we'll still
> preserve the type and mmio_ptr so fsyscall/vsyscall bits can use them
> externally?

Well where we'd read an MMIO address, we'd simply set read_fnct to
generic_timesource_mmio32 or so. And that function just does the read.
So both that function and read_timesource become one-liners and we
drop the conditional branches in the switch.
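
A rough userspace sketch of that shape, to make the proposal concrete.
Everything here is hypothetical: the struct is a cut-down stand-in for
timesource_t, the priv field plays the role of the suggested
"void *private", and plain volatile loads stand in for readl()/readq().

```c
#include <stdint.h>

typedef uint64_t cycle_t;	/* assumption: 64-bit unsigned */

struct timesource_sketch {
	const char *name;
	/* read_fnct now receives the timesource it belongs to */
	cycle_t (*read_fnct)(struct timesource_sketch *ts);
	void *priv;	/* for MMIO sources: address of the counter */
};

/* Generic helper for a 32-bit memory-mapped counter. */
static cycle_t generic_timesource_mmio32(struct timesource_sketch *ts)
{
	return (cycle_t)*(volatile uint32_t *)ts->priv;
}

/* Generic helper for a 64-bit memory-mapped counter. */
static cycle_t generic_timesource_mmio64(struct timesource_sketch *ts)
{
	return (cycle_t)*(volatile uint64_t *)ts->priv;
}

/* The switch collapses into a single indirect call. */
static inline cycle_t read_timesource(struct timesource_sketch *ts)
{
	return ts->read_fnct(ts);
}
```

The trade-off is one indirect call per read in place of the switch; this
sketch says nothing about how a vsyscall/fsyscall path would reach the
counter.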

--
Mathematics is the supreme nostalgia of our time.

2005-03-14 20:04:47

by john stultz

Subject: Re: [RFC][PATCH] new timeofday core subsystem (v. A3)

On Mon, 2005-03-14 at 11:51 -0800, Matt Mackall wrote:
> On Mon, Mar 14, 2005 at 11:43:21AM -0800, john stultz wrote:
> > On Mon, 2005-03-14 at 11:29 -0800, Matt Mackall wrote:
> > > On Mon, Mar 14, 2005 at 10:42:45AM -0800, john stultz wrote:
> > > >
> > > > > > +static inline cycle_t read_timesource(struct timesource_t* ts)
> > > > > > +{
> > > > > > + switch (ts->type) {
> > > > > > + case TIMESOURCE_MMIO_32:
> > > > > > + return (cycle_t)readl(ts->mmio_ptr);
> > > > > > + case TIMESOURCE_MMIO_64:
> > > > > > + return (cycle_t)readq(ts->mmio_ptr);
> > > > > > + case TIMESOURCE_CYCLES:
> > > > > > + return (cycle_t)get_cycles();
> > > > > > + default:/* case: TIMESOURCE_FUNCTION */
> > > > > > + return ts->read_fnct();
> > > > > > + }
> > > > > > +}
> > > > >
> > > > > Wouldn't it be better to change read_fnct to take a timesource * and
> > > > > then change all the other guys to generic_timesource_<foo> helper
> > > > > functions? This does away with the switch and makes it trivial to add
> > > > > new generic sources. Change mmio_ptr to void *private.
> > > >
> > > > Not sure if I totally understand this, but originally I just had a read
> > > > function, but to allow this framework to function w/ ia64 fsyscalls (and
> > > > likely other arches vsyscalls) we need to pass the raw mmio pointers.
> > > > Thus the timesource type and switch idea was taken from the time
> > > > interpolator code.
> > >
> > > Well for vsyscall, we can leave the mmio_ptr and type. But in-kernel,
> > > I think we'd rather always call read_fnct with generic helpers than hit this
> > > switch every time.
> >
> > Huh. So if I understand you properly, all timesources should have valid
> > read_fnct pointers that return the cycle value, however we'll still
> > preserve the type and mmio_ptr so fsyscall/vsyscall bits can use them
> > externally?
>
> Well where we'd read an MMIO address, we'd simply set read_fnct to
> generic_timesource_mmio32 or so. And that function just does the read.
> So both that function and read_timesource become one-liners and we
> drop the conditional branches in the switch.

However the vsyscall/fsyscall bits cannot call in-kernel functions (as
they execute in userspace or a pseudo-userspace). As it stands now in my
design TIMESOURCE_FUNCTION timesources will not be usable for
vsyscall/fsyscall implementations, so I'm not sure if that's doable.

I'd be interested if you've got a way around that.

thanks
-john


2005-03-14 20:29:21

by Matt Mackall

Subject: Re: [RFC][PATCH] new timeofday core subsystem (v. A3)

On Mon, Mar 14, 2005 at 12:04:07PM -0800, john stultz wrote:
> > > > > > > +static inline cycle_t read_timesource(struct timesource_t* ts)
> > > > > > > +{
> > > > > > > + switch (ts->type) {
> > > > > > > + case TIMESOURCE_MMIO_32:
> > > > > > > + return (cycle_t)readl(ts->mmio_ptr);
> > > > > > > + case TIMESOURCE_MMIO_64:
> > > > > > > + return (cycle_t)readq(ts->mmio_ptr);
> > > > > > > + case TIMESOURCE_CYCLES:
> > > > > > > + return (cycle_t)get_cycles();
> > > > > > > + default:/* case: TIMESOURCE_FUNCTION */
> > > > > > > + return ts->read_fnct();
> > > > > > > + }
> > > > > > > +}
> > Well where we'd read an MMIO address, we'd simply set read_fnct to
> > generic_timesource_mmio32 or so. And that function just does the read.
> > So both that function and read_timesource become one-liners and we
> > drop the conditional branches in the switch.
>
> However the vsyscall/fsyscall bits cannot call in-kernel functions (as
> they execute in userspace or a sudo-userspace). As it stands now in my
> design TIMESOURCE_FUNCTION timesources will not be usable for
> vsyscall/fsyscall implementations, so I'm not sure if that's doable.
>
> I'd be interested you've got a way around that.

We can either stick all the generic mmio timer functions in the
vsyscall page (they're tiny) or leave the vsyscall using type/ptr but
have the kernel internally use only the function pointer. Someone
who's more familiar with the vsyscall timer code should chime in here.

--
Mathematics is the supreme nostalgia of our time.

2005-03-14 23:43:26

by George Anzinger

Subject: Re: [RFC][PATCH] new timeofday core subsystem (v. A3)

john stultz wrote:
> On Sat, 2005-03-12 at 16:49 -0800, Matt Mackall wrote:
>
~

>>
>
>>>+ /* finally, update legacy time values */
>>>+ write_seqlock_irqsave(&xtime_lock, x_flags);
>>>+ xtime = ns2timespec(system_time + wall_time_offset);
>>>+ wall_to_monotonic = ns2timespec(wall_time_offset);
>>>+ wall_to_monotonic.tv_sec = -wall_to_monotonic.tv_sec;
>>>+ wall_to_monotonic.tv_nsec = -wall_to_monotonic.tv_nsec;
>>>+ /* XXX - should jiffies be updated here? */
>>
>>Excellent question.
>
>
> Indeed. Currently jiffies is used as both a interrupt counter and a
> time unit, and I'm trying make it just the former. If I emulate it then
> it stops functioning as a interrupt counter, and if I don't then I'll
> probably break assumptions about jiffies being a time unit. So I'm not
> sure which is the easiest path to go until all the users of jiffies are
> audited for intent.

Really? Who counts interrupts??? The timer code treats jiffies as a unit of
time. You will need to rewrite that to make it otherwise. But then you have
another problem. To function correctly, timers need to expire on time (hey, how
about that), not some time later. To do this we need an interrupt source. Up to
this point in time, the jiffies interrupt has been the indication that one or
more timers may have expired. While we don't need to "count" the interrupts, we
DO need them to expire the timers AND they need to be on time.
>
~
--
George Anzinger [email protected]
High-res-timers: http://sourceforge.net/projects/high-res-timers/

2005-03-15 00:13:30

by Christoph Lameter

Subject: Re: [RFC][PATCH] new timeofday core subsystem (v. A3)



On Fri, 11 Mar 2005, john stultz wrote:

> +/* get_lowres_timestamp():
> + * Returns a low res timestamp.
> + * (ie: the value of system_time as calculated at
> + * the last invocation of timeofday_periodic_hook() )
> + */
> +nsec_t get_lowres_timestamp(void)
> +{
> + nsec_t ret;
> + unsigned long seq;
> + do {
> + seq = read_seqbegin(&system_time_lock);
> +
> + /* quickly grab system_time*/
> + ret = system_time;
> +
> + } while (read_seqretry(&system_time_lock, seq));
> +
> + return ret;
> +}

On 64-bit platforms this could simply be a macro accessing system_time.

> +/* do_gettimeofday():
> + * Returns the time of day
> + */
> +void do_gettimeofday(struct timeval *tv)
> +{
> + nsec_t wall, sys;
> + unsigned long seq;
> +
> + /* atomically read wall and sys time */
> + do {
> + seq = read_seqbegin(&system_time_lock);
> +
> + wall = wall_time_offset;
> + sys = __monotonic_clock();
> +
> + } while (read_seqretry(&system_time_lock, seq));
> +
> + /* add them and convert to timeval */
> + *tv = ns2timeval(wall+sys);
> +}
> +EXPORT_SYMBOL(do_gettimeofday);

Good.
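
The final step of the quoted do_gettimeofday() converts the nanosecond
sum into a timeval. The body of the patch's ns2timeval() is not quoted in
this message, so the following is only a guess at its shape; the names
carry a _sketch suffix to mark them as hypothetical, and nsec_t is
assumed to be a 64-bit unsigned nanosecond count.

```c
#include <stdint.h>

#define NSEC_PER_SEC	1000000000ULL
#define NSEC_PER_USEC	1000ULL

typedef uint64_t nsec_t;	/* assumption: 64-bit unsigned */

struct timeval_sketch {
	long tv_sec;
	long tv_usec;
};

/* Split a nanosecond count into whole seconds and microseconds. */
static struct timeval_sketch ns2timeval_sketch(nsec_t ns)
{
	struct timeval_sketch tv;

	tv.tv_sec = (long)(ns / NSEC_PER_SEC);
	/* sub-microsecond remainder is truncated, not rounded */
	tv.tv_usec = (long)((ns % NSEC_PER_SEC) / NSEC_PER_USEC);
	return tv;
}
```

In-kernel code on 32-bit platforms would likely need do_div() or a
similar helper for the 64-bit division instead of the plain operators
used here.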

2005-03-15 00:35:23

by Christoph Lameter

Subject: Re: [RFC][PATCH] new timeofday core subsystem (v. A3)

On Mon, 14 Mar 2005, john stultz wrote:

> Huh. So if I understand you properly, all timesources should have valid
> read_fnct pointers that return the cycle value, however we'll still
> preserve the type and mmio_ptr so fsyscall/vsyscall bits can use them
> externally?
>
> Hmm. I'm a little cautious, as I really want to make the vsyscall
> gettimeofday and regular do_gettimeofday be a similar as possible to
> avoid some of the bugs we've seen between different gettimeofday
> implementations. However I'm not completely against the idea.
>
> Christoph: Do you have any thoughts on this?

Sorry to be late to the party. It would be a weird implementation to have
two ways to obtain time for each timesource. It would also be even more of a
headache to maintain than the existing fastcall vs. fullcall.

2005-03-15 00:44:57

by john stultz

Subject: Re: [RFC][PATCH] new timeofday core subsystem (v. A3)

On Mon, 2005-03-14 at 16:28 -0800, Christoph Lameter wrote:
> On Mon, 14 Mar 2005, john stultz wrote:
>
> > Huh. So if I understand you properly, all timesources should have valid
> > read_fnct pointers that return the cycle value, however we'll still
> > preserve the type and mmio_ptr so fsyscall/vsyscall bits can use them
> > externally?
> >
> > Hmm. I'm a little cautious, as I really want to make the vsyscall
> > gettimeofday and regular do_gettimeofday be a similar as possible to
> > avoid some of the bugs we've seen between different gettimeofday
> > implementations. However I'm not completely against the idea.
> >
> > Christoph: Do you have any thoughts on this?
>
> Sorry to be late to the party. It would be a weird implementation to have
> two ways to obtain time for each timesource. Also would be even more a
> headache to maintain than the existing fastcall vs. fullcall.

That's my feeling as well, unless a more convincing argument comes up.

thanks
-john

2005-03-15 00:48:13

by Christoph Lameter

Subject: Re: [RFC][PATCH] new timeofday core subsystem (v. A3)

On Mon, 14 Mar 2005, Matt Mackall wrote:
> We can either stick all the generic mmio timer functions in the
> vsyscall page (they're tiny) or leave the vsyscall using type/ptr but
> have the kernel internally use only the function pointer. Someone
> who's more familiar with the vsyscall timer code should chime in here.

No, we cannot do any function calls in a fastcall path on ia64. The current
design is ok. Why duplicate the functionality with additional indirect
function calls? Plus, an indirect function call stalls pipelines on some
processors and will limit the performance of gettimeofday.

2005-03-15 01:13:05

by john stultz

Subject: [topic change] jiffies as a time value

On Mon, 2005-03-14 at 15:40 -0800, George Anzinger wrote:
> john stultz wrote:
> > On Sat, 2005-03-12 at 16:49 -0800, Matt Mackall wrote:
> >>>+ /* finally, update legacy time values */
> >>>+ write_seqlock_irqsave(&xtime_lock, x_flags);
> >>>+ xtime = ns2timespec(system_time + wall_time_offset);
> >>>+ wall_to_monotonic = ns2timespec(wall_time_offset);
> >>>+ wall_to_monotonic.tv_sec = -wall_to_monotonic.tv_sec;
> >>>+ wall_to_monotonic.tv_nsec = -wall_to_monotonic.tv_nsec;
> >>>+ /* XXX - should jiffies be updated here? */
> >>
> >>Excellent question.
> >
> > Indeed. Currently jiffies is used as both a interrupt counter and a
> > time unit, and I'm trying make it just the former. If I emulate it then
> > it stops functioning as a interrupt counter, and if I don't then I'll
> > probably break assumptions about jiffies being a time unit. So I'm not
> > sure which is the easiest path to go until all the users of jiffies are
> > audited for intent.
>
> Really? Who counts interrupts??? The timer code treats jiffies as a unit of
> time. You will need to rewrite that to make it otherwise.

Ug. I'm thin on time this week, so I was hoping to save this discussion
for later, but I guess we can get into it now.

Well, assuming timer interrupts actually occur HZ times a second, yes
one could (and in current practice, one does) implicitly interpret jiffies
as being a valid notion of time. However, with SMIs, bad drivers that
disable interrupts for too long, and virtualization, the reality is that
the assumption doesn't hold.

We do have the lost-ticks compensation code that tries to help this, but
that conflicts with some virtualization implementations. Suspend/resume
tries to compensate jiffies for ticks missed over time suspended, but
I'm not sure how accurate it really is (additionally, looking at it now,
it assumes jiffies is only 32bits).

Add to that the whole "jiffies doesn't really increment at HZ, but at
ACTHZ" confusion, plus bad drivers that assume HZ=100, and we get a fair
amount of trouble stemming from folks using jiffies as a time value,
because in reality it is just an interrupt counter.

So now, if new timeofday code emulates jiffies, we have to decide if it
emulates jiffies at HZ or ACTHZ? Also there could be issues with jiffies
possibly jittering from it being incremented every tick and then set to
the proper time when the timekeeping code runs.

I'm not sure which is the best way to go, but it sounds like emulating
it is probably the easiest. I just deferred the question with a comment
until now because it's not completely obvious. Any suggestions on the
above questions? (I'm guessing the answers are: use ACTHZ, and the jitter
won't hurt that badly.)
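
To put numbers on the HZ versus ACTHZ question, here is a minimal user-space
sketch. It assumes the i386 PIT input clock of 1193182 Hz and a divisor
rounded to the nearest integer; the function names are made up for
illustration and are not kernel interfaces.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: input clock of the i386 PIT, in Hz. */
#define PIT_TICK_RATE 1193182ULL

/* Divisor programmed into the timer for a requested HZ, rounded to nearest. */
static uint64_t pit_latch(uint64_t hz)
{
	assert(hz > 0);
	return (PIT_TICK_RATE + hz / 2) / hz;
}

/* Real interrupt period in nanoseconds (truncated), not the nominal 1e9/HZ. */
static uint64_t actual_tick_ns(uint64_t hz)
{
	return (1000000000ULL * pit_latch(hz)) / PIT_TICK_RATE;
}

/*
 * Error, in parts per million, of treating one tick as exactly 1/HZ.
 * Positive: real ticks are shorter than nominal, so jiffies-as-time runs fast.
 */
static int64_t tick_error_ppm(uint64_t hz)
{
	int64_t nominal_ns = (int64_t)(1000000000ULL / hz);
	int64_t actual_ns = (int64_t)actual_tick_ns(hz);

	return ((nominal_ns - actual_ns) * 1000000) / nominal_ns;
}
```

Under these assumptions HZ=1000 is off by roughly 150 ppm, which is on the
order of 13 seconds per day if jiffies is read as a wall-clock time value.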

> But then you have
> another problem. To correctly function, times need to expire on time (hay how
> bout that) not some time later. To do this we need an interrupt source. To
> this point in time, the jiffies interrupt has been the indication that one or
> more timer may have expired. While we don't need to "count" the interrupts, we
> DO need them to expire the timers AND they need to be on time.

Well, something Nish Aravamudan has been working on is converting the
common users of jiffies (drivers) to start using human time units. These
very well understood units (which avoid HZ/ACTHZ/HZ=100 assumptions) can
then be accurately changed to jiffies (or possibly some other time unit)
internally. It would even be possible for soft-timers to expire based
upon the actual high-res time value, rather than the low-res tick
counter (which is something else Nish has been playing with). When that
occurs we can easily start doing other interesting things that I believe
you've already been working on in your HRT code, such as changing the
timer interrupt frequency dynamically, or working with multiple timer
interrupt sources.
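
A minimal sketch of the "human time units" idea, with hypothetical helper
names (these are not the kernel's conversion routines): callers state
timeouts in milliseconds and the conversion to ticks happens in one place,
so nothing depends on a particular HZ.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: milliseconds to timer ticks at a given tick rate.
 * Rounds up so that a timeout never fires earlier than requested.
 */
static uint64_t ms_to_ticks(uint64_t ms, uint64_t hz)
{
	assert(hz > 0);
	return (ms * hz + 999) / 1000;
}

/* Hypothetical helper: timer ticks back to milliseconds (truncating). */
static uint64_t ticks_to_ms(uint64_t ticks, uint64_t hz)
{
	assert(hz > 0);
	return (ticks * 1000) / hz;
}
```

A driver that hard-codes "10 ticks" gets 100 ms at HZ=100 but only 10 ms at
HZ=1000; a driver that asks for ms_to_ticks(100, hz) gets 100 ms either way.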

So basically, lots of interesting questions and possibilities and I very
much look forward to your input and suggestions.

thanks
-john

2005-03-15 02:35:36

by Albert Cahalan

Subject: Re: [RFC][PATCH] new timeofday core subsystem (v. A3)

On Mon, 2005-03-14 at 12:27 -0800, Matt Mackall wrote:
> On Mon, Mar 14, 2005 at 12:04:07PM -0800, john stultz wrote:
> > > > > > > > +static inline cycle_t read_timesource(struct timesource_t* ts)
> > > > > > > > +{
> > > > > > > > + switch (ts->type) {
> > > > > > > > + case TIMESOURCE_MMIO_32:
> > > > > > > > + return (cycle_t)readl(ts->mmio_ptr);
> > > > > > > > + case TIMESOURCE_MMIO_64:
> > > > > > > > + return (cycle_t)readq(ts->mmio_ptr);
> > > > > > > > + case TIMESOURCE_CYCLES:
> > > > > > > > + return (cycle_t)get_cycles();
> > > > > > > > + default:/* case: TIMESOURCE_FUNCTION */
> > > > > > > > + return ts->read_fnct();
> > > > > > > > + }
> > > > > > > > +}
> > > Well where we'd read an MMIO address, we'd simply set read_fnct to
> > > generic_timesource_mmio32 or so. And that function just does the read.
> > > So both that function and read_timesource become one-liners and we
> > > drop the conditional branches in the switch.
> >
> > However the vsyscall/fsyscall bits cannot call in-kernel functions (as
> > they execute in userspace or a sudo-userspace). As it stands now in my
> > design TIMESOURCE_FUNCTION timesources will not be usable for
> > vsyscall/fsyscall implementations, so I'm not sure if that's doable.
> >
> > I'd be interested you've got a way around that.
>
> We can either stick all the generic mmio timer functions in the
> vsyscall page (they're tiny) or leave the vsyscall using type/ptr but
> have the kernel internally use only the function pointer. Someone
> who's more familiar with the vsyscall timer code should chime in here.

When the vsyscall page is created, copy the one needed function
into it. The kernel is already self-modifying in many places; this
is nothing new.
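
For reference, the two dispatch styles being debated can be modeled in user
space roughly as follows. This is only a sketch: plain memory stands in for
the MMIO register, the type and function names are invented, and the read
callback is assumed to take the timesource as an argument (the quoted patch
calls read_fnct() with no arguments).

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t cycle_t;

enum ts_type { TS_MMIO_32, TS_MMIO_64, TS_FUNCTION };

struct timesource {
	enum ts_type type;
	const volatile void *mmio_ptr;  /* counter register; plain memory here */
	cycle_t (*read_fnct)(const struct timesource *ts);
};

/* Style 1: switch on a type tag. MMIO sources need no indirect call. */
static cycle_t read_by_type(const struct timesource *ts)
{
	switch (ts->type) {
	case TS_MMIO_32:
		return *(const volatile uint32_t *)ts->mmio_ptr;
	case TS_MMIO_64:
		return *(const volatile uint64_t *)ts->mmio_ptr;
	default: /* TS_FUNCTION */
		assert(ts->read_fnct);
		return ts->read_fnct(ts);
	}
}

/* Style 2: generic MMIO helpers installed as the function pointer. */
static cycle_t generic_mmio32(const struct timesource *ts)
{
	return *(const volatile uint32_t *)ts->mmio_ptr;
}

static cycle_t generic_mmio64(const struct timesource *ts)
{
	return *(const volatile uint64_t *)ts->mmio_ptr;
}

/* With style 2 the reader collapses to a single indirect call. */
static cycle_t read_by_fnct(const struct timesource *ts)
{
	return ts->read_fnct(ts);
}
```

Style 1 keeps the type and pointer visible as data, which is what a
vsyscall/fsyscall reader needs; style 2 is shorter but hides the access
behind a call into kernel text.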



2005-03-15 03:28:09

by Christoph Lameter

Subject: Re: [RFC][PATCH] new timeofday core subsystem (v. A3)

On Mon, 14 Mar 2005, Albert Cahalan wrote:

> When the vsyscall page is created, copy the one needed function
> into it. The kernel is already self-modifying in many places; this
> is nothing new.

AFAIK this will only work on ia32 and x86_64, and definitely not
on ia64. Who knows about the other platforms ....

2005-03-15 05:40:22

by Christoph Lameter

Subject: Re: [RFC][PATCH] new timeofday core subsystem (v. A3)

Note that similarities exist between the posix clock and the time sources.
Will all time sources be exportable as posix clocks?


2005-03-15 06:11:27

by Christoph Lameter

Subject: Re: [RFC][PATCH] new timeofday core subsystem (v. A3)

On Fri, 11 Mar 2005, john stultz wrote:

> +/* cyc2ns():
> + * Uses the timesource and ntp ajdustment interval to
> + * convert cycle_ts to nanoseconds.
> + * If rem is not null, it stores the remainder of the
> + * calculation there.
> + *
> + */

This function is called in critical paths and it would be very important
to optimize it further.

> +static inline nsec_t cyc2ns(struct timesource_t* ts, int ntp_adj, cycle_t cycles, cycle_t* rem)
> +{
> + u64 ret;
> + ret = (u64)cycles;
> + ret *= (ts->mult + ntp_adj);

This only changes when ntp_adj changes. Maybe maintain the sum separately?

> + if (unlikely(rem)) {
> + /* XXX clean this up later!
> + * buf for now relax, we only calc
> + * remainders at interrupt time
> + */
> + u64 remainder = ret & ((1 << ts->shift) -1);
> + do_div(remainder, ts->mult);
> + *rem = remainder;

IA64 does not do remainder processing (maybe I just do not understand
this...) but this seems to be not necessary if one uses 64 bit values that
are properly shifted?

> + }
> + ret >>= ts->shift;
> + return (nsec_t)ret;
> +}

The whole function could simply be:

#define cyc2ns(cycles, ts) (((cycles) * (ts)->current_factor) >> (ts)->shift)
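
A user-space sketch of that suggestion, assuming a current_factor field that
caches mult + ntp_adj and is refreshed only when the NTP adjustment changes.
The helper names ts_set_ntp_adj() and ts_calc_mult() are invented here and
are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

struct timesource {
	uint32_t mult;           /* ns per cycle, scaled by 2^shift */
	uint32_t shift;
	uint64_t current_factor; /* cached mult + ntp_adj */
};

/* Hypothetical: scaled multiplier for a counter running at freq_hz. */
static uint32_t ts_calc_mult(uint64_t freq_hz, uint32_t shift)
{
	assert(freq_hz > 0 && shift < 32);
	return (uint32_t)((1000000000ULL << shift) / freq_hz);
}

/* Hypothetical: slow path, run only when the NTP adjustment changes. */
static void ts_set_ntp_adj(struct timesource *ts, int32_t ntp_adj)
{
	assert((int64_t)ts->mult + ntp_adj > 0);
	ts->current_factor = (uint64_t)((int64_t)ts->mult + ntp_adj);
}

/* Fast path: one multiply and one shift, no add and no remainder. */
static uint64_t cyc2ns(const struct timesource *ts, uint64_t cycles)
{
	return (cycles * ts->current_factor) >> ts->shift;
}
```

Note that cycles * current_factor can overflow 64 bits for large cycle
counts, so a conversion like this is only safe over short intervals.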

2005-03-15 15:43:27

by Albert Cahalan

Subject: Re: [RFC][PATCH] new timeofday core subsystem (v. A3)

On Mon, 2005-03-14 at 19:22 -0800, Christoph Lameter wrote:
> On Mon, 14 Mar 2005, Albert Cahalan wrote:
>
> > When the vsyscall page is created, copy the one needed function
> > into it. The kernel is already self-modifying in many places; this
> > is nothing new.
>
> AFAIK this will only works on ia32 and x86_64 and not definitely not
> on ia64. Who knows about the other platforms ....

I'll bet it does work fine on IA-64. If it didn't, you would
be unable to load the kernel or load an executable.

I know it works for PowerPC. You'll need an isync instruction
of course. You may also want a sync instruction and some code
to invalidate the cache.

Setting up the page content should be a 1-time operation done
at boot. Check your processor manuals as needed.


2005-03-15 15:53:05

by Chris Friesen

Subject: Re: [RFC][PATCH] new timeofday core subsystem (v. A3)

Albert Cahalan wrote:

> I know it works for PowerPC. You'll need an isync instruction
> of course. You may also want a sync instruction and some code
> to invalidate the cache.

For PPC you'll want to flush the dcache, then invalidate the icache.
This will ensure that it works on all processors.

Chris

2005-03-15 18:28:57

by john stultz

Subject: Re: [RFC][PATCH] new timeofday core subsystem (v. A3)

On Mon, 2005-03-14 at 21:37 -0800, Christoph Lameter wrote:
> Note that similarities exist between the posix clock and the time sources.
> Will all time sources be exportable as posix clocks?

At this point I'm not familiar enough with the posix clocks interface to
say, although it's probably outside the scope of the initial timeofday
rework.

Do you have a link that might explain the posix clocks spec and its
intent?

thanks
-john

2005-03-15 22:14:17

by George Anzinger

Subject: Re: [RFC][PATCH] new timeofday core subsystem (v. A3)

john stultz wrote:
> On Mon, 2005-03-14 at 21:37 -0800, Christoph Lameter wrote:
>
>>Note that similarities exist between the posix clock and the time sources.
>>Will all time sources be exportable as posix clocks?
>
>
> At this point I'm not familiar enough with the posix clocks interface to
> say, although its probably outside the scope of the initial timeofday
> rework.

I do think we need to consider the needs of that subsystem. Clock-wise, it
makes a monotonic and a real time clock available to the user. The real time
clock is just a timespec version of the timeval gettimeofday clock. At the
current time, the monotonic clock is the real time clock plus wall_to_monotonic.
All that is rather simple and straightforward, and I don't recommend adding
any other clocks unless there is a real need.

The interesting thing is that the posix timers are based on the posix clocks,
which are based on wall_clock, and the jiffies clock, which is what runs the
timers. In order to make sense of timer requests it is necessary to
atomically grab all three clocks (i.e. wall_clock aka gettimeofday,
wall_to_monotonic, and jiffies with the jiffies offset). The code can then
figure out when a timer needs to expire in jiffies time in order to expire at a
given wall or monotonic time. Currently the xtime_lock sequence lock is used to
do this.
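
A user-space model of such an atomic snapshot, written with C11 atomics
rather than the kernel's seqlock primitives. The structure layout and
function names are assumptions made for the example, and a single writer is
assumed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

struct time_state {
	atomic_uint seq;                  /* odd while an update is in progress */
	_Atomic int64_t wall_ns;
	_Atomic int64_t wall_to_monotonic_ns;
	_Atomic uint64_t jiffies;
};

struct time_snapshot {
	int64_t wall_ns;
	int64_t wall_to_monotonic_ns;
	uint64_t jiffies;
};

static void time_state_init(struct time_state *s)
{
	atomic_init(&s->seq, 0);
	atomic_init(&s->wall_ns, 0);
	atomic_init(&s->wall_to_monotonic_ns, 0);
	atomic_init(&s->jiffies, 0);
}

/* Writer: bump the sequence to odd, update all values, bump it to even. */
static void time_state_update(struct time_state *s, int64_t wall_ns,
			      int64_t w2m_ns, uint64_t jiffies)
{
	assert((atomic_load(&s->seq) & 1u) == 0); /* single writer assumed */
	atomic_fetch_add(&s->seq, 1);
	atomic_store(&s->wall_ns, wall_ns);
	atomic_store(&s->wall_to_monotonic_ns, w2m_ns);
	atomic_store(&s->jiffies, jiffies);
	atomic_fetch_add(&s->seq, 1);
}

/* Reader: retry until all three values were read under one even sequence. */
static struct time_snapshot time_state_read(struct time_state *s)
{
	struct time_snapshot snap;
	unsigned int before, after;

	do {
		before = atomic_load(&s->seq);
		snap.wall_ns = atomic_load(&s->wall_ns);
		snap.wall_to_monotonic_ns = atomic_load(&s->wall_to_monotonic_ns);
		snap.jiffies = atomic_load(&s->jiffies);
		after = atomic_load(&s->seq);
	} while ((before & 1u) || before != after);

	return snap;
}
```

The reader never blocks the writer; it simply retries if an update raced
with it, which is the property the timer code relies on.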

Another issue that posix timers brings forward is the need to know when the
clock is set. This is needed to cause timers that were requested to expire at
some absolute wall_time to do so even if time is set while they are running. A
word on how this is done is in order...

Since the processing of a clock set by the posix timers code may, in fact, allow
the time to be set more than once before the affected timers are adjusted (or
rather, to avoid the locking rat's nest that not allowing this would cause), the
wall_to_monotonic value is exploited. In particular, a clock setting changes
this value by the exact amount that time was adjusted. So, each posix timer
carries the value of wall_to_monotonic that was in use when the timer was
started. The clock_was_set code uses this to compute the clock movement and
thus the adjustment needed to make the timer expire at the right time.
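
The bookkeeping described above can be sketched in user space like this. It
is a simplified model in nanoseconds with invented names, assuming
monotonic = wall + wall_to_monotonic; it is not the posix-timers code.

```c
#include <assert.h>
#include <stdint.h>

struct clocks {
	int64_t wall_ns;
	int64_t wall_to_monotonic_ns;
};

struct abs_timer {
	int64_t expires_mono_ns;  /* expiry on the monotonic timeline */
	int64_t w2m_at_start_ns;  /* wall_to_monotonic when armed or last fixed */
};

/* Setting the clock moves wall_to_monotonic by exactly the opposite amount. */
static void clock_set(struct clocks *c, int64_t new_wall_ns)
{
	int64_t delta = new_wall_ns - c->wall_ns;

	c->wall_ns = new_wall_ns;
	c->wall_to_monotonic_ns -= delta; /* monotonic time stays continuous */
}

/* Arm a timer for an absolute wall-clock time. */
static void timer_arm_abs(struct abs_timer *t, const struct clocks *c,
			  int64_t wall_expiry_ns)
{
	assert(t && c);
	t->w2m_at_start_ns = c->wall_to_monotonic_ns;
	t->expires_mono_ns = wall_expiry_ns + c->wall_to_monotonic_ns;
}

/*
 * Called some time after one or more clock settings: the difference between
 * the current and the remembered wall_to_monotonic is the total movement.
 */
static void timer_clock_was_set(struct abs_timer *t, const struct clocks *c)
{
	int64_t moved = c->wall_to_monotonic_ns - t->w2m_at_start_ns;

	t->expires_mono_ns += moved;
	t->w2m_at_start_ns = c->wall_to_monotonic_ns;
}
```

Because the timer stores a snapshot rather than a running delta, two clock
settings in a row are corrected by a single timer_clock_was_set() call.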

What this translates to in the new code is a) the need for a way to atomically
get all the key times (wall, monotonic, jiffies) and b) access to a value that
will allow it to compute the amount of time a clock set, or a series of clock
settings, changed time by. Of course, it also needs the clock_was_set() notify
call.
>
> Do you have a link that might explain the posix clocks spec and its
> intent?

Well, there is my signature :) Really, on the high-res-timers project site you
want to download the support patch. In there, among other things, is a set of
man pages on posix clocks & timers. The patch applies to any kernel and just
adds a new set of directories off of Documentation.
--
George Anzinger [email protected]
High-res-timers: http://sourceforge.net/projects/high-res-timers/

2005-03-15 23:06:55

by George Anzinger

Subject: Re: [topic change] jiffies as a time value

john stultz wrote:
> On Mon, 2005-03-14 at 15:40 -0800, George Anzinger wrote:
>
>>john stultz wrote:
>>
>>>On Sat, 2005-03-12 at 16:49 -0800, Matt Mackall wrote:
>>>
>>>>>+ /* finally, update legacy time values */
>>>>>+ write_seqlock_irqsave(&xtime_lock, x_flags);
>>>>>+ xtime = ns2timespec(system_time + wall_time_offset);
>>>>>+ wall_to_monotonic = ns2timespec(wall_time_offset);
>>>>>+ wall_to_monotonic.tv_sec = -wall_to_monotonic.tv_sec;
>>>>>+ wall_to_monotonic.tv_nsec = -wall_to_monotonic.tv_nsec;
>>>>>+ /* XXX - should jiffies be updated here? */
>>>>
>>>>Excellent question.
>>>
>>>Indeed. Currently jiffies is used as both a interrupt counter and a
>>>time unit, and I'm trying make it just the former. If I emulate it then
>>>it stops functioning as a interrupt counter, and if I don't then I'll
>>>probably break assumptions about jiffies being a time unit. So I'm not
>>>sure which is the easiest path to go until all the users of jiffies are
>>>audited for intent.
>>
>>Really? Who counts interrupts??? The timer code treats jiffies as a unit of
>>time. You will need to rewrite that to make it otherwise.
>
>
> Ug. I'm thin on time this week, so I was hoping to save this discussion
> for later, but I guess we can get into it now.
>
> Well, assuming timer interrupts actually occur HZ times a second, yes
> one could (and current practice, one does) implicitly interpret jiffies
> as being a valid notion of time. However with SMIs, bad drivers that
> disable interrupts for too long, and virtualization the reality is that
> that assumption doesn't hold.
>
> We do have the lost-ticks compensation code that tries to help this, but
> that conflicts with some virtualization implementations. Suspend/resume
> tries to compensate jiffies for ticks missed over time suspended, but
> I'm not sure how accurate it really is (additionally, looking at it now,
> it assumes jiffies is only 32bits).
>
> Adding to that, the whole jiffies doesn't really increment at HZ, but
> ACTHZ confusion, or bad drivers that assume HZ=100, we get a fair amount
> of trouble stemming from folks using jiffies as a time value. Because
> in reality, it is just a interrupt counter.

Well, currently, in x86 systems it causes wall clock to advance a very well
defined amount. That it is not exactly 1/HZ is something we need to live with...
>
> So now, if new timeofday code emulates jiffies, we have to decide if it
> emulates jiffies at HZ or ACTHZ? Also there could be issues with jiffies
> possibly jittering from it being incremented every tick and then set to
> the proper time when the timekeeping code runs.

I think you're overlooking timers. We have a given resolution for timers and some
code, at least, expects timers to run with that resolution. This REQUIRES
interrupts at resolution frequency. We can argue about what that interrupt
event is called (currently a jiffies interrupt) and disparage the fact that
hardware cannot give us "nice" numbers for the resolution, but we do need the
interrupts. That there are bad places in the code where interrupts are delayed
is not really important in this discussion. For what it's worth, the RT patch
Ingo is working on is getting latencies down into the 10s of microseconds region.

We also need, IMNSHO, to recognize that, at least with some hardware, that
interrupt IS in fact the clock and is the only reasonable way we have of reading
it. This is true, for example, on the x86. The TSC we use as a fill-in
between interrupts is not stable in the long term and should only be used to
interpolate over 1 to 10 ticks or so.
>
> I'm not sure which is the best way to go, but it sounds that emulating
> it is probably the easiest. I just deferred the question with a comment
> until now because its not completely obvious. Any suggestions on the
> above questions (I'm guessing the answers are: use ACTHZ, and the jitter
> won't hurt that bad).
>
>
>>But then you have
>>another problem. To correctly function, times need to expire on time (hay how
>>bout that) not some time later. To do this we need an interrupt source. To
>>this point in time, the jiffies interrupt has been the indication that one or
>>more timer may have expired. While we don't need to "count" the interrupts, we
>>DO need them to expire the timers AND they need to be on time.
>
>
> Well, something Nish Aravamudan has been working on is converting the
> common users of jiffies (drivers) to start using human time units. These
> very well understood units (which avoid HZ/ACTHZ/HZ=100 assumptions) can
> then be accurately changed to jiffies (or possibly some other time unit)
> internally. It would even be possible for soft-timers to expire based
> upon the actual high-res time value, rather then the low-res tick-
> counter(which is something else Nish has been playing with). When that
> occurs we can easily start doing other interesting things that I believe
> you've already been working on in your HRT code, such as changing the
> timer interrupt frequency dynamically, or working with multiple timer
> interrupt sources.

This is also what is done in things like posix timers. The fact remains that,
at least in the posix timers case, the resolution is exported to the user and
implies certain things. I am not sure we are explicitly exporting the
resolution in the kernel, but, down under the code, there is a resolution AND it
impacts what one should expect of timers.
>
> So basically, lots of interesting questions and possibilities and I very
> much look forward to your input and suggestions.
>
It may help to understand that MOST internal timers (i.e. timers the kernel
sets) never expire. They are canceled by the caller because they were really
"dead man" timers, i.e. "it better happen by this time or we are hurting".

Users, on the other hand, for the most part set up timers to allow bits of code
to run either periodically or at some specified time. These are the folks who
care about latency and being on time.

Users, of course, also use "dead man" timers such as, for example, the
timeout on select. By their nature, "dead man" timers usually do not have
strict on-time requirements.
--
George Anzinger [email protected]
High-res-timers: http://sourceforge.net/projects/high-res-timers/

2005-03-15 23:13:15

by Pavel Machek

Subject: Re: [RFC][PATCH] new timeofday arch specific hooks (v. A3)

Hi!

> diff -Nru a/arch/i386/kernel/apm.c b/arch/i386/kernel/apm.c
> --- a/arch/i386/kernel/apm.c 2005-03-11 17:02:30 -08:00
> +++ b/arch/i386/kernel/apm.c 2005-03-11 17:02:30 -08:00
> @@ -224,6 +224,7 @@
> #include <linux/smp_lock.h>
> #include <linux/dmi.h>
> #include <linux/suspend.h>
> +#include <linux/timeofday.h>
>
> #include <asm/system.h>
> #include <asm/uaccess.h>
> @@ -1204,6 +1205,7 @@
> device_suspend(PMSG_SUSPEND);
> device_power_down(PMSG_SUSPEND);
>
> + timeofday_suspend_hook();
> /* serialize with the timer interrupt */
> write_seqlock_irq(&xtime_lock);
>

Could you just register timeofday subsystem as a system device? Then
device_power_down will call you automagically..... And you'll not have
to modify apm, acpi, swsusp, ppc suspend, arm suspend, ...
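
The design point can be illustrated with a small user-space model. Nothing
here is the real driver-core API; the registry, the callback signatures and
all names are invented. The idea is that the generic power-down path walks
registered devices, so each suspend implementation needs no timeofday call.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a system device with power callbacks. */
struct sys_dev_model {
	const char *name;
	int (*suspend)(void);
	int (*resume)(void);
};

#define MAX_SYS_DEVS 8
static struct sys_dev_model *sys_devs[MAX_SYS_DEVS];
static size_t nr_sys_devs;

static int sys_dev_register(struct sys_dev_model *dev)
{
	assert(dev && dev->suspend && dev->resume);
	if (nr_sys_devs == MAX_SYS_DEVS)
		return -1;
	sys_devs[nr_sys_devs++] = dev;
	return 0;
}

/* Generic suspend path: reverse registration order, stop on first error. */
static int sys_devs_power_down(void)
{
	for (size_t i = nr_sys_devs; i-- > 0; ) {
		int err = sys_devs[i]->suspend();
		if (err)
			return err;
	}
	return 0;
}

/* Generic resume path: registration order. */
static int sys_devs_power_up(void)
{
	for (size_t i = 0; i < nr_sys_devs; i++) {
		int err = sys_devs[i]->resume();
		if (err)
			return err;
	}
	return 0;
}

/* The timeofday subsystem then only supplies its two callbacks. */
static int timeofday_suspended;

static int timeofday_suspend(void) { timeofday_suspended = 1; return 0; }
static int timeofday_resume(void)  { timeofday_suspended = 0; return 0; }

static struct sys_dev_model timeofday_dev = {
	"timeofday", timeofday_suspend, timeofday_resume
};
```

With this shape, apm, acpi, swsusp and the rest only ever call the generic
power-down and power-up routines.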

Pavel

--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

2005-03-15 23:45:36

by john stultz

Subject: Re: [RFC][PATCH] new timeofday arch specific hooks (v. A3)

On Tue, 2005-03-15 at 23:59 +0100, Pavel Machek wrote:
> > diff -Nru a/arch/i386/kernel/apm.c b/arch/i386/kernel/apm.c
> > --- a/arch/i386/kernel/apm.c 2005-03-11 17:02:30 -08:00
> > +++ b/arch/i386/kernel/apm.c 2005-03-11 17:02:30 -08:00
> > @@ -224,6 +224,7 @@
> > #include <linux/smp_lock.h>
> > #include <linux/dmi.h>
> > #include <linux/suspend.h>
> > +#include <linux/timeofday.h>
> >
> > #include <asm/system.h>
> > #include <asm/uaccess.h>
> > @@ -1204,6 +1205,7 @@
> > device_suspend(PMSG_SUSPEND);
> > device_power_down(PMSG_SUSPEND);
> >
> > + timeofday_suspend_hook();
> > /* serialize with the timer interrupt */
> > write_seqlock_irq(&xtime_lock);
> >
>
> Could you just register timeofday subsystem as a system device? Then
> device_power_down will call you automagically..... And you'll not have
> to modify apm, acpi, swsusp, ppc suspend, arm suspend, ...

That may very well be the right way to go. At the moment I'm just very
hesitant about making any user-visible changes.

What is the impact if a new system device name is created and then I
later change it? How stable is that interface supposed to be?

thanks
-john

2005-03-15 23:48:58

by Pavel Machek

Subject: Re: [RFC][PATCH] new timeofday arch specific hooks (v. A3)

On Út 15-03-05 15:42:09, john stultz wrote:
> On Tue, 2005-03-15 at 23:59 +0100, Pavel Machek wrote:
> > > diff -Nru a/arch/i386/kernel/apm.c b/arch/i386/kernel/apm.c
> > > --- a/arch/i386/kernel/apm.c 2005-03-11 17:02:30 -08:00
> > > +++ b/arch/i386/kernel/apm.c 2005-03-11 17:02:30 -08:00
> > > @@ -224,6 +224,7 @@
> > > #include <linux/smp_lock.h>
> > > #include <linux/dmi.h>
> > > #include <linux/suspend.h>
> > > +#include <linux/timeofday.h>
> > >
> > > #include <asm/system.h>
> > > #include <asm/uaccess.h>
> > > @@ -1204,6 +1205,7 @@
> > > device_suspend(PMSG_SUSPEND);
> > > device_power_down(PMSG_SUSPEND);
> > >
> > > + timeofday_suspend_hook();
> > > /* serialize with the timer interrupt */
> > > write_seqlock_irq(&xtime_lock);
> > >
> >
> > Could you just register timeofday subsystem as a system device? Then
> > device_power_down will call you automagically..... And you'll not have
> > to modify apm, acpi, swsusp, ppc suspend, arm suspend, ...
>
> That may very well be the right way to go. At the moment I'm just very
> hesitant of making any user-visible changes.
>
> What is the impact if a new system device name is created and then I
> later change it? How stable is that interface supposed to be?

Changing its name is okay... your device probably will not have any
user-accessible controls, right?
Pavel
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

2005-03-16 01:45:14

by john stultz

Subject: Re: [RFC][PATCH] new timeofday arch specific hooks (v. A3)

On Wed, 2005-03-16 at 00:44 +0100, Pavel Machek wrote:
> On Út 15-03-05 15:42:09, john stultz wrote:
> > On Tue, 2005-03-15 at 23:59 +0100, Pavel Machek wrote:
> > > > diff -Nru a/arch/i386/kernel/apm.c b/arch/i386/kernel/apm.c
> > > > --- a/arch/i386/kernel/apm.c 2005-03-11 17:02:30 -08:00
> > > > +++ b/arch/i386/kernel/apm.c 2005-03-11 17:02:30 -08:00
> > > > @@ -224,6 +224,7 @@
> > > > #include <linux/smp_lock.h>
> > > > #include <linux/dmi.h>
> > > > #include <linux/suspend.h>
> > > > +#include <linux/timeofday.h>
> > > >
> > > > #include <asm/system.h>
> > > > #include <asm/uaccess.h>
> > > > @@ -1204,6 +1205,7 @@
> > > > device_suspend(PMSG_SUSPEND);
> > > > device_power_down(PMSG_SUSPEND);
> > > >
> > > > + timeofday_suspend_hook();
> > > > /* serialize with the timer interrupt */
> > > > write_seqlock_irq(&xtime_lock);
> > > >
> > >
> > > Could you just register timeofday subsystem as a system device? Then
> > > device_power_down will call you automagically..... And you'll not have
> > > to modify apm, acpi, swsusp, ppc suspend, arm suspend, ...
> >
> > That may very well be the right way to go. At the moment I'm just very
> > hesitant of making any user-visible changes.
> >
> > What is the impact if a new system device name is created and then I
> > later change it? How stable is that interface supposed to be?
>
> Changing its name is okay... your device probably will not have any
> user-accessible controls, right?

Well, at some point I want to have some way for the user to be able to
select which timesource they want to be used. Similar to the current
"clock=" boot option override, there would be some sort of sysfs
timesource entry that users could "echo tsc" or whatever into in order
to force the system to use the tsc timesource at runtime.

This, however, would be separate from the timeofday suspend/resume hooks,
so it's probably not an issue. Let me know if I'm wrong.
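
As a rough sketch of what such an override could look like (user-space
model; the priority field, the example values and the function name are
assumptions, not the patch's interface): pick the named timesource if it
exists, otherwise fall back to the highest-priority one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct timesource {
	const char *name;
	int priority;   /* assumed: higher is preferred */
};

/*
 * Return the timesource matching 'override' if one is given and found,
 * otherwise the highest-priority entry. Returns NULL for an empty list.
 */
static const struct timesource *
select_timesource(const struct timesource *list, size_t n, const char *override)
{
	const struct timesource *best = NULL;

	assert(list || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (override && strcmp(list[i].name, override) == 0)
			return &list[i];
		if (!best || list[i].priority > best->priority)
			best = &list[i];
	}
	return best;
}
```

An unknown name falls through to the default choice rather than failing,
which mirrors how a mistyped boot option would be expected to behave.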

thanks!
-john


2005-03-16 03:57:21

by john stultz

Subject: Re: [topic change] jiffies as a time value

George,
I'm still digesting your mail. For now I'll just answer the easy bits,
and I'll owe you a better reply once I get all of this absorbed.

On Tue, 2005-03-15 at 15:01 -0800, George Anzinger wrote:
> We also need, IMNSHO to recognize that, at lest with some hardware, that
> interrupt IS in fact the clock and is the only reasonable way we have of reading
> it. This is true, for example, on the x86. The TSC we use as a fill in for
> between interrupts is not stable in the long term and should only be used to
> interpolate over 1 to 10 ticks or so.

Yep, the TSC is a terrible time source, but everyone still loves it! It's
so fast! However, since not every timesource is so bad, I don't feel we
need to punish everyone with the bugs interpolation can cause.

So my plan is an "interpolated timesource", which will fit into my
current framework without any changes. Basically it will work as the
current tsc/tick code does, but just in its own timesource driver, so
the core code stays pretty and sane. It will still preserve some of the
issues we see now with the interpolated time code, but since we're in a
more flexible environment, we might be able to more easily try new
workarounds.
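
A minimal user-space model of what an interpolated timesource might do,
assuming a tick-driven base time plus a fast cycle counter for the fraction
in between. All names and the clamping policy are illustrative, not taken
from the patch.

```c
#include <assert.h>
#include <stdint.h>

struct interp_timesource {
	uint64_t tick_ns;         /* nominal length of one tick */
	uint64_t base_ns;         /* time accumulated as of the last tick */
	uint64_t cycles_at_tick;  /* fast counter value latched at that tick */
	uint32_t mult;            /* cycles to ns: (cycles * mult) >> shift */
	uint32_t shift;
};

/* Called from the timer interrupt: advance the base, latch the counter. */
static void interp_tick(struct interp_timesource *ts, uint64_t cycles_now)
{
	ts->base_ns += ts->tick_ns;
	ts->cycles_at_tick = cycles_now;
}

/*
 * Read the time: base plus the interpolated fraction of the current tick.
 * The fraction is clamped below one tick so that a late or lost interrupt
 * cannot push time past what the next tick will report.
 */
static uint64_t interp_read_ns(const struct interp_timesource *ts,
			       uint64_t cycles_now)
{
	uint64_t delta_ns;

	assert(ts->tick_ns > 0);
	delta_ns = ((cycles_now - ts->cycles_at_tick) * ts->mult) >> ts->shift;
	if (delta_ns >= ts->tick_ns)
		delta_ns = ts->tick_ns - 1;
	return ts->base_ns + delta_ns;
}
```

The clamp is also where the familiar problems live: when ticks are lost,
time stalls at the end of the tick instead of advancing.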

thanks
-john

2005-03-16 10:53:49

by Pavel Machek

Subject: Re: [RFC][PATCH] new timeofday arch specific hooks (v. A3)


> > Changing its name is okay... your device probably will not have any
> > user-accessible controls, right?
>
> Well, at some point I want to have some way for the user to be able to
> select which timesource they want to be used. Similar to the current
> "clock=" boot option override, there would be some sort of sysfs
> timesource entry that users could "echo tsc" or whatever into in order
> to force the system to use the tsc timesource at runtime.
>
> This however would be separate from the timeofday suspend/resume hooks,
> so its probably not an issue. Let me know if I'm wrong.

No, it should not be a problem. And yes, you could probably use same
sysfs code to select timesource... Just make sure that name is stable
before you publish that interface too much.
Pavel
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

2005-03-17 08:18:16

by Ulrich Windl

Subject: Re: [RFC][PATCH] new timeofday core subsystem (v. A3)

On 15 Mar 2005 at 10:25, john stultz wrote:

> On Mon, 2005-03-14 at 21:37 -0800, Christoph Lameter wrote:
> > Note that similarities exist between the posix clock and the time sources.
> > Will all time sources be exportable as posix clocks?
>
> At this point I'm not familiar enough with the posix clocks interface to
> say, although its probably outside the scope of the initial timeofday
> rework.

I'd be happy to see the required POSIX clocks at nanosecond resolution for the
initial version. Add-Ons may follow later.

>
> Do you have a link that might explain the posix clocks spec and its
> intent?

There's a book called "POSIX.4: Programming for the Real World" by Bill
Gallmeister (I think).

Regards,
Ulrich

2005-03-17 16:56:00

by Russell King

Subject: Re: [RFC][PATCH] new timeofday core subsystem (v. A3)

On Tue, Mar 15, 2005 at 10:23:54AM -0500, Albert Cahalan wrote:
> On Mon, 2005-03-14 at 19:22 -0800, Christoph Lameter wrote:
> > On Mon, 14 Mar 2005, Albert Cahalan wrote:
> >
> > > When the vsyscall page is created, copy the one needed function
> > > into it. The kernel is already self-modifying in many places; this
> > > is nothing new.
> >
> > AFAIK this will only works on ia32 and x86_64 and not definitely not
> > on ia64. Who knows about the other platforms ....
>
> I'll bet it does work fine on IA-64. If it didn't, you would
> be unable to load the kernel or load an executable.
>
> I know it works for PowerPC. You'll need an isync instruction
> of course. You may also want a sync instruction and some code
> to invalidate the cache.
>
> Setting up the page content should be a 1-time operation done
> at boot. Check your processor manuals as needed.

Won't work on ARM. We have XIP kernels, which prevents the use of
self-modifying code.

--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 Serial core

2005-03-17 20:04:37

by Albert Cahalan

Subject: Re: [RFC][PATCH] new timeofday core subsystem (v. A3)

On Thu, 2005-03-17 at 16:55 +0000, Russell King wrote:
> On Tue, Mar 15, 2005 at 10:23:54AM -0500, Albert Cahalan wrote:
> > On Mon, 2005-03-14 at 19:22 -0800, Christoph Lameter wrote:
> > > On Mon, 14 Mar 2005, Albert Cahalan wrote:
> > >
> > > > When the vsyscall page is created, copy the one needed function
> > > > into it. The kernel is already self-modifying in many places; this
> > > > is nothing new.
> > >
> > > AFAIK this will only works on ia32 and x86_64 and not definitely not
> > > on ia64. Who knows about the other platforms ....
> >
> > I'll bet it does work fine on IA-64. If it didn't, you would
> > be unable to load the kernel or load an executable.
> >
> > I know it works for PowerPC. You'll need an isync instruction
> > of course. You may also want a sync instruction and some code
> > to invalidate the cache.
> >
> > Setting up the page content should be a 1-time operation done
> > at boot. Check your processor manuals as needed.
>
> Won't work on ARM. We have XIP kernels, which prevents the use of
> self-modifying code.

Does the ARM kernel provide a special page of code for
apps to execute? If not, then ARM is irrelevant.

Doesn't ARM always have an MMU? If you have an MMU, then
it is no problem to have one single page of non-XIP code
for this purpose.

Supposing that you do support the vsyscall hack and you don't
have an MMU, you can just place the tiny code fragment on the
stack (or anywhere else) when an exec is performed.

So, as far as I can see, ARM is fully capable of supporting this.


2005-03-17 20:29:35

by Russell King

Subject: Re: [RFC][PATCH] new timeofday core subsystem (v. A3)

On Thu, Mar 17, 2005 at 02:44:57PM -0500, Albert Cahalan wrote:
> Does the ARM kernel provide a special page of code for
> apps to execute? If not, then ARM is irrelevant.

No. However, I was responding to your suggestion that supporting
self modifying code in the kernel is trivial.

> Doesn't ARM always have an MMU? If you have an MMU, then
> it is no problem to have one single page of non-XIP code
> for this purpose.

No. You also have a big misconception about how we map system memory.
We have 1MB mappings, and replacing 1MB of code/data (which would
equate to half a kernel) would completely negate the whole point of
XIP.

> Supposing that you do support the vsyscall hack and you don't
> have an MMU, you can just place the tiny code fragment on the
> stack (or anywhere else) when an exec is performed.
>
> So, as far as I can see, ARM is fully capable of supporting this.

<cough>

--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 Serial core