2005-05-14 00:18:57

by john stultz

Subject: [RFC][PATCH (1/7)] new timeofday subsystem (v A5)

All,
This patch implements the architecture independent portion of the new
time of day subsystem. For a brief description of the rework, see here:
http://lwn.net/Articles/120850/ (Many thanks to the LWN team for that
easy to understand writeup!)

I intend this to be the last RFC release and to submit this patch to
Andrew for testing near the end of this month. So please, if you
have any complaints, suggestions, or blocking issues, let me know.

Included below is timeofday.c (which includes all the time of day
management and accessor functions), ntp.c (which includes the ntp
adjustment code, leapsecond processing, and ntp kernel state machine
code), timesource.c (for timesource specific management functions),
interface definition .h files, the example jiffies timesource (lowest
common denominator time source, mainly for use as example code) and
minimal hooks into arch independent code.
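
As a rough illustration of what a timesource provides, the sketch below mirrors the mult/shift arithmetic used by timesource_khz2mult() and cyc2ns() in the patch, rewritten as plain userspace C. The struct name ts_sketch, the sketch_ prefixes, and the use of stdint types are assumptions made for this example only; the kernel code in the patch is the reference.

```c
#include <stdint.h>

/* Userspace stand-in for the patch's struct timesource_t, trimmed to the
 * two fields the conversion needs (hypothetical name). */
struct ts_sketch {
	uint32_t mult;  /* cycle-to-nanosecond multiplier */
	uint32_t shift; /* power-of-two divisor */
};

/* mult = (1000000 << shift) / khz, following timesource_khz2mult(). */
static uint32_t sketch_khz2mult(uint32_t khz, uint32_t shift)
{
	uint64_t tmp = ((uint64_t)1000000) << shift;
	return (uint32_t)(tmp / khz);
}

/* ns = (cycles * (mult + ntp_adj)) >> shift, following cyc2ns().
 * ntp_adj is expressed in units of 1/2^shift nanoseconds per cycle. */
static uint64_t sketch_cyc2ns(const struct ts_sketch *ts, int ntp_adj,
			      uint64_t cycles)
{
	uint64_t ret = cycles;
	ret *= (uint32_t)(ts->mult + ntp_adj);
	ret >>= ts->shift;
	return ret;
}
```

With a hypothetical 1 GHz counter (1000000 kHz) and a shift of 22, sketch_khz2mult() yields a multiplier of 2^22, so one cycle converts to exactly one nanosecond; a non-zero ntp_adj then speeds up or slows down the converted time by small fractions without touching the hardware.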

The patch does not function without minimal architecture specific hooks
(i386, x86-64, ppc32, ppc64, ia64 and s390 examples to follow), and it
should be able to be applied to a tree without affecting the existing
code.

New in this version:
o clock_was_set calls
o proper suspend/resume sysfs hooks
o boot time timesource override
o warp_clock implementation
o improved jiffies timesource accuracy from using ACTHZ
o a number of minor comment cleanups
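
To show why ACTHZ matters for the jiffies timesource, here is a small userspace sketch of the arithmetic behind NSEC_PER_JIFFY. It assumes the usual kernel definitions of LATCH and SH_DIV from include/linux/jiffies.h, and the i386-style CLOCK_TICK_RATE of 1193182 used in the note below is an example value, not something taken from this patch. The sketch_ function names are hypothetical.

```c
#include <stdint.h>

/* ACTHZ: the actual timer interrupt rate, shifted left by 8 bits.
 * Follows LATCH = (CLOCK_TICK_RATE + HZ/2) / HZ and
 * ACTHZ = SH_DIV(CLOCK_TICK_RATE, LATCH, 8) as assumed above. */
static uint64_t sketch_acthz(uint64_t clock_tick_rate, uint64_t hz)
{
	uint64_t latch = (clock_tick_rate + hz / 2) / hz;
	return ((clock_tick_rate / latch) << 8) +
	       (((clock_tick_rate % latch) << 8) + latch / 2) / latch;
}

/* NSEC_PER_JIFFY = (NSEC_PER_SEC << 8) / ACTHZ, as in jiffies.c. */
static uint64_t sketch_nsec_per_jiffy(uint64_t acthz)
{
	return (((uint64_t)1000000000) << 8) / acthz;
}
```

For a 1193182 Hz timer and HZ=1000, the hardware cannot tick at exactly 1000 Hz, so the sketch gives 999847 ns per jiffy rather than the nominal 1000000 ns. Using the nominal value instead would make the jiffies timesource run roughly 150 ppm off before NTP ever gets involved.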

Items still on the TODO list:
o Continued Testing
o Final cleanup for submission to Andrew.

Once again, I look forward to your comments and feedback.

thanks
-john

linux-2.6.12-rc4_timeofday-core_A5.patch
========================================


Index: drivers/Makefile
===================================================================
--- eed337ef5e9ae7d62caa84b7974a11fddc7f06e0/drivers/Makefile (mode:100644)
+++ d68b09f31fa98801ead715e9281a2e4676b770a5/drivers/Makefile (mode:100644)
@@ -64,3 +64,4 @@
obj-$(CONFIG_BLK_DEV_SGIIOC4) += sn/
obj-y += firmware/
obj-$(CONFIG_CRYPTO) += crypto/
+obj-$(CONFIG_NEWTOD) += timesource/
Index: drivers/char/hangcheck-timer.c
===================================================================
--- eed337ef5e9ae7d62caa84b7974a11fddc7f06e0/drivers/char/hangcheck-timer.c (mode:100644)
+++ d68b09f31fa98801ead715e9281a2e4676b770a5/drivers/char/hangcheck-timer.c (mode:100644)
@@ -49,6 +49,7 @@
#include <linux/delay.h>
#include <asm/uaccess.h>
#include <linux/sysrq.h>
+#include <linux/timeofday.h>


#define VERSION_STR "0.9.0"
@@ -130,8 +131,12 @@
#endif

#ifdef HAVE_MONOTONIC
+#ifndef CONFIG_NEWTOD
extern unsigned long long monotonic_clock(void);
#else
+#define monotonic_clock() do_monotonic_clock()
+#endif
+#else
static inline unsigned long long monotonic_clock(void)
{
# ifdef __s390__
Index: drivers/timesource/Makefile
===================================================================
--- /dev/null (tree:eed337ef5e9ae7d62caa84b7974a11fddc7f06e0)
+++ d68b09f31fa98801ead715e9281a2e4676b770a5/drivers/timesource/Makefile (mode:100644)
@@ -0,0 +1 @@
+obj-y += jiffies.o
Index: drivers/timesource/jiffies.c
===================================================================
--- /dev/null (tree:eed337ef5e9ae7d62caa84b7974a11fddc7f06e0)
+++ d68b09f31fa98801ead715e9281a2e4676b770a5/drivers/timesource/jiffies.c (mode:100644)
@@ -0,0 +1,69 @@
+/***********************************************************************
+* linux/drivers/timesource/jiffies.c
+*
+* This file contains the jiffies based time source.
+*
+* Copyright (C) 2004, 2005 IBM, John Stultz ([email protected])
+*
+* This program is free software; you can redistribute it and/or modify
+* it under the terms of the GNU General Public License as published by
+* the Free Software Foundation; either version 2 of the License, or
+* (at your option) any later version.
+*
+* This program is distributed in the hope that it will be useful,
+* but WITHOUT ANY WARRANTY; without even the implied warranty of
+* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+* GNU General Public License for more details.
+*
+* You should have received a copy of the GNU General Public License
+* along with this program; if not, write to the Free Software
+* Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+*
+************************************************************************/
+#include <linux/timesource.h>
+#include <linux/jiffies.h>
+#include <linux/init.h>
+
+/* The Jiffies based timesource is the lowest common
+ * denominator time source which should function on
+ * all systems. It has the same coarse resolution as
+ * the timer interrupt frequency HZ and it suffers
+ * inaccuracies caused by missed or lost timer
+ * interrupts and the inability for the timer
+ * interrupt hardware to accurately tick at the
+ * requested HZ value. It is also not recommended
+ * for "tick-less" systems.
+ */
+#define NSEC_PER_JIFFY ((((unsigned long long)NSEC_PER_SEC)<<8)/ACTHZ)
+
+/* Since jiffies uses a simple NSEC_PER_JIFFY multiplier
+ * conversion, the .shift value could be zero. However
+ * this would make NTP adjustments impossible as they are
+ * in units of 1/2^.shift. Thus we use JIFFIES_SHIFT to
+ * shift both the numerator and denominator the same
+ * amount, and give ntp adjustments in units of 1/2^10
+ */
+#define JIFFIES_SHIFT 10
+
+static cycle_t jiffies_read(void)
+{
+ cycle_t ret = get_jiffies_64();
+ return ret;
+}
+
+struct timesource_t timesource_jiffies = {
+ .name = "jiffies",
+ .priority = 0, /* lowest priority*/
+ .type = TIMESOURCE_FUNCTION,
+ .read_fnct = jiffies_read,
+ .mask = (cycle_t)-1,
+ .mult = NSEC_PER_JIFFY << JIFFIES_SHIFT, /* See above for details */
+ .shift = JIFFIES_SHIFT,
+};
+
+static int __init init_jiffies_timesource(void)
+{
+ register_timesource(&timesource_jiffies);
+ return 0;
+}
+module_init(init_jiffies_timesource);
Index: include/linux/ntp.h
===================================================================
--- /dev/null (tree:eed337ef5e9ae7d62caa84b7974a11fddc7f06e0)
+++ d68b09f31fa98801ead715e9281a2e4676b770a5/include/linux/ntp.h (mode:100644)
@@ -0,0 +1,20 @@
+/* linux/include/linux/ntp.h
+ *
+ * This file contains the NTP state machine accessor functions.
+ */
+
+#ifndef _LINUX_NTP_H
+#define _LINUX_NTP_H
+#include <linux/types.h>
+#include <linux/time.h>
+#include <linux/timex.h>
+
+/* NTP state machine interfaces */
+nsec_t ntp_scale(nsec_t value);
+int ntp_advance(nsec_t value);
+int ntp_adjtimex(struct timex*);
+int ntp_leapsecond(struct timespec now);
+void ntp_clear(void);
+int get_ntp_status(void);
+
+#endif
Index: include/linux/time.h
===================================================================
--- eed337ef5e9ae7d62caa84b7974a11fddc7f06e0/include/linux/time.h (mode:100644)
+++ d68b09f31fa98801ead715e9281a2e4676b770a5/include/linux/time.h (mode:100644)
@@ -27,6 +27,10 @@

#ifdef __KERNEL__

+/* timeofday base types */
+typedef u64 nsec_t;
+typedef u64 cycle_t;
+
/* Parameters used to convert the timespec values */
#ifndef USEC_PER_SEC
#define USEC_PER_SEC (1000000L)
Index: include/linux/timeofday.h
===================================================================
--- /dev/null (tree:eed337ef5e9ae7d62caa84b7974a11fddc7f06e0)
+++ d68b09f31fa98801ead715e9281a2e4676b770a5/include/linux/timeofday.h (mode:100644)
@@ -0,0 +1,73 @@
+/* linux/include/linux/timeofday.h
+ *
+ * This file contains the interface to the time of day subsystem
+ */
+#ifndef _LINUX_TIMEOFDAY_H
+#define _LINUX_TIMEOFDAY_H
+#include <linux/types.h>
+#include <linux/time.h>
+#include <linux/timex.h>
+#include <linux/timesource.h>
+#include <asm/div64.h>
+
+#ifdef CONFIG_NEWTOD
+/* Public definitions */
+extern nsec_t get_lowres_timestamp(void);
+extern nsec_t get_lowres_timeofday(void);
+extern nsec_t do_monotonic_clock(void);
+
+extern void do_gettimeofday(struct timeval *tv);
+extern void getnstimeofday(struct timespec *ts);
+extern int do_settimeofday(struct timespec *tv);
+extern int do_adjtimex(struct timex *tx);
+
+extern void timeofday_init(void);
+
+
+/* Required externs */
+/* XXX - should this go elsewhere? */
+extern nsec_t read_persistent_clock(void);
+extern void sync_persistent_clock(struct timespec ts);
+#ifdef CONFIG_NEWTOD_VSYSCALL
+extern void arch_update_vsyscall_gtod(nsec_t wall_time, cycle_t offset_base,
+ struct timesource_t* timesource, int ntp_adj);
+#else
+#define arch_update_vsyscall_gtod(x,y,z,w) {}
+#endif
+
+
+/* Inline helper functions */
+static inline struct timeval ns_to_timeval(nsec_t ns)
+{
+ struct timeval tv;
+ tv.tv_sec = div_long_long_rem(ns, NSEC_PER_SEC, &tv.tv_usec);
+ tv.tv_usec = (tv.tv_usec + NSEC_PER_USEC/2) / NSEC_PER_USEC;
+ return tv;
+}
+
+static inline struct timespec ns_to_timespec(nsec_t ns)
+{
+ struct timespec ts;
+ ts.tv_sec = div_long_long_rem(ns, NSEC_PER_SEC, &ts.tv_nsec);
+ return ts;
+}
+
+static inline nsec_t timespec_to_ns(struct timespec* ts)
+{
+ nsec_t ret;
+ ret = ((nsec_t)ts->tv_sec) * NSEC_PER_SEC;
+ ret += ts->tv_nsec;
+ return ret;
+}
+
+static inline nsec_t timeval_to_ns(struct timeval* tv)
+{
+ nsec_t ret;
+ ret = ((nsec_t)tv->tv_sec) * NSEC_PER_SEC;
+ ret += tv->tv_usec * NSEC_PER_USEC;
+ return ret;
+}
+#else /* CONFIG_NEWTOD */
+#define timeofday_init()
+#endif /* CONFIG_NEWTOD */
+#endif /* _LINUX_TIMEOFDAY_H */
Index: include/linux/timesource.h
===================================================================
--- /dev/null (tree:eed337ef5e9ae7d62caa84b7974a11fddc7f06e0)
+++ d68b09f31fa98801ead715e9281a2e4676b770a5/include/linux/timesource.h (mode:100644)
@@ -0,0 +1,157 @@
+/* linux/include/linux/timesource.h
+ *
+ * This file contains the structure definitions for timesources.
+ *
+ * If you are not a timesource, or the time of day code, you should
+ * not be including this file!
+ */
+#ifndef _LINUX_TIMESOURCE_H
+#define _LINUX_TIMESOURCE_H
+
+#include <linux/types.h>
+#include <linux/time.h>
+#include <linux/timex.h>
+#include <asm/io.h>
+#include <asm/div64.h>
+
+/* struct timesource_t:
+ * Provides mostly state-free accessors to the underlying hardware.
+ *
+ * name: ptr to timesource name
+ * priority: priority value for selection (higher is better)
+ * type: defines timesource type
+ * @read_fnct: returns a cycle value
+ * mmio_ptr: ptr to MMIO'ed counter
+ * mask: bitmask for two's complement
+ * subtraction of non 64 bit counters
+ * mult: cycle to nanosecond multiplier
+ * shift: cycle to nanosecond divisor (power of two)
+ * @update_callback: called when safe to alter timesource values
+ */
+struct timesource_t {
+ char* name;
+ int priority;
+ enum {
+ TIMESOURCE_FUNCTION,
+ TIMESOURCE_CYCLES,
+ TIMESOURCE_MMIO_32,
+ TIMESOURCE_MMIO_64
+ } type;
+ cycle_t (*read_fnct)(void);
+ void __iomem *mmio_ptr;
+ cycle_t mask;
+ u32 mult;
+ u32 shift;
+ void (*update_callback)(void);
+};
+
+
+/* Helper function that converts a khz counter
+ * frequency to a timesource multiplier, given the
+ * timesource shift value
+ */
+static inline u32 timesource_khz2mult(u32 khz, u32 shift_constant)
+{
+ /* khz = cyc/(Million ns)
+ * mult/2^shift = ns/cyc
+ * mult = ns/cyc * 2^shift
+ * mult = 1Million/khz * 2^shift
+ * mult = 1000000 * 2^shift / khz
+ * mult = (1000000<<shift) / khz
+ */
+ u64 tmp = ((u64)1000000) << shift_constant;
+ /* XXX - should we round here? */
+ do_div(tmp, khz);
+ return (u32)tmp;
+}
+
+/* Helper function that converts a hz counter
+ * frequency to a timesource multiplier, given the
+ * timesource shift value
+ */
+static inline u32 timesource_hz2mult(u32 hz, u32 shift_constant)
+{
+ /* hz = cyc/(Billion ns)
+ * mult/2^shift = ns/cyc
+ * mult = ns/cyc * 2^shift
+ * mult = 1Billion/hz * 2^shift
+ * mult = 1000000000 * 2^shift / hz
+ * mult = (1000000000<<shift) / hz
+ */
+ u64 tmp = ((u64)1000000000) << shift_constant;
+ /* XXX - should we round here? */
+ do_div(tmp, hz);
+ return (u32)tmp;
+}
+
+
+/* XXX - this should go somewhere better! */
+#ifndef readq
+static inline unsigned long long readq(void __iomem *addr)
+{
+ u32 low, high;
+ /* loop is required to make sure we get an atomic read */
+ do {
+ high = readl(addr+4);
+ low = readl(addr);
+ } while (high != readl(addr+4));
+
+ return low | (((unsigned long long)high) << 32LL);
+}
+#endif
+
+
+/* read_timesource():
+ * Uses the timesource to return the current cycle_t value
+ */
+static inline cycle_t read_timesource(struct timesource_t *ts)
+{
+ switch (ts->type) {
+ case TIMESOURCE_MMIO_32:
+ return (cycle_t)readl(ts->mmio_ptr);
+ case TIMESOURCE_MMIO_64:
+ return (cycle_t)readq(ts->mmio_ptr);
+ case TIMESOURCE_CYCLES:
+ return (cycle_t)get_cycles();
+ default:/* case: TIMESOURCE_FUNCTION */
+ return ts->read_fnct();
+ }
+}
+
+/* cyc2ns():
+ * Uses the timesource and ntp adjustment interval to
+ * convert cycle_ts to nanoseconds.
+ */
+static inline nsec_t cyc2ns(struct timesource_t *ts, int ntp_adj, cycle_t cycles)
+{
+ u64 ret;
+ ret = (u64)cycles;
+ ret *= (ts->mult + ntp_adj);
+ ret >>= ts->shift;
+ return (nsec_t)ret;
+}
+
+/* cyc2ns_rem():
+ * Uses the timesource and ntp adjustment interval to
+ * convert cycle_ts to nanoseconds. Add in remainder portion
+ * which is stored in ns<<ts->shift units and save the new
+ * remainder off.
+ */
+static inline nsec_t cyc2ns_rem(struct timesource_t *ts, int ntp_adj, cycle_t cycles, u64* rem)
+{
+ u64 ret;
+ ret = (u64)cycles;
+ ret *= (ts->mult + ntp_adj);
+ if (rem) {
+ ret += *rem;
+ *rem = ret & (((u64)1 << ts->shift) - 1);
+ }
+ ret >>= ts->shift;
+ return (nsec_t)ret;
+}
+
+/* used to install a new time source */
+void register_timesource(struct timesource_t*);
+struct timesource_t* get_next_timesource(void);
+
+#endif
Index: include/linux/timex.h
===================================================================
--- eed337ef5e9ae7d62caa84b7974a11fddc7f06e0/include/linux/timex.h (mode:100644)
+++ d68b09f31fa98801ead715e9281a2e4676b770a5/include/linux/timex.h (mode:100644)
@@ -228,6 +228,7 @@
extern unsigned long tick_nsec; /* ACTHZ period (nsec) */
extern int tickadj; /* amount of adjustment per tick */

+#ifndef CONFIG_NEWTOD
/*
* phase-lock loop variables
*/
@@ -314,6 +315,7 @@
}

#endif /* !CONFIG_TIME_INTERPOLATION */
+#endif /* !CONFIG_NEWTOD */

#endif /* KERNEL */

Index: init/main.c
===================================================================
--- eed337ef5e9ae7d62caa84b7974a11fddc7f06e0/init/main.c (mode:100644)
+++ d68b09f31fa98801ead715e9281a2e4676b770a5/init/main.c (mode:100644)
@@ -47,6 +47,7 @@
#include <linux/rmap.h>
#include <linux/mempolicy.h>
#include <linux/key.h>
+#include <linux/timeofday.h>

#include <asm/io.h>
#include <asm/bugs.h>
@@ -467,6 +468,7 @@
pidhash_init();
init_timers();
softirq_init();
+ timeofday_init();
time_init();

/*
Index: kernel/Makefile
===================================================================
--- eed337ef5e9ae7d62caa84b7974a11fddc7f06e0/kernel/Makefile (mode:100644)
+++ d68b09f31fa98801ead715e9281a2e4676b770a5/kernel/Makefile (mode:100644)
@@ -9,6 +9,7 @@
rcupdate.o intermodule.o extable.o params.o posix-timers.o \
kthread.o wait.o kfifo.o sys_ni.o posix-cpu-timers.o

+obj-$(CONFIG_NEWTOD) += timeofday.o timesource.o ntp.o
obj-$(CONFIG_FUTEX) += futex.o
obj-$(CONFIG_GENERIC_ISA_DMA) += dma.o
obj-$(CONFIG_SMP) += cpu.o spinlock.o
Index: kernel/ntp.c
===================================================================
--- /dev/null (tree:eed337ef5e9ae7d62caa84b7974a11fddc7f06e0)
+++ d68b09f31fa98801ead715e9281a2e4676b770a5/kernel/ntp.c (mode:100644)
@@ -0,0 +1,520 @@
+/********************************************************************
+* linux/kernel/ntp.c
+*
+* NTP state machine and time scaling code.
+*
+* Copyright (C) 2004, 2005 IBM, John Stultz ([email protected])
+*
+* Portions rewritten from kernel/time.c and kernel/timer.c
+* Please see those files for original copyrights.
+*
+* This program is free software; you can redistribute it and/or modify
+* it under the terms of the GNU General Public License as published by
+* the Free Software Foundation; either version 2 of the License, or
+* (at your option) any later version.
+*
+* This program is distributed in the hope that it will be useful,
+* but WITHOUT ANY WARRANTY; without even the implied warranty of
+* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+* GNU General Public License for more details.
+*
+* You should have received a copy of the GNU General Public License
+* along with this program; if not, write to the Free Software
+* Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+*
+* Notes:
+*
+* Hopefully you should never have to understand or touch
+* any of the code below. But don't let that keep you from trying!
+*
+* This code is loosely based on David Mills' RFC 1589 and its
+* updates. Please see the following for more details:
+* http://www.eecis.udel.edu/~mills/database/rfc/rfc1589.txt
+* http://www.eecis.udel.edu/~mills/database/reports/kern/kernb.pdf
+*
+* NOTE: To simplify the code, we do not implement any of
+* the PPS code, as the code that uses it never was merged.
+* [email protected]
+*
+* Revision History:
+* 2004-09-02: A0
+* o First pass sent to lkml for review.
+* 2004-12-07: A1
+* o No changes, sent to lkml for review.
+* 2005-03-11: A3
+* o yanked ntp_scale(), ntp adjustments are done in cyc2ns
+* 2005-04-29: A4
+* o Added conditional debug info
+* 2005-05-12: A5
+* o comment cleanups
+* TODO List:
+* o Move to using ppb for frequency adjustments
+* o More documentation
+* o More testing
+* o More optimization
+*********************************************************************/
+
+#include <linux/ntp.h>
+#include <linux/errno.h>
+
+/* XXX - remove later */
+#define NTP_DEBUG 0
+
+/* NTP scaling code
+ * Functions:
+ * ----------
+ * nsec_t ntp_scale(nsec_t value):
+ * Scales the nsec_t value using ntp kernel state
+ * int ntp_advance(nsec_t interval):
+ * Increments the NTP state machine by interval time
+ * static int ntp_hardupdate(long offset, struct timeval tv)
+ * ntp_adjtimex helper function
+ * int ntp_adjtimex(struct timex* tx):
+ * Interface to adjust NTP state machine
+ * int ntp_leapsecond(struct timespec now)
+ * Does NTP leapsecond processing. Returns number of
+ * seconds current time should be adjusted by.
+ * void ntp_clear(void):
+ * Clears the ntp kernel state
+ * int get_ntp_status(void):
+ * returns ntp_status value
+ *
+ * Variables:
+ * ----------
+ * ntp kernel state variables:
+ * See below for full list.
+ * ntp_lock:
+ * Protects ntp kernel state variables
+ */
+
+
+
+/* Chapter 5: Kernel Variables [RFC 1589 pg. 28] */
+/* 5.1 Interface Variables */
+static int ntp_status = STA_UNSYNC; /* status */
+static long ntp_offset; /* usec */
+static long ntp_constant = 2; /* ntp magic? */
+static long ntp_maxerror = NTP_PHASE_LIMIT; /* usec */
+static long ntp_esterror = NTP_PHASE_LIMIT; /* usec */
+static const long ntp_tolerance = MAXFREQ; /* shifted ppm */
+static const long ntp_precision = 1; /* constant */
+
+/* 5.2 Phase-Lock Loop Variables */
+static long ntp_freq; /* shifted ppm */
+static long ntp_reftime; /* sec */
+
+/* Extra values */
+static int ntp_state = TIME_OK; /* leapsecond state */
+static long ntp_tick = USEC_PER_SEC/USER_HZ; /* tick length */
+
+static s64 ss_offset_len; /* SINGLESHOT offset adj interval (nsec)*/
+static long singleshot_adj; /* +/- MAX_SINGLESHOT_ADJ (ppm)*/
+static long tick_adj; /* tx->tick adjustment (ppm) */
+static long offset_adj; /* offset adjustment (ppm) */
+
+
+/* lock for the above variables */
+static seqlock_t ntp_lock = SEQLOCK_UNLOCKED;
+
+#define MAX_SINGLESHOT_ADJ 500 /* (ppm) */
+#define SEC_PER_DAY 86400
+
+/* Required to safely shift negative values */
+#define shiftR(x,s) (((x) < 0) ? (-((-(x)) >> (s))) : ((x) >> (s)))
+
+/**
+ * ntp_advance - Periodic hook which increments NTP state machine
+ * interval: nsecond interval value used to increment the state machine
+ *
+ * Periodic hook which increments NTP state machine by interval.
+ * Returns the signed PPM adjustment to be used for the next interval.
+ *
+ * This is ntp_hardclock in the RFC.
+ */
+int ntp_advance(nsec_t interval)
+{
+ static u64 interval_sum = 0;
+ static long ss_adj = 0;
+ unsigned long flags;
+ long ppm_sum;
+
+ /* inc interval sum */
+ interval_sum += interval;
+
+ write_seqlock_irqsave(&ntp_lock, flags);
+
+ /* decrement singleshot offset interval */
+ ss_offset_len -= interval;
+ if(ss_offset_len < 0) /* make sure it doesn't go negative */
+ ss_offset_len = 0;
+
+ /* Do second overflow code */
+ while (interval_sum > NSEC_PER_SEC) {
+ /* XXX - I'd prefer to smoothly apply this math
+ * at each call to ntp_advance() rather than each
+ * second.
+ */
+ long tmp;
+
+ /* Bump maxerror by ntp_tolerance */
+ ntp_maxerror += shiftR(ntp_tolerance, SHIFT_USEC);
+ if (ntp_maxerror > NTP_PHASE_LIMIT) {
+ ntp_maxerror = NTP_PHASE_LIMIT;
+ ntp_status |= STA_UNSYNC;
+ }
+
+ /* Calculate offset_adj for the next second */
+ tmp = ntp_offset;
+ if (!(ntp_status & STA_FLL))
+ tmp = shiftR(tmp, SHIFT_KG + ntp_constant);
+
+ /* bound the adjustment to MAXPHASE/MINSEC */
+ tmp = min(tmp, (MAXPHASE / MINSEC) << SHIFT_UPDATE);
+ tmp = max(tmp, -(MAXPHASE / MINSEC) << SHIFT_UPDATE);
+
+ offset_adj = shiftR(tmp, SHIFT_UPDATE); /* (usec/sec) = ppm */
+ ntp_offset -= tmp;
+
+ interval_sum -= NSEC_PER_SEC;
+
+ /* calculate singleshot approximation ppm for the next second */
+ ss_adj = singleshot_adj;
+ singleshot_adj = 0;
+ }
+
+ /* calculate total ppm adjustment for the next interval */
+ ppm_sum = tick_adj;
+ ppm_sum += offset_adj;
+ ppm_sum += shiftR(ntp_freq,SHIFT_USEC);
+ ppm_sum += ss_adj;
+
+#if NTP_DEBUG
+{ /*XXX - yank me! just for debug */
+ static int dbg = 0;
+ if(!(dbg++%300000))
+ printk("tick_adj(%ld) + offset_adj(%ld) + ntp_freq(%ld) + ss_adj(%ld) = ppm_sum(%ld)\n", tick_adj, offset_adj, shiftR(ntp_freq,SHIFT_USEC), ss_adj, ppm_sum);
+}
+#endif
+
+ write_sequnlock_irqrestore(&ntp_lock, flags);
+
+ return ppm_sum;
+}
+
+/**
+ * ntp_hardupdate - Calculates the offset and freq values
+ * offset: current offset
+ * tv: timeval holding the current time
+ *
+ * Private function, called only by ntp_adjtimex while holding ntp_lock
+ *
+ * XXX - this function needs a much better explanation
+ */
+static int ntp_hardupdate(long offset, struct timeval tv)
+{
+ int ret;
+ long tmp, interval;
+
+ ret = 0;
+ if (!(ntp_status & STA_PLL))
+ return ret;
+
+ tmp = offset;
+ /* Make sure offset is bounded by MAXPHASE */
+ tmp = min(tmp, MAXPHASE);
+ tmp = max(tmp, -MAXPHASE);
+
+ ntp_offset = tmp << SHIFT_UPDATE;
+
+ if ((ntp_status & STA_FREQHOLD) || (ntp_reftime == 0))
+ ntp_reftime = tv.tv_sec;
+
+ /* calculate seconds since last call to hardupdate */
+ interval = tv.tv_sec - ntp_reftime;
+ ntp_reftime = tv.tv_sec;
+
+ if ((ntp_status & STA_FLL) && (interval >= MINSEC)) {
+ long damping;
+ /* XXX - should we round here? */
+ tmp = offset / interval; /* ppm (usec/sec)*/
+
+ /* convert to shifted ppm, then apply damping factor */
+
+ /* calculate damping factor - XXX bigger comment!*/
+ damping = SHIFT_KH - SHIFT_USEC;
+
+ /* apply damping factor */
+ ntp_freq += shiftR(tmp,damping);
+#if NTP_DEBUG
+ printk("ntp->freq change: %ld\n",shiftR(tmp,damping));
+#endif
+
+ } else if ((ntp_status & STA_PLL) && (interval < MAXSEC)) {
+ long damping;
+ tmp = offset * interval; /* ppm XXX - not quite*/
+
+ /* calculate damping factor - XXX bigger comment!*/
+ damping = (2 * ntp_constant) + SHIFT_KF - SHIFT_USEC;
+
+ /* apply damping factor */
+ ntp_freq += shiftR(tmp,damping);
+
+#if NTP_DEBUG
+ printk("ntp->freq change: %ld\n", shiftR(tmp,damping));
+#endif
+ } else { /* interval out of bounds */
+#if NTP_DEBUG
+ printk("ntp_hardupdate(): interval out of bounds: %ld status: 0x%x\n",
+ interval, ntp_status);
+#endif
+ ret = -1; /* TIME_ERROR */
+ }
+
+ /* bound ntp_freq */
+ if (ntp_freq > ntp_tolerance)
+ ntp_freq = ntp_tolerance;
+ if (ntp_freq < -ntp_tolerance)
+ ntp_freq = -ntp_tolerance;
+
+ return ret;
+}
+
+/**
+ * ntp_adjtimex - Interface to change NTP state machine
+ * @tx: timex value passed to the kernel to be used
+ */
+int ntp_adjtimex(struct timex* tx)
+{
+ long save_offset;
+ int result;
+ unsigned long flags;
+
+/* Sanity checking
+ */
+ /* frequency adjustment limited to +/- MAXFREQ */
+ if ((tx->modes & ADJ_FREQUENCY)
+ && (abs(tx->freq) > MAXFREQ))
+ return -EINVAL;
+
+ /* maxerror adjustment limited to NTP_PHASE_LIMIT */
+ if ((tx->modes & ADJ_MAXERROR)
+ && (tx->maxerror < 0
+ || tx->maxerror >= NTP_PHASE_LIMIT))
+ return -EINVAL;
+
+ /* esterror adjustment limited to NTP_PHASE_LIMIT */
+ if ((tx->modes & ADJ_ESTERROR)
+ && (tx->esterror < 0
+ || tx->esterror >= NTP_PHASE_LIMIT))
+ return -EINVAL;
+
+ /* constant adjustment must be non-negative */
+ if ((tx->modes & ADJ_TIMECONST)
+ && (tx->constant < 0))
+ return -EINVAL;
+
+ /* Single shot mode can only be used by itself */
+ if (((tx->modes & ADJ_OFFSET_SINGLESHOT) == ADJ_OFFSET_SINGLESHOT)
+ && (tx->modes != ADJ_OFFSET_SINGLESHOT))
+ return -EINVAL;
+
+ /* offset adjustment limited to +/- MAXPHASE */
+ if ((tx->modes != ADJ_OFFSET_SINGLESHOT)
+ && (tx->modes & ADJ_OFFSET)
+ && (abs(tx->offset)>= MAXPHASE))
+ return -EINVAL;
+
+ /* tick adjustment limited to 10% */
+ /* XXX - should we round here? */
+ if ((tx->modes & ADJ_TICK)
+ && ((tx->tick < 900000/USER_HZ)
+ ||(tx->tick > 1100000/USER_HZ)))
+ return -EINVAL;
+
+#if NTP_DEBUG
+ /* dbg output XXX - yank me! */
+ if(tx->modes) {
+ printk("adjtimex: tx->offset: %ld tx->freq: %ld\n",
+ tx->offset, tx->freq);
+ }
+#endif
+
+/* Kernel input bits
+ */
+ write_seqlock_irqsave(&ntp_lock, flags);
+
+ result = ntp_state;
+
+ /* For ADJ_OFFSET_SINGLESHOT we must return the old offset */
+ save_offset = shiftR(ntp_offset, SHIFT_UPDATE);
+
+ /* Process input parameters */
+ if (tx->modes & ADJ_STATUS) {
+ ntp_status &= STA_RONLY;
+ ntp_status |= tx->status & ~STA_RONLY;
+ }
+
+ if (tx->modes & ADJ_FREQUENCY)
+ ntp_freq = tx->freq;
+
+ if (tx->modes & ADJ_MAXERROR)
+ ntp_maxerror = tx->maxerror;
+
+ if (tx->modes & ADJ_ESTERROR)
+ ntp_esterror = tx->esterror;
+
+ if (tx->modes & ADJ_TIMECONST)
+ ntp_constant = tx->constant;
+
+ if (tx->modes & ADJ_OFFSET) {
+ /* check if we're doing a singleshot adjustment */
+ if (tx->modes == ADJ_OFFSET_SINGLESHOT)
+ singleshot_adj = tx->offset;
+ /* otherwise, call hardupdate() */
+ else if (ntp_hardupdate(tx->offset, tx->time))
+ result = TIME_ERROR;
+ }
+
+ if (tx->modes & ADJ_TICK) {
+ /* first calculate usec/user_tick offset */
+ /* XXX - should we round here? */
+ tick_adj = (USEC_PER_SEC/USER_HZ) - tx->tick;
+ /* multiply by user_hz to get usec/sec => ppm */
+ tick_adj *= USER_HZ;
+ /* save tx->tick for future calls to adjtimex */
+ ntp_tick = tx->tick;
+ }
+
+ if ((ntp_status & (STA_UNSYNC|STA_CLOCKERR)) != 0 )
+ result = TIME_ERROR;
+
+/* Kernel output bits
+ */
+ /* write kernel state to user timex values*/
+ if ((tx->modes & ADJ_OFFSET_SINGLESHOT) == ADJ_OFFSET_SINGLESHOT)
+ tx->offset = save_offset;
+ else
+ tx->offset = shiftR(ntp_offset, SHIFT_UPDATE);
+
+ tx->freq = ntp_freq;
+ tx->maxerror = ntp_maxerror;
+ tx->esterror = ntp_esterror;
+ tx->status = ntp_status;
+ tx->constant = ntp_constant;
+ tx->precision = ntp_precision;
+ tx->tolerance = ntp_tolerance;
+
+ /* PPS is not implemented, so these are zero */
+ tx->ppsfreq = /*XXX - Not Implemented!*/ 0;
+ tx->jitter = /*XXX - Not Implemented!*/ 0;
+ tx->shift = /*XXX - Not Implemented!*/ 0;
+ tx->stabil = /*XXX - Not Implemented!*/ 0;
+ tx->jitcnt = /*XXX - Not Implemented!*/ 0;
+ tx->calcnt = /*XXX - Not Implemented!*/ 0;
+ tx->errcnt = /*XXX - Not Implemented!*/ 0;
+ tx->stbcnt = /*XXX - Not Implemented!*/ 0;
+
+ write_sequnlock_irqrestore(&ntp_lock, flags);
+
+ return result;
+}
+
+
+/**
+ * ntp_leapsecond - NTP leapsecond processing code.
+ * now: the current time
+ *
+ * Returns the number of seconds (-1, 0, or 1) that
+ * should be added to the current time to properly
+ * adjust for leapseconds.
+ */
+int ntp_leapsecond(struct timespec now)
+{
+ /*
+ * Leap second processing. If in leap-insert state at
+ * the end of the day, the system clock is set back one
+ * second; if in leap-delete state, the system clock is
+ * set ahead one second.
+ */
+ static time_t leaptime = 0;
+
+ switch (ntp_state) {
+ case TIME_OK:
+ if (ntp_status & STA_INS) {
+ ntp_state = TIME_INS;
+ /* calculate end of today (23:59:59)*/
+ leaptime = now.tv_sec + SEC_PER_DAY -
+ (now.tv_sec % SEC_PER_DAY) - 1;
+ }
+ else if (ntp_status & STA_DEL) {
+ ntp_state = TIME_DEL;
+ /* calculate end of today (23:59:59)*/
+ leaptime = now.tv_sec + SEC_PER_DAY -
+ (now.tv_sec % SEC_PER_DAY) - 1;
+ }
+ break;
+
+ case TIME_INS:
+ /* Once we are past leaptime, insert the second */
+ if (now.tv_sec > leaptime) {
+ ntp_state = TIME_OOP;
+ printk(KERN_NOTICE
+ "Clock: inserting leap second 23:59:60 UTC\n");
+ return -1;
+ }
+ break;
+
+ case TIME_DEL:
+ /* Once we are at (or past) leaptime, delete the second */
+ if (now.tv_sec >= leaptime) {
+ ntp_state = TIME_WAIT;
+ printk(KERN_NOTICE
+ "Clock: deleting leap second 23:59:59 UTC\n");
+ return 1;
+ }
+ break;
+
+ case TIME_OOP:
+ /* Wait for the end of the leap second*/
+ if (now.tv_sec > (leaptime + 1))
+ ntp_state = TIME_WAIT;
+ break;
+
+ case TIME_WAIT:
+ if (!(ntp_status & (STA_INS | STA_DEL)))
+ ntp_state = TIME_OK;
+ }
+
+ return 0;
+}
+
+/**
+ * ntp_clear - Clears the NTP state machine.
+ *
+ */
+void ntp_clear(void)
+{
+ unsigned long flags;
+ write_seqlock_irqsave(&ntp_lock, flags);
+
+ /* clear everything */
+ ntp_status |= STA_UNSYNC;
+ ntp_maxerror = NTP_PHASE_LIMIT;
+ ntp_esterror = NTP_PHASE_LIMIT;
+ ss_offset_len = 0;
+ singleshot_adj = 0;
+ tick_adj = 0;
+ offset_adj = 0;
+
+ write_sequnlock_irqrestore(&ntp_lock, flags);
+}
+
+/**
+ * get_ntp_status - Returns the NTP status value
+ *
+ */
+int get_ntp_status(void)
+{
+ return ntp_status;
+}
+
Index: kernel/time.c
===================================================================
--- eed337ef5e9ae7d62caa84b7974a11fddc7f06e0/kernel/time.c (mode:100644)
+++ d68b09f31fa98801ead715e9281a2e4676b770a5/kernel/time.c (mode:100644)
@@ -38,6 +38,7 @@

#include <asm/uaccess.h>
#include <asm/unistd.h>
+#include <linux/timeofday.h>

/*
* The timezone where the local system is located. Used as a default by some
@@ -128,6 +129,7 @@
* as real UNIX machines always do it. This avoids all headaches about
* daylight saving times and warping kernel clocks.
*/
+#ifndef CONFIG_NEWTOD
inline static void warp_clock(void)
{
write_seqlock_irq(&xtime_lock);
@@ -137,6 +139,18 @@
write_sequnlock_irq(&xtime_lock);
clock_was_set();
}
+#else /* !CONFIG_NEWTOD */
+/* XXX - this is somewhat cracked out and should
+ be checked [email protected]
+*/
+inline static void warp_clock(void)
+{
+ struct timespec ts;
+ getnstimeofday(&ts);
+ ts.tv_sec += sys_tz.tz_minuteswest * 60;
+ do_settimeofday(&ts);
+}
+#endif /* !CONFIG_NEWTOD */

/*
* In case for some reason the CMOS clock has not already been running
@@ -227,6 +241,7 @@
/* adjtimex mainly allows reading (and writing, if superuser) of
* kernel time-keeping variables. used by xntpd.
*/
+#ifndef CONFIG_NEWTOD
int do_adjtimex(struct timex *txc)
{
long ltemp, mtemp, save_adjust;
@@ -410,6 +425,7 @@
notify_arch_cmos_timer();
return(result);
}
+#endif /* !CONFIG_NEWTOD */

asmlinkage long sys_adjtimex(struct timex __user *txc_p)
{
@@ -558,6 +574,7 @@


#else
+#ifndef CONFIG_NEWTOD
/*
* Simulate gettimeofday using do_gettimeofday which only allows a timeval
* and therefore only yields usec accuracy
@@ -570,6 +587,7 @@
tv->tv_sec = x.tv_sec;
tv->tv_nsec = x.tv_usec * NSEC_PER_USEC;
}
+#endif /* !CONFIG_NEWTOD */
#endif

#if (BITS_PER_LONG < 64)
Index: kernel/timeofday.c
===================================================================
--- /dev/null (tree:eed337ef5e9ae7d62caa84b7974a11fddc7f06e0)
+++ d68b09f31fa98801ead715e9281a2e4676b770a5/kernel/timeofday.c (mode:100644)
@@ -0,0 +1,603 @@
+/*********************************************************************
+* linux/kernel/timeofday.c
+*
+* This file contains the functions which access and manage
+* the system's time of day functionality.
+*
+* Copyright (C) 2003, 2004, 2005 IBM, John Stultz ([email protected])
+*
+* This program is free software; you can redistribute it and/or modify
+* it under the terms of the GNU General Public License as published by
+* the Free Software Foundation; either version 2 of the License, or
+* (at your option) any later version.
+*
+* This program is distributed in the hope that it will be useful,
+* but WITHOUT ANY WARRANTY; without even the implied warranty of
+* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+* GNU General Public License for more details.
+*
+* You should have received a copy of the GNU General Public License
+* along with this program; if not, write to the Free Software
+* Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+*
+* Revision History:
+* 2004-09-02: A0
+* o First pass sent to lkml for review.
+* 2004-12-07: A1
+* o Rework of timesource structure
+* o Sent to lkml for review
+* 2005-01-24: A2
+* o write_seqlock_irq -> write_seqlock_irqsave
+* o arch generic interface for get_cmos_time() equivalents
+* o suspend/resume hooks for sleep/hibernate (lightly tested)
+* o timesource adjust_callback hook
+* o Sent to lkml for review
+* 2005-03-11: A3
+* o periodic_hook (formerly interrupt_hook) now called by softtimer
+* o yanked ntp_scale(), ntp adjustments are done in cyc2ns now
+* o sent to lkml for review
+* 2005-04-29: A4
+* o Improved the cyc2ns remainder handling
+* o Added getnstimeofday
+* o Cleanups from Nish Aravamudan
+* 2005-05-12: A5
+* o Added clock_was_set hooks
+* o Added suspend/resume sysfs hooks
+* o Minor code cleanups
+* o First attempt at docbook comments
+* TODO WishList:
+* o See XXX's below.
+**********************************************************************/
+
+#include <linux/timeofday.h>
+#include <linux/timesource.h>
+#include <linux/ntp.h>
+#include <linux/timex.h>
+#include <linux/timer.h>
+#include <linux/module.h>
+#include <linux/sched.h> /* Needed for capable() */
+#include <linux/sysdev.h>
+#include <linux/jiffies.h>
+
+/* XXX - remove later */
+#define TIME_DBG 0
+#define TIME_DBG_FREQ 60000
+
+/* only run periodic_hook every 50ms */
+#define PERIODIC_INTERVAL_MS 50
+
+/*[Nanosecond based variables]
+ * system_time:
+ * Monotonically increasing counter of the number of nanoseconds
+ * since boot.
+ * wall_time_offset:
+ * Offset added to system_time to provide accurate time-of-day
+ */
+static nsec_t system_time;
+static nsec_t wall_time_offset;
+
+/*[Cycle based variables]
+ * offset_base:
+ * Value of the timesource at the last timeofday_periodic_hook()
+ * (adjusted only slightly to account for rounded-off cycles)
+ */
+static cycle_t offset_base;
+
+/*[Time source data]
+ * timesource:
+ * current timesource pointer
+ */
+static struct timesource_t *timesource;
+
+/*[NTP adjustment]
+ * ntp_adj:
+ * value of the current ntp adjustment,
+ * stored in timesource multiplier units.
+ */
+int ntp_adj;
+
+/*[Locks]
+ * system_time_lock:
+ * generic lock for all locally scoped time values
+ */
+static seqlock_t system_time_lock = SEQLOCK_UNLOCKED;
+
+
+/*[Suspend/Resume info]
+ * time_suspend_state:
+ * variable that keeps track of suspend state
+ * suspend_start:
+ * start of the suspend call
+ */
+static enum {
+ TIME_RUNNING,
+ TIME_SUSPENDED
+} time_suspend_state = TIME_RUNNING;
+
+static nsec_t suspend_start;
+
+/* [Soft-Timers]
+ * timeofday_timer:
+ * soft-timer used to call timeofday_periodic_hook()
+ */
+struct timer_list timeofday_timer;
+
+
+/* [Functions]
+ */
+
+/**
+ * get_lowres_timestamp - Returns a low res timestamp
+ *
+ * Returns a low res timestamp w/ PERIODIC_INTERVAL_MS
+ * granularity. (ie: the value of system_time as
+ * calculated at the last invocation of
+ * timeofday_periodic_hook())
+ */
+nsec_t get_lowres_timestamp(void)
+{
+ nsec_t ret;
+ unsigned long seq;
+ do {
+ seq = read_seqbegin(&system_time_lock);
+
+ /* quickly grab system_time*/
+ ret = system_time;
+
+ } while (read_seqretry(&system_time_lock, seq));
+
+ return ret;
+}
+
+
+/**
+ * get_lowres_timeofday - Returns a low res time of day
+ *
+ * Returns a low res time of day, as calculated at the
+ * last invocation of timeofday_periodic_hook().
+ */
+nsec_t get_lowres_timeofday(void)
+{
+ nsec_t ret;
+ unsigned long seq;
+ do {
+ seq = read_seqbegin(&system_time_lock);
+
+ /* quickly calculate low-res time of day */
+ ret = system_time + wall_time_offset;
+
+ } while (read_seqretry(&system_time_lock, seq));
+
+ return ret;
+}
+
+
+/**
+ * update_legacy_time_values - Used to sync legacy time values
+ *
+ * Private function. Used to sync legacy time values to
+ * current timeofday. Assumes we have the system_time_lock.
+ * Hopefully someday this function can be removed.
+ */
+static void update_legacy_time_values(void)
+{
+ unsigned long flags;
+ write_seqlock_irqsave(&xtime_lock, flags);
+ xtime = ns_to_timespec(system_time + wall_time_offset);
+ wall_to_monotonic = ns_to_timespec(wall_time_offset);
+ set_normalized_timespec(&wall_to_monotonic,
+ -wall_to_monotonic.tv_sec, -wall_to_monotonic.tv_nsec);
+ /* We don't update jiffies here because it is its own time domain */
+ write_sequnlock_irqrestore(&xtime_lock, flags);
+}
+
+
+/**
+ * __monotonic_clock - Returns monotonically increasing nanoseconds
+ *
+ * Private function. Must hold system_time_lock when being
+ * called. Returns the monotonically increasing number of
+ * nanoseconds since the system booted (adjusted by NTP scaling)
+ */
+static inline nsec_t __monotonic_clock(void)
+{
+ nsec_t ret, ns_offset;
+ cycle_t now, cycle_delta;
+
+ /* read timesource */
+ now = read_timesource(timesource);
+
+ /* calculate the delta since the last timeofday_periodic_hook */
+ cycle_delta = (now - offset_base) & timesource->mask;
+
+ /* convert to nanoseconds */
+ ns_offset = cyc2ns(timesource, ntp_adj, cycle_delta);
+
+ /* add result to system time */
+ ret = system_time + ns_offset;
+
+ return ret;
+}
+
+
+/**
+ * do_monotonic_clock - Returns monotonically increasing nanoseconds
+ *
+ * Returns the monotonically increasing number of nanoseconds
+ * since the system booted via __monotonic_clock()
+ */
+nsec_t do_monotonic_clock(void)
+{
+ nsec_t ret;
+ unsigned long seq;
+
+ /* atomically read __monotonic_clock() */
+ do {
+ seq = read_seqbegin(&system_time_lock);
+
+ ret = __monotonic_clock();
+
+ } while (read_seqretry(&system_time_lock, seq));
+
+ return ret;
+}
+
+
+/**
+ * __gettimeofday - Returns the timeofday in nsec_t.
+ *
+ * Private function. Returns the timeofday in nsec_t.
+ */
+static inline nsec_t __gettimeofday(void)
+{
+ nsec_t wall, sys;
+ unsigned long seq;
+
+ /* atomically read wall and sys time */
+ do {
+ seq = read_seqbegin(&system_time_lock);
+
+ wall = wall_time_offset;
+ sys = __monotonic_clock();
+
+ } while (read_seqretry(&system_time_lock, seq));
+
+ return wall + sys;
+}
+
+
+/**
+ * getnstimeofday - Returns the time of day in a timespec
+ * @ts: pointer to the timespec to be set
+ *
+ * Returns the time of day in a timespec
+ * For consistency should be renamed
+ * later to do_getnstimeofday()
+ */
+void getnstimeofday(struct timespec *ts)
+{
+ *ts = ns_to_timespec(__gettimeofday());
+}
+EXPORT_SYMBOL(getnstimeofday);
+
+
+/**
+ * do_gettimeofday - Returns the time of day in a timeval
+ * @tv: pointer to the timeval to be set
+ *
+ */
+void do_gettimeofday(struct timeval *tv)
+{
+ *tv = ns_to_timeval(__gettimeofday());
+}
+EXPORT_SYMBOL(do_gettimeofday);
+
+
+/**
+ * do_settimeofday - Sets the time of day
+ * @tv: pointer to the timespec that will be used to set the time
+ *
+ */
+int do_settimeofday(struct timespec *tv)
+{
+ unsigned long flags;
+ nsec_t newtime = timespec_to_ns(tv);
+
+ /* atomically adjust wall_time_offset & clear ntp state machine */
+ write_seqlock_irqsave(&system_time_lock, flags);
+
+ wall_time_offset = newtime - __monotonic_clock();
+ ntp_clear();
+
+ update_legacy_time_values();
+
+ arch_update_vsyscall_gtod(system_time + wall_time_offset, offset_base,
+ timesource, ntp_adj);
+
+ write_sequnlock_irqrestore(&system_time_lock, flags);
+
+ /* signal posix-timers about time change */
+ clock_was_set();
+
+ return 0;
+}
+EXPORT_SYMBOL(do_settimeofday);
+
+
+/**
+ * do_adjtimex - interface to the kernel NTP variables
+ * @tx: pointer to the timex value that will be used
+ *
+ * Userspace NTP daemon's interface to the kernel NTP variables
+ */
+int do_adjtimex(struct timex *tx)
+{
+ /* Check capabilities if we're trying to modify something */
+ if (tx->modes && !capable(CAP_SYS_TIME))
+ return -EPERM;
+
+ /* Note: We set tx->time first,
+ * because ntp_adjtimex uses it
+ */
+ do_gettimeofday(&tx->time);
+
+ /* call out to NTP code */
+ return ntp_adjtimex(tx);
+}
+
+
+/**
+ * timeofday_suspend_hook - allows the timeofday subsystem to be shut down
+ * @dev: unused
+ * @state: unused
+ *
+ * This function allows the timeofday subsystem to
+ * be shut down for a period of time. Useful when
+ * going into suspend/hibernate mode. The code is
+ * very similar to the first half of
+ * timeofday_periodic_hook().
+ */
+static int timeofday_suspend_hook(struct sys_device *dev, u32 state)
+{
+ unsigned long flags;
+
+ write_seqlock_irqsave(&system_time_lock, flags);
+
+ /* Make sure time_suspend_state is sane */
+ BUG_ON(time_suspend_state != TIME_RUNNING);
+
+ /* First off, save suspend start time
+ * then quickly call __monotonic_clock.
+ * These two calls hopefully occur quickly
+ * because the difference between reads will
+ * accumulate as time drift on resume.
+ */
+ suspend_start = read_persistent_clock();
+ system_time = __monotonic_clock();
+
+ /* switch states */
+ time_suspend_state = TIME_SUSPENDED;
+
+ write_sequnlock_irqrestore(&system_time_lock, flags);
+ return 0;
+}
+
+
+/**
+ * timeofday_resume_hook - Resumes the timeofday subsystem.
+ * @dev: unused
+ *
+ * This function resumes the timeofday subsystem
+ * from a previous call to timeofday_suspend_hook.
+ */
+static int timeofday_resume_hook(struct sys_device *dev)
+{
+ nsec_t now, suspend_time;
+ unsigned long flags;
+
+ write_seqlock_irqsave(&system_time_lock, flags);
+
+ /* Make sure time_suspend_state is sane */
+ BUG_ON(time_suspend_state != TIME_SUSPENDED);
+
+ /* Read persistent clock to mark the end of
+ * the suspend interval then rebase the
+ * offset_base to current timesource value.
+ * Again, time between these two calls will
+ * not be accounted for and will show up as
+ * time drift.
+ */
+ now = read_persistent_clock();
+ offset_base = read_timesource(timesource);
+
+ /* calculate how long we were out for */
+ suspend_time = now - suspend_start;
+
+ /* update system_time */
+ system_time += suspend_time;
+
+ ntp_clear();
+
+ /* Set us back to running */
+ time_suspend_state = TIME_RUNNING;
+
+ /* finally, update legacy time values */
+ update_legacy_time_values();
+
+ write_sequnlock_irqrestore(&system_time_lock, flags);
+
+ /* signal posix-timers about time change */
+ clock_was_set();
+
+ return 0;
+}
+
+/* sysfs resume/suspend bits */
+static struct sysdev_class timeofday_sysclass = {
+ .resume = timeofday_resume_hook,
+ .suspend = timeofday_suspend_hook,
+ set_kset_name("timeofday"),
+};
+static struct sys_device device_timer = {
+ .id = 0,
+ .cls = &timeofday_sysclass,
+};
+static int timeofday_init_device(void)
+{
+ int error = sysdev_class_register(&timeofday_sysclass);
+ if (!error)
+ error = sysdev_register(&device_timer);
+ return error;
+}
+device_initcall(timeofday_init_device);
+
+/**
+ * timeofday_periodic_hook - Does periodic update of timekeeping values.
+ * @unused: unused
+ *
+ * Calculates the delta since the last call,
+ * updates system time and clears the offset.
+ *
+ * Called via timeofday_timer.
+ */
+static void timeofday_periodic_hook(unsigned long unused)
+{
+ cycle_t now, cycle_delta;
+ static u64 remainder;
+ nsec_t ns, ns_ntp;
+ long leapsecond;
+ struct timesource_t* next;
+ unsigned long flags;
+ u64 tmp;
+
+ write_seqlock_irqsave(&system_time_lock, flags);
+
+ /* read time source & calc time since last call*/
+ now = read_timesource(timesource);
+ cycle_delta = (now - offset_base) & timesource->mask;
+
+ /* convert cycles to ntp adjusted ns and save remainder */
+ ns_ntp = cyc2ns_rem(timesource, ntp_adj, cycle_delta, &remainder);
+
+ /* convert cycles to raw ns for ntp advance */
+ ns = cyc2ns(timesource, 0, cycle_delta);
+
+#if TIME_DBG
+{ /* XXX - remove later*/
+ static int dbg=0;
+ if(!(dbg++%TIME_DBG_FREQ)){
+ printk(KERN_INFO "now: %lluc - then: %lluc = delta: %lluc -> %llu ns + %llu shift_ns (ntp_adj: %i)\n",
+ (unsigned long long)now, (unsigned long long)offset_base,
+ (unsigned long long)cycle_delta, (unsigned long long)ns,
+ (unsigned long long)remainder, ntp_adj);
+ }
+}
+#endif
+
+ /* update system_time */
+ system_time += ns_ntp;
+
+ /* reset the offset_base */
+ offset_base = now;
+
+ /* advance the ntp state machine by ns interval*/
+ ntp_adj = ntp_advance(ns);
+
+ /* do ntp leap second processing*/
+ leapsecond = ntp_leapsecond(ns_to_timespec(system_time+wall_time_offset));
+ wall_time_offset += leapsecond * NSEC_PER_SEC;
+
+ /* sync the persistent clock */
+ if (!(get_ntp_status() & STA_UNSYNC))
+ sync_persistent_clock(ns_to_timespec(system_time + wall_time_offset));
+
+ /* if necessary, switch timesources */
+ next = get_next_timesource();
+ if (next != timesource) {
+ /* immediately set new offset_base */
+ offset_base = read_timesource(next);
+ /* swap timesources */
+ timesource = next;
+ printk(KERN_INFO "Time: %s timesource has been installed.\n",
+ timesource->name);
+ ntp_clear();
+ ntp_adj = 0;
+ remainder = 0;
+ }
+
+ /* now is a safe time, so allow timesource to adjust
+ * itself (for example: to make cpufreq changes).
+ */
+ if(timesource->update_callback)
+ timesource->update_callback();
+
+
+ /* convert the signed ppm to timesource multiplier adjustment */
+ tmp = abs(ntp_adj);
+ tmp = tmp * timesource->mult;
+ /* XXX - should we round here? */
+ do_div(tmp, 1000000);
+ if (ntp_adj < 0)
+ ntp_adj = -(int)tmp;
+ else
+ ntp_adj = (int)tmp;
+
+ /* sync legacy values */
+ update_legacy_time_values();
+
+ arch_update_vsyscall_gtod(system_time + wall_time_offset, offset_base,
+ timesource, ntp_adj);
+
+ write_sequnlock_irqrestore(&system_time_lock, flags);
+
+ /* XXX - Do we need to call clock_was_set() here? */
+
+ /* Set us up to go off on the next interval */
+ mod_timer(&timeofday_timer,
+ jiffies + msecs_to_jiffies(PERIODIC_INTERVAL_MS));
+}
+
+
+/**
+ * timeofday_init - Initializes time variables
+ *
+ */
+void __init timeofday_init(void)
+{
+ unsigned long flags;
+#if TIME_DBG
+ printk(KERN_INFO "timeofday_init: Starting up!\n");
+#endif
+ write_seqlock_irqsave(&system_time_lock, flags);
+
+ /* initialize the timesource variable */
+ timesource = get_next_timesource();
+
+ /* clear and initialize offsets*/
+ offset_base = read_timesource(timesource);
+ wall_time_offset = read_persistent_clock();
+
+ /* clear NTP scaling factor & state machine */
+ ntp_adj = 0;
+ ntp_clear();
+
+ arch_update_vsyscall_gtod(system_time + wall_time_offset, offset_base,
+ timesource, ntp_adj);
+
+ /* initialize legacy time values */
+ update_legacy_time_values();
+
+ write_sequnlock_irqrestore(&system_time_lock, flags);
+
+ /* Install timeofday_periodic_hook timer */
+ init_timer(&timeofday_timer);
+ timeofday_timer.function = timeofday_periodic_hook;
+ timeofday_timer.expires = jiffies + 1;
+ add_timer(&timeofday_timer);
+
+
+#if TIME_DBG
+ printk(KERN_INFO "timeofday_init: finished!\n");
+#endif
+ return;
+}
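
For illustration, the interval arithmetic used by __monotonic_clock() and timeofday_periodic_hook() above can be sketched in userspace. This is a sketch under stated assumptions, not the patch's implementation: cyc2ns() is defined in linux/timesource.h, which is not shown in this file, so the mult/shift form below is an assumption, and struct ts_sketch and all three function names are hypothetical stand-ins for struct timesource_t and the kernel helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct timesource_t */
struct ts_sketch {
	uint64_t mask;   /* bitmask covering the counter's valid bits */
	uint32_t mult;   /* cycle -> ns multiplier */
	uint32_t shift;  /* cycle -> ns divisor, as a power of two */
};

/* Cycles elapsed since 'base'; the mask absorbs one counter wrap */
static uint64_t cycle_delta(const struct ts_sketch *ts,
			    uint64_t now, uint64_t base)
{
	return (now - base) & ts->mask;
}

/* Assumed mult/shift form of cyc2ns(); ntp_adj is in multiplier units */
static uint64_t cyc2ns_sketch(const struct ts_sketch *ts, int ntp_adj,
			      uint64_t cycles)
{
	assert(ts->shift < 64);
	return (cycles * (uint64_t)((int64_t)ts->mult + ntp_adj)) >> ts->shift;
}

/* Signed ppm -> multiplier units, truncating like the do_div() step */
static int ppm_to_mult_adj(const struct ts_sketch *ts, int ppm)
{
	uint64_t tmp = (uint64_t)(ppm < 0 ? -(int64_t)ppm : ppm) * ts->mult;

	tmp /= 1000000;
	return ppm < 0 ? -(int)tmp : (int)tmp;
}
```

As a worked case, take a 24-bit counter ticking at 1 MHz with shift = 10, so mult = 1000 * 1024 = 1024000. A read of 5 against a base of 0xFFFFFB yields a delta of 10 cycles despite the wrap, which converts to 10000 ns. A +100 ppm adjustment truncates to 102 multiplier units, matching the "should we round here?" question in the patch.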
Index: kernel/timer.c
===================================================================
--- eed337ef5e9ae7d62caa84b7974a11fddc7f06e0/kernel/timer.c (mode:100644)
+++ d68b09f31fa98801ead715e9281a2e4676b770a5/kernel/timer.c (mode:100644)
@@ -577,6 +577,7 @@
int tickadj = 500/HZ ? : 1; /* microsecs */


+#ifndef CONFIG_NEWTOD
/*
* phase-lock loop variables
*/
@@ -807,6 +808,9 @@
}
} while (ticks);
}
+#else /* !CONFIG_NEWTOD */
+#define update_wall_time(x)
+#endif /* !CONFIG_NEWTOD */

/*
* Called from the timer interrupt handler to charge one tick to the current
Index: kernel/timesource.c
===================================================================
--- /dev/null (tree:eed337ef5e9ae7d62caa84b7974a11fddc7f06e0)
+++ d68b09f31fa98801ead715e9281a2e4676b770a5/kernel/timesource.c (mode:100644)
@@ -0,0 +1,237 @@
+/*********************************************************************
+* linux/kernel/timesource.c
+*
+* This file contains the functions which manage timesource drivers.
+*
+* Copyright (C) 2004, 2005 IBM, John Stultz ([email protected])
+*
+* This program is free software; you can redistribute it and/or modify
+* it under the terms of the GNU General Public License as published by
+* the Free Software Foundation; either version 2 of the License, or
+* (at your option) any later version.
+*
+* This program is distributed in the hope that it will be useful,
+* but WITHOUT ANY WARRANTY; without even the implied warranty of
+* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+* GNU General Public License for more details.
+*
+* You should have received a copy of the GNU General Public License
+* along with this program; if not, write to the Free Software
+* Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+*
+* Revision History:
+* 2004-12-07: A1
+* o Rework of timesource structure
+* o Sent to lkml for review
+* 2005-04-29: A4
+* o Keep track of all registered timesources
+* o Add sysfs interface for overriding default selection
+* 2005-05-12: A5
+* o Add boot-time timesource= option for timesource overrides
+* TODO WishList:
+* o Allow timesource drivers to be unregistered
+* o get rid of timesource_jiffies extern
+**********************************************************************/
+
+#include <linux/timesource.h>
+#include <linux/sysdev.h>
+#include <linux/init.h>
+#include <linux/module.h>
+
+#define MAX_TIMESOURCES 10
+
+
+/* XXX - Would like a better way for initializing curr_timesource */
+extern struct timesource_t timesource_jiffies;
+
+/*[Timesource internal variables]---------
+ * curr_timesource:
+ * currently selected timesource. Initialized to timesource_jiffies.
+ * next_timesource:
+ * pending next selected timesource.
+ * timesource_list:
+ * array of pointers pointing to registered timesources
+ * timesource_list_counter:
+ * value which counts the number of registered timesources
+ * timesource_lock:
+ * protects manipulations to curr_timesource and next_timesource
+ * and the timesource_list
+ */
+static struct timesource_t *curr_timesource = &timesource_jiffies;
+static struct timesource_t *next_timesource;
+static struct timesource_t *timesource_list[MAX_TIMESOURCES];
+static int timesource_list_counter;
+static seqlock_t timesource_lock = SEQLOCK_UNLOCKED;
+
+static char override_name[32];
+
+/**
+ * get_next_timesource - Returns the selected timesource
+ *
+ */
+struct timesource_t* get_next_timesource(void)
+{
+ write_seqlock(&timesource_lock);
+ if (next_timesource) {
+ curr_timesource = next_timesource;
+ next_timesource = NULL;
+ }
+ write_sequnlock(&timesource_lock);
+
+ return curr_timesource;
+}
+
+/**
+ * select_timesource - Finds the best registered timesource.
+ *
+ * Private function. Must have a writelock on timesource_lock
+ * when called.
+ */
+static struct timesource_t* select_timesource(void)
+{
+ struct timesource_t* best = timesource_list[0];
+ int i;
+
+ for (i=0; i < timesource_list_counter; i++) {
+ /* Check for override */
+ if ((override_name[0] != 0) &&
+ (!strncmp(timesource_list[i]->name, override_name,
+ strlen(override_name)))) {
+ best = timesource_list[i];
+ break;
+ }
+ /* Pick the highest priority */
+ if (timesource_list[i]->priority > best->priority)
+ best = timesource_list[i];
+ }
+ return best;
+}
+
+/**
+ * register_timesource - Used to install new timesources
+ * @t: timesource to be registered
+ *
+ */
+void register_timesource(struct timesource_t* t)
+{
+ char* error_msg = 0;
+ int i;
+ write_seqlock(&timesource_lock);
+
+ /* check if timesource is already registered */
+ for (i=0; i < timesource_list_counter; i++)
+ if (!strncmp(timesource_list[i]->name, t->name, strlen(t->name))){
+ error_msg = "Already registered!";
+ break;
+ }
+
+ /* check that the list isn't full */
+ if (timesource_list_counter >= MAX_TIMESOURCES)
+ error_msg = "Too many timesources!";
+
+ if(!error_msg)
+ timesource_list[timesource_list_counter++] = t;
+ else
+ printk("register_timesource: Cannot register %s. %s\n",
+ t->name, error_msg);
+
+ /* select next timesource */
+ next_timesource = select_timesource();
+
+ write_sequnlock(&timesource_lock);
+}
+EXPORT_SYMBOL(register_timesource);
+
+/**
+ * sysfs_show_timesources - sysfs interface for listing timesource
+ * @dev: unused
+ * @buf: char buffer to be filled with timesource list
+ *
+ * Provides sysfs interface for listing registered timesources
+ */
+static ssize_t sysfs_show_timesources(struct sys_device *dev, char *buf)
+{
+ int i;
+ char* curr = buf;
+ write_seqlock(&timesource_lock);
+ for(i=0; i < timesource_list_counter; i++) {
+ /* Mark current timesource w/ a star */
+ if (timesource_list[i] == curr_timesource)
+ curr += sprintf(curr, "*");
+ curr += sprintf(curr, "%s ",timesource_list[i]->name);
+ }
+ write_sequnlock(&timesource_lock);
+
+ curr += sprintf(curr, "\n");
+ return curr - buf;
+}
+
+/**
+ * sysfs_override_timesource - interface for manually overriding timesource
+ * @dev: unused
+ * @buf: name of override timesource
+ *
+ *
+ * Takes input from sysfs interface for manually overriding
+ * the default timesource selection
+ */
+static ssize_t sysfs_override_timesource(struct sys_device *dev,
+ const char *buf, size_t count)
+{
+ /* check to avoid underflow later */
+ if (strlen(buf) == 0)
+ return count;
+
+ write_seqlock(&timesource_lock);
+
+ /* copy the name given */
+ strncpy(override_name, buf, strlen(buf)-1);
+ override_name[strlen(buf)-1] = 0;
+
+ /* see if we can find it */
+ next_timesource = select_timesource();
+
+ write_sequnlock(&timesource_lock);
+ return count;
+}
+
+/* Sysfs setup bits:
+ */
+static SYSDEV_ATTR(timesource, 0600, sysfs_show_timesources, sysfs_override_timesource);
+
+static struct sysdev_class timesource_sysclass = {
+ set_kset_name("timesource"),
+};
+
+static struct sys_device device_timesource = {
+ .id = 0,
+ .cls = &timesource_sysclass,
+};
+
+static int init_timesource_sysfs(void)
+{
+ int error = sysdev_class_register(&timesource_sysclass);
+ if (!error) {
+ error = sysdev_register(&device_timesource);
+ if (!error)
+ error = sysdev_create_file(&device_timesource, &attr_timesource);
+ }
+ return error;
+}
+device_initcall(init_timesource_sysfs);
+
+
+/**
+ * boot_override_timesource - boot time override
+ * @str: override name
+ *
+ * Takes a timesource= boot argument and uses it
+ * as the timesource override name
+ */
+static int __init boot_override_timesource(char* str)
+{
+ if (str)
+ strlcpy(override_name, str, sizeof(override_name));
+ return 1;
+}
+__setup("timesource=", boot_override_timesource);
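
For illustration, the selection policy in select_timesource() above (an override name wins, otherwise the highest priority) can be sketched in userspace. This is a minimal sketch: struct ts_entry and pick() are hypothetical names for this example, and any timesource names or priorities other than "jiffies" are made up rather than taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, trimmed-down stand-in for struct timesource_t */
struct ts_entry {
	const char *name;
	int priority;
};

/* Return the entry whose name starts with 'override' if there is one,
 * otherwise the entry with the highest priority. Like the patch, the
 * override is compared over strlen(override) bytes, so it is a prefix
 * match rather than an exact match. */
static const struct ts_entry *pick(const struct ts_entry *list, size_t n,
				   const char *override)
{
	const struct ts_entry *best;
	size_t i;

	assert(n > 0);
	best = &list[0];
	for (i = 0; i < n; i++) {
		if (override[0] != '\0' &&
		    !strncmp(list[i].name, override, strlen(override)))
			return &list[i];
		if (list[i].priority > best->priority)
			best = &list[i];
	}
	return best;
}
```

One consequence of the prefix comparison is that a short override string such as "j" would select "jiffies", and an override that matches nothing falls back silently to the highest-priority entry.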



2005-05-14 00:25:28

by john stultz

Subject: [RFC][PATCH (2/7)] new timeofday i386 arch specific changes (v A5)


All,
This patch converts the i386 arch to use the new timeofday
infrastructure. It applies on top of my
linux-2.6.12-rc4_timeofday-core_A5 patch. This is a full conversion, so
most of this patch is subtractions removing the existing arch specific
timekeeping code. This patch does not provide any i386 timesources, so
using this patch alone on top of the timeofday-core patch will only give
you the jiffies timesource. To get full replacements for the code being
removed here, the following timeofday-timesources-i386 patch will need
to be applied.

I intend to send this patch along with the timeofday-core patch to
Andrew at the end of this month for testing in his tree. So please, if
you have any complaints, suggestions, or blocking issues, let me know.

New in this version:
o This patch was broken out of the multi-arch timeofday-arch_A4 patch
o Removed #ifdefs and fully converted i386 to use the new timeofday
code.

Todo Items:
o Further cleanups and rearranging in arch/i386/kernel/time.c of the
remnants of the arch specific timekeeping code

thanks
-john


linux-2.6.12-rc4_timeofday-arch-i386_A5.patch
=============================================
Index: arch/i386/Kconfig
===================================================================
--- d68b09f31fa98801ead715e9281a2e4676b770a5/arch/i386/Kconfig (mode:100644)
+++ 0feb50d39a18b8a58ac2894eeac9b2f24a3b4435/arch/i386/Kconfig (mode:100644)
@@ -14,6 +14,11 @@
486, 586, Pentiums, and various instruction-set-compatible chips by
AMD, Cyrix, and others.

+config NEWTOD
+ bool
+ default y
+
+
config MMU
bool
default y
Index: arch/i386/kernel/Makefile
===================================================================
--- d68b09f31fa98801ead715e9281a2e4676b770a5/arch/i386/kernel/Makefile (mode:100644)
+++ 0feb50d39a18b8a58ac2894eeac9b2f24a3b4435/arch/i386/kernel/Makefile (mode:100644)
@@ -7,10 +7,9 @@
obj-y := process.o semaphore.o signal.o entry.o traps.o irq.o vm86.o \
ptrace.o time.o ioport.o ldt.o setup.o i8259.o sys_i386.o \
pci-dma.o i386_ksyms.o i387.o dmi_scan.o bootflag.o \
- doublefault.o quirks.o
+ doublefault.o quirks.o tsc.o

obj-y += cpu/
-obj-y += timers/
obj-$(CONFIG_ACPI_BOOT) += acpi/
obj-$(CONFIG_X86_BIOS_REBOOT) += reboot.o
obj-$(CONFIG_MCA) += mca.o
Index: arch/i386/kernel/i8259.c
===================================================================
--- d68b09f31fa98801ead715e9281a2e4676b770a5/arch/i386/kernel/i8259.c (mode:100644)
+++ 0feb50d39a18b8a58ac2894eeac9b2f24a3b4435/arch/i386/kernel/i8259.c (mode:100644)
@@ -387,6 +387,48 @@
}
}

+#ifdef CONFIG_NEWTOD
+void setup_pit_timer(void)
+{
+ extern spinlock_t i8253_lock;
+ unsigned long flags;
+
+ spin_lock_irqsave(&i8253_lock, flags);
+ outb_p(0x34,PIT_MODE); /* binary, mode 2, LSB/MSB, ch 0 */
+ udelay(10);
+ outb_p(LATCH & 0xff , PIT_CH0); /* LSB */
+ udelay(10);
+ outb(LATCH >> 8 , PIT_CH0); /* MSB */
+ spin_unlock_irqrestore(&i8253_lock, flags);
+}
+
+static int timer_resume(struct sys_device *dev)
+{
+ setup_pit_timer();
+ return 0;
+}
+
+static struct sysdev_class timer_sysclass = {
+ set_kset_name("timer_pit"),
+ .resume = timer_resume,
+};
+
+static struct sys_device device_timer = {
+ .id = 0,
+ .cls = &timer_sysclass,
+};
+
+static int __init init_timer_sysfs(void)
+{
+ int error = sysdev_class_register(&timer_sysclass);
+ if (!error)
+ error = sysdev_register(&device_timer);
+ return error;
+}
+
+device_initcall(init_timer_sysfs);
+#endif
+
void __init init_IRQ(void)
{
int i;
Index: arch/i386/kernel/setup.c
===================================================================
--- d68b09f31fa98801ead715e9281a2e4676b770a5/arch/i386/kernel/setup.c (mode:100644)
+++ 0feb50d39a18b8a58ac2894eeac9b2f24a3b4435/arch/i386/kernel/setup.c (mode:100644)
@@ -1523,6 +1523,7 @@
conswitchp = &dummy_con;
#endif
#endif
+ tsc_init();
}

#include "setup_arch_post.h"
Index: arch/i386/kernel/time.c
===================================================================
--- d68b09f31fa98801ead715e9281a2e4676b770a5/arch/i386/kernel/time.c (mode:100644)
+++ 0feb50d39a18b8a58ac2894eeac9b2f24a3b4435/arch/i386/kernel/time.c (mode:100644)
@@ -68,6 +68,8 @@

#include "io_ports.h"

+#include <linux/timeofday.h>
+
extern spinlock_t i8259A_lock;
int pit_latch_buggy; /* extern */

@@ -86,8 +88,6 @@
DEFINE_SPINLOCK(i8253_lock);
EXPORT_SYMBOL(i8253_lock);

-struct timer_opts *cur_timer = &timer_none;
-
/*
* This is a special lock that is owned by the CPU and holds the index
* register we are working with. It is required for NMI access to the
@@ -117,102 +117,19 @@
}
EXPORT_SYMBOL(rtc_cmos_write);

-/*
- * This version of gettimeofday has microsecond resolution
- * and better than microsecond precision on fast x86 machines with TSC.
- */
-void do_gettimeofday(struct timeval *tv)
-{
- unsigned long seq;
- unsigned long usec, sec;
- unsigned long max_ntp_tick;
-
- do {
- unsigned long lost;
-
- seq = read_seqbegin(&xtime_lock);
-
- usec = cur_timer->get_offset();
- lost = jiffies - wall_jiffies;
-
- /*
- * If time_adjust is negative then NTP is slowing the clock
- * so make sure not to go into next possible interval.
- * Better to lose some accuracy than have time go backwards..
- */
- if (unlikely(time_adjust < 0)) {
- max_ntp_tick = (USEC_PER_SEC / HZ) - tickadj;
- usec = min(usec, max_ntp_tick);
-
- if (lost)
- usec += lost * max_ntp_tick;
- }
- else if (unlikely(lost))
- usec += lost * (USEC_PER_SEC / HZ);
-
- sec = xtime.tv_sec;
- usec += (xtime.tv_nsec / 1000);
- } while (read_seqretry(&xtime_lock, seq));
-
- while (usec >= 1000000) {
- usec -= 1000000;
- sec++;
- }
-
- tv->tv_sec = sec;
- tv->tv_usec = usec;
-}
-
-EXPORT_SYMBOL(do_gettimeofday);
-
-int do_settimeofday(struct timespec *tv)
-{
- time_t wtm_sec, sec = tv->tv_sec;
- long wtm_nsec, nsec = tv->tv_nsec;
-
- if ((unsigned long)tv->tv_nsec >= NSEC_PER_SEC)
- return -EINVAL;
-
- write_seqlock_irq(&xtime_lock);
- /*
- * This is revolting. We need to set "xtime" correctly. However, the
- * value in this location is the value at the most recent update of
- * wall time. Discover what correction gettimeofday() would have
- * made, and then undo it!
- */
- nsec -= cur_timer->get_offset() * NSEC_PER_USEC;
- nsec -= (jiffies - wall_jiffies) * TICK_NSEC;
-
- wtm_sec = wall_to_monotonic.tv_sec + (xtime.tv_sec - sec);
- wtm_nsec = wall_to_monotonic.tv_nsec + (xtime.tv_nsec - nsec);
-
- set_normalized_timespec(&xtime, sec, nsec);
- set_normalized_timespec(&wall_to_monotonic, wtm_sec, wtm_nsec);
-
- time_adjust = 0; /* stop active adjtime() */
- time_status |= STA_UNSYNC;
- time_maxerror = NTP_PHASE_LIMIT;
- time_esterror = NTP_PHASE_LIMIT;
- write_sequnlock_irq(&xtime_lock);
- clock_was_set();
- return 0;
-}
-
-EXPORT_SYMBOL(do_settimeofday);
-
static int set_rtc_mmss(unsigned long nowtime)
{
int retval;
-
- WARN_ON(irqs_disabled());
+ unsigned long flags;

/* gets recalled with irq locally disabled */
- spin_lock_irq(&rtc_lock);
+ /* XXX - does irqsave resolve this? -johnstul */
+ spin_lock_irqsave(&rtc_lock, flags);
if (efi_enabled)
retval = efi_set_rtc_mmss(nowtime);
else
retval = mach_set_rtc_mmss(nowtime);
- spin_unlock_irq(&rtc_lock);
+ spin_unlock_irqrestore(&rtc_lock, flags);

return retval;
}
@@ -220,16 +137,6 @@

int timer_ack;

-/* monotonic_clock(): returns # of nanoseconds passed since time_init()
- * Note: This function is required to return accurate
- * time even in the absence of multiple timer ticks.
- */
-unsigned long long monotonic_clock(void)
-{
- return cur_timer->monotonic_clock();
-}
-EXPORT_SYMBOL(monotonic_clock);
-
#if defined(CONFIG_SMP) && defined(CONFIG_FRAME_POINTER)
unsigned long profile_pc(struct pt_regs *regs)
{
@@ -244,12 +151,21 @@
#endif

/*
- * timer_interrupt() needs to keep up the real-time clock,
- * as well as call the "do_timer()" routine every clocktick
+ * This is the same as the above, except we _also_ save the current
+ * Time Stamp Counter value at the time of the timer interrupt, so that
+ * we later on can estimate the time of day more exactly.
*/
-static inline void do_timer_interrupt(int irq, void *dev_id,
- struct pt_regs *regs)
+irqreturn_t timer_interrupt(int irq, void *dev_id, struct pt_regs *regs)
{
+ /*
+ * Here we are in the timer irq handler. We just have irqs locally
+ * disabled but we don't know if the timer_bh is running on the other
+ * CPU. We need to avoid to SMP race with it. NOTE: we don' t need
+ * the irq version of write_lock because as just said we have irq
+ * locally disabled. -arca
+ */
+ write_seqlock(&xtime_lock);
+
#ifdef CONFIG_X86_IO_APIC
if (timer_ack) {
/*
@@ -282,27 +198,6 @@
irq = inb_p( 0x61 ); /* read the current state */
outb_p( irq|0x80, 0x61 ); /* reset the IRQ */
}
-}
-
-/*
- * This is the same as the above, except we _also_ save the current
- * Time Stamp Counter value at the time of the timer interrupt, so that
- * we later on can estimate the time of day more exactly.
- */
-irqreturn_t timer_interrupt(int irq, void *dev_id, struct pt_regs *regs)
-{
- /*
- * Here we are in the timer irq handler. We just have irqs locally
- * disabled but we don't know if the timer_bh is running on the other
- * CPU. We need to avoid to SMP race with it. NOTE: we don' t need
- * the irq version of write_lock because as just said we have irq
- * locally disabled. -arca
- */
- write_seqlock(&xtime_lock);
-
- cur_timer->mark_offset();
-
- do_timer_interrupt(irq, NULL, regs);

write_sequnlock(&xtime_lock);
return IRQ_HANDLED;
@@ -324,55 +219,35 @@

return retval;
}
-static void sync_cmos_clock(unsigned long dummy);

-static struct timer_list sync_cmos_timer =
- TIMER_INITIALIZER(sync_cmos_clock, 0, 0);
-
-static void sync_cmos_clock(unsigned long dummy)
+/* arch specific timeofday hooks */
+nsec_t read_persistent_clock(void)
{
- struct timeval now, next;
- int fail = 1;
+ return (nsec_t)get_cmos_time() * NSEC_PER_SEC;
+}

+void sync_persistent_clock(struct timespec ts)
+{
+ static unsigned long last_rtc_update;
/*
* If we have an externally synchronized Linux clock, then update
* CMOS clock accordingly every ~11 minutes. Set_rtc_mmss() has to be
* called as close as possible to 500 ms before the new second starts.
- * This code is run on a timer. If the clock is set, that timer
- * may not expire at the correct time. Thus, we adjust...
*/
- if ((time_status & STA_UNSYNC) != 0)
- /*
- * Not synced, exit, do not restart a timer (if one is
- * running, let it run out).
- */
+ if (ts.tv_sec <= last_rtc_update + 660)
return;

- do_gettimeofday(&now);
- if (now.tv_usec >= USEC_AFTER - ((unsigned) TICK_SIZE) / 2 &&
- now.tv_usec <= USEC_BEFORE + ((unsigned) TICK_SIZE) / 2)
- fail = set_rtc_mmss(now.tv_sec);
-
- next.tv_usec = USEC_AFTER - now.tv_usec;
- if (next.tv_usec <= 0)
- next.tv_usec += USEC_PER_SEC;
-
- if (!fail)
- next.tv_sec = 659;
- else
- next.tv_sec = 0;
-
- if (next.tv_usec >= USEC_PER_SEC) {
- next.tv_sec++;
- next.tv_usec -= USEC_PER_SEC;
+ if ((ts.tv_nsec / 1000) >= USEC_AFTER - ((unsigned) TICK_SIZE) / 2 &&
+ (ts.tv_nsec / 1000) <= USEC_BEFORE + ((unsigned) TICK_SIZE) / 2) {
+ /* horrible...FIXME */
+ if (set_rtc_mmss(ts.tv_sec) == 0)
+ last_rtc_update = ts.tv_sec;
+ else
+ last_rtc_update = ts.tv_sec - 600; /* do it again in 60 s */
}
- mod_timer(&sync_cmos_timer, jiffies + timeval_to_jiffies(&next));
}

-void notify_arch_cmos_timer(void)
-{
- mod_timer(&sync_cmos_timer, jiffies + 1);
-}
+

static long clock_cmos_diff, sleep_start;

@@ -389,7 +264,6 @@

static int timer_resume(struct sys_device *dev)
{
- unsigned long flags;
unsigned long sec;
unsigned long sleep_length;

@@ -399,10 +273,6 @@
#endif
sec = get_cmos_time() + clock_cmos_diff;
sleep_length = (get_cmos_time() - sleep_start) * HZ;
- write_seqlock_irqsave(&xtime_lock, flags);
- xtime.tv_sec = sec;
- xtime.tv_nsec = 0;
- write_sequnlock_irqrestore(&xtime_lock, flags);
jiffies += sleep_length;
wall_jiffies += sleep_length;
return 0;
@@ -436,17 +306,10 @@
/* Duplicate of time_init() below, with hpet_enable part added */
static void __init hpet_time_init(void)
{
- xtime.tv_sec = get_cmos_time();
- xtime.tv_nsec = (INITIAL_JIFFIES % HZ) * (NSEC_PER_SEC / HZ);
- set_normalized_timespec(&wall_to_monotonic,
- -xtime.tv_sec, -xtime.tv_nsec);
-
if ((hpet_enable() >= 0) && hpet_use_timer) {
printk("Using HPET for base-timer\n");
}

- cur_timer = select_timer();
- printk(KERN_INFO "Using %s for high-res timesource\n",cur_timer->name);

time_init_hook();
}
@@ -464,13 +327,5 @@
return;
}
#endif
- xtime.tv_sec = get_cmos_time();
- xtime.tv_nsec = (INITIAL_JIFFIES % HZ) * (NSEC_PER_SEC / HZ);
- set_normalized_timespec(&wall_to_monotonic,
- -xtime.tv_sec, -xtime.tv_nsec);
-
- cur_timer = select_timer();
- printk(KERN_INFO "Using %s for high-res timesource\n",cur_timer->name);
-
time_init_hook();
}
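
Note on the sync_persistent_clock() hunk above: the rate limiting it uses can be looked at in isolation. What follows is a minimal userspace sketch under stated assumptions, not the code from the patch. The names rtc_sync_step() and struct rtc_sync_state are hypothetical, and the in_window and write_ok parameters stand in for the USEC_AFTER/USEC_BEFORE window test and for set_rtc_mmss() returning 0. It shows why backdating last_rtc_update by 600 seconds on a failed write gives a retry roughly 60 seconds later, while a successful write waits the full 660 seconds.

```c
#include <assert.h>
#include <stdbool.h>

#define RTC_SYNC_INTERVAL  660 /* seconds between successful syncs (~11 min) */
#define RTC_RETRY_BACKDATE 600 /* backdate on failure: 660 - 600 = 60 s retry */

/* Hypothetical container for the function-static last_rtc_update. */
struct rtc_sync_state {
	unsigned long last_rtc_update;
};

/*
 * One call of the sync hook. Returns true if an RTC write was attempted.
 * in_window: the sub-second part of "now" is close enough to the 500 ms mark.
 * write_ok:  the (simulated) RTC write succeeded.
 */
static bool rtc_sync_step(struct rtc_sync_state *st, unsigned long now_sec,
			  bool in_window, bool write_ok)
{
	assert(st);

	/* Too soon since the last recorded update: do nothing. */
	if (now_sec <= st->last_rtc_update + RTC_SYNC_INTERVAL)
		return false;

	/* Outside the write window: try again on a later call. */
	if (!in_window)
		return false;

	if (write_ok)
		st->last_rtc_update = now_sec;
	else
		st->last_rtc_update = now_sec - RTC_RETRY_BACKDATE;

	return true;
}
```

With this shape the caller needs no timer of its own. It only has to invoke the hook often enough that one call lands inside the write window.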
Index: arch/i386/kernel/timers/Makefile
===================================================================
--- d68b09f31fa98801ead715e9281a2e4676b770a5/arch/i386/kernel/timers/Makefile (mode:100644)
+++ /dev/null (tree:0feb50d39a18b8a58ac2894eeac9b2f24a3b4435)
@@ -1,9 +0,0 @@
-#
-# Makefile for x86 timers
-#
-
-obj-y := timer.o timer_none.o timer_tsc.o timer_pit.o common.o
-
-obj-$(CONFIG_X86_CYCLONE_TIMER) += timer_cyclone.o
-obj-$(CONFIG_HPET_TIMER) += timer_hpet.o
-obj-$(CONFIG_X86_PM_TIMER) += timer_pm.o
Index: arch/i386/kernel/timers/common.c
===================================================================
--- d68b09f31fa98801ead715e9281a2e4676b770a5/arch/i386/kernel/timers/common.c (mode:100644)
+++ /dev/null (tree:0feb50d39a18b8a58ac2894eeac9b2f24a3b4435)
@@ -1,160 +0,0 @@
-/*
- * Common functions used across the timers go here
- */
-
-#include <linux/init.h>
-#include <linux/timex.h>
-#include <linux/errno.h>
-#include <linux/jiffies.h>
-
-#include <asm/io.h>
-#include <asm/timer.h>
-#include <asm/hpet.h>
-
-#include "mach_timer.h"
-
-/* ------ Calibrate the TSC -------
- * Return 2^32 * (1 / (TSC clocks per usec)) for do_fast_gettimeoffset().
- * Too much 64-bit arithmetic here to do this cleanly in C, and for
- * accuracy's sake we want to keep the overhead on the CTC speaker (channel 2)
- * output busy loop as low as possible. We avoid reading the CTC registers
- * directly because of the awkward 8-bit access mechanism of the 82C54
- * device.
- */
-
-#define CALIBRATE_TIME (5 * 1000020/HZ)
-
-unsigned long __init calibrate_tsc(void)
-{
- mach_prepare_counter();
-
- {
- unsigned long startlow, starthigh;
- unsigned long endlow, endhigh;
- unsigned long count;
-
- rdtsc(startlow,starthigh);
- mach_countup(&count);
- rdtsc(endlow,endhigh);
-
-
- /* Error: ECTCNEVERSET */
- if (count <= 1)
- goto bad_ctc;
-
- /* 64-bit subtract - gcc just messes up with long longs */
- __asm__("subl %2,%0\n\t"
- "sbbl %3,%1"
- :"=a" (endlow), "=d" (endhigh)
- :"g" (startlow), "g" (starthigh),
- "0" (endlow), "1" (endhigh));
-
- /* Error: ECPUTOOFAST */
- if (endhigh)
- goto bad_ctc;
-
- /* Error: ECPUTOOSLOW */
- if (endlow <= CALIBRATE_TIME)
- goto bad_ctc;
-
- __asm__("divl %2"
- :"=a" (endlow), "=d" (endhigh)
- :"r" (endlow), "0" (0), "1" (CALIBRATE_TIME));
-
- return endlow;
- }
-
- /*
- * The CTC wasn't reliable: we got a hit on the very first read,
- * or the CPU was so fast/slow that the quotient wouldn't fit in
- * 32 bits..
- */
-bad_ctc:
- return 0;
-}
-
-#ifdef CONFIG_HPET_TIMER
-/* ------ Calibrate the TSC using HPET -------
- * Return 2^32 * (1 / (TSC clocks per usec)) for getting the CPU freq.
- * Second output is parameter 1 (when non NULL)
- * Set 2^32 * (1 / (tsc per HPET clk)) for delay_hpet().
- * calibrate_tsc() calibrates the processor TSC by comparing
- * it to the HPET timer of known frequency.
- * Too much 64-bit arithmetic here to do this cleanly in C
- */
-#define CALIBRATE_CNT_HPET (5 * hpet_tick)
-#define CALIBRATE_TIME_HPET (5 * KERNEL_TICK_USEC)
-
-unsigned long __init calibrate_tsc_hpet(unsigned long *tsc_hpet_quotient_ptr)
-{
- unsigned long tsc_startlow, tsc_starthigh;
- unsigned long tsc_endlow, tsc_endhigh;
- unsigned long hpet_start, hpet_end;
- unsigned long result, remain;
-
- hpet_start = hpet_readl(HPET_COUNTER);
- rdtsc(tsc_startlow, tsc_starthigh);
- do {
- hpet_end = hpet_readl(HPET_COUNTER);
- } while ((hpet_end - hpet_start) < CALIBRATE_CNT_HPET);
- rdtsc(tsc_endlow, tsc_endhigh);
-
- /* 64-bit subtract - gcc just messes up with long longs */
- __asm__("subl %2,%0\n\t"
- "sbbl %3,%1"
- :"=a" (tsc_endlow), "=d" (tsc_endhigh)
- :"g" (tsc_startlow), "g" (tsc_starthigh),
- "0" (tsc_endlow), "1" (tsc_endhigh));
-
- /* Error: ECPUTOOFAST */
- if (tsc_endhigh)
- goto bad_calibration;
-
- /* Error: ECPUTOOSLOW */
- if (tsc_endlow <= CALIBRATE_TIME_HPET)
- goto bad_calibration;
-
- ASM_DIV64_REG(result, remain, tsc_endlow, 0, CALIBRATE_TIME_HPET);
- if (remain > (tsc_endlow >> 1))
- result++; /* rounding the result */
-
- if (tsc_hpet_quotient_ptr) {
- unsigned long tsc_hpet_quotient;
-
- ASM_DIV64_REG(tsc_hpet_quotient, remain, tsc_endlow, 0,
- CALIBRATE_CNT_HPET);
- if (remain > (tsc_endlow >> 1))
- tsc_hpet_quotient++; /* rounding the result */
- *tsc_hpet_quotient_ptr = tsc_hpet_quotient;
- }
-
- return result;
-bad_calibration:
- /*
- * the CPU was so fast/slow that the quotient wouldn't fit in
- * 32 bits..
- */
- return 0;
-}
-#endif
-
-/* calculate cpu_khz */
-void __init init_cpu_khz(void)
-{
- if (cpu_has_tsc) {
- unsigned long tsc_quotient = calibrate_tsc();
- if (tsc_quotient) {
- /* report CPU clock rate in Hz.
- * The formula is (10^6 * 2^32) / (2^32 * 1 / (clocks/us)) =
- * clock/second. Our precision is about 100 ppm.
- */
- { unsigned long eax=0, edx=1000;
- __asm__("divl %2"
- :"=a" (cpu_khz), "=d" (edx)
- :"r" (tsc_quotient),
- "0" (eax), "1" (edx));
- printk("Detected %lu.%03lu MHz processor.\n", cpu_khz / 1000, cpu_khz % 1000);
- }
- }
- }
-}
Index: arch/i386/kernel/timers/timer.c
===================================================================
--- d68b09f31fa98801ead715e9281a2e4676b770a5/arch/i386/kernel/timers/timer.c (mode:100644)
+++ /dev/null (tree:0feb50d39a18b8a58ac2894eeac9b2f24a3b4435)
@@ -1,66 +0,0 @@
-#include <linux/init.h>
-#include <linux/kernel.h>
-#include <linux/string.h>
-#include <asm/timer.h>
-
-#ifdef CONFIG_HPET_TIMER
-/*
- * HPET memory read is slower than tsc reads, but is more dependable as it
- * always runs at constant frequency and reduces complexity due to
- * cpufreq. So, we prefer HPET timer to tsc based one. Also, we cannot use
- * timer_pit when HPET is active. So, we default to timer_tsc.
- */
-#endif
-/* list of timers, ordered by preference, NULL terminated */
-static struct init_timer_opts* __initdata timers[] = {
-#ifdef CONFIG_X86_CYCLONE_TIMER
- &timer_cyclone_init,
-#endif
-#ifdef CONFIG_HPET_TIMER
- &timer_hpet_init,
-#endif
-#ifdef CONFIG_X86_PM_TIMER
- &timer_pmtmr_init,
-#endif
- &timer_tsc_init,
- &timer_pit_init,
- NULL,
-};
-
-static char clock_override[10] __initdata;
-
-static int __init clock_setup(char* str)
-{
- if (str)
- strlcpy(clock_override, str, sizeof(clock_override));
- return 1;
-}
-__setup("clock=", clock_setup);
-
-
-/* The chosen timesource has been found to be bad.
- * Fall back to a known good timesource (the PIT)
- */
-void clock_fallback(void)
-{
- cur_timer = &timer_pit;
-}
-
-/* iterates through the list of timers, returning the first
- * one that initializes successfully.
- */
-struct timer_opts* __init select_timer(void)
-{
- int i = 0;
-
- /* find most preferred working timer */
- while (timers[i]) {
- if (timers[i]->init)
- if (timers[i]->init(clock_override) == 0)
- return timers[i]->opts;
- ++i;
- }
-
- panic("select_timer: Cannot find a suitable timer\n");
- return NULL;
-}
Index: arch/i386/kernel/timers/timer_cyclone.c
===================================================================
--- d68b09f31fa98801ead715e9281a2e4676b770a5/arch/i386/kernel/timers/timer_cyclone.c (mode:100644)
+++ /dev/null (tree:0feb50d39a18b8a58ac2894eeac9b2f24a3b4435)
@@ -1,259 +0,0 @@
-/* Cyclone-timer:
- * This code implements timer_ops for the cyclone counter found
- * on IBM x440, x360, and other Summit based systems.
- *
- * Copyright (C) 2002 IBM, John Stultz ([email protected])
- */
-
-
-#include <linux/spinlock.h>
-#include <linux/init.h>
-#include <linux/timex.h>
-#include <linux/errno.h>
-#include <linux/string.h>
-#include <linux/jiffies.h>
-
-#include <asm/timer.h>
-#include <asm/io.h>
-#include <asm/pgtable.h>
-#include <asm/fixmap.h>
-#include "io_ports.h"
-
-extern spinlock_t i8253_lock;
-
-/* Number of usecs that the last interrupt was delayed */
-static int delay_at_last_interrupt;
-
-#define CYCLONE_CBAR_ADDR 0xFEB00CD0
-#define CYCLONE_PMCC_OFFSET 0x51A0
-#define CYCLONE_MPMC_OFFSET 0x51D0
-#define CYCLONE_MPCS_OFFSET 0x51A8
-#define CYCLONE_TIMER_FREQ 100000000
-#define CYCLONE_TIMER_MASK (((u64)1<<40)-1) /* 40 bit mask */
-int use_cyclone = 0;
-
-static u32* volatile cyclone_timer; /* Cyclone MPMC0 register */
-static u32 last_cyclone_low;
-static u32 last_cyclone_high;
-static unsigned long long monotonic_base;
-static seqlock_t monotonic_lock = SEQLOCK_UNLOCKED;
-
-/* helper macro to atomically read both cyclone counter registers */
-#define read_cyclone_counter(low,high) \
- do{ \
- high = cyclone_timer[1]; low = cyclone_timer[0]; \
- } while (high != cyclone_timer[1]);
-
-
-static void mark_offset_cyclone(void)
-{
- unsigned long lost, delay;
- unsigned long delta = last_cyclone_low;
- int count;
- unsigned long long this_offset, last_offset;
-
- write_seqlock(&monotonic_lock);
- last_offset = ((unsigned long long)last_cyclone_high<<32)|last_cyclone_low;
-
- spin_lock(&i8253_lock);
- read_cyclone_counter(last_cyclone_low,last_cyclone_high);
-
- /* read values for delay_at_last_interrupt */
- outb_p(0x00, 0x43); /* latch the count ASAP */
-
- count = inb_p(0x40); /* read the latched count */
- count |= inb(0x40) << 8;
-
- /*
- * VIA686a test code... reset the latch if count > max + 1
- * from timer_pit.c - cjb
- */
- if (count > LATCH) {
- outb_p(0x34, PIT_MODE);
- outb_p(LATCH & 0xff, PIT_CH0);
- outb(LATCH >> 8, PIT_CH0);
- count = LATCH - 1;
- }
- spin_unlock(&i8253_lock);
-
- /* lost tick compensation */
- delta = last_cyclone_low - delta;
- delta /= (CYCLONE_TIMER_FREQ/1000000);
- delta += delay_at_last_interrupt;
- lost = delta/(1000000/HZ);
- delay = delta%(1000000/HZ);
- if (lost >= 2)
- jiffies_64 += lost-1;
-
- /* update the monotonic base value */
- this_offset = ((unsigned long long)last_cyclone_high<<32)|last_cyclone_low;
- monotonic_base += (this_offset - last_offset) & CYCLONE_TIMER_MASK;
- write_sequnlock(&monotonic_lock);
-
- /* calculate delay_at_last_interrupt */
- count = ((LATCH-1) - count) * TICK_SIZE;
- delay_at_last_interrupt = (count + LATCH/2) / LATCH;
-
-
- /* catch corner case where tick rollover occured
- * between cyclone and pit reads (as noted when
- * usec delta is > 90% # of usecs/tick)
- */
- if (lost && abs(delay - delay_at_last_interrupt) > (900000/HZ))
- jiffies_64++;
-}
-
-static unsigned long get_offset_cyclone(void)
-{
- u32 offset;
-
- if(!cyclone_timer)
- return delay_at_last_interrupt;
-
- /* Read the cyclone timer */
- offset = cyclone_timer[0];
-
- /* .. relative to previous jiffy */
- offset = offset - last_cyclone_low;
-
- /* convert cyclone ticks to microseconds */
- /* XXX slow, can we speed this up? */
- offset = offset/(CYCLONE_TIMER_FREQ/1000000);
-
- /* our adjusted time offset in microseconds */
- return delay_at_last_interrupt + offset;
-}
-
-static unsigned long long monotonic_clock_cyclone(void)
-{
- u32 now_low, now_high;
- unsigned long long last_offset, this_offset, base;
- unsigned long long ret;
- unsigned seq;
-
- /* atomically read monotonic base & last_offset */
- do {
- seq = read_seqbegin(&monotonic_lock);
- last_offset = ((unsigned long long)last_cyclone_high<<32)|last_cyclone_low;
- base = monotonic_base;
- } while (read_seqretry(&monotonic_lock, seq));
-
-
- /* Read the cyclone counter */
- read_cyclone_counter(now_low,now_high);
- this_offset = ((unsigned long long)now_high<<32)|now_low;
-
- /* convert to nanoseconds */
- ret = base + ((this_offset - last_offset)&CYCLONE_TIMER_MASK);
- return ret * (1000000000 / CYCLONE_TIMER_FREQ);
-}
-
-static int __init init_cyclone(char* override)
-{
- u32* reg;
- u32 base; /* saved cyclone base address */
- u32 pageaddr; /* page that contains cyclone_timer register */
- u32 offset; /* offset from pageaddr to cyclone_timer register */
- int i;
-
- /* check clock override */
- if (override[0] && strncmp(override,"cyclone",7))
- return -ENODEV;
-
- /*make sure we're on a summit box*/
- if(!use_cyclone) return -ENODEV;
-
- printk(KERN_INFO "Summit chipset: Starting Cyclone Counter.\n");
-
- /* find base address */
- pageaddr = (CYCLONE_CBAR_ADDR)&PAGE_MASK;
- offset = (CYCLONE_CBAR_ADDR)&(~PAGE_MASK);
- set_fixmap_nocache(FIX_CYCLONE_TIMER, pageaddr);
- reg = (u32*)(fix_to_virt(FIX_CYCLONE_TIMER) + offset);
- if(!reg){
- printk(KERN_ERR "Summit chipset: Could not find valid CBAR register.\n");
- return -ENODEV;
- }
- base = *reg;
- if(!base){
- printk(KERN_ERR "Summit chipset: Could not find valid CBAR value.\n");
- return -ENODEV;
- }
-
- /* setup PMCC */
- pageaddr = (base + CYCLONE_PMCC_OFFSET)&PAGE_MASK;
- offset = (base + CYCLONE_PMCC_OFFSET)&(~PAGE_MASK);
- set_fixmap_nocache(FIX_CYCLONE_TIMER, pageaddr);
- reg = (u32*)(fix_to_virt(FIX_CYCLONE_TIMER) + offset);
- if(!reg){
- printk(KERN_ERR "Summit chipset: Could not find valid PMCC register.\n");
- return -ENODEV;
- }
- reg[0] = 0x00000001;
-
- /* setup MPCS */
- pageaddr = (base + CYCLONE_MPCS_OFFSET)&PAGE_MASK;
- offset = (base + CYCLONE_MPCS_OFFSET)&(~PAGE_MASK);
- set_fixmap_nocache(FIX_CYCLONE_TIMER, pageaddr);
- reg = (u32*)(fix_to_virt(FIX_CYCLONE_TIMER) + offset);
- if(!reg){
- printk(KERN_ERR "Summit chipset: Could not find valid MPCS register.\n");
- return -ENODEV;
- }
- reg[0] = 0x00000001;
-
- /* map in cyclone_timer */
- pageaddr = (base + CYCLONE_MPMC_OFFSET)&PAGE_MASK;
- offset = (base + CYCLONE_MPMC_OFFSET)&(~PAGE_MASK);
- set_fixmap_nocache(FIX_CYCLONE_TIMER, pageaddr);
- cyclone_timer = (u32*)(fix_to_virt(FIX_CYCLONE_TIMER) + offset);
- if(!cyclone_timer){
- printk(KERN_ERR "Summit chipset: Could not find valid MPMC register.\n");
- return -ENODEV;
- }
-
- /*quick test to make sure its ticking*/
- for(i=0; i<3; i++){
- u32 old = cyclone_timer[0];
- int stall = 100;
- while(stall--) barrier();
- if(cyclone_timer[0] == old){
- printk(KERN_ERR "Summit chipset: Counter not counting! DISABLED\n");
- cyclone_timer = 0;
- return -ENODEV;
- }
- }
-
- init_cpu_khz();
-
- /* Everything looks good! */
- return 0;
-}
-
-
-static void delay_cyclone(unsigned long loops)
-{
- unsigned long bclock, now;
- if(!cyclone_timer)
- return;
- bclock = cyclone_timer[0];
- do {
- rep_nop();
- now = cyclone_timer[0];
- } while ((now-bclock) < loops);
-}
-/************************************************************/
-
-/* cyclone timer_opts struct */
-static struct timer_opts timer_cyclone = {
- .name = "cyclone",
- .mark_offset = mark_offset_cyclone,
- .get_offset = get_offset_cyclone,
- .monotonic_clock = monotonic_clock_cyclone,
- .delay = delay_cyclone,
-};
-
-struct init_timer_opts __initdata timer_cyclone_init = {
- .init = init_cyclone,
- .opts = &timer_cyclone,
-};
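
Note on the removed timer_cyclone.c above: its monotonic base update relies on masking the difference of two counter reads to 40 bits, so that one counter wrap between reads still yields the right delta. The sketch below is a standalone illustration of that technique only. counter_delta() and COUNTER_MASK are hypothetical names, not identifiers from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* 40-bit free-running counter, matching the width of CYCLONE_TIMER_MASK. */
#define COUNTER_MASK ((UINT64_C(1) << 40) - 1)

/*
 * Ticks elapsed between two reads of a 40-bit counter.
 * Unsigned subtraction wraps modulo 2^64; masking the result to 40 bits
 * reduces it modulo 2^40, which is correct as long as the counter wrapped
 * at most once between the two reads.
 */
static uint64_t counter_delta(uint64_t now, uint64_t last)
{
	assert(now <= COUNTER_MASK && last <= COUNTER_MASK);
	return (now - last) & COUNTER_MASK;
}
```

At the 100 MHz rate given by CYCLONE_TIMER_FREQ, a 40-bit counter wraps about every 3 hours, so reads taken once per timer tick stay well inside the single-wrap assumption.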
Index: arch/i386/kernel/timers/timer_hpet.c
===================================================================
--- d68b09f31fa98801ead715e9281a2e4676b770a5/arch/i386/kernel/timers/timer_hpet.c (mode:100644)
+++ /dev/null (tree:0feb50d39a18b8a58ac2894eeac9b2f24a3b4435)
@@ -1,194 +0,0 @@
-/*
- * This code largely moved from arch/i386/kernel/time.c.
- * See comments there for proper credits.
- */
-
-#include <linux/spinlock.h>
-#include <linux/init.h>
-#include <linux/timex.h>
-#include <linux/errno.h>
-#include <linux/string.h>
-#include <linux/jiffies.h>
-
-#include <asm/timer.h>
-#include <asm/io.h>
-#include <asm/processor.h>
-
-#include "io_ports.h"
-#include "mach_timer.h"
-#include <asm/hpet.h>
-
-static unsigned long hpet_usec_quotient; /* convert hpet clks to usec */
-static unsigned long tsc_hpet_quotient; /* convert tsc to hpet clks */
-static unsigned long hpet_last; /* hpet counter value at last tick*/
-static unsigned long last_tsc_low; /* lsb 32 bits of Time Stamp Counter */
-static unsigned long last_tsc_high; /* msb 32 bits of Time Stamp Counter */
-static unsigned long long monotonic_base;
-static seqlock_t monotonic_lock = SEQLOCK_UNLOCKED;
-
-/* convert from cycles(64bits) => nanoseconds (64bits)
- * basic equation:
- * ns = cycles / (freq / ns_per_sec)
- * ns = cycles * (ns_per_sec / freq)
- * ns = cycles * (10^9 / (cpu_mhz * 10^6))
- * ns = cycles * (10^3 / cpu_mhz)
- *
- * Then we use scaling math (suggested by [email protected]) to get:
- * ns = cycles * (10^3 * SC / cpu_mhz) / SC
- * ns = cycles * cyc2ns_scale / SC
- *
- * And since SC is a constant power of two, we can convert the div
- * into a shift.
- * [email protected] "math is hard, lets go shopping!"
- */
-static unsigned long cyc2ns_scale;
-#define CYC2NS_SCALE_FACTOR 10 /* 2^10, carefully chosen */
-
-static inline void set_cyc2ns_scale(unsigned long cpu_mhz)
-{
- cyc2ns_scale = (1000 << CYC2NS_SCALE_FACTOR)/cpu_mhz;
-}
-
-static inline unsigned long long cycles_2_ns(unsigned long long cyc)
-{
- return (cyc * cyc2ns_scale) >> CYC2NS_SCALE_FACTOR;
-}
-
-static unsigned long long monotonic_clock_hpet(void)
-{
- unsigned long long last_offset, this_offset, base;
- unsigned seq;
-
- /* atomically read monotonic base & last_offset */
- do {
- seq = read_seqbegin(&monotonic_lock);
- last_offset = ((unsigned long long)last_tsc_high<<32)|last_tsc_low;
- base = monotonic_base;
- } while (read_seqretry(&monotonic_lock, seq));
-
- /* Read the Time Stamp Counter */
- rdtscll(this_offset);
-
- /* return the value in ns */
- return base + cycles_2_ns(this_offset - last_offset);
-}
-
-static unsigned long get_offset_hpet(void)
-{
- register unsigned long eax, edx;
-
- eax = hpet_readl(HPET_COUNTER);
- eax -= hpet_last; /* hpet delta */
- eax = min(hpet_tick, eax);
- /*
- * Time offset = (hpet delta) * ( usecs per HPET clock )
- * = (hpet delta) * ( usecs per tick / HPET clocks per tick)
- * = (hpet delta) * ( hpet_usec_quotient ) / (2^32)
- *
- * Where,
- * hpet_usec_quotient = (2^32 * usecs per tick)/HPET clocks per tick
- *
- * Using a mull instead of a divl saves some cycles in critical path.
- */
- ASM_MUL64_REG(eax, edx, hpet_usec_quotient, eax);
-
- /* our adjusted time offset in microseconds */
- return edx;
-}
-
-static void mark_offset_hpet(void)
-{
- unsigned long long this_offset, last_offset;
- unsigned long offset;
-
- write_seqlock(&monotonic_lock);
- last_offset = ((unsigned long long)last_tsc_high<<32)|last_tsc_low;
- rdtsc(last_tsc_low, last_tsc_high);
-
- if (hpet_use_timer)
- offset = hpet_readl(HPET_T0_CMP) - hpet_tick;
- else
- offset = hpet_readl(HPET_COUNTER);
- if (unlikely(((offset - hpet_last) >= (2*hpet_tick)) && (hpet_last != 0))) {
- int lost_ticks = ((offset - hpet_last) / hpet_tick) - 1;
- jiffies_64 += lost_ticks;
- }
- hpet_last = offset;
-
- /* update the monotonic base value */
- this_offset = ((unsigned long long)last_tsc_high<<32)|last_tsc_low;
- monotonic_base += cycles_2_ns(this_offset - last_offset);
- write_sequnlock(&monotonic_lock);
-}
-
-static void delay_hpet(unsigned long loops)
-{
- unsigned long hpet_start, hpet_end;
- unsigned long eax;
-
- /* loops is the number of cpu cycles. Convert it to hpet clocks */
- ASM_MUL64_REG(eax, loops, tsc_hpet_quotient, loops);
-
- hpet_start = hpet_readl(HPET_COUNTER);
- do {
- rep_nop();
- hpet_end = hpet_readl(HPET_COUNTER);
- } while ((hpet_end - hpet_start) < (loops));
-}
-
-static int __init init_hpet(char* override)
-{
- unsigned long result, remain;
-
- /* check clock override */
- if (override[0] && strncmp(override,"hpet",4))
- return -ENODEV;
-
- if (!is_hpet_enabled())
- return -ENODEV;
-
- printk("Using HPET for gettimeofday\n");
- if (cpu_has_tsc) {
- unsigned long tsc_quotient = calibrate_tsc_hpet(&tsc_hpet_quotient);
- if (tsc_quotient) {
- /* report CPU clock rate in Hz.
- * The formula is (10^6 * 2^32) / (2^32 * 1 / (clocks/us)) =
- * clock/second. Our precision is about 100 ppm.
- */
- { unsigned long eax=0, edx=1000;
- ASM_DIV64_REG(cpu_khz, edx, tsc_quotient,
- eax, edx);
- printk("Detected %lu.%03lu MHz processor.\n",
- cpu_khz / 1000, cpu_khz % 1000);
- }
- set_cyc2ns_scale(cpu_khz/1000);
- }
- }
-
- /*
- * Math to calculate hpet to usec multiplier
- * Look for the comments at get_offset_hpet()
- */
- ASM_DIV64_REG(result, remain, hpet_tick, 0, KERNEL_TICK_USEC);
- if (remain > (hpet_tick >> 1))
- result++; /* rounding the result */
- hpet_usec_quotient = result;
-
- return 0;
-}
-
-/************************************************************/
-
-/* tsc timer_opts struct */
-static struct timer_opts timer_hpet = {
- .name = "hpet",
- .mark_offset = mark_offset_hpet,
- .get_offset = get_offset_hpet,
- .monotonic_clock = monotonic_clock_hpet,
- .delay = delay_hpet,
-};
-
-struct init_timer_opts __initdata timer_hpet_init = {
- .init = init_hpet,
- .opts = &timer_hpet,
-};
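
Note on the removed timer_hpet.c above: the cycles-to-nanoseconds comment derives ns = cycles * cyc2ns_scale / SC with SC = 2^10. The sketch below works that derivation through in userspace C, as an illustration rather than the kernel code. make_cyc2ns_scale() and cycles_to_ns() are hypothetical names, and 64-bit types are used throughout as an assumption to keep the example portable.

```c
#include <assert.h>
#include <stdint.h>

#define SCALE_SHIFT 10 /* SC = 2^10, as in CYC2NS_SCALE_FACTOR */

/* scale = (10^3 * SC) / cpu_mhz, computed once at calibration time. */
static uint64_t make_cyc2ns_scale(uint64_t cpu_mhz)
{
	assert(cpu_mhz != 0);
	return (UINT64_C(1000) << SCALE_SHIFT) / cpu_mhz;
}

/* ns = cycles * scale / SC, with the divide done as a right shift. */
static uint64_t cycles_to_ns(uint64_t cyc, uint64_t scale)
{
	return (cyc * scale) >> SCALE_SHIFT;
}
```

The integer division that produces the scale truncates. At 3000 MHz the scale is 341 rather than 341.33, so 3,000,000 cycles convert to 999,023 ns instead of 1,000,000 ns, an error of about 0.1%. Frequencies that divide 1,024,000 evenly, such as 1000 or 2000 MHz, convert exactly.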
Index: arch/i386/kernel/timers/timer_none.c
===================================================================
--- d68b09f31fa98801ead715e9281a2e4676b770a5/arch/i386/kernel/timers/timer_none.c (mode:100644)
+++ /dev/null (tree:0feb50d39a18b8a58ac2894eeac9b2f24a3b4435)
@@ -1,39 +0,0 @@
-#include <linux/init.h>
-#include <asm/timer.h>
-
-static void mark_offset_none(void)
-{
- /* nothing needed */
-}
-
-static unsigned long get_offset_none(void)
-{
- return 0;
-}
-
-static unsigned long long monotonic_clock_none(void)
-{
- return 0;
-}
-
-static void delay_none(unsigned long loops)
-{
- int d0;
- __asm__ __volatile__(
- "\tjmp 1f\n"
- ".align 16\n"
- "1:\tjmp 2f\n"
- ".align 16\n"
- "2:\tdecl %0\n\tjns 2b"
- :"=&a" (d0)
- :"0" (loops));
-}
-
-/* none timer_opts struct */
-struct timer_opts timer_none = {
- .name = "none",
- .mark_offset = mark_offset_none,
- .get_offset = get_offset_none,
- .monotonic_clock = monotonic_clock_none,
- .delay = delay_none,
-};
Index: arch/i386/kernel/timers/timer_pit.c
===================================================================
--- d68b09f31fa98801ead715e9281a2e4676b770a5/arch/i386/kernel/timers/timer_pit.c (mode:100644)
+++ /dev/null (tree:0feb50d39a18b8a58ac2894eeac9b2f24a3b4435)
@@ -1,206 +0,0 @@
-/*
- * This code largely moved from arch/i386/kernel/time.c.
- * See comments there for proper credits.
- */
-
-#include <linux/spinlock.h>
-#include <linux/module.h>
-#include <linux/device.h>
-#include <linux/irq.h>
-#include <linux/sysdev.h>
-#include <linux/timex.h>
-#include <asm/delay.h>
-#include <asm/mpspec.h>
-#include <asm/timer.h>
-#include <asm/smp.h>
-#include <asm/io.h>
-#include <asm/arch_hooks.h>
-
-extern spinlock_t i8259A_lock;
-extern spinlock_t i8253_lock;
-#include "do_timer.h"
-#include "io_ports.h"
-
-static int count_p; /* counter in get_offset_pit() */
-
-static int __init init_pit(char* override)
-{
- /* check clock override */
- if (override[0] && strncmp(override,"pit",3))
- printk(KERN_ERR "Warning: clock= override failed. Defaulting to PIT\n");
-
- count_p = LATCH;
- return 0;
-}
-
-static void mark_offset_pit(void)
-{
- /* nothing needed */
-}
-
-static unsigned long long monotonic_clock_pit(void)
-{
- return 0;
-}
-
-static void delay_pit(unsigned long loops)
-{
- int d0;
- __asm__ __volatile__(
- "\tjmp 1f\n"
- ".align 16\n"
- "1:\tjmp 2f\n"
- ".align 16\n"
- "2:\tdecl %0\n\tjns 2b"
- :"=&a" (d0)
- :"0" (loops));
-}
-
-
-/* This function must be called with xtime_lock held.
- * It was inspired by Steve McCanne's microtime-i386 for BSD. -- jrs
- *
- * However, the pc-audio speaker driver changes the divisor so that
- * it gets interrupted rather more often - it loads 64 into the
- * counter rather than 11932! This has an adverse impact on
- * do_gettimeoffset() -- it stops working! What is also not
- * good is that the interval that our timer function gets called
- * is no longer 10.0002 ms, but 9.9767 ms. To get around this
- * would require using a different timing source. Maybe someone
- * could use the RTC - I know that this can interrupt at frequencies
- * ranging from 8192Hz to 2Hz. If I had the energy, I'd somehow fix
- * it so that at startup, the timer code in sched.c would select
- * using either the RTC or the 8253 timer. The decision would be
- * based on whether there was any other device around that needed
- * to trample on the 8253. I'd set up the RTC to interrupt at 1024 Hz,
- * and then do some jiggery to have a version of do_timer that
- * advanced the clock by 1/1024 s. Every time that reached over 1/100
- * of a second, then do all the old code. If the time was kept correct
- * then do_gettimeoffset could just return 0 - there is no low order
- * divider that can be accessed.
- *
- * Ideally, you would be able to use the RTC for the speaker driver,
- * but it appears that the speaker driver really needs interrupt more
- * often than every 120 us or so.
- *
- * Anyway, this needs more thought.... pjsg (1993-08-28)
- *
- * If you are really that interested, you should be reading
- * comp.protocols.time.ntp!
- */
-
-static unsigned long get_offset_pit(void)
-{
- int count;
- unsigned long flags;
- static unsigned long jiffies_p = 0;
-
- /*
- * cache volatile jiffies temporarily; we have xtime_lock.
- */
- unsigned long jiffies_t;
-
- spin_lock_irqsave(&i8253_lock, flags);
- /* timer count may underflow right here */
- outb_p(0x00, PIT_MODE); /* latch the count ASAP */
-
- count = inb_p(PIT_CH0); /* read the latched count */
-
- /*
- * We do this guaranteed double memory access instead of a _p
- * postfix in the previous port access. Wheee, hackady hack
- */
- jiffies_t = jiffies;
-
- count |= inb_p(PIT_CH0) << 8;
-
- /* VIA686a test code... reset the latch if count > max + 1 */
- if (count > LATCH) {
- outb_p(0x34, PIT_MODE);
- outb_p(LATCH & 0xff, PIT_CH0);
- outb(LATCH >> 8, PIT_CH0);
- count = LATCH - 1;
- }
-
- /*
- * avoiding timer inconsistencies (they are rare, but they happen)...
- * there are two kinds of problems that must be avoided here:
- * 1. the timer counter underflows
- * 2. hardware problem with the timer, not giving us continuous time,
- * the counter does small "jumps" upwards on some Pentium systems,
- * (see c't 95/10 page 335 for Neptun bug.)
- */
-
- if( jiffies_t == jiffies_p ) {
- if( count > count_p ) {
- /* the nutcase */
- count = do_timer_overflow(count);
- }
- } else
- jiffies_p = jiffies_t;
-
- count_p = count;
-
- spin_unlock_irqrestore(&i8253_lock, flags);
-
- count = ((LATCH-1) - count) * TICK_SIZE;
- count = (count + LATCH/2) / LATCH;
-
- return count;
-}
-
-
-/* tsc timer_opts struct */
-struct timer_opts timer_pit = {
- .name = "pit",
- .mark_offset = mark_offset_pit,
- .get_offset = get_offset_pit,
- .monotonic_clock = monotonic_clock_pit,
- .delay = delay_pit,
-};
-
-struct init_timer_opts __initdata timer_pit_init = {
- .init = init_pit,
- .opts = &timer_pit,
-};
-
-void setup_pit_timer(void)
-{
- extern spinlock_t i8253_lock;
- unsigned long flags;
-
- spin_lock_irqsave(&i8253_lock, flags);
- outb_p(0x34,PIT_MODE); /* binary, mode 2, LSB/MSB, ch 0 */
- udelay(10);
- outb_p(LATCH & 0xff , PIT_CH0); /* LSB */
- udelay(10);
- outb(LATCH >> 8 , PIT_CH0); /* MSB */
- spin_unlock_irqrestore(&i8253_lock, flags);
-}
-
-static int timer_resume(struct sys_device *dev)
-{
- setup_pit_timer();
- return 0;
-}
-
-static struct sysdev_class timer_sysclass = {
- set_kset_name("timer_pit"),
- .resume = timer_resume,
-};
-
-static struct sys_device device_timer = {
- .id = 0,
- .cls = &timer_sysclass,
-};
-
-static int __init init_timer_sysfs(void)
-{
- int error = sysdev_class_register(&timer_sysclass);
- if (!error)
- error = sysdev_register(&device_timer);
- return error;
-}
-
-device_initcall(init_timer_sysfs);
-
Index: arch/i386/kernel/timers/timer_pm.c
===================================================================
--- d68b09f31fa98801ead715e9281a2e4676b770a5/arch/i386/kernel/timers/timer_pm.c (mode:100644)
+++ /dev/null (tree:0feb50d39a18b8a58ac2894eeac9b2f24a3b4435)
@@ -1,258 +0,0 @@
-/*
- * (C) Dominik Brodowski <[email protected]> 2003
- *
- * Driver to use the Power Management Timer (PMTMR) available in some
- * southbridges as primary timing source for the Linux kernel.
- *
- * Based on parts of linux/drivers/acpi/hardware/hwtimer.c, timer_pit.c,
- * timer_hpet.c, and on Arjan van de Ven's implementation for 2.4.
- *
- * This file is licensed under the GPL v2.
- */
-
-
-#include <linux/kernel.h>
-#include <linux/module.h>
-#include <linux/device.h>
-#include <linux/init.h>
-#include <asm/types.h>
-#include <asm/timer.h>
-#include <asm/smp.h>
-#include <asm/io.h>
-#include <asm/arch_hooks.h>
-
-#include <linux/timex.h>
-#include "mach_timer.h"
-
-/* Number of PMTMR ticks expected during calibration run */
-#define PMTMR_TICKS_PER_SEC 3579545
-#define PMTMR_EXPECTED_RATE \
- ((CALIBRATE_LATCH * (PMTMR_TICKS_PER_SEC >> 10)) / (CLOCK_TICK_RATE>>10))
-
-
-/* The I/O port the PMTMR resides at.
- * The location is detected during setup_arch(),
- * in arch/i386/acpi/boot.c */
-u32 pmtmr_ioport = 0;
-
-
-/* value of the Power timer at last timer interrupt */
-static u32 offset_tick;
-static u32 offset_delay;
-
-static unsigned long long monotonic_base;
-static seqlock_t monotonic_lock = SEQLOCK_UNLOCKED;
-
-#define ACPI_PM_MASK 0xFFFFFF /* limit it to 24 bits */
-
-/*helper function to safely read acpi pm timesource*/
-static inline u32 read_pmtmr(void)
-{
- u32 v1=0,v2=0,v3=0;
- /* It has been reported that because of various broken
- * chipsets (ICH4, PIIX4 and PIIX4E) where the ACPI PM time
- * source is not latched, so you must read it multiple
- * times to insure a safe value is read.
- */
- do {
- v1 = inl(pmtmr_ioport);
- v2 = inl(pmtmr_ioport);
- v3 = inl(pmtmr_ioport);
- } while ((v1 > v2 && v1 < v3) || (v2 > v3 && v2 < v1)
- || (v3 > v1 && v3 < v2));
-
- /* mask the output to 24 bits */
- return v2 & ACPI_PM_MASK;
-}
-
-
-/*
- * Some boards have the PMTMR running way too fast. We check
- * the PMTMR rate against PIT channel 2 to catch these cases.
- */
-static int verify_pmtmr_rate(void)
-{
- u32 value1, value2;
- unsigned long count, delta;
-
- mach_prepare_counter();
- value1 = read_pmtmr();
- mach_countup(&count);
- value2 = read_pmtmr();
- delta = (value2 - value1) & ACPI_PM_MASK;
-
- /* Check that the PMTMR delta is within 5% of what we expect */
- if (delta < (PMTMR_EXPECTED_RATE * 19) / 20 ||
- delta > (PMTMR_EXPECTED_RATE * 21) / 20) {
- printk(KERN_INFO "PM-Timer running at invalid rate: %lu%% of normal - aborting.\n", 100UL * delta / PMTMR_EXPECTED_RATE);
- return -1;
- }
-
- return 0;
-}
-
-
-static int init_pmtmr(char* override)
-{
- u32 value1, value2;
- unsigned int i;
-
- if (override[0] && strncmp(override,"pmtmr",5))
- return -ENODEV;
-
- if (!pmtmr_ioport)
- return -ENODEV;
-
- /* we use the TSC for delay_pmtmr, so make sure it exists */
- if (!cpu_has_tsc)
- return -ENODEV;
-
- /* "verify" this timing source */
- value1 = read_pmtmr();
- for (i = 0; i < 10000; i++) {
- value2 = read_pmtmr();
- if (value2 == value1)
- continue;
- if (value2 > value1)
- goto pm_good;
- if ((value2 < value1) && ((value2) < 0xFFF))
- goto pm_good;
- printk(KERN_INFO "PM-Timer had inconsistent results: 0x%#x, 0x%#x - aborting.\n", value1, value2);
- return -EINVAL;
- }
- printk(KERN_INFO "PM-Timer had no reasonable result: 0x%#x - aborting.\n", value1);
- return -ENODEV;
-
-pm_good:
- if (verify_pmtmr_rate() != 0)
- return -ENODEV;
-
- init_cpu_khz();
- return 0;
-}
-
-static inline u32 cyc2us(u32 cycles)
-{
- /* The Power Management Timer ticks at 3.579545 ticks per microsecond.
- * 1 / PM_TIMER_FREQUENCY == 0.27936511 =~ 286/1024 [error: 0.024%]
- *
- * Even with HZ = 100, delta is at maximum 35796 ticks, so it can
- * easily be multiplied with 286 (=0x11E) without having to fear
- * u32 overflows.
- */
- cycles *= 286;
- return (cycles >> 10);
-}
-
-/*
- * this gets called during each timer interrupt
- * - Called while holding the writer xtime_lock
- */
-static void mark_offset_pmtmr(void)
-{
- u32 lost, delta, last_offset;
- static int first_run = 1;
- last_offset = offset_tick;
-
- write_seqlock(&monotonic_lock);
-
- offset_tick = read_pmtmr();
-
- /* calculate tick interval */
- delta = (offset_tick - last_offset) & ACPI_PM_MASK;
-
- /* convert to usecs */
- delta = cyc2us(delta);
-
- /* update the monotonic base value */
- monotonic_base += delta * NSEC_PER_USEC;
- write_sequnlock(&monotonic_lock);
-
- /* convert to ticks */
- delta += offset_delay;
- lost = delta / (USEC_PER_SEC / HZ);
- offset_delay = delta % (USEC_PER_SEC / HZ);
-
-
- /* compensate for lost ticks */
- if (lost >= 2)
- jiffies_64 += lost - 1;
-
- /* don't calculate delay for first run,
- or if we've got less then a tick */
- if (first_run || (lost < 1)) {
- first_run = 0;
- offset_delay = 0;
- }
-}
-
-
-static unsigned long long monotonic_clock_pmtmr(void)
-{
- u32 last_offset, this_offset;
- unsigned long long base, ret;
- unsigned seq;
-
-
- /* atomically read monotonic base & last_offset */
- do {
- seq = read_seqbegin(&monotonic_lock);
- last_offset = offset_tick;
- base = monotonic_base;
- } while (read_seqretry(&monotonic_lock, seq));
-
- /* Read the pmtmr */
- this_offset = read_pmtmr();
-
- /* convert to nanoseconds */
- ret = (this_offset - last_offset) & ACPI_PM_MASK;
- ret = base + (cyc2us(ret) * NSEC_PER_USEC);
- return ret;
-}
-
-static void delay_pmtmr(unsigned long loops)
-{
- unsigned long bclock, now;
-
- rdtscl(bclock);
- do
- {
- rep_nop();
- rdtscl(now);
- } while ((now-bclock) < loops);
-}
-
-
-/*
- * get the offset (in microseconds) from the last call to mark_offset()
- * - Called holding a reader xtime_lock
- */
-static unsigned long get_offset_pmtmr(void)
-{
- u32 now, offset, delta = 0;
-
- offset = offset_tick;
- now = read_pmtmr();
- delta = (now - offset)&ACPI_PM_MASK;
-
- return (unsigned long) offset_delay + cyc2us(delta);
-}
-
-
-/* acpi timer_opts struct */
-static struct timer_opts timer_pmtmr = {
- .name = "pmtmr",
- .mark_offset = mark_offset_pmtmr,
- .get_offset = get_offset_pmtmr,
- .monotonic_clock = monotonic_clock_pmtmr,
- .delay = delay_pmtmr,
-};
-
-struct init_timer_opts __initdata timer_pmtmr_init = {
- .init = init_pmtmr,
- .opts = &timer_pmtmr,
-};
-
-MODULE_LICENSE("GPL");
-MODULE_AUTHOR("Dominik Brodowski <[email protected]>");
-MODULE_DESCRIPTION("Power Management Timer (PMTMR) as primary timing source for x86");
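
A note on the cyc2us() arithmetic in the removed timer_pm.c above: the
PM timer ticks at 3.579545 MHz, so one tick is roughly 0.27936511 us,
which the code approximates as 286/1024. The following is a minimal
userspace sketch of that conversion and of the lost-tick split done in
mark_offset_pmtmr(). It is not kernel code: the HZ value of 1000 and the
helper name split_ticks() are assumptions made purely for illustration.

```c
#include <stdint.h>

#define USEC_PER_SEC 1000000u
#define HZ           1000u	/* assumed tick rate, illustration only */

/* Same approximation as timer_pm.c: usecs ~= ticks * 286 / 1024 */
static inline uint32_t cyc2us(uint32_t cycles)
{
	cycles *= 286;
	return cycles >> 10;
}

/*
 * Hypothetical helper: split an elapsed time in usecs into whole timer
 * ticks (the return value) and the leftover usecs (*remainder), the way
 * mark_offset_pmtmr() derives "lost" and "offset_delay".
 */
static inline uint32_t split_ticks(uint32_t usecs, uint32_t *remainder)
{
	*remainder = usecs % (USEC_PER_SEC / HZ);
	return usecs / (USEC_PER_SEC / HZ);
}
```

One second worth of PM timer ticks (3579545) converts to 999755 us here,
which is the roughly 0.024% error the original comment mentions. The
multiply stays within 32 bits for any delta up to about 15 million
ticks, far more than a single timer interval.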
Index: arch/i386/kernel/timers/timer_tsc.c
===================================================================
--- d68b09f31fa98801ead715e9281a2e4676b770a5/arch/i386/kernel/timers/timer_tsc.c (mode:100644)
+++ /dev/null (tree:0feb50d39a18b8a58ac2894eeac9b2f24a3b4435)
@@ -1,560 +0,0 @@
-/*
- * This code largely moved from arch/i386/kernel/time.c.
- * See comments there for proper credits.
- *
- * 2004-06-25 Jesper Juhl
- * moved mark_offset_tsc below cpufreq_delayed_get to avoid gcc 3.4
- * failing to inline.
- */
-
-#include <linux/spinlock.h>
-#include <linux/init.h>
-#include <linux/timex.h>
-#include <linux/errno.h>
-#include <linux/cpufreq.h>
-#include <linux/string.h>
-#include <linux/jiffies.h>
-
-#include <asm/timer.h>
-#include <asm/io.h>
-/* processor.h for distable_tsc flag */
-#include <asm/processor.h>
-
-#include "io_ports.h"
-#include "mach_timer.h"
-
-#include <asm/hpet.h>
-
-#ifdef CONFIG_HPET_TIMER
-static unsigned long hpet_usec_quotient;
-static unsigned long hpet_last;
-static struct timer_opts timer_tsc;
-#endif
-
-static inline void cpufreq_delayed_get(void);
-
-int tsc_disable __initdata = 0;
-
-extern spinlock_t i8253_lock;
-
-static int use_tsc;
-/* Number of usecs that the last interrupt was delayed */
-static int delay_at_last_interrupt;
-
-static unsigned long last_tsc_low; /* lsb 32 bits of Time Stamp Counter */
-static unsigned long last_tsc_high; /* msb 32 bits of Time Stamp Counter */
-static unsigned long long monotonic_base;
-static seqlock_t monotonic_lock = SEQLOCK_UNLOCKED;
-
-/* convert from cycles(64bits) => nanoseconds (64bits)
- * basic equation:
- * ns = cycles / (freq / ns_per_sec)
- * ns = cycles * (ns_per_sec / freq)
- * ns = cycles * (10^9 / (cpu_mhz * 10^6))
- * ns = cycles * (10^3 / cpu_mhz)
- *
- * Then we use scaling math (suggested by [email protected]) to get:
- * ns = cycles * (10^3 * SC / cpu_mhz) / SC
- * ns = cycles * cyc2ns_scale / SC
- *
- * And since SC is a constant power of two, we can convert the div
- * into a shift.
- * [email protected] "math is hard, lets go shopping!"
- */
-static unsigned long cyc2ns_scale;
-#define CYC2NS_SCALE_FACTOR 10 /* 2^10, carefully chosen */
-
-static inline void set_cyc2ns_scale(unsigned long cpu_mhz)
-{
- cyc2ns_scale = (1000 << CYC2NS_SCALE_FACTOR)/cpu_mhz;
-}
-
-static inline unsigned long long cycles_2_ns(unsigned long long cyc)
-{
- return (cyc * cyc2ns_scale) >> CYC2NS_SCALE_FACTOR;
-}
-
-static int count2; /* counter for mark_offset_tsc() */
-
-/* Cached *multiplier* to convert TSC counts to microseconds.
- * (see the equation below).
- * Equal to 2^32 * (1 / (clocks per usec) ).
- * Initialized in time_init.
- */
-static unsigned long fast_gettimeoffset_quotient;
-
-static unsigned long get_offset_tsc(void)
-{
- register unsigned long eax, edx;
-
- /* Read the Time Stamp Counter */
-
- rdtsc(eax,edx);
-
- /* .. relative to previous jiffy (32 bits is enough) */
- eax -= last_tsc_low; /* tsc_low delta */
-
- /*
- * Time offset = (tsc_low delta) * fast_gettimeoffset_quotient
- * = (tsc_low delta) * (usecs_per_clock)
- * = (tsc_low delta) * (usecs_per_jiffy / clocks_per_jiffy)
- *
- * Using a mull instead of a divl saves up to 31 clock cycles
- * in the critical path.
- */
-
- __asm__("mull %2"
- :"=a" (eax), "=d" (edx)
- :"rm" (fast_gettimeoffset_quotient),
- "0" (eax));
-
- /* our adjusted time offset in microseconds */
- return delay_at_last_interrupt + edx;
-}
-
-static unsigned long long monotonic_clock_tsc(void)
-{
- unsigned long long last_offset, this_offset, base;
- unsigned seq;
-
- /* atomically read monotonic base & last_offset */
- do {
- seq = read_seqbegin(&monotonic_lock);
- last_offset = ((unsigned long long)last_tsc_high<<32)|last_tsc_low;
- base = monotonic_base;
- } while (read_seqretry(&monotonic_lock, seq));
-
- /* Read the Time Stamp Counter */
- rdtscll(this_offset);
-
- /* return the value in ns */
- return base + cycles_2_ns(this_offset - last_offset);
-}
-
-/*
- * Scheduler clock - returns current time in nanosec units.
- */
-unsigned long long sched_clock(void)
-{
- unsigned long long this_offset;
-
- /*
- * In the NUMA case we dont use the TSC as they are not
- * synchronized across all CPUs.
- */
-#ifndef CONFIG_NUMA
- if (!use_tsc)
-#endif
- /* no locking but a rare wrong value is not a big deal */
- return jiffies_64 * (1000000000 / HZ);
-
- /* Read the Time Stamp Counter */
- rdtscll(this_offset);
-
- /* return the value in ns */
- return cycles_2_ns(this_offset);
-}
-
-static void delay_tsc(unsigned long loops)
-{
- unsigned long bclock, now;
-
- rdtscl(bclock);
- do
- {
- rep_nop();
- rdtscl(now);
- } while ((now-bclock) < loops);
-}
-
-#ifdef CONFIG_HPET_TIMER
-static void mark_offset_tsc_hpet(void)
-{
- unsigned long long this_offset, last_offset;
- unsigned long offset, temp, hpet_current;
-
- write_seqlock(&monotonic_lock);
- last_offset = ((unsigned long long)last_tsc_high<<32)|last_tsc_low;
- /*
- * It is important that these two operations happen almost at
- * the same time. We do the RDTSC stuff first, since it's
- * faster. To avoid any inconsistencies, we need interrupts
- * disabled locally.
- */
- /*
- * Interrupts are just disabled locally since the timer irq
- * has the SA_INTERRUPT flag set. -arca
- */
- /* read Pentium cycle counter */
-
- hpet_current = hpet_readl(HPET_COUNTER);
- rdtsc(last_tsc_low, last_tsc_high);
-
- /* lost tick compensation */
- offset = hpet_readl(HPET_T0_CMP) - hpet_tick;
- if (unlikely(((offset - hpet_last) > hpet_tick) && (hpet_last != 0))) {
- int lost_ticks = (offset - hpet_last) / hpet_tick;
- jiffies_64 += lost_ticks;
- }
- hpet_last = hpet_current;
-
- /* update the monotonic base value */
- this_offset = ((unsigned long long)last_tsc_high<<32)|last_tsc_low;
- monotonic_base += cycles_2_ns(this_offset - last_offset);
- write_sequnlock(&monotonic_lock);
-
- /* calculate delay_at_last_interrupt */
- /*
- * Time offset = (hpet delta) * ( usecs per HPET clock )
- * = (hpet delta) * ( usecs per tick / HPET clocks per tick)
- * = (hpet delta) * ( hpet_usec_quotient ) / (2^32)
- * Where,
- * hpet_usec_quotient = (2^32 * usecs per tick)/HPET clocks per tick
- */
- delay_at_last_interrupt = hpet_current - offset;
- ASM_MUL64_REG(temp, delay_at_last_interrupt,
- hpet_usec_quotient, delay_at_last_interrupt);
-}
-#endif
-
-
-#ifdef CONFIG_CPU_FREQ
-#include <linux/workqueue.h>
-
-static unsigned int cpufreq_delayed_issched = 0;
-static unsigned int cpufreq_init = 0;
-static struct work_struct cpufreq_delayed_get_work;
-
-static void handle_cpufreq_delayed_get(void *v)
-{
- unsigned int cpu;
- for_each_online_cpu(cpu) {
- cpufreq_get(cpu);
- }
- cpufreq_delayed_issched = 0;
-}
-
-/* if we notice lost ticks, schedule a call to cpufreq_get() as it tries
- * to verify the CPU frequency the timing core thinks the CPU is running
- * at is still correct.
- */
-static inline void cpufreq_delayed_get(void)
-{
- if (cpufreq_init && !cpufreq_delayed_issched) {
- cpufreq_delayed_issched = 1;
- printk(KERN_DEBUG "Losing some ticks... checking if CPU frequency changed.\n");
- schedule_work(&cpufreq_delayed_get_work);
- }
-}
-
-/* If the CPU frequency is scaled, TSC-based delays will need a different
- * loops_per_jiffy value to function properly.
- */
-
-static unsigned int ref_freq = 0;
-static unsigned long loops_per_jiffy_ref = 0;
-
-#ifndef CONFIG_SMP
-static unsigned long fast_gettimeoffset_ref = 0;
-static unsigned long cpu_khz_ref = 0;
-#endif
-
-static int
-time_cpufreq_notifier(struct notifier_block *nb, unsigned long val,
- void *data)
-{
- struct cpufreq_freqs *freq = data;
-
- if (val != CPUFREQ_RESUMECHANGE)
- write_seqlock_irq(&xtime_lock);
- if (!ref_freq) {
- ref_freq = freq->old;
- loops_per_jiffy_ref = cpu_data[freq->cpu].loops_per_jiffy;
-#ifndef CONFIG_SMP
- fast_gettimeoffset_ref = fast_gettimeoffset_quotient;
- cpu_khz_ref = cpu_khz;
-#endif
- }
-
- if ((val == CPUFREQ_PRECHANGE && freq->old < freq->new) ||
- (val == CPUFREQ_POSTCHANGE && freq->old > freq->new) ||
- (val == CPUFREQ_RESUMECHANGE)) {
- if (!(freq->flags & CPUFREQ_CONST_LOOPS))
- cpu_data[freq->cpu].loops_per_jiffy = cpufreq_scale(loops_per_jiffy_ref, ref_freq, freq->new);
-#ifndef CONFIG_SMP
- if (cpu_khz)
- cpu_khz = cpufreq_scale(cpu_khz_ref, ref_freq, freq->new);
- if (use_tsc) {
- if (!(freq->flags & CPUFREQ_CONST_LOOPS)) {
- fast_gettimeoffset_quotient = cpufreq_scale(fast_gettimeoffset_ref, freq->new, ref_freq);
- set_cyc2ns_scale(cpu_khz/1000);
- }
- }
-#endif
- }
-
- if (val != CPUFREQ_RESUMECHANGE)
- write_sequnlock_irq(&xtime_lock);
-
- return 0;
-}
-
-static struct notifier_block time_cpufreq_notifier_block = {
- .notifier_call = time_cpufreq_notifier
-};
-
-
-static int __init cpufreq_tsc(void)
-{
- int ret;
- INIT_WORK(&cpufreq_delayed_get_work, handle_cpufreq_delayed_get, NULL);
- ret = cpufreq_register_notifier(&time_cpufreq_notifier_block,
- CPUFREQ_TRANSITION_NOTIFIER);
- if (!ret)
- cpufreq_init = 1;
- return ret;
-}
-core_initcall(cpufreq_tsc);
-
-#else /* CONFIG_CPU_FREQ */
-static inline void cpufreq_delayed_get(void) { return; }
-#endif
-
-static void mark_offset_tsc(void)
-{
- unsigned long lost,delay;
- unsigned long delta = last_tsc_low;
- int count;
- int countmp;
- static int count1 = 0;
- unsigned long long this_offset, last_offset;
- static int lost_count = 0;
-
- write_seqlock(&monotonic_lock);
- last_offset = ((unsigned long long)last_tsc_high<<32)|last_tsc_low;
- /*
- * It is important that these two operations happen almost at
- * the same time. We do the RDTSC stuff first, since it's
- * faster. To avoid any inconsistencies, we need interrupts
- * disabled locally.
- */
-
- /*
- * Interrupts are just disabled locally since the timer irq
- * has the SA_INTERRUPT flag set. -arca
- */
-
- /* read Pentium cycle counter */
-
- rdtsc(last_tsc_low, last_tsc_high);
-
- spin_lock(&i8253_lock);
- outb_p(0x00, PIT_MODE); /* latch the count ASAP */
-
- count = inb_p(PIT_CH0); /* read the latched count */
- count |= inb(PIT_CH0) << 8;
-
- /*
- * VIA686a test code... reset the latch if count > max + 1
- * from timer_pit.c - cjb
- */
- if (count > LATCH) {
- outb_p(0x34, PIT_MODE);
- outb_p(LATCH & 0xff, PIT_CH0);
- outb(LATCH >> 8, PIT_CH0);
- count = LATCH - 1;
- }
-
- spin_unlock(&i8253_lock);
-
- if (pit_latch_buggy) {
- /* get center value of last 3 time lutch */
- if ((count2 >= count && count >= count1)
- || (count1 >= count && count >= count2)) {
- count2 = count1; count1 = count;
- } else if ((count1 >= count2 && count2 >= count)
- || (count >= count2 && count2 >= count1)) {
- countmp = count;count = count2;
- count2 = count1;count1 = countmp;
- } else {
- count2 = count1; count1 = count; count = count1;
- }
- }
-
- /* lost tick compensation */
- delta = last_tsc_low - delta;
- {
- register unsigned long eax, edx;
- eax = delta;
- __asm__("mull %2"
- :"=a" (eax), "=d" (edx)
- :"rm" (fast_gettimeoffset_quotient),
- "0" (eax));
- delta = edx;
- }
- delta += delay_at_last_interrupt;
- lost = delta/(1000000/HZ);
- delay = delta%(1000000/HZ);
- if (lost >= 2) {
- jiffies_64 += lost-1;
-
- /* sanity check to ensure we're not always losing ticks */
- if (lost_count++ > 100) {
- printk(KERN_WARNING "Losing too many ticks!\n");
- printk(KERN_WARNING "TSC cannot be used as a timesource. \n");
- printk(KERN_WARNING "Possible reasons for this are:\n");
- printk(KERN_WARNING " You're running with Speedstep,\n");
- printk(KERN_WARNING " You don't have DMA enabled for your hard disk (see hdparm),\n");
- printk(KERN_WARNING " Incorrect TSC synchronization on an SMP system (see dmesg).\n");
- printk(KERN_WARNING "Falling back to a sane timesource now.\n");
-
- clock_fallback();
- }
- /* ... but give the TSC a fair chance */
- if (lost_count > 25)
- cpufreq_delayed_get();
- } else
- lost_count = 0;
- /* update the monotonic base value */
- this_offset = ((unsigned long long)last_tsc_high<<32)|last_tsc_low;
- monotonic_base += cycles_2_ns(this_offset - last_offset);
- write_sequnlock(&monotonic_lock);
-
- /* calculate delay_at_last_interrupt */
- count = ((LATCH-1) - count) * TICK_SIZE;
- delay_at_last_interrupt = (count + LATCH/2) / LATCH;
-
- /* catch corner case where tick rollover occured
- * between tsc and pit reads (as noted when
- * usec delta is > 90% # of usecs/tick)
- */
- if (lost && abs(delay - delay_at_last_interrupt) > (900000/HZ))
- jiffies_64++;
-}
-
-static int __init init_tsc(char* override)
-{
-
- /* check clock override */
- if (override[0] && strncmp(override,"tsc",3)) {
-#ifdef CONFIG_HPET_TIMER
- if (is_hpet_enabled()) {
- printk(KERN_ERR "Warning: clock= override failed. Defaulting to tsc\n");
- } else
-#endif
- {
- return -ENODEV;
- }
- }
-
- /*
- * If we have APM enabled or the CPU clock speed is variable
- * (CPU stops clock on HLT or slows clock to save power)
- * then the TSC timestamps may diverge by up to 1 jiffy from
- * 'real time' but nothing will break.
- * The most frequent case is that the CPU is "woken" from a halt
- * state by the timer interrupt itself, so we get 0 error. In the
- * rare cases where a driver would "wake" the CPU and request a
- * timestamp, the maximum error is < 1 jiffy. But timestamps are
- * still perfectly ordered.
- * Note that the TSC counter will be reset if APM suspends
- * to disk; this won't break the kernel, though, 'cuz we're
- * smart. See arch/i386/kernel/apm.c.
- */
- /*
- * Firstly we have to do a CPU check for chips with
- * a potentially buggy TSC. At this point we haven't run
- * the ident/bugs checks so we must run this hook as it
- * may turn off the TSC flag.
- *
- * NOTE: this doesn't yet handle SMP 486 machines where only
- * some CPU's have a TSC. Thats never worked and nobody has
- * moaned if you have the only one in the world - you fix it!
- */
-
- count2 = LATCH; /* initialize counter for mark_offset_tsc() */
-
- if (cpu_has_tsc) {
- unsigned long tsc_quotient;
-#ifdef CONFIG_HPET_TIMER
- if (is_hpet_enabled() && hpet_use_timer) {
- unsigned long result, remain;
- printk("Using TSC for gettimeofday\n");
- tsc_quotient = calibrate_tsc_hpet(NULL);
- timer_tsc.mark_offset = &mark_offset_tsc_hpet;
- /*
- * Math to calculate hpet to usec multiplier
- * Look for the comments at get_offset_tsc_hpet()
- */
- ASM_DIV64_REG(result, remain, hpet_tick,
- 0, KERNEL_TICK_USEC);
- if (remain > (hpet_tick >> 1))
- result++; /* rounding the result */
-
- hpet_usec_quotient = result;
- } else
-#endif
- {
- tsc_quotient = calibrate_tsc();
- }
-
- if (tsc_quotient) {
- fast_gettimeoffset_quotient = tsc_quotient;
- use_tsc = 1;
- /*
- * We could be more selective here I suspect
- * and just enable this for the next intel chips ?
- */
- /* report CPU clock rate in Hz.
- * The formula is (10^6 * 2^32) / (2^32 * 1 / (clocks/us)) =
- * clock/second. Our precision is about 100 ppm.
- */
- { unsigned long eax=0, edx=1000;
- __asm__("divl %2"
- :"=a" (cpu_khz), "=d" (edx)
- :"r" (tsc_quotient),
- "0" (eax), "1" (edx));
- printk("Detected %lu.%03lu MHz processor.\n", cpu_khz / 1000, cpu_khz % 1000);
- }
- set_cyc2ns_scale(cpu_khz/1000);
- return 0;
- }
- }
- return -ENODEV;
-}
-
-#ifndef CONFIG_X86_TSC
-/* disable flag for tsc. Takes effect by clearing the TSC cpu flag
- * in cpu/common.c */
-static int __init tsc_setup(char *str)
-{
- tsc_disable = 1;
- return 1;
-}
-#else
-static int __init tsc_setup(char *str)
-{
- printk(KERN_WARNING "notsc: Kernel compiled with CONFIG_X86_TSC, "
- "cannot disable TSC.\n");
- return 1;
-}
-#endif
-__setup("notsc", tsc_setup);
-
-
-
-/************************************************************/
-
-/* tsc timer_opts struct */
-static struct timer_opts timer_tsc = {
- .name = "tsc",
- .mark_offset = mark_offset_tsc,
- .get_offset = get_offset_tsc,
- .monotonic_clock = monotonic_clock_tsc,
- .delay = delay_tsc,
-};
-
-struct init_timer_opts __initdata timer_tsc_init = {
- .init = init_tsc,
- .opts = &timer_tsc,
-};
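
For readers unfamiliar with the "mull" trick used by get_offset_tsc()
and mark_offset_tsc() in the file removed above: the quotient holds
2^32 / (clocks per usec), so multiplying a 32-bit cycle delta by it and
keeping only the high 32 bits of the 64-bit product (edx after mull)
gives microseconds without a divide. A portable sketch follows. The
function names are hypothetical, and the kernel obtains its quotient
from calibration rather than from a known frequency as assumed here.

```c
#include <stdint.h>

/* Assumed for illustration: build the quotient from a known
 * cycles-per-usec value, i.e. 2^32 / (clocks per usec). */
static inline uint32_t make_quotient(uint32_t cycles_per_usec)
{
	return (uint32_t)((1ULL << 32) / cycles_per_usec);
}

/* usecs = high 32 bits of (delta * quotient); this mirrors what
 * "mull" leaves in edx on i386. */
static inline uint32_t tsc_delta_to_usecs(uint32_t delta, uint32_t quotient)
{
	return (uint32_t)(((uint64_t)delta * quotient) >> 32);
}
```

Because the quotient is truncated, the result can be one microsecond
low: at 1000 cycles per usec a delta of 1000000 cycles yields 999, while
a power-of-two rate such as 1024 cycles per usec converts exactly.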
Index: arch/i386/kernel/tsc.c
===================================================================
--- /dev/null (tree:d68b09f31fa98801ead715e9281a2e4676b770a5)
+++ 0feb50d39a18b8a58ac2894eeac9b2f24a3b4435/arch/i386/kernel/tsc.c (mode:100644)
@@ -0,0 +1,163 @@
+#include <linux/init.h>
+#include <linux/timex.h>
+#include <linux/cpufreq.h>
+#include "mach_timer.h"
+
+int tsc_disable;
+
+/* convert from cycles(64bits) => nanoseconds (64bits)
+ * basic equation:
+ * ns = cycles / (freq / ns_per_sec)
+ * ns = cycles * (ns_per_sec / freq)
+ * ns = cycles * (10^9 / (cpu_mhz * 10^6))
+ * ns = cycles * (10^3 / cpu_mhz)
+ *
+ * Then we use scaling math (suggested by [email protected]) to get:
+ * ns = cycles * (10^3 * SC / cpu_mhz) / SC
+ * ns = cycles * cyc2ns_scale / SC
+ *
+ * And since SC is a constant power of two, we can convert the div
+ * into a shift.
+ * [email protected] "math is hard, lets go shopping!"
+ */
+static unsigned long cyc2ns_scale;
+#define CYC2NS_SCALE_FACTOR 10 /* 2^10, carefully chosen */
+
+static inline void set_cyc2ns_scale(unsigned long cpu_mhz)
+{
+ cyc2ns_scale = (1000 << CYC2NS_SCALE_FACTOR)/cpu_mhz;
+}
+
+static inline unsigned long long cycles_2_ns(unsigned long long cyc)
+{
+ return (cyc * cyc2ns_scale) >> CYC2NS_SCALE_FACTOR;
+}
+
+/*
+ * Scheduler clock - returns current time in nanosec units.
+ */
+unsigned long long sched_clock(void)
+{
+ unsigned long long this_offset;
+
+ /*
+ * In the NUMA case we don't use the TSC, as the TSCs are not
+ * synchronized across all CPUs.
+ */
+#ifndef CONFIG_NUMA
+ if (!cpu_khz)
+#endif
+ /* no locking but a rare wrong value is not a big deal */
+ return jiffies_64 * (1000000000 / HZ);
+
+ /* Read the Time Stamp Counter */
+ rdtscll(this_offset);
+
+ /* return the value in ns */
+ return cycles_2_ns(this_offset);
+}
+
+void tsc_init(void)
+{
+ unsigned long long start, end;
+ unsigned long count;
+ u64 delta64;
+ int i;
+
+ if(!cpu_has_tsc)
+ return;
+ /* repeat 3 times to make sure the cache is warm */
+ for(i=0; i < 3; i++) {
+ mach_prepare_counter();
+ rdtscll(start);
+ mach_countup(&count);
+ rdtscll(end);
+ }
+ delta64 = end - start;
+
+ /* cpu freq too fast */
+ if(delta64 > (1ULL<<32))
+ return;
+ /* cpu freq too slow */
+ if (delta64 <= CALIBRATE_TIME)
+ return;
+
+ delta64 *= 1000;
+ do_div(delta64,CALIBRATE_TIME);
+ cpu_khz = (unsigned long)delta64;
+
+ printk("Detected %lu.%03lu MHz processor.\n",
+ cpu_khz / 1000, cpu_khz % 1000);
+
+ set_cyc2ns_scale(cpu_khz/1000);
+}
+
+
+/* All of the code below comes from arch/i386/kernel/timers/timer_tsc.c
+ * XXX: severely needs better comments and the ifdefs killed.
+ */
+
+#ifdef CONFIG_CPU_FREQ
+static unsigned int cpufreq_init = 0;
+
+/* If the CPU frequency is scaled, TSC-based delays will need a different
+ * loops_per_jiffy value to function properly.
+ */
+
+static unsigned int ref_freq = 0;
+static unsigned long loops_per_jiffy_ref = 0;
+
+#ifndef CONFIG_SMP
+static unsigned long cpu_khz_ref = 0;
+#endif
+
+static int time_cpufreq_notifier(struct notifier_block *nb,
+ unsigned long val, void *data)
+{
+ struct cpufreq_freqs *freq = data;
+
+ if (val != CPUFREQ_RESUMECHANGE)
+ write_seqlock_irq(&xtime_lock);
+ if (!ref_freq) {
+ ref_freq = freq->old;
+ loops_per_jiffy_ref = cpu_data[freq->cpu].loops_per_jiffy;
+#ifndef CONFIG_SMP
+ cpu_khz_ref = cpu_khz;
+#endif
+ }
+
+ if ((val == CPUFREQ_PRECHANGE && freq->old < freq->new) ||
+ (val == CPUFREQ_POSTCHANGE && freq->old > freq->new) ||
+ (val == CPUFREQ_RESUMECHANGE)) {
+ if (!(freq->flags & CPUFREQ_CONST_LOOPS))
+ cpu_data[freq->cpu].loops_per_jiffy = cpufreq_scale(loops_per_jiffy_ref, ref_freq, freq->new);
+#ifndef CONFIG_SMP
+ if (cpu_khz) {
+ cpu_khz = cpufreq_scale(cpu_khz_ref, ref_freq, freq->new);
+ set_cyc2ns_scale(cpu_khz/1000);
+ }
+#endif
+ }
+
+ if (val != CPUFREQ_RESUMECHANGE)
+ write_sequnlock_irq(&xtime_lock);
+
+ return 0;
+}
+
+static struct notifier_block time_cpufreq_notifier_block = {
+ .notifier_call = time_cpufreq_notifier
+};
+
+static int __init cpufreq_tsc(void)
+{
+ int ret;
+ ret = cpufreq_register_notifier(&time_cpufreq_notifier_block,
+ CPUFREQ_TRANSITION_NOTIFIER);
+ if (!ret)
+ cpufreq_init = 1;
+ return ret;
+}
+core_initcall(cpufreq_tsc);
+#endif /* CONFIG_CPU_FREQ */
+
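
To make the cyc2ns scaling comment in tsc.c above concrete, here is a
small userspace sketch of the same arithmetic: ns = cycles * (10^3 * SC
/ cpu_mhz) / SC with SC = 2^10, so the final divide becomes a shift. The
function name cyc2ns_scale_for() and the explicit scale parameter are
assumptions for illustration; the patch keeps the scale in a static
variable instead.

```c
#include <stdint.h>

#define CYC2NS_SCALE_FACTOR 10	/* SC = 2^10, as in the patch */

/* Hypothetical name: compute the scale for a given CPU speed in MHz.
 * cpu_mhz must be non-zero. */
static inline uint64_t cyc2ns_scale_for(uint32_t cpu_mhz)
{
	return (1000u << CYC2NS_SCALE_FACTOR) / cpu_mhz;
}

/* ns = (cycles * scale) >> 10 */
static inline uint64_t cycles_2_ns(uint64_t cyc, uint64_t scale)
{
	return (cyc * scale) >> CYC2NS_SCALE_FACTOR;
}
```

At 2000 MHz the scale is exactly 512, so 2000 cycles map to 1000 ns. At
3000 MHz the scale truncates from 341.33 to 341, and one second of
cycles (3000000000) comes out as 999023437 ns, an error of about 0.1%.
That truncation is worth keeping in mind when judging how "carefully
chosen" the 2^10 factor is.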
Index: arch/i386/lib/delay.c
===================================================================
--- d68b09f31fa98801ead715e9281a2e4676b770a5/arch/i386/lib/delay.c (mode:100644)
+++ 0feb50d39a18b8a58ac2894eeac9b2f24a3b4435/arch/i386/lib/delay.c (mode:100644)
@@ -13,6 +13,7 @@
#include <linux/config.h>
#include <linux/sched.h>
#include <linux/delay.h>
+#include <linux/timeofday.h>
#include <asm/processor.h>
#include <asm/delay.h>
#include <asm/timer.h>
@@ -21,11 +22,20 @@
#include <asm/smp.h>
#endif

-extern struct timer_opts* timer;
-
+/* XXX - For now just use a simple loop delay
+ * This has cpufreq issues, but so did the old method.
+ */
void __delay(unsigned long loops)
{
- cur_timer->delay(loops);
+ int d0;
+ __asm__ __volatile__(
+ "\tjmp 1f\n"
+ ".align 16\n"
+ "1:\tjmp 2f\n"
+ ".align 16\n"
+ "2:\tdecl %0\n\tjns 2b"
+ :"=&a" (d0)
+ :"0" (loops));
}

inline void __const_udelay(unsigned long xloops)
Index: include/asm-i386/mach-default/mach_timer.h
===================================================================
--- d68b09f31fa98801ead715e9281a2e4676b770a5/include/asm-i386/mach-default/mach_timer.h (mode:100644)
+++ 0feb50d39a18b8a58ac2894eeac9b2f24a3b4435/include/asm-i386/mach-default/mach_timer.h (mode:100644)
@@ -14,8 +14,12 @@
*/
#ifndef _MACH_TIMER_H
#define _MACH_TIMER_H
+#include <linux/jiffies.h>
+#include <asm/io.h>

-#define CALIBRATE_LATCH (5 * LATCH)
+#define CALIBRATE_ITERATION 50
+#define CALIBRATE_LATCH (CALIBRATE_ITERATION * LATCH)
+#define CALIBRATE_TIME (CALIBRATE_ITERATION * 1000020/HZ)

static inline void mach_prepare_counter(void)
{
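
The new CALIBRATE_TIME macro above is what tsc_init() divides by: it is
the calibration window in microseconds, so cycles * 1000 / CALIBRATE_TIME
is the CPU speed in kHz. A minimal userspace sketch of that computation
follows, assuming HZ = 1000. The function name tsc_delta_to_khz() and
its "return 0 when unusable" convention are hypothetical; tsc_init()
simply returns early and leaves cpu_khz unset in those cases.

```c
#include <stdint.h>

#define HZ                  1000u	/* assumed, illustration only */
#define CALIBRATE_ITERATION 50u
#define CALIBRATE_TIME      (CALIBRATE_ITERATION * 1000020u / HZ)

/* Hypothetical helper mirroring the checks and math in tsc_init():
 * returns the CPU frequency in kHz, or 0 if the delta is rejected. */
static inline uint32_t tsc_delta_to_khz(uint64_t delta)
{
	if (delta > (1ULL << 32))	/* cpu freq too fast */
		return 0;
	if (delta <= CALIBRATE_TIME)	/* cpu freq too slow */
		return 0;
	return (uint32_t)(delta * 1000 / CALIBRATE_TIME);
}
```

With HZ = 1000 the window is 50001 us, so a 2 GHz CPU accumulates
100002000 cycles during calibration and the function reports 2000000
kHz.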
Index: include/asm-i386/timer.h
===================================================================
--- d68b09f31fa98801ead715e9281a2e4676b770a5/include/asm-i386/timer.h (mode:100644)
+++ 0feb50d39a18b8a58ac2894eeac9b2f24a3b4435/include/asm-i386/timer.h (mode:100644)
@@ -2,63 +2,10 @@
#define _ASMi386_TIMER_H
#include <linux/init.h>

-/**
- * struct timer_ops - used to define a timer source
- *
- * @name: name of the timer.
- * @init: Probes and initializes the timer. Takes clock= override
- * string as an argument. Returns 0 on success, anything else
- * on failure.
- * @mark_offset: called by the timer interrupt.
- * @get_offset: called by gettimeofday(). Returns the number of microseconds
- * since the last timer interupt.
- * @monotonic_clock: returns the number of nanoseconds since the init of the
- * timer.
- * @delay: delays this many clock cycles.
- */
-struct timer_opts {
- char* name;
- void (*mark_offset)(void);
- unsigned long (*get_offset)(void);
- unsigned long long (*monotonic_clock)(void);
- void (*delay)(unsigned long);
-};
-
-struct init_timer_opts {
- int (*init)(char *override);
- struct timer_opts *opts;
-};
-
#define TICK_SIZE (tick_nsec / 1000)
-
-extern struct timer_opts* __init select_timer(void);
-extern void clock_fallback(void);
void setup_pit_timer(void);
-
/* Modifiers for buggy PIT handling */
-
extern int pit_latch_buggy;
-
-extern struct timer_opts *cur_timer;
extern int timer_ack;

-/* list of externed timers */
-extern struct timer_opts timer_none;
-extern struct timer_opts timer_pit;
-extern struct init_timer_opts timer_pit_init;
-extern struct init_timer_opts timer_tsc_init;
-#ifdef CONFIG_X86_CYCLONE_TIMER
-extern struct init_timer_opts timer_cyclone_init;
-#endif
-
-extern unsigned long calibrate_tsc(void);
-extern void init_cpu_khz(void);
-#ifdef CONFIG_HPET_TIMER
-extern struct init_timer_opts timer_hpet_init;
-extern unsigned long calibrate_tsc_hpet(unsigned long *tsc_hpet_quotient_ptr);
-#endif
-
-#ifdef CONFIG_X86_PM_TIMER
-extern struct init_timer_opts timer_pmtmr_init;
-#endif
#endif
Index: include/asm-i386/timex.h
===================================================================
--- d68b09f31fa98801ead715e9281a2e4676b770a5/include/asm-i386/timex.h (mode:100644)
+++ 0feb50d39a18b8a58ac2894eeac9b2f24a3b4435/include/asm-i386/timex.h (mode:100644)
@@ -16,6 +16,8 @@
#endif


+/* XXX - All of this should likely move elsewhere [email protected]*/
+
/*
* Standard way to access the cycle counter on i586+ CPUs.
* Currently only used on SMP.
@@ -48,5 +50,6 @@
}

extern unsigned long cpu_khz;
+extern void tsc_init(void);

#endif


2005-05-14 00:29:12

by john stultz

Subject: [RFC][PATCH (3/7)] new timeofday x86-64 specific changes (v A5)

All,
This patch converts the x86-64 arch to use the new timeofday
infrastructure. It applies on top of my
linux-2.6.12-rc4_timeofday-core_A5 patch. This is a full conversion, so
most of this patch is subtractions removing the existing arch-specific
timekeeping code. This patch does not provide any x86-64 timesources, so
applying it alone on top of the timeofday-core patch will give you only
the jiffies timesource. To get full replacements for the code being
removed here, the following timeofday-timesources-i386 patch (x86-64
shares the same timesources as i386) will also need to be applied.

I would like to send this patch along with the timeofday-core patch to
Andrew at the end of this month for testing in his tree. So please, if
you have any complaints, suggestions, or blocking issues, let me know.


New in this version:
o This patch was broken out of the multi-arch timeofday-arch_A4 patch
o Removed #ifdefs and fully converted x86-64 to use the new timeofday
code.
o Integrated the proof of concept vsyscall implementation

Todo Items:
o Further cleanups and re-arranging in arch/x86-64/kernel/time.c of the
remnants of the arch specific time keeping code
o Further removal of the old vsyscall code.

thanks
-john

linux-2.6.12-rc4_timeofday-arch-x86-64_A5.patch
===============================================
Index: arch/i386/kernel/acpi/boot.c
===================================================================
--- d68b09f31fa98801ead715e9281a2e4676b770a5/arch/i386/kernel/acpi/boot.c (mode:100644)
+++ 59012af04a74f0dbf82461c74469537b90e1c8ed/arch/i386/kernel/acpi/boot.c (mode:100644)
@@ -547,7 +547,7 @@


#ifdef CONFIG_HPET_TIMER
-
+#include <asm/hpet.h>
static int __init acpi_parse_hpet(unsigned long phys, unsigned long size)
{
struct acpi_table_hpet *hpet_tbl;
@@ -570,18 +570,12 @@
#ifdef CONFIG_X86_64
vxtime.hpet_address = hpet_tbl->addr.addrl |
((long) hpet_tbl->addr.addrh << 32);
-
- printk(KERN_INFO PREFIX "HPET id: %#x base: %#lx\n",
- hpet_tbl->id, vxtime.hpet_address);
+ hpet_address = vxtime.hpet_address;
#else /* X86 */
- {
- extern unsigned long hpet_address;
-
hpet_address = hpet_tbl->addr.addrl;
+#endif /* X86 */
printk(KERN_INFO PREFIX "HPET id: %#x base: %#lx\n",
hpet_tbl->id, hpet_address);
- }
-#endif /* X86 */

return 0;
}
Index: arch/x86_64/Kconfig
===================================================================
--- d68b09f31fa98801ead715e9281a2e4676b770a5/arch/x86_64/Kconfig (mode:100644)
+++ 59012af04a74f0dbf82461c74469537b90e1c8ed/arch/x86_64/Kconfig (mode:100644)
@@ -24,6 +24,14 @@
bool
default y

+config NEWTOD
+ bool
+ default y
+
+config NEWTOD_VSYSCALL
+ bool
+ default y
+
config MMU
bool
default y
Index: arch/x86_64/kernel/time.c
===================================================================
--- d68b09f31fa98801ead715e9281a2e4676b770a5/arch/x86_64/kernel/time.c (mode:100644)
+++ 59012af04a74f0dbf82461c74469537b90e1c8ed/arch/x86_64/kernel/time.c (mode:100644)
@@ -35,6 +35,7 @@
#include <asm/sections.h>
#include <linux/cpufreq.h>
#include <linux/hpet.h>
+#include <linux/timeofday.h>
#ifdef CONFIG_X86_LOCAL_APIC
#include <asm/apic.h>
#endif
@@ -58,6 +59,7 @@
#undef HPET_HACK_ENABLE_DANGEROUS

unsigned int cpu_khz; /* TSC clocks / usec, not used here */
+unsigned long hpet_address;
static unsigned long hpet_period; /* fsecs / HPET clock */
unsigned long hpet_tick; /* HPET clocks / interrupt */
unsigned long vxtime_hz = PIT_TICK_RATE;
@@ -79,108 +81,6 @@
rdtscll(*tsc);
}

-/*
- * do_gettimeoffset() returns microseconds since last timer interrupt was
- * triggered by hardware. A memory read of HPET is slower than a register read
- * of TSC, but much more reliable. It's also synchronized to the timer
- * interrupt. Note that do_gettimeoffset() may return more than hpet_tick, if a
- * timer interrupt has happened already, but vxtime.trigger wasn't updated yet.
- * This is not a problem, because jiffies hasn't updated either. They are bound
- * together by xtime_lock.
- */
-
-static inline unsigned int do_gettimeoffset_tsc(void)
-{
- unsigned long t;
- unsigned long x;
- rdtscll_sync(&t);
- if (t < vxtime.last_tsc) t = vxtime.last_tsc; /* hack */
- x = ((t - vxtime.last_tsc) * vxtime.tsc_quot) >> 32;
- return x;
-}
-
-static inline unsigned int do_gettimeoffset_hpet(void)
-{
- return ((hpet_readl(HPET_COUNTER) - vxtime.last) * vxtime.quot) >> 32;
-}
-
-unsigned int (*do_gettimeoffset)(void) = do_gettimeoffset_tsc;
-
-/*
- * This version of gettimeofday() has microsecond resolution and better than
- * microsecond precision, as we're using at least a 10 MHz (usually 14.31818
- * MHz) HPET timer.
- */
-
-void do_gettimeofday(struct timeval *tv)
-{
- unsigned long seq, t;
- unsigned int sec, usec;
-
- do {
- seq = read_seqbegin(&xtime_lock);
-
- sec = xtime.tv_sec;
- usec = xtime.tv_nsec / 1000;
-
- /* i386 does some correction here to keep the clock
- monotonous even when ntpd is fixing drift.
- But they didn't work for me, there is a non monotonic
- clock anyways with ntp.
- I dropped all corrections now until a real solution can
- be found. Note when you fix it here you need to do the same
- in arch/x86_64/kernel/vsyscall.c and export all needed
- variables in vmlinux.lds. -AK */
-
- t = (jiffies - wall_jiffies) * (1000000L / HZ) +
- do_gettimeoffset();
- usec += t;
-
- } while (read_seqretry(&xtime_lock, seq));
-
- tv->tv_sec = sec + usec / 1000000;
- tv->tv_usec = usec % 1000000;
-}
-
-EXPORT_SYMBOL(do_gettimeofday);
-
-/*
- * settimeofday() first undoes the correction that gettimeofday would do
- * on the time, and then saves it. This is ugly, but has been like this for
- * ages already.
- */
-
-int do_settimeofday(struct timespec *tv)
-{
- time_t wtm_sec, sec = tv->tv_sec;
- long wtm_nsec, nsec = tv->tv_nsec;
-
- if ((unsigned long)tv->tv_nsec >= NSEC_PER_SEC)
- return -EINVAL;
-
- write_seqlock_irq(&xtime_lock);
-
- nsec -= do_gettimeoffset() * 1000 +
- (jiffies - wall_jiffies) * (NSEC_PER_SEC/HZ);
-
- wtm_sec = wall_to_monotonic.tv_sec + (xtime.tv_sec - sec);
- wtm_nsec = wall_to_monotonic.tv_nsec + (xtime.tv_nsec - nsec);
-
- set_normalized_timespec(&xtime, sec, nsec);
- set_normalized_timespec(&wall_to_monotonic, wtm_sec, wtm_nsec);
-
- time_adjust = 0; /* stop active adjtime() */
- time_status |= STA_UNSYNC;
- time_maxerror = NTP_PHASE_LIMIT;
- time_esterror = NTP_PHASE_LIMIT;
-
- write_sequnlock_irq(&xtime_lock);
- clock_was_set();
- return 0;
-}
-
-EXPORT_SYMBOL(do_settimeofday);
-
unsigned long profile_pc(struct pt_regs *regs)
{
unsigned long pc = instruction_pointer(regs);
@@ -280,90 +180,8 @@
spin_unlock(&rtc_lock);
}

-
-/* monotonic_clock(): returns # of nanoseconds passed since time_init()
- * Note: This function is required to return accurate
- * time even in the absence of multiple timer ticks.
- */
-unsigned long long monotonic_clock(void)
-{
- unsigned long seq;
- u32 last_offset, this_offset, offset;
- unsigned long long base;
-
- if (vxtime.mode == VXTIME_HPET) {
- do {
- seq = read_seqbegin(&xtime_lock);
-
- last_offset = vxtime.last;
- base = monotonic_base;
- this_offset = hpet_readl(HPET_T0_CMP) - hpet_tick;
-
- } while (read_seqretry(&xtime_lock, seq));
- offset = (this_offset - last_offset);
- offset *=(NSEC_PER_SEC/HZ)/hpet_tick;
- return base + offset;
- }else{
- do {
- seq = read_seqbegin(&xtime_lock);
-
- last_offset = vxtime.last_tsc;
- base = monotonic_base;
- } while (read_seqretry(&xtime_lock, seq));
- sync_core();
- rdtscll(this_offset);
- offset = (this_offset - last_offset)*1000/cpu_khz;
- return base + offset;
- }
-
-
-}
-EXPORT_SYMBOL(monotonic_clock);
-
-static noinline void handle_lost_ticks(int lost, struct pt_regs *regs)
-{
- static long lost_count;
- static int warned;
-
- if (report_lost_ticks) {
- printk(KERN_WARNING "time.c: Lost %d timer "
- "tick(s)! ", lost);
- print_symbol("rip %s)\n", regs->rip);
- }
-
- if (lost_count == 1000 && !warned) {
- printk(KERN_WARNING
- "warning: many lost ticks.\n"
- KERN_WARNING "Your time source seems to be instable or "
- "some driver is hogging interupts\n");
- print_symbol("rip %s\n", regs->rip);
- if (vxtime.mode == VXTIME_TSC && vxtime.hpet_address) {
- printk(KERN_WARNING "Falling back to HPET\n");
- vxtime.last = hpet_readl(HPET_T0_CMP) - hpet_tick;
- vxtime.mode = VXTIME_HPET;
- do_gettimeoffset = do_gettimeoffset_hpet;
- }
- /* else should fall back to PIT, but code missing. */
- warned = 1;
- } else
- lost_count++;
-
-#ifdef CONFIG_CPU_FREQ
- /* In some cases the CPU can change frequency without us noticing
- (like going into thermal throttle)
- Give cpufreq a change to catch up. */
- if ((lost_count+1) % 25 == 0) {
- cpufreq_delayed_get();
- }
-#endif
-}
-
static irqreturn_t timer_interrupt(int irq, void *dev_id, struct pt_regs *regs)
{
- static unsigned long rtc_update = 0;
- unsigned long tsc;
- int delay, offset = 0, lost = 0;
-
/*
* Here we are in the timer irq handler. We have irqs locally disabled (so we
* don't need spin_lock_irqsave()) but we don't know if the timer_bh is running
@@ -373,56 +191,6 @@

write_seqlock(&xtime_lock);

- if (vxtime.hpet_address) {
- offset = hpet_readl(HPET_T0_CMP) - hpet_tick;
- delay = hpet_readl(HPET_COUNTER) - offset;
- } else {
- spin_lock(&i8253_lock);
- outb_p(0x00, 0x43);
- delay = inb_p(0x40);
- delay |= inb(0x40) << 8;
- spin_unlock(&i8253_lock);
- delay = LATCH - 1 - delay;
- }
-
- rdtscll_sync(&tsc);
-
- if (vxtime.mode == VXTIME_HPET) {
- if (offset - vxtime.last > hpet_tick) {
- lost = (offset - vxtime.last) / hpet_tick - 1;
- }
-
- monotonic_base +=
- (offset - vxtime.last)*(NSEC_PER_SEC/HZ) / hpet_tick;
-
- vxtime.last = offset;
- } else {
- offset = (((tsc - vxtime.last_tsc) *
- vxtime.tsc_quot) >> 32) - (USEC_PER_SEC / HZ);
-
- if (offset < 0)
- offset = 0;
-
- if (offset > (USEC_PER_SEC / HZ)) {
- lost = offset / (USEC_PER_SEC / HZ);
- offset %= (USEC_PER_SEC / HZ);
- }
-
- monotonic_base += (tsc - vxtime.last_tsc)*1000000/cpu_khz ;
-
- vxtime.last_tsc = tsc - vxtime.quot * delay / vxtime.tsc_quot;
-
- if ((((tsc - vxtime.last_tsc) *
- vxtime.tsc_quot) >> 32) < offset)
- vxtime.last_tsc = tsc -
- (((long) offset << 32) / vxtime.tsc_quot) - 1;
- }
-
- if (lost > 0) {
- handle_lost_ticks(lost, regs);
- jiffies += lost;
- }
-
/*
* Do the timer stuff.
*/
@@ -445,20 +213,6 @@
smp_local_timer_interrupt(regs);
#endif

-/*
- * If we have an externally synchronized Linux clock, then update CMOS clock
- * accordingly every ~11 minutes. set_rtc_mmss() will be called in the jiffy
- * closest to exactly 500 ms before the next second. If the update fails, we
- * don't care, as it'll be updated on the next turn, and the problem (time way
- * off) isn't likely to go away much sooner anyway.
- */
-
- if ((~time_status & STA_UNSYNC) && xtime.tv_sec > rtc_update &&
- abs(xtime.tv_nsec - 500000000) <= tick_nsec / 2) {
- set_rtc_mmss(xtime.tv_sec);
- rtc_update = xtime.tv_sec + 660;
- }
-
write_sequnlock(&xtime_lock);

return IRQ_HANDLED;
@@ -559,6 +313,30 @@
return mktime(year, mon, day, hour, min, sec);
}

+/* arch specific timeofday hooks */
+nsec_t read_persistent_clock(void)
+{
+ return (nsec_t)get_cmos_time() * NSEC_PER_SEC;
+}
+
+void sync_persistent_clock(struct timespec ts)
+{
+ static unsigned long rtc_update = 0;
+ /*
+ * If we have an externally synchronized Linux clock, then update
+ * CMOS clock accordingly every ~11 minutes. set_rtc_mmss() will
+ * be called in the jiffy closest to exactly 500 ms before the
+ * next second. If the update fails, we don't care, as it'll be
+ * updated on the next turn, and the problem (time way off) isn't
+ * likely to go away much sooner anyway.
+ */
+ if (ts.tv_sec > rtc_update &&
+ abs(ts.tv_nsec - 500000000) <= tick_nsec / 2) {
+ set_rtc_mmss(xtime.tv_sec);
+ rtc_update = xtime.tv_sec + 660;
+ }
+}
+
#ifdef CONFIG_CPU_FREQ

/* Frequency scaling support. Adjust the TSC based timer when the cpu frequency
@@ -927,8 +705,6 @@
*/
void __init time_init_gtod(void)
{
- char *timetype;
-
/*
* AMD systems with more than one CPU don't have fully synchronized
* TSCs. Always use HPET gettimeofday for these, although it is slower.
@@ -947,17 +723,6 @@
/* Some systems will want to disable TSC and use HPET. */
if (oem_force_hpet_timer())
notsc = 1;
- if (vxtime.hpet_address && notsc) {
- timetype = "HPET";
- vxtime.last = hpet_readl(HPET_T0_CMP) - hpet_tick;
- vxtime.mode = VXTIME_HPET;
- do_gettimeoffset = do_gettimeoffset_hpet;
- } else {
- timetype = vxtime.hpet_address ? "HPET/TSC" : "PIT/TSC";
- vxtime.mode = VXTIME_TSC;
- }
-
- printk(KERN_INFO "time.c: Using %s based timekeeping.\n", timetype);
}

__setup("report_lost_ticks", time_setup);
@@ -980,7 +745,6 @@

static int timer_resume(struct sys_device *dev)
{
- unsigned long flags;
unsigned long sec;
unsigned long ctime = get_cmos_time();
unsigned long sleep_length = (ctime - sleep_start) * HZ;
@@ -991,10 +755,6 @@
i8254_timer_resume();

sec = ctime + clock_cmos_diff;
- write_seqlock_irqsave(&xtime_lock,flags);
- xtime.tv_sec = sec;
- xtime.tv_nsec = 0;
- write_sequnlock_irqrestore(&xtime_lock,flags);
jiffies += sleep_length;
wall_jiffies += sleep_length;
return 0;
Index: arch/x86_64/kernel/vmlinux.lds.S
===================================================================
--- d68b09f31fa98801ead715e9281a2e4676b770a5/arch/x86_64/kernel/vmlinux.lds.S (mode:100644)
+++ 59012af04a74f0dbf82461c74469537b90e1c8ed/arch/x86_64/kernel/vmlinux.lds.S (mode:100644)
@@ -71,6 +71,13 @@
. = ALIGN(CONFIG_X86_L1_CACHE_BYTES);
.jiffies : AT CACHE_ALIGN(AFTER(.xtime)) { *(.jiffies) }
jiffies = LOADADDR(.jiffies);
+
+ .vsyscall_gtod_data : AT AFTER(.jiffies) { *(.vsyscall_gtod_data) }
+ vsyscall_gtod_data = LOADADDR(.vsyscall_gtod_data);
+ .vsyscall_gtod_lock : AT AFTER(.vsyscall_gtod_data) { *(.vsyscall_gtod_lock) }
+ vsyscall_gtod_lock = LOADADDR(.vsyscall_gtod_lock);
+
+
.vsyscall_1 ADDR(.vsyscall_0) + 1024: AT (LOADADDR(.vsyscall_0) + 1024) { *(.vsyscall_1) }
. = LOADADDR(.vsyscall_0) + 4096;

Index: arch/x86_64/kernel/vsyscall.c
===================================================================
--- d68b09f31fa98801ead715e9281a2e4676b770a5/arch/x86_64/kernel/vsyscall.c (mode:100644)
+++ 59012af04a74f0dbf82461c74469537b90e1c8ed/arch/x86_64/kernel/vsyscall.c (mode:100644)
@@ -19,6 +19,8 @@
* want per guest time just set the kernel.vsyscall64 sysctl to 0.
*/

+#include <linux/timeofday.h>
+#include <linux/timesource.h>
#include <linux/time.h>
#include <linux/init.h>
#include <linux/kernel.h>
@@ -40,6 +42,21 @@
int __sysctl_vsyscall __section_sysctl_vsyscall = 1;
seqlock_t __xtime_lock __section_xtime_lock = SEQLOCK_UNLOCKED;

+
+struct vsyscall_gtod_data_t {
+ struct timeval wall_time_tv;
+ struct timezone sys_tz;
+ cycle_t offset_base;
+ struct timesource_t timesource;
+};
+
+extern struct vsyscall_gtod_data_t vsyscall_gtod_data;
+struct vsyscall_gtod_data_t __vsyscall_gtod_data __section_vsyscall_gtod_data;
+
+extern seqlock_t vsyscall_gtod_lock;
+seqlock_t __vsyscall_gtod_lock __section_vsyscall_gtod_lock = SEQLOCK_UNLOCKED;
+
+
#include <asm/unistd.h>

static force_inline void timeval_normalize(struct timeval * tv)
@@ -53,40 +70,54 @@
}
}

-static force_inline void do_vgettimeofday(struct timeval * tv)
+/* XXX - this is ugly. gettimeofday() has a label in it so we can't
+ call it twice.
+ */
+static force_inline int syscall_gtod(struct timeval *tv, struct timezone *tz)
{
- long sequence, t;
- unsigned long sec, usec;
-
+ int ret;
+ asm volatile("syscall"
+ : "=a" (ret)
+ : "0" (__NR_gettimeofday),"D" (tv),"S" (tz) : __syscall_clobber );
+ return ret;
+}
+static force_inline void do_vgettimeofday(struct timeval* tv)
+{
+ cycle_t now, cycle_delta;
+ nsec_t nsec_delta;
+ unsigned long seq;
do {
- sequence = read_seqbegin(&__xtime_lock);
-
- sec = __xtime.tv_sec;
- usec = (__xtime.tv_nsec / 1000) +
- (__jiffies - __wall_jiffies) * (1000000 / HZ);
-
- if (__vxtime.mode == VXTIME_TSC) {
- sync_core();
- rdtscll(t);
- if (t < __vxtime.last_tsc)
- t = __vxtime.last_tsc;
- usec += ((t - __vxtime.last_tsc) *
- __vxtime.tsc_quot) >> 32;
- /* See comment in x86_64 do_gettimeofday. */
- } else {
- usec += ((readl((void *)fix_to_virt(VSYSCALL_HPET) + 0xf0) -
- __vxtime.last) * __vxtime.quot) >> 32;
+ seq = read_seqbegin(&__vsyscall_gtod_lock);
+
+ if (__vsyscall_gtod_data.timesource.type == TIMESOURCE_FUNCTION) {
+ syscall_gtod(tv, NULL);
+ return;
}
- } while (read_seqretry(&__xtime_lock, sequence));

- tv->tv_sec = sec + usec / 1000000;
- tv->tv_usec = usec % 1000000;
+ /* read the timesource and calc cycle_delta */
+ now = read_timesource(&__vsyscall_gtod_data.timesource);
+ cycle_delta = (now - __vsyscall_gtod_data.offset_base)
+ & __vsyscall_gtod_data.timesource.mask;
+
+ /* convert cycles to nsecs */
+ nsec_delta = cycle_delta * __vsyscall_gtod_data.timesource.mult;
+ nsec_delta = nsec_delta >> __vsyscall_gtod_data.timesource.shift;
+
+ /* add nsec offset to wall_time_tv */
+ *tv = __vsyscall_gtod_data.wall_time_tv;
+ do_div(nsec_delta, NSEC_PER_USEC);
+ tv->tv_usec += (unsigned long) nsec_delta;
+ while (tv->tv_usec >= USEC_PER_SEC) {
+ tv->tv_sec += 1;
+ tv->tv_usec -= USEC_PER_SEC;
+ }
+ } while (read_seqretry(&__vsyscall_gtod_lock, seq));
}

/* RED-PEN may want to readd seq locking, but then the variable should be write-once. */
static force_inline void do_get_tz(struct timezone * tz)
{
- *tz = __sys_tz;
+ *tz = __vsyscall_gtod_data.sys_tz;
}

static force_inline int gettimeofday(struct timeval *tv, struct timezone *tz)
@@ -118,15 +149,15 @@
return 0;
}

-/* This will break when the xtime seconds get inaccurate, but that is
- * unlikely */
static time_t __vsyscall(1) vtime(time_t *t)
{
+ struct timeval tv;
if (unlikely(!__sysctl_vsyscall))
return time_syscall(t);
- else if (t)
- *t = __xtime.tv_sec;
- return __xtime.tv_sec;
+ vgettimeofday(&tv, 0);
+ if (t)
+ *t = tv.tv_sec;
+ return tv.tv_sec;
}

static long __vsyscall(2) venosys_0(void)
@@ -139,6 +170,48 @@
return -ENOSYS;
}

+struct timesource_t* curr_timesource;
+
+void arch_update_vsyscall_gtod(nsec_t wall_time, cycle_t offset_base,
+ struct timesource_t* timesource, int ntp_adj)
+{
+ unsigned long flags;
+
+ write_seqlock_irqsave(&vsyscall_gtod_lock, flags);
+
+ /* XXX - hackitty hack hack. this is terrible! */
+ if (curr_timesource != timesource) {
+ if ((timesource->type == TIMESOURCE_MMIO_32)
+ || (timesource->type == TIMESOURCE_MMIO_64)) {
+ unsigned long vaddr = (unsigned long)timesource->mmio_ptr;
+ pgd_t *pgd = pgd_offset_k(vaddr);
+ pud_t *pud = pud_offset(pgd, vaddr);
+ pmd_t *pmd = pmd_offset(pud,vaddr);
+ pte_t *pte = pte_offset_kernel(pmd, vaddr);
+ *pte = pte_mkread(*pte);
+ }
+ curr_timesource = timesource;
+ }
+
+ /* save off wall time as timeval */
+ vsyscall_gtod_data.wall_time_tv = ns2timeval(wall_time);
+
+ /* save offset_base */
+ vsyscall_gtod_data.offset_base = offset_base;
+
+ /* copy current timesource */
+ vsyscall_gtod_data.timesource = *timesource;
+
+ /* apply ntp adjustment to timesource mult */
+ vsyscall_gtod_data.timesource.mult += ntp_adj;
+
+ /* save off current timezone */
+ vsyscall_gtod_data.sys_tz = sys_tz;
+
+ write_sequnlock_irqrestore(&vsyscall_gtod_lock, flags);
+
+}
+
#ifdef CONFIG_SYSCTL

#define SYSCALL 0x050f
Index: include/asm-generic/div64.h
===================================================================
--- d68b09f31fa98801ead715e9281a2e4676b770a5/include/asm-generic/div64.h (mode:100644)
+++ 59012af04a74f0dbf82461c74469537b90e1c8ed/include/asm-generic/div64.h (mode:100644)
@@ -55,4 +55,13 @@

#endif /* BITS_PER_LONG */

+#ifndef div_long_long_rem
+#define div_long_long_rem(dividend,divisor,remainder) \
+({ \
+ u64 result = dividend; \
+ *remainder = do_div(result,divisor); \
+ result; \
+})
+#endif
+
#endif /* _ASM_GENERIC_DIV64_H */
Index: include/asm-x86_64/hpet.h
===================================================================
--- d68b09f31fa98801ead715e9281a2e4676b770a5/include/asm-x86_64/hpet.h (mode:100644)
+++ 59012af04a74f0dbf82461c74469537b90e1c8ed/include/asm-x86_64/hpet.h (mode:100644)
@@ -1,6 +1,6 @@
#ifndef _ASM_X8664_HPET_H
#define _ASM_X8664_HPET_H 1
-
+#include <asm/fixmap.h>
/*
* Documentation on HPET can be found at:
* http://www.intel.com/ial/home/sp/pcmmspec.htm
@@ -44,6 +44,7 @@
#define HPET_TN_SETVAL 0x040
#define HPET_TN_32BIT 0x100

+extern unsigned long hpet_address; /* hpet memory map physical address */
extern int is_hpet_enabled(void);
extern int hpet_rtc_timer_init(void);
extern int oem_force_hpet_timer(void);
Index: include/asm-x86_64/vsyscall.h
===================================================================
--- d68b09f31fa98801ead715e9281a2e4676b770a5/include/asm-x86_64/vsyscall.h (mode:100644)
+++ 59012af04a74f0dbf82461c74469537b90e1c8ed/include/asm-x86_64/vsyscall.h (mode:100644)
@@ -22,6 +22,8 @@
#define __section_sysctl_vsyscall __attribute__ ((unused, __section__ (".sysctl_vsyscall"), aligned(16)))
#define __section_xtime __attribute__ ((unused, __section__ (".xtime"), aligned(16)))
#define __section_xtime_lock __attribute__ ((unused, __section__ (".xtime_lock"), aligned(16)))
+#define __section_vsyscall_gtod_data __attribute__ ((unused, __section__ (".vsyscall_gtod_data"),aligned(16)))
+#define __section_vsyscall_gtod_lock __attribute__ ((unused, __section__ (".vsyscall_gtod_lock"),aligned(16)))

#define VXTIME_TSC 1
#define VXTIME_HPET 2


2005-05-14 00:32:20

by john stultz

Subject: [RFC][PATCH (4/7)] new timeofday i386 and x86-64 timesources (v A5)

All,
This patch implements the time sources shared between i386 and x86-64
(acpi_pm, cyclone, hpet, pit, tsc and tsc-interp). The patch should
apply on top of either linux-2.6.12-rc4_timeofday-arch-x86-64_A5.patch or
linux-2.6.12-rc4_timeofday-arch-i386_A5.patch.
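
Each of these timesources describes its counter rate as a mult/shift
pair, so that ns = (cycles * mult) >> shift. Below is a minimal sketch of
how such a pair can be derived, either from a frequency in Hz or from a
period in femtoseconds (the form the HPET reports). The helper names are
hypothetical, and the rounding in hz2mult_sketch() is an assumption; the
real timesource_hz2mult() lives in the timeofday-core patch and may
differ.

```c
#include <stdint.h>

#define NSEC_PER_SEC_SKETCH  1000000000ULL
#define FSEC_PER_NSEC_SKETCH 1000000ULL

/* mult for a counter ticking at hz (rounds to nearest, an assumption) */
static uint32_t hz2mult_sketch(uint32_t hz, unsigned int shift)
{
	uint64_t tmp = NSEC_PER_SEC_SKETCH << shift;

	tmp += hz / 2;
	return (uint32_t)(tmp / hz);
}

/* mult for a counter that reports its period in femtoseconds */
static uint32_t period_fs2mult_sketch(uint32_t period_fs, unsigned int shift)
{
	uint64_t tmp = (uint64_t)period_fs << shift;

	return (uint32_t)(tmp / FSEC_PER_NSEC_SKETCH);
}

/* apply the pair: cycles to nanoseconds */
static uint64_t cyc2ns_sketch(uint64_t cycles, uint32_t mult,
			      unsigned int shift)
{
	return (cycles * mult) >> shift;
}
```

As a sanity check, one second's worth of cycles should convert back to
roughly 10^9 ns: 3579545 cycles for the ACPI PM timer, or 14318180
cycles for a 14.31818 MHz HPET with a period of 69841279 fs.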

The patch should be fairly straightforward, only adding the new
timesources.

I intend to send this patch along with the timeofday-core and
timeofday-arch-i386 patches to Andrew at the end of this month for
testing in his tree. So please, if you have any complaints, suggestions,
or blocking issues, let me know.

New in this release:
o i386_pit timesource has been fixed and now works properly
o New code to handle TSC stalls on CPUs that halt the TSC in C3 mode
o A new TSC/Jiffies interpolation example.
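
On the C3 item: the hook in acpi_processor_idle() reports the time spent
in C3 in microseconds, computed from two ACPI PM timer readings as
((t2-t1)*286)>>10. A small sketch of that conversion follows. The
function name is hypothetical, and the 24 bit mask and 64 bit
intermediate are assumptions added here so the sketch survives a counter
wrap and cannot overflow; the patch itself passes the raw expression.

```c
#include <stdint.h>

#define PM_TICKS_MASK_SKETCH 0xFFFFFFU /* assume a 24 bit PM timer */

/*
 * The PM timer runs at 3579545 Hz, so one tick is about 0.2794 usec.
 * 286/1024 is roughly 0.2793, which lets the conversion use a multiply
 * and a shift instead of a division.
 */
static uint32_t pm_ticks_to_usecs_sketch(uint32_t t1, uint32_t t2)
{
	uint32_t ticks = (t2 - t1) & PM_TICKS_MASK_SKETCH;

	return (uint32_t)(((uint64_t)ticks * 286U) >> 10);
}
```

The approximation undercounts slightly: a full second of ticks comes out
a little under 1000000 usec.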

This new tsc-interp timesource provides the same interpolated
timekeeping method as was used previously in the i386 arch specific time
code. This should also quiet fears about the TSC timesource not being
stable enough for embedded systems.
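
To make the interpolation idea concrete, here is one plausible shape of
it, written as a standalone sketch rather than taken from tsc-interp.c:
jiffies supplies the coarse time, and the TSC only fills in the
interval since the last tick. All names are hypothetical, and clamping
the TSC contribution to one tick is an assumption about how a misbehaving
TSC would be kept from pushing time past the next tick boundary.

```c
#include <stdint.h>

/* Hypothetical snapshot taken at each timer interrupt. */
struct interp_sketch {
	uint64_t jiffies_then;   /* tick count at the last timer interrupt */
	uint64_t tsc_then;       /* TSC value sampled at that interrupt */
	uint64_t nsec_per_jiffy; /* length of one tick in ns */
	uint32_t mult;           /* TSC ns per cycle, scaled by 2^shift */
	uint32_t shift;
};

/* Nanoseconds since boot: tick base plus a bounded TSC offset. */
static uint64_t interp_read_ns_sketch(const struct interp_sketch *s,
				      uint64_t tsc_now)
{
	uint64_t delta_ns = ((tsc_now - s->tsc_then) * s->mult) >> s->shift;

	/* never interpolate past the next tick boundary */
	if (delta_ns >= s->nsec_per_jiffy)
		delta_ns = s->nsec_per_jiffy - 1;

	return s->jiffies_then * s->nsec_per_jiffy + delta_ns;
}
```

The point of the design is that long-term accuracy depends only on the
tick count, so a TSC that drifts or stalls can distort time by at most
one tick.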

Items still on the TODO list:
o make acpi-pm and pit x86-64 compatible
o make cyclone ia64 generic

I look forward to your comments and feedback.

thanks
-john

linux-2.6.12-rc4_timeofday-timesources-i386_A5.patch
====================================================
Index: drivers/acpi/processor_idle.c
===================================================================
--- 0feb50d39a18b8a58ac2894eeac9b2f24a3b4435/drivers/acpi/processor_idle.c (mode:100644)
+++ 3b4165efeade40b65ea2e8188184e4f8d3d8d636/drivers/acpi/processor_idle.c (mode:100644)
@@ -162,6 +162,7 @@
return;
}

+extern void tsc_c3_compensate(unsigned long usecs);

static void acpi_processor_idle (void)
{
@@ -309,6 +310,10 @@
t2 = inl(acpi_fadt.xpm_tmr_blk.address);
/* Enable bus master arbitration */
acpi_set_register(ACPI_BITREG_ARB_DISABLE, 0, ACPI_MTX_DO_NOT_LOCK);
+
+ /* compensate for TSC pause */
+ tsc_c3_compensate(((t2-t1)*286)>>10);
+
/* Re-enable interrupts */
local_irq_enable();
/* Compute time (ticks) that we were actually asleep */
Index: drivers/timesource/Makefile
===================================================================
--- 0feb50d39a18b8a58ac2894eeac9b2f24a3b4435/drivers/timesource/Makefile (mode:100644)
+++ 3b4165efeade40b65ea2e8188184e4f8d3d8d636/drivers/timesource/Makefile (mode:100644)
@@ -1 +1,7 @@
obj-y += jiffies.o
+obj-$(CONFIG_X86) += tsc.o
+obj-$(CONFIG_X86) += i386_pit.o
+obj-$(CONFIG_X86) += tsc-interp.o
+obj-$(CONFIG_X86_CYCLONE_TIMER) += cyclone.o
+obj-$(CONFIG_X86_PM_TIMER) += acpi_pm.o
+obj-$(CONFIG_HPET_TIMER) += hpet.o
Index: drivers/timesource/acpi_pm.c
===================================================================
--- /dev/null (tree:0feb50d39a18b8a58ac2894eeac9b2f24a3b4435)
+++ 3b4165efeade40b65ea2e8188184e4f8d3d8d636/drivers/timesource/acpi_pm.c (mode:100644)
@@ -0,0 +1,135 @@
+/*
+ * linux/drivers/timesource/acpi_pm.c
+ *
+ * This file contains the ACPI PM based time source.
+ *
+ * This code was largely moved from the i386 timer_pm.c file
+ * which was (C) Dominik Brodowski <[email protected]> 2003
+ * and contained the following comments:
+ *
+ * Driver to use the Power Management Timer (PMTMR) available in some
+ * southbridges as primary timing source for the Linux kernel.
+ *
+ * Based on parts of linux/drivers/acpi/hardware/hwtimer.c, timer_pit.c,
+ * timer_hpet.c, and on Arjan van de Ven's implementation for 2.4.
+ *
+ * This file is licensed under the GPL v2.
+ */
+
+
+#include <linux/timesource.h>
+#include <linux/errno.h>
+#include <linux/init.h>
+#include <asm/io.h>
+#include "mach_timer.h"
+
+/* Number of PMTMR ticks expected during calibration run */
+#define PMTMR_TICKS_PER_SEC 3579545
+#define PMTMR_EXPECTED_RATE \
+ ((CALIBRATE_LATCH * (PMTMR_TICKS_PER_SEC >> 10)) / (CLOCK_TICK_RATE>>10))
+
+
+/* The I/O port the PMTMR resides at.
+ * The location is detected during setup_arch(),
+ * in arch/i386/kernel/acpi/boot.c */
+u32 pmtmr_ioport = 0;
+
+#define ACPI_PM_MASK 0xFFFFFF /* limit it to 24 bits */
+
+static inline u32 read_pmtmr(void)
+{
+ u32 v1=0,v2=0,v3=0;
+ /* On various broken chipsets (ICH4, PIIX4 and PIIX4E) the
+ * ACPI PM time source is reportedly not latched, so it
+ * must be read multiple times to ensure a safe value is
+ * returned.
+ */
+ do {
+ v1 = inl(pmtmr_ioport);
+ v2 = inl(pmtmr_ioport);
+ v3 = inl(pmtmr_ioport);
+ } while ((v1 > v2 && v1 < v3) || (v2 > v3 && v2 < v1)
+ || (v3 > v1 && v3 < v2));
+
+ /* mask the output to 24 bits */
+ return v2 & ACPI_PM_MASK;
+}
+
+
+static cycle_t acpi_pm_read(void)
+{
+ return (cycle_t)read_pmtmr();
+}
+
+struct timesource_t timesource_acpi_pm = {
+ .name = "acpi_pm",
+ .priority = 200,
+ .type = TIMESOURCE_FUNCTION,
+ .read_fnct = acpi_pm_read,
+ .mask = (cycle_t)ACPI_PM_MASK,
+ .mult = 0, /* to be calculated */
+ .shift = 22,
+};
+
+/*
+ * Some boards have the PMTMR running way too fast. We check
+ * the PMTMR rate against PIT channel 2 to catch these cases.
+ */
+static int __init verify_pmtmr_rate(void)
+{
+ u32 value1, value2;
+ unsigned long count, delta;
+
+ mach_prepare_counter();
+ value1 = read_pmtmr();
+ mach_countup(&count);
+ value2 = read_pmtmr();
+ delta = (value2 - value1) & ACPI_PM_MASK;
+
+ /* Check that the PMTMR delta is within 5% of what we expect */
+ if (delta < (PMTMR_EXPECTED_RATE * 19) / 20 ||
+ delta > (PMTMR_EXPECTED_RATE * 21) / 20) {
+ printk(KERN_INFO "PM-Timer running at invalid rate: %lu%% of normal - aborting.\n", 100UL * delta / PMTMR_EXPECTED_RATE);
+ return -1;
+ }
+
+ return 0;
+}
+
+
+static int __init init_acpi_pm_timesource(void)
+{
+ u32 value1, value2;
+ unsigned int i;
+
+ if (!pmtmr_ioport)
+ return -ENODEV;
+
+ timesource_acpi_pm.mult = timesource_hz2mult(PMTMR_TICKS_PER_SEC,
+ timesource_acpi_pm.shift);
+
+ /* "verify" this timing source */
+ value1 = read_pmtmr();
+ for (i = 0; i < 10000; i++) {
+ value2 = read_pmtmr();
+ if (value2 == value1)
+ continue;
+ if (value2 > value1)
+ goto pm_good;
+ if ((value2 < value1) && ((value2) < 0xFFF))
+ goto pm_good;
+ printk(KERN_INFO "PM-Timer had inconsistent results: %#x, %#x - aborting.\n", value1, value2);
+ return -EINVAL;
+ }
+ printk(KERN_INFO "PM-Timer had no reasonable result: %#x - aborting.\n", value1);
+ return -ENODEV;
+
+pm_good:
+ if (verify_pmtmr_rate() != 0)
+ return -ENODEV;
+
+ register_timesource(&timesource_acpi_pm);
+ return 0;
+}
+
+module_init(init_acpi_pm_timesource);
Index: drivers/timesource/cyclone.c
===================================================================
--- /dev/null (tree:0feb50d39a18b8a58ac2894eeac9b2f24a3b4435)
+++ 3b4165efeade40b65ea2e8188184e4f8d3d8d636/drivers/timesource/cyclone.c (mode:100644)
@@ -0,0 +1,135 @@
+#include <linux/timesource.h>
+#include <linux/errno.h>
+#include <linux/string.h>
+#include <linux/timex.h>
+#include <linux/init.h>
+
+#include <asm/io.h>
+#include <asm/pgtable.h>
+#include "mach_timer.h"
+
+#define CYCLONE_CBAR_ADDR 0xFEB00CD0 /* base address ptr*/
+#define CYCLONE_PMCC_OFFSET 0x51A0 /* offset to control register */
+#define CYCLONE_MPCS_OFFSET 0x51A8 /* offset to select register */
+#define CYCLONE_MPMC_OFFSET 0x51D0 /* offset to count register */
+#define CYCLONE_TIMER_FREQ 100000000
+#define CYCLONE_TIMER_MASK (0xFFFFFFFF) /* 32 bit mask */
+
+int use_cyclone = 0;
+
+struct timesource_t timesource_cyclone = {
+ .name = "cyclone",
+ .priority = 100,
+ .type = TIMESOURCE_MMIO_32,
+ .mmio_ptr = NULL, /* to be set */
+ .mask = (cycle_t)CYCLONE_TIMER_MASK,
+ .mult = 10,
+ .shift = 0,
+};
+
+static unsigned long __init calibrate_cyclone(void)
+{
+ unsigned long start, end, delta;
+ unsigned long i, count;
+ unsigned long cyclone_freq_khz;
+
+ /* repeat 3 times to make sure the cache is warm */
+ for(i=0; i < 3; i++) {
+ mach_prepare_counter();
+ start = readl(timesource_cyclone.mmio_ptr);
+ mach_countup(&count);
+ end = readl(timesource_cyclone.mmio_ptr);
+ }
+
+ delta = end - start;
+ printk("cyclone delta: %lu\n", delta);
+ delta *= (ACTHZ/1000)>>8;
+ printk("delta*hz = %lu\n", delta);
+ cyclone_freq_khz = delta/CALIBRATE_ITERATION;
+ printk("calculated cyclone_freq: %lu khz\n", cyclone_freq_khz);
+ return cyclone_freq_khz;
+}
+
+static int __init init_cyclone_timesource(void)
+{
+ unsigned long base; /* saved value from CBAR */
+ unsigned long offset;
+ u32 __iomem* reg;
+ u32 __iomem* volatile cyclone_timer; /* Cyclone MPMC0 register */
+ unsigned long khz;
+ int i;
+
+ /*make sure we're on a summit box*/
+ if (!use_cyclone) return -ENODEV;
+
+ printk(KERN_INFO "Summit chipset: Starting Cyclone Counter.\n");
+
+ /* find base address */
+ offset = CYCLONE_CBAR_ADDR;
+ reg = ioremap_nocache(offset, sizeof(reg));
+ if(!reg){
+ printk(KERN_ERR "Summit chipset: Could not find valid CBAR register.\n");
+ return -ENODEV;
+ }
+ /* even on 64bit systems, this is only 32bits */
+ base = readl(reg);
+ if(!base){
+ printk(KERN_ERR "Summit chipset: Could not find valid CBAR value.\n");
+ return -ENODEV;
+ }
+ iounmap(reg);
+
+ /* setup PMCC */
+ offset = base + CYCLONE_PMCC_OFFSET;
+ reg = ioremap_nocache(offset, sizeof(reg));
+ if(!reg){
+ printk(KERN_ERR "Summit chipset: Could not find valid PMCC register.\n");
+ return -ENODEV;
+ }
+ writel(0x00000001,reg);
+ iounmap(reg);
+
+ /* setup MPCS */
+ offset = base + CYCLONE_MPCS_OFFSET;
+ reg = ioremap_nocache(offset, sizeof(reg));
+ if(!reg){
+ printk(KERN_ERR "Summit chipset: Could not find valid MPCS register.\n");
+ return -ENODEV;
+ }
+ writel(0x00000001,reg);
+ iounmap(reg);
+
+ /* map in cyclone_timer */
+ offset = base + CYCLONE_MPMC_OFFSET;
+ cyclone_timer = ioremap_nocache(offset, sizeof(u64));
+ if(!cyclone_timer){
+ printk(KERN_ERR "Summit chipset: Could not find valid MPMC register.\n");
+ return -ENODEV;
+ }
+
+ /* quick test to make sure it's ticking */
+ for(i=0; i<3; i++){
+ u32 old = readl(cyclone_timer);
+ int stall = 100;
+ while(stall--) barrier();
+ if(readl(cyclone_timer) == old){
+ printk(KERN_ERR "Summit chipset: Counter not counting! DISABLED\n");
+ iounmap(cyclone_timer);
+ cyclone_timer = NULL;
+ return -ENODEV;
+ }
+ }
+ timesource_cyclone.mmio_ptr = cyclone_timer;
+
+ /* sort out mult/shift values */
+ khz = calibrate_cyclone();
+ timesource_cyclone.shift = 22;
+ timesource_cyclone.mult = timesource_khz2mult(khz,
+ timesource_cyclone.shift);
+
+ register_timesource(&timesource_cyclone);
+
+ return 0;
+}
+
+module_init(init_cyclone_timesource);
Index: drivers/timesource/hpet.c
===================================================================
--- /dev/null (tree:0feb50d39a18b8a58ac2894eeac9b2f24a3b4435)
+++ 3b4165efeade40b65ea2e8188184e4f8d3d8d636/drivers/timesource/hpet.c (mode:100644)
@@ -0,0 +1,59 @@
+#include <linux/timesource.h>
+#include <linux/hpet.h>
+#include <linux/errno.h>
+#include <linux/init.h>
+#include <asm/io.h>
+#include <asm/hpet.h>
+
+#define HPET_MASK (0xFFFFFFFF)
+#define HPET_SHIFT 22
+
+/* FSEC = 10^-15 NSEC = 10^-9 */
+#define FSEC_PER_NSEC 1000000
+
+struct timesource_t timesource_hpet = {
+ .name = "hpet",
+ .priority = 300,
+ .type = TIMESOURCE_MMIO_32,
+ .mmio_ptr = NULL,
+ .mask = (cycle_t)HPET_MASK,
+ .mult = 0, /* set below */
+ .shift = HPET_SHIFT,
+};
+
+static int __init init_hpet_timesource(void)
+{
+ unsigned long hpet_period;
+ void __iomem* hpet_base;
+ u64 tmp;
+
+ if (!hpet_address)
+ return -ENODEV;
+
+ /* calculate the hpet address */
+ hpet_base =
+ (void __iomem*)ioremap_nocache(hpet_address, HPET_MMAP_SIZE);
+ timesource_hpet.mmio_ptr = hpet_base + HPET_COUNTER;
+
+ /* calculate the frequency */
+ hpet_period = readl(hpet_base + HPET_PERIOD);
+
+
+ /* hpet period is in femto seconds per cycle
+ * so we need to convert this to ns/cyc units
+ * approximated by mult/2^shift
+ *
+ * fsec/cyc * 1nsec/1000000fsec = nsec/cyc = mult/2^shift
+ * fsec/cyc * 1ns/1000000fsec * 2^shift = mult
+ * fsec/cyc * 2^shift * 1nsec/1000000fsec = mult
+ * (fsec/cyc << shift)/1000000 = mult
+ * (hpet_period << shift)/FSEC_PER_NSEC = mult
+ */
+ tmp = (u64)hpet_period << HPET_SHIFT;
+ do_div(tmp, FSEC_PER_NSEC);
+ timesource_hpet.mult = (u32)tmp;
+
+ register_timesource(&timesource_hpet);
+ return 0;
+}
+module_init(init_hpet_timesource);
Index: drivers/timesource/i386_pit.c
===================================================================
--- /dev/null (tree:0feb50d39a18b8a58ac2894eeac9b2f24a3b4435)
+++ 3b4165efeade40b65ea2e8188184e4f8d3d8d636/drivers/timesource/i386_pit.c (mode:100644)
@@ -0,0 +1,64 @@
+#include <linux/timesource.h>
+#include <linux/jiffies.h>
+#include <linux/init.h>
+/* pit timesource does not build on x86-64 */
+#ifndef CONFIG_X86_64
+#include <asm/io.h>
+#include "io_ports.h"
+
+extern spinlock_t i8253_lock;
+
+/* Since the PIT overflows every tick, it's not very useful
+ * to read on its own. So use jiffies to emulate a free
+ * running counter.
+ */
+
+static cycle_t pit_read(void)
+{
+ unsigned long flags, seq;
+ int count;
+ u64 jifs;
+
+ do {
+ seq = read_seqbegin(&xtime_lock);
+
+ spin_lock_irqsave(&i8253_lock, flags);
+
+ outb_p(0x00, PIT_MODE); /* latch the count ASAP */
+ count = inb_p(PIT_CH0); /* read the latched count */
+ count |= inb_p(PIT_CH0) << 8;
+
+ /* VIA686a test code... reset the latch if count > max + 1 */
+ if (count > LATCH) {
+ outb_p(0x34, PIT_MODE);
+ outb_p(LATCH & 0xff, PIT_CH0);
+ outb(LATCH >> 8, PIT_CH0);
+ count = LATCH - 1;
+ }
+ spin_unlock_irqrestore(&i8253_lock, flags);
+ jifs = get_jiffies_64();
+ } while (read_seqretry(&xtime_lock, seq));
+
+ count = (LATCH-1) - count;
+
+ return (cycle_t)(jifs * LATCH) + count;
+}
+
+static struct timesource_t timesource_pit = {
+ .name = "pit",
+ .priority = 0,
+ .type = TIMESOURCE_FUNCTION,
+ .read_fnct = pit_read,
+ .mask = (cycle_t)-1,
+ .mult = 0,
+ .shift = 20,
+};
+
+static int __init init_pit_timesource(void)
+{
+ timesource_pit.mult = timesource_hz2mult(CLOCK_TICK_RATE, 20);
+ register_timesource(&timesource_pit);
+ return 0;
+}
+module_init(init_pit_timesource);
+#endif
Index: drivers/timesource/tsc-interp.c
===================================================================
--- /dev/null (tree:0feb50d39a18b8a58ac2894eeac9b2f24a3b4435)
+++ 3b4165efeade40b65ea2e8188184e4f8d3d8d636/drivers/timesource/tsc-interp.c (mode:100644)
@@ -0,0 +1,108 @@
+/* TSC-Jiffies Interpolation timesource
+ Example interpolation timesource.
+TODO:
+ o per-cpu TSC offsets
+*/
+#include <linux/timesource.h>
+#include <linux/timer.h>
+#include <linux/timex.h>
+#include <linux/init.h>
+#include <linux/jiffies.h>
+#include <linux/threads.h>
+#include <linux/smp.h>
+
+static unsigned long current_cpu_khz = 0;
+
+static seqlock_t tsc_interp_lock = SEQLOCK_UNLOCKED;
+static cycle_t tsc_then;
+static cycle_t jiffies_then;
+struct timer_list tsc_interp_timer;
+
+static unsigned long mult, shift;
+
+#define NSEC_PER_JIFFY ((((unsigned long long)NSEC_PER_SEC)<<8)/ACTHZ)
+#define SHIFT_VAL 22
+
+static cycle_t read_tsc_interp(void);
+static void tsc_interp_update_callback(void);
+
+static struct timesource_t timesource_tsc_interp = {
+ .name = "tsc-interp",
+ .priority = 20,
+ .type = TIMESOURCE_FUNCTION,
+ .read_fnct = read_tsc_interp,
+ .mask = (cycle_t)-1,
+ .mult = 1<<SHIFT_VAL,
+ .shift = SHIFT_VAL,
+ .update_callback = tsc_interp_update_callback,
+};
+
+static void tsc_interp_sync(unsigned long unused)
+{
+ cycle_t tsc_now;
+ u64 jiffies_now;
+
+ do {
+ jiffies_now = get_jiffies_64();
+ rdtscll(tsc_now);
+ } while (jiffies_now != get_jiffies_64());
+
+ write_seqlock(&tsc_interp_lock);
+ jiffies_then = jiffies_now;
+ tsc_then = tsc_now;
+ write_sequnlock(&tsc_interp_lock);
+
+ mod_timer(&tsc_interp_timer, jiffies+1);
+}
+
+
+static cycle_t read_tsc_interp(void)
+{
+ cycle_t ret;
+ cycle_t now, then;
+ u64 jiffs;
+ unsigned long seq;
+
+ do {
+ seq = read_seqbegin(&tsc_interp_lock);
+
+ jiffs = jiffies_then;
+ then = tsc_then;
+
+ } while (read_seqretry(&tsc_interp_lock, seq));
+
+ rdtscll(now);
+ ret = jiffs * NSEC_PER_JIFFY;
+ ret += min((cycle_t)NSEC_PER_JIFFY,(cycle_t)((now - then)*mult)>> shift);
+
+ return ret;
+}
+
+static void tsc_interp_update_callback(void)
+{
+ /* only update if cpu_khz has changed */
+ if (current_cpu_khz != cpu_khz){
+ current_cpu_khz = cpu_khz;
+ mult = timesource_khz2mult(current_cpu_khz, shift);
+ }
+}
+
+
+static int __init init_tsc_interp_timesource(void)
+{
+ /* TSC initialization is done in arch/i386/kernel/tsc.c */
+ if (cpu_has_tsc && cpu_khz) {
+ current_cpu_khz = cpu_khz;
+ shift = SHIFT_VAL;
+ mult = timesource_khz2mult(current_cpu_khz, shift);
+ /* setup periodic soft-timer */
+ init_timer(&tsc_interp_timer);
+ tsc_interp_timer.function = tsc_interp_sync;
+ tsc_interp_timer.expires = jiffies;
+ add_timer(&tsc_interp_timer);
+
+ register_timesource(&timesource_tsc_interp);
+ }
+ return 0;
+}
+module_init(init_tsc_interp_timesource);
Index: drivers/timesource/tsc.c
===================================================================
--- /dev/null (tree:0feb50d39a18b8a58ac2894eeac9b2f24a3b4435)
+++ 3b4165efeade40b65ea2e8188184e4f8d3d8d636/drivers/timesource/tsc.c (mode:100644)
@@ -0,0 +1,95 @@
+/* TODO:
+ * o better calibration
+ */
+
+#include <linux/timesource.h>
+#include <linux/timex.h>
+#include <linux/init.h>
+
+static unsigned long current_cpu_khz = 0;
+static cycle_t tsc_c3_offset;
+
+static cycle_t read_safe_tsc(void);
+static void tsc_update_callback(void);
+
+static struct timesource_t timesource_safe_tsc = {
+ .name = "c3tsc",
+ .priority = 26,
+ .type = TIMESOURCE_FUNCTION,
+ .read_fnct = read_safe_tsc,
+ .mask = (cycle_t)-1,
+ .mult = 0, /* to be set */
+ .shift = 22,
+ .update_callback = tsc_update_callback,
+};
+
+static struct timesource_t timesource_raw_tsc = {
+ .name = "tsc",
+ .priority = 25,
+ .type = TIMESOURCE_CYCLES,
+ .mask = (cycle_t)-1,
+ .mult = 0, /* to be set */
+ .shift = 22,
+ .update_callback = tsc_update_callback,
+};
+
+static struct timesource_t timesource_tsc;
+
+static int use_safe_tsc = 0;
+
+static cycle_t read_safe_tsc(void)
+{
+ cycle_t ret;
+ rdtscll(ret);
+ return ret + tsc_c3_offset;
+}
+
+void tsc_c3_compensate(unsigned long usecs)
+{
+ u64 nsecs = (u64)usecs * 1000;
+ cycle_t offset = nsecs << timesource_tsc.shift;
+
+ if (!timesource_tsc.mult)
+ return;
+
+ if(!use_safe_tsc)
+ use_safe_tsc = 1;
+
+ do_div(offset, timesource_tsc.mult);
+ tsc_c3_offset += offset;
+}
+
+
+static void tsc_update_callback(void)
+{
+ /* check to see if we should switch to the safe timesource */
+ if (use_safe_tsc &&
+ strncmp(timesource_tsc.name, "c3tsc", 5)) {
+ printk("Falling back to C3 safe TSC\n");
+ timesource_safe_tsc.mult = timesource_tsc.mult;
+ timesource_tsc = timesource_safe_tsc;
+ }
+
+ /* only update if cpu_khz has changed */
+ if (current_cpu_khz != cpu_khz){
+ current_cpu_khz = cpu_khz;
+ timesource_tsc.mult = timesource_khz2mult(current_cpu_khz,
+ timesource_tsc.shift);
+ }
+}
+
+static int __init init_tsc_timesource(void)
+{
+ timesource_tsc = timesource_raw_tsc;
+ /* TSC initialization is done in arch/i386/kernel/tsc.c */
+ if (cpu_has_tsc && cpu_khz) {
+ current_cpu_khz = cpu_khz;
+ timesource_tsc.mult = timesource_khz2mult(current_cpu_khz,
+ timesource_tsc.shift);
+ register_timesource(&timesource_tsc);
+ }
+ return 0;
+}
+
+module_init(init_tsc_timesource);
+
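
For readers following the mult/shift arithmetic used by the hpet and tsc timesources above, here is a minimal user-space sketch. It assumes the core converts cycles to nanoseconds as (cycles * mult) >> shift, which is what the "mult/2^shift" comment in the hpet code implies. The function names period_fs_to_mult, cycles_to_ns and ns_to_cycles are made up for illustration and are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

#define SHIFT          22          /* same value as HPET_SHIFT in the patch */
#define FSEC_PER_NSEC  1000000ULL  /* 10^-15 s per 10^-9 s */

/* Hypothetical helper: femtoseconds-per-cycle -> mult,
 * i.e. (period << shift) / FSEC_PER_NSEC as in init_hpet_timesource(). */
static uint32_t period_fs_to_mult(uint32_t period_fs)
{
	uint64_t tmp = (uint64_t)period_fs << SHIFT;

	return (uint32_t)(tmp / FSEC_PER_NSEC);
}

/* Assumed forward conversion: ns = (cycles * mult) / 2^shift. */
static uint64_t cycles_to_ns(uint64_t cycles, uint32_t mult)
{
	return (cycles * mult) >> SHIFT;
}

/* Inverse conversion, the same shape as the offset math in
 * tsc_c3_compensate(): cycles = (ns << shift) / mult. */
static uint64_t ns_to_cycles(uint64_t ns, uint32_t mult)
{
	assert(mult != 0);	/* the patch returns early when mult is unset */
	return (ns << SHIFT) / mult;
}
```

With a 10 MHz counter (a period of 100,000,000 fs, or 100 ns per cycle), mult comes out as 100 << 22, so ten cycles convert to 1000 ns and 1000 ns convert back to ten cycles.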


2005-05-14 00:34:38

by john stultz

Subject: [RFC][PATCH (5/7)] new timeofday ia64,ppc32,ppc64 and s390 arch specific hooks (v A5)

All,
This patch implements the minimal architecture specific hooks to enable
the new time of day subsystem code for ia64, ppc32, ppc64 and s390. It
applies on top of my linux-2.6.12-rc4_timeofday-core_A5 patch and with
this patch applied, you can test the new time of day subsystem on these
arches.

Basically it configs in the NEWTOD code and cuts a lot of code out of
the build via #ifdefs. I know, I know, #ifdefs are ugly and bad, and
the final patch for each arch will just remove the old code. Don't
worry, I'm not going to push this to Andrew anytime soon. For now this
allows us to be flexible and easily switch between the two
implementations with a single define.

This code is largely untested, but I tried to make sure ia64, ppc32 and
ppc64 compiled without warnings.

I'd like to thank the following folks for their work in providing these
arch implementations:
o Max Asbock for the ia64 work
o Darrick Wong for the ppc32 work
o Martin Schwidefsky! for the s390 work

Items still on the TODO list:
o More testing
o arch specific vsyscall/fsyscall interface
o other arch ports

I look forward to your comments and feedback.

thanks
-john

linux-2.6.12-rc4_timeofday-arch-other_A5.patch
==============================================
Index: arch/ia64/Kconfig
===================================================================
--- d68b09f31fa98801ead715e9281a2e4676b770a5/arch/ia64/Kconfig (mode:100644)
+++ f86144e80c5de25e7bea135a07a5635205be4cf3/arch/ia64/Kconfig (mode:100644)
@@ -18,6 +18,10 @@
page at <http://www.linuxia64.org/> and a mailing list at
<[email protected]>.

+config NEWTOD
+ bool
+ default y
+
config 64BIT
bool
default y
@@ -36,7 +40,7 @@

config TIME_INTERPOLATION
bool
- default y
+ default n

config EFI
bool
Index: arch/ia64/kernel/asm-offsets.c
===================================================================
--- d68b09f31fa98801ead715e9281a2e4676b770a5/arch/ia64/kernel/asm-offsets.c (mode:100644)
+++ f86144e80c5de25e7bea135a07a5635205be4cf3/arch/ia64/kernel/asm-offsets.c (mode:100644)
@@ -222,6 +222,7 @@
DEFINE(IA64_MCA_CPU_INIT_STACK_OFFSET,
offsetof (struct ia64_mca_cpu, init_stack));
BLANK();
+#ifndef CONFIG_NEWTOD
/* used by fsys_gettimeofday in arch/ia64/kernel/fsys.S */
DEFINE(IA64_TIME_INTERPOLATOR_ADDRESS_OFFSET, offsetof (struct time_interpolator, addr));
DEFINE(IA64_TIME_INTERPOLATOR_SOURCE_OFFSET, offsetof (struct time_interpolator, source));
@@ -235,5 +236,6 @@
DEFINE(IA64_TIME_SOURCE_CPU, TIME_SOURCE_CPU);
DEFINE(IA64_TIME_SOURCE_MMIO64, TIME_SOURCE_MMIO64);
DEFINE(IA64_TIME_SOURCE_MMIO32, TIME_SOURCE_MMIO32);
+#endif /* CONFIG_NEWTOD */
DEFINE(IA64_TIMESPEC_TV_NSEC_OFFSET, offsetof (struct timespec, tv_nsec));
}
Index: arch/ia64/kernel/cyclone.c
===================================================================
--- d68b09f31fa98801ead715e9281a2e4676b770a5/arch/ia64/kernel/cyclone.c (mode:100644)
+++ f86144e80c5de25e7bea135a07a5635205be4cf3/arch/ia64/kernel/cyclone.c (mode:100644)
@@ -17,7 +17,7 @@
use_cyclone = 1;
}

-
+#ifndef CONFIG_NEWTOD
struct time_interpolator cyclone_interpolator = {
.source = TIME_SOURCE_MMIO64,
.shift = 16,
@@ -107,3 +107,4 @@
}

__initcall(init_cyclone_clock);
+#endif /* !CONFIG_NEWTOD */
Index: arch/ia64/kernel/fsys.S
===================================================================
--- d68b09f31fa98801ead715e9281a2e4676b770a5/arch/ia64/kernel/fsys.S (mode:100644)
+++ f86144e80c5de25e7bea135a07a5635205be4cf3/arch/ia64/kernel/fsys.S (mode:100644)
@@ -145,6 +145,7 @@
FSYS_RETURN
END(fsys_set_tid_address)

+#ifndef CONFIG_NEWTOD
/*
* Ensure that the time interpolator structure is compatible with the asm code
*/
@@ -326,6 +327,7 @@
EX(.fail_efault, st8 [r31] = r9)
EX(.fail_efault, st8 [r23] = r21)
FSYS_RETURN
+#endif /* !CONFIG_NEWTOD */
.fail_einval:
mov r8 = EINVAL
mov r10 = -1
@@ -334,6 +336,7 @@
mov r8 = EFAULT
mov r10 = -1
FSYS_RETURN
+#ifndef CONFIG_NEWTOD
END(fsys_gettimeofday)

ENTRY(fsys_clock_gettime)
@@ -347,6 +350,7 @@
shl r30 = r32,15
br.many .gettime
END(fsys_clock_gettime)
+#endif /* !CONFIG_NEWTOD */

/*
* long fsys_rt_sigprocmask (int how, sigset_t *set, sigset_t *oset, size_t sigsetsize).
@@ -689,7 +693,11 @@
data8 0 // setrlimit
data8 0 // getrlimit // 1085
data8 0 // getrusage
+#ifdef CONFIG_NEWTOD
+ data8 0 // gettimeofday
+#else
data8 fsys_gettimeofday // gettimeofday
+#endif
data8 0 // settimeofday
data8 0 // select
data8 0 // poll // 1090
@@ -856,7 +864,11 @@
data8 0 // timer_getoverrun
data8 0 // timer_delete
data8 0 // clock_settime
+#ifdef CONFIG_NEWTOD
+ data8 0 // clock_gettime
+#else
data8 fsys_clock_gettime // clock_gettime
+#endif
data8 0 // clock_getres // 1255
data8 0 // clock_nanosleep
data8 0 // fstatfs64
Index: arch/ia64/kernel/time.c
===================================================================
--- d68b09f31fa98801ead715e9281a2e4676b770a5/arch/ia64/kernel/time.c (mode:100644)
+++ f86144e80c5de25e7bea135a07a5635205be4cf3/arch/ia64/kernel/time.c (mode:100644)
@@ -21,6 +21,7 @@
#include <linux/efi.h>
#include <linux/profile.h>
#include <linux/timex.h>
+#include <linux/timeofday.h>

#include <asm/machvec.h>
#include <asm/delay.h>
@@ -45,11 +46,13 @@

#endif

+#ifndef CONFIG_NEWTOD
static struct time_interpolator itc_interpolator = {
.shift = 16,
.mask = 0xffffffffffffffffLL,
.source = TIME_SOURCE_CPU
};
+#endif /* CONFIG_NEWTOD */

static irqreturn_t
timer_interrupt (int irq, void *dev_id, struct pt_regs *regs)
@@ -211,6 +214,7 @@
local_cpu_data->nsec_per_cyc = ((NSEC_PER_SEC<<IA64_NSEC_PER_CYC_SHIFT)
+ itc_freq/2)/itc_freq;

+#ifndef CONFIG_NEWTOD
if (!(sal_platform_features & IA64_SAL_PLATFORM_FEATURE_ITC_DRIFT)) {
itc_interpolator.frequency = local_cpu_data->itc_freq;
itc_interpolator.drift = itc_drift;
@@ -229,6 +233,7 @@
#endif
register_time_interpolator(&itc_interpolator);
}
+#endif /* CONFIG_NEWTOD */

/* Setup the CPU local timer tick */
ia64_cpu_local_tick();
@@ -253,3 +258,17 @@
*/
set_normalized_timespec(&wall_to_monotonic, -xtime.tv_sec, -xtime.tv_nsec);
}
+
+/* arch specific timeofday hooks */
+nsec_t read_persistent_clock(void)
+{
+ struct timespec ts;
+ efi_gettimeofday(&ts);
+ return (nsec_t)(ts.tv_sec * NSEC_PER_SEC + ts.tv_nsec);
+}
+
+void sync_persistent_clock(struct timespec ts)
+{
+ /* XXX - Something should go here, no? */
+}
+
Index: arch/ia64/sn/kernel/sn2/timer.c
===================================================================
--- d68b09f31fa98801ead715e9281a2e4676b770a5/arch/ia64/sn/kernel/sn2/timer.c (mode:100644)
+++ f86144e80c5de25e7bea135a07a5635205be4cf3/arch/ia64/sn/kernel/sn2/timer.c (mode:100644)
@@ -19,6 +19,7 @@
#include <asm/sn/shub_mmr.h>
#include <asm/sn/clksupport.h>

+#ifndef CONFIG_NEWTOD
extern unsigned long sn_rtc_cycles_per_second;

static struct time_interpolator sn2_interpolator = {
@@ -34,3 +35,8 @@
sn2_interpolator.addr = RTC_COUNTER_ADDR;
register_time_interpolator(&sn2_interpolator);
}
+#else
+void __init sn_timer_init(void)
+{
+}
+#endif
Index: arch/ppc/Kconfig
===================================================================
--- d68b09f31fa98801ead715e9281a2e4676b770a5/arch/ppc/Kconfig (mode:100644)
+++ f86144e80c5de25e7bea135a07a5635205be4cf3/arch/ppc/Kconfig (mode:100644)
@@ -8,6 +8,10 @@
bool
default y

+config NEWTOD
+ bool
+ default y
+
config UID16
bool

Index: arch/ppc/kernel/time.c
===================================================================
--- d68b09f31fa98801ead715e9281a2e4676b770a5/arch/ppc/kernel/time.c (mode:100644)
+++ f86144e80c5de25e7bea135a07a5635205be4cf3/arch/ppc/kernel/time.c (mode:100644)
@@ -57,6 +57,7 @@
#include <linux/time.h>
#include <linux/init.h>
#include <linux/profile.h>
+#include <linux/timeofday.h>

#include <asm/segment.h>
#include <asm/io.h>
@@ -76,6 +77,8 @@

extern struct timezone sys_tz;

+static unsigned long time_offset;
+
/* keep track of when we need to update the rtc */
time_t last_rtc_update;

@@ -93,6 +96,46 @@

EXPORT_SYMBOL(rtc_lock);

+#ifdef CONFIG_NEWTOD
+nsec_t read_persistent_clock(void)
+{
+ if (ppc_md.get_rtc_time) {
+ return (nsec_t)ppc_md.get_rtc_time() * NSEC_PER_SEC;
+ } else {
+ printk(KERN_ERR "ppc_md.get_rtc_time does not exist???\n");
+ return 0;
+ }
+}
+
+void sync_persistent_clock(struct timespec ts)
+{
+ /*
+ * update the rtc when needed, this should be performed on the
+ * right fraction of a second. Half or full second ?
+ * Full second works on mk48t59 clocks, others need testing.
+ * Note that this update is basically only used through
+ * the adjtimex system calls. Setting the HW clock in
+ * any other way is a /dev/rtc and userland business.
+ * This is still wrong by -0.5/+1.5 jiffies because of the
+ * timer interrupt resolution and possible delay, but here we
+ * hit a quantization limit which can only be solved by higher
+ * resolution timers and decoupling time management from timer
+ * interrupts. This is also wrong on the clocks
+ * which require being written at the half second boundary.
+ * We should have an rtc call that only sets the minutes and
+ * seconds like on Intel to avoid problems with non UTC clocks.
+ */
+ if ( ppc_md.set_rtc_time && ts.tv_sec - last_rtc_update >= 659 &&
+ abs((ts.tv_nsec/1000) - (1000000-1000000/HZ)) < 500000/HZ) {
+ if (ppc_md.set_rtc_time(ts.tv_sec + 1 + time_offset) == 0)
+ last_rtc_update = ts.tv_sec+1;
+ else
+ /* Try again one minute later */
+ last_rtc_update += 60;
+ }
+}
+#endif /* CONFIG_NEWTOD */
+
/* Timer interrupt helper function */
static inline int tb_delta(unsigned *jiffy_stamp) {
int delta;
@@ -150,6 +193,7 @@
tb_last_stamp = jiffy_stamp;
do_timer(regs);

+#ifndef CONFIG_NEWTOD
/*
* update the rtc when needed, this should be performed on the
* right fraction of a second. Half or full second ?
@@ -176,6 +220,7 @@
/* Try again one minute later */
last_rtc_update += 60;
}
+#endif
write_sequnlock(&xtime_lock);
}
if ( !disarm_decr[smp_processor_id()] )
@@ -191,6 +236,7 @@
/*
* This version of gettimeofday has microsecond resolution.
*/
+#ifndef CONFIG_NEWTOD
void do_gettimeofday(struct timeval *tv)
{
unsigned long flags;
@@ -278,6 +324,7 @@
}

EXPORT_SYMBOL(do_settimeofday);
+#endif

/* This function is only called on the boot processor */
void __init time_init(void)
Index: arch/ppc64/Kconfig
===================================================================
--- d68b09f31fa98801ead715e9281a2e4676b770a5/arch/ppc64/Kconfig (mode:100644)
+++ f86144e80c5de25e7bea135a07a5635205be4cf3/arch/ppc64/Kconfig (mode:100644)
@@ -10,6 +10,10 @@
bool
default y

+config NEWTOD
+ bool
+ default y
+
config UID16
bool

Index: arch/ppc64/kernel/sys_ppc32.c
===================================================================
--- d68b09f31fa98801ead715e9281a2e4676b770a5/arch/ppc64/kernel/sys_ppc32.c (mode:100644)
+++ f86144e80c5de25e7bea135a07a5635205be4cf3/arch/ppc64/kernel/sys_ppc32.c (mode:100644)
@@ -322,8 +322,10 @@

ret = do_adjtimex(&txc);

+#ifndef CONFIG_NEWTOD
/* adjust the conversion of TB to time of day to track adjtimex */
ppc_adjtimex();
+#endif

if(put_user(txc.modes, &utp->modes) ||
__put_user(txc.offset, &utp->offset) ||
Index: arch/ppc64/kernel/time.c
===================================================================
--- d68b09f31fa98801ead715e9281a2e4676b770a5/arch/ppc64/kernel/time.c (mode:100644)
+++ f86144e80c5de25e7bea135a07a5635205be4cf3/arch/ppc64/kernel/time.c (mode:100644)
@@ -50,6 +50,7 @@
#include <linux/profile.h>
#include <linux/cpu.h>
#include <linux/security.h>
+#include <linux/timeofday.h>

#include <asm/segment.h>
#include <asm/io.h>
@@ -105,6 +106,7 @@

void ppc_adjtimex(void);

+#ifndef CONFIG_NEWTOD
static unsigned adjusting_time = 0;

static __inline__ void timer_check_rtc(void)
@@ -140,6 +142,52 @@
last_rtc_update += 60;
}
}
+#else /* CONFIG_NEWTOD */
+nsec_t read_persistent_clock(void)
+{
+ struct rtc_time tm;
+ unsigned long sec;
+#ifdef CONFIG_PPC_ISERIES
+ if (!piranha_simulator)
+#endif
+ ppc_md.get_boot_time(&tm);
+
+ sec = mktime(tm.tm_year + 1900, tm.tm_mon + 1, tm.tm_mday,
+ tm.tm_hour, tm.tm_min, tm.tm_sec);
+ return (nsec_t)sec * NSEC_PER_SEC;
+}
+void sync_persistent_clock(struct timespec ts)
+{
+ /*
+ * update the rtc when needed, this should be performed on the
+ * right fraction of a second. Half or full second ?
+ * Full second works on mk48t59 clocks, others need testing.
+ * Note that this update is basically only used through
+ * the adjtimex system calls. Setting the HW clock in
+ * any other way is a /dev/rtc and userland business.
+ * This is still wrong by -0.5/+1.5 jiffies because of the
+ * timer interrupt resolution and possible delay, but here we
+ * hit a quantization limit which can only be solved by higher
+ * resolution timers and decoupling time management from timer
+ * interrupts. This is also wrong on the clocks
+ * which require being written at the half second boundary.
+ * We should have an rtc call that only sets the minutes and
+ * seconds like on Intel to avoid problems with non UTC clocks.
+ */
+ if ( ts.tv_sec - last_rtc_update >= 659 &&
+ abs((ts.tv_nsec/1000) - (1000000-1000000/HZ)) < 500000/HZ) {
+ struct rtc_time tm;
+ to_tm(ts.tv_sec+1, &tm);
+ tm.tm_year -= 1900;
+ tm.tm_mon -= 1;
+ if (ppc_md.set_rtc_time(&tm) == 0)
+ last_rtc_update = ts.tv_sec+1;
+ else
+ /* Try again one minute later */
+ last_rtc_update += 60;
+ }
+}
+#endif /* CONFIG_NEWTOD */

/*
* This version of gettimeofday has microsecond resolution.
@@ -171,12 +219,14 @@
tv->tv_usec = usec;
}

+#ifndef CONFIG_NEWTOD
void do_gettimeofday(struct timeval *tv)
{
__do_gettimeofday(tv, get_tb());
}

EXPORT_SYMBOL(do_gettimeofday);
+#endif

/* Synchronize xtime with do_gettimeofday */

@@ -350,11 +400,15 @@
tb_last_stamp = lpaca->next_jiffy_update_tb;
timer_recalc_offset(lpaca->next_jiffy_update_tb);
do_timer(regs);
+#ifndef CONFIG_NEWTOD
timer_sync_xtime(lpaca->next_jiffy_update_tb);
timer_check_rtc();
+#endif
write_sequnlock(&xtime_lock);
+#ifndef CONFIG_NEWTOD
if ( adjusting_time && (time_adjust == 0) )
ppc_adjtimex();
+#endif
}
lpaca->next_jiffy_update_tb += tb_ticks_per_jiffy;
}
@@ -397,6 +451,7 @@
return mulhdu(get_tb(), tb_to_ns_scale) << tb_to_ns_shift;
}

+#ifndef CONFIG_NEWTOD
int do_settimeofday(struct timespec *tv)
{
time_t wtm_sec, new_sec = tv->tv_sec;
@@ -473,6 +528,7 @@
}

EXPORT_SYMBOL(do_settimeofday);
+#endif /* !CONFIG_NEWTOD */

void __init time_init(void)
{
@@ -525,7 +581,9 @@
systemcfg->stamp_xsec = xtime.tv_sec * XSEC_PER_SEC;
systemcfg->tb_to_xs = tb_to_xs;

+#ifndef CONFIG_NEWTOD
time_freq = 0;
+#endif

xtime.tv_nsec = 0;
last_rtc_update = xtime.tv_sec;
@@ -548,6 +606,7 @@

/* #define DEBUG_PPC_ADJTIMEX 1 */

+#ifndef CONFIG_NEWTOD
void ppc_adjtimex(void)
{
unsigned long den, new_tb_ticks_per_sec, tb_ticks, old_xsec, new_tb_to_xs, new_xsec, new_stamp_xsec;
@@ -671,6 +730,7 @@
write_sequnlock_irqrestore( &xtime_lock, flags );

}
+#endif /* !CONFIG_NEWTOD */


#define TICK_SIZE tick
Index: arch/s390/Kconfig
===================================================================
--- d68b09f31fa98801ead715e9281a2e4676b770a5/arch/s390/Kconfig (mode:100644)
+++ f86144e80c5de25e7bea135a07a5635205be4cf3/arch/s390/Kconfig (mode:100644)
@@ -127,6 +127,10 @@
This allows you to run 32-bit Linux/ELF binaries on your zSeries
in 64 bit mode. Everybody wants this; say Y.

+config NEWTOD
+ bool
+ default y
+
comment "Code generation options"

choice
Index: arch/s390/kernel/time.c
===================================================================
--- d68b09f31fa98801ead715e9281a2e4676b770a5/arch/s390/kernel/time.c (mode:100644)
+++ f86144e80c5de25e7bea135a07a5635205be4cf3/arch/s390/kernel/time.c (mode:100644)
@@ -29,6 +29,7 @@
#include <linux/profile.h>
#include <linux/timex.h>
#include <linux/notifier.h>
+#include <linux/timeofday.h>

#include <asm/uaccess.h>
#include <asm/delay.h>
@@ -89,6 +90,7 @@
return (unsigned long) now;
}

+#ifndef CONFIG_NEWTOD
/*
* This version of gettimeofday has microsecond resolution.
*/
@@ -149,7 +151,27 @@
}

EXPORT_SYMBOL(do_settimeofday);
+#endif
+
+nsec_t read_persistent_clock(void)
+{
+ unsigned long long nsecs;
+ /*
+ * The TOD clock counts from 1900-01-01. Bit 2^12 of the
+ * 64 bit register is micro-seconds.
+ */
+ nsecs = get_clock() - 0x7d91048bca000000LL;
+ /*
+ * Calc nsecs * 1000 / 4096 without overflow and
+ * without losing too many bits.
+ */
+ nsecs = (((((nsecs >> 3) * 5) >> 3) * 5) >> 3) * 5;
+ return (nsec_t) nsecs;
+}

+void sync_persistent_clock(struct timespec ts)
+{
+}

#ifdef CONFIG_PROFILING
#define s390_do_profile(regs) profile_tick(CPU_PROFILING, regs)
Index: include/asm-ppc64/time.h
===================================================================
--- d68b09f31fa98801ead715e9281a2e4676b770a5/include/asm-ppc64/time.h (mode:100644)
+++ f86144e80c5de25e7bea135a07a5635205be4cf3/include/asm-ppc64/time.h (mode:100644)
@@ -21,6 +21,7 @@
#include <asm/processor.h>
#include <asm/paca.h>
#include <asm/iSeries/HvCall.h>
+#include <asm/percpu.h>

/* time.c */
extern unsigned long tb_ticks_per_jiffy;
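
The staged shift-and-multiply in the s390 read_persistent_clock() above may not be obvious at first glance, so here is a small user-space sketch of the same arithmetic. One microsecond is 4096 TOD units, so ns = units * 1000 / 4096 = units * 125 / 512, and the patch applies that as three rounds of ">> 3, * 5". The names tod_to_ns and tod_to_ns_exact are illustrative only. The reference version is my own split-quotient form, not something from the patch, and the epoch offset subtraction is left out.

```c
#include <stdint.h>

/* Staged form used in the patch: three rounds of (>> 3, * 5) give
 * 5^3 / 8^3 = 125 / 512 without overflowing a 64-bit intermediate. */
static uint64_t tod_to_ns(uint64_t tod)
{
	return (((((tod >> 3) * 5) >> 3) * 5) >> 3) * 5;
}

/* Reference for comparison (an assumption of this sketch, not patch code):
 * split tod into quotient and remainder of 512 so that the result is the
 * exact floor(tod * 125 / 512), again without overflow. */
static uint64_t tod_to_ns_exact(uint64_t tod)
{
	return (tod >> 9) * 125 + (((tod & 511) * 125) >> 9);
}
```

Each shift drops at most 7/8 of a unit, and those losses are scaled by 125/64, 25/8 and 5 respectively, so the staged result is never above the exact value and trails it by fewer than 10 ns.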


2005-05-14 00:38:34

by john stultz

Subject: [RFC][PATCH (6/7)] new timeofday ia64,ppc32,ppc64 and s390 timesources (v A5)


All,
This patch implements the time sources for ppc32, ppc64, s390 and
initial untested sketches of timesources for ia64. The patch should
apply on top of linux-2.6.12-rc4_timeofday-arch-other_A5. The patch
should be fairly straightforward, only adding the new timesources.

I'd like to thank the following folks for their work in providing these
arch implementations:
o Darrick Wong for the ppc32 work
o Martin Schwidefsky! for the s390 work

New in this release:
o minor fixes for compile warnings

Items still on the TODO list:
o real ia64 timesources
o all other arch timesources
o lots of cleanups
o lots of testing

I look forward to your comments and feedback.

thanks
-john

linux-2.6.12-rc4_timeofday-timesources-other_A5.patch
======================================================
Index: drivers/timesource/Makefile
===================================================================
--- f86144e80c5de25e7bea135a07a5635205be4cf3/drivers/timesource/Makefile (mode:100644)
+++ 6f16ba51ef2d9bdf92b90eb3d61785877456e273/drivers/timesource/Makefile (mode:100644)
@@ -1 +1,9 @@
obj-y += jiffies.o
+
+obj-$(CONFIG_PPC64) += ppc64_timebase.o
+obj-$(CONFIG_PPC) += ppc_timebase.o
+obj-$(CONFIG_ARCH_S390) += s390_tod.o
+
+# XXX - Untested/Uncompiled
+#obj-$(CONFIG_IA64) += itc.o
+#obj-$(CONFIG_IA64_SGI_SN2) += sn2_rtc.o
Index: drivers/timesource/itc.c
===================================================================
--- /dev/null (tree:f86144e80c5de25e7bea135a07a5635205be4cf3)
+++ 6f16ba51ef2d9bdf92b90eb3d61785877456e273/drivers/timesource/itc.c (mode:100644)
@@ -0,0 +1,35 @@
+/* XXX - this is totally untested and uncompiled
+ * TODO:
+ * o cpufreq issues
+ * o unsynched ITCs ?
+ */
+#include <linux/timesource.h>
+
+/* XXX - Other includes needed for:
+ * sal_platform_features, IA64_SAL_PLATFORM_FEATURE_ITC_DRIFT,
+ * local_cpu_data->itc_freq
+ * See arch/ia64/kernel/time.c for ideas
+ */
+
+static struct timesource_t timesource_itc = {
+ .name = "itc",
+ .priority = 25,
+ .type = TIMESOURCE_CYCLES,
+ .mask = (cycle_t)-1,
+ .mult = 0, /* to be set */
+ .shift = 22,
+};
+
+static int __init init_itc_timesource(void)
+{
+ if (!(sal_platform_features & IA64_SAL_PLATFORM_FEATURE_ITC_DRIFT)) {
+ /* XXX - I'm not really sure if itc_freq is in cyc/sec */
+ timesource_itc.mult = timesource_hz2mult(local_cpu_data->itc_freq,
+ timesource_itc.shift);
+ register_timesource(&timesource_itc);
+ }
+ return 0;
+}
+
+module_init(init_itc_timesource);
+
Index: drivers/timesource/ppc64_timebase.c
===================================================================
--- /dev/null (tree:f86144e80c5de25e7bea135a07a5635205be4cf3)
+++ 6f16ba51ef2d9bdf92b90eb3d61785877456e273/drivers/timesource/ppc64_timebase.c (mode:100644)
@@ -0,0 +1,33 @@
+#include <linux/timesource.h>
+#include <asm/time.h>
+
+static cycle_t timebase_read(void)
+{
+ return (cycle_t)get_tb();
+}
+
+struct timesource_t timesource_timebase = {
+ .name = "timebase",
+ .priority = 200,
+ .type = TIMESOURCE_FUNCTION,
+ .read_fnct = timebase_read,
+ .mask = (cycle_t)-1,
+ .mult = 0,
+ .shift = 22,
+};
+
+
+/* XXX - this should be calculated or properly externed! */
+extern unsigned long tb_to_ns_scale;
+extern unsigned long tb_to_ns_shift;
+extern unsigned long tb_ticks_per_sec;
+
+static int __init init_timebase_timesource(void)
+{
+ timesource_timebase.mult = timesource_hz2mult(tb_ticks_per_sec,
+ timesource_timebase.shift);
+ register_timesource(&timesource_timebase);
+ return 0;
+}
+
+module_init(init_timebase_timesource);
Index: drivers/timesource/ppc_timebase.c
===================================================================
--- /dev/null (tree:f86144e80c5de25e7bea135a07a5635205be4cf3)
+++ 6f16ba51ef2d9bdf92b90eb3d61785877456e273/drivers/timesource/ppc_timebase.c (mode:100644)
@@ -0,0 +1,56 @@
+#include <linux/timesource.h>
+#include <linux/init.h>
+#include <asm/time.h>
+#ifndef CONFIG_PPC64
+
+/* XXX - this should be calculated or properly externed! */
+
+/* DJWONG: tb_to_ns_scale is supposed to be set in time_init.
+ * No idea if that actually _happens_ on a ppc601, though it
+ * seems to work on a B&W G3. :D */
+extern unsigned long tb_to_ns_scale;
+
+static cycle_t ppc_timebase_read(void)
+{
+ unsigned long lo, hi, hi2;
+ unsigned long long tb;
+
+ do {
+ hi = get_tbu();
+ lo = get_tbl();
+ hi2 = get_tbu();
+ } while (hi2 != hi);
+ tb = ((unsigned long long) hi << 32) | lo;
+
+ return (cycle_t)tb;
+}
+
+struct timesource_t timesource_ppc_timebase = {
+ .name = "ppc_timebase",
+ .priority = 200,
+ .type = TIMESOURCE_FUNCTION,
+ .read_fnct = ppc_timebase_read,
+ .mask = (cycle_t)-1,
+ .mult = 0,
+ .shift = 22,
+};
+
+static int __init init_ppc_timebase_timesource(void)
+{
+ /* DJWONG: Extrapolated from ppc64 code. */
+ unsigned long tb_ticks_per_sec;
+
+ tb_ticks_per_sec = tb_ticks_per_jiffy * HZ;
+
+ timesource_ppc_timebase.mult = timesource_hz2mult(tb_ticks_per_sec,
+ timesource_ppc_timebase.shift);
+
+ printk(KERN_INFO "ppc_timebase: tb_ticks_per_sec = %lu, mult = %lu, tb_to_ns = %lu.\n",
+ tb_ticks_per_sec, (unsigned long)timesource_ppc_timebase.mult , tb_to_ns_scale);
+
+ register_timesource(&timesource_ppc_timebase);
+ return 0;
+}
+
+module_init(init_ppc_timebase_timesource);
+#endif /* !CONFIG_PPC64 */
Index: drivers/timesource/s390_tod.c
===================================================================
--- /dev/null (tree:f86144e80c5de25e7bea135a07a5635205be4cf3)
+++ 6f16ba51ef2d9bdf92b90eb3d61785877456e273/drivers/timesource/s390_tod.c (mode:100644)
@@ -0,0 +1,37 @@
+/*
+ * linux/drivers/timesource/s390_tod.c
+ *
+ * (C) Copyright IBM Corp. 2004
+ *
+ * Author(s): Martin Schwidefsky ([email protected]),
+ *
+ * s390 TOD clock time source.
+ */
+
+#include <linux/timesource.h>
+#include <linux/timex.h>
+#include <linux/init.h>
+
+static cycle_t s390_tod_read(void)
+{
+ return get_clock();
+}
+
+struct timesource_t timesource_s390_tod = {
+ .name = "TOD",
+ .priority = 100,
+ .type = TIMESOURCE_FUNCTION,
+ .read_fnct = s390_tod_read,
+ .mask = -1ULL,
+ .mult = 1000,
+ .shift = 12
+};
+
+
+static int __init init_s390_timesource(void)
+{
+ register_timesource(&timesource_s390_tod);
+ return 0;
+}
+
+module_init(init_s390_timesource);
Index: drivers/timesource/sn2_rtc.c
===================================================================
--- /dev/null (tree:f86144e80c5de25e7bea135a07a5635205be4cf3)
+++ 6f16ba51ef2d9bdf92b90eb3d61785877456e273/drivers/timesource/sn2_rtc.c (mode:100644)
@@ -0,0 +1,29 @@
+#include <linux/timesource.h>
+/* XXX this will need some includes
+ * to find: sn_rtc_cycles_per_second and RTC_COUNTER_ADDR
+ * See arch/ia64/sn/kernel/sn2/timer.c for likely suspects
+ */
+
+#define SN2_RTC_MASK ((1LL << 55) - 1)
+#define SN2_SHIFT 10
+
+struct timesource_t timesource_sn2_rtc = {
+ .name = "sn2_rtc",
+ .priority = 300, /* XXX - not sure what this should be */
+ .type = TIMESOURCE_MMIO_64,
+ .mmio_ptr = NULL,
+ .mask = (cycle_t)SN2_RTC_MASK,
+ .mult = 0, /* set below */
+ .shift = SN2_SHIFT,
+};
+
+static int __init init_sn2_timesource(void)
+{
+ timesource_sn2_rtc.mult = timesource_hz2mult(sn_rtc_cycles_per_second,
+ SN2_SHIFT);
+ timesource_sn2_rtc.mmio_ptr = RTC_COUNTER_ADDR;
+
+ register_timesource(&timesource_sn2_rtc);
+ return 0;
+}
+module_init(init_sn2_timesource);
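
The hi/lo/hi loop in ppc_timebase_read() above guards against a carry from the lower word into the upper word between the two 32-bit reads. The sketch below reproduces that loop in user space against a simulated counter. fake_tb, fake_step, fake_get_tbu and fake_get_tbl are stand-ins invented for this example; on real hardware the reads would be get_tbu() and get_tbl().

```c
#include <stdint.h>

/* Simulated 64-bit timebase. Every half-read advances it by fake_step,
 * so a carry can land between reading the upper and lower words. */
static uint64_t fake_tb;
static uint64_t fake_step;

static uint32_t fake_get_tbu(void)
{
	uint32_t v = (uint32_t)(fake_tb >> 32);

	fake_tb += fake_step;
	return v;
}

static uint32_t fake_get_tbl(void)
{
	uint32_t v = (uint32_t)fake_tb;

	fake_tb += fake_step;
	return v;
}

/* Same structure as ppc_timebase_read(): re-read the upper word and
 * retry if it changed while the lower word was being read. */
static uint64_t read_tb64(void)
{
	uint32_t hi, lo, hi2;

	do {
		hi = fake_get_tbu();
		lo = fake_get_tbl();
		hi2 = fake_get_tbu();
	} while (hi2 != hi);

	return ((uint64_t)hi << 32) | lo;
}
```

Starting the simulated counter at 0x1FFFFFFFF with a step of 1, the first pass pairs an upper word of 1 with a lower word of 0, which would yield 0x100000000, roughly 2^32 ticks in the past. The mismatch on the second upper read forces a retry, and the loop returns a consistent value instead.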


2005-05-14 00:40:08

by john stultz

Subject: [RFC][PATCH (7/7)] new timeofday i386 vsyscall proof of concept (v A5)

All,

This patch implements vsyscall-gettimeofday() functions for i386
using the new timeofday core code. This is just a hackish proof of
concept that shows how it could be done and what interfaces are needed
to have a clean separation of the arch independent time keeping and the
very arch specific vsyscall code. It should apply on top of my
linux-2.6.12-rc4_timeofday-timesources-i386_A5 patch.

I look forward to your comments and feedback.

thanks
-john

linux-2.6.12-rc4_timeofday-vsyscall-i386_A5.patch
=================================================
Index: arch/i386/Kconfig
===================================================================
--- 3b4165efeade40b65ea2e8188184e4f8d3d8d636/arch/i386/Kconfig (mode:100644)
+++ 9d016193cc103e4ba0026e943774ef0f774bf72f/arch/i386/Kconfig (mode:100644)
@@ -18,6 +18,9 @@
bool
default y

+config NEWTOD_VSYSCALL
+ depends on EXPERIMENTAL
+ bool "VSYSCALL gettimeofday() interface"

config MMU
bool
Index: arch/i386/kernel/Makefile
===================================================================
--- 3b4165efeade40b65ea2e8188184e4f8d3d8d636/arch/i386/kernel/Makefile (mode:100644)
+++ 9d016193cc103e4ba0026e943774ef0f774bf72f/arch/i386/kernel/Makefile (mode:100644)
@@ -10,6 +10,7 @@
doublefault.o quirks.o tsc.o

obj-y += cpu/
+obj-$(CONFIG_NEWTOD_VSYSCALL) += vsyscall-gtod.o
obj-$(CONFIG_ACPI_BOOT) += acpi/
obj-$(CONFIG_X86_BIOS_REBOOT) += reboot.o
obj-$(CONFIG_MCA) += mca.o
Index: arch/i386/kernel/setup.c
===================================================================
--- 3b4165efeade40b65ea2e8188184e4f8d3d8d636/arch/i386/kernel/setup.c (mode:100644)
+++ 9d016193cc103e4ba0026e943774ef0f774bf72f/arch/i386/kernel/setup.c (mode:100644)
@@ -50,6 +50,7 @@
#include <asm/io_apic.h>
#include <asm/ist.h>
#include <asm/io.h>
+#include <asm/vsyscall-gtod.h>
#include "setup_arch_pre.h"
#include <bios_ebda.h>

@@ -1524,6 +1525,7 @@
#endif
#endif
tsc_init();
+ vsyscall_init();
}

#include "setup_arch_post.h"
Index: arch/i386/kernel/vmlinux.lds.S
===================================================================
--- 3b4165efeade40b65ea2e8188184e4f8d3d8d636/arch/i386/kernel/vmlinux.lds.S (mode:100644)
+++ 9d016193cc103e4ba0026e943774ef0f774bf72f/arch/i386/kernel/vmlinux.lds.S (mode:100644)
@@ -5,6 +5,8 @@
#include <asm-generic/vmlinux.lds.h>
#include <asm/thread_info.h>
#include <asm/page.h>
+#include <linux/config.h>
+#include <asm/vsyscall-gtod.h>

OUTPUT_FORMAT("elf32-i386", "elf32-i386", "elf32-i386")
OUTPUT_ARCH(i386)
@@ -52,6 +54,31 @@

_edata = .; /* End of data section */

+/* VSYSCALL_GTOD data */
+#ifdef CONFIG_NEWTOD_VSYSCALL
+
+ /* vsyscall entry */
+ . = ALIGN(64);
+ .data.cacheline_aligned : { *(.data.cacheline_aligned) }
+
+ .vsyscall_0 VSYSCALL_GTOD_START: AT ((LOADADDR(.data.cacheline_aligned) + SIZEOF(.data.cacheline_aligned) + 4095) & ~(4095)) { *(.vsyscall_0) }
+ __vsyscall_0 = LOADADDR(.vsyscall_0);
+
+
+ /* generic gtod variables */
+ . = ALIGN(64);
+ .vsyscall_gtod_data : AT ((LOADADDR(.vsyscall_0) + SIZEOF(.vsyscall_0) + 63) & ~(63)) { *(.vsyscall_gtod_data) }
+ vsyscall_gtod_data = LOADADDR(.vsyscall_gtod_data);
+
+ . = ALIGN(16);
+ .vsyscall_gtod_lock : AT ((LOADADDR(.vsyscall_gtod_data) + SIZEOF(.vsyscall_gtod_data) + 15) & ~(15)) { *(.vsyscall_gtod_lock) }
+ vsyscall_gtod_lock = LOADADDR(.vsyscall_gtod_lock);
+
+ .vsyscall_1 ADDR(.vsyscall_0) + 1024: AT (LOADADDR(.vsyscall_0) + 1024) { *(.vsyscall_1) }
+ . = LOADADDR(.vsyscall_0) + 4096;
+#endif
+/* END of VSYSCALL_GTOD data*/
+
. = ALIGN(THREAD_SIZE); /* init_task */
.data.init_task : { *(.data.init_task) }

Index: arch/i386/kernel/vsyscall-gtod.c
===================================================================
--- /dev/null (tree:3b4165efeade40b65ea2e8188184e4f8d3d8d636)
+++ 9d016193cc103e4ba0026e943774ef0f774bf72f/arch/i386/kernel/vsyscall-gtod.c (mode:100644)
@@ -0,0 +1,193 @@
+#include <linux/time.h>
+#include <linux/timeofday.h>
+#include <linux/timesource.h>
+#include <linux/sched.h>
+#include <asm/vsyscall-gtod.h>
+#include <asm/pgtable.h>
+#include <asm/page.h>
+#include <asm/fixmap.h>
+#include <asm/msr.h>
+#include <asm/timer.h>
+#include <asm/system.h>
+#include <asm/unistd.h>
+#include <asm/errno.h>
+
+struct vsyscall_gtod_data_t {
+ struct timeval wall_time_tv;
+ struct timezone sys_tz;
+ cycle_t offset_base;
+ struct timesource_t timesource;
+};
+
+struct vsyscall_gtod_data_t vsyscall_gtod_data;
+struct vsyscall_gtod_data_t __vsyscall_gtod_data __section_vsyscall_gtod_data;
+
+seqlock_t vsyscall_gtod_lock = SEQLOCK_UNLOCKED;
+seqlock_t __vsyscall_gtod_lock __section_vsyscall_gtod_lock = SEQLOCK_UNLOCKED;
+
+int errno;
+static inline _syscall2(int,gettimeofday,struct timeval *,tv,struct timezone *,tz);
+
+static int vsyscall_mapped = 0; /* flag variable for remap_vsyscall() */
+extern struct timezone sys_tz;
+
+static inline void do_vgettimeofday(struct timeval* tv)
+{
+ cycle_t now, cycle_delta;
+ nsec_t nsec_delta;
+
+ if (__vsyscall_gtod_data.timesource.type == TIMESOURCE_FUNCTION) {
+ gettimeofday(tv, NULL);
+ return;
+ }
+
+ /* read the timesource and calc cycle_delta */
+ now = read_timesource(&__vsyscall_gtod_data.timesource);
+ cycle_delta = (now - __vsyscall_gtod_data.offset_base)
+ & __vsyscall_gtod_data.timesource.mask;
+
+ /* convert cycles to nsecs */
+ nsec_delta = cycle_delta * __vsyscall_gtod_data.timesource.mult;
+ nsec_delta = nsec_delta >> __vsyscall_gtod_data.timesource.shift;
+
+ /* add nsec offset to wall_time_tv */
+ *tv = __vsyscall_gtod_data.wall_time_tv;
+ do_div(nsec_delta, NSEC_PER_USEC);
+ tv->tv_usec += (unsigned long) nsec_delta;
+ while (tv->tv_usec >= USEC_PER_SEC) {
+ tv->tv_sec += 1;
+ tv->tv_usec -= USEC_PER_SEC;
+ }
+}
+
+static inline void do_get_tz(struct timezone *tz)
+{
+ *tz = __vsyscall_gtod_data.sys_tz;
+}
+
+static int __vsyscall(0) asmlinkage vgettimeofday(struct timeval *tv, struct timezone *tz)
+{
+ unsigned long seq;
+ do {
+ seq = read_seqbegin(&__vsyscall_gtod_lock);
+
+ if (tv)
+ do_vgettimeofday(tv);
+ if (tz)
+ do_get_tz(tz);
+
+ } while (read_seqretry(&__vsyscall_gtod_lock, seq));
+
+ return 0;
+}
+
+static time_t __vsyscall(1) asmlinkage vtime(time_t * t)
+{
+ struct timeval tv;
+ vgettimeofday(&tv,NULL);
+ if (t)
+ *t = tv.tv_sec;
+ return tv.tv_sec;
+}
+
+struct timesource_t* curr_timesource;
+
+void arch_update_vsyscall_gtod(nsec_t wall_time, cycle_t offset_base,
+ struct timesource_t* timesource, int ntp_adj)
+{
+ unsigned long flags;
+
+ write_seqlock_irqsave(&vsyscall_gtod_lock, flags);
+
+ /* XXX - hackitty hack hack. this is terrible! */
+ if (curr_timesource != timesource) {
+ if ((timesource->type == TIMESOURCE_MMIO_32)
+ || (timesource->type == TIMESOURCE_MMIO_64)) {
+ unsigned long vaddr = (unsigned long)timesource->mmio_ptr;
+ pgd_t *pgd = pgd_offset_k(vaddr);
+ pud_t *pud = pud_offset(pgd, vaddr);
+ pmd_t *pmd = pmd_offset(pud,vaddr);
+ pte_t *pte = pte_offset_kernel(pmd, vaddr);
+ pte->pte_low |= _PAGE_USER;
+ }
+ }
+
+ /* save off wall time as timeval */
+ vsyscall_gtod_data.wall_time_tv = ns2timeval(wall_time);
+
+ /* save offset_base */
+ vsyscall_gtod_data.offset_base = offset_base;
+
+ /* copy current timesource */
+ vsyscall_gtod_data.timesource = *timesource;
+
+ /* apply ntp adjustment to timesource mult */
+ vsyscall_gtod_data.timesource.mult += ntp_adj;
+
+ /* save off current timezone */
+ vsyscall_gtod_data.sys_tz = sys_tz;
+
+ write_sequnlock_irqrestore(&vsyscall_gtod_lock, flags);
+
+}
+extern char __vsyscall_0;
+
+static void __init map_vsyscall(void)
+{
+ unsigned long physaddr_page0 = (unsigned long) &__vsyscall_0 - PAGE_OFFSET;
+
+ /* Initially we map the VSYSCALL page w/ PAGE_KERNEL permissions to
+ * keep the alternate_instruction code from bombing out when it
+ * changes the seq_lock memory barriers in vgettimeofday()
+ */
+ __set_fixmap(FIX_VSYSCALL_GTOD_FIRST_PAGE, physaddr_page0, PAGE_KERNEL);
+}
+
+static int __init remap_vsyscall(void)
+{
+ unsigned long physaddr_page0 = (unsigned long) &__vsyscall_0 - PAGE_OFFSET;
+
+ if (!vsyscall_mapped)
+ return 0;
+
+ /* Remap the VSYSCALL page w/ PAGE_KERNEL_VSYSCALL permissions
+ * after the alternate_instruction code has run
+ */
+ clear_fixmap(FIX_VSYSCALL_GTOD_FIRST_PAGE);
+ __set_fixmap(FIX_VSYSCALL_GTOD_FIRST_PAGE, physaddr_page0, PAGE_KERNEL_VSYSCALL);
+
+ return 0;
+}
+
+int __init vsyscall_init(void)
+{
+ printk("VSYSCALL: consistency checks...");
+ if ((unsigned long) &vgettimeofday != VSYSCALL_ADDR(__NR_vgettimeofday)) {
+ printk("vgettimeofday link addr broken\n");
+ printk("VSYSCALL: vsyscall_init failed!\n");
+ return -EFAULT;
+ }
+ if ((unsigned long) &vtime != VSYSCALL_ADDR(__NR_vtime)) {
+ printk("vtime link addr broken\n");
+ printk("VSYSCALL: vsyscall_init failed!\n");
+ return -EFAULT;
+ }
+ if (VSYSCALL_ADDR(0) != __fix_to_virt(FIX_VSYSCALL_GTOD_FIRST_PAGE)) {
+ printk("fixmap first vsyscall 0x%lx should be 0x%x\n",
+ __fix_to_virt(FIX_VSYSCALL_GTOD_FIRST_PAGE),
+ VSYSCALL_ADDR(0));
+ printk("VSYSCALL: vsyscall_init failed!\n");
+ return -EFAULT;
+ }
+
+
+ printk("passed...mapping...");
+ map_vsyscall();
+ printk("done.\n");
+ vsyscall_mapped = 1;
+ printk("VSYSCALL: fixmap virt addr: 0x%lx\n",
+ __fix_to_virt(FIX_VSYSCALL_GTOD_FIRST_PAGE));
+
+ return 0;
+}
+__initcall(remap_vsyscall);
Index: include/asm-i386/fixmap.h
===================================================================
--- 3b4165efeade40b65ea2e8188184e4f8d3d8d636/include/asm-i386/fixmap.h (mode:100644)
+++ 9d016193cc103e4ba0026e943774ef0f774bf72f/include/asm-i386/fixmap.h (mode:100644)
@@ -27,6 +27,7 @@
#include <asm/acpi.h>
#include <asm/apicdef.h>
#include <asm/page.h>
+#include <asm/vsyscall-gtod.h>
#ifdef CONFIG_HIGHMEM
#include <linux/threads.h>
#include <asm/kmap_types.h>
@@ -53,6 +54,11 @@
enum fixed_addresses {
FIX_HOLE,
FIX_VSYSCALL,
+#ifdef CONFIG_NEWTOD_VSYSCALL
+ FIX_VSYSCALL_GTOD_LAST_PAGE,
+ FIX_VSYSCALL_GTOD_FIRST_PAGE = FIX_VSYSCALL_GTOD_LAST_PAGE
+ + VSYSCALL_GTOD_NUMPAGES - 1,
+#endif
#ifdef CONFIG_X86_LOCAL_APIC
FIX_APIC_BASE, /* local (CPU) APIC) -- required for SMP or not */
#endif
Index: include/asm-i386/pgtable.h
===================================================================
--- 3b4165efeade40b65ea2e8188184e4f8d3d8d636/include/asm-i386/pgtable.h (mode:100644)
+++ 9d016193cc103e4ba0026e943774ef0f774bf72f/include/asm-i386/pgtable.h (mode:100644)
@@ -159,6 +159,8 @@
#define __PAGE_KERNEL_NOCACHE (__PAGE_KERNEL | _PAGE_PCD)
#define __PAGE_KERNEL_LARGE (__PAGE_KERNEL | _PAGE_PSE)
#define __PAGE_KERNEL_LARGE_EXEC (__PAGE_KERNEL_EXEC | _PAGE_PSE)
+#define __PAGE_KERNEL_VSYSCALL \
+ (_PAGE_PRESENT | _PAGE_USER | _PAGE_ACCESSED)

#define PAGE_KERNEL __pgprot(__PAGE_KERNEL)
#define PAGE_KERNEL_RO __pgprot(__PAGE_KERNEL_RO)
@@ -166,6 +168,8 @@
#define PAGE_KERNEL_NOCACHE __pgprot(__PAGE_KERNEL_NOCACHE)
#define PAGE_KERNEL_LARGE __pgprot(__PAGE_KERNEL_LARGE)
#define PAGE_KERNEL_LARGE_EXEC __pgprot(__PAGE_KERNEL_LARGE_EXEC)
+#define PAGE_KERNEL_VSYSCALL __pgprot(__PAGE_KERNEL_VSYSCALL)
+#define PAGE_KERNEL_VSYSCALL_NOCACHE __pgprot(__PAGE_KERNEL_VSYSCALL|(__PAGE_KERNEL_RO | _PAGE_PCD))

/*
* The i386 can't do page protection for execute, and considers that
Index: include/asm-i386/vsyscall-gtod.h
===================================================================
--- /dev/null (tree:3b4165efeade40b65ea2e8188184e4f8d3d8d636)
+++ 9d016193cc103e4ba0026e943774ef0f774bf72f/include/asm-i386/vsyscall-gtod.h (mode:100644)
@@ -0,0 +1,41 @@
+#ifndef _ASM_i386_VSYSCALL_GTOD_H_
+#define _ASM_i386_VSYSCALL_GTOD_H_
+
+#ifdef CONFIG_NEWTOD_VSYSCALL
+
+/* VSYSCALL_GTOD_START must be the same as
+ * __fix_to_virt(FIX_VSYSCALL_GTOD_FIRST_PAGE)
+ * and must also be same as addr in vmlinux.lds.S */
+#define VSYSCALL_GTOD_START 0xffffd000
+#define VSYSCALL_GTOD_SIZE 1024
+#define VSYSCALL_GTOD_END (VSYSCALL_GTOD_START + PAGE_SIZE)
+#define VSYSCALL_GTOD_NUMPAGES \
+ ((VSYSCALL_GTOD_END-VSYSCALL_GTOD_START) >> PAGE_SHIFT)
+#define VSYSCALL_ADDR(vsyscall_nr) \
+ (VSYSCALL_GTOD_START+VSYSCALL_GTOD_SIZE*(vsyscall_nr))
+
+#ifdef __KERNEL__
+#ifndef __ASSEMBLY__
+#include <linux/seqlock.h>
+#define __vsyscall(nr) __attribute__ ((unused,__section__(".vsyscall_" #nr)))
+
+/* ReadOnly generic time value attributes*/
+#define __section_vsyscall_gtod_data __attribute__ ((unused, __section__ (".vsyscall_gtod_data")))
+
+#define __section_vsyscall_gtod_lock __attribute__ ((unused, __section__ (".vsyscall_gtod_lock")))
+
+
+enum vsyscall_num {
+ __NR_vgettimeofday,
+ __NR_vtime,
+};
+
+int vsyscall_init(void);
+extern char __vsyscall_0;
+#endif /* __ASSEMBLY__ */
+#endif /* __KERNEL__ */
+#else /* CONFIG_NEWTOD_VSYSCALL */
+#define vsyscall_init()
+#define vsyscall_set_timesource(x)
+#endif /* CONFIG_NEWTOD_VSYSCALL */
+#endif /* _ASM_i386_VSYSCALL_GTOD_H_ */


2005-05-14 19:59:08

by Christoph Lameter

Subject: IA64 implementation of timesource for new time of day subsystem

On Fri, 13 May 2005, john stultz wrote:

> I look forward to your comments and feedback.

Here is the implementation of the IA64 timesources for the new time of
day subsystem.

This is quite straightforward. Thanks John. However, the ITC
interpolator can no longer use MMIO in SMP situations since there is no
provision for jitter compensation in the new time of day subsystem. I have
implemented that via a function now which will slow down clock access
for non SGI IA64 hardware significantly since it will not be able to use
the fastcall anymore.

I am working on the fastcall but I would need a couple of changes
to the core code to make the following symbols non-static since they
will need to be accessed from the fast syscall handler:

timesource
system_time
wall_time_offset
offset_base

The asm code is going to be simplified because there will be no need
to support jitter compensation and most values are now single 64 bit values
instead of two 64 bit values with separate seconds and nanoseconds.

However, the asm code is also going to be a bit more complicated since
the split from 64 bit nanoseconds into seconds and
nanoseconds/microseconds for gettimeofday and clock_gettime
has to be done in asm as well.
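For reference, the split described above is small when written in C. This is only a userspace sketch of the arithmetic. The names `sec_usec` and `ns_to_sec_usec` are made up for illustration and are not part of the patch; an ia64 fastcall would have to do the equivalent in asm.

```c
#include <stdint.h>

#define NSEC_PER_SEC  1000000000ULL
#define NSEC_PER_USEC 1000ULL

/* Illustrative result type; the kernel would fill a struct timeval. */
struct sec_usec {
	uint64_t sec;
	uint32_t usec;
};

/* Split a single 64-bit nanosecond count into seconds and microseconds. */
static struct sec_usec ns_to_sec_usec(uint64_t ns)
{
	struct sec_usec r;

	r.sec = ns / NSEC_PER_SEC;
	/* The remainder is below one second, so usec stays below 1000000. */
	r.usec = (uint32_t)((ns % NSEC_PER_SEC) / NSEC_PER_USEC);
	return r;
}
```

For clock_gettime the second step would keep the nanosecond remainder instead of dividing it down to microseconds.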

I would recommend to add jitter compensation to the time sources. Otherwise
each ITC/TSC like timesource will have to implement that on its own.

Signed-off-by: Christoph Lameter <[email protected]>

Index: linux-2.6.12-rc4/drivers/timesource/Makefile
===================================================================
--- linux-2.6.12-rc4.orig/drivers/timesource/Makefile 2005-05-14 11:21:46.000000000 -0700
+++ linux-2.6.12-rc4/drivers/timesource/Makefile 2005-05-14 12:15:08.000000000 -0700
@@ -4,9 +4,8 @@ obj-$(CONFIG_PPC64) += ppc64_timebase.o
obj-$(CONFIG_PPC) += ppc_timebase.o
obj-$(CONFIG_ARCH_S390) += s390_tod.o

-# XXX - Untested/Uncompiled
-#obj-$(CONFIG_IA64) += itc.c
-#obj-$(CONFIG_IA64_SGI_SN2) += sn2_rtc.c
+obj-$(CONFIG_IA64) += itc.o
+obj-$(CONFIG_IA64_SGI_SN2) += sn2_rtc.o
obj-$(CONFIG_X86) += tsc.o
obj-$(CONFIG_X86) += i386_pit.o
obj-$(CONFIG_X86) += tsc-interp.o
Index: linux-2.6.12-rc4/drivers/timesource/itc.c
===================================================================
--- linux-2.6.12-rc4.orig/drivers/timesource/itc.c 2005-05-14 11:21:46.000000000 -0700
+++ linux-2.6.12-rc4/drivers/timesource/itc.c 2005-05-14 12:20:00.000000000 -0700
@@ -1,31 +1,83 @@
-/* XXX - this is totally untested and uncompiled
- * TODO:
- * o cpufreq issues
- * o unsynched ITCs ?
+/*
+ * drivers/timesource/itc.c
+ *
+ * Use of the ITC register on Itanium processors as a time source
+ *
+ * Copyright (C) 2005 Silicon Graphics, Inc.
+ * Christoph Lameter, <[email protected]>
*/
+#include <linux/config.h>
#include <linux/timesource.h>
+#include <linux/jiffies.h>

-/* XXX - Other includes needed for:
- * sal_platform_features, IA64_SAL_PLATFORM_FEATURE_ITC_DRIFT,
- * local_cpu_data->itc_freq
- * See arch/ia64/kernel/time.c for ideas
- */
+#include <asm/machvec.h>
+#include <asm/sal.h>
+#include <asm/system.h>

static struct timesource_t timesource_itc = {
.name = "itc",
.priority = 25,
.type = TIMESOURCE_CYCLES,
.mask = (cycle_t)-1,
- .mult = 0, /* to be set */
.shift = 22,
};

+#ifdef CONFIG_SMP
+static int nojitter;
+
+static __init int nojitter_setup(char *str)
+{
+ nojitter = 1;
+ printk(KERN_INFO "ITC timesource: Jitter checking bypassed.\n");
+ return 1;
+}
+
+__setup("itc_nojitter", nojitter_setup);
+
+cycle_t last_itc;
+
+/*
+ * Ensure that ITC is monotonically increasing by comparing
+ * to the last value encountered. Do this in an atomic fashion
+ * by using cmpxchg for synchronization between processors
+ * and at the same time for the updating of the last_itc value;
+ */
+static cycle_t itc_filtered(void) {
+ cycle_t now, last;
+
+ do {
+ last = last_itc;
+ smp_rmb();
+ now = get_cycles();
+ if (time_before(now, last))
+ return last_itc;
+ } while (cmpxchg(&last_itc, last, now) != last);
+ return now;
+}
+#endif
+
static int __init init_itc_timesource(void)
{
if (!(sal_platform_features & IA64_SAL_PLATFORM_FEATURE_ITC_DRIFT)) {
- /* XXX - I'm not really sure if itc_freq is in cyc/sec */
timesource_itc.mult = timesource_hz2mult(local_cpu_data->itc_freq,
timesource_itc.shift);
+#ifdef CONFIG_SMP
+ /* ITCs are never accurately synchronized in an SMP configuration
+ * even if the ITC_DRIFT bit is not set.
+ * Jitter compensation requires a cmpxchg which may limit
+ * the scalability of the syscalls for retrieving time.
+ * ITC synchronization is usually successful to within a few
+ * ITC ticks but this is not a sure thing. If you need to improve
+ * timer performance in SMP situations then boot the kernel with the
+ * "itc_nojitter" option. However, doing so may result in time fluctuating
+ * (maybe even appearing to go backward!) if the ITC offsets between the
+ * individual CPUs are too large.
+ */
+ if (!nojitter) {
+ timesource_itc.type = TIMESOURCE_FUNCTION;
+ timesource_itc.read_fnct = itc_filtered;
+ }
+#endif
register_timesource(&timesource_itc);
}
return 0;
Index: linux-2.6.12-rc4/arch/ia64/kernel/time.c
===================================================================
--- linux-2.6.12-rc4.orig/arch/ia64/kernel/time.c 2005-05-14 11:21:46.000000000 -0700
+++ linux-2.6.12-rc4/arch/ia64/kernel/time.c 2005-05-14 12:15:08.000000000 -0700
@@ -139,6 +139,7 @@ ia64_cpu_local_tick (void)
ia64_set_itm(local_cpu_data->itm_next);
}

+#ifndef CONFIG_NEWTOD
static int nojitter;

static int __init nojitter_setup(char *str)
@@ -150,6 +151,7 @@ static int __init nojitter_setup(char *s

__setup("nojitter", nojitter_setup);

+#endif

void __devinit
ia64_init_itm (void)
Index: linux-2.6.12-rc4/drivers/timesource/sn2_rtc.c
===================================================================
--- linux-2.6.12-rc4.orig/drivers/timesource/sn2_rtc.c 2005-05-14 11:21:46.000000000 -0700
+++ linux-2.6.12-rc4/drivers/timesource/sn2_rtc.c 2005-05-14 12:15:08.000000000 -0700
@@ -1,29 +1,38 @@
-#include <linux/timesource.h>
-/* XXX this will need some includes
- * to find: sn_rtc_cycles_per_second and RTC_COUNTER_ADDR
- * See arch/ia64/sn/kernel/sn2/timer.c for likely suspects
+/*
+ * linux/drivers/timesource/sn2_rtc.c
+ *
+ * Use the RTC on the SN2 on an Altix system as the time source
+ *
+ * (C) 2005 Silicon Graphics, Inc.
+ * Christoph Lameter <[email protected]>
*/

+
+#include <linux/timesource.h>
+#include <asm/system.h>
+#include <asm/sn/leds.h>
+#include <asm/sn/shub_mmr.h>
+#include <asm/sn/clksupport.h>
+
+extern unsigned long sn_rtc_cycles_per_second;
+
#define SN2_RTC_MASK ((1LL << 55) - 1)
#define SN2_SHIFT 10

struct timesource_t timesource_sn2_rtc = {
.name = "sn2_rtc",
- .priority = 300, /* XXX - not sure what this should be */
+ .priority = 999,
.type = TIMESOURCE_MMIO_64,
- .mmio_ptr = NULL,
.mask = (cycle_t)SN2_RTC_MASK,
.mult = 0, /* set below */
.shift = SN2_SHIFT,
};

-static void __init init_sn2_timesource(void)
+static __init int init_sn2_timesource(void)
{
- timesource_sn2_rtc.mult = timesource_hz2mult(sn_rtc_cycles_per_second,
- SN2_SHIFT);
+ timesource_sn2_rtc.mult = timesource_hz2mult(sn_rtc_cycles_per_second, SN2_SHIFT);
timesource_sn2_rtc.mmio_ptr = RTC_COUNTER_ADDR;
-
- register_time_interpolator(&timesource_sn2_rtc);
+ register_timesource(&timesource_sn2_rtc);
return 0;
}
module_init(init_sn2_timesource);

2005-05-15 09:12:10

by James Courtier-Dutton

Subject: Re: IA64 implementation of timesource for new time of day subsystem

Christoph Lameter wrote:
> On Fri, 13 May 2005, john stultz wrote:
>
>
>>I look forward to your comments and feedback.
>
>
> Here is the implementation of the IA64 timesources for the new time of
> day subsystem.
>
> This is quite straightforward. Thanks John. However, the ITC
> interpolator can no longer use MMIO in SMP situations since there is no
> provision for jitter compensation in the new time of day subsystem. I have
> implemented that via a function now which will slow down clock access
> for non SGI IA64 hardware significantly since it will not be able to use
> the fastcall anymore.
>
> I am working on the fastcall but I would need a couple of changes
> to the core code to make the following symbols non-static since they
> will need to be accessed from the fast syscall handler:
>
> timesource
> system_time
> wall_time_offset
> offset_base
>

Will this mean that Linux will have a monotonic time source?
For media players we need a timesource that does not change under any
circumstances. e.g. User changes the clock time, the monotonic time
source should not change. The monotonic time source should just start at
0 at power on, and continually increase accurately over time. I.e. A
very accurate "uptime" measurement.

James

2005-05-15 10:17:26

by Andi Kleen

Subject: Re: IA64 implementation of timesource for new time of day subsystem

> Will this mean that Linux will have a monotonic time source?

2.6 has had one for a long time (posix_gettime(CLOCK_MONOTONIC))

-Andi

2005-05-16 15:35:52

by Chris Friesen

Subject: Re: IA64 implementation of timesource for new time of day subsystem

Andi Kleen wrote:
>>Will this mean that Linux will have a monotonic time source?
>
> 2.6 has had one for a long time (posix_gettime(CLOCK_MONOTONIC))

I think that's clock_gettime(), no?
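For anyone wanting to try it from userspace, a minimal sketch of reading the monotonic clock through the POSIX clock_gettime() interface follows. The helper name `monotonic_ns` is made up for illustration; only clock_gettime() and CLOCK_MONOTONIC are the real interface. On older glibc this may need linking with -lrt.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <time.h>

/*
 * Return the CLOCK_MONOTONIC reading in nanoseconds, or -1 on error.
 * The value is unaffected by settimeofday(), so differences between
 * two readings are safe to use as elapsed time.
 */
static long long monotonic_ns(void)
{
	struct timespec ts;

	if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0)
		return -1;
	return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}
```

A media player would take one reading at start of playback and subtract it from later readings. Note that POSIX leaves the starting point of CLOCK_MONOTONIC unspecified, so only differences are meaningful.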

Chris

2005-05-16 17:35:08

by john stultz

Subject: Re: IA64 implementation of timesource for new time of day subsystem

On Sat, 2005-05-14 at 12:55 -0700, Christoph Lameter wrote:
> On Fri, 13 May 2005, john stultz wrote:
>
> > I look forward to your comments and feedback.
>
> Here is the implementation of the IA64 timesources for the new time of
> day subsystem.

Great!

> This is quite straightforward. Thanks John. However, the ITC
> interpolator can no longer use MMIO in SMP situations since there is no
> provision for jitter compensation in the new time of day subsystem. I have
> implemented that via a function now which will slow down clock access
> for non SGI IA64 hardware significantly since it will not be able to use
> the fastcall anymore.
>
> I am working on the fastcall but I would need a couple of changes
> to the core code to make the following symbols non-static since they
> will need to be accessed from the fast syscall handler:
>
> timesource
> system_time
> wall_time_offset
> offset_base


Actually that shouldn't be necessary. Look at my arch-x86-64 patch or
vsyscall-i386 patch for how the arch_vsyscall_gtod_update() function is
used. It provides an arch specific hook called by the timeofday core to
provide the information you desire.

Please let me know if it is not sufficient for some reason.

[snip]

> I would recommend to add jitter compensation to the time sources. Otherwise
> each ITC/TSC like timesource will have to implement that on its own.

Just to clarify for others, this is the same unsynced cpu cycle counter
problem that affects the TSC on i386 and x86-64. ia64 gets around the
problem by checking on every call to gettimeofday() if the ITC value is
less than the ITC value used on the previous call to gettimeofday(). If
the value is less (ie: would result in time going backwards) it just
uses the last value to calculate time. It then uses cmpxchg to
atomically update the last ITC value.
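The filter described above can be sketched in portable C. This is a userspace illustration using C11 atomics in place of the kernel's cmpxchg, not the ia64 code itself. The names `last_cycles` and `filter_cycles` are hypothetical, and the raw counter value is passed in as a parameter so the logic can be exercised without the hardware.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Last counter value handed out to any caller, shared by all CPUs. */
static _Atomic uint64_t last_cycles;

/*
 * Given a freshly read counter value, return a value that never goes
 * backwards relative to what any other caller has already been given.
 */
static uint64_t filter_cycles(uint64_t raw)
{
	uint64_t last = atomic_load(&last_cycles);

	do {
		/* A later value was already published: reuse it. */
		if ((int64_t)(raw - last) <= 0)
			return last;
		/* On failure the CAS reloads 'last' and we re-check. */
	} while (!atomic_compare_exchange_weak(&last_cycles, &last, raw));

	return raw;
}
```

The compare-and-swap on a single shared word is also where the scalability cost comes from: every reader on every CPU contends on the cache line holding `last_cycles`.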

The problem I have with this is that if the ITCs are not synced, they
really are not good timesources. If one cpu's ITC is behind another, the
net result of the above algorithm is cpu 2 will always just use cpu 1's
last calculated time. This could cause jumps in time when a process
moves from cpu2 to cpu1.

Since it only affects the TSC and ITC, I think keeping the decision to
use cmpxchg in the timesource code, as you've implemented with the ITC
is the best way to go. If you really want to you can special case the
arch specific fsyscall code by switching on the time source .name, and
that would allow you to use a similar cmpxchg algorithm there as well.


thanks
-john

2005-05-16 18:14:20

by Christoph Lameter

Subject: Re: IA64 implementation of timesource for new time of day subsystem

On Mon, 16 May 2005, john stultz wrote:

> Actually that shouldn't be necessary. Look at my arch-x86-64 patch or
> vsyscall-i386 patch for how the arch_vsyscall_gtod_update() function is
> used. It provides an arch specific hook called by the timeofday core to
> provide the information you desire.
>
> Please let me know if it is not sufficient for some reason.

Obviously this won't work since you cannot execute C code nor functions in
an ia64 fastcall. I need the variables exported.

> > I would recommend to add jitter compensation to the time sources. Otherwise
> > each ITC/TSC like timesource will have to implement that on its own.
>
> Just to clarify for others, this is the same unsynced cpu cycle counter
> problem that affects the TSC on i386 and x86-64. ia64 gets around the
> problem by checking on every call to gettimeofday() if the ITC value is
> less than the ITC value used on the previous call to gettimeofday(). If
> the value is less (ie: would result in time going backwards) it just
> uses the last value to calculate time. It then uses cmpxchg to
> atomically update the last ITC value.

Nope. You are way off here. Unsynched cpu cycle counters lead to the ITC
timesource not being registered.

> The problem I have with this is that if the ITCs are not synced, they
> really are not good timesources. If one cpu's ITC is behind another, the
> net result of the above algorithm is cpu 2 will always just use cpu 1's
> last calculated time. This could cause jumps in time when a process
> moves from cpu2 to cpu1.

Note again that the use of cmpxchg is NOT covering the case of ITCs not
being synced. If the ITCs are not synced then no timesource will be
established for ITC!

This is the case of ITC's running synchronous but at a tiny offset. The
startup on IA64 syncs the ITCs but cannot guarantee a complete sync. There
may be a small offset of a few clock ticks. The cmpxchg is
needed to compensate for that small offset. I imagine that other
architectures have similar issues.

> Since it only affects the TSC and ITC, I think keeping the decision to
> use cmpxchg in the timesource code, as you've implemented with the ITC
> is the best way to go. If you really want to you can special case the
> arch specific fsyscall code by switching on the time source .name, and
> that would allow you to use a similar cmpxchg algorithm there as well.

Again this will not work on IA64 since it does the fast system calls in a
different way.

Clock jitter can affect multiple clock sources that may fluctuate
in a minor way due to a variety of influences. Jitter compensation may
help in these situations.

2005-05-16 18:46:24

by john stultz

Subject: Re: IA64 implementation of timesource for new time of day subsystem

On Mon, 2005-05-16 at 11:09 -0700, Christoph Lameter wrote:
> On Mon, 16 May 2005, john stultz wrote:
>
> > Actually that shouldn't be necessary. Look at my arch-x86-64 patch or
> > vsyscall-i386 patch for how the arch_vsyscall_gtod_update() function is
> > used. It provides an arch specific hook called by the timeofday core to
> > provide the information you desire.
> >
> > Please let me know if it is not sufficient for some reason.
>
> Obviously this won't work since you cannot execute C code nor functions in
> an ia64 fastcall. I need the variables exported.

No. Look at the x86-64 code. The generic timeofday core calls
arch_update_vsyscall_gtod() (sorry for the function name confusion
above) any time the timekeeping variables change.

All you need is to do is define implement an ia64 version of
arch_update_vsyscall_gtod() which can then export the values passed to
it in whatever form you desire so it can be used by the fastcall.
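As a rough illustration of what such a hook could do, the sketch below publishes the values under a sequence counter so that a lock-free reader (the fastcall, in the ia64 case) can detect a concurrent update and retry. It is a userspace approximation using C11 atomics and assumes a single writer. Every name in it (`gtod_snapshot`, `publish_snapshot`, `read_snapshot`, the `pub_*` variables) is hypothetical and not taken from the patches.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical snapshot of the values handed to the arch hook. */
struct gtod_snapshot {
	uint64_t wall_ns;     /* wall time at the last update */
	uint64_t offset_base; /* counter value at the last update */
	uint32_t mult;        /* cycles-to-ns multiplier */
	uint32_t shift;
};

static _Atomic unsigned int gtod_seq;
static _Atomic uint64_t pub_wall_ns, pub_offset_base;
static _Atomic uint32_t pub_mult, pub_shift;

/* Writer: called whenever the timekeeping values change. */
static void publish_snapshot(const struct gtod_snapshot *s)
{
	unsigned int seq = atomic_load_explicit(&gtod_seq, memory_order_relaxed);

	/* An odd sequence number marks an update in progress. */
	atomic_store_explicit(&gtod_seq, seq + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);

	atomic_store_explicit(&pub_wall_ns, s->wall_ns, memory_order_relaxed);
	atomic_store_explicit(&pub_offset_base, s->offset_base, memory_order_relaxed);
	atomic_store_explicit(&pub_mult, s->mult, memory_order_relaxed);
	atomic_store_explicit(&pub_shift, s->shift, memory_order_relaxed);

	/* Even again: the snapshot is consistent. */
	atomic_store_explicit(&gtod_seq, seq + 2, memory_order_release);
}

/* Reader: copy the values, retrying if a writer interfered. */
static struct gtod_snapshot read_snapshot(void)
{
	struct gtod_snapshot s;
	unsigned int begin, end;

	do {
		begin = atomic_load_explicit(&gtod_seq, memory_order_acquire);
		s.wall_ns = atomic_load_explicit(&pub_wall_ns, memory_order_relaxed);
		s.offset_base = atomic_load_explicit(&pub_offset_base, memory_order_relaxed);
		s.mult = atomic_load_explicit(&pub_mult, memory_order_relaxed);
		s.shift = atomic_load_explicit(&pub_shift, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
		end = atomic_load_explicit(&gtod_seq, memory_order_relaxed);
	} while ((begin & 1) || begin != end);

	return s;
}
```

This mirrors the read_seqbegin()/read_seqretry() loop used by vgettimeofday() in the i386 proof of concept; the reader needs nothing beyond loads and a compare.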


> > > I would recommend to add jitter compensation to the time sources. Otherwise
> > > each ITC/TSC like timesource will have to implement that on its own.
> >
> > Just to clarify for others, this is the same unsynced cpu cycle counter
> > problem that affects the TSC on i386 and x86-64. ia64 gets around the
> > problem by checking on every call to gettimeofday() if the ITC value is
> > less than the ITC value used on the previous call to gettimeofday(). If
> > the value is less (ie: would result in time going backwards) it just
> > uses the last value to calculate time. It then uses cmpxchg to
> > atomically update the last ITC value.
>
> Nope. You are way off here. Unsynched cpu cycle counters lead to the ITC
> timesource not being registered.
>
> > The problem I have with this is that if the ITCs are not synced, they
> > really are not good timesources. If one cpu's ITC is behind another, the
> > net result of the above algorithm is cpu 2 will always just use cpu 1's
> > last calculated time. This could cause jumps in time when a process
> > moves from cpu2 to cpu1.
>
> Note again that the use of cmpxchg is NOT covering the case of ITCs not
> being synced. If the ITCs are not synced then no timesource will be
> established for ITC!
>
> This is the case of ITC's running synchronous but at a tiny offset. The
> startup on IA64 syncs the ITCs but cannot guarantee a complete sync. There
> may be a small offset of a few clock ticks. The cmpxchg is
> needed to compensate for that small offset. I imagine that other
> architectures have similar issues.

Just per-cpu cycle counters like the TSC and ITC to my knowledge. The
PPC timebase increments off of a global bus-signal, so it is not
affected. I'd be interested in other examples, though.


> > Since it only affects the TSC and ITC, I think keeping the decision to
> > use cmpxchg in the timesource code, as you've implemented with the ITC
> > is the best way to go. If you really want to you can special case the
> > arch specific fsyscall code by switching on the time source .name, and
> > that would allow you to use a similar cmpxchg algorithm there as well.
>
> Again this will not work on IA64 since it does the fast system calls in a
> different way.

I think you'll find otherwise. The arch_update_vsyscall_gtod() interface
gives each arch quite a bit of flexibility in how to implement their own
accelerated timeofday.

In pseudo code, all you would need to do is something like:

arch_update_vsyscall_gtod(wall_time, offset_base, timesource, ntp_adj):

fastcall_data.wall = wall_time
fastcall_data.base = offset_base
fastcall_data.ts = timesource
fastcall_data.ntpadj = ntp_adj


fastcall_gtod(): [I understand this would be done in asm]

switch(fastcall_data.ts.type):
case TIMESOURCE_MMIO:
now = <mmio read code>
case TIMESOURCE_CYCLE:
now = <cycle read code>
case TIMESOURCE_FUNCTION:
# special case for itc
if (fastcall_data.ts.name == "itc")
now = <cycle jitter read>
else
return gettimeofday()

offset = (now - fastcall_data.base)
offset *= fastcall_data.ts.mult
offset += fastcall_data.ntpadj
offset >>= fastcall_data.ts.shift

return fastcall_data.wall + offset
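The offset arithmetic in the pseudocode can be checked with a small userspace sketch. The helper names `hz2mult` and `cycles_to_ns` are made up here, and the formula in `hz2mult` is an assumption about what timesource_hz2mult() computes, since its definition is not shown in this thread. The sketch folds ntp_adj into the multiplier, as the i386 vsyscall proof of concept does, and applies the timesource mask so that counter wraparound is handled.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/*
 * Assumed definition: pick mult so that (cycles * mult) >> shift
 * yields nanoseconds for a counter running at 'hz' cycles per second.
 */
static uint32_t hz2mult(uint32_t hz, unsigned int shift)
{
	uint64_t tmp = NSEC_PER_SEC << shift;

	tmp += hz / 2; /* round to nearest */
	return (uint32_t)(tmp / hz);
}

/* Convert the cycles elapsed since 'base' into nanoseconds. */
static uint64_t cycles_to_ns(uint64_t now, uint64_t base, uint64_t mask,
			     uint32_t mult, int32_t ntp_adj,
			     unsigned int shift)
{
	/* Masking makes the subtraction correct across a counter wrap. */
	uint64_t delta = (now - base) & mask;
	uint64_t adj_mult = (uint64_t)((int64_t)mult + ntp_adj);

	return (delta * adj_mult) >> shift;
}
```

With a 250 MHz counter and shift 10, hz2mult() gives 4096, so 250 cycles convert to 1000 ns. The multiplication can overflow 64 bits if the delta grows large, which is one reason the base has to be refreshed regularly.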


> Clock jitter can affect multiple clock sources that may fluctuate
> in a minor way due to a variety of influences. Jitter compensation may
> help in these situations.

Forgive me as I'm just not aware of these, and am thus hesitant to
change the core code for two known cases that can be cleanly dealt with
in the timesource driver code.

thanks
-john

2005-05-16 18:52:40

by john stultz

Subject: Re: [RFC][PATCH (3/7)] new timeofday x86-64 specific changes (v A5)

All,
I just realized a last minute function name change didn't get updated
on the x86-64 patch. This small fix applies on top of my timeofday-
arch-x86-64_A5 patch to resolve the issue.

thanks
-john

arch/x86_64/kernel/vsyscall.c: needs update
Index: arch/x86_64/kernel/vsyscall.c
===================================================================
--- 59012af04a74f0dbf82461c74469537b90e1c8ed/arch/x86_64/kernel/vsyscall.c (mode:100644)
+++ uncommitted/arch/x86_64/kernel/vsyscall.c (mode:100644)
@@ -194,7 +194,7 @@
}

/* save off wall time as timeval */
- vsyscall_gtod_data.wall_time_tv = ns2timeval(wall_time);
+ vsyscall_gtod_data.wall_time_tv = ns_to_timeval(wall_time);

/* save offset_base */
vsyscall_gtod_data.offset_base = offset_base;


2005-05-16 18:56:01

by john stultz

Subject: Re: IA64 implementation of timesource for new time of day subsystem

On Mon, 2005-05-16 at 11:45 -0700, john stultz wrote:
> All you need is to do is define implement an ia64 version of
> arch_update_vsyscall_gtod() which can then export the values passed to
> it in whatever form you desire so it can be used by the fastcall.


Sorry, that first sentence wasn't complete, I went to go look up the
reference and got distracted with something else.

It should be "All you need to do is define NEWTOD_VSYSCALL and
implement.."

time for more coffee :)
-john


2005-05-16 19:35:30

by Christoph Lameter

Subject: Re: IA64 implementation of timesource for new time of day subsystem

On Mon, 16 May 2005, john stultz wrote:

> In pseudo code, all you would need to do is something like:
>
> arch_update_vsyscall_gtod(wall_time, offset_base, timesource, ntp_adj):
>
> fastcall_data.wall = wall_time
> fastcall_data.base = offset_base
> fastcall_data.ts = timesource
> fastcall_data.ntpadj = ntp_adj


Ahh. Thanks.

> > Clock jitter can affect multiple clock sources that may fluctuate
> > in a minor way due to a variety of influences. Jitter compensation may
> > help in these situations.
>
> Forgive me as I'm just not aware of these, and am thus hesitant to
> change the core code for two known cases that can be cleanly dealt with
> in the timesource driver code.

I am happy to leave the situation as is since it does not affect SGI.
We have a memory mapped timer that does not need this jitter compensation.

Other IA64 vendors will see that their timer performance drops
significantly after the new timer subsystem is in. IBM no longer
has IA64 systems that rely on ITC?

2005-05-16 19:35:31

by David Mosberger

Subject: Re: IA64 implementation of timesource for new time of day subsystem

>>>>> On Mon, 16 May 2005 12:24:08 -0700 (PDT), Christoph Lameter <[email protected]> said:

Christoph> Other IA64 vendors will see that their timer performance
Christoph> drops significantly after the new timer subsystem is
Christoph> in. IBM no longer has IA64 systems that rely on ITC?

Would that somehow make it ok to break existing and working code?

--david

2005-05-16 19:51:25

by john stultz

Subject: Re: IA64 implementation of timesource for new time of day subsystem

On Mon, 2005-05-16 at 12:29 -0700, David Mosberger wrote:
> >>>>> On Mon, 16 May 2005 12:24:08 -0700 (PDT), Christoph Lameter <[email protected]> said:
>
> Christoph> Other IA64 vendors will see that their timer performance
> Christoph> drops significantly after the new timer subsystem is
> Christoph> in. IBM no longer has IA64 systems that rely on ITC?
>
> Would that somehow make it ok to break existing and working code?

No. I intend to preserve the existing functionality (and performance) of
the current code. The current timeofday core should allow for this (as I
described in my last mail), so really it's just a matter of either me or
someone else getting around to properly converting that arch with the
help of the arch maintainer. Until the arch is really ready to use the
new timeofday core, no changes are necessary.

Christoph's patch is just a step in the right direction. It is a much
appreciated step, as I haven't yet had the time to implement or test the
ia64 timesources. Any notable regressions introduced will need to be
resolved before the arch specific patch is finally submitted.

What I'm trying to shake out, with Christoph's help, is any major
limitations in the core timeofday code that would keep an arch from
being able to use it. I feel Christoph's concerns have been addressed,
but please let me know if you disagree.

thanks
-john

2005-05-16 20:31:14

by Christoph Lameter

Subject: Re: IA64 implementation of timesource for new time of day subsystem

On Mon, 16 May 2005, john stultz wrote:

>
> No. I intend to preserve the existing functionality (and performance) of
> the current code. The current timeofday core should allow for this (as I
> described in my last mail), so really its just a matter of either me or
> someone else getting around to properly converting that arch with the
> help of the arch maintainer. Until the arch is really ready to use the
> new timeofday core, no changes are necessary.

It's not an arch specific issue. The time sources need to have a field that
specifies that jitter protection is needed and there needs to be some
logic to implement it. Otherwise we have to develop special functions for
each timesource that deal with jitter protection. Using functions will make
a fastcall impossible for the clocks that use jitter protection, and thus
timer access will slow down.

> What I'm trying to shake out, with Christoph's help, is any major
> limitations in the core timeofday code that would keep an arch from
> being able to use it. I feel Christoph's concerns have been addressed,
> but please let me know if you disagree.

Please add jitter protection to the arch independent code.

2005-05-16 20:55:44

by john stultz

Subject: Re: IA64 implementation of timesource for new time of day subsystem

On Mon, 2005-05-16 at 13:27 -0700, Christoph Lameter wrote:
> On Mon, 16 May 2005, john stultz wrote:
>
> >
> > No. I intend to preserve the existing functionality (and performance) of
> > the current code. The current timeofday core should allow for this (as I
> > described in my last mail), so really its just a matter of either me or
> > someone else getting around to properly converting that arch with the
> > help of the arch maintainer. Until the arch is really ready to use the
> > new timeofday core, no changes are necessary.
>
> Its not an arch specific issue. The time sources need to have a field that
> specifies that jitter protection is needed and there needs to be some
> logic to implement it. Otherwise we have to develop special functions for
> each timesource that deal with jitter protection.

You've only pointed out two timesources that could want this (ITC and
TSC), so I think it's reasonable to do your jitter handling in the
timesource driver. If there are other arches that have non hardware
synced per-cpu counters, then it would be something to consider.

> Function will make a
> fastcall for the clocks that use jitter protection not possible and thus
> timer access will slow down.


I disagree. I already explained how this can be done via the
arch_update_vsyscall_gtod() interface by special casing for this
specific well known time source.


> > What I'm trying to shake out, with Christoph's help, is any major
> > limitations in the core timeofday code that would keep an arch from
> > being able to use it. I feel Christoph's concerns have been addressed,
> > but please let me know if you disagree.
>
> Please add jitter protection to the arch independent code.

If more timesources need that functionality, then I'll be happy to.
Until then it should stay in the ia64 specific itc driver and fastcall
code.

thanks
-john

2005-05-16 21:01:13

by David Mosberger

Subject: Re: IA64 implementation of timesource for new time of day subsystem

>>>>> On Mon, 16 May 2005 13:53:44 -0700, john stultz <[email protected]> said:


John> You've only pointed out two timesources that could want this
John> (ITC and TSC), so I think its reasonable to do your jitter
John> handling in the timesource driver. If there are other arches
John> that have non hardware synced per-cpu counters, then it would
John> be something to consider.

I think Christoph's point is that _all_ time-sources which require
software syncing will need this since it is not possible to sync
perfectly, even if there is no drift.

--david

2005-05-16 21:40:59

by john stultz

Subject: Re: IA64 implementation of timesource for new time of day subsystem

On Mon, 2005-05-16 at 13:58 -0700, David Mosberger wrote:
> >>>>> On Mon, 16 May 2005 13:53:44 -0700, john stultz <[email protected]> said:
> John> You've only pointed out two timesources that could want this
> John> (ITC and TSC), so I think its reasonable to do your jitter
> John> handling in the timesource driver. If there are other arches
> John> that have non hardware synced per-cpu counters, then it would
> John> be something to consider.
>
> I think Christopher's point is that _all_ time-sources which require
> software syncing will need this since it is not possible to sync
> perfectly, even if there is no drift.

Yes, but to my knowledge it is only the ITC that does software syncing.
The TSC could use it as well, but doesn't. Other than that I haven't
heard of any other timesource that would use such functionality.

Since it's possible to do jitter compensation within the itc timesource
driver (and within the fastcall code to preserve the existing
performance), would it be reasonable to defer making the jitter
compensation code generic until another timesource needs it? It should
be a fairly simple change.

Or is this just something I'm being hard-headed about?

thanks
-john






2005-05-16 21:58:47

by john stultz

Subject: Re: [RFC][PATCH (7/7)] new timeofday i386 vsyscall proof of concept (v A5)

All,
Yikes, here is another small fix a last minute function name
change caused, this time in the vsyscall-i386 patch. This small fix
applies on top of my timeofday-vsyscall-i386_A5 patch to resolve the
issue.

Also it was noted that the vsyscall-i386 patch puts the vsyscall config
option in a bad place. I'll fix that in my next release.

Sorry about that.

thanks
-john

arch/i386/kernel/vsyscall-gtod.c: needs update
Index: arch/i386/kernel/vsyscall-gtod.c
===================================================================
--- 9d016193cc103e4ba0026e943774ef0f774bf72f/arch/i386/kernel/vsyscall-gtod.c (mode:100644)
+++ uncommitted/arch/i386/kernel/vsyscall-gtod.c (mode:100644)
@@ -113,7 +113,7 @@
}

/* save off wall time as timeval */
- vsyscall_gtod_data.wall_time_tv = ns2timeval(wall_time);
+ vsyscall_gtod_data.wall_time_tv = ns_to_timeval(wall_time);

/* save offset_base */
vsyscall_gtod_data.offset_base = offset_base;



2005-05-16 22:03:12

by Christoph Lameter

Subject: Re: IA64 implementation of timesource for new time of day subsystem

On Mon, 16 May 2005, john stultz wrote:

> Since its possible to do jitter compensation within the itc timesource
> driver (and within the fastcall code to preserve the existing
> performance), would it be reasonable to deffer making the jitter
> compensation code generic until another timesource needs it? It should
> be a fairly simple change.

Well, it looks like we will start out with the new time subsystem by putting
some hacks in. In the funky routine (for setting up the fastcall
configuration) I need to check whether the function pointer passed ==
jitter-compensated-itc and, depending on that, set a special flag that makes
the asm code do jitter compensation.

> Or is this just something I'm being hard-headed about?

Looks like it. It's not that difficult. Add a jitter compensation flag to
timesource. If the flag is set, check on retrieving a value from the
timesource whether it is less than the last value returned. Pass the field
to the funky function to set up the vsyscalls.

Maybe add a general flags field? There may be other things that need to be
added in the future.
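
A minimal sketch of what such a flag-driven check could look like, written
as portable C11 so it can be tried in userspace. Everything named here is
hypothetical: the flag name, the struct and the helper are illustrations of
the idea and not code from the patch or from the existing ia64 time
interpolator. The raw counter value is passed in as a parameter.

```c
#include <stdatomic.h>
#include <stdint.h>

#define TIMESOURCE_FLAG_JITTER	0x1u	/* hypothetical flag name */

struct jitter_ts_sketch {
	unsigned int flags;
	_Atomic uint64_t last;	/* latest value handed out on any CPU */
};

/*
 * Return a value that never goes backwards across CPUs when the
 * jitter flag is set; pass the raw reading through otherwise.
 */
static uint64_t jitter_filtered_read(struct jitter_ts_sketch *ts, uint64_t raw)
{
	uint64_t last;

	if (!(ts->flags & TIMESOURCE_FLAG_JITTER))
		return raw;

	last = atomic_load(&ts->last);
	while (raw > last) {
		/* On failure 'last' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&ts->last, &last, raw))
			return raw;
	}

	/* Some CPU already returned a later value; reuse it. */
	return last;
}
```

The cost is one shared cache line that every reader may write to, which is
probably why one would only want this for timesources that set the flag.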

2005-05-17 08:07:27

by Ulrich Windl

Subject: Re: IA64 implementation of timesource for new time of day subsystem

On 16 May 2005 at 12:29, David Mosberger wrote:

> >>>>> On Mon, 16 May 2005 12:24:08 -0700 (PDT), Christoph Lameter <[email protected]> said:
>
> Christoph> Other IA64 vendors will see that their timer performance
> Christoph> drops significantly after the new timer subsystem is
> Christoph> in. IBM no longer has IA64 systems that rely on ITC?
>
> Would that somehow make it ok to break existing and working code?

AFAIR the design goal was not to make a new time implementation, but to make a
more precise one ;-)

Regards,
Ulrich

2005-05-17 23:35:06

by Nishanth Aravamudan

Subject: [RFC][PATCH 0/4] new timeofday-based soft-timer subsystem

On 13.05.2005 [17:16:35 -0700], john stultz wrote:
> All,
> This patch implements the architecture independent portion of the new
> time of day subsystem. For a brief description on the rework, see here:
> http://lwn.net/Articles/120850/ (Many thanks to the LWN team for that
> easy to understand writeup!)
>
> I intend this to be the last RFC release and to submit this patch to
> Andrew for for testing near the end of this month. So please, if you
> have any complaints, suggestions, or blocking issues, let me know.

I have been working closely with John to re-work the soft-timer subsystem
to use the new timeofday() subsystem. The following patches attempt to
begin this process. I would greatly appreciate any comments.

Some design points:

1) The patch is small but does a lot.
a) Renames timer_jiffies to last_timer_time (now that we are not
jiffies-based).
b) Converts the soft-timer time-vector's/bucket's entries to
timerinterval (a new unit) width, instead of jiffy width.
c) Defines timerintervals to be the current time as reported by the new
timeofday-subsystem shifted down by TIMERINTERVAL_BITS bits.
Thus, various pseudo-'human time' units can be emulated.
d) Uses do_monotonic_clock() (converted to timerintervals) as the basis
for addition and expiration of timers instead of jiffies.
e) Adds some new helper functions for dealing with nanosecond values.

2) The patch depends on John's timeofday core rework. For arches that
will not have the new timeofday (or for which the rework is still in
progress), I can emulate the existing system with a separate patch (such
a possible patch will follow). The goal of this patch, though, is just
to show how easily the new system can be implemented, and its benefits. It
has been tested on x86 and x86_64 archs; there may be some issues with
ppc and ppc64 which I am working on resolving.

3) The reason for the re-work? Many people complain about all of the
adding of 1 jiffy here or there to fix bugs. This new system is
fundamentally human-time oriented and deals with those issues correctly
and, more importantly, sanely :)
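
To make point 1c concrete, here is a small userspace sketch of the
nanosecond/timerinterval conversion. The helper names, the uint64_t
nsec_t typedef and the value 20 for TIMERINTERVAL_BITS are assumptions
made for this illustration only; the patch picks its own value and may
round differently.

```c
#include <stdint.h>

typedef uint64_t nsec_t;	/* assumed to be an unsigned 64-bit type */

/* Assumed value for illustration: 2^20 ns is about 1.05 ms per interval. */
#define TIMERINTERVAL_BITS	20

/* Current time in timerintervals: drop the low bits of the ns value. */
static inline uint64_t nsecs_to_timerintervals(nsec_t ns)
{
	return ns >> TIMERINTERVAL_BITS;
}

/* Round up, e.g. for an expiry that must not fire early. */
static inline uint64_t nsecs_to_timerintervals_ceil(nsec_t ns)
{
	return (ns + (1ULL << TIMERINTERVAL_BITS) - 1) >> TIMERINTERVAL_BITS;
}

/* Start of a timerinterval, back in nanoseconds. */
static inline nsec_t timerintervals_to_nsecs(uint64_t ti)
{
	return ti << TIMERINTERVAL_BITS;
}
```

Lowering the bit count gives finer buckets, so the software resolution of
the soft-timers is a single compile-time knob.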

The code is reasonably well commented, but does expect readers to
understand the current soft-timer subsystem.

This is still an early working of this patch, so I expect criticism, and
am happy to make changes.

I will try to get some current benchmark differentials posted tomorrow.
The previous patch I released showed little difference between mainline,
John's timeofday rework and my soft-timer rework in kernbench.

Overview:

1/4: A small interdiff between John's current stack and what is
necessary for my patch to work. Moves timeofday.h architecture-specific
code to asm/timeofday.h. This is necessary for my patch.

2/4: Converts the soft-timer subsystem to use timerinterval as the units
of addition and expiration.

3/4: Converts, as an example, sys_nanosleep() to use the new interfaces
provided by patch 2. For instance, you (albeit somewhat rarely -- maybe
once out of every 100,000 requests) may get only 10 usecs of actual
sleep (instead of 2+ msecs no matter what).

4/4: Enables non-NEWTOD archs to use the same interfaces, with some
performance penalty (am working on a better alternative, this is just
a POC).
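
The "2+ msecs no matter what" in 3/4 comes from the existing
sys_nanosleep(), which rounds the request up to whole jiffies and then adds
one more jiffy for any non-zero sleep. A rough userspace sketch of that
arithmetic, assuming HZ=1000 and using made-up *_sketch names instead of
the kernel helpers:

```c
#include <stdint.h>

#define HZ_SKETCH		1000ULL		/* assumed tick rate */
#define NSEC_PER_SEC_SKETCH	1000000000ULL
#define NSEC_PER_JIFFY_SKETCH	(NSEC_PER_SEC_SKETCH / HZ_SKETCH)

/* Nanoseconds covered by j jiffies. */
static uint64_t jiffies_to_nsecs_sketch(unsigned long j)
{
	return NSEC_PER_JIFFY_SKETCH * (uint64_t)j;
}

/* Round a nanosecond count up to whole jiffies. */
static unsigned long nsecs_to_jiffies_sketch(uint64_t n)
{
	return (unsigned long)((n + NSEC_PER_JIFFY_SKETCH - 1) /
			       NSEC_PER_JIFFY_SKETCH);
}

/* Jiffies the current nanosleep path waits for: round up, plus one. */
static unsigned long old_nanosleep_jiffies_sketch(uint64_t n)
{
	return nsecs_to_jiffies_sketch(n) + (n != 0);
}
```

With these assumptions a 10 usec request turns into 2 jiffies, i.e. a
2 msec timeout, before any scheduling latency is counted.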

Thanks,
Nish

2005-05-17 23:39:07

by Nishanth Aravamudan

Subject: [RFC][PATCH 1/4] move arch-specific timeofday core to asm

On 17.05.2005 [16:33:00 -0700], Nishanth Aravamudan wrote:
> On 13.05.2005 [17:16:35 -0700], john stultz wrote:
> > All,
> > This patch implements the architecture independent portion of the new
> > time of day subsystem. For a brief description on the rework, see here:
> > http://lwn.net/Articles/120850/ (Many thanks to the LWN team for that
> > easy to understand writeup!)
> >
> > I intend this to be the last RFC release and to submit this patch to
> > Andrew for for testing near the end of this month. So please, if you
> > have any complaints, suggestions, or blocking issues, let me know.
>
> I have been working closely with John to re-work the soft-timer subsytem
> to use the new timeofday() subsystem. The following patches attempts to
> begin this process. I would greatly appreciate any comments.

Description: Updates the timeofday-rework to move arch-specific code
into asm header files.

Signed-off-by: Nishanth Aravamudan <[email protected]>

diff -urpN 2.6.12-rc4-tod-lkml/arch/i386/kernel/time.c 2.6.12-rc4-tod/arch/i386/kernel/time.c
--- 2.6.12-rc4-tod-lkml/arch/i386/kernel/time.c 2005-05-17 15:30:06.000000000 -0700
+++ 2.6.12-rc4-tod/arch/i386/kernel/time.c 2005-05-17 13:01:12.000000000 -0700
@@ -56,6 +56,7 @@
#include <asm/uaccess.h>
#include <asm/processor.h>
#include <asm/timer.h>
+#include <asm/timeofday.h>

#include "mach_time.h"

@@ -68,8 +69,6 @@

#include "io_ports.h"

-#include <linux/timeofday.h>
-
extern spinlock_t i8259A_lock;
int pit_latch_buggy; /* extern */

diff -urpN 2.6.12-rc4-tod-lkml/include/asm-generic/timeofday.h 2.6.12-rc4-tod/include/asm-generic/timeofday.h
--- 2.6.12-rc4-tod-lkml/include/asm-generic/timeofday.h 1969-12-31 16:00:00.000000000 -0800
+++ 2.6.12-rc4-tod/include/asm-generic/timeofday.h 2005-05-17 13:01:12.000000000 -0700
@@ -0,0 +1,26 @@
+/* linux/include/asm-generic/timeofday.h
+ *
+ * This file contains the asm-generic interface
+ * to the arch specific calls used by the time of day subsystem
+ */
+#ifndef _ASM_GENERIC_TIMEOFDAY_H
+#define _ASM_GENERIC_TIMEOFDAY_H
+#include <linux/types.h>
+#include <linux/time.h>
+#include <linux/timex.h>
+#include <asm/div64.h>
+#ifdef CONFIG_NEWTOD
+
+/* Required externs */
+extern nsec_t read_persistent_clock(void);
+extern void sync_persistent_clock(struct timespec ts);
+
+#ifdef CONFIG_NEWTOD_VSYSCALL
+extern void arch_update_vsyscall_gtod(nsec_t wall_time, cycle_t offset_base,
+ struct timesource_t* timesource, int ntp_adj);
+#else
+#define arch_update_vsyscall_gtod(x,y,z,w) {}
+#endif /* CONFIG_NEWTOD_VSYSCALL */
+
+#endif /* CONFIG_NEWTOD */
+#endif
diff -urpN 2.6.12-rc4-tod-lkml/include/asm-i386/timeofday.h 2.6.12-rc4-tod/include/asm-i386/timeofday.h
--- 2.6.12-rc4-tod-lkml/include/asm-i386/timeofday.h 1969-12-31 16:00:00.000000000 -0800
+++ 2.6.12-rc4-tod/include/asm-i386/timeofday.h 2005-05-17 13:01:12.000000000 -0700
@@ -0,0 +1,4 @@
+#ifndef _ASM_I386_TIMEOFDAY_H
+#define _ASM_I386_TIMEOFDAY_H
+#include <asm-generic/timeofday.h>
+#endif
diff -urpN 2.6.12-rc4-tod-lkml/include/asm-ia64/timeofday.h 2.6.12-rc4-tod/include/asm-ia64/timeofday.h
--- 2.6.12-rc4-tod-lkml/include/asm-ia64/timeofday.h 1969-12-31 16:00:00.000000000 -0800
+++ 2.6.12-rc4-tod/include/asm-ia64/timeofday.h 2005-05-17 13:01:12.000000000 -0700
@@ -0,0 +1,4 @@
+#ifndef _ASM_IA64_TIMEOFDAY_H
+#define _ASM_IA64_TIMEOFDAY_H
+#include <asm-generic/timeofday.h>
+#endif
diff -urpN 2.6.12-rc4-tod-lkml/include/asm-ppc/timeofday.h 2.6.12-rc4-tod/include/asm-ppc/timeofday.h
--- 2.6.12-rc4-tod-lkml/include/asm-ppc/timeofday.h 1969-12-31 16:00:00.000000000 -0800
+++ 2.6.12-rc4-tod/include/asm-ppc/timeofday.h 2005-05-17 13:01:12.000000000 -0700
@@ -0,0 +1,4 @@
+#ifndef _ASM_PPC_TIMEOFDAY_H
+#define _ASM_PPC_TIMEOFDAY_H
+#include <asm-generic/timeofday.h>
+#endif
diff -urpN 2.6.12-rc4-tod-lkml/include/asm-ppc64/timeofday.h 2.6.12-rc4-tod/include/asm-ppc64/timeofday.h
--- 2.6.12-rc4-tod-lkml/include/asm-ppc64/timeofday.h 1969-12-31 16:00:00.000000000 -0800
+++ 2.6.12-rc4-tod/include/asm-ppc64/timeofday.h 2005-05-17 13:01:12.000000000 -0700
@@ -0,0 +1,4 @@
+#ifndef _ASM_PPC64_TIMEOFDAY_H
+#define _ASM_PPC64_TIMEOFDAY_H
+#include <asm-generic/timeofday.h>
+#endif
diff -urpN 2.6.12-rc4-tod-lkml/include/asm-s390/timeofday.h 2.6.12-rc4-tod/include/asm-s390/timeofday.h
--- 2.6.12-rc4-tod-lkml/include/asm-s390/timeofday.h 1969-12-31 16:00:00.000000000 -0800
+++ 2.6.12-rc4-tod/include/asm-s390/timeofday.h 2005-05-17 13:01:12.000000000 -0700
@@ -0,0 +1,4 @@
+#ifndef _ASM_S390_TIMEOFDAY_H
+#define _ASM_S390_TIMEOFDAY_H
+#include <asm-generic/timeofday.h>
+#endif
diff -urpN 2.6.12-rc4-tod-lkml/include/asm-x86_64/timeofday.h 2.6.12-rc4-tod/include/asm-x86_64/timeofday.h
--- 2.6.12-rc4-tod-lkml/include/asm-x86_64/timeofday.h 1969-12-31 16:00:00.000000000 -0800
+++ 2.6.12-rc4-tod/include/asm-x86_64/timeofday.h 2005-05-17 13:01:12.000000000 -0700
@@ -0,0 +1,4 @@
+#ifndef _ASM_X86_64_TIMEOFDAY_H
+#define _ASM_X86_64_TIMEOFDAY_H
+#include <asm-generic/timeofday.h>
+#endif
diff -urpN 2.6.12-rc4-tod-lkml/include/linux/timeofday.h 2.6.12-rc4-tod/include/linux/timeofday.h
--- 2.6.12-rc4-tod-lkml/include/linux/timeofday.h 2005-05-17 15:29:29.000000000 -0700
+++ 2.6.12-rc4-tod/include/linux/timeofday.h 2005-05-17 13:01:12.000000000 -0700
@@ -7,7 +7,6 @@
#include <linux/types.h>
#include <linux/time.h>
#include <linux/timex.h>
-#include <linux/timesource.h>
#include <asm/div64.h>

#ifdef CONFIG_NEWTOD
@@ -23,19 +22,6 @@ extern int do_adjtimex(struct timex *tx)

extern void timeofday_init(void);

-
-/* Required externs */
-/* XXX - should this go elsewhere? */
-extern nsec_t read_persistent_clock(void);
-extern void sync_persistent_clock(struct timespec ts);
-#ifdef CONFIG_NEWTOD_VSYSCALL
-extern void arch_update_vsyscall_gtod(nsec_t wall_time, cycle_t offset_base,
- struct timesource_t* timesource, int ntp_adj);
-#else
-#define arch_update_vsyscall_gtod(x,y,z,w) {}
-#endif
-
-
/* Inline helper functions */
static inline struct timeval ns_to_timeval(nsec_t ns)
{
diff -urpN 2.6.12-rc4-tod-lkml/kernel/timeofday.c 2.6.12-rc4-tod/kernel/timeofday.c
--- 2.6.12-rc4-tod-lkml/kernel/timeofday.c 2005-05-17 15:29:29.000000000 -0700
+++ 2.6.12-rc4-tod/kernel/timeofday.c 2005-05-17 13:01:12.000000000 -0700
@@ -58,6 +58,7 @@
#include <linux/sched.h> /* Needed for capable() */
#include <linux/sysdev.h>
#include <linux/jiffies.h>
+#include <asm/timeofday.h>

/* XXX - remove later */
#define TIME_DBG 0

2005-05-17 23:40:20

by Nishanth Aravamudan

Subject: [RFC][PATCH 4/4] support new soft-timer subsystem on non-NEWTOD archs

On 17.05.2005 [16:33:00 -0700], Nishanth Aravamudan wrote:
> On 13.05.2005 [17:16:35 -0700], john stultz wrote:
> > All,
> > This patch implements the architecture independent portion of the new
> > time of day subsystem. For a brief description on the rework, see here:
> > http://lwn.net/Articles/120850/ (Many thanks to the LWN team for that
> > easy to understand writeup!)
> >
> > I intend this to be the last RFC release and to submit this patch to
> > Andrew for for testing near the end of this month. So please, if you
> > have any complaints, suggestions, or blocking issues, let me know.
>
> I have been working closely with John to re-work the soft-timer subsytem
> to use the new timeofday() subsystem. The following patches attempts to
> begin this process. I would greatly appreciate any comments.

Description: Support the new soft-timer interfaces on non-NEWTOD archs
by emulating nanoseconds via jiffies.

Signed-off-by: Nishanth Aravamudan <[email protected]>

--- 2.6.12-rc4-tod/include/linux/timeofday.h 2005-05-17 13:01:12.000000000 -0700
+++ 2.6.12-rc4-tod-timer/include/linux/timeofday.h 2005-05-17 13:02:01.000000000 -0700
@@ -55,5 +55,14 @@ static inline nsec_t timeval_to_ns(struc
}
#else /* CONFIG_NEWTOD */
#define timeofday_init()
+/*
+ * do_monotonic_clock():
+ * Emulate the monotonically increasing number of nanoseconds
+ * of NEWTOD archs via jiffies.
+ */
+static inline nsec_t do_monotonic_clock(void)
+{
+ return jiffies_to_nsecs(jiffies);
+}
#endif /* CONFIG_NEWTOD */
#endif /* _LINUX_TIMEOFDAY_H */

2005-05-17 23:43:28

by Nishanth Aravamudan

Subject: [RFC][PATCH 3/4] convert sys_nanosleep() to use new soft-timer subsystem

On 17.05.2005 [16:33:00 -0700], Nishanth Aravamudan wrote:
> On 13.05.2005 [17:16:35 -0700], john stultz wrote:
> > All,
> > This patch implements the architecture independent portion of the new
> > time of day subsystem. For a brief description on the rework, see here:
> > http://lwn.net/Articles/120850/ (Many thanks to the LWN team for that
> > easy to understand writeup!)
> >
> > I intend this to be the last RFC release and to submit this patch to
> > Andrew for for testing near the end of this month. So please, if you
> > have any complaints, suggestions, or blocking issues, let me know.
>
> I have been working closely with John to re-work the soft-timer subsytem
> to use the new timeofday() subsystem. The following patches attempts to
> begin this process. I would greatly appreciate any comments.

Description: Convert sys_nanosleep() to use the new timerinterval-based
soft-timer interfaces.

Signed-off-by: Nishanth Aravamudan <[email protected]>

diff -urpN 2.6.12-rc4-tod-timer-a/kernel/timer.c 2.6.12-rc4-tod-timer-b/kernel/timer.c
--- 2.6.12-rc4-tod-timer-a/kernel/timer.c 2005-05-17 16:09:40.000000000 -0700
+++ 2.6.12-rc4-tod-timer-b/kernel/timer.c 2005-05-17 16:11:47.000000000 -0700
@@ -1460,21 +1460,21 @@ asmlinkage long sys_gettid(void)

static long __sched nanosleep_restart(struct restart_block *restart)
{
- unsigned long expire = restart->arg0, now = jiffies;
+ nsec_t expire = restart->arg0, now = do_monotonic_clock();
struct timespec __user *rmtp = (struct timespec __user *) restart->arg1;
long ret;

/* Did it expire while we handled signals? */
- if (!time_after(expire, now))
+ if (now > expire)
return 0;

- current->state = TASK_INTERRUPTIBLE;
- expire = schedule_timeout(expire - now);
+ set_current_state(TASK_INTERRUPTIBLE);
+ expire = schedule_timeout_nsecs(expire - now);

ret = 0;
if (expire) {
struct timespec t;
- jiffies_to_timespec(expire, &t);
+ t = ns_to_timespec(expire);

ret = -ERESTART_RESTARTBLOCK;
if (rmtp && copy_to_user(rmtp, &t, sizeof(t)))
@@ -1487,7 +1487,7 @@ static long __sched nanosleep_restart(st
asmlinkage long sys_nanosleep(struct timespec __user *rqtp, struct timespec __user *rmtp)
{
struct timespec t;
- unsigned long expire;
+ nsec_t expire;
long ret;

if (copy_from_user(&t, rqtp, sizeof(t)))
@@ -1496,20 +1496,20 @@ asmlinkage long sys_nanosleep(struct tim
if ((t.tv_nsec >= 1000000000L) || (t.tv_nsec < 0) || (t.tv_sec < 0))
return -EINVAL;

- expire = timespec_to_jiffies(&t) + (t.tv_sec || t.tv_nsec);
- current->state = TASK_INTERRUPTIBLE;
- expire = schedule_timeout(expire);
+ expire = timespec_to_ns(&t);
+ set_current_state(TASK_INTERRUPTIBLE);
+ expire = schedule_timeout_nsecs(expire);

ret = 0;
if (expire) {
struct restart_block *restart;
- jiffies_to_timespec(expire, &t);
+ t = ns_to_timespec(expire);
if (rmtp && copy_to_user(rmtp, &t, sizeof(t)))
return -EFAULT;

restart = &current_thread_info()->restart_block;
restart->fn = nanosleep_restart;
- restart->arg0 = jiffies + expire;
+ restart->arg0 = do_monotonic_clock() + expire;
restart->arg1 = (unsigned long) rmtp;
ret = -ERESTART_RESTARTBLOCK;
}

2005-05-17 23:48:18

by Nishanth Aravamudan

Subject: [RFC][PATCH 2/4] convert soft-timer subsystem to timerintervals

On 17.05.2005 [16:33:00 -0700], Nishanth Aravamudan wrote:
> On 13.05.2005 [17:16:35 -0700], john stultz wrote:
> > All,
> > This patch implements the architecture independent portion of the new
> > time of day subsystem. For a brief description on the rework, see here:
> > http://lwn.net/Articles/120850/ (Many thanks to the LWN team for that
> > easy to understand writeup!)
> >
> > I intend this to be the last RFC release and to submit this patch to
> > Andrew for for testing near the end of this month. So please, if you
> > have any complaints, suggestions, or blocking issues, let me know.
>
> I have been working closely with John to re-work the soft-timer subsytem
> to use the new timeofday() subsystem. The following patches attempts to
> begin this process. I would greatly appreciate any comments.

Description: Rework the soft-timer subsystem to use timerintervals (a new
unit) instead of jiffies for expiration and addition. timerintervals are
nothing more than the nanosecond time returned by the new
timeofday-subsystem shifted down a certain number of bits (thus
determining the precision of the soft-timer subsystem at compile-time).

Signed-off-by: Nishanth Aravamudan <[email protected]>

diff -urpN 2.6.12-rc4-tod/include/linux/jiffies.h 2.6.12-rc4-tod-timer/include/linux/jiffies.h
--- 2.6.12-rc4-tod/include/linux/jiffies.h 2005-03-01 23:37:31.000000000 -0800
+++ 2.6.12-rc4-tod-timer/include/linux/jiffies.h 2005-05-17 16:03:12.000000000 -0700
@@ -263,7 +263,7 @@ static inline unsigned int jiffies_to_ms
#endif
}

-static inline unsigned int jiffies_to_usecs(const unsigned long j)
+static inline unsigned long jiffies_to_usecs(const unsigned long j)
{
#if HZ <= 1000000 && !(1000000 % HZ)
return (1000000 / HZ) * j;
@@ -274,6 +274,17 @@ static inline unsigned int jiffies_to_us
#endif
}

+static inline nsec_t jiffies_to_nsecs(const unsigned long j)
+{
+#if HZ <= NSEC_PER_SEC && !(NSEC_PER_SEC % HZ)
+ return (NSEC_PER_SEC / HZ) * (nsec_t)j;
+#elif HZ > NSEC_PER_SEC && !(HZ % NSEC_PER_SEC)
+ return ((nsec_t)j + (HZ / NSEC_PER_SEC) - 1)/(HZ / NSEC_PER_SEC);
+#else
+ return ((nsec_t)j * NSEC_PER_SEC) / HZ;
+#endif
+}
+
static inline unsigned long msecs_to_jiffies(const unsigned int m)
{
if (m > jiffies_to_msecs(MAX_JIFFY_OFFSET))
@@ -287,7 +298,7 @@ static inline unsigned long msecs_to_jif
#endif
}

-static inline unsigned long usecs_to_jiffies(const unsigned int u)
+static inline unsigned long usecs_to_jiffies(const unsigned long u)
{
if (u > jiffies_to_usecs(MAX_JIFFY_OFFSET))
return MAX_JIFFY_OFFSET;
@@ -300,6 +311,24 @@ static inline unsigned long usecs_to_jif
#endif
}

+static inline unsigned long nsecs_to_jiffies(const nsec_t n)
+{
+ nsec_t temp;
+ if (n > jiffies_to_nsecs(MAX_JIFFY_OFFSET))
+ return MAX_JIFFY_OFFSET;
+#if HZ <= NSEC_PER_SEC && !(NSEC_PER_SEC % HZ)
+ temp = n + (NSEC_PER_SEC / HZ) - 1;
+ do_div(temp, (NSEC_PER_SEC / HZ));
+ return (unsigned long)temp;
+#elif HZ > NSEC_PER_SEC && !(HZ % NSEC_PER_SEC)
+ return n * (HZ / NSEC_PER_SEC);
+#else
+ temp = n * HZ + NSEC_PER_SEC - 1;
+ do_div(temp, NSEC_PER_SEC);
+ return (unsigned long)temp;
+#endif
+}
+
/*
* The TICK_NSEC - 1 rounds up the value to the next resolution. Note
* that a remainder subtract here would not do the right thing as the
diff -urpN 2.6.12-rc4-tod/include/linux/sched.h 2.6.12-rc4-tod-timer/include/linux/sched.h
--- 2.6.12-rc4-tod/include/linux/sched.h 2005-05-10 10:29:03.000000000 -0700
+++ 2.6.12-rc4-tod-timer/include/linux/sched.h 2005-05-17 16:03:12.000000000 -0700
@@ -182,7 +182,13 @@ extern void scheduler_tick(void);
extern int in_sched_functions(unsigned long addr);

#define MAX_SCHEDULE_TIMEOUT LONG_MAX
+#define MAX_SCHEDULE_TIMEOUT_NSECS ((nsec_t)(-1))
+#define MAX_SCHEDULE_TIMEOUT_USECS ULONG_MAX
+#define MAX_SCHEDULE_TIMEOUT_MSECS UINT_MAX
extern signed long FASTCALL(schedule_timeout(signed long timeout));
+extern nsec_t FASTCALL(schedule_timeout_nsecs(nsec_t timeout_nsecs));
+extern unsigned long FASTCALL(schedule_timeout_usecs(unsigned long timeout_usecs));
+extern unsigned int FASTCALL(schedule_timeout_msecs(unsigned int timeout_msecs));
asmlinkage void schedule(void);

struct namespace;
diff -urpN 2.6.12-rc4-tod/include/linux/time.h 2.6.12-rc4-tod-timer/include/linux/time.h
--- 2.6.12-rc4-tod/include/linux/time.h 2005-05-17 16:01:35.000000000 -0700
+++ 2.6.12-rc4-tod-timer/include/linux/time.h 2005-05-17 16:03:12.000000000 -0700
@@ -40,6 +40,10 @@ typedef u64 cycle_t;
#define NSEC_PER_SEC (1000000000L)
#endif

+#ifndef NSEC_PER_MSEC
+#define NSEC_PER_MSEC (1000000L)
+#endif
+
#ifndef NSEC_PER_USEC
#define NSEC_PER_USEC (1000L)
#endif
diff -urpN 2.6.12-rc4-tod/include/linux/timer.h 2.6.12-rc4-tod-timer/include/linux/timer.h
--- 2.6.12-rc4-tod/include/linux/timer.h 2005-03-01 23:38:13.000000000 -0800
+++ 2.6.12-rc4-tod-timer/include/linux/timer.h 2005-05-17 16:03:12.000000000 -0700
@@ -5,6 +5,7 @@
#include <linux/list.h>
#include <linux/spinlock.h>
#include <linux/stddef.h>
+#include <linux/timeofday.h>

struct tvec_t_base_s;

@@ -65,27 +66,13 @@ extern void add_timer_on(struct timer_li
extern int del_timer(struct timer_list * timer);
extern int __mod_timer(struct timer_list *timer, unsigned long expires);
extern int mod_timer(struct timer_list *timer, unsigned long expires);
+extern void add_timer(struct timer_list *timer);
+extern int set_timer_nsecs(struct timer_list *timer, nsec_t expires_nsecs);
+extern void set_timer_on_nsecs(struct timer_list *timer, nsec_t expires_nsecs,
+ int cpu);

extern unsigned long next_timer_interrupt(void);

-/***
- * add_timer - start a timer
- * @timer: the timer to be added
- *
- * The kernel will do a ->function(->data) callback from the
- * timer interrupt at the ->expired point in the future. The
- * current time is 'jiffies'.
- *
- * The timer's ->expired, ->function (and if the handler uses it, ->data)
- * fields must be set prior calling this function.
- *
- * Timers with an ->expired field in the past will be executed in the next
- * timer tick.
- */
-static inline void add_timer(struct timer_list * timer)
-{
- __mod_timer(timer, timer->expires);
-}

#ifdef CONFIG_SMP
extern int del_timer_sync(struct timer_list *timer);
diff -urpN 2.6.12-rc4-tod/kernel/timer.c 2.6.12-rc4-tod-timer/kernel/timer.c
--- 2.6.12-rc4-tod/kernel/timer.c 2005-05-17 16:01:35.000000000 -0700
+++ 2.6.12-rc4-tod-timer/kernel/timer.c 2005-05-17 16:03:12.000000000 -0700
@@ -33,6 +33,7 @@
#include <linux/posix-timers.h>
#include <linux/cpu.h>
#include <linux/syscalls.h>
+#include <linux/timeofday.h>

#include <asm/uaccess.h>
#include <asm/unistd.h>
@@ -56,6 +57,15 @@ static void time_interpolator_update(lon
#define TVR_SIZE (1 << TVR_BITS)
#define TVN_MASK (TVN_SIZE - 1)
#define TVR_MASK (TVR_SIZE - 1)
+/*
+ * Modifying TIMERINTERVAL_BITS changes the software resolution of
+ * soft-timers. While 20 bits would be closer to a millisecond, there
+ * are performance gains from allowing a software resolution finer than
+ * the hardware (HZ=1000)
+ */
+#define TIMERINTERVAL_BITS 19
+#define TIMERINTERVAL_SIZE (1 << TIMERINTERVAL_BITS)
+#define TIMERINTERVAL_MASK (TIMERINTERVAL_SIZE - 1)

typedef struct tvec_s {
struct list_head vec[TVN_SIZE];
@@ -67,7 +77,7 @@ typedef struct tvec_root_s {

struct tvec_t_base_s {
spinlock_t lock;
- unsigned long timer_jiffies;
+ unsigned long last_timer_time;
struct timer_list *running_timer;
tvec_root_t tv1;
tvec_t tv2;
@@ -113,11 +123,82 @@ static inline void check_timer(struct ti
check_timer_failed(timer);
}

+/*
+ * nsecs_to_timerintervals_ceiling - convert nanoseconds to timerintervals
+ * @nsecs: number of nanoseconds to convert
+ *
+ * This is where changes to TIMERINTERVAL_BITS affect the soft-timer
+ * subsystem.
+ *
+ * Some explanation of the math is necessary:
+ * Rather than do decimal arithmetic, we shift for the sake of speed.
+ * This does mean that the actual requestable sleeps are
+ * 2^(sizeof(unsigned long)*8 - TIMERINTERVAL_BITS)
+ * timerintervals. The (signed long) cast takes care of the corner case
+ * where we request a 0 nanosecond sleep; if the quantity were unsigned,
+ * we would not propagate the carry and force a wrap when adding the 1.
+ *
+ * To prevent timers from being expired early, we:
+ * Take the ceiling when we add; and
+ * Take the floor when we expire.
+ */
+static inline unsigned long nsecs_to_timerintervals_ceiling(nsec_t nsecs)
+{
+ return (unsigned long)
+ ((((signed long)(nsecs - 1)) >> TIMERINTERVAL_BITS) + 1);
+}
+
+/*
+ * nsecs_to_timerintervals_floor - convert nanoseconds to timerintervals
+ * @nsecs: number of nanoseconds to convert
+ *
+ * This is where changes to TIMERINTERVAL_BITS affect the soft-timer
+ * subsystem.
+ *
+ * Some explanation of the math is necessary:
+ * Rather than do decimal arithmetic, we shift for the sake of speed.
+ * This does mean that the actual requestable sleeps are
+ * 2^(sizeof(unsigned long)*8 - TIMERINTERVAL_BITS)
+ *
+ * To prevent timers from being expired early, we:
+ * Take the ceiling when we add; and
+ * Take the floor when we expire.
+ */
+static inline unsigned long nsecs_to_timerintervals_floor(nsec_t nsecs)
+{
+ return (unsigned long)(nsecs >> TIMERINTERVAL_BITS);
+}
+
+/*
+ * jiffies_to_timerintervals - convert absolute jiffies to timerintervals
+ * @abs_jiffies: number of jiffies to convert
+ *
+ * First, we convert the absolute jiffies parameter to a relative
+ * jiffies value. To maintain precision, we convert the relative
+ * jiffies value to a relative nanosecond value and then convert that
+ * to a relative soft-timer interval unit value. We then add this
+ * relative value to the current time according to the timeofday-
+ * subsystem, converted to soft-timer interval units.
+ *
+ * We only use this function when adding timers, so we are free to
+ * always use the ceiling version of nsecs_to_timerintervals.
+ *
+ * This function only exists to support deprecated interfaces. Once
+ * those interfaces have been converted to the alternatives, it should
+ * be removed.
+ */
+static inline unsigned long jiffies_to_timerintervals(unsigned long abs_jiffies)
+{
+ unsigned long relative_jiffies = abs_jiffies - jiffies;
+ return nsecs_to_timerintervals_ceiling(do_monotonic_clock()) +
+ nsecs_to_timerintervals_ceiling(jiffies_to_nsecs(relative_jiffies));
+}

static void internal_add_timer(tvec_base_t *base, struct timer_list *timer)
{
+ /* expires is in timerintervals */
unsigned long expires = timer->expires;
- unsigned long idx = expires - base->timer_jiffies;
+ unsigned long idx = expires - base->last_timer_time;
struct list_head *vec;

if (idx < TVR_SIZE) {
@@ -137,7 +218,7 @@ static void internal_add_timer(tvec_base
* Can happen if you add a timer with expires == jiffies,
* or you set a timer to go off in the past
*/
- vec = base->tv1.vec + (base->timer_jiffies & TVR_MASK);
+ vec = base->tv1.vec + (base->last_timer_time & TVR_MASK);
} else {
int i;
/* If the timeout is larger than 0xffffffff on 64-bit
@@ -145,7 +226,7 @@ static void internal_add_timer(tvec_base
*/
if (idx > 0xffffffffUL) {
idx = 0xffffffffUL;
- expires = idx + base->timer_jiffies;
+ expires = idx + base->last_timer_time;
}
i = (expires >> (TVR_BITS + 3 * TVN_BITS)) & TVN_MASK;
vec = base->tv5.vec + i;
@@ -207,6 +288,7 @@ repeat:
list_del(&timer->entry);
ret = 1;
}
+ /* expires is in timerintervals */
timer->expires = expires;
internal_add_timer(new_base, timer);
timer->base = new_base;
@@ -238,15 +320,41 @@ void add_timer_on(struct timer_list *tim
check_timer(timer);

spin_lock_irqsave(&base->lock, flags);
+ timer->expires = jiffies_to_timerintervals(timer->expires);
internal_add_timer(base, timer);
timer->base = base;
spin_unlock_irqrestore(&base->lock, flags);
}

+/***
+ * add_timer - start a timer
+ * @timer: the timer to be added
+ *
+ * The kernel will do a ->function(->data) callback from the
+ * timer interrupt at the ->expires point in the future. The
+ * current time is 'jiffies'.
+ *
+ * The timer's ->expires, ->function (and if the handler uses it, ->data)
+ * fields must be set prior to calling this function.
+ *
+ * Timers with an ->expires field in the past will be executed in the next
+ * timer tick.
+ *
+ * The callers of add_timer() should be aware that the interface is now
+ * deprecated. set_timer_nsecs() is the single interface for adding and
+ * modifying timers.
+ */
+void add_timer(struct timer_list * timer)
+{
+ __mod_timer(timer, jiffies_to_timerintervals(timer->expires));
+}
+
+EXPORT_SYMBOL(add_timer);

/***
* mod_timer - modify a timer's timeout
* @timer: the timer to be modified
+ * @expires: absolute time, in jiffies, when timer should expire
*
* mod_timer is a more efficient way to update the expire field of an
* active timer (if the timer is inactive it will be activated)
@@ -262,6 +370,10 @@ void add_timer_on(struct timer_list *tim
* The function returns whether it has modified a pending timer or not.
* (ie. mod_timer() of an inactive timer returns 0, mod_timer() of an
* active timer returns 1.)
+ *
+ * The callers of mod_timer() should be aware that the interface is now
+ * deprecated. set_timer_nsecs() is the single interface for adding and
+ * modifying timers.
*/
int mod_timer(struct timer_list *timer, unsigned long expires)
{
@@ -269,6 +381,7 @@ int mod_timer(struct timer_list *timer,

check_timer(timer);

+ expires = jiffies_to_timerintervals(expires);
/*
* This is a common optimization triggered by the
* networking code - if the timer is re-modified
@@ -282,6 +395,59 @@ int mod_timer(struct timer_list *timer,

EXPORT_SYMBOL(mod_timer);

+/*
+ * set_timer_nsecs - modify a timer's timeout in nsecs
+ * @timer: the timer to be modified
+ *
+ * set_timer_nsecs replaces both add_timer and mod_timer. The caller
+ * should call do_monotonic_clock() to determine the absolute timeout
+ * necessary.
+ */
+int set_timer_nsecs(struct timer_list *timer, nsec_t expires_nsecs)
+{
+ unsigned long expires;
+
+ BUG_ON(!timer->function);
+
+ check_timer(timer);
+
+ /* make sure to round up */
+ expires = nsecs_to_timerintervals_ceiling(expires_nsecs);
+ if (timer_pending(timer) && timer->expires == expires)
+ return 1;
+
+ return __mod_timer(timer, expires);
+}
+
+EXPORT_SYMBOL_GPL(set_timer_nsecs);
+
+/***
+ * set_timer_on_nsecs - start a timer on a particular CPU
+ * @timer: the timer to be added
+ * @expires_nsecs: absolute time in nsecs when timer should expire
+ * @cpu: the CPU to start it on
+ *
+ * This is not very scalable on SMP. Double adds are not possible.
+ */
+void set_timer_on_nsecs(struct timer_list *timer, nsec_t expires_nsecs, int cpu)
+{
+ tvec_base_t *base = &per_cpu(tvec_bases, cpu);
+ unsigned long flags;
+
+ BUG_ON(timer_pending(timer) || !timer->function);
+
+ check_timer(timer);
+
+ spin_lock_irqsave(&base->lock, flags);
+ /* make sure to round up */
+ timer->expires = nsecs_to_timerintervals_ceiling(expires_nsecs);
+ internal_add_timer(base, timer);
+ timer->base = base;
+ spin_unlock_irqrestore(&base->lock, flags);
+}
+
+EXPORT_SYMBOL_GPL(set_timer_on_nsecs);
+
/***
* del_timer - deactive a timer.
* @timer: the timer to be deactivated
@@ -427,21 +593,22 @@ static int cascade(tvec_base_t *base, tv
/***
* __run_timers - run all expired timers (if any) on this CPU.
* @base: the timer vector to be processed.
+ * @current_timer_time: the current time in soft-timer interval units
*
* This function cascades all vectors and executes all expired timer
* vectors.
*/
-#define INDEX(N) (base->timer_jiffies >> (TVR_BITS + N * TVN_BITS)) & TVN_MASK
+#define INDEX(N) (base->last_timer_time >> (TVR_BITS + N * TVN_BITS)) & TVN_MASK

-static inline void __run_timers(tvec_base_t *base)
+static inline void __run_timers(tvec_base_t *base, unsigned long current_timer_time)
{
struct timer_list *timer;

spin_lock_irq(&base->lock);
- while (time_after_eq(jiffies, base->timer_jiffies)) {
+ while (time_after_eq(current_timer_time, base->last_timer_time)) {
struct list_head work_list = LIST_HEAD_INIT(work_list);
struct list_head *head = &work_list;
- int index = base->timer_jiffies & TVR_MASK;
+ int index = base->last_timer_time & TVR_MASK;

/*
* Cascade timers:
@@ -451,7 +618,7 @@ static inline void __run_timers(tvec_bas
(!cascade(base, &base->tv3, INDEX(1))) &&
!cascade(base, &base->tv4, INDEX(2)))
cascade(base, &base->tv5, INDEX(3));
- ++base->timer_jiffies;
+ ++base->last_timer_time;
list_splice_init(base->tv1.vec + index, &work_list);
repeat:
if (!list_empty(head)) {
@@ -500,20 +667,20 @@ unsigned long next_timer_interrupt(void)

base = &__get_cpu_var(tvec_bases);
spin_lock(&base->lock);
- expires = base->timer_jiffies + (LONG_MAX >> 1);
+ expires = base->last_timer_time + (LONG_MAX >> 1);
list = 0;

/* Look for timer events in tv1. */
- j = base->timer_jiffies & TVR_MASK;
+ j = base->last_timer_time & TVR_MASK;
do {
list_for_each_entry(nte, base->tv1.vec + j, entry) {
expires = nte->expires;
- if (j < (base->timer_jiffies & TVR_MASK))
+ if (j < (base->last_timer_time & TVR_MASK))
list = base->tv2.vec + (INDEX(0));
goto found;
}
j = (j + 1) & TVR_MASK;
- } while (j != (base->timer_jiffies & TVR_MASK));
+ } while (j != (base->last_timer_time & TVR_MASK));

/* Check tv2-tv5. */
varray[0] = &base->tv2;
@@ -890,10 +1057,15 @@ EXPORT_SYMBOL(xtime_lock);
*/
static void run_timer_softirq(struct softirq_action *h)
{
+ unsigned long current_timer_time;
tvec_base_t *base = &__get_cpu_var(tvec_bases);

- if (time_after_eq(jiffies, base->timer_jiffies))
- __run_timers(base);
+ /* cache the converted current time, rounding down */
+ current_timer_time =
+ nsecs_to_timerintervals_floor(do_monotonic_clock());
+
+ if (time_after_eq(current_timer_time, base->last_timer_time))
+ __run_timers(base, current_timer_time);
}

/*
@@ -1078,6 +1250,10 @@ static void process_timeout(unsigned lon
* value will be %MAX_SCHEDULE_TIMEOUT.
*
* In all cases the return value is guaranteed to be non-negative.
+ *
+ * The callers of schedule_timeout() should be aware that the interface
+ * is now deprecated. schedule_timeout_{msecs,usecs,nsecs}() are now the
+ * interfaces for relative timeout requests.
*/
fastcall signed long __sched schedule_timeout(signed long timeout)
{
@@ -1133,6 +1309,149 @@ fastcall signed long __sched schedule_ti

EXPORT_SYMBOL(schedule_timeout);

+/**
+ * schedule_timeout_nsecs - sleep until timeout
+ * @timeout_nsecs: timeout value in nanoseconds
+ *
+ * Make the current task sleep until @timeout_nsecs nsecs have
+ * elapsed. The routine will return immediately unless
+ * the current task state has been set (see set_current_state()).
+ *
+ * You can set the task state as follows -
+ *
+ * %TASK_UNINTERRUPTIBLE - at least @timeout_nsecs nsecs are guaranteed
+ * to pass before the routine returns. The routine will return 0
+ *
+ * %TASK_INTERRUPTIBLE - the routine may return early if a signal is
+ * delivered to the current task. In this case the remaining time
+ * in nsecs will be returned, or 0 if the timer expired in time
+ *
+ * The current task state is guaranteed to be TASK_RUNNING when this
+ * routine returns.
+ *
+ * Specifying a @timeout_nsecs value of %MAX_SCHEDULE_TIMEOUT_NSECS will
+ * schedule the CPU away without a bound on the timeout. In this case
+ * the return value will be %MAX_SCHEDULE_TIMEOUT_NSECS.
+ */
+fastcall nsec_t __sched schedule_timeout_nsecs(nsec_t timeout_nsecs)
+{
+ struct timer_list timer;
+ nsec_t expires;
+
+ if (timeout_nsecs == MAX_SCHEDULE_TIMEOUT_NSECS) {
+ schedule();
+ goto out;
+ }
+
+ expires = do_monotonic_clock() + timeout_nsecs;
+
+ init_timer(&timer);
+ timer.data = (unsigned long) current;
+ timer.function = process_timeout;
+
+ set_timer_nsecs(&timer, expires);
+ schedule();
+ del_singleshot_timer_sync(&timer);
+
+ timeout_nsecs = do_monotonic_clock();
+ if (expires < timeout_nsecs)
+ timeout_nsecs = (nsec_t)0UL;
+ else
+ timeout_nsecs = expires - timeout_nsecs;
+out:
+ return timeout_nsecs;
+}
+
+EXPORT_SYMBOL_GPL(schedule_timeout_nsecs);
+
+/**
+ * schedule_timeout_usecs - sleep until timeout
+ * @timeout_usecs: timeout value in microseconds
+ *
+ * Make the current task sleep until @timeout_usecs usecs have
+ * elapsed. The routine will return immediately unless
+ * the current task state has been set (see set_current_state()).
+ *
+ * You can set the task state as follows -
+ *
+ * %TASK_UNINTERRUPTIBLE - at least @timeout_usecs usecs are guaranteed
+ * to pass before the routine returns. The routine will return 0
+ *
+ * %TASK_INTERRUPTIBLE - the routine may return early if a signal is
+ * delivered to the current task. In this case the remaining time
+ * in usecs will be returned, or 0 if the timer expired in time
+ *
+ * The current task state is guaranteed to be TASK_RUNNING when this
+ * routine returns.
+ *
+ * Specifying a @timeout_usecs value of %MAX_SCHEDULE_TIMEOUT_USECS will
+ * schedule the CPU away without a bound on the timeout. In this case
+ * the return value will be %MAX_SCHEDULE_TIMEOUT_USECS.
+ */
+fastcall inline unsigned long __sched schedule_timeout_usecs(unsigned long timeout_usecs)
+{
+ nsec_t timeout_nsecs;
+
+ if (timeout_usecs == MAX_SCHEDULE_TIMEOUT_USECS)
+ timeout_nsecs = MAX_SCHEDULE_TIMEOUT_NSECS;
+ else
+ timeout_nsecs = timeout_usecs * (nsec_t)NSEC_PER_USEC;
+ /*
+ * Make sure to round up by subtracting one before division and
+ * adding one after
+ */
+ timeout_nsecs = schedule_timeout_nsecs(timeout_nsecs) - 1;
+ do_div(timeout_nsecs, NSEC_PER_USEC);
+ timeout_usecs = (unsigned long)timeout_nsecs + 1UL;
+ return timeout_usecs;
+}
+
+EXPORT_SYMBOL_GPL(schedule_timeout_usecs);
+
+/**
+ * schedule_timeout_msecs - sleep until timeout
+ * @timeout_msecs: timeout value in milliseconds
+ *
+ * Make the current task sleep until @timeout_msecs msecs have
+ * elapsed. The routine will return immediately unless
+ * the current task state has been set (see set_current_state()).
+ *
+ * You can set the task state as follows -
+ *
+ * %TASK_UNINTERRUPTIBLE - at least @timeout_msecs msecs are guaranteed
+ * to pass before the routine returns. The routine will return 0
+ *
+ * %TASK_INTERRUPTIBLE - the routine may return early if a signal is
+ * delivered to the current task. In this case the remaining time
+ * in msecs will be returned, or 0 if the timer expired in time
+ *
+ * The current task state is guaranteed to be TASK_RUNNING when this
+ * routine returns.
+ *
+ * Specifying a @timeout_msecs value of %MAX_SCHEDULE_TIMEOUT_MSECS will
+ * schedule the CPU away without a bound on the timeout. In this case
+ * the return value will be %MAX_SCHEDULE_TIMEOUT_MSECS.
+ */
+fastcall inline unsigned int __sched schedule_timeout_msecs(unsigned int timeout_msecs)
+{
+ nsec_t timeout_nsecs;
+
+ if (timeout_msecs == MAX_SCHEDULE_TIMEOUT_MSECS)
+ timeout_nsecs = MAX_SCHEDULE_TIMEOUT_NSECS;
+ else
+ timeout_nsecs = timeout_msecs * (nsec_t)NSEC_PER_MSEC;
+ /*
+ * Make sure to round up by subtracting one before division and
+ * adding one after
+ */
+ timeout_nsecs = schedule_timeout_nsecs(timeout_nsecs) - 1;
+ do_div(timeout_nsecs, NSEC_PER_MSEC);
+ timeout_msecs = (unsigned int)timeout_nsecs + 1;
+ return timeout_msecs;
+}
+
+EXPORT_SYMBOL_GPL(schedule_timeout_msecs);
+
/* Thread ID - the internal kernel "pid" */
asmlinkage long sys_gettid(void)
{
@@ -1302,7 +1621,11 @@ static void __devinit init_timers_cpu(in
for (j = 0; j < TVR_SIZE; j++)
INIT_LIST_HEAD(base->tv1.vec + j);

- base->timer_jiffies = jiffies;
+ /*
+ * Under the new monotonic_clock() oriented soft-timer subsystem,
+ * we begin at 0, not INITIAL_JIFFIES
+ */
+ base->last_timer_time = 0UL;
}

#ifdef CONFIG_HOTPLUG_CPU
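
The "subtract one, divide, add one" round-up used in schedule_timeout_usecs() and schedule_timeout_msecs() above can be tried out in user space. The sketch below is not the patch code: it assumes nsec_t is an unsigned 64-bit type, and both helper names are hypothetical. Under that assumption, a remaining time of 0 (the timer expired on time) would wrap in the subtraction instead of producing 0, so the second helper adds a guard that the patch itself does not have.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_USEC 1000ULL

typedef uint64_t nsec_t;	/* assumption: unsigned 64-bit nanoseconds */

/* Round-up as written in the patch: (rem - 1) / 1000 + 1, no zero check. */
static uint64_t remaining_usecs_unguarded(nsec_t rem_nsecs)
{
	return (rem_nsecs - 1) / NSEC_PER_USEC + 1;
}

/* Same round-up, with a hypothetical guard so that 0 ns stays 0 us. */
static uint64_t remaining_usecs_guarded(nsec_t rem_nsecs)
{
	if (rem_nsecs == 0)
		return 0;
	return (rem_nsecs - 1) / NSEC_PER_USEC + 1;
}

/* Hypothetical self-check of the rounding behaviour. */
static void check_remaining_rounding(void)
{
	assert(remaining_usecs_guarded(0) == 0);
	assert(remaining_usecs_guarded(1) == 1);	/* 1 ns rounds up to 1 us */
	assert(remaining_usecs_guarded(1000) == 1);
	assert(remaining_usecs_guarded(1001) == 2);
	/* with an unsigned type, 0 - 1 wraps and the result is far from 0 */
	assert(remaining_usecs_unguarded(0) != 0);
}
```

For any non-zero input the two helpers agree; they differ only for a remaining time of exactly 0 ns.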

2005-05-18 08:24:16

by Nishanth Aravamudan

Subject: [RFC][UPDATE PATCH 2/4] convert soft-timer subsystem to timerintervals

On 17.05.2005 [16:36:16 -0700], Nishanth Aravamudan wrote:
> On 17.05.2005 [16:33:00 -0700], Nishanth Aravamudan wrote:
> > On 13.05.2005 [17:16:35 -0700], john stultz wrote:
> > > All,
> > > This patch implements the architecture independent portion of the new
> > > time of day subsystem. For a brief description on the rework, see here:
> > > http://lwn.net/Articles/120850/ (Many thanks to the LWN team for that
> > > easy to understand writeup!)
> > >
> > > I intend this to be the last RFC release and to submit this patch to
> > > Andrew for for testing near the end of this month. So please, if you
> > > have any complaints, suggestions, or blocking issues, let me know.
> >
> > I have been working closely with John to re-work the soft-timer subsytem
> > to use the new timeofday() subsystem. The following patches attempts to
> > begin this process. I would greatly appreciate any comments.
>
> Description: Rework the soft-timer subsytem to use timerintervals (a new
> unit) instead of jiffies for expiration and addition. timerintervals are
> nothing more than the nanosecond time returned by the new
> timeofday-subsystem shifted down a certain number of bits (thus
> determining the precisino of the soft-timer subsystem at compile-time).

Sorry, the patch quoted above introduced a bug in the nsecs->timerinterval
conversion functions that prevented the kernel from booting. Please use
the following patch instead.
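
The mail does not spell out what the bug was. One visible difference between the two versions is that the earlier nsecs_to_timerintervals_ceiling() cast (nsecs - 1) to signed long before shifting, while the updated one tests for zero and shifts the full-width value. As a hedged illustration only: assuming nsec_t is 64 bits wide and long is 32 bits wide (as on i386), such a cast discards the upper 32 bits before the shift. The user-space sketch below emulates just that narrowing with fixed-width types. The function names are hypothetical, the sign behaviour of the original cast is not modelled, and nothing here claims that this is what actually broke the boot.

```c
#include <assert.h>
#include <stdint.h>

#define TIMERINTERVAL_BITS 19	/* value taken from the patch */

typedef uint64_t nsec_t;	/* assumption: unsigned 64-bit nanoseconds */

/*
 * Emulates narrowing (nsecs - 1) to 32 bits before the shift.
 * Only the loss of the upper bits is modelled; callers pass nsecs > 0.
 */
static uint32_t ceiling_narrowed(nsec_t nsecs)
{
	uint32_t low = (uint32_t)(nsecs - 1);	/* upper 32 bits are lost here */

	return (low >> TIMERINTERVAL_BITS) + 1;
}

/* Shifts the full 64-bit value first and narrows only the result. */
static uint32_t ceiling_full_width(nsec_t nsecs)
{
	if (nsecs == 0)
		return 0;
	return (uint32_t)(((nsecs - 1) >> TIMERINTERVAL_BITS) + 1);
}

/* Hypothetical self-check: the two agree only below 2^32 ns (~4.3 s). */
static void check_narrowing(void)
{
	nsec_t late = ((nsec_t)1 << 32) + 1000000;	/* about 4.3 s */

	assert(ceiling_narrowed(1000000) == ceiling_full_width(1000000));
	assert(ceiling_narrowed(late) != ceiling_full_width(late));
}
```

For a clock reading of 2^32 + 1000000 ns, the narrowed version yields 2 timerintervals while the full-width version yields 8194.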

Thanks,
Nish

Description: Rework the soft-timer subsystem to use timerintervals (a new
unit) instead of jiffies for expiration and addition. timerintervals are
nothing more than the nanosecond time returned by the new
timeofday-subsystem shifted down a certain number of bits (thus
determining the precision of the soft-timer subsystem at compile-time).
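
To make the unit concrete, here is a minimal user-space sketch of the conversion described above. It is not the patch code: nsec_t is assumed to be an unsigned 64-bit type, the helper names are hypothetical, and the results are kept 64 bits wide so they do not depend on the size of unsigned long. With TIMERINTERVAL_BITS = 19, one timerinterval is 2^19 ns = 524288 ns, so a 1 ms value rounds up to 2 timerintervals when a timer is added and down to 1 when expiry is checked.

```c
#include <assert.h>
#include <stdint.h>

#define TIMERINTERVAL_BITS 19	/* value taken from the patch */

typedef uint64_t nsec_t;	/* assumption: unsigned 64-bit nanoseconds */

/* Round up: used when a timer is added, so it never fires early. */
static uint64_t intervals_ceiling(nsec_t nsecs)
{
	if (nsecs == 0)
		return 0;
	return ((nsecs - 1) >> TIMERINTERVAL_BITS) + 1;
}

/* Round down: used when deciding which timers have expired. */
static uint64_t intervals_floor(nsec_t nsecs)
{
	return nsecs >> TIMERINTERVAL_BITS;
}

/* Hypothetical self-check of the rounding behaviour. */
static void check_rounding(void)
{
	assert(intervals_ceiling(1000000) == 2);	/* 1 ms, rounded up */
	assert(intervals_floor(1000000) == 1);		/* 1 ms, rounded down */
	assert(intervals_ceiling(524288) == 1);		/* exactly one interval */
	assert(intervals_ceiling(524289) == 2);		/* one ns over */
}
```

Because the expiry time is rounded up and the current time is rounded down, a timer is only seen as expired once the full requested time has passed.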

Signed-off-by: Nishanth Aravamudan <[email protected]>


diff -urpN 2.6.12-rc4-tod/include/linux/jiffies.h 2.6.12-rc4-tod-timer/include/linux/jiffies.h
--- 2.6.12-rc4-tod/include/linux/jiffies.h 2005-05-06 22:20:31.000000000 -0700
+++ 2.6.12-rc4-tod-timer/include/linux/jiffies.h 2005-05-18 00:29:42.000000000 -0700
@@ -263,7 +263,7 @@ static inline unsigned int jiffies_to_ms
#endif
}

-static inline unsigned int jiffies_to_usecs(const unsigned long j)
+static inline unsigned long jiffies_to_usecs(const unsigned long j)
{
#if HZ <= 1000000 && !(1000000 % HZ)
return (1000000 / HZ) * j;
@@ -274,6 +274,17 @@ static inline unsigned int jiffies_to_us
#endif
}

+static inline nsec_t jiffies_to_nsecs(const unsigned long j)
+{
+#if HZ <= NSEC_PER_SEC && !(NSEC_PER_SEC % HZ)
+ return (NSEC_PER_SEC / HZ) * (nsec_t)j;
+#elif HZ > NSEC_PER_SEC && !(HZ % NSEC_PER_SEC)
+ return ((nsec_t)j + (HZ / NSEC_PER_SEC) - 1)/(HZ / NSEC_PER_SEC);
+#else
+ return ((nsec_t)j * NSEC_PER_SEC) / HZ;
+#endif
+}
+
static inline unsigned long msecs_to_jiffies(const unsigned int m)
{
if (m > jiffies_to_msecs(MAX_JIFFY_OFFSET))
@@ -287,7 +298,7 @@ static inline unsigned long msecs_to_jif
#endif
}

-static inline unsigned long usecs_to_jiffies(const unsigned int u)
+static inline unsigned long usecs_to_jiffies(const unsigned long u)
{
if (u > jiffies_to_usecs(MAX_JIFFY_OFFSET))
return MAX_JIFFY_OFFSET;
@@ -300,6 +311,24 @@ static inline unsigned long usecs_to_jif
#endif
}

+static inline unsigned long nsecs_to_jiffies(const nsec_t n)
+{
+ nsec_t temp;
+ if (n > jiffies_to_nsecs(MAX_JIFFY_OFFSET))
+ return MAX_JIFFY_OFFSET;
+#if HZ <= NSEC_PER_SEC && !(NSEC_PER_SEC % HZ)
+ temp = n + (NSEC_PER_SEC / HZ) - 1;
+ do_div(temp, (NSEC_PER_SEC / HZ));
+ return (unsigned long)temp;
+#elif HZ > NSEC_PER_SEC && !(HZ % NSEC_PER_SEC)
+ return n * (HZ / NSEC_PER_SEC);
+#else
+ temp = n * HZ + NSEC_PER_SEC - 1;
+ do_div(temp, NSEC_PER_SEC);
+ return (unsigned long)temp;
+#endif
+}
+
/*
* The TICK_NSEC - 1 rounds up the value to the next resolution. Note
* that a remainder subtract here would not do the right thing as the
diff -urpN 2.6.12-rc4-tod/include/linux/sched.h 2.6.12-rc4-tod-timer/include/linux/sched.h
--- 2.6.12-rc4-tod/include/linux/sched.h 2005-05-06 22:20:31.000000000 -0700
+++ 2.6.12-rc4-tod-timer/include/linux/sched.h 2005-05-18 00:29:42.000000000 -0700
@@ -182,7 +182,13 @@ extern void scheduler_tick(void);
extern int in_sched_functions(unsigned long addr);

#define MAX_SCHEDULE_TIMEOUT LONG_MAX
+#define MAX_SCHEDULE_TIMEOUT_NSECS ((nsec_t)(-1))
+#define MAX_SCHEDULE_TIMEOUT_USECS ULONG_MAX
+#define MAX_SCHEDULE_TIMEOUT_MSECS UINT_MAX
extern signed long FASTCALL(schedule_timeout(signed long timeout));
+extern nsec_t FASTCALL(schedule_timeout_nsecs(nsec_t timeout_nsecs));
+extern unsigned long FASTCALL(schedule_timeout_usecs(unsigned long timeout_usecs));
+extern unsigned int FASTCALL(schedule_timeout_msecs(unsigned int timeout_msecs));
asmlinkage void schedule(void);

struct namespace;
diff -urpN 2.6.12-rc4-tod/include/linux/time.h 2.6.12-rc4-tod-timer/include/linux/time.h
--- 2.6.12-rc4-tod/include/linux/time.h 2005-05-17 23:35:49.000000000 -0700
+++ 2.6.12-rc4-tod-timer/include/linux/time.h 2005-05-18 00:29:42.000000000 -0700
@@ -40,6 +40,10 @@ typedef u64 cycle_t;
#define NSEC_PER_SEC (1000000000L)
#endif

+#ifndef NSEC_PER_MSEC
+#define NSEC_PER_MSEC (1000000L)
+#endif
+
#ifndef NSEC_PER_USEC
#define NSEC_PER_USEC (1000L)
#endif
diff -urpN 2.6.12-rc4-tod/include/linux/timer.h 2.6.12-rc4-tod-timer/include/linux/timer.h
--- 2.6.12-rc4-tod/include/linux/timer.h 2005-05-06 22:20:31.000000000 -0700
+++ 2.6.12-rc4-tod-timer/include/linux/timer.h 2005-05-18 00:29:42.000000000 -0700
@@ -5,6 +5,7 @@
#include <linux/list.h>
#include <linux/spinlock.h>
#include <linux/stddef.h>
+#include <linux/timeofday.h>

struct tvec_t_base_s;

@@ -65,27 +66,13 @@ extern void add_timer_on(struct timer_li
extern int del_timer(struct timer_list * timer);
extern int __mod_timer(struct timer_list *timer, unsigned long expires);
extern int mod_timer(struct timer_list *timer, unsigned long expires);
+extern void add_timer(struct timer_list *timer);
+extern int set_timer_nsecs(struct timer_list *timer, nsec_t expires_nsecs);
+extern void set_timer_on_nsecs(struct timer_list *timer, nsec_t expires_nsecs,
+ int cpu);

extern unsigned long next_timer_interrupt(void);

-/***
- * add_timer - start a timer
- * @timer: the timer to be added
- *
- * The kernel will do a ->function(->data) callback from the
- * timer interrupt at the ->expired point in the future. The
- * current time is 'jiffies'.
- *
- * The timer's ->expired, ->function (and if the handler uses it, ->data)
- * fields must be set prior calling this function.
- *
- * Timers with an ->expired field in the past will be executed in the next
- * timer tick.
- */
-static inline void add_timer(struct timer_list * timer)
-{
- __mod_timer(timer, timer->expires);
-}

#ifdef CONFIG_SMP
extern int del_timer_sync(struct timer_list *timer);
diff -urpN 2.6.12-rc4-tod/kernel/timer.c 2.6.12-rc4-tod-timer/kernel/timer.c
--- 2.6.12-rc4-tod/kernel/timer.c 2005-05-17 23:35:49.000000000 -0700
+++ 2.6.12-rc4-tod-timer/kernel/timer.c 2005-05-18 01:08:59.000000000 -0700
@@ -33,6 +33,7 @@
#include <linux/posix-timers.h>
#include <linux/cpu.h>
#include <linux/syscalls.h>
+#include <linux/timeofday.h>

#include <asm/uaccess.h>
#include <asm/unistd.h>
@@ -56,6 +57,15 @@ static void time_interpolator_update(lon
#define TVR_SIZE (1 << TVR_BITS)
#define TVN_MASK (TVN_SIZE - 1)
#define TVR_MASK (TVR_SIZE - 1)
+/*
+ * Modifying TIMERINTERVAL_BITS changes the software resolution of
+ * soft-timers. While 20 bits would be closer to a millisecond, there
+ * are performance gains from allowing a software resolution finer than
+ * the hardware (HZ=1000)
+ */
+#define TIMERINTERVAL_BITS 19
+#define TIMERINTERVAL_SIZE (1 << TIMERINTERVAL_BITS)
+#define TIMERINTERVAL_MASK (TIMERINTERVAL_SIZE - 1)

typedef struct tvec_s {
struct list_head vec[TVN_SIZE];
@@ -67,7 +77,7 @@ typedef struct tvec_root_s {

struct tvec_t_base_s {
spinlock_t lock;
- unsigned long timer_jiffies;
+ unsigned long last_timer_time;
struct timer_list *running_timer;
tvec_root_t tv1;
tvec_t tv2;
@@ -113,11 +123,89 @@ static inline void check_timer(struct ti
check_timer_failed(timer);
}

+/*
+ * nsecs_to_timerintervals_ceiling - convert nanoseconds to timerintervals
+ * @nsecs: number of nanoseconds to convert
+ *
+ * This is where changes to TIMERINTERVAL_BITS affect the soft-timer
+ * subsystem.
+ *
+ * Some explanation of the math is necessary:
+ * Rather than do decimal arithmetic, we shift for the sake of speed.
+ * This does mean that the actual requestable sleeps are
+ * 2^(sizeof(unsigned long)*8 - TIMERINTERVAL_BITS)
+ * timerintervals.
+ *
+ * The conditional takes care of the corner case where we request a 0
+ * nanosecond sleep; if the quantity were unsigned, we would not
+ * propagate the carry and force a wrap when adding the 1.
+ *
+ * To prevent timers from being expired early, we:
+ * Take the ceiling when we add; and
+ * Take the floor when we expire.
+ */
+static inline unsigned long nsecs_to_timerintervals_ceiling(nsec_t nsecs)
+{
+ if (nsecs)
+ return (unsigned long)(((nsecs - 1) >> TIMERINTERVAL_BITS) + 1);
+ else
+ return 0UL;
+}
+
+/*
+ * nsecs_to_timerintervals_floor - convert nanoseconds to timerintervals
+ * @n: number of nanoseconds to convert
+ *
+ * This is where changes to TIMERINTERVAL_BITS affect the soft-timer
+ * subsystem.
+ *
+ * Some explanation of the math is necessary:
+ * Rather than do decimal arithmetic, we shift for the sake of speed.
+ * This does mean that the actual requestable sleeps are
+ * 2^(sizeof(unsigned long)*8 - TIMERINTERVAL_BITS)
+ *
+ * There is no special case for 0 in the floor function, since we do not
+ * do any subtraction or addition of 1
+ *
+ * To prevent timers from being expired early, we:
+ * Take the ceiling when we add; and
+ * Take the floor when we expire.
+ */
+static inline unsigned long nsecs_to_timerintervals_floor(nsec_t nsecs)
+{
+ return (unsigned long)(nsecs >> TIMERINTERVAL_BITS);
+}
+
+/*
+ * jiffies_to_timerintervals - convert absolute jiffies to timerintervals
+ * @abs_jiffies: number of jiffies to convert
+ *
+ * First, we convert the absolute jiffies parameter to a relative
+ * jiffies value. To maintain precision, we convert the relative
+ * jiffies value to a relative nanosecond value and then convert that
+ * to a relative soft-timer interval unit value. We then add this
+ * relative value to the current time according to the timeofday-
+ * subsystem, converted to soft-timer interval units.
+ *
+ * We only use this function when adding timers, so we are free to
+ * always use the ceiling version of nsecs_to_timerintervals.
+ *
+ * This function only exists to support deprecated interfaces. Once
+ * those interfaces have been converted to the alternatives, it should
+ * be removed.
+ */
+static inline unsigned long jiffies_to_timerintervals(unsigned long abs_jiffies)
+{
+ unsigned long relative_jiffies = abs_jiffies - jiffies;
+ return nsecs_to_timerintervals_ceiling(do_monotonic_clock()) +
+ nsecs_to_timerintervals_ceiling(jiffies_to_nsecs(relative_jiffies));
+}

static void internal_add_timer(tvec_base_t *base, struct timer_list *timer)
{
+ /* expires is in timerintervals */
unsigned long expires = timer->expires;
- unsigned long idx = expires - base->timer_jiffies;
+ unsigned long idx = expires - base->last_timer_time;
struct list_head *vec;

if (idx < TVR_SIZE) {
@@ -137,7 +225,7 @@ static void internal_add_timer(tvec_base
* Can happen if you add a timer with expires == jiffies,
* or you set a timer to go off in the past
*/
- vec = base->tv1.vec + (base->timer_jiffies & TVR_MASK);
+ vec = base->tv1.vec + (base->last_timer_time & TVR_MASK);
} else {
int i;
/* If the timeout is larger than 0xffffffff on 64-bit
@@ -145,7 +233,7 @@ static void internal_add_timer(tvec_base
*/
if (idx > 0xffffffffUL) {
idx = 0xffffffffUL;
- expires = idx + base->timer_jiffies;
+ expires = idx + base->last_timer_time;
}
i = (expires >> (TVR_BITS + 3 * TVN_BITS)) & TVN_MASK;
vec = base->tv5.vec + i;
@@ -207,6 +295,7 @@ repeat:
list_del(&timer->entry);
ret = 1;
}
+ /* expires is in timerintervals */
timer->expires = expires;
internal_add_timer(new_base, timer);
timer->base = new_base;
@@ -238,15 +327,41 @@ void add_timer_on(struct timer_list *tim
check_timer(timer);

spin_lock_irqsave(&base->lock, flags);
+ timer->expires = jiffies_to_timerintervals(timer->expires);
internal_add_timer(base, timer);
timer->base = base;
spin_unlock_irqrestore(&base->lock, flags);
}

+/***
+ * add_timer - start a timer
+ * @timer: the timer to be added
+ *
+ * The kernel will do a ->function(->data) callback from the
+ * timer interrupt at the ->expires point in the future. The
+ * current time is 'jiffies'.
+ *
+ * The timer's ->expires, ->function (and if the handler uses it, ->data)
+ * fields must be set prior to calling this function.
+ *
+ * Timers with an ->expires field in the past will be executed in the next
+ * timer tick.
+ *
+ * The callers of add_timer() should be aware that the interface is now
+ * deprecated. set_timer_nsecs() is the single interface for adding and
+ * modifying timers.
+ */
+void add_timer(struct timer_list * timer)
+{
+ __mod_timer(timer, jiffies_to_timerintervals(timer->expires));
+}
+
+EXPORT_SYMBOL(add_timer);

/***
* mod_timer - modify a timer's timeout
* @timer: the timer to be modified
+ * @expires: absolute time, in jiffies, when timer should expire
*
* mod_timer is a more efficient way to update the expire field of an
* active timer (if the timer is inactive it will be activated)
@@ -262,6 +377,10 @@ void add_timer_on(struct timer_list *tim
* The function returns whether it has modified a pending timer or not.
* (ie. mod_timer() of an inactive timer returns 0, mod_timer() of an
* active timer returns 1.)
+ *
+ * The callers of mod_timer() should be aware that the interface is now
+ * deprecated. set_timer_nsecs() is the single interface for adding and
+ * modifying timers.
*/
int mod_timer(struct timer_list *timer, unsigned long expires)
{
@@ -269,6 +388,7 @@ int mod_timer(struct timer_list *timer,

check_timer(timer);

+ expires = jiffies_to_timerintervals(expires);
/*
* This is a common optimization triggered by the
* networking code - if the timer is re-modified
@@ -282,6 +402,59 @@ int mod_timer(struct timer_list *timer,

EXPORT_SYMBOL(mod_timer);

+/*
+ * set_timer_nsecs - modify a timer's timeout in nsecs
+ * @timer: the timer to be modified
+ *
+ * set_timer_nsecs replaces both add_timer and mod_timer. The caller
+ * should call do_monotonic_clock() to determine the absolute timeout
+ * necessary.
+ */
+int set_timer_nsecs(struct timer_list *timer, nsec_t expires_nsecs)
+{
+ unsigned long expires;
+
+ BUG_ON(!timer->function);
+
+ check_timer(timer);
+
+ /* make sure to round up */
+ expires = nsecs_to_timerintervals_ceiling(expires_nsecs);
+ if (timer_pending(timer) && timer->expires == expires)
+ return 1;
+
+ return __mod_timer(timer, expires);
+}
+
+EXPORT_SYMBOL_GPL(set_timer_nsecs);
+
+/***
+ * set_timer_on_nsecs - start a timer on a particular CPU
+ * @timer: the timer to be added
+ * @expires_nsecs: absolute time in nsecs when timer should expire
+ * @cpu: the CPU to start it on
+ *
+ * This is not very scalable on SMP. Double adds are not possible.
+ */
+void set_timer_on_nsecs(struct timer_list *timer, nsec_t expires_nsecs, int cpu)
+{
+ tvec_base_t *base = &per_cpu(tvec_bases, cpu);
+ unsigned long flags;
+
+ BUG_ON(timer_pending(timer) || !timer->function);
+
+ check_timer(timer);
+
+ spin_lock_irqsave(&base->lock, flags);
+ /* make sure to round up */
+ timer->expires = nsecs_to_timerintervals_ceiling(expires_nsecs);
+ internal_add_timer(base, timer);
+ timer->base = base;
+ spin_unlock_irqrestore(&base->lock, flags);
+}
+
+EXPORT_SYMBOL_GPL(set_timer_on_nsecs);
+
/***
* del_timer - deactive a timer.
* @timer: the timer to be deactivated
@@ -427,21 +600,22 @@ static int cascade(tvec_base_t *base, tv
/***
* __run_timers - run all expired timers (if any) on this CPU.
* @base: the timer vector to be processed.
+ * @current_timer_time: the current time in soft-timer interval units
*
* This function cascades all vectors and executes all expired timer
* vectors.
*/
-#define INDEX(N) (base->timer_jiffies >> (TVR_BITS + N * TVN_BITS)) & TVN_MASK
+#define INDEX(N) (base->last_timer_time >> (TVR_BITS + N * TVN_BITS)) & TVN_MASK

-static inline void __run_timers(tvec_base_t *base)
+static inline void __run_timers(tvec_base_t *base, unsigned long current_timer_time)
{
struct timer_list *timer;

spin_lock_irq(&base->lock);
- while (time_after_eq(jiffies, base->timer_jiffies)) {
+ while (time_after_eq(current_timer_time, base->last_timer_time)) {
struct list_head work_list = LIST_HEAD_INIT(work_list);
struct list_head *head = &work_list;
- int index = base->timer_jiffies & TVR_MASK;
+ int index = base->last_timer_time & TVR_MASK;

/*
* Cascade timers:
@@ -451,7 +625,7 @@ static inline void __run_timers(tvec_bas
(!cascade(base, &base->tv3, INDEX(1))) &&
!cascade(base, &base->tv4, INDEX(2)))
cascade(base, &base->tv5, INDEX(3));
- ++base->timer_jiffies;
+ ++base->last_timer_time;
list_splice_init(base->tv1.vec + index, &work_list);
repeat:
if (!list_empty(head)) {
@@ -500,20 +674,20 @@ unsigned long next_timer_interrupt(void)

base = &__get_cpu_var(tvec_bases);
spin_lock(&base->lock);
- expires = base->timer_jiffies + (LONG_MAX >> 1);
+ expires = base->last_timer_time + (LONG_MAX >> 1);
list = 0;

/* Look for timer events in tv1. */
- j = base->timer_jiffies & TVR_MASK;
+ j = base->last_timer_time & TVR_MASK;
do {
list_for_each_entry(nte, base->tv1.vec + j, entry) {
expires = nte->expires;
- if (j < (base->timer_jiffies & TVR_MASK))
+ if (j < (base->last_timer_time & TVR_MASK))
list = base->tv2.vec + (INDEX(0));
goto found;
}
j = (j + 1) & TVR_MASK;
- } while (j != (base->timer_jiffies & TVR_MASK));
+ } while (j != (base->last_timer_time & TVR_MASK));

/* Check tv2-tv5. */
varray[0] = &base->tv2;
@@ -890,10 +1064,15 @@ EXPORT_SYMBOL(xtime_lock);
*/
static void run_timer_softirq(struct softirq_action *h)
{
+ unsigned long current_timer_time;
tvec_base_t *base = &__get_cpu_var(tvec_bases);

- if (time_after_eq(jiffies, base->timer_jiffies))
- __run_timers(base);
+ /* cache the converted current time, rounding down */
+ current_timer_time =
+ nsecs_to_timerintervals_floor(do_monotonic_clock());
+
+ if (time_after_eq(current_timer_time, base->last_timer_time))
+ __run_timers(base, current_timer_time);
}

/*
@@ -1078,6 +1257,10 @@ static void process_timeout(unsigned lon
* value will be %MAX_SCHEDULE_TIMEOUT.
*
* In all cases the return value is guaranteed to be non-negative.
+ *
+ * The callers of schedule_timeout() should be aware that the interface
+ * is now deprecated. schedule_timeout_{msecs,usecs,nsecs}() are now the
+ * interfaces for relative timeout requests.
*/
fastcall signed long __sched schedule_timeout(signed long timeout)
{
@@ -1133,6 +1316,149 @@ fastcall signed long __sched schedule_ti

EXPORT_SYMBOL(schedule_timeout);

+/**
+ * schedule_timeout_nsecs - sleep until timeout
+ * @timeout_nsecs: timeout value in nanoseconds
+ *
+ * Make the current task sleep until @timeout_nsecs nsecs have
+ * elapsed. The routine will return immediately unless
+ * the current task state has been set (see set_current_state()).
+ *
+ * You can set the task state as follows -
+ *
+ * %TASK_UNINTERRUPTIBLE - at least @timeout_nsecs nsecs are guaranteed
+ * to pass before the routine returns. The routine will return 0
+ *
+ * %TASK_INTERRUPTIBLE - the routine may return early if a signal is
+ * delivered to the current task. In this case the remaining time
+ * in nsecs will be returned, or 0 if the timer expired in time
+ *
+ * The current task state is guaranteed to be TASK_RUNNING when this
+ * routine returns.
+ *
+ * Specifying a @timeout_nsecs value of %MAX_SCHEDULE_TIMEOUT_NSECS will
+ * schedule the CPU away without a bound on the timeout. In this case
+ * the return value will be %MAX_SCHEDULE_TIMEOUT_NSECS.
+ */
+fastcall nsec_t __sched schedule_timeout_nsecs(nsec_t timeout_nsecs)
+{
+ struct timer_list timer;
+ nsec_t expires;
+
+ if (timeout_nsecs == MAX_SCHEDULE_TIMEOUT_NSECS) {
+ schedule();
+ goto out;
+ }
+
+ expires = do_monotonic_clock() + timeout_nsecs;
+
+ init_timer(&timer);
+ timer.data = (unsigned long) current;
+ timer.function = process_timeout;
+
+ set_timer_nsecs(&timer, expires);
+ schedule();
+ del_singleshot_timer_sync(&timer);
+
+ timeout_nsecs = do_monotonic_clock();
+ if (expires < timeout_nsecs)
+ timeout_nsecs = (nsec_t)0UL;
+ else
+ timeout_nsecs = expires - timeout_nsecs;
+out:
+ return timeout_nsecs;
+}
+
+EXPORT_SYMBOL_GPL(schedule_timeout_nsecs);
+
+/**
+ * schedule_timeout_usecs - sleep until timeout
+ * @timeout_usecs: timeout value in microseconds
+ *
+ * Make the current task sleep until @timeout_usecs usecs have
+ * elapsed. The routine will return immediately unless
+ * the current task state has been set (see set_current_state()).
+ *
+ * You can set the task state as follows -
+ *
+ * %TASK_UNINTERRUPTIBLE - at least @timeout_usecs usecs are guaranteed
+ * to pass before the routine returns. The routine will return 0
+ *
+ * %TASK_INTERRUPTIBLE - the routine may return early if a signal is
+ * delivered to the current task. In this case the remaining time
+ * in usecs will be returned, or 0 if the timer expired in time
+ *
+ * The current task state is guaranteed to be TASK_RUNNING when this
+ * routine returns.
+ *
+ * Specifying a @timeout_usecs value of %MAX_SCHEDULE_TIMEOUT_USECS will
+ * schedule the CPU away without a bound on the timeout. In this case
+ * the return value will be %MAX_SCHEDULE_TIMEOUT_USECS.
+ */
+fastcall inline unsigned long __sched schedule_timeout_usecs(unsigned long timeout_usecs)
+{
+ nsec_t timeout_nsecs;
+
+ if (timeout_usecs == MAX_SCHEDULE_TIMEOUT_USECS)
+ timeout_nsecs = MAX_SCHEDULE_TIMEOUT_NSECS;
+ else
+ timeout_nsecs = timeout_usecs * (nsec_t)NSEC_PER_USEC;
+ /*
+ * Make sure to round up by subtracting one before division and
+ * adding one after
+ */
+ timeout_nsecs = schedule_timeout_nsecs(timeout_nsecs) - 1;
+ do_div(timeout_nsecs, NSEC_PER_USEC);
+ timeout_usecs = (unsigned long)timeout_nsecs + 1UL;
+ return timeout_usecs;
+}
+
+EXPORT_SYMBOL_GPL(schedule_timeout_usecs);
+
+/**
+ * schedule_timeout_msecs - sleep until timeout
+ * @timeout_msecs: timeout value in milliseconds
+ *
+ * Make the current task sleep until @timeout_msecs msecs have
+ * elapsed. The routine will return immediately unless
+ * the current task state has been set (see set_current_state()).
+ *
+ * You can set the task state as follows -
+ *
+ * %TASK_UNINTERRUPTIBLE - at least @timeout_msecs msecs are guaranteed
+ * to pass before the routine returns. The routine will return 0
+ *
+ * %TASK_INTERRUPTIBLE - the routine may return early if a signal is
+ * delivered to the current task. In this case the remaining time
+ * in msecs will be returned, or 0 if the timer expired in time
+ *
+ * The current task state is guaranteed to be TASK_RUNNING when this
+ * routine returns.
+ *
+ * Specifying a @timeout_msecs value of %MAX_SCHEDULE_TIMEOUT_MSECS will
+ * schedule the CPU away without a bound on the timeout. In this case
+ * the return value will be %MAX_SCHEDULE_TIMEOUT_MSECS.
+ */
+fastcall inline unsigned int __sched schedule_timeout_msecs(unsigned int timeout_msecs)
+{
+ nsec_t timeout_nsecs;
+
+ if (timeout_msecs == MAX_SCHEDULE_TIMEOUT_MSECS)
+ timeout_nsecs = MAX_SCHEDULE_TIMEOUT_NSECS;
+ else
+ timeout_nsecs = timeout_msecs * (nsec_t)NSEC_PER_MSEC;
+ /*
+ * Make sure to round up by subtracting one before division and
+ * adding one after
+ */
+ timeout_nsecs = schedule_timeout_nsecs(timeout_nsecs) - 1;
+ do_div(timeout_nsecs, NSEC_PER_MSEC);
+ timeout_msecs = (unsigned int)timeout_nsecs + 1;
+ return timeout_msecs;
+}
+
+EXPORT_SYMBOL_GPL(schedule_timeout_msecs);
+
/* Thread ID - the internal kernel "pid" */
asmlinkage long sys_gettid(void)
{
@@ -1302,7 +1628,11 @@ static void __devinit init_timers_cpu(in
for (j = 0; j < TVR_SIZE; j++)
INIT_LIST_HEAD(base->tv1.vec + j);

- base->timer_jiffies = jiffies;
+ /*
+ * Under the new monotonic_clock() oriented soft-timer subsystem,
+ * we begin at 0, not INITIAL_JIFFIES
+ */
+ base->last_timer_time = 0UL;
}

#ifdef CONFIG_HOTPLUG_CPU

2005-05-18 16:04:12

by Jonathan Corbet

Subject: Re: [RFC][UPDATE PATCH 2/4] convert soft-timer subsystem to timerintervals

Hi, Nishanth,

To my uneducated eye, this patch looks like a useful cleaning-up of the
timer API. I do have one question, though...

> @@ -238,15 +327,41 @@ void add_timer_on(struct timer_list *tim
> check_timer(timer);
>
> spin_lock_irqsave(&base->lock, flags);
> + timer->expires = jiffies_to_timerintervals(timer->expires);

It would appear that, depending on where you are, ->expires can be
expressed in two different units. Users of add_timer() and mod_timer()
are expecting jiffies, but the internal code uses timer intervals. What
happens when somebody does something like this?

mod_timer(&my_timer, my_timer.expires + additional_delay);

Might it be better to store the timerintervals value in a different
field, and leave ->expires as part of the legacy interface only?

jon

Jonathan Corbet
Executive editor, LWN.net

2005-05-18 17:03:41

by Nishanth Aravamudan

Subject: Re: [RFC][UPDATE PATCH 2/4] convert soft-timer subsystem to timerintervals

On 18.05.2005 [09:59:27 -0600], Jonathan Corbet wrote:
> Hi, Nishanth,
>
> To my uneducated eye, this patch looks like a useful cleaning-up of the
> timer API. I do have one question, though...

Thanks! I think one of the best side-effects (beyond a more accurate
execution of sleep requests) of my patch is that the new interfaces are
a heck of a lot saner :)

> > @@ -238,15 +327,41 @@ void add_timer_on(struct timer_list *tim
> > check_timer(timer);
> >
> > spin_lock_irqsave(&base->lock, flags);
> > + timer->expires = jiffies_to_timerintervals(timer->expires);
>
> It would appear that, depending on where you are, ->expires can be
> expressed in two different units. Users of add_timer() and mod_timer()
> are expecting jiffies, but the internal code uses timer intervals. What
> happens when somebody does something like this?
>
> mod_timer(&my_timer, my_timer.expires + additional_delay);
>
> Might it be better to store the timerintervals value in a different
> field, and leave ->expires as part of the legacy interface only?

This is definitely an option. Currently, it is somewhat vague whether
the expires field is still valid to the caller once a timer has been
submitted. In the new system, it will explicitly not be (I meant to
modify the comments for add_timer(), mod_timer() and set_timer_nsecs()
accordingly, but have not yet).

The problem with the mod_timer() approach you suggest is that there is
no guarantee that my_timer.expires represents anything close to the
current time. And, as far as my experience with reviewing the current
callers is concerned, there is no such usage.

It definitely is feasible and reasonable, though, to make that change. I
will look into it and see what I can do.

Thanks for the feedback!

-Nish

2005-05-19 23:34:25

by Nishanth Aravamudan

Subject: Re: [RFC][PATCH 0/4] new timeofday-based soft-timer subsystem

On 17.05.2005 [16:33:00 -0700], Nishanth Aravamudan wrote:
> On 13.05.2005 [17:16:35 -0700], john stultz wrote:
> > All,
> > This patch implements the architecture independent portion of the new
> > time of day subsystem. For a brief description on the rework, see here:
> > http://lwn.net/Articles/120850/ (Many thanks to the LWN team for that
> > easy to understand writeup!)
> >
> > I intend this to be the last RFC release and to submit this patch to
> > Andrew for testing near the end of this month. So please, if you
> > have any complaints, suggestions, or blocking issues, let me know.
>
> I have been working closely with John to re-work the soft-timer subsystem
> to use the new timeofday() subsystem. The following patches attempt to
> begin this process. I would greatly appreciate any comments.

<snip>

> I will try to get some current benchmark differentials posted tomorrow.
> The previous patch I released showed little difference between mainline,
> John's timeofday rework and my soft-timer rework in kernbench.

<snip>

Hrm, "tomorrow" became "two days from now," but here they are, kernbench
comparisons (in percent relative to mainline) for x86 and x86_64, 10
iterations each.

The x86_64 machine is a 2-way 2.0 GHz with 3.5 GB of RAM.
The x86 machine is 16-way (32 with HT) 1.4 GHz Xeon with 15 GB of RAM.

x86

                                       Elapsed   User     System   CPU
2.6.12-rc4:                            100%      100%     100%     100%
2.6.12-rc4 + John's patch:             100.3%    100%     99.8%    99.6%
2.6.12-rc4 + John's patch + my patch:  102.1%    101%     100%     98%

x86_64

                                       Elapsed   User     System   CPU
2.6.12-rc4:                            100%      100%     100%     100%
2.6.12-rc4 + John's patch:             99.5%     99.5%    99.1%    99.9%
2.6.12-rc4 + John's patch + my patch:  99.7%     99.7%    99.5%    100%

----

All in all, pretty consistent across the board.

Thanks,
Nish