2006-11-29 03:00:23

by john stultz

Subject: [PATCH 0/5][time][x86_64] GENERIC_TIME patchset for x86_64

Hey Andi,
First let me apologize: I've been busy with other things and
it's been far too long since I last posted this. Anyway, I found some
time to resync my trees and wanted to send this along.

You had asked earlier about performance impact:

Vanilla TSC:
149 nsecs per gtod call
367 nsecs per CLOCK_MONOTONIC call
288 nsecs per CLOCK_REALTIME call
Vanilla ACPI PM:
1272 nsecs per gtod call
1335 nsecs per CLOCK_MONOTONIC call
1273 nsecs per CLOCK_REALTIME call

GENERIC_TIME TSC:
149 nsecs per gtod call
304 nsecs per CLOCK_MONOTONIC call
275 nsecs per CLOCK_REALTIME call
GENERIC_TIME ACPI PM:
1273 nsecs per gtod call
1275 nsecs per CLOCK_MONOTONIC call
1273 nsecs per CLOCK_REALTIME call

So almost no performance change overall; the CLOCK_MONOTONIC paths
actually get slightly faster.
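
For reference, numbers like these can be collected from userspace with
a trivial loop. A minimal sketch follows (not necessarily the exact
harness used for the measurements above; build with gcc bench.c -lrt):

	/* Average the per-call cost over a large loop, timing the
	 * whole run with gettimeofday(). */
	#include <stdio.h>
	#include <time.h>
	#include <sys/time.h>

	#define LOOPS 1000000

	static long long elapsed_ns(struct timeval *a, struct timeval *b)
	{
		return (b->tv_sec - a->tv_sec) * 1000000000LL +
		       (b->tv_usec - a->tv_usec) * 1000LL;
	}

	static void bench_clock(const char *name, clockid_t id)
	{
		struct timeval start, end;
		struct timespec ts;
		int i;

		gettimeofday(&start, NULL);
		for (i = 0; i < LOOPS; i++)
			clock_gettime(id, &ts);
		gettimeofday(&end, NULL);
		printf("%lld nsecs per %s call\n",
		       elapsed_ns(&start, &end) / LOOPS, name);
	}

	int main(void)
	{
		struct timeval start, end, tv;
		int i;

		gettimeofday(&start, NULL);
		for (i = 0; i < LOOPS; i++)
			gettimeofday(&tv, NULL);
		gettimeofday(&end, NULL);
		printf("%lld nsecs per gtod call\n",
		       elapsed_ns(&start, &end) / LOOPS);

		bench_clock("CLOCK_MONOTONIC", CLOCK_MONOTONIC);
		bench_clock("CLOCK_REALTIME", CLOCK_REALTIME);
		return 0;
	}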

Ingo has a few cleanups I need to merge, but otherwise I think this is
getting close to ready for inclusion into -mm for testing. Please let
me know if you have any major objections; if not, I'll re-diff it
against -mm and send it to Andrew.

New in the current C7 release:
o Synched up w/ 2.6.19-rc6-git11
o Reworked the patch order to be a bit more logical
o Dropped the apic_runs_main_timer removal on Andi's request

Let me know if you have any thoughts or comments!

thanks again!
-john


2006-11-29 03:00:52

by john stultz

Subject: [PATCH 3/5][time][x86_64] Split x86_64/kernel/time.c up

In preparation for the x86_64 generic time conversion, this patch
splits the TSC- and HPET-related code out of arch/x86_64/kernel/time.c
into respective hpet.c and tsc.c files.

Signed-off-by: John Stultz <[email protected]>

arch/x86_64/kernel/Makefile | 2
arch/x86_64/kernel/hpet.c | 435 ++++++++++++++++++++++++++++++
arch/x86_64/kernel/time.c | 628 --------------------------------------------
arch/x86_64/kernel/tsc.c | 201 ++++++++++++++
include/asm-x86_64/hpet.h | 6
include/asm-x86_64/timex.h | 11
6 files changed, 658 insertions(+), 625 deletions(-)

linux-2.6.19-rc6git11_timeofday-arch-x86-64-split-hpet-tsc-time_C7.patch
============================================
diff --git a/arch/x86_64/kernel/Makefile b/arch/x86_64/kernel/Makefile
index 3c7cbff..e68a87e 100644
--- a/arch/x86_64/kernel/Makefile
+++ b/arch/x86_64/kernel/Makefile
@@ -8,7 +8,7 @@ obj-y := process.o signal.o entry.o trap
ptrace.o time.o ioport.o ldt.o setup.o i8259.o sys_x86_64.o \
x8664_ksyms.o i387.o syscall.o vsyscall.o \
setup64.o bootflag.o e820.o reboot.o quirks.o i8237.o \
- pci-dma.o pci-nommu.o alternative.o
+ pci-dma.o pci-nommu.o alternative.o hpet.o tsc.o

obj-$(CONFIG_STACKTRACE) += stacktrace.o
obj-$(CONFIG_X86_MCE) += mce.o therm_throt.o
diff --git a/arch/x86_64/kernel/hpet.c b/arch/x86_64/kernel/hpet.c
new file mode 100644
index 0000000..a219786
--- /dev/null
+++ b/arch/x86_64/kernel/hpet.c
@@ -0,0 +1,435 @@
+#include <linux/kernel.h>
+#include <linux/sched.h>
+#include <linux/init.h>
+#include <linux/mc146818rtc.h>
+#include <linux/time.h>
+#include <linux/clocksource.h>
+#include <linux/ioport.h>
+#include <linux/acpi.h>
+#include <linux/hpet.h>
+#include <asm/timex.h>
+#include <asm/hpet.h>
+
+int nohpet __initdata = 0;
+
+unsigned long hpet_address;
+unsigned long hpet_period; /* fsecs / HPET clock */
+unsigned long hpet_tick; /* HPET clocks / interrupt */
+
+int hpet_use_timer; /* Use counter of hpet for time keeping,
+ * otherwise PIT
+ */
+unsigned int do_gettimeoffset_hpet(void)
+{
+ /* cap counter read to one tick to avoid inconsistencies */
+ unsigned long counter = hpet_readl(HPET_COUNTER) - vxtime.last;
+ return (min(counter,hpet_tick) * vxtime.quot) >> US_SCALE;
+}
+
+#ifdef CONFIG_HPET
+static __init int late_hpet_init(void)
+{
+ struct hpet_data hd;
+ unsigned int ntimer;
+
+ if (!hpet_address)
+ return 0;
+
+ memset(&hd, 0, sizeof (hd));
+
+ ntimer = hpet_readl(HPET_ID);
+ ntimer = (ntimer & HPET_ID_NUMBER) >> HPET_ID_NUMBER_SHIFT;
+ ntimer++;
+
+ /*
+ * Register with driver.
+ * Timer0 and Timer1 is used by platform.
+ */
+ hd.hd_phys_address = hpet_address;
+ hd.hd_address = (void __iomem *)fix_to_virt(FIX_HPET_BASE);
+ hd.hd_nirqs = ntimer;
+ hd.hd_flags = HPET_DATA_PLATFORM;
+ hpet_reserve_timer(&hd, 0);
+#ifdef CONFIG_HPET_EMULATE_RTC
+ hpet_reserve_timer(&hd, 1);
+#endif
+ hd.hd_irq[0] = HPET_LEGACY_8254;
+ hd.hd_irq[1] = HPET_LEGACY_RTC;
+ if (ntimer > 2) {
+ struct hpet *hpet;
+ struct hpet_timer *timer;
+ int i;
+
+ hpet = (struct hpet *) fix_to_virt(FIX_HPET_BASE);
+ timer = &hpet->hpet_timers[2];
+ for (i = 2; i < ntimer; timer++, i++)
+ hd.hd_irq[i] = (timer->hpet_config &
+ Tn_INT_ROUTE_CNF_MASK) >>
+ Tn_INT_ROUTE_CNF_SHIFT;
+
+ }
+
+ hpet_alloc(&hd);
+ return 0;
+}
+fs_initcall(late_hpet_init);
+#endif
+
+int hpet_timer_stop_set_go(unsigned long tick)
+{
+ unsigned int cfg;
+
+/*
+ * Stop the timers and reset the main counter.
+ */
+
+ cfg = hpet_readl(HPET_CFG);
+ cfg &= ~(HPET_CFG_ENABLE | HPET_CFG_LEGACY);
+ hpet_writel(cfg, HPET_CFG);
+ hpet_writel(0, HPET_COUNTER);
+ hpet_writel(0, HPET_COUNTER + 4);
+
+/*
+ * Set up timer 0, as periodic with first interrupt to happen at hpet_tick,
+ * and period also hpet_tick.
+ */
+ if (hpet_use_timer) {
+ hpet_writel(HPET_TN_ENABLE | HPET_TN_PERIODIC | HPET_TN_SETVAL |
+ HPET_TN_32BIT, HPET_T0_CFG);
+ hpet_writel(hpet_tick, HPET_T0_CMP); /* next interrupt */
+ hpet_writel(hpet_tick, HPET_T0_CMP); /* period */
+ cfg |= HPET_CFG_LEGACY;
+ }
+/*
+ * Go!
+ */
+
+ cfg |= HPET_CFG_ENABLE;
+ hpet_writel(cfg, HPET_CFG);
+
+ return 0;
+}
+
+int hpet_arch_init(void)
+{
+ unsigned int id;
+
+ if (!hpet_address)
+ return -1;
+ set_fixmap_nocache(FIX_HPET_BASE, hpet_address);
+ __set_fixmap(VSYSCALL_HPET, hpet_address, PAGE_KERNEL_VSYSCALL_NOCACHE);
+
+/*
+ * Read the period, compute tick and quotient.
+ */
+
+ id = hpet_readl(HPET_ID);
+
+ if (!(id & HPET_ID_VENDOR) || !(id & HPET_ID_NUMBER))
+ return -1;
+
+ hpet_period = hpet_readl(HPET_PERIOD);
+ if (hpet_period < 100000 || hpet_period > 100000000)
+ return -1;
+
+ hpet_tick = (FSEC_PER_TICK + hpet_period / 2) / hpet_period;
+
+ hpet_use_timer = (id & HPET_ID_LEGSUP);
+
+ return hpet_timer_stop_set_go(hpet_tick);
+}
+
+int hpet_reenable(void)
+{
+ return hpet_timer_stop_set_go(hpet_tick);
+}
+
+/*
+ * calibrate_tsc() calibrates the processor TSC in a very simple way, comparing
+ * it to the HPET timer of known frequency.
+ */
+
+#define TICK_COUNT 100000000
+
+unsigned int __init hpet_calibrate_tsc(void)
+{
+ int tsc_start, hpet_start;
+ int tsc_now, hpet_now;
+ unsigned long flags;
+
+ local_irq_save(flags);
+ local_irq_disable();
+
+ hpet_start = hpet_readl(HPET_COUNTER);
+ rdtscl(tsc_start);
+
+ do {
+ local_irq_disable();
+ hpet_now = hpet_readl(HPET_COUNTER);
+ tsc_now = get_cycles_sync();
+ local_irq_restore(flags);
+ } while ((tsc_now - tsc_start) < TICK_COUNT &&
+ (hpet_now - hpet_start) < TICK_COUNT);
+
+ return (tsc_now - tsc_start) * 1000000000L
+ / ((hpet_now - hpet_start) * hpet_period / 1000);
+}
+
+#ifdef CONFIG_HPET_EMULATE_RTC
+/* HPET in LegacyReplacement Mode eats up the RTC interrupt line. When HPET
+ * is enabled, we support RTC interrupt functionality in software.
+ * RTC has 3 kinds of interrupts:
+ * 1) Update Interrupt - generate an interrupt, every sec, when RTC clock
+ * is updated
+ * 2) Alarm Interrupt - generate an interrupt at a specific time of day
+ * 3) Periodic Interrupt - generate periodic interrupt, with frequencies
+ * 2Hz-8192Hz (2Hz-64Hz for non-root user) (all freqs in powers of 2)
+ * (1) and (2) above are implemented using polling at a frequency of
+ * 64 Hz. The exact frequency is a tradeoff between accuracy and interrupt
+ * overhead. (DEFAULT_RTC_INT_FREQ)
+ * For (3), we use interrupts at 64Hz or user specified periodic
+ * frequency, whichever is higher.
+ */
+#include <linux/rtc.h>
+
+#define DEFAULT_RTC_INT_FREQ 64
+#define RTC_NUM_INTS 1
+
+static unsigned long UIE_on;
+static unsigned long prev_update_sec;
+
+static unsigned long AIE_on;
+static struct rtc_time alarm_time;
+
+static unsigned long PIE_on;
+static unsigned long PIE_freq = DEFAULT_RTC_INT_FREQ;
+static unsigned long PIE_count;
+
+static unsigned long hpet_rtc_int_freq; /* RTC interrupt frequency */
+static unsigned int hpet_t1_cmp; /* cached comparator register */
+
+int is_hpet_enabled(void)
+{
+ return hpet_address != 0;
+}
+
+/*
+ * Timer 1 for RTC, we do not use periodic interrupt feature,
+ * even if HPET supports periodic interrupts on Timer 1.
+ * The reason being, to set up a periodic interrupt in HPET, we need to
+ * stop the main counter. And if we do that every time someone disables or
+ * enables the RTC, it will adversely affect the main kernel timer on Timer 0.
+ * So, for the time being, simulate the periodic interrupt in software.
+ *
+ * hpet_rtc_timer_init() is called for the first interrupt; on subsequent
+ * interrupts, reinit happens through hpet_rtc_timer_reinit().
+ */
+int hpet_rtc_timer_init(void)
+{
+ unsigned int cfg, cnt;
+ unsigned long flags;
+
+ if (!is_hpet_enabled())
+ return 0;
+ /*
+ * Set the counter 1 and enable the interrupts.
+ */
+ if (PIE_on && (PIE_freq > DEFAULT_RTC_INT_FREQ))
+ hpet_rtc_int_freq = PIE_freq;
+ else
+ hpet_rtc_int_freq = DEFAULT_RTC_INT_FREQ;
+
+ local_irq_save(flags);
+
+ cnt = hpet_readl(HPET_COUNTER);
+ cnt += ((hpet_tick*HZ)/hpet_rtc_int_freq);
+ hpet_writel(cnt, HPET_T1_CMP);
+ hpet_t1_cmp = cnt;
+
+ cfg = hpet_readl(HPET_T1_CFG);
+ cfg &= ~HPET_TN_PERIODIC;
+ cfg |= HPET_TN_ENABLE | HPET_TN_32BIT;
+ hpet_writel(cfg, HPET_T1_CFG);
+
+ local_irq_restore(flags);
+
+ return 1;
+}
+
+static void hpet_rtc_timer_reinit(void)
+{
+ unsigned int cfg, cnt, ticks_per_int, lost_ints;
+
+ if (unlikely(!(PIE_on | AIE_on | UIE_on))) {
+ cfg = hpet_readl(HPET_T1_CFG);
+ cfg &= ~HPET_TN_ENABLE;
+ hpet_writel(cfg, HPET_T1_CFG);
+ return;
+ }
+
+ if (PIE_on && (PIE_freq > DEFAULT_RTC_INT_FREQ))
+ hpet_rtc_int_freq = PIE_freq;
+ else
+ hpet_rtc_int_freq = DEFAULT_RTC_INT_FREQ;
+
+ /* It is more accurate to use the comparator value than current count.*/
+ ticks_per_int = hpet_tick * HZ / hpet_rtc_int_freq;
+ hpet_t1_cmp += ticks_per_int;
+ hpet_writel(hpet_t1_cmp, HPET_T1_CMP);
+
+ /*
+ * If the interrupt handler was delayed too long, the write above tries
+ * to schedule the next interrupt in the past and the hardware would
+ * not interrupt until the counter had wrapped around.
+ * So we have to check that the comparator wasn't set to a past time.
+ */
+ cnt = hpet_readl(HPET_COUNTER);
+ if (unlikely((int)(cnt - hpet_t1_cmp) > 0)) {
+ lost_ints = (cnt - hpet_t1_cmp) / ticks_per_int + 1;
+ /* Make sure that, even with the time needed to execute
+ * this code, the next scheduled interrupt has been moved
+ * back to the future: */
+ lost_ints++;
+
+ hpet_t1_cmp += lost_ints * ticks_per_int;
+ hpet_writel(hpet_t1_cmp, HPET_T1_CMP);
+
+ if (PIE_on)
+ PIE_count += lost_ints;
+
+ printk(KERN_WARNING "rtc: lost some interrupts at %ldHz.\n",
+ hpet_rtc_int_freq);
+ }
+}
+
+/*
+ * The functions below are called from rtc driver.
+ * Return 0 if HPET is not being used.
+ * Otherwise do the necessary changes and return 1.
+ */
+int hpet_mask_rtc_irq_bit(unsigned long bit_mask)
+{
+ if (!is_hpet_enabled())
+ return 0;
+
+ if (bit_mask & RTC_UIE)
+ UIE_on = 0;
+ if (bit_mask & RTC_PIE)
+ PIE_on = 0;
+ if (bit_mask & RTC_AIE)
+ AIE_on = 0;
+
+ return 1;
+}
+
+int hpet_set_rtc_irq_bit(unsigned long bit_mask)
+{
+ int timer_init_reqd = 0;
+
+ if (!is_hpet_enabled())
+ return 0;
+
+ if (!(PIE_on | AIE_on | UIE_on))
+ timer_init_reqd = 1;
+
+ if (bit_mask & RTC_UIE) {
+ UIE_on = 1;
+ }
+ if (bit_mask & RTC_PIE) {
+ PIE_on = 1;
+ PIE_count = 0;
+ }
+ if (bit_mask & RTC_AIE) {
+ AIE_on = 1;
+ }
+
+ if (timer_init_reqd)
+ hpet_rtc_timer_init();
+
+ return 1;
+}
+
+int hpet_set_alarm_time(unsigned char hrs, unsigned char min, unsigned char sec)
+{
+ if (!is_hpet_enabled())
+ return 0;
+
+ alarm_time.tm_hour = hrs;
+ alarm_time.tm_min = min;
+ alarm_time.tm_sec = sec;
+
+ return 1;
+}
+
+int hpet_set_periodic_freq(unsigned long freq)
+{
+ if (!is_hpet_enabled())
+ return 0;
+
+ PIE_freq = freq;
+ PIE_count = 0;
+
+ return 1;
+}
+
+int hpet_rtc_dropped_irq(void)
+{
+ if (!is_hpet_enabled())
+ return 0;
+
+ return 1;
+}
+
+irqreturn_t hpet_rtc_interrupt(int irq, void *dev_id, struct pt_regs *regs)
+{
+ struct rtc_time curr_time;
+ unsigned long rtc_int_flag = 0;
+ int call_rtc_interrupt = 0;
+
+ hpet_rtc_timer_reinit();
+
+ if (UIE_on | AIE_on) {
+ rtc_get_rtc_time(&curr_time);
+ }
+ if (UIE_on) {
+ if (curr_time.tm_sec != prev_update_sec) {
+ /* Set update int info, call real rtc int routine */
+ call_rtc_interrupt = 1;
+ rtc_int_flag = RTC_UF;
+ prev_update_sec = curr_time.tm_sec;
+ }
+ }
+ if (PIE_on) {
+ PIE_count++;
+ if (PIE_count >= hpet_rtc_int_freq/PIE_freq) {
+ /* Set periodic int info, call real rtc int routine */
+ call_rtc_interrupt = 1;
+ rtc_int_flag |= RTC_PF;
+ PIE_count = 0;
+ }
+ }
+ if (AIE_on) {
+ if ((curr_time.tm_sec == alarm_time.tm_sec) &&
+ (curr_time.tm_min == alarm_time.tm_min) &&
+ (curr_time.tm_hour == alarm_time.tm_hour)) {
+ /* Set alarm int info, call real rtc int routine */
+ call_rtc_interrupt = 1;
+ rtc_int_flag |= RTC_AF;
+ }
+ }
+ if (call_rtc_interrupt) {
+ rtc_int_flag |= (RTC_IRQF | (RTC_NUM_INTS << 8));
+ rtc_interrupt(rtc_int_flag, dev_id);
+ }
+ return IRQ_HANDLED;
+}
+#endif
+
+static int __init nohpet_setup(char *s)
+{
+ nohpet = 1;
+ return 1;
+}
+
+__setup("nohpet", nohpet_setup);
+
diff --git a/arch/x86_64/kernel/time.c b/arch/x86_64/kernel/time.c
index a6820e0..164ba5f 100644
--- a/arch/x86_64/kernel/time.c
+++ b/arch/x86_64/kernel/time.c
@@ -42,9 +42,10 @@ #include <asm/sections.h>
#include <linux/cpufreq.h>
#include <linux/hpet.h>
#include <asm/apic.h>
+#include <asm/hpet.h>

#ifdef CONFIG_CPU_FREQ
-static void cpufreq_delayed_get(void);
+extern void cpufreq_delayed_get(void);
#endif
extern void i8254_timer_resume(void);
extern int using_apic_timer;
@@ -55,22 +56,6 @@ DEFINE_SPINLOCK(rtc_lock);
EXPORT_SYMBOL(rtc_lock);
DEFINE_SPINLOCK(i8253_lock);

-int nohpet __initdata = 0;
-static int notsc __initdata = 0;
-
-#define USEC_PER_TICK (USEC_PER_SEC / HZ)
-#define NSEC_PER_TICK (NSEC_PER_SEC / HZ)
-#define FSEC_PER_TICK (FSEC_PER_SEC / HZ)
-
-#define NS_SCALE 10 /* 2^10, carefully chosen */
-#define US_SCALE 32 /* 2^32, arbitralrily chosen */
-
-unsigned int cpu_khz; /* TSC clocks / usec, not used here */
-EXPORT_SYMBOL(cpu_khz);
-unsigned long hpet_address;
-static unsigned long hpet_period; /* fsecs / HPET clock */
-unsigned long hpet_tick; /* HPET clocks / interrupt */
-int hpet_use_timer; /* Use counter of hpet for time keeping, otherwise PIT */
unsigned long vxtime_hz = PIT_TICK_RATE;
int report_lost_ticks; /* command line option */
unsigned long long monotonic_base;
@@ -81,34 +66,6 @@ volatile unsigned long __jiffies __secti
struct timespec __xtime __section_xtime;
struct timezone __sys_tz __section_sys_tz;

-/*
- * do_gettimeoffset() returns microseconds since last timer interrupt was
- * triggered by hardware. A memory read of HPET is slower than a register read
- * of TSC, but much more reliable. It's also synchronized to the timer
- * interrupt. Note that do_gettimeoffset() may return more than hpet_tick, if a
- * timer interrupt has happened already, but vxtime.trigger wasn't updated yet.
- * This is not a problem, because jiffies hasn't updated either. They are bound
- * together by xtime_lock.
- */
-
-static inline unsigned int do_gettimeoffset_tsc(void)
-{
- unsigned long t;
- unsigned long x;
- t = get_cycles_sync();
- if (t < vxtime.last_tsc)
- t = vxtime.last_tsc; /* hack */
- x = ((t - vxtime.last_tsc) * vxtime.tsc_quot) >> US_SCALE;
- return x;
-}
-
-static inline unsigned int do_gettimeoffset_hpet(void)
-{
- /* cap counter read to one tick to avoid inconsistencies */
- unsigned long counter = hpet_readl(HPET_COUNTER) - vxtime.last;
- return (min(counter,hpet_tick) * vxtime.quot) >> US_SCALE;
-}
-
unsigned int (*do_gettimeoffset)(void) = do_gettimeoffset_tsc;

/*
@@ -272,7 +229,7 @@ static void set_rtc_mmss(unsigned long n
* Note: This function is required to return accurate
* time even in the absence of multiple timer ticks.
*/
-static inline unsigned long long cycles_2_ns(unsigned long long cyc);
+extern unsigned long long cycles_2_ns(unsigned long long cyc);
unsigned long long monotonic_clock(void)
{
unsigned long seq;
@@ -462,40 +419,6 @@ static irqreturn_t timer_interrupt(int i
return IRQ_HANDLED;
}

-static unsigned int cyc2ns_scale __read_mostly;
-
-static inline void set_cyc2ns_scale(unsigned long cpu_khz)
-{
- cyc2ns_scale = (NSEC_PER_MSEC << NS_SCALE) / cpu_khz;
-}
-
-static inline unsigned long long cycles_2_ns(unsigned long long cyc)
-{
- return (cyc * cyc2ns_scale) >> NS_SCALE;
-}
-
-unsigned long long sched_clock(void)
-{
- unsigned long a = 0;
-
-#if 0
- /* Don't do a HPET read here. Using TSC always is much faster
- and HPET may not be mapped yet when the scheduler first runs.
- Disadvantage is a small drift between CPUs in some configurations,
- but that should be tolerable. */
- if (__vxtime.mode == VXTIME_HPET)
- return (hpet_readl(HPET_COUNTER) * vxtime.quot) >> US_SCALE;
-#endif
-
- /* Could do CPU core sync here. Opteron can execute rdtsc speculatively,
- which means it is not completely exact and may not be monotonous between
- CPUs. But the errors should be too small to matter for scheduling
- purposes. */
-
- rdtscll(a);
- return cycles_2_ns(a);
-}
-
static unsigned long get_cmos_time(void)
{
unsigned int year, mon, day, hour, min, sec;
@@ -547,142 +470,6 @@ #endif
return mktime(year, mon, day, hour, min, sec);
}

-#ifdef CONFIG_CPU_FREQ
-
-/* Frequency scaling support. Adjust the TSC based timer when the cpu frequency
- changes.
-
- RED-PEN: On SMP we assume all CPUs run with the same frequency. It's
- not that important because current Opteron setups do not support
- scaling on SMP anyroads.
-
- Should fix up last_tsc too. Currently gettimeofday in the
- first tick after the change will be slightly wrong. */
-
-#include <linux/workqueue.h>
-
-static unsigned int cpufreq_delayed_issched = 0;
-static unsigned int cpufreq_init = 0;
-static struct work_struct cpufreq_delayed_get_work;
-
-static void handle_cpufreq_delayed_get(void *v)
-{
- unsigned int cpu;
- for_each_online_cpu(cpu) {
- cpufreq_get(cpu);
- }
- cpufreq_delayed_issched = 0;
-}
-
-/* if we notice lost ticks, schedule a call to cpufreq_get() as it tries
- * to verify the CPU frequency the timing core thinks the CPU is running
- * at is still correct.
- */
-static void cpufreq_delayed_get(void)
-{
- static int warned;
- if (cpufreq_init && !cpufreq_delayed_issched) {
- cpufreq_delayed_issched = 1;
- if (!warned) {
- warned = 1;
- printk(KERN_DEBUG
- "Losing some ticks... checking if CPU frequency changed.\n");
- }
- schedule_work(&cpufreq_delayed_get_work);
- }
-}
-
-static unsigned int ref_freq = 0;
-static unsigned long loops_per_jiffy_ref = 0;
-
-static unsigned long cpu_khz_ref = 0;
-
-static int time_cpufreq_notifier(struct notifier_block *nb, unsigned long val,
- void *data)
-{
- struct cpufreq_freqs *freq = data;
- unsigned long *lpj, dummy;
-
- if (cpu_has(&cpu_data[freq->cpu], X86_FEATURE_CONSTANT_TSC))
- return 0;
-
- lpj = &dummy;
- if (!(freq->flags & CPUFREQ_CONST_LOOPS))
-#ifdef CONFIG_SMP
- lpj = &cpu_data[freq->cpu].loops_per_jiffy;
-#else
- lpj = &boot_cpu_data.loops_per_jiffy;
-#endif
-
- if (!ref_freq) {
- ref_freq = freq->old;
- loops_per_jiffy_ref = *lpj;
- cpu_khz_ref = cpu_khz;
- }
- if ((val == CPUFREQ_PRECHANGE && freq->old < freq->new) ||
- (val == CPUFREQ_POSTCHANGE && freq->old > freq->new) ||
- (val == CPUFREQ_RESUMECHANGE)) {
- *lpj =
- cpufreq_scale(loops_per_jiffy_ref, ref_freq, freq->new);
-
- cpu_khz = cpufreq_scale(cpu_khz_ref, ref_freq, freq->new);
- if (!(freq->flags & CPUFREQ_CONST_LOOPS))
- vxtime.tsc_quot = (USEC_PER_MSEC << US_SCALE) / cpu_khz;
- }
-
- set_cyc2ns_scale(cpu_khz_ref);
-
- return 0;
-}
-
-static struct notifier_block time_cpufreq_notifier_block = {
- .notifier_call = time_cpufreq_notifier
-};
-
-static int __init cpufreq_tsc(void)
-{
- INIT_WORK(&cpufreq_delayed_get_work, handle_cpufreq_delayed_get, NULL);
- if (!cpufreq_register_notifier(&time_cpufreq_notifier_block,
- CPUFREQ_TRANSITION_NOTIFIER))
- cpufreq_init = 1;
- return 0;
-}
-
-core_initcall(cpufreq_tsc);
-
-#endif
-
-/*
- * calibrate_tsc() calibrates the processor TSC in a very simple way, comparing
- * it to the HPET timer of known frequency.
- */
-
-#define TICK_COUNT 100000000
-
-static unsigned int __init hpet_calibrate_tsc(void)
-{
- int tsc_start, hpet_start;
- int tsc_now, hpet_now;
- unsigned long flags;
-
- local_irq_save(flags);
- local_irq_disable();
-
- hpet_start = hpet_readl(HPET_COUNTER);
- rdtscl(tsc_start);
-
- do {
- local_irq_disable();
- hpet_now = hpet_readl(HPET_COUNTER);
- tsc_now = get_cycles_sync();
- local_irq_restore(flags);
- } while ((tsc_now - tsc_start) < TICK_COUNT &&
- (hpet_now - hpet_start) < TICK_COUNT);
-
- return (tsc_now - tsc_start) * 1000000000L
- / ((hpet_now - hpet_start) * hpet_period / 1000);
-}
-

/*
* pit_calibrate_tsc() uses the speaker output (channel 2) of
@@ -713,124 +500,6 @@ static unsigned int __init pit_calibrate
return (end - start) / 50;
}

-#ifdef CONFIG_HPET
-static __init int late_hpet_init(void)
-{
- struct hpet_data hd;
- unsigned int ntimer;
-
- if (!hpet_address)
- return 0;
-
- memset(&hd, 0, sizeof (hd));
-
- ntimer = hpet_readl(HPET_ID);
- ntimer = (ntimer & HPET_ID_NUMBER) >> HPET_ID_NUMBER_SHIFT;
- ntimer++;
-
- /*
- * Register with driver.
- * Timer0 and Timer1 is used by platform.
- */
- hd.hd_phys_address = hpet_address;
- hd.hd_address = (void __iomem *)fix_to_virt(FIX_HPET_BASE);
- hd.hd_nirqs = ntimer;
- hd.hd_flags = HPET_DATA_PLATFORM;
- hpet_reserve_timer(&hd, 0);
-#ifdef CONFIG_HPET_EMULATE_RTC
- hpet_reserve_timer(&hd, 1);
-#endif
- hd.hd_irq[0] = HPET_LEGACY_8254;
- hd.hd_irq[1] = HPET_LEGACY_RTC;
- if (ntimer > 2) {
- struct hpet *hpet;
- struct hpet_timer *timer;
- int i;
-
- hpet = (struct hpet *) fix_to_virt(FIX_HPET_BASE);
- timer = &hpet->hpet_timers[2];
- for (i = 2; i < ntimer; timer++, i++)
- hd.hd_irq[i] = (timer->hpet_config &
- Tn_INT_ROUTE_CNF_MASK) >>
- Tn_INT_ROUTE_CNF_SHIFT;
-
- }
-
- hpet_alloc(&hd);
- return 0;
-}
-fs_initcall(late_hpet_init);
-#endif
-
-static int hpet_timer_stop_set_go(unsigned long tick)
-{
- unsigned int cfg;
-
-/*
- * Stop the timers and reset the main counter.
- */
-
- cfg = hpet_readl(HPET_CFG);
- cfg &= ~(HPET_CFG_ENABLE | HPET_CFG_LEGACY);
- hpet_writel(cfg, HPET_CFG);
- hpet_writel(0, HPET_COUNTER);
- hpet_writel(0, HPET_COUNTER + 4);
-
-/*
- * Set up timer 0, as periodic with first interrupt to happen at hpet_tick,
- * and period also hpet_tick.
- */
- if (hpet_use_timer) {
- hpet_writel(HPET_TN_ENABLE | HPET_TN_PERIODIC | HPET_TN_SETVAL |
- HPET_TN_32BIT, HPET_T0_CFG);
- hpet_writel(hpet_tick, HPET_T0_CMP); /* next interrupt */
- hpet_writel(hpet_tick, HPET_T0_CMP); /* period */
- cfg |= HPET_CFG_LEGACY;
- }
-/*
- * Go!
- */
-
- cfg |= HPET_CFG_ENABLE;
- hpet_writel(cfg, HPET_CFG);
-
- return 0;
-}
-
-static int hpet_init(void)
-{
- unsigned int id;
-
- if (!hpet_address)
- return -1;
- set_fixmap_nocache(FIX_HPET_BASE, hpet_address);
- __set_fixmap(VSYSCALL_HPET, hpet_address, PAGE_KERNEL_VSYSCALL_NOCACHE);
-
-/*
- * Read the period, compute tick and quotient.
- */
-
- id = hpet_readl(HPET_ID);
-
- if (!(id & HPET_ID_VENDOR) || !(id & HPET_ID_NUMBER))
- return -1;
-
- hpet_period = hpet_readl(HPET_PERIOD);
- if (hpet_period < 100000 || hpet_period > 100000000)
- return -1;
-
- hpet_tick = (FSEC_PER_TICK + hpet_period / 2) / hpet_period;
-
- hpet_use_timer = (id & HPET_ID_LEGSUP);
-
- return hpet_timer_stop_set_go(hpet_tick);
-}
-
-static int hpet_reenable(void)
-{
- return hpet_timer_stop_set_go(hpet_tick);
-}
-
#define PIT_MODE 0x43
#define PIT_CH0 0x40

@@ -888,7 +557,7 @@ void __init time_init(void)
set_normalized_timespec(&wall_to_monotonic,
-xtime.tv_sec, -xtime.tv_nsec);

- if (!hpet_init())
+ if (!hpet_arch_init())
vxtime_hz = (FSEC_PER_SEC + hpet_period / 2) / hpet_period;
else
hpet_address = 0;
@@ -923,30 +592,6 @@ #ifndef CONFIG_SMP
#endif
}

-/*
- * Make an educated guess if the TSC is trustworthy and synchronized
- * over all CPUs.
- */
-__cpuinit int unsynchronized_tsc(void)
-{
-#ifdef CONFIG_SMP
- if (apic_is_clustered_box())
- return 1;
-#endif
- /* Most intel systems have synchronized TSCs except for
- multi node systems */
- if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL) {
-#ifdef CONFIG_ACPI
- /* But TSC doesn't tick in C3 so don't use it there */
- if (acpi_fadt.length > 0 && acpi_fadt.plvl3_lat < 1000)
- return 1;
-#endif
- return 0;
- }
-
- /* Assume multi socket systems are not synchronized */
- return num_present_cpus() > 1;
-}

/*
* Decide what mode gettimeofday should use.
@@ -1084,268 +729,3 @@ static int time_init_device(void)

device_initcall(time_init_device);

-#ifdef CONFIG_HPET_EMULATE_RTC
-/* HPET in LegacyReplacement Mode eats up RTC interrupt line. When, HPET
- * is enabled, we support RTC interrupt functionality in software.
- * RTC has 3 kinds of interrupts:
- * 1) Update Interrupt - generate an interrupt, every sec, when RTC clock
- * is updated
- * 2) Alarm Interrupt - generate an interrupt at a specific time of day
- * 3) Periodic Interrupt - generate periodic interrupt, with frequencies
- * 2Hz-8192Hz (2Hz-64Hz for non-root user) (all freqs in powers of 2)
- * (1) and (2) above are implemented using polling at a frequency of
- * 64 Hz. The exact frequency is a tradeoff between accuracy and interrupt
- * overhead. (DEFAULT_RTC_INT_FREQ)
- * For (3), we use interrupts at 64Hz or user specified periodic
- * frequency, whichever is higher.
- */
-#include <linux/rtc.h>
-
-#define DEFAULT_RTC_INT_FREQ 64
-#define RTC_NUM_INTS 1
-
-static unsigned long UIE_on;
-static unsigned long prev_update_sec;
-
-static unsigned long AIE_on;
-static struct rtc_time alarm_time;
-
-static unsigned long PIE_on;
-static unsigned long PIE_freq = DEFAULT_RTC_INT_FREQ;
-static unsigned long PIE_count;
-
-static unsigned long hpet_rtc_int_freq; /* RTC interrupt frequency */
-static unsigned int hpet_t1_cmp; /* cached comparator register */
-
-int is_hpet_enabled(void)
-{
- return hpet_address != 0;
-}
-
-/*
- * Timer 1 for RTC, we do not use periodic interrupt feature,
- * even if HPET supports periodic interrupts on Timer 1.
- * The reason being, to set up a periodic interrupt in HPET, we need to
- * stop the main counter. And if we do that everytime someone diables/enables
- * RTC, we will have adverse effect on main kernel timer running on Timer 0.
- * So, for the time being, simulate the periodic interrupt in software.
- *
- * hpet_rtc_timer_init() is called for the first time and during subsequent
- * interuppts reinit happens through hpet_rtc_timer_reinit().
- */
-int hpet_rtc_timer_init(void)
-{
- unsigned int cfg, cnt;
- unsigned long flags;
-
- if (!is_hpet_enabled())
- return 0;
- /*
- * Set the counter 1 and enable the interrupts.
- */
- if (PIE_on && (PIE_freq > DEFAULT_RTC_INT_FREQ))
- hpet_rtc_int_freq = PIE_freq;
- else
- hpet_rtc_int_freq = DEFAULT_RTC_INT_FREQ;
-
- local_irq_save(flags);
-
- cnt = hpet_readl(HPET_COUNTER);
- cnt += ((hpet_tick*HZ)/hpet_rtc_int_freq);
- hpet_writel(cnt, HPET_T1_CMP);
- hpet_t1_cmp = cnt;
-
- cfg = hpet_readl(HPET_T1_CFG);
- cfg &= ~HPET_TN_PERIODIC;
- cfg |= HPET_TN_ENABLE | HPET_TN_32BIT;
- hpet_writel(cfg, HPET_T1_CFG);
-
- local_irq_restore(flags);
-
- return 1;
-}
-
-static void hpet_rtc_timer_reinit(void)
-{
- unsigned int cfg, cnt, ticks_per_int, lost_ints;
-
- if (unlikely(!(PIE_on | AIE_on | UIE_on))) {
- cfg = hpet_readl(HPET_T1_CFG);
- cfg &= ~HPET_TN_ENABLE;
- hpet_writel(cfg, HPET_T1_CFG);
- return;
- }
-
- if (PIE_on && (PIE_freq > DEFAULT_RTC_INT_FREQ))
- hpet_rtc_int_freq = PIE_freq;
- else
- hpet_rtc_int_freq = DEFAULT_RTC_INT_FREQ;
-
- /* It is more accurate to use the comparator value than current count.*/
- ticks_per_int = hpet_tick * HZ / hpet_rtc_int_freq;
- hpet_t1_cmp += ticks_per_int;
- hpet_writel(hpet_t1_cmp, HPET_T1_CMP);
-
- /*
- * If the interrupt handler was delayed too long, the write above tries
- * to schedule the next interrupt in the past and the hardware would
- * not interrupt until the counter had wrapped around.
- * So we have to check that the comparator wasn't set to a past time.
- */
- cnt = hpet_readl(HPET_COUNTER);
- if (unlikely((int)(cnt - hpet_t1_cmp) > 0)) {
- lost_ints = (cnt - hpet_t1_cmp) / ticks_per_int + 1;
- /* Make sure that, even with the time needed to execute
- * this code, the next scheduled interrupt has been moved
- * back to the future: */
- lost_ints++;
-
- hpet_t1_cmp += lost_ints * ticks_per_int;
- hpet_writel(hpet_t1_cmp, HPET_T1_CMP);
-
- if (PIE_on)
- PIE_count += lost_ints;
-
- printk(KERN_WARNING "rtc: lost some interrupts at %ldHz.\n",
- hpet_rtc_int_freq);
- }
-}
-
-/*
- * The functions below are called from rtc driver.
- * Return 0 if HPET is not being used.
- * Otherwise do the necessary changes and return 1.
- */
-int hpet_mask_rtc_irq_bit(unsigned long bit_mask)
-{
- if (!is_hpet_enabled())
- return 0;
-
- if (bit_mask & RTC_UIE)
- UIE_on = 0;
- if (bit_mask & RTC_PIE)
- PIE_on = 0;
- if (bit_mask & RTC_AIE)
- AIE_on = 0;
-
- return 1;
-}
-
-int hpet_set_rtc_irq_bit(unsigned long bit_mask)
-{
- int timer_init_reqd = 0;
-
- if (!is_hpet_enabled())
- return 0;
-
- if (!(PIE_on | AIE_on | UIE_on))
- timer_init_reqd = 1;
-
- if (bit_mask & RTC_UIE) {
- UIE_on = 1;
- }
- if (bit_mask & RTC_PIE) {
- PIE_on = 1;
- PIE_count = 0;
- }
- if (bit_mask & RTC_AIE) {
- AIE_on = 1;
- }
-
- if (timer_init_reqd)
- hpet_rtc_timer_init();
-
- return 1;
-}
-
-int hpet_set_alarm_time(unsigned char hrs, unsigned char min, unsigned char sec)
-{
- if (!is_hpet_enabled())
- return 0;
-
- alarm_time.tm_hour = hrs;
- alarm_time.tm_min = min;
- alarm_time.tm_sec = sec;
-
- return 1;
-}
-
-int hpet_set_periodic_freq(unsigned long freq)
-{
- if (!is_hpet_enabled())
- return 0;
-
- PIE_freq = freq;
- PIE_count = 0;
-
- return 1;
-}
-
-int hpet_rtc_dropped_irq(void)
-{
- if (!is_hpet_enabled())
- return 0;
-
- return 1;
-}
-
-irqreturn_t hpet_rtc_interrupt(int irq, void *dev_id, struct pt_regs *regs)
-{
- struct rtc_time curr_time;
- unsigned long rtc_int_flag = 0;
- int call_rtc_interrupt = 0;
-
- hpet_rtc_timer_reinit();
-
- if (UIE_on | AIE_on) {
- rtc_get_rtc_time(&curr_time);
- }
- if (UIE_on) {
- if (curr_time.tm_sec != prev_update_sec) {
- /* Set update int info, call real rtc int routine */
- call_rtc_interrupt = 1;
- rtc_int_flag = RTC_UF;
- prev_update_sec = curr_time.tm_sec;
- }
- }
- if (PIE_on) {
- PIE_count++;
- if (PIE_count >= hpet_rtc_int_freq/PIE_freq) {
- /* Set periodic int info, call real rtc int routine */
- call_rtc_interrupt = 1;
- rtc_int_flag |= RTC_PF;
- PIE_count = 0;
- }
- }
- if (AIE_on) {
- if ((curr_time.tm_sec == alarm_time.tm_sec) &&
- (curr_time.tm_min == alarm_time.tm_min) &&
- (curr_time.tm_hour == alarm_time.tm_hour)) {
- /* Set alarm int info, call real rtc int routine */
- call_rtc_interrupt = 1;
- rtc_int_flag |= RTC_AF;
- }
- }
- if (call_rtc_interrupt) {
- rtc_int_flag |= (RTC_IRQF | (RTC_NUM_INTS << 8));
- rtc_interrupt(rtc_int_flag, dev_id);
- }
- return IRQ_HANDLED;
-}
-#endif
-
-static int __init nohpet_setup(char *s)
-{
- nohpet = 1;
- return 1;
-}
-
-__setup("nohpet", nohpet_setup);
-
-int __init notsc_setup(char *s)
-{
- notsc = 1;
- return 1;
-}
-
-__setup("notsc", notsc_setup);
diff --git a/arch/x86_64/kernel/tsc.c b/arch/x86_64/kernel/tsc.c
new file mode 100644
index 0000000..977d1b2
--- /dev/null
+++ b/arch/x86_64/kernel/tsc.c
@@ -0,0 +1,201 @@
+#include <linux/kernel.h>
+#include <linux/sched.h>
+#include <linux/interrupt.h>
+#include <linux/init.h>
+#include <linux/clocksource.h>
+#include <linux/time.h>
+#include <linux/acpi.h>
+#include <linux/cpufreq.h>
+
+#include <asm/timex.h>
+
+int notsc __initdata = 0;
+
+unsigned int cpu_khz; /* TSC clocks / usec, not used here */
+EXPORT_SYMBOL(cpu_khz);
+
+/*
+ * do_gettimeoffset() returns microseconds since last timer interrupt was
+ * triggered by hardware. A memory read of HPET is slower than a register read
+ * of TSC, but much more reliable. It's also synchronized to the timer
+ * interrupt. Note that do_gettimeoffset() may return more than hpet_tick, if a
+ * timer interrupt has happened already, but vxtime.trigger wasn't updated yet.
+ * This is not a problem, because jiffies hasn't updated either. They are bound
+ * together by xtime_lock.
+ */
+
+unsigned int do_gettimeoffset_tsc(void)
+{
+ unsigned long t;
+ unsigned long x;
+ t = get_cycles_sync();
+ if (t < vxtime.last_tsc)
+ t = vxtime.last_tsc; /* hack */
+ x = ((t - vxtime.last_tsc) * vxtime.tsc_quot) >> US_SCALE;
+ return x;
+}
+
+static unsigned int cyc2ns_scale __read_mostly;
+
+void set_cyc2ns_scale(unsigned long khz)
+{
+ cyc2ns_scale = (NSEC_PER_MSEC << NS_SCALE) / khz;
+}
+
+unsigned long long cycles_2_ns(unsigned long long cyc)
+{
+ return (cyc * cyc2ns_scale) >> NS_SCALE;
+}
+
+unsigned long long sched_clock(void)
+{
+ unsigned long a = 0;
+
+ /* Could do CPU core sync here. Opteron can execute rdtsc speculatively,
+ * which means it is not completely exact and may not be monotonic
+ * between CPUs. But the errors should be too small to matter for
+ * scheduling purposes.
+ */
+
+ rdtscll(a);
+ return cycles_2_ns(a);
+}
+
+#ifdef CONFIG_CPU_FREQ
+
+/* Frequency scaling support. Adjust the TSC based timer when the cpu frequency
+ * changes.
+ *
+ * RED-PEN: On SMP we assume all CPUs run with the same frequency. It's
+ * not that important because current Opteron setups do not support
+ * scaling on SMP anyroads.
+ *
+ * Should fix up last_tsc too. Currently gettimeofday in the
+ * first tick after the change will be slightly wrong.
+ */
+
+#include <linux/workqueue.h>
+
+static unsigned int cpufreq_delayed_issched = 0;
+static unsigned int cpufreq_init = 0;
+static struct work_struct cpufreq_delayed_get_work;
+
+static void handle_cpufreq_delayed_get(void *v)
+{
+ unsigned int cpu;
+ for_each_online_cpu(cpu) {
+ cpufreq_get(cpu);
+ }
+ cpufreq_delayed_issched = 0;
+}
+
+/* if we notice lost ticks, schedule a call to cpufreq_get() as it tries
+ * to verify the CPU frequency the timing core thinks the CPU is running
+ * at is still correct.
+ */
+void cpufreq_delayed_get(void)
+{
+ static int warned;
+ if (cpufreq_init && !cpufreq_delayed_issched) {
+ cpufreq_delayed_issched = 1;
+ if (!warned) {
+ warned = 1;
+ printk(KERN_DEBUG "Losing some ticks... "
+ "checking if CPU frequency changed.\n");
+ }
+ schedule_work(&cpufreq_delayed_get_work);
+ }
+}
+
+static unsigned int ref_freq = 0;
+static unsigned long loops_per_jiffy_ref = 0;
+
+static unsigned long cpu_khz_ref = 0;
+
+static int time_cpufreq_notifier(struct notifier_block *nb, unsigned long val,
+ void *data)
+{
+ struct cpufreq_freqs *freq = data;
+ unsigned long *lpj, dummy;
+
+ if (cpu_has(&cpu_data[freq->cpu], X86_FEATURE_CONSTANT_TSC))
+ return 0;
+
+ lpj = &dummy;
+ if (!(freq->flags & CPUFREQ_CONST_LOOPS))
+#ifdef CONFIG_SMP
+ lpj = &cpu_data[freq->cpu].loops_per_jiffy;
+#else
+ lpj = &boot_cpu_data.loops_per_jiffy;
+#endif
+
+ if (!ref_freq) {
+ ref_freq = freq->old;
+ loops_per_jiffy_ref = *lpj;
+ cpu_khz_ref = cpu_khz;
+ }
+ if ((val == CPUFREQ_PRECHANGE && freq->old < freq->new) ||
+ (val == CPUFREQ_POSTCHANGE && freq->old > freq->new) ||
+ (val == CPUFREQ_RESUMECHANGE)) {
+ *lpj =
+ cpufreq_scale(loops_per_jiffy_ref, ref_freq, freq->new);
+
+ cpu_khz = cpufreq_scale(cpu_khz_ref, ref_freq, freq->new);
+ if (!(freq->flags & CPUFREQ_CONST_LOOPS))
+ vxtime.tsc_quot = (USEC_PER_MSEC << US_SCALE) / cpu_khz;
+ }
+
+ set_cyc2ns_scale(cpu_khz_ref);
+
+ return 0;
+}
+
+static struct notifier_block time_cpufreq_notifier_block = {
+ .notifier_call = time_cpufreq_notifier
+};
+
+static int __init cpufreq_tsc(void)
+{
+ INIT_WORK(&cpufreq_delayed_get_work, handle_cpufreq_delayed_get, NULL);
+ if (!cpufreq_register_notifier(&time_cpufreq_notifier_block,
+ CPUFREQ_TRANSITION_NOTIFIER))
+ cpufreq_init = 1;
+ return 0;
+}
+
+core_initcall(cpufreq_tsc);
+
+#endif
+
+/*
+ * Make an educated guess if the TSC is trustworthy and synchronized
+ * over all CPUs.
+ */
+__cpuinit int unsynchronized_tsc(void)
+{
+#ifdef CONFIG_SMP
+ if (apic_is_clustered_box())
+ return 1;
+#endif
+ /* Most intel systems have synchronized TSCs except for
+ multi node systems */
+ if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL) {
+#ifdef CONFIG_ACPI
+ /* But TSC doesn't tick in C3 so don't use it there */
+ if (acpi_fadt.length > 0 && acpi_fadt.plvl3_lat < 1000)
+ return 1;
+#endif
+ return 0;
+ }
+
+ /* Assume multi socket systems are not synchronized */
+ return num_present_cpus() > 1;
+}
+
+int __init notsc_setup(char *s)
+{
+ notsc = 1;
+ return 1;
+}
+
+__setup("notsc", notsc_setup);
diff --git a/include/asm-x86_64/hpet.h b/include/asm-x86_64/hpet.h
index 60d5127..59a66f0 100644
--- a/include/asm-x86_64/hpet.h
+++ b/include/asm-x86_64/hpet.h
@@ -56,9 +56,15 @@ #define HPET_TICK_RATE (HZ * 100000UL)
extern int is_hpet_enabled(void);
extern int hpet_rtc_timer_init(void);
extern int apic_is_clustered_box(void);
+extern int hpet_arch_init(void);
+extern int hpet_timer_stop_set_go(unsigned long tick);
+extern int hpet_reenable(void);
+extern unsigned int hpet_calibrate_tsc(void);

extern int hpet_use_timer;
extern unsigned long hpet_address;
+extern unsigned long hpet_period;
+extern unsigned long hpet_tick;

#ifdef CONFIG_HPET_EMULATE_RTC
extern int hpet_mask_rtc_irq_bit(unsigned long bit_mask);
diff --git a/include/asm-x86_64/timex.h b/include/asm-x86_64/timex.h
index b9e5320..adb8c0d 100644
--- a/include/asm-x86_64/timex.h
+++ b/include/asm-x86_64/timex.h
@@ -44,6 +44,17 @@ extern unsigned int cpu_khz;
extern int read_current_timer(unsigned long *timer_value);
#define ARCH_HAS_READ_CURRENT_TIMER 1

+#define USEC_PER_TICK (USEC_PER_SEC / HZ)
+#define NSEC_PER_TICK (NSEC_PER_SEC / HZ)
+#define FSEC_PER_TICK (FSEC_PER_SEC / HZ)
+
+#define NS_SCALE 10 /* 2^10, carefully chosen */
+#define US_SCALE 32 /* 2^32, arbitrarily chosen */
+
extern struct vxtime_data vxtime;

+extern unsigned int do_gettimeoffset_hpet(void);
+extern unsigned int do_gettimeoffset_tsc(void);
+extern void set_cyc2ns_scale(unsigned long khz);
+extern int notsc;
#endif
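
A side note for reviewers, since the moved gettimeoffset functions
depend on it: vxtime.tsc_quot holds microseconds-per-cycle scaled up by
2^32 (US_SCALE), so elapsed microseconds come out of one multiply and
one shift, with no divide in the hot path. A standalone userspace
sketch of the arithmetic (the 2 GHz cpu_khz is just an assumed example;
this relies on 64-bit longs, as on x86_64):

	#include <stdio.h>

	#define US_SCALE	32	/* 2^32, as in timex.h above */
	#define USEC_PER_MSEC	1000UL

	int main(void)
	{
		unsigned long cpu_khz = 2000000;	/* assume a 2 GHz TSC */
		/* usec per cycle as a 32.32 fixed-point fraction,
		 * matching vxtime.tsc_quot above */
		unsigned long tsc_quot = (USEC_PER_MSEC << US_SCALE) / cpu_khz;
		unsigned long cycles = 2000000;		/* one 1ms tick's worth */

		/* multiply, then shift the fraction back out: prints 999
		 * (truncation in tsc_quot costs a fraction of a usec) */
		printf("%lu usec\n", (cycles * tsc_quot) >> US_SCALE);
		return 0;
	}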

2006-11-29 03:00:46

by john stultz

Subject: [PATCH 2/5][time][x86_64] hpet_address cleanup

In preparation for supporting generic timekeeping, this patch cleans up
x86-64's use of vxtime.hpet_address, changing it to just hpet_address
as is also used in i386. This is necessary since the vxtime structure
will be going away.

Signed-off-by: John Stultz <[email protected]>


arch/i386/kernel/acpi/boot.c | 23 ++++++-----------------
arch/x86_64/kernel/apic.c | 3 ++-
arch/x86_64/kernel/time.c | 36 +++++++++++++++++++-----------------
include/asm-x86_64/hpet.h | 1 +
4 files changed, 28 insertions(+), 35 deletions(-)

linux-2.6.19-rc6git11_timeofday-arch-x86-64-hpet-address-cleanup_C7.patch
============================================
diff --git a/arch/i386/kernel/acpi/boot.c b/arch/i386/kernel/acpi/boot.c
index d12fb97..b9e9f17 100644
--- a/arch/i386/kernel/acpi/boot.c
+++ b/arch/i386/kernel/acpi/boot.c
@@ -638,6 +638,7 @@ static int __init acpi_parse_sbf(unsigne
}

#ifdef CONFIG_HPET_TIMER
+#include <asm/hpet.h>

static int __init acpi_parse_hpet(unsigned long phys, unsigned long size)
{
@@ -671,32 +672,20 @@ #define HPET_RESOURCE_NAME_SIZE 9
hpet_res->end = (1 * 1024) - 1;
}

+ hpet_address = hpet_tbl->addr.addrl;
#ifdef CONFIG_X86_64
- vxtime.hpet_address = hpet_tbl->addr.addrl |
- ((long)hpet_tbl->addr.addrh << 32);
-
+ hpet_address |= ((long)hpet_tbl->addr.addrh << 32);
+#endif
printk(KERN_INFO PREFIX "HPET id: %#x base: %#lx\n",
- hpet_tbl->id, vxtime.hpet_address);
-
- res_start = vxtime.hpet_address;
-#else /* X86 */
- {
- extern unsigned long hpet_address;
+ hpet_tbl->id, hpet_address);

- hpet_address = hpet_tbl->addr.addrl;
- printk(KERN_INFO PREFIX "HPET id: %#x base: %#lx\n",
- hpet_tbl->id, hpet_address);
-
- res_start = hpet_address;
- }
-#endif /* X86 */
+ res_start = hpet_address;

if (hpet_res) {
hpet_res->start = res_start;
hpet_res->end += res_start;
insert_resource(&iomem_resource, hpet_res);
}
-
return 0;
}
#else
diff --git a/arch/x86_64/kernel/apic.c b/arch/x86_64/kernel/apic.c
index 4d9d5ed..02f5961 100644
--- a/arch/x86_64/kernel/apic.c
+++ b/arch/x86_64/kernel/apic.c
@@ -36,6 +36,7 @@ #include <asm/nmi.h>
#include <asm/idle.h>
#include <asm/proto.h>
#include <asm/timex.h>
+#include <asm/hpet.h>
#include <asm/apic.h>

int apic_mapped;
@@ -673,7 +674,7 @@ static void setup_APIC_timer(unsigned in
local_irq_save(flags);

/* wait for irq slice */
- if (vxtime.hpet_address && hpet_use_timer) {
+ if (hpet_address && hpet_use_timer) {
int trigger = hpet_readl(HPET_T0_CMP);
while (hpet_readl(HPET_COUNTER) >= trigger)
/* do nothing */ ;
diff --git a/arch/x86_64/kernel/time.c b/arch/x86_64/kernel/time.c
index e3ef544..a6820e0 100644
--- a/arch/x86_64/kernel/time.c
+++ b/arch/x86_64/kernel/time.c
@@ -67,6 +67,7 @@ #define US_SCALE 32 /* 2^32, arbitralril

unsigned int cpu_khz; /* TSC clocks / usec, not used here */
EXPORT_SYMBOL(cpu_khz);
+unsigned long hpet_address;
static unsigned long hpet_period; /* fsecs / HPET clock */
unsigned long hpet_tick; /* HPET clocks / interrupt */
int hpet_use_timer; /* Use counter of hpet for time keeping, otherwise PIT */
@@ -316,7 +317,7 @@ static noinline void handle_lost_ticks(i
KERN_WARNING "Your time source seems to be instable or "
"some driver is hogging interupts\n");
print_symbol("rip %s\n", get_irq_regs()->rip);
- if (vxtime.mode == VXTIME_TSC && vxtime.hpet_address) {
+ if (vxtime.mode == VXTIME_TSC && hpet_address) {
printk(KERN_WARNING "Falling back to HPET\n");
if (hpet_use_timer)
vxtime.last = hpet_readl(HPET_T0_CMP) -
@@ -324,6 +325,7 @@ static noinline void handle_lost_ticks(i
else
vxtime.last = hpet_readl(HPET_COUNTER);
vxtime.mode = VXTIME_HPET;
+ vxtime.hpet_address = hpet_address;
do_gettimeoffset = do_gettimeoffset_hpet;
}
/* else should fall back to PIT, but code missing. */
@@ -354,7 +356,7 @@ void main_timer_handler(void)

write_seqlock(&xtime_lock);

- if (vxtime.hpet_address)
+ if (hpet_address)
offset = hpet_readl(HPET_COUNTER);

if (hpet_use_timer) {
@@ -717,7 +719,7 @@ static __init int late_hpet_init(void)
struct hpet_data hd;
unsigned int ntimer;

- if (!vxtime.hpet_address)
+ if (!hpet_address)
return 0;

memset(&hd, 0, sizeof (hd));
@@ -730,7 +732,7 @@ static __init int late_hpet_init(void)
* Register with driver.
* Timer0 and Timer1 is used by platform.
*/
- hd.hd_phys_address = vxtime.hpet_address;
+ hd.hd_phys_address = hpet_address;
hd.hd_address = (void __iomem *)fix_to_virt(FIX_HPET_BASE);
hd.hd_nirqs = ntimer;
hd.hd_flags = HPET_DATA_PLATFORM;
@@ -799,10 +801,10 @@ static int hpet_init(void)
{
unsigned int id;

- if (!vxtime.hpet_address)
+ if (!hpet_address)
return -1;
- set_fixmap_nocache(FIX_HPET_BASE, vxtime.hpet_address);
- __set_fixmap(VSYSCALL_HPET, vxtime.hpet_address, PAGE_KERNEL_VSYSCALL_NOCACHE);
+ set_fixmap_nocache(FIX_HPET_BASE, hpet_address);
+ __set_fixmap(VSYSCALL_HPET, hpet_address, PAGE_KERNEL_VSYSCALL_NOCACHE);

/*
* Read the period, compute tick and quotient.
@@ -856,7 +858,7 @@ void __init pit_stop_interrupt(void)
void __init stop_timer_interrupt(void)
{
char *name;
- if (vxtime.hpet_address) {
+ if (hpet_address) {
name = "HPET";
hpet_timer_stop_set_go(0);
} else {
@@ -879,8 +881,7 @@ static struct irqaction irq0 = {
void __init time_init(void)
{
if (nohpet)
- vxtime.hpet_address = 0;
-
+ hpet_address = 0;
xtime.tv_sec = get_cmos_time();
xtime.tv_nsec = 0;

@@ -890,7 +891,7 @@ void __init time_init(void)
if (!hpet_init())
vxtime_hz = (FSEC_PER_SEC + hpet_period / 2) / hpet_period;
else
- vxtime.hpet_address = 0;
+ hpet_address = 0;

if (hpet_use_timer) {
/* set tick_nsec to use the proper rate for HPET */
@@ -898,7 +899,7 @@ void __init time_init(void)
cpu_khz = hpet_calibrate_tsc();
timename = "HPET";
#ifdef CONFIG_X86_PM_TIMER
- } else if (pmtmr_ioport && !vxtime.hpet_address) {
+ } else if (pmtmr_ioport && !hpet_address) {
vxtime_hz = PM_TIMER_FREQUENCY;
timename = "PM";
pit_init();
@@ -957,23 +958,24 @@ void time_init_gtod(void)
if (unsynchronized_tsc())
notsc = 1;

- if (cpu_has(&boot_cpu_data, X86_FEATURE_RDTSCP))
+ if (cpu_has(&boot_cpu_data, X86_FEATURE_RDTSCP))
vgetcpu_mode = VGETCPU_RDTSCP;
else
vgetcpu_mode = VGETCPU_LSL;

- if (vxtime.hpet_address && notsc) {
+ if (hpet_address && notsc) {
timetype = hpet_use_timer ? "HPET" : "PIT/HPET";
if (hpet_use_timer)
vxtime.last = hpet_readl(HPET_T0_CMP) - hpet_tick;
else
vxtime.last = hpet_readl(HPET_COUNTER);
vxtime.mode = VXTIME_HPET;
+ vxtime.hpet_address = hpet_address;
do_gettimeoffset = do_gettimeoffset_hpet;
#ifdef CONFIG_X86_PM_TIMER
/* Using PM for gettimeofday is quite slow, but we have no other
choice because the TSC is too unreliable on some systems. */
- } else if (pmtmr_ioport && !vxtime.hpet_address && notsc) {
+ } else if (pmtmr_ioport && !hpet_address && notsc) {
timetype = "PM";
do_gettimeoffset = do_gettimeoffset_pm;
vxtime.mode = VXTIME_PMTMR;
@@ -1033,7 +1035,7 @@ static int timer_resume(struct sys_devic
sleep_length = 0;
ctime = sleep_start;
}
- if (vxtime.hpet_address)
+ if (hpet_address)
hpet_reenable();
else
i8254_timer_resume();
@@ -1117,7 +1119,7 @@ static unsigned int hpet_t1_cmp; /* cach

int is_hpet_enabled(void)
{
- return vxtime.hpet_address != 0;
+ return hpet_address != 0;
}

/*
diff --git a/include/asm-x86_64/hpet.h b/include/asm-x86_64/hpet.h
index b390984..60d5127 100644
--- a/include/asm-x86_64/hpet.h
+++ b/include/asm-x86_64/hpet.h
@@ -58,6 +58,7 @@ extern int hpet_rtc_timer_init(void);
extern int apic_is_clustered_box(void);

extern int hpet_use_timer;
+extern unsigned long hpet_address;

#ifdef CONFIG_HPET_EMULATE_RTC
extern int hpet_mask_rtc_irq_bit(unsigned long bit_mask);

2006-11-29 03:00:27

by john stultz

Subject: [PATCH 1/5][time][Generic] vsyscall-gtod support for GENERIC_TIME

This patch provides the generic infrastructure for vsyscall-gtod: a
vread() hook in the clocksource structure and an update_vsyscall() call
from the timekeeping core.

Signed-off-by: John Stultz <[email protected]>

include/linux/clocksource.h | 8 ++++++++
kernel/timer.c | 1 +
2 files changed, 9 insertions(+)

linux-2.6.19-rc6git11_timeofday-vsyscall-support_C7.patch
============================================
diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
index d852024..62a600d 100644
--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
@@ -46,6 +46,7 @@ typedef u64 cycle_t;
* @shift: cycle to nanosecond divisor (power of two)
* @update_callback: called when safe to alter clocksource values
* @is_continuous: defines if clocksource is free-running.
+ * @vread: vsyscall based read
* @cycle_interval: Used internally by timekeeping core, please ignore.
* @xtime_interval: Used internally by timekeeping core, please ignore.
*/
@@ -59,6 +60,7 @@ struct clocksource {
u32 shift;
int (*update_callback)(void);
int is_continuous;
+ cycle_t (*vread)(void);

/* timekeeping specific data, ignore */
cycle_t cycle_last, cycle_interval;
@@ -182,4 +184,10 @@ int clocksource_register(struct clocksou
void clocksource_reselect(void);
struct clocksource* clocksource_get_next(void);

+#ifdef CONFIG_GENERIC_TIME_VSYSCALL
+extern void update_vsyscall(struct timespec *ts, struct clocksource *c);
+#else
+#define update_vsyscall(now, c) do { } while(0)
+#endif
+
#endif /* _LINUX_CLOCKSOURCE_H */
diff --git a/kernel/timer.c b/kernel/timer.c
index c1c7fbc..38fd4a7 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -956,6 +956,7 @@ #endif
clock->xtime_nsec = 0;
clocksource_calculate_interval(clock, tick_nsec);
}
+ update_vsyscall(&xtime, clock);
}

/*
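
To make the new hooks concrete: an architecture selecting
GENERIC_TIME_VSYSCALL supplies a user-executable vread() in its
clocksource and an update_vsyscall() that snapshots the timekeeping
state somewhere the vsyscall page can read. A rough sketch of that arch
side follows; all names here (vsyscall_gtod_data and friends) are made
up for illustration, and the real x86_64 wiring is not part of this
patch:

	#include <linux/clocksource.h>
	#include <linux/seqlock.h>
	#include <linux/time.h>
	#include <asm/timex.h>

	/* State the userspace vsyscall gettimeofday() reads locklessly.
	 * A real implementation puts this in a user-visible section. */
	static struct {
		seqlock_t		lock;
		struct clocksource	*clock;
		time_t			wall_time_sec;
		long			wall_time_nsec;
	} vsyscall_gtod_data = { .lock = SEQLOCK_UNLOCKED };

	/* Clocksource side: a read routine userspace may execute
	 * directly, e.g. a bare TSC read; hooked up via .vread */
	static cycle_t vread_tsc(void)
	{
		return (cycle_t)get_cycles_sync();
	}

	/* Called by the timekeeping core (see the kernel/timer.c hunk
	 * above) under xtime_lock when the timekeeping state changes. */
	void update_vsyscall(struct timespec *wall_time, struct clocksource *c)
	{
		write_seqlock(&vsyscall_gtod_data.lock);
		vsyscall_gtod_data.clock	  = c;
		vsyscall_gtod_data.wall_time_sec  = wall_time->tv_sec;
		vsyscall_gtod_data.wall_time_nsec = wall_time->tv_nsec;
		write_sequnlock(&vsyscall_gtod_data.lock);
	}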

2006-11-29 03:01:28

by john stultz

[permalink] [raw]
Subject: [PATCH 4/5][time][x86_64] Convert x86_64 to use GENERIC_TIME

This patch converts x86_64 to use the GENERIC_TIME infrastructure and
adds clocksource structures for both TSC and HPET (ACPI PM is shared w/
i386).
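
For anyone checking the mult/shift math in the new
init_hpet_clocksource() below, here is a standalone sketch with the
common 14.31818 MHz HPET plugged in (hpet_period = 69841279 fsec per
cycle; the value is only an example, the kernel reads it from the
HPET_PERIOD register):

	#include <stdio.h>
	#include <stdint.h>

	#define HPET_SHIFT	22
	#define FSEC_PER_NSEC	1000000

	int main(void)
	{
		uint64_t hpet_period = 69841279; /* fsec/cycle, ~14.31818 MHz */
		uint32_t mult = (uint32_t)((hpet_period << HPET_SHIFT) /
					   FSEC_PER_NSEC);
		uint64_t cycles = 14318180;	 /* one second's worth */

		/* nsec = (cycles * mult) >> shift; prints ~1000000000 */
		printf("mult=%u -> 1s of cycles = %llu ns\n", mult,
		       (unsigned long long)((cycles * mult) >> HPET_SHIFT));
		return 0;
	}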

Signed-off-by: John Stultz <[email protected]>

arch/x86_64/Kconfig | 4
arch/x86_64/kernel/apic.c | 2
arch/x86_64/kernel/hpet.c | 65 ++++++++-
arch/x86_64/kernel/pmtimer.c | 58 --------
arch/x86_64/kernel/smpboot.c | 1
arch/x86_64/kernel/time.c | 301 -------------------------------------------
arch/x86_64/kernel/tsc.c | 105 +++++++++------
include/asm-x86_64/proto.h | 1
include/asm-x86_64/timex.h | 5
9 files changed, 133 insertions(+), 409 deletions(-)

linux-2.6.19-rc6git11_timeofday-arch-x86-64-generic-time-conversion_C7.patch
============================================
diff --git a/arch/x86_64/Kconfig b/arch/x86_64/Kconfig
index 010d226..20bcd6d 100644
--- a/arch/x86_64/Kconfig
+++ b/arch/x86_64/Kconfig
@@ -24,6 +24,10 @@ config X86
bool
default y

+config GENERIC_TIME
+ bool
+ default y
+
config ZONE_DMA32
bool
default y
diff --git a/arch/x86_64/kernel/apic.c b/arch/x86_64/kernel/apic.c
index 02f5961..588ef3d 100644
--- a/arch/x86_64/kernel/apic.c
+++ b/arch/x86_64/kernel/apic.c
@@ -696,7 +696,7 @@ static void setup_APIC_timer(unsigned in
/* Turn off PIT interrupt if we use APIC timer as main timer.
Only works with the PM timer right now
TBD fix it for HPET too. */
- if (vxtime.mode == VXTIME_PMTMR &&
+ if ((pmtmr_ioport != 0) &&
smp_processor_id() == boot_cpu_id &&
apic_runs_main_timer == 1 &&
!cpu_isset(boot_cpu_id, timer_interrupt_broadcast_ipi_mask)) {
diff --git a/arch/x86_64/kernel/hpet.c b/arch/x86_64/kernel/hpet.c
index a219786..c00b01a 100644
--- a/arch/x86_64/kernel/hpet.c
+++ b/arch/x86_64/kernel/hpet.c
@@ -19,12 +19,6 @@ unsigned long hpet_tick; /* HPET clocks
int hpet_use_timer; /* Use counter of hpet for time keeping,
* otherwise PIT
*/
-unsigned int do_gettimeoffset_hpet(void)
-{
- /* cap counter read to one tick to avoid inconsistencies */
- unsigned long counter = hpet_readl(HPET_COUNTER) - vxtime.last;
- return (min(counter,hpet_tick) * vxtime.quot) >> US_SCALE;
-}

#ifdef CONFIG_HPET
static __init int late_hpet_init(void)
@@ -433,3 +427,62 @@ static int __init nohpet_setup(char *s)

__setup("nohpet", nohpet_setup);

+#define HPET_MASK 0xFFFFFFFF
+#define HPET_SHIFT 22
+
+/* FSEC = 10^-15 NSEC = 10^-9 */
+#define FSEC_PER_NSEC 1000000
+
+static void *hpet_ptr;
+
+static cycle_t read_hpet(void)
+{
+ return (cycle_t)readl(hpet_ptr);
+}
+
+struct clocksource clocksource_hpet = {
+ .name = "hpet",
+ .rating = 250,
+ .read = read_hpet,
+ .mask = (cycle_t)HPET_MASK,
+ .mult = 0, /* set below */
+ .shift = HPET_SHIFT,
+ .is_continuous = 1,
+};
+
+static int __init init_hpet_clocksource(void)
+{
+ unsigned long hpet_period;
+ void __iomem *hpet_base;
+ u64 tmp;
+
+ if (!hpet_address)
+ return -ENODEV;
+
+ /* calculate the hpet address: */
+ hpet_base =
+ (void __iomem*)ioremap_nocache(hpet_address, HPET_MMAP_SIZE);
+ hpet_ptr = hpet_base + HPET_COUNTER;
+
+ /* calculate the frequency: */
+ hpet_period = readl(hpet_base + HPET_PERIOD);
+
+ /*
+ * hpet period is in femtoseconds per cycle,
+ * so we need to convert this to ns/cyc units
+ * approximated by mult/2^shift
+ *
+ * fsec/cyc * 1nsec/1000000fsec = nsec/cyc = mult/2^shift
+ * fsec/cyc * 1ns/1000000fsec * 2^shift = mult
+ * fsec/cyc * 2^shift * 1nsec/1000000fsec = mult
+ * (fsec/cyc << shift)/1000000 = mult
+ * (hpet_period << shift)/FSEC_PER_NSEC = mult
+ */
+ tmp = (u64)hpet_period << HPET_SHIFT;
+ do_div(tmp, FSEC_PER_NSEC);
+ clocksource_hpet.mult = (u32)tmp;
+
+ return clocksource_register(&clocksource_hpet);
+}
+
+module_init(init_hpet_clocksource);
diff --git a/arch/x86_64/kernel/pmtimer.c b/arch/x86_64/kernel/pmtimer.c
index 7554458..ae8f912 100644
--- a/arch/x86_64/kernel/pmtimer.c
+++ b/arch/x86_64/kernel/pmtimer.c
@@ -24,15 +24,6 @@ #include <asm/proto.h>
#include <asm/msr.h>
#include <asm/vsyscall.h>

-/* The I/O port the PMTMR resides at.
- * The location is detected during setup_arch(),
- * in arch/i386/kernel/acpi/boot.c */
-u32 pmtmr_ioport __read_mostly;
-
-/* value of the Power timer at last timer interrupt */
-static u32 offset_delay;
-static u32 last_pmtmr_tick;
-
#define ACPI_PM_MASK 0xFFFFFF /* limit it to 24 bits */

static inline u32 cyc2us(u32 cycles)
@@ -48,38 +39,6 @@ static inline u32 cyc2us(u32 cycles)
return (cycles >> 10);
}

-int pmtimer_mark_offset(void)
-{
- static int first_run = 1;
- unsigned long tsc;
- u32 lost;
-
- u32 tick = inl(pmtmr_ioport);
- u32 delta;
-
- delta = cyc2us((tick - last_pmtmr_tick) & ACPI_PM_MASK);
-
- last_pmtmr_tick = tick;
- monotonic_base += delta * NSEC_PER_USEC;
-
- delta += offset_delay;
-
- lost = delta / (USEC_PER_SEC / HZ);
- offset_delay = delta % (USEC_PER_SEC / HZ);
-
- rdtscll(tsc);
- vxtime.last_tsc = tsc - offset_delay * (u64)cpu_khz / 1000;
-
- /* don't calculate delay for first run,
- or if we've got less then a tick */
- if (first_run || (lost < 1)) {
- first_run = 0;
- offset_delay = 0;
- }
-
- return lost - 1;
-}
-
static unsigned pmtimer_wait_tick(void)
{
u32 a, b;
@@ -101,23 +60,6 @@ void pmtimer_wait(unsigned us)
} while (cyc2us(b - a) < us);
}

-void pmtimer_resume(void)
-{
- last_pmtmr_tick = inl(pmtmr_ioport);
-}
-
-unsigned int do_gettimeoffset_pm(void)
-{
- u32 now, offset, delta = 0;
-
- offset = last_pmtmr_tick;
- now = inl(pmtmr_ioport);
- delta = (now - offset) & ACPI_PM_MASK;
-
- return offset_delay + cyc2us(delta);
-}
-
-
static int __init nopmtimer_setup(char *s)
{
pmtmr_ioport = 0;
diff --git a/arch/x86_64/kernel/smpboot.c b/arch/x86_64/kernel/smpboot.c
index 62c2e74..6ccd235 100644
--- a/arch/x86_64/kernel/smpboot.c
+++ b/arch/x86_64/kernel/smpboot.c
@@ -1180,7 +1180,6 @@ void __init smp_cpus_done(unsigned int m
smp_cleanup_boot();
setup_ioapic_dest();
check_nmi_watchdog();
- time_init_gtod();
}

#ifdef CONFIG_HOTPLUG_CPU
diff --git a/arch/x86_64/kernel/time.c b/arch/x86_64/kernel/time.c
index 164ba5f..4bc737c 100644
--- a/arch/x86_64/kernel/time.c
+++ b/arch/x86_64/kernel/time.c
@@ -44,9 +44,6 @@ #include <linux/hpet.h>
#include <asm/apic.h>
#include <asm/hpet.h>

-#ifdef CONFIG_CPU_FREQ
-extern void cpufreq_delayed_get(void);
-#endif
extern void i8254_timer_resume(void);
extern int using_apic_timer;

@@ -57,8 +54,6 @@ EXPORT_SYMBOL(rtc_lock);
DEFINE_SPINLOCK(i8253_lock);

unsigned long vxtime_hz = PIT_TICK_RATE;
-int report_lost_ticks; /* command line option */
-unsigned long long monotonic_base;

struct vxtime_data __vxtime __section_vxtime; /* for vsyscalls */

@@ -66,76 +61,6 @@ volatile unsigned long __jiffies __secti
struct timespec __xtime __section_xtime;
struct timezone __sys_tz __section_sys_tz;

-unsigned int (*do_gettimeoffset)(void) = do_gettimeoffset_tsc;
-
-/*
- * This version of gettimeofday() has microsecond resolution and better than
- * microsecond precision, as we're using at least a 10 MHz (usually 14.31818
- * MHz) HPET timer.
- */
-
-void do_gettimeofday(struct timeval *tv)
-{
- unsigned long seq;
- unsigned int sec, usec;
-
- do {
- seq = read_seqbegin(&xtime_lock);
-
- sec = xtime.tv_sec;
- usec = xtime.tv_nsec / NSEC_PER_USEC;
-
- /* i386 does some correction here to keep the clock
- monotonous even when ntpd is fixing drift.
- But they didn't work for me, there is a non monotonic
- clock anyways with ntp.
- I dropped all corrections now until a real solution can
- be found. Note when you fix it here you need to do the same
- in arch/x86_64/kernel/vsyscall.c and export all needed
- variables in vmlinux.lds. -AK */
- usec += do_gettimeoffset();
-
- } while (read_seqretry(&xtime_lock, seq));
-
- tv->tv_sec = sec + usec / USEC_PER_SEC;
- tv->tv_usec = usec % USEC_PER_SEC;
-}
-
-EXPORT_SYMBOL(do_gettimeofday);
-
-/*
- * settimeofday() first undoes the correction that gettimeofday would do
- * on the time, and then saves it. This is ugly, but has been like this for
- * ages already.
- */
-
-int do_settimeofday(struct timespec *tv)
-{
- time_t wtm_sec, sec = tv->tv_sec;
- long wtm_nsec, nsec = tv->tv_nsec;
-
- if ((unsigned long)tv->tv_nsec >= NSEC_PER_SEC)
- return -EINVAL;
-
- write_seqlock_irq(&xtime_lock);
-
- nsec -= do_gettimeoffset() * NSEC_PER_USEC;
-
- wtm_sec = wall_to_monotonic.tv_sec + (xtime.tv_sec - sec);
- wtm_nsec = wall_to_monotonic.tv_nsec + (xtime.tv_nsec - nsec);
-
- set_normalized_timespec(&xtime, sec, nsec);
- set_normalized_timespec(&wall_to_monotonic, wtm_sec, wtm_nsec);
-
- ntp_clear();
-
- write_sequnlock_irq(&xtime_lock);
- clock_was_set();
- return 0;
-}
-
-EXPORT_SYMBOL(do_settimeofday);
-
unsigned long profile_pc(struct pt_regs *regs)
{
unsigned long pc = instruction_pointer(regs);
@@ -225,85 +150,9 @@ static void set_rtc_mmss(unsigned long n
}


-/* monotonic_clock(): returns # of nanoseconds passed since time_init()
- * Note: This function is required to return accurate
- * time even in the absence of multiple timer ticks.
- */
-extern unsigned long long cycles_2_ns(unsigned long long cyc);
-unsigned long long monotonic_clock(void)
-{
- unsigned long seq;
- u32 last_offset, this_offset, offset;
- unsigned long long base;
-
- if (vxtime.mode == VXTIME_HPET) {
- do {
- seq = read_seqbegin(&xtime_lock);
-
- last_offset = vxtime.last;
- base = monotonic_base;
- this_offset = hpet_readl(HPET_COUNTER);
- } while (read_seqretry(&xtime_lock, seq));
- offset = (this_offset - last_offset);
- offset *= NSEC_PER_TICK / hpet_tick;
- } else {
- do {
- seq = read_seqbegin(&xtime_lock);
-
- last_offset = vxtime.last_tsc;
- base = monotonic_base;
- } while (read_seqretry(&xtime_lock, seq));
- this_offset = get_cycles_sync();
- offset = cycles_2_ns(this_offset - last_offset);
- }
- return base + offset;
-}
-EXPORT_SYMBOL(monotonic_clock);
-
-static noinline void handle_lost_ticks(int lost)
-{
- static long lost_count;
- static int warned;
- if (report_lost_ticks) {
- printk(KERN_WARNING "time.c: Lost %d timer tick(s)! ", lost);
- print_symbol("rip %s)\n", get_irq_regs()->rip);
- }
-
- if (lost_count == 1000 && !warned) {
- printk(KERN_WARNING "warning: many lost ticks.\n"
- KERN_WARNING "Your time source seems to be instable or "
- "some driver is hogging interupts\n");
- print_symbol("rip %s\n", get_irq_regs()->rip);
- if (vxtime.mode == VXTIME_TSC && hpet_address) {
- printk(KERN_WARNING "Falling back to HPET\n");
- if (hpet_use_timer)
- vxtime.last = hpet_readl(HPET_T0_CMP) -
- hpet_tick;
- else
- vxtime.last = hpet_readl(HPET_COUNTER);
- vxtime.mode = VXTIME_HPET;
- vxtime.hpet_address = hpet_address;
- do_gettimeoffset = do_gettimeoffset_hpet;
- }
- /* else should fall back to PIT, but code missing. */
- warned = 1;
- } else
- lost_count++;
-
-#ifdef CONFIG_CPU_FREQ
- /* In some cases the CPU can change frequency without us noticing
- Give cpufreq a change to catch up. */
- if ((lost_count+1) % 25 == 0)
- cpufreq_delayed_get();
-#endif
-}
-
void main_timer_handler(void)
{
static unsigned long rtc_update = 0;
- unsigned long tsc;
- int delay = 0, offset = 0, lost = 0;
-
/*
* Here we are in the timer irq handler. We have irqs locally disabled (so we
* don't need spin_lock_irqsave()) but we don't know if the timer_bh is running
@@ -313,72 +162,11 @@ void main_timer_handler(void)

write_seqlock(&xtime_lock);

- if (hpet_address)
- offset = hpet_readl(HPET_COUNTER);
-
- if (hpet_use_timer) {
- /* if we're using the hpet timer functionality,
- * we can more accurately know the counter value
- * when the timer interrupt occured.
- */
- offset = hpet_readl(HPET_T0_CMP) - hpet_tick;
- delay = hpet_readl(HPET_COUNTER) - offset;
- } else if (!pmtmr_ioport) {
- spin_lock(&i8253_lock);
- outb_p(0x00, 0x43);
- delay = inb_p(0x40);
- delay |= inb(0x40) << 8;
- spin_unlock(&i8253_lock);
- delay = LATCH - 1 - delay;
- }
-
- tsc = get_cycles_sync();
-
- if (vxtime.mode == VXTIME_HPET) {
- if (offset - vxtime.last > hpet_tick) {
- lost = (offset - vxtime.last) / hpet_tick - 1;
- }
-
- monotonic_base +=
- (offset - vxtime.last) * NSEC_PER_TICK / hpet_tick;
-
- vxtime.last = offset;
-#ifdef CONFIG_X86_PM_TIMER
- } else if (vxtime.mode == VXTIME_PMTMR) {
- lost = pmtimer_mark_offset();
-#endif
- } else {
- offset = (((tsc - vxtime.last_tsc) *
- vxtime.tsc_quot) >> US_SCALE) - USEC_PER_TICK;
-
- if (offset < 0)
- offset = 0;
-
- if (offset > USEC_PER_TICK) {
- lost = offset / USEC_PER_TICK;
- offset %= USEC_PER_TICK;
- }
-
- monotonic_base += cycles_2_ns(tsc - vxtime.last_tsc);
-
- vxtime.last_tsc = tsc - vxtime.quot * delay / vxtime.tsc_quot;
-
- if ((((tsc - vxtime.last_tsc) *
- vxtime.tsc_quot) >> US_SCALE) < offset)
- vxtime.last_tsc = tsc -
- (((long) offset << US_SCALE) / vxtime.tsc_quot) - 1;
- }
-
- if (lost > 0)
- handle_lost_ticks(lost);
- else
- lost = 0;
-
/*
* Do the timer stuff.
*/

- do_timer(lost + 1);
+ do_timer(1);
#ifndef CONFIG_SMP
update_process_times(user_mode(get_irq_regs()));
#endif
@@ -537,12 +325,6 @@ void __init stop_timer_interrupt(void)
printk(KERN_INFO "timer: %s interrupt stopped.\n", name);
}

-int __init time_setup(char *str)
-{
- report_lost_ticks = 1;
- return 1;
-}
-
static struct irqaction irq0 = {
timer_interrupt, IRQF_DISABLED, CPU_MASK_NONE, "timer", NULL, NULL
};
@@ -557,9 +339,7 @@ void __init time_init(void)
set_normalized_timespec(&wall_to_monotonic,
-xtime.tv_sec, -xtime.tv_nsec);

- if (!hpet_arch_init())
- vxtime_hz = (FSEC_PER_SEC + hpet_period / 2) / hpet_period;
- else
+ if (hpet_arch_init())
hpet_address = 0;

if (hpet_use_timer) {
@@ -567,83 +347,25 @@ void __init time_init(void)
tick_nsec = TICK_NSEC_HPET;
cpu_khz = hpet_calibrate_tsc();
timename = "HPET";
-#ifdef CONFIG_X86_PM_TIMER
- } else if (pmtmr_ioport && !hpet_address) {
- vxtime_hz = PM_TIMER_FREQUENCY;
- timename = "PM";
- pit_init();
- cpu_khz = pit_calibrate_tsc();
-#endif
} else {
pit_init();
cpu_khz = pit_calibrate_tsc();
timename = "PIT";
}

- vxtime.mode = VXTIME_TSC;
- vxtime.quot = (USEC_PER_SEC << US_SCALE) / vxtime_hz;
- vxtime.tsc_quot = (USEC_PER_MSEC << US_SCALE) / cpu_khz;
- vxtime.last_tsc = get_cycles_sync();
- set_cyc2ns_scale(cpu_khz);
- setup_irq(0, &irq0);
-
-#ifndef CONFIG_SMP
- time_init_gtod();
-#endif
-}
-
-
-/*
- * Decide what mode gettimeofday should use.
- */
-void time_init_gtod(void)
-{
- char *timetype;
-
if (unsynchronized_tsc())
- notsc = 1;
+ mark_tsc_unstable();

if (cpu_has(&boot_cpu_data, X86_FEATURE_RDTSCP))
vgetcpu_mode = VGETCPU_RDTSCP;
else
vgetcpu_mode = VGETCPU_LSL;

- if (hpet_address && notsc) {
- timetype = hpet_use_timer ? "HPET" : "PIT/HPET";
- if (hpet_use_timer)
- vxtime.last = hpet_readl(HPET_T0_CMP) - hpet_tick;
- else
- vxtime.last = hpet_readl(HPET_COUNTER);
- vxtime.mode = VXTIME_HPET;
- vxtime.hpet_address = hpet_address;
- do_gettimeoffset = do_gettimeoffset_hpet;
-#ifdef CONFIG_X86_PM_TIMER
- /* Using PM for gettimeofday is quite slow, but we have no other
- choice because the TSC is too unreliable on some systems. */
- } else if (pmtmr_ioport && !hpet_address && notsc) {
- timetype = "PM";
- do_gettimeoffset = do_gettimeoffset_pm;
- vxtime.mode = VXTIME_PMTMR;
- sysctl_vsyscall = 0;
- printk(KERN_INFO "Disabling vsyscall due to use of PM timer\n");
-#endif
- } else {
- timetype = hpet_use_timer ? "HPET/TSC" : "PIT/TSC";
- vxtime.mode = VXTIME_TSC;
- }
-
- printk(KERN_INFO "time.c: Using %ld.%06ld MHz WALL %s GTOD %s timer.\n",
- vxtime_hz / 1000000, vxtime_hz % 1000000, timename, timetype);
printk(KERN_INFO "time.c: Detected %d.%03d MHz processor.\n",
cpu_khz / 1000, cpu_khz % 1000);
- vxtime.quot = (USEC_PER_SEC << US_SCALE) / vxtime_hz;
- vxtime.tsc_quot = (USEC_PER_MSEC << US_SCALE) / cpu_khz;
- vxtime.last_tsc = get_cycles_sync();
-
- set_cyc2ns_scale(cpu_khz);
+ setup_irq(0, &irq0);
}

-__setup("report_lost_ticks", time_setup);

static long clock_cmos_diff;
static unsigned long sleep_start;
@@ -689,20 +411,8 @@ static int timer_resume(struct sys_devic
write_seqlock_irqsave(&xtime_lock,flags);
xtime.tv_sec = sec;
xtime.tv_nsec = 0;
- if (vxtime.mode == VXTIME_HPET) {
- if (hpet_use_timer)
- vxtime.last = hpet_readl(HPET_T0_CMP) - hpet_tick;
- else
- vxtime.last = hpet_readl(HPET_COUNTER);
-#ifdef CONFIG_X86_PM_TIMER
- } else if (vxtime.mode == VXTIME_PMTMR) {
- pmtimer_resume();
-#endif
- } else
- vxtime.last_tsc = get_cycles_sync();
- write_sequnlock_irqrestore(&xtime_lock,flags);
jiffies += sleep_length;
- monotonic_base += sleep_length * (NSEC_PER_SEC/HZ);
+ write_sequnlock_irqrestore(&xtime_lock,flags);
touch_softlockup_watchdog();
return 0;
}
@@ -728,4 +438,3 @@ static int time_init_device(void)
}

device_initcall(time_init_device);
-
diff --git a/arch/x86_64/kernel/tsc.c b/arch/x86_64/kernel/tsc.c
index 977d1b2..682e122 100644
--- a/arch/x86_64/kernel/tsc.c
+++ b/arch/x86_64/kernel/tsc.c
@@ -9,32 +9,11 @@ #include <linux/cpufreq.h>

#include <asm/timex.h>

-int notsc __initdata = 0;
+static int notsc __initdata = 0;

unsigned int cpu_khz; /* TSC clocks / usec, not used here */
EXPORT_SYMBOL(cpu_khz);

-/*
- * do_gettimeoffset() returns microseconds since last timer interrupt was
- * triggered by hardware. A memory read of HPET is slower than a register read
- * of TSC, but much more reliable. It's also synchronized to the timer
- * interrupt. Note that do_gettimeoffset() may return more than hpet_tick, if a
- * timer interrupt has happened already, but vxtime.trigger wasn't updated yet.
- * This is not a problem, because jiffies hasn't updated either. They are bound
- * together by xtime_lock.
- */
-
-unsigned int do_gettimeoffset_tsc(void)
-{
- unsigned long t;
- unsigned long x;
- t = get_cycles_sync();
- if (t < vxtime.last_tsc)
- t = vxtime.last_tsc; /* hack */
- x = ((t - vxtime.last_tsc) * vxtime.tsc_quot) >> US_SCALE;
- return x;
-}
-
static unsigned int cyc2ns_scale __read_mostly;

void set_cyc2ns_scale(unsigned long khz)
@@ -42,7 +21,7 @@ void set_cyc2ns_scale(unsigned long khz)
cyc2ns_scale = (NSEC_PER_MSEC << NS_SCALE) / khz;
}

-unsigned long long cycles_2_ns(unsigned long long cyc)
+static unsigned long long cycles_2_ns(unsigned long long cyc)
{
return (cyc * cyc2ns_scale) >> NS_SCALE;
}
@@ -61,6 +40,19 @@ unsigned long long sched_clock(void)
return cycles_2_ns(a);
}

+static int tsc_unstable;
+
+static inline int check_tsc_unstable(void)
+{
+ return tsc_unstable;
+}
+
+void mark_tsc_unstable(void)
+{
+ tsc_unstable = 1;
+}
+EXPORT_SYMBOL_GPL(mark_tsc_unstable);
+
#ifdef CONFIG_CPU_FREQ

/* Frequency scaling support. Adjust the TSC based timer when the cpu frequency
@@ -89,24 +81,6 @@ static void handle_cpufreq_delayed_get(v
cpufreq_delayed_issched = 0;
}

-/* if we notice lost ticks, schedule a call to cpufreq_get() as it tries
- * to verify the CPU frequency the timing core thinks the CPU is running
- * at is still correct.
- */
-void cpufreq_delayed_get(void)
-{
- static int warned;
- if (cpufreq_init && !cpufreq_delayed_issched) {
- cpufreq_delayed_issched = 1;
- if (!warned) {
- warned = 1;
- printk(KERN_DEBUG "Losing some ticks... "
- "checking if CPU frequency changed.\n");
- }
- schedule_work(&cpufreq_delayed_get_work);
- }
-}
-
static unsigned int ref_freq = 0;
static unsigned long loops_per_jiffy_ref = 0;

@@ -142,7 +116,7 @@ #endif

cpu_khz = cpufreq_scale(cpu_khz_ref, ref_freq, freq->new);
if (!(freq->flags & CPUFREQ_CONST_LOOPS))
- vxtime.tsc_quot = (USEC_PER_MSEC << US_SCALE) / cpu_khz;
+ mark_tsc_unstable();
}

set_cyc2ns_scale(cpu_khz_ref);
@@ -199,3 +173,50 @@ int __init notsc_setup(char *s)
}

__setup("notsc", notsc_setup);
+
+
+/* clock source code: */
+
+static int tsc_update_callback(void);
+
+static cycle_t read_tsc(void)
+{
+ cycle_t ret = (cycle_t)get_cycles_sync();
+ return ret;
+}
+
+static struct clocksource clocksource_tsc = {
+ .name = "tsc",
+ .rating = 300,
+ .read = read_tsc,
+ .mask = (cycle_t)-1,
+ .mult = 0, /* to be set */
+ .shift = 22,
+ .update_callback = tsc_update_callback,
+ .is_continuous = 1,
+};
+
+static int tsc_update_callback(void)
+{
+ int change = 0;
+
+ /* check to see if we should switch to the safe clocksource: */
+ if (clocksource_tsc.rating != 50 && check_tsc_unstable()) {
+ clocksource_tsc.rating = 50;
+ clocksource_reselect();
+ change = 1;
+ }
+ return change;
+}
+
+static int __init init_tsc_clocksource(void)
+{
+ if (!notsc) {
+ clocksource_tsc.mult = clocksource_khz2mult(cpu_khz,
+ clocksource_tsc.shift);
+ return clocksource_register(&clocksource_tsc);
+ }
+ return 0;
+}
+
+module_init(init_tsc_clocksource);
diff --git a/include/asm-x86_64/proto.h b/include/asm-x86_64/proto.h
index e72cfcd..96cdf2d 100644
--- a/include/asm-x86_64/proto.h
+++ b/include/asm-x86_64/proto.h
@@ -45,7 +45,6 @@ extern u32 pmtmr_ioport;
#else
#define pmtmr_ioport 0
#endif
-extern unsigned long long monotonic_base;
extern int sysctl_vsyscall;
extern int nohpet;
extern unsigned long vxtime_hz;
diff --git a/include/asm-x86_64/timex.h b/include/asm-x86_64/timex.h
index adb8c0d..a4d0066 100644
--- a/include/asm-x86_64/timex.h
+++ b/include/asm-x86_64/timex.h
@@ -52,9 +52,6 @@ #define NS_SCALE 10 /* 2^10, care
#define US_SCALE 32 /* 2^32, arbitralrily chosen */

extern struct vxtime_data vxtime;
-
-extern unsigned int do_gettimeoffset_hpet(void);
-extern unsigned int do_gettimeoffset_tsc(void);
+extern void mark_tsc_unstable(void);
extern void set_cyc2ns_scale(unsigned long khz);
-extern int notsc;
#endif

2006-11-29 03:01:27

by john stultz

[permalink] [raw]
Subject: [PATCH 5/5][time][x86_64] Re-enable vsyscall support for x86_64

Clean up and re-enable vsyscall gettimeofday using the generic
clocksource infrastructure.

Signed-off-by: John Stultz <[email protected]>

arch/x86_64/Kconfig | 4 +
arch/x86_64/kernel/hpet.c | 6 +
arch/x86_64/kernel/time.c | 6 -
arch/x86_64/kernel/tsc.c | 7 ++
arch/x86_64/kernel/vmlinux.lds.S | 28 +++------
arch/x86_64/kernel/vsyscall.c | 121 +++++++++++++++++++++++----------------
include/asm-x86_64/proto.h | 3
include/asm-x86_64/timex.h | 1
include/asm-x86_64/vsyscall.h | 32 +---------
9 files changed, 105 insertions(+), 103 deletions(-)

linux-2.6.19-rc6git11_timeofday-arch-x86-64-vsyscall-reenablement_C7.patch
============================================
diff --git a/arch/x86_64/Kconfig b/arch/x86_64/Kconfig
index 20bcd6d..c8026f8 100644
--- a/arch/x86_64/Kconfig
+++ b/arch/x86_64/Kconfig
@@ -28,6 +28,10 @@ config GENERIC_TIME
bool
default y

+config GENERIC_TIME_VSYSCALL
+ bool
+ default y
+
config ZONE_DMA32
bool
default y
diff --git a/arch/x86_64/kernel/hpet.c b/arch/x86_64/kernel/hpet.c
index c00b01a..2d3aed1 100644
--- a/arch/x86_64/kernel/hpet.c
+++ b/arch/x86_64/kernel/hpet.c
@@ -440,6 +440,11 @@ static cycle_t read_hpet(void)
return (cycle_t)readl(hpet_ptr);
}

+static cycle_t __vsyscall_fn vread_hpet(void)
+{
+ return (cycle_t)readl((void *)fix_to_virt(VSYSCALL_HPET) + 0xf0);
+}
+
struct clocksource clocksource_hpet = {
.name = "hpet",
.rating = 250,
@@ -448,6 +453,7 @@ struct clocksource clocksource_hpet = {
.mult = 0, /* set below */
.shift = HPET_SHIFT,
.is_continuous = 1,
+ .vread = vread_hpet,
};

static int __init init_hpet_clocksource(void)
diff --git a/arch/x86_64/kernel/time.c b/arch/x86_64/kernel/time.c
index 4bc737c..17bb7de 100644
--- a/arch/x86_64/kernel/time.c
+++ b/arch/x86_64/kernel/time.c
@@ -53,13 +53,7 @@ DEFINE_SPINLOCK(rtc_lock);
EXPORT_SYMBOL(rtc_lock);
DEFINE_SPINLOCK(i8253_lock);

-unsigned long vxtime_hz = PIT_TICK_RATE;
-
-struct vxtime_data __vxtime __section_vxtime; /* for vsyscalls */
-
volatile unsigned long __jiffies __section_jiffies = INITIAL_JIFFIES;
-struct timespec __xtime __section_xtime;
-struct timezone __sys_tz __section_sys_tz;

unsigned long profile_pc(struct pt_regs *regs)
{
diff --git a/arch/x86_64/kernel/tsc.c b/arch/x86_64/kernel/tsc.c
index 682e122..5c768cf 100644
--- a/arch/x86_64/kernel/tsc.c
+++ b/arch/x86_64/kernel/tsc.c
@@ -185,6 +185,12 @@ static cycle_t read_tsc(void)
return ret;
}

+static cycle_t __vsyscall_fn vread_tsc(void)
+{
+ cycle_t ret = (cycle_t)get_cycles_sync();
+ return ret;
+}
+
static struct clocksource clocksource_tsc = {
.name = "tsc",
.rating = 300,
@@ -194,6 +200,7 @@ static struct clocksource clocksource_ts
.shift = 22,
.update_callback = tsc_update_callback,
.is_continuous = 1,
+ .vread = vread_tsc,
};

static int tsc_update_callback(void)
diff --git a/arch/x86_64/kernel/vmlinux.lds.S b/arch/x86_64/kernel/vmlinux.lds.S
index d9534e7..5b10798 100644
--- a/arch/x86_64/kernel/vmlinux.lds.S
+++ b/arch/x86_64/kernel/vmlinux.lds.S
@@ -94,31 +94,25 @@ #define VVIRT(x) (ADDR(x) - VVIRT_OFFSET
__vsyscall_0 = VSYSCALL_VIRT_ADDR;

. = ALIGN(CONFIG_X86_L1_CACHE_BYTES);
- .xtime_lock : AT(VLOAD(.xtime_lock)) { *(.xtime_lock) }
- xtime_lock = VVIRT(.xtime_lock);
-
- .vxtime : AT(VLOAD(.vxtime)) { *(.vxtime) }
- vxtime = VVIRT(.vxtime);
+ .vsyscall_fn : AT(VLOAD(.vsyscall_fn)) { *(.vsyscall_fn) }
+ . = ALIGN(CONFIG_X86_L1_CACHE_BYTES);
+ .vsyscall_gtod_data : AT(VLOAD(.vsyscall_gtod_data))
+ { *(.vsyscall_gtod_data) }
+ vsyscall_gtod_data = VVIRT(.vsyscall_gtod_data);

.vgetcpu_mode : AT(VLOAD(.vgetcpu_mode)) { *(.vgetcpu_mode) }
vgetcpu_mode = VVIRT(.vgetcpu_mode);

- .sys_tz : AT(VLOAD(.sys_tz)) { *(.sys_tz) }
- sys_tz = VVIRT(.sys_tz);
-
- .sysctl_vsyscall : AT(VLOAD(.sysctl_vsyscall)) { *(.sysctl_vsyscall) }
- sysctl_vsyscall = VVIRT(.sysctl_vsyscall);
-
- .xtime : AT(VLOAD(.xtime)) { *(.xtime) }
- xtime = VVIRT(.xtime);
-
. = ALIGN(CONFIG_X86_L1_CACHE_BYTES);
.jiffies : AT(VLOAD(.jiffies)) { *(.jiffies) }
jiffies = VVIRT(.jiffies);

- .vsyscall_1 ADDR(.vsyscall_0) + 1024: AT(VLOAD(.vsyscall_1)) { *(.vsyscall_1) }
- .vsyscall_2 ADDR(.vsyscall_0) + 2048: AT(VLOAD(.vsyscall_2)) { *(.vsyscall_2) }
- .vsyscall_3 ADDR(.vsyscall_0) + 3072: AT(VLOAD(.vsyscall_3)) { *(.vsyscall_3) }
+ .vsyscall_1 ADDR(.vsyscall_0) + 1024: AT(VLOAD(.vsyscall_1))
+ { *(.vsyscall_1) }
+ .vsyscall_2 ADDR(.vsyscall_0) + 2048: AT(VLOAD(.vsyscall_2))
+ { *(.vsyscall_2) }
+ .vsyscall_3 ADDR(.vsyscall_0) + 3072: AT(VLOAD(.vsyscall_3))
+ { *(.vsyscall_3) }

. = VSYSCALL_VIRT_ADDR + 4096;

diff --git a/arch/x86_64/kernel/vsyscall.c b/arch/x86_64/kernel/vsyscall.c
index 92546c1..26e30c0 100644
--- a/arch/x86_64/kernel/vsyscall.c
+++ b/arch/x86_64/kernel/vsyscall.c
@@ -26,6 +26,7 @@ #include <linux/timer.h>
#include <linux/seqlock.h>
#include <linux/jiffies.h>
#include <linux/sysctl.h>
+#include <linux/clocksource.h>
#include <linux/getcpu.h>
#include <linux/cpu.h>
#include <linux/smp.h>
@@ -34,6 +35,7 @@ #include <linux/notifier.h>
#include <asm/vsyscall.h>
#include <asm/pgtable.h>
#include <asm/page.h>
+#include <asm/unistd.h>
#include <asm/fixmap.h>
#include <asm/errno.h>
#include <asm/io.h>
@@ -43,56 +45,41 @@ #include <asm/topology.h>

#define __vsyscall(nr) __attribute__ ((unused,__section__(".vsyscall_" #nr)))

-int __sysctl_vsyscall __section_sysctl_vsyscall = 1;
-seqlock_t __xtime_lock __section_xtime_lock = SEQLOCK_UNLOCKED;
+struct vsyscall_gtod_data_t {
+ seqlock_t lock;
+ int sysctl_enabled;
+ struct timeval wall_time_tv;
+ struct timezone sys_tz;
+ cycle_t offset_base;
+ struct clocksource clock;
+};
int __vgetcpu_mode __section_vgetcpu_mode;

-#include <asm/unistd.h>
-
-static __always_inline void timeval_normalize(struct timeval * tv)
+struct vsyscall_gtod_data_t __vsyscall_gtod_data __section_vsyscall_gtod_data =
{
- time_t __sec;
-
- __sec = tv->tv_usec / 1000000;
- if (__sec) {
- tv->tv_usec %= 1000000;
- tv->tv_sec += __sec;
- }
-}
+ .lock = SEQLOCK_UNLOCKED,
+ .sysctl_enabled = 1,
+};

-static __always_inline void do_vgettimeofday(struct timeval * tv)
+void update_vsyscall(struct timespec *wall_time, struct clocksource *clock)
{
- long sequence, t;
- unsigned long sec, usec;
-
- do {
- sequence = read_seqbegin(&__xtime_lock);
-
- sec = __xtime.tv_sec;
- usec = __xtime.tv_nsec / 1000;
-
- if (__vxtime.mode != VXTIME_HPET) {
- t = get_cycles_sync();
- if (t < __vxtime.last_tsc)
- t = __vxtime.last_tsc;
- usec += ((t - __vxtime.last_tsc) *
- __vxtime.tsc_quot) >> 32;
- /* See comment in x86_64 do_gettimeofday. */
- } else {
- usec += ((readl((void __iomem *)
- fix_to_virt(VSYSCALL_HPET) + 0xf0) -
- __vxtime.last) * __vxtime.quot) >> 32;
- }
- } while (read_seqretry(&__xtime_lock, sequence));
-
- tv->tv_sec = sec + usec / 1000000;
- tv->tv_usec = usec % 1000000;
+ unsigned long flags;
+
+ write_seqlock_irqsave(&vsyscall_gtod_data.lock, flags);
+ /* copy vsyscall data */
+ vsyscall_gtod_data.clock = *clock;
+ vsyscall_gtod_data.wall_time_tv.tv_sec = wall_time->tv_sec;
+ vsyscall_gtod_data.wall_time_tv.tv_usec = wall_time->tv_nsec/1000;
+ vsyscall_gtod_data.sys_tz = sys_tz;
+ write_sequnlock_irqrestore(&vsyscall_gtod_data.lock, flags);
}

-/* RED-PEN may want to readd seq locking, but then the variable should be write-once. */
+/* RED-PEN may want to readd seq locking, but then the variable should be
+ * write-once.
+ */
static __always_inline void do_get_tz(struct timezone * tz)
{
- *tz = __sys_tz;
+ *tz = __vsyscall_gtod_data.sys_tz;
}

static __always_inline int gettimeofday(struct timeval *tv, struct timezone *tz)
@@ -100,7 +87,8 @@ static __always_inline int gettimeofday(
int ret;
asm volatile("vsysc2: syscall"
: "=a" (ret)
- : "0" (__NR_gettimeofday),"D" (tv),"S" (tz) : __syscall_clobber );
+ : "0" (__NR_gettimeofday),"D" (tv),"S" (tz)
+ : __syscall_clobber );
return ret;
}

@@ -113,10 +101,44 @@ static __always_inline long time_syscall
return secs;
}

+static __always_inline void do_vgettimeofday(struct timeval * tv)
+{
+ cycle_t now, base, mask, cycle_delta;
+ unsigned long seq, mult, shift, nsec_delta;
+ cycle_t (*vread)(void);
+ do {
+ seq = read_seqbegin(&__vsyscall_gtod_data.lock);
+
+ vread = __vsyscall_gtod_data.clock.vread;
+ if (unlikely(!__vsyscall_gtod_data.sysctl_enabled || !vread)) {
+ gettimeofday(tv,0);
+ return;
+ }
+ now = vread();
+ base = __vsyscall_gtod_data.clock.cycle_last;
+ mask = __vsyscall_gtod_data.clock.mask;
+ mult = __vsyscall_gtod_data.clock.mult;
+ shift = __vsyscall_gtod_data.clock.shift;
+
+ *tv = __vsyscall_gtod_data.wall_time_tv;
+
+ } while (read_seqretry(&__vsyscall_gtod_data.lock, seq));
+
+ /* calculate interval: */
+ cycle_delta = (now - base) & mask;
+ /* convert to nsecs: */
+ nsec_delta = (cycle_delta * mult) >> shift;
+
+ /* convert to usecs and add to timespec: */
+ tv->tv_usec += nsec_delta / NSEC_PER_USEC;
+ while (tv->tv_usec > USEC_PER_SEC) {
+ tv->tv_sec += 1;
+ tv->tv_usec -= USEC_PER_SEC;
+ }
+}
+
int __vsyscall(0) vgettimeofday(struct timeval * tv, struct timezone * tz)
{
- if (!__sysctl_vsyscall)
- return gettimeofday(tv,tz);
if (tv)
do_vgettimeofday(tv);
if (tz)
@@ -128,11 +150,11 @@ int __vsyscall(0) vgettimeofday(struct t
* unlikely */
time_t __vsyscall(1) vtime(time_t *t)
{
- if (!__sysctl_vsyscall)
+ if (unlikely(!__vsyscall_gtod_data.sysctl_enabled))
return time_syscall(t);
else if (t)
- *t = __xtime.tv_sec;
- return __xtime.tv_sec;
+ *t = __vsyscall_gtod_data.wall_time_tv.tv_sec;
+ return __vsyscall_gtod_data.wall_time_tv.tv_sec;
}

/* Fast way to get current CPU and node.
@@ -209,7 +231,7 @@ static int vsyscall_sysctl_change(ctl_ta
ret = -ENOMEM;
goto out;
}
- if (!sysctl_vsyscall) {
+ if (!vsyscall_gtod_data.sysctl_enabled) {
writew(SYSCALL, map1);
writew(SYSCALL, map2);
} else {
@@ -232,7 +254,8 @@ static int vsyscall_sysctl_nostrat(ctl_t

static ctl_table kernel_table2[] = {
{ .ctl_name = 99, .procname = "vsyscall64",
- .data = &sysctl_vsyscall, .maxlen = sizeof(int), .mode = 0644,
+ .data = &vsyscall_gtod_data.sysctl_enabled, .maxlen = sizeof(int),
+ .mode = 0644,
.strategy = vsyscall_sysctl_nostrat,
.proc_handler = vsyscall_sysctl_change },
{ 0, }
diff --git a/include/asm-x86_64/proto.h b/include/asm-x86_64/proto.h
index 96cdf2d..020de76 100644
--- a/include/asm-x86_64/proto.h
+++ b/include/asm-x86_64/proto.h
@@ -45,10 +45,7 @@ extern u32 pmtmr_ioport;
#else
#define pmtmr_ioport 0
#endif
-extern int sysctl_vsyscall;
extern int nohpet;
-extern unsigned long vxtime_hz;
-extern void time_init_gtod(void);

extern void early_printk(const char *fmt, ...) __attribute__((format(printf,1,2)));

diff --git a/include/asm-x86_64/timex.h b/include/asm-x86_64/timex.h
index a4d0066..a73d535 100644
--- a/include/asm-x86_64/timex.h
+++ b/include/asm-x86_64/timex.h
@@ -51,7 +51,6 @@ #define FSEC_PER_TICK (FSEC_PER_SEC / HZ
#define NS_SCALE 10 /* 2^10, carefully chosen */
#define US_SCALE 32 /* 2^32, arbitralrily chosen */

-extern struct vxtime_data vxtime;
extern void mark_tsc_unstable(void);
extern void set_cyc2ns_scale(unsigned long khz);
#endif
diff --git a/include/asm-x86_64/vsyscall.h b/include/asm-x86_64/vsyscall.h
index 01d1c17..f73a0a0 100644
--- a/include/asm-x86_64/vsyscall.h
+++ b/include/asm-x86_64/vsyscall.h
@@ -15,51 +15,29 @@ #define VSYSCALL_ADDR(vsyscall_nr) (VSYS
#ifdef __KERNEL__
#include <linux/seqlock.h>

-#define __section_vxtime __attribute__ ((unused, __section__ (".vxtime"), aligned(16)))
#define __section_vgetcpu_mode __attribute__ ((unused, __section__ (".vgetcpu_mode"), aligned(16)))
#define __section_jiffies __attribute__ ((unused, __section__ (".jiffies"), aligned(16)))
-#define __section_sys_tz __attribute__ ((unused, __section__ (".sys_tz"), aligned(16)))
-#define __section_sysctl_vsyscall __attribute__ ((unused, __section__ (".sysctl_vsyscall"), aligned(16)))
-#define __section_xtime __attribute__ ((unused, __section__ (".xtime"), aligned(16)))
-#define __section_xtime_lock __attribute__ ((unused, __section__ (".xtime_lock"), aligned(16)))

-#define VXTIME_TSC 1
-#define VXTIME_HPET 2
-#define VXTIME_PMTMR 3
+/* Definitions for CONFIG_GENERIC_TIME definitions */
+#define __section_vsyscall_gtod_data __attribute__ \
+ ((unused, __section__ (".vsyscall_gtod_data"),aligned(16)))
+#define __vsyscall_fn __attribute__ ((unused,__section__(".vsyscall_fn")))

#define VGETCPU_RDTSCP 1
#define VGETCPU_LSL 2

-struct vxtime_data {
- long hpet_address; /* HPET base address */
- int last;
- unsigned long last_tsc;
- long quot;
- long tsc_quot;
- int mode;
-};

#define hpet_readl(a) readl((const void __iomem *)fix_to_virt(FIX_HPET_BASE) + a)
#define hpet_writel(d,a) writel(d, (void __iomem *)fix_to_virt(FIX_HPET_BASE) + a)

-/* vsyscall space (readonly) */
-extern struct vxtime_data __vxtime;
extern int __vgetcpu_mode;
-extern struct timespec __xtime;
extern volatile unsigned long __jiffies;
-extern struct timezone __sys_tz;
-extern seqlock_t __xtime_lock;

/* kernel space (writeable) */
-extern struct vxtime_data vxtime;
extern int vgetcpu_mode;
extern struct timezone sys_tz;
-extern int sysctl_vsyscall;
extern seqlock_t xtime_lock;
-
-extern int sysctl_vsyscall;
-
-#define ARCH_HAVE_XTIME_LOCK 1
+extern struct vsyscall_gtod_data_t vsyscall_gtod_data;

#endif /* __KERNEL__ */

2006-12-11 00:38:08

by Andrea Arcangeli

[permalink] [raw]
Subject: rdtscp vgettimeofday

Hello,

As far as I can see, many changes have happened, but nobody has yet
added rdtscp support to x86-64. rdtscp finally solves the problem: it
obsoletes the HPET for timekeeping and allows a fully userland
gettimeofday running at maximum speed.

Before rdtscp we could never pair the rdtsc value with the proper
per-cpu index without being in the kernel with preemption disabled, so
it could never work reliably.
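For reference: rdtscp returns the TSC in edx:eax and the contents of
the IA32_TSC_AUX MSR in ecx in a single instruction, so the kernel can
preload TSC_AUX with the cpu number and userland gets a (tsc, cpu) pair
that no reschedule can split. A minimal sketch, assuming the kernel
encodes the cpu number in the low 12 bits of TSC_AUX, as it does for
vgetcpu:

/* Read the tsc and the cpu id atomically via rdtscp.
 * Assumes IA32_TSC_AUX holds the cpu number in its low 12 bits. */
static inline unsigned long long rdtscp_cpu(unsigned int *cpu)
{
	unsigned int lo, hi, aux;

	asm volatile("rdtscp" : "=a" (lo), "=d" (hi), "=c" (aux));
	*cpu = aux & 0xfff;
	return ((unsigned long long)hi << 32) | lo;
}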

What's the status of the DSO API? Does it break backwards
compatibility or is the production glibc already capable of handling
that new kernel API?

I need rdtscp working in vsyscalls ASAP, but I should first understand
whether I need to base my code on top of the vDSO patch or fork it off
in a dead branch to preserve backwards compatibility with current glibc
userland.

Thanks.

2006-12-11 21:17:26

by dean gaudet

[permalink] [raw]
Subject: Re: rdtscp vgettimeofday

On Mon, 11 Dec 2006, Andrea Arcangeli wrote:

> As far as I can see, many changes happened but nobody has yet added
> the rdtscp support to x86-64. rdtscp finally solves the problem and it
> obsoletes hpet for timekeeping and it allows a fully userland
> gettimeofday running at maximum speed in userland.

rdtscp doesn't solve anything extra that can't already be solved with
the existing vgetcpu (based on lsl) and rdtsc, which have the advantage
of working on all x86, not just the (currently) rare revF opteron.

lsl-based vgetcpu is relatively slow (because it is a protected
instruction with lots of microcode) -- but there are other options which
continue to work on all x86 (see <http://lkml.org/lkml/2006/11/13/401>).
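for context, the lsl trick works by having the kernel install a GDT
descriptor whose segment limit encodes the cpu (and node) number, which
userland can then recover with lsl from any protection level. a sketch
under that assumption; the selector is passed in rather than hardcoded,
since its value isn't given in this thread:

/* Sketch of lsl-based vgetcpu: the segment limit of a per-cpu GDT
 * entry is assumed to encode (node << 12) | cpu. */
static inline void lsl_getcpu(unsigned int seg, unsigned int *cpu,
			      unsigned int *node)
{
	unsigned int p;

	asm("lsl %1, %0" : "=r" (p) : "r" (seg));
	if (cpu)
		*cpu = p & 0xfff;
	if (node)
		*node = p >> 12;
}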


> Before rdtscp we could never index the rdtsc offset in a proper index
> without being in kernel with preemption disabled, so it could never
> work reliably.

even with rdtscp you have to deal with the definite possibility of being
scheduled away in the middle of the computation. arguably you need to
deal with the possibility of being scheduled away *and* back again to the
same cpu (so testing cpu# at top and bottom of a loop isn't sufficient).

suleiman proposed a per-cpu scheduling event number to deal with that...
not sure what folks think of that idea.
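the proposal isn't spelled out in this thread, but the rough shape of
such a counter is easy to sketch: the scheduler bumps a per-cpu event
number on every context switch, and userland samples it around the
computation, restarting if it moved. if the task was migrated away (and
even migrated back again), a switch happened on that cpu, so its counter
moved. all names below are invented for illustration:

/* Hypothetical per-cpu counter, bumped by the scheduler on every
 * context switch and exported read-only to userland. */
extern volatile unsigned long sched_event[];
extern unsigned long long rdtscp_cpu(unsigned int *cpu); /* as sketched
							    earlier */

static unsigned long long sample_tsc_consistently(void)
{
	unsigned int cpu;
	unsigned long ev;
	unsigned long long tsc;

	do {
		tsc = rdtscp_cpu(&cpu);
		ev = sched_event[cpu];
		/* ... read the matching per-cpu data here ... */
	} while (sched_event[cpu] != ev);	/* a switch happened on
						 * this cpu: restart */
	return tsc;
}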

-dean

2006-12-11 21:32:34

by Andrea Arcangeli

[permalink] [raw]
Subject: Re: rdtscp vgettimeofday

On Mon, Dec 11, 2006 at 01:17:25PM -0800, dean gaudet wrote:
> rdtscp doesn't solve anything extra [..]
> [..] lsl-based vgetcpu is relatively slow

Well, if you accept running slow, then indeed there's nothing to solve
in the first place.

If nothing else, rdtscp should avoid the mess of restarting a
vsyscall, which is quite a difficult problem as it depends heavily on
the compiler/dwarf.

> even with rdtscp you have to deal with the definite possibility of being
> scheduled away in the middle of the computation. arguably you need
> to

Isn't rdtscp atomic? All you need is to atomically read the current
contents of the TSC and the index to use into a per-cpu table exported
read-only. This table will contain a per-cpu seqlock as well. Then the
math has to be maintained by per-cpu threads, so that those per-cpu
tables are updated by cpufreq and at regular intervals.

If this is all wrong and it's not feasible to implement a safe and
monotonic vgettimeofday that doesn't access the southbridge and that
doesn't require restarting the vsyscall manually by patching rip/rsp,
I have a hard time seeing how rdtscp is useful at all. I hope somebody
thought about these issues before adding a new instruction to a
popular CPU ;).

2006-12-11 23:15:46

by dean gaudet

[permalink] [raw]
Subject: Re: rdtscp vgettimeofday

On Mon, 11 Dec 2006, Andrea Arcangeli wrote:

> On Mon, Dec 11, 2006 at 01:17:25PM -0800, dean gaudet wrote:
> > rdtscp doesn't solve anything extra [..]
> > [..] lsl-based vgetcpu is relatively slow
>
> Well, if you accept running slow, then indeed there's nothing to solve
> in the first place.
>
> If nothing else, rdtscp should avoid the mess of restarting a
> vsyscall, which is quite a difficult problem as it depends heavily on
> the compiler/dwarf.

rdtscp gets you 2 of the 5 values you need to compute the time. anything
can happen between the rdtscp and the other 3 reads: the computation is
(((tsc-A)*B)>>N)+C where N is a constant, and A, B, C are per-cpu data.

A/B/C change a few times a second (to avoid 32-bit rollover in (tsc-A)),
every time there's a halt, and on every P-state transition.

if you lose your tick in the middle of those reads, any number of things
can happen to screw up the computation... including being scheduled on
another core and mixing values from two cores.
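concretely, the userland computation would look like the sketch below
(struct layout and names invented); nothing in it stops a reschedule
between the rdtscp and the three loads, which is exactly the race being
described:

/* Hypothetical per-cpu data behind the formula (((tsc-A)*B)>>N)+C. */
struct pcpu_time {
	unsigned long long a;	/* A: tsc at the last update */
	unsigned long b;	/* B: scale factor */
	unsigned long long c;	/* C: time at the last update */
};

#define TIME_SHIFT 22		/* N: a constant shift */

extern struct pcpu_time pcpu_time[];	/* exported read-only */
extern unsigned long long rdtscp_cpu(unsigned int *cpu); /* as sketched
							    earlier */

/* Racy: a migration or an A/B/C update between the rdtscp and the
 * three loads mixes values from different cpus or epochs. */
static unsigned long long naive_time(void)
{
	unsigned int cpu;
	unsigned long long tsc = rdtscp_cpu(&cpu);
	struct pcpu_time *d = &pcpu_time[cpu];

	return (((tsc - d->a) * d->b) >> TIME_SHIFT) + d->c;
}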


> > even with rdtscp you have to deal with the definite possibility of being
> > scheduled away in the middle of the computation. arguably you need
> > to
>
> Isn't rdtscp atomic? All you need is to atomically read the current
> contents of the TSC and the index to use into a per-cpu table exported
> read-only. This table will contain a per-cpu seqlock as well. Then the
> math has to be maintained by per-cpu threads, so that those per-cpu
> tables are updated by cpufreq and at regular intervals.
>
> If this is all wrong and it's not feasible to implement a safe and
> monotonic vgettimeofday that doesn't access the southbridge and that
> doesn't require restarting the vsyscall manually by patching rip/rsp,
> I have a hard time seeing how rdtscp is useful at all. I hope somebody
> thought about these issues before adding a new instruction to a
> popular CPU ;).

oh i think there are several solutions which will work... and i also think
rdtscp wasn't a necessary addition to the ISA :)

-dean

2006-12-11 23:38:07

by Andrea Arcangeli

[permalink] [raw]
Subject: Re: rdtscp vgettimeofday

On Mon, Dec 11, 2006 at 03:15:44PM -0800, dean gaudet wrote:
> rdtscp gets you 2 of the 5 values you need to compute the time. anything
> can happen between the rdtscp and the other 3 reads: the computation is
> (((tsc-A)*B)>>N)+C where N is a constant, and A, B, C are per-cpu data.
> A/B/C change a few times a second (to avoid 32-bit rollover in (tsc-A)),
> every time there's a halt, and on every P-state transition.

This is wrong. There's the D variable too, the seqlock.

The thing I have in mind is something like:

rdtscp (get tsc and cpu atomically); this is fundamental: without
tsc and cpu read atomically nothing below is possible

read D from the cpu we got from rdtscp (seqlock)
smp_rmb()
check that D doesn't signal an update in progress (LSB clear
or similar), or restart
rdtscp again (tsc and cpu atomic)
check that the cpu is still the same, or restart
index the per-cpu array and get the safe A, B, C
smp_rmb()
read the per-cpu D again and check that it didn't change, or restart

Then you have tsc, A, B and C all read atomically. N is a constant.
rdtscp again is fundamental in getting this info all atomically w/o
accessing the southbridge and without expensive asm instructions.
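Rendered as C, that loop might look like the sketch below. The table
layout and names are invented, the LSB-as-writer-mark convention follows
the steps above, and smp_rmb() stands in for whatever read barrier
userland would actually use:

/* Hypothetical per-cpu table: seq (D) has its LSB set while an update
 * is in progress; a, b, c are the per-cpu values from dean's formula. */
struct pcpu_tod {
	unsigned long seq;
	unsigned long long a;
	unsigned long b;
	unsigned long long c;
};

#define TOD_SHIFT 22				/* N: constant shift */

extern struct pcpu_tod pcpu_tod[];		/* exported read-only */
extern unsigned long long rdtscp_cpu(unsigned int *cpu); /* as sketched
							    earlier */

static unsigned long long vgettime(void)
{
	unsigned int cpu, cpu2;
	unsigned long seq, b;
	unsigned long long tsc, a, c;

	for (;;) {
		tsc = rdtscp_cpu(&cpu);		/* tsc + cpu, atomically */
		seq = pcpu_tod[cpu].seq;	/* read D */
		smp_rmb();
		if (seq & 1)			/* update in progress */
			continue;
		rdtscp_cpu(&cpu2);		/* still the same cpu? */
		if (cpu2 != cpu)
			continue;
		a = pcpu_tod[cpu].a;		/* the safe A, B, C */
		b = pcpu_tod[cpu].b;
		c = pcpu_tod[cpu].c;
		smp_rmb();
		if (pcpu_tod[cpu].seq != seq)	/* D changed: restart */
			continue;
		return (((tsc - a) * b) >> TOD_SHIFT) + c;
	}
}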

> if you lose your tick in the middle of those reads any number of things
> can happen to screw the computation... including being scheduled on
> another core and mixing values from two cores.

Being scheduled onto another core is normal. Continuing gettimeofday
from another core after you have the tsc value is just fine.

If anything, the problem is generating A, B and C in per-cpu data with
a per-cpu seqlock around them. That's the job for the per-cpu kernel
threads.

The only real trouble I see is the offset from the last irq. To make
this work we need to rotate the timer irq across all cpus at regular
intervals (before the tsc2usec measurement error shows up).

> oh i think there are several solutions which will work... and i also think
> rdtscp wasn't a necessary addition to the ISA :)

Please don't suggest manual userland rsp unwinding to me; that's
orders of magnitude more fragile and sounds much more complex too ;).