diff -urP -I \$Id:.*Exp \$ -X /usr/src/patch.exclude linux-2.5.41-bk2-core/arch/i386/Config.help linux/arch/i386/Config.help
--- linux-2.5.41-bk2-core/arch/i386/Config.help Wed Oct 9 13:55:48 2002
+++ linux/arch/i386/Config.help Wed Oct 9 14:08:47 2002
@@ -52,6 +52,75 @@
Say Y here if you are building a kernel for a desktop, embedded
or real-time system. Say N if you are unsure.
+High-res-timers
+CONFIG_HIGH_RES_TIMERS
+ POSIX timers are available by default. This option enables high
+ resolution POSIX timers. With this option the resolution is at
+ least 1 microsecond. High resolution is not free: if enabled, this
+ option adds a small overhead each time a timer expires that is not
+ on a 1/HZ tick boundary. If no such timers are used, the overhead
+ is nil.
+
+ This option enables two additional POSIX clocks, CLOCK_REALTIME_HR
+ and CLOCK_MONOTONIC_HR. Note that this option does not change the
+ resolution of CLOCK_REALTIME or CLOCK_MONOTONIC, which remain at 1/HZ
+ resolution.
+
+High-res-timers clock
+CONFIG_HIGH_RES_TIMER_ACPI_PM
+ This option allows you to choose the wall clock timer for your system.
+ With high resolution timers on the x86 platform, it is best to keep
+ the interrupt-generating timer separate from the timekeeping timer.
+ On x86 platforms, three wall clock sources are implemented. These
+ are:
+
+ <timer>                              <resolution>
+ ACPI power management (pm) timer     ~280 nanoseconds
+ TSC (Time Stamp Counter)             1/CPU clock
+ PIT (Programmable Interval Timer)    ~838 nanoseconds
+
+ The PIT is used to generate interrupts and at any given time will be
+ programmed to interrupt when the next timer is to expire or on the
+ next 1/HZ tick. For this reason it is best not to use this timer as
+ the wall clock timer. This timer has a resolution of about 838
+ nanoseconds. THIS OPTION SHOULD ONLY BE USED IF NEITHER ACPI NOR THE
+ TSC IS AVAILABLE.
+
+ The TSC runs at the CPU clock rate (i.e. its resolution is 1/CPU
+ clock) and it has a very low access time. However, on some
+ (incorrect) processors it is subject to throttling to cool the CPU
+ and to other slowdowns during power management. If your CPU does
+ not change the TSC frequency for throttling or power management,
+ this is the best clock timer.
+
+ The ACPI pm timer is available on systems with Advanced Configuration
+ and Power Interface support. The pm timer is available on these
+ systems even if you don't use or enable ACPI in the software or the
+ BIOS (but see Default ACPI pm timer address). The timer has a
+ resolution of about 280 nanoseconds; however, its access time is a
+ bit higher than that of the TSC. Since it is part of ACPI, it is
+ intended to keep track of time while the system is under power
+ management, so it is not subject to the problems of the TSC.
+
+ If you enable the ACPI pm timer and it cannot be found, it is
+ possible that your BIOS is not producing the ACPI tables or that your
+ machine does not support ACPI. In the former case, see "Default ACPI
+ pm timer address". If the timer is not found, the boot will fail when
+ trying to calibrate the delay timer.
+
+Default ACPI pm timer address
+CONFIG_HIGH_RES_TIMER_ACPI_PM_ADD
+ This option is for systems whose BIOS does not generate the ACPI
+ tables when ACPI is not enabled. For example, some BIOSes will not
+ generate the ACPI tables if APM is enabled. The ACPI pm timer is
+ still present but cannot be found by the software. This option
+ allows you to supply the needed address. When the high resolution
+ timers code finds a valid ACPI pm timer address, it reports it in
+ the boot message log (look for lines that begin with
+ "High-res-timers:"). You can turn on ACPI support in the BIOS,
+ boot the system and find this value, then enter it at configure
+ time. Both the report and the entry are in decimal.
+
CONFIG_X86
This is Linux's home port. Linux was originally native to the Intel
386, and runs on all the later x86 processors including the Intel
diff -urP -I \$Id:.*Exp \$ -X /usr/src/patch.exclude linux-2.5.41-bk2-core/arch/i386/config.in linux/arch/i386/config.in
--- linux-2.5.41-bk2-core/arch/i386/config.in Wed Oct 9 14:01:44 2002
+++ linux/arch/i386/config.in Wed Oct 9 14:08:47 2002
@@ -156,6 +156,23 @@
bool 'Huge TLB Page Support' CONFIG_HUGETLB_PAGE
bool 'Symmetric multi-processing support' CONFIG_SMP
+bool 'Configure High-Resolution-Timers' CONFIG_HIGH_RES_TIMERS
+#
+# We assume that if the box doesn't have a TSC it doesn't have ACPI either.
+#
+if [ "$CONFIG_HIGH_RES_TIMERS" = "y" -a "$CONFIG_X86_TSC" = "y" ]; then
+ choice 'Clock source?' \
+ "ACPI-pm-timer CONFIG_HIGH_RES_TIMER_ACPI_PM \
+ Time-stamp-counter/TSC CONFIG_HIGH_RES_TIMER_TSC \
+ Programmable-interval-timer/PIT CONFIG_HIGH_RES_TIMER_PIT" Time-stamp-counter/TSC
+else
+ if [ "$CONFIG_HIGH_RES_TIMERS" = "y" ]; then
+ define_bool CONFIG_HIGH_RES_TIMER_PIT y
+ fi
+fi
+if [ "$CONFIG_HIGH_RES_TIMER_ACPI_PM" = "y" ]; then
+ int 'Default ACPI pm timer address' CONFIG_HIGH_RES_TIMER_ACPI_PM_ADD 0
+fi
bool 'Preemptible Kernel' CONFIG_PREEMPT
if [ "$CONFIG_SMP" != "y" ]; then
bool 'Local APIC support on uniprocessors' CONFIG_X86_UP_APIC
@@ -350,6 +367,7 @@
else
define_bool CONFIG_BLK_DEV_HD n
fi
+
endmenu
mainmenu_option next_comment
diff -urP -I \$Id:.*Exp \$ -X /usr/src/patch.exclude linux-2.5.41-bk2-core/arch/i386/kernel/Makefile linux/arch/i386/kernel/Makefile
--- linux-2.5.41-bk2-core/arch/i386/kernel/Makefile Wed Oct 9 14:01:44 2002
+++ linux/arch/i386/kernel/Makefile Wed Oct 9 14:08:47 2002
@@ -17,6 +17,7 @@
obj-$(CONFIG_KGDB) += kgdb_stub.o
obj-$(CONFIG_X86_MSR) += msr.o
obj-$(CONFIG_X86_CPUID) += cpuid.o
+obj-$(CONFIG_HIGH_RES_TIMER_ACPI_PM) += high-res-tbxfroot.o
obj-$(CONFIG_MICROCODE) += microcode.o
obj-$(CONFIG_APM) += apm.o
obj-$(CONFIG_ACPI) += acpi.o
diff -urP -I \$Id:.*Exp \$ -X /usr/src/patch.exclude linux-2.5.41-bk2-core/arch/i386/kernel/apic.c linux/arch/i386/kernel/apic.c
--- linux-2.5.41-bk2-core/arch/i386/kernel/apic.c Wed Oct 9 13:55:48 2002
+++ linux/arch/i386/kernel/apic.c Wed Oct 9 14:08:47 2002
@@ -801,7 +801,7 @@
* P5 APIC double write bug.
*/
-#define APIC_DIVISOR 16
+#define APIC_DIVISOR 1
void __setup_APIC_LVTT(unsigned int clocks)
{
@@ -812,12 +812,12 @@
apic_write_around(APIC_LVTT, lvtt1_value);
/*
- * Divide PICLK by 16
+ * Divide PICLK by 1
*/
tmp_value = apic_read(APIC_TDCR);
apic_write_around(APIC_TDCR, (tmp_value
& ~(APIC_TDR_DIV_1 | APIC_TDR_DIV_TMBASE))
- | APIC_TDR_DIV_16);
+ | APIC_TDR_DIV_1);
apic_write_around(APIC_TMICT, clocks/APIC_DIVISOR);
}
@@ -1030,10 +1030,20 @@
* Interrupts are already masked off at this point.
*/
prof_counter[cpu] = prof_multiplier[cpu];
+ /*
+ * deal with profiling later...
+ */
+#ifndef CONFIG_HIGH_RES_TIMERS
if (prof_counter[cpu] != prof_old_multiplier[cpu]) {
__setup_APIC_LVTT(calibration_result/prof_counter[cpu]);
prof_old_multiplier[cpu] = prof_counter[cpu];
}
+#else
+ /*
+ * This is the 1/HZ count; the HRT code may change it.
+ */
+ __setup_APIC_LVTT(calibration_result);
+#endif
#ifdef CONFIG_SMP
update_process_times(user);
diff -urP -I \$Id:.*Exp \$ -X /usr/src/patch.exclude linux-2.5.41-bk2-core/arch/i386/kernel/high-res-tbxfroot.c linux/arch/i386/kernel/high-res-tbxfroot.c
--- linux-2.5.41-bk2-core/arch/i386/kernel/high-res-tbxfroot.c Wed Dec 31 16:00:00 1969
+++ linux/arch/i386/kernel/high-res-tbxfroot.c Wed Oct 9 14:08:47 2002
@@ -0,0 +1,272 @@
+/******************************************************************************
+ *
+ * Module Name: tbxfroot - Find the root ACPI table (RSDT)
+ * $Revision: 49 $
+ *
+ *****************************************************************************/
+
+/*
+ * Copyright (C) 2000, 2001 R. Byron Moore
+
+ * This code purloined and modified by George Anzinger
+ * Copyright (C) 2002 by MontaVista Software.
+ * It is part of the high-res-timers ACPI option and its sole purpose is
+ * to find the darn timer.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ */
+
+/* This is most annoying! We want to find the address of the pm timer in the
+ * ACPI hardware package. We know there is one if ACPI is available at all
+ * as it is part of the basic ACPI hardware set.
+ * However, the powers that be have conspired to make it a real
+ * pain to find the address. We have written a minimal search routine
+ * that we use only once on boot up. We try to cover all the bases including
+ * checksum, and version. We will try to get some constants and structures
+ * from the ACPI code in an attempt to follow it, but darn, what a mess.
+ *
+ * First problem, the include files are in the driver package....
+ * and what a mess they are. We pick up the kernel string and types first.
+
+ * But then there is the COMPILER_DEPENDENT_UINT64 ...
+ */
+
+#define COMPILER_DEPENDENT_UINT64 unsigned long long
+#include <linux/kernel.h>
+#include <linux/string.h>
+#include <../drivers/acpi/include/actypes.h>
+#include <../drivers/acpi/include/actbl.h>
+#include <../drivers/acpi/include/acconfig.h>
+#include <linux/init.h>
+#include <asm/page.h>
+
+#define STRNCMP(d,s,n) strncmp((d), (s), (NATIVE_INT)(n))
+#define RSDP_CHECKSUM_LENGTH 20
+
+#ifndef CONFIG_ACPI
+/*******************************************************************************
+ *
+ * FUNCTION: hrt_acpi_checksum
+ *
+ * PARAMETERS: Buffer - Buffer to checksum
+ * Length - Size of the buffer
+ *
+ * RETURNS 8 bit checksum of buffer
+ *
+ * DESCRIPTION: Computes an 8 bit checksum of the buffer(length) and returns it.
+ *
+ ******************************************************************************/
+static __init
+u8
+hrt_acpi_checksum (
+ void *buffer,
+ u32 length)
+{
+ u8 *limit;
+ u8 *rover;
+ u8 sum = 0;
+
+
+ if (buffer && length) {
+ /* Buffer and Length are valid */
+
+ limit = (u8 *) buffer + length;
+
+ for (rover = buffer; rover < limit; rover++) {
+ sum = (u8) (sum + *rover);
+ }
+ }
+
+ return (sum);
+}
+
+/*******************************************************************************
+ *
+ * FUNCTION: hrt_acpi_scan_memory_for_rsdp
+ *
+ * PARAMETERS: Start_address - Starting pointer for search
+ * Length - Maximum length to search
+ *
+ * RETURN: Pointer to the RSDP if found, otherwise NULL.
+ *
+ * DESCRIPTION: Search a block of memory for the RSDP signature
+ *
+ ******************************************************************************/
+static __init
+u8 *
+hrt_acpi_scan_memory_for_rsdp (
+ u8 *start_address,
+ u32 length)
+{
+ u32 offset;
+ u8 *mem_rover;
+
+
+ /* Search from given start addr for the requested length */
+
+ for (offset = 0, mem_rover = start_address;
+ offset < length;
+ offset += RSDP_SCAN_STEP, mem_rover += RSDP_SCAN_STEP) {
+
+ /* The signature and checksum must both be correct */
+
+ if (STRNCMP ((NATIVE_CHAR *) mem_rover,
+ RSDP_SIG, sizeof (RSDP_SIG)-1) == 0 &&
+ hrt_acpi_checksum (mem_rover, RSDP_CHECKSUM_LENGTH) == 0) {
+ /* If so, we have found the RSDP */
+
+
+ return (mem_rover);
+ }
+ }
+
+ /* Searched entire block, no RSDP was found */
+
+
+ return (NULL);
+}
+
+
+/*******************************************************************************
+ *
+ * FUNCTION: hrt_find_acpi_rsdp
+ *
+ * PARAMETERS: None
+ *
+ * RETURN: Logical address of rsdp
+ *
+ * DESCRIPTION: Search lower 1_mbyte of memory for the root system descriptor
+ * pointer structure. If it is found, return its address,
+ * else return 0.
+ *
+ * NOTE: The RSDP must be either in the first 1_k of the Extended
+ * BIOS Data Area or between E0000 and FFFFF (ACPI 1.0 section
+ * 5.2.2; assertion #421).
+ *
+ ******************************************************************************/
+/* Constants used in searching for the RSDP in low memory */
+
+#define LO_RSDP_WINDOW_BASE 0 /* Physical Address */
+#define HI_RSDP_WINDOW_BASE 0xE0000 /* Physical Address */
+#define LO_RSDP_WINDOW_SIZE 0x400
+#define HI_RSDP_WINDOW_SIZE 0x20000
+#define RSDP_SCAN_STEP 16
+
+static __init
+RSDP_DESCRIPTOR *
+hrt_find_acpi_rsdp (void)
+{
+ u8 *mem_rover;
+
+
+ /*
+ * 1) Search EBDA (low memory) paragraphs
+ */
+ mem_rover = hrt_acpi_scan_memory_for_rsdp((u8 *)__va(LO_RSDP_WINDOW_BASE),
+ LO_RSDP_WINDOW_SIZE);
+
+ if (!mem_rover) {
+ /*
+ * 2) Search upper memory:
+ * 16-byte boundaries in E0000h-FFFFFh
+ */
+ mem_rover = hrt_acpi_scan_memory_for_rsdp((u8 *)__va(HI_RSDP_WINDOW_BASE),
+ HI_RSDP_WINDOW_SIZE);
+ }
+
+ if (mem_rover) {
+ /* Found it, return the logical address */
+
+ return (RSDP_DESCRIPTOR *)mem_rover;
+ }
+ return (RSDP_DESCRIPTOR *)0;
+}
+
+__init
+u32
+hrt_get_acpi_pm_ptr(void)
+{
+ fadt_descriptor_rev2 *fadt;
+ RSDT_DESCRIPTOR_REV2 *rsdt;
+ XSDT_DESCRIPTOR_REV2 *xsdt;
+ RSDP_DESCRIPTOR *rsdp = hrt_find_acpi_rsdp ();
+
+ if ( ! rsdp){
+ printk("ACPI: System description tables not found\n");
+ return 0;
+ }
+ /*
+ * Now that we have that problem out of the way, let's set up this
+ * timer. We need to figure the addresses based on the revision
+ * of ACPI, which is in the table we just found.
+ * We will not check the RSDT checksum, but we will check the FADT's.
+ */
+ if ( rsdp->revision == 2){
+ xsdt = (XSDT_DESCRIPTOR_REV2 *)__va(rsdp->xsdt_physical_address);
+ fadt = (fadt_descriptor_rev2 *)__va(xsdt->table_offset_entry [0]);
+ }else{
+ rsdt = (RSDT_DESCRIPTOR_REV2 *)__va(rsdp->rsdt_physical_address);
+ fadt = (fadt_descriptor_rev2 *)__va(rsdt->table_offset_entry [0]);
+ }
+ /*
+ * Verify the signature and the checksum
+ */
+ if (STRNCMP ((NATIVE_CHAR *) fadt->header.signature ,
+ FADT_SIG, sizeof (FADT_SIG)-1) == 0 &&
+ hrt_acpi_checksum ((NATIVE_CHAR *)fadt, fadt->header.length) == 0) {
+ /*
+ * looks good. Again, based on revision,
+ * pluck the addresses we want and get out.
+ */
+ if ( rsdp->revision == 2){
+ return (u32 )fadt->Xpm_tmr_blk.address;
+ }else{
+ return (u32 )fadt->V1_pm_tmr_blk;
+ }
+ }
+ printk("ACPI: Signature or checksum failed on FADT\n");
+ return 0;
+}
+
+#else
+int acpi_get_firmware_table (
+ acpi_string signature,
+ u32 instance,
+ u32 flags,
+ acpi_table_header **table_pointer);
+
+extern fadt_descriptor_rev2 acpi_fadt;
+__init
+u32
+hrt_get_acpi_pm_ptr(void)
+{
+	fadt_descriptor_rev2 *fadt = &acpi_fadt;
+
+	if (!fadt->header.signature[0]) {
+		/* acpi_fadt may not be filled in yet; ask the table code. */
+		fadt = NULL;
+		acpi_get_firmware_table("FACP", 1, 0, (acpi_table_header **)&fadt);
+	}
+	if (!fadt || !fadt->header.signature[0]) {
+		printk("ACPI: Could not find the ACPI pm timer.\n");
+		return 0;
+	}
+ if ( fadt->header.revision == 2){
+ return (u32)fadt->Xpm_tmr_blk.address;
+ }else{
+ return (u32 )fadt->V1_pm_tmr_blk;
+ }
+}
+#endif
diff -urP -I \$Id:.*Exp \$ -X /usr/src/patch.exclude linux-2.5.41-bk2-core/arch/i386/kernel/time.c linux/arch/i386/kernel/time.c
--- linux-2.5.41-bk2-core/arch/i386/kernel/time.c Thu Oct 3 10:41:57 2002
+++ linux/arch/i386/kernel/time.c Wed Oct 9 14:08:47 2002
@@ -29,7 +29,10 @@
* Fixed a xtime SMP race (we need the xtime_lock rw spinlock to
* serialize accesses to xtime/lost_ticks).
*/
-
+/* 2002-8-13 George Anzinger Modified for High res timers:
+ * Copyright (C) 2002 MontaVista Software
+*/
+#define _INCLUDED_FROM_TIME_C
#include <linux/errno.h>
#include <linux/sched.h>
#include <linux/kernel.h>
@@ -62,19 +65,20 @@
extern spinlock_t i8259A_lock;
-#include "do_timer.h"
/*
* for x86_do_profile()
*/
#include <linux/irq.h>
+#include <asm/sc_math.h>
+#include <linux/hrtime.h>
+#include "do_timer.h"
u64 jiffies_64;
unsigned long cpu_khz; /* Detected as we calibrate the TSC */
-/* Number of usecs that the last interrupt was delayed */
-static int delay_at_last_interrupt;
+static __initdata unsigned long tsc_cycles_per_5_jiffies; /* set only if TSC */
static unsigned long last_tsc_low; /* lsb 32 bits of Time Stamp Counter */
@@ -88,7 +92,24 @@
extern rwlock_t xtime_lock;
extern unsigned long wall_jiffies;
+
+#ifndef CONFIG_HIGH_RES_TIMERS
+
+/* Number of usecs that the last interrupt was delayed */
+static int delay_at_last_interrupt;
+
+#endif /* CONFIG_HIGH_RES_TIMERS */
+
spinlock_t rtc_lock = SPIN_LOCK_UNLOCKED;
+/*
+ * We have three of these do_xxx_gettimeoffset() routines:
+ * do_fast_gettimeoffset(void) for TSC systems without high-res-timers
+ * do_slow_gettimeoffset(void) for ~TSC systems without high-res-timers
+ * do_highres_gettimeoffset(void) for systems with high-res-timers
+ *
+ * Pick the desired one at compile time...
+ */
+#if ! defined(CONFIG_HIGH_RES_TIMERS) && defined(CONFIG_X86_TSC)
static inline unsigned long do_fast_gettimeoffset(void)
{
@@ -109,23 +130,19 @@
* Using a mull instead of a divl saves up to 31 clock cycles
* in the critical path.
*/
-
- __asm__("mull %2"
- :"=a" (eax), "=d" (edx)
- :"rm" (fast_gettimeoffset_quotient),
- "0" (eax));
-
+ edx = mpy_sc32(eax, fast_gettimeoffset_quotient);
/* our adjusted time offset in microseconds */
return delay_at_last_interrupt + edx;
}
+#define do_gettimeoffset() do_fast_gettimeoffset()
+#endif
#define TICK_SIZE (tick_nsec / 1000)
spinlock_t i8253_lock = SPIN_LOCK_UNLOCKED;
EXPORT_SYMBOL(i8253_lock);
-#ifndef CONFIG_X86_TSC
-
+#if ! defined(CONFIG_HIGH_RES_TIMERS) && ! defined(CONFIG_X86_TSC)
/* This function must be called with interrupts disabled
* It was inspired by Steve McCanne's microtime-i386 for BSD. -- jrs
*
@@ -223,10 +240,21 @@
static unsigned long (*do_gettimeoffset)(void) = do_slow_gettimeoffset;
-#else
+#endif
+
+#ifdef CONFIG_HIGH_RES_TIMERS
-#define do_gettimeoffset() do_fast_gettimeoffset()
+static unsigned long do_highres_gettimeoffset(void)
+{
+ /*
+ * We are under the xtime_lock here.
+ */
+ long tmp = quick_get_cpuctr();
+ long rtn = arch_cycles_to_usec(tmp + sub_jiffie());
+ return rtn;
+}
+#define do_gettimeoffset() do_highres_gettimeoffset()
#endif
/*
@@ -241,16 +269,25 @@
read_lock_irqsave(&xtime_lock, flags);
usec = do_gettimeoffset();
{
+ /*
+ * FIXME: Because of adjtime() and the like,
+ * this should be changed to actually update
+ * wall time using the proper routine.
+ * Otherwise we run the risk of time moving
+ * backward due to different interpretations
+ * of the jiffie, i.e. a jiffie != 1/HZ
+ * (though it is close).
+ */
unsigned long lost = jiffies - wall_jiffies;
if (lost)
- usec += lost * (1000000 / HZ);
+ usec += lost * (USEC_PER_SEC / HZ);
}
sec = xtime.tv_sec;
usec += (xtime.tv_nsec / 1000);
read_unlock_irqrestore(&xtime_lock, flags);
- while (usec >= 1000000) {
- usec -= 1000000;
+ while (usec >= USEC_PER_SEC) {
+ usec -= USEC_PER_SEC;
sec++;
}
@@ -268,10 +305,10 @@
* made, and then undo it!
*/
tv->tv_usec -= do_gettimeoffset();
- tv->tv_usec -= (jiffies - wall_jiffies) * (1000000 / HZ);
+ tv->tv_usec -= (jiffies - wall_jiffies) * (USEC_PER_SEC / HZ);
while (tv->tv_usec < 0) {
- tv->tv_usec += 1000000;
+ tv->tv_usec += USEC_PER_SEC;
tv->tv_sec--;
}
@@ -361,7 +398,7 @@
* timer_interrupt() needs to keep up the real-time clock,
* as well as call the "do_timer()" routine every clocktick
*/
-static inline void do_timer_interrupt(int irq, void *dev_id, struct pt_regs *regs)
+static inline void do_timer_interrupt(int irq, struct pt_regs *regs)
{
#ifdef CONFIG_X86_IO_APIC
if (timer_ack) {
@@ -381,36 +418,29 @@
do_timer_interrupt_hook(regs);
- /*
+ /*
+ * This is dumb for two reasons:
+ * 1.) It is based on wall time, which has not yet been updated.
+ * 2.) It is checked every tick for something that happens every
+ * ~11 minutes. Why not use a timer for it? Much lower overhead;
+ * in fact, zero if STA_UNSYNC is set.
+ */
+ /*
* If we have an externally synchronized Linux clock, then update
* CMOS clock accordingly every ~11 minutes. Set_rtc_mmss() has to be
* called as close as possible to 500 ms before the new second starts.
*/
if ((time_status & STA_UNSYNC) == 0 &&
xtime.tv_sec > last_rtc_update + 660 &&
- (xtime.tv_nsec / 1000) >= 500000 - ((unsigned) TICK_SIZE) / 2 &&
- (xtime.tv_nsec / 1000) <= 500000 + ((unsigned) TICK_SIZE) / 2) {
+ xtime.tv_nsec >= 500000000 - ((unsigned) tick_nsec) / 2 &&
+ xtime.tv_nsec <= 500000000 + ((unsigned) tick_nsec) / 2) {
if (set_rtc_mmss(xtime.tv_sec) == 0)
last_rtc_update = xtime.tv_sec;
else
- last_rtc_update = xtime.tv_sec - 600; /* do it again in 60 s */
+ /* do it again in 60 s */
+ last_rtc_update = xtime.tv_sec - 600;
}
-#ifdef CONFIG_MCA
- if( MCA_bus ) {
- /* The PS/2 uses level-triggered interrupts. You can't
- turn them off, nor would you want to (any attempt to
- enable edge-triggered interrupts usually gets intercepted by a
- special hardware circuit). Hence we have to acknowledge
- the timer interrupt. Through some incredibly stupid
- design idea, the reset for IRQ 0 is done by setting the
- high bit of the PPI port B (0x61). Note that some PS/2s,
- notably the 55SX, work fine if this is removed. */
-
- irq = inb_p( 0x61 ); /* read the current state */
- outb_p( irq|0x80, 0x61 ); /* reset the IRQ */
- }
-#endif
}
static int use_tsc;
@@ -422,24 +452,28 @@
*/
void timer_interrupt(int irq, void *dev_id, struct pt_regs *regs)
{
- int count;
-
/*
* Here we are in the timer irq handler. We just have irqs locally
* disabled but we don't know if the timer_bh is running on the other
- * CPU. We need to avoid to SMP race with it. NOTE: we don' t need
+ * CPU. We need to avoid an SMP race with it. NOTE: we don't need
* the irq version of write_lock because as just said we have irq
* locally disabled. -arca
*/
write_lock(&xtime_lock);
+#ifndef CONFIG_HIGH_RES_TIMERS
if (use_tsc)
{
+ int count;
/*
* It is important that these two operations happen almost at
* the same time. We do the RDTSC stuff first, since it's
* faster. To avoid any inconsistencies, we need interrupts
* disabled locally.
+ * Note: It is dumb to put the spin_lock() between these two
+ * operations, since we are trying to sync the two clocks.
+ * Also, the rdtscl is so fast that no one will notice the
+ * difference.
*/
/*
@@ -447,11 +481,11 @@
* has the SA_INTERRUPT flag set. -arca
*/
- /* read Pentium cycle counter */
+ spin_lock(&i8253_lock);
+ /* read Pentium cycle counter */
rdtscl(last_tsc_low);
- spin_lock(&i8253_lock);
outb_p(0x00, 0x43); /* latch the count ASAP */
count = inb_p(0x40); /* read the latched count */
@@ -461,13 +495,95 @@
count = ((LATCH-1) - count) * TICK_SIZE;
delay_at_last_interrupt = (count + LATCH/2) / LATCH;
}
-
- do_timer_interrupt(irq, NULL, regs);
+#endif /* ! CONFIG_HIGH_RES_TIMERS */
+ do_timer_interrupt(irq, regs);
+#ifdef CONFIG_MCA
+ /*
+ * This code moved here from do_timer_interrupt() as part of the
+ * high-res timers change because it should be done every interrupt
+ * but do_timer_interrupt() wants to return early if it is not a
+ * "1/HZ" tick interrupt. For non-high-res systems the code is in
+ * exactly the same location (i.e. it is moved from the tail of the
+ * above called function to the next thing after the function).
+ */
+ if( MCA_bus ) {
+ /* The PS/2 uses level-triggered interrupts. You can't
+ turn them off, nor would you want to (any attempt to
+ enable edge-triggered interrupts usually gets intercepted by a
+ special hardware circuit). Hence we have to acknowledge
+ the timer interrupt. Through some incredibly stupid
+ design idea, the reset for IRQ 0 is done by setting the
+ high bit of the PPI port B (0x61). Note that some PS/2s,
+ notably the 55SX, work fine if this is removed. */
+
+ irq = inb_p( 0x61 ); /* read the current state */
+ outb_p( irq|0x80, 0x61 ); /* reset the IRQ */
+ }
+#endif
write_unlock(&xtime_lock);
}
+#ifdef CONFIG_HIGH_RES_TIMERS
+/*
+ * ALL_PERIODIC mode is used if we MUST support the NMI watchdog. In this
+ * case we must continue to provide interrupts even if they are not serviced.
+ * In this mode, we leave the chip in periodic mode, programmed to interrupt
+ * every jiffie. For short intervals, this is done by programming the short
+ * time, waiting until it is loaded, and then programming the 1/HZ count. The
+ * chip will not load the 1/HZ count until the short count expires. If the
+ * last interrupt was programmed to be short, we need to program another
+ * short one to cover the remaining part of the jiffie and can then just
+ * leave the chip alone. Note that this is also a low overhead way of doing
+ * things, as we do not have to touch the chip MOST of the time.
+ */
+
+int _schedule_next_int(unsigned long jiffie_f,long sub_jiffie_in, int always)
+{
+ long sub_jiff_offset;
+ IF_ALL_PERIODIC(
+ int * last_was_long = &_last_was_long[smp_processor_id()];
+ if ((sub_jiffie_in == -1) && *last_was_long) return 0);
+ /*
+ * First figure where we are in time.
+ * A note on locking. We are under the timerlist_lock here. This
+ * means that interrupts are off already, so don't use irq versions.
+ */
+ if_SMP( read_lock(&xtime_lock));
+
+ sub_jiff_offset = quick_update_jiffies_sub(jiffie_f);
+
+ if_SMP( read_unlock(&xtime_lock));
+
+
+ if ((IF_ALL_PERIODIC( *last_was_long =) (sub_jiffie_in == -1 ))) {
+
+ sub_jiff_offset = cycles_per_jiffies - sub_jiff_offset;
+
+ }else{
+ sub_jiff_offset = sub_jiffie_in - sub_jiff_offset;
+ }
+ /*
+ * If time is already passed, just return saying so.
+ */
+ if (! always && (sub_jiff_offset < high_res_test_val)){
+ IF_ALL_PERIODIC( *last_was_long = 0);
+ return 1;
+ }
+ reload_timer_chip(sub_jiff_offset);
+ return 0;
+}
+
+#ifdef CONFIG_APM
+void restart_timer(void)
+{
+ start_PIT();
+}
+#endif /* CONFIG_APM */
+#endif /* CONFIG_HIGH_RES_TIMERS */
+
+
/* not static: needed by APM */
unsigned long get_cmos_time(void)
{
@@ -510,6 +626,26 @@
return mktime(year, mon, day, hour, min, sec);
}
+#define CAL_JIFS 5
+#define CALIBRATE_LATCH (((CAL_JIFS * CLOCK_TICK_RATE) + HZ/2)/HZ)
+#define CALIBRATE_TIME ((CAL_JIFS * USEC_PER_SEC)/HZ)
+#define CALIBRATE_TIME_NSEC (CAL_JIFS * (NSEC_PER_SEC/HZ))
+
+#ifdef CONFIG_HIGH_RES_TIMERS
+
+void __init hrtimer_init(void)
+{
+ /*
+ * The init_hrtimers macro is in the choosen support package
+ * depending on the clock source, PIT, TSC, or ACPI pm timer.
+ */
+ init_hrtimers();
+ start_PIT();
+}
+#else
+#define hrtimer_init()
+#endif /* ! CONFIG_HIGH_RES_TIMERS */
+
/* ------ Calibrate the TSC -------
* Return 2^32 * (1 / (TSC clocks per usec)) for do_fast_gettimeoffset().
* Too much 64-bit arithmetic here to do this cleanly in C, and for
@@ -519,8 +655,6 @@
* device.
*/
-#define CALIBRATE_LATCH (5 * LATCH)
-#define CALIBRATE_TIME (5 * 1000020/HZ)
#ifdef CONFIG_X86_TSC
static unsigned long __init calibrate_tsc(void)
@@ -571,6 +705,14 @@
/* Error: ECPUTOOSLOW */
if (endlow <= CALIBRATE_TIME)
goto bad_ctc;
+ /*
+ * endlow at this point is the number of TSC
+ * cycles in CAL_JIFS jiffies. Save it for
+ * high_res use. Note: keep the whole value for
+ * now; hrtimer_init will do the divide (we
+ * want that precision).
+ */
+ tsc_cycles_per_5_jiffies = endlow;
__asm__("divl %2"
:"=a" (endlow), "=d" (endhigh)
@@ -585,6 +727,9 @@
* 32 bits..
*/
bad_ctc:
+#ifdef CONFIG_HIGH_RES_TIMERS
+ printk("******************** TSC calibrate failed!\n");
+#endif
return 0;
}
#endif /* CONFIG_X86_TSC */
@@ -658,6 +803,7 @@
xtime.tv_sec = get_cmos_time();
xtime.tv_nsec = 0;
+ IF_HIGH_RES(tick_nsec = NSEC_PER_SEC / HZ);
/*
* If we have APM enabled or the CPU clock speed is variable
@@ -700,17 +846,19 @@
#ifndef do_gettimeoffset
do_gettimeoffset = do_fast_gettimeoffset;
#endif
+ /*
+ * Kick off the high res timers
+ */
+ hrtimer_init();
/* report CPU clock rate in Hz.
* The formula is (10^6 * 2^32) / (2^32 * 1 / (clocks/us)) =
* clock/second. Our precision is about 100 ppm.
*/
- { unsigned long eax=0, edx=1000;
- __asm__("divl %2"
- :"=a" (cpu_khz), "=d" (edx)
- :"r" (tsc_quotient),
- "0" (eax), "1" (edx));
- printk("Detected %lu.%03lu MHz processor.\n", cpu_khz / 1000, cpu_khz % 1000);
+ cpu_khz = div_sc32( 1000, tsc_quotient);
+ {
+ printk("Detected %lu.%03lu MHz processor.\n",
+ cpu_khz / 1000, cpu_khz % 1000);
}
#ifdef CONFIG_CPU_FREQ
cpufreq_register_notifier(&time_cpufreq_notifier_block, CPUFREQ_TRANSITION_NOTIFIER);
diff -urP -I \$Id:.*Exp \$ -X /usr/src/patch.exclude linux-2.5.41-bk2-core/arch/i386/mach-generic/do_timer.h linux/arch/i386/mach-generic/do_timer.h
--- linux-2.5.41-bk2-core/arch/i386/mach-generic/do_timer.h Thu Sep 26 11:23:49 2002
+++ linux/arch/i386/mach-generic/do_timer.h Wed Oct 9 14:08:47 2002
@@ -14,6 +14,11 @@
static inline void do_timer_interrupt_hook(struct pt_regs *regs)
{
do_timer(regs);
+ IF_HIGH_RES(
+ if (!(new_jiffie() & 1))
+ return;
+ jiffies_intr = 0;
+ )
/*
* In the SMP case we use the local APIC timer interrupt to do the
* profiling, except when we simulate SMP mode on a uniprocessor
diff -urP -I \$Id:.*Exp \$ -X /usr/src/patch.exclude linux-2.5.41-bk2-core/include/asm-i386/hrtime-M386.h linux/include/asm-i386/hrtime-M386.h
--- linux-2.5.41-bk2-core/include/asm-i386/hrtime-M386.h Wed Dec 31 16:00:00 1969
+++ linux/include/asm-i386/hrtime-M386.h Wed Oct 9 14:08:47 2002
@@ -0,0 +1,247 @@
+/*
+ *
+ * File: include/asm-i386/hrtime-M386.h
+ * Copyright (C) 1999 by the University of Kansas Center for Research, Inc.
+ * Copyright (C) 2001 by MontaVista Software.
+ *
+ * This software was developed by the Information and
+ * Telecommunication Technology Center (ITTC) at the University of
+ * Kansas. Partial funding for this project was provided by Sprint. This
+ * software may be used and distributed according to the terms of the GNU
+ * Public License, incorporated herein by reference. Neither ITTC nor
+ * Sprint accept any liability whatsoever for this product.
+ *
+ * This project was developed under the direction of Dr. Douglas Niehaus.
+ *
+ * Authors: Balaji S., Raghavan Menon
+ * Furquan Ansari, Jason Keimig, Apurva Sheth
+ *
+ * Thanx to Michael Barabanov for helping me with the non-pentium code.
+ *
+ * Please send bug-reports/suggestions/comments to [email protected]
+ *
+ * Further details about this project can be obtained at
+ * http://hegel.ittc.ukans.edu/projects/utime/
+ * or in the file Documentation/utime.txt
+ */
+/* This is in case it's not a Pentium or a PPro:
+ * we don't have access to the cycle counters.
+ */
+/*
+ * This code swiped from the utime project to support high res timers
+ * Principal thief George Anzinger [email protected]
+ */
+#ifndef _ASM_HRTIME_M386_H
+#define _ASM_HRTIME_M386_H
+
+#ifdef __KERNEL__
+
+
+extern int base_c0,base_c0_offset;
+#define timer_latch_reset(x) _timer_latch_reset = x
+extern int _timer_latch_reset;
+
+/*
+ * Never call this routine with local ints on.
+ * update_jiffies_sub()
+ */
+
+extern inline unsigned int read_timer_chip(void)
+{
+ unsigned int next_intr;
+
+ LATCH_CNT0();
+ READ_CNT0(next_intr);
+ return next_intr;
+}
+
+#define HR_SCALE_ARCH_NSEC 20
+#define HR_SCALE_ARCH_USEC 30
+#define HR_SCALE_NSEC_ARCH 32
+#define HR_SCALE_USEC_ARCH 29
+
+#define cf_arch_to_usec (SC_n(HR_SCALE_ARCH_USEC,1000000)/ \
+ (long long)CLOCK_TICK_RATE)
+
+extern inline int arch_cycles_to_usec(long update)
+{
+ return (mpy_sc_n(HR_SCALE_ARCH_USEC, update ,arch_to_usec));
+}
+#define cf_arch_to_nsec (SC_n(HR_SCALE_ARCH_NSEC,1000000000)/ \
+ (long long)CLOCK_TICK_RATE)
+
+extern inline int arch_cycles_to_nsec(long update)
+{
+ return mpy_sc_n(HR_SCALE_ARCH_NSEC, update, arch_to_nsec);
+}
+/*
+ * And the other way...
+ */
+#define cf_usec_to_arch (SC_n( HR_SCALE_USEC_ARCH,CLOCK_TICK_RATE)/ \
+ (long long)1000000)
+extern inline int usec_to_arch_cycles(unsigned long usec)
+{
+ return mpy_sc_n(HR_SCALE_USEC_ARCH,usec,usec_to_arch);
+}
+#define cf_nsec_to_arch (SC_n( HR_SCALE_NSEC_ARCH,CLOCK_TICK_RATE)/ \
+ (long long)1000000000)
+extern inline int nsec_to_arch_cycles(long nsec)
+{
+ return (mpy_ex32(nsec,nsec_to_arch));
+}
+/*
+ * If this is defined otherwise to allow NTP adjusting, it should
+ * be scaled by about 16 bits (or so) to allow small percentage
+ * changes
+ */
+#define arch_cycles_to_latch(x) (x)
+/*
+ * This function updates base_c0
+ * This function is always called under the write_lock_irq(&xtime_lock)
+ * It returns the number of "clocks" since the last call to it.
+ *
+ * There is a problem having a counter whose period is the same as the
+ * interval at which it is read: did it just roll over, or has a very short
+ * time really elapsed? (This is one reason not to use the PIT for both
+ * interrupts and time.) We take the occurrence of an interrupt since the
+ * last call to indicate that the counter has reset. This works for the
+ * get_cpuctr() code but is flawed for quick_get_cpuctr(), which is
+ * called whenever time is requested. For that code, we make sure that
+ * we never move backward in time.
+ */
+extern inline unsigned long get_cpuctr(void)
+{
+ int c0;
+ long rtn;
+
+ spin_lock(&i8253_lock);
+ c0 = read_timer_chip();
+
+ rtn = base_c0 - c0 + _timer_latch_reset;
+
+// if (rtn < 0) {
+// rtn += _timer_latch_reset;
+// }
+ base_c0 = c0;
+ base_c0_offset = 0;
+ spin_unlock(&i8253_lock);
+
+ return rtn;
+}
+/*
+ * In an SMP system this is called under the read_lock_irq(xtime_lock)
+ * In a UP system it is also called with this lock (PIT case only)
+ * It returns the number of "clocks" since the last call to get_cpuctr (above).
+ */
+extern inline unsigned long quick_get_cpuctr(void)
+{
+ register int c0;
+ long rtn;
+
+ spin_lock(&i8253_lock);
+ c0 = read_timer_chip();
+ /*
+ * If the new count is greater than
+ * the last one (base_c0), the chip has just rolled and an
+ * interrupt is pending. To get the time right, we need to add
+ * _timer_latch_reset to the answer. This holds only if at most
+ * one roll is involved, which is why base_c0 must be updated at
+ * least every 1/HZ.
+ */
+ rtn = base_c0 - c0;
+ if (rtn < base_c0_offset) {
+ rtn += _timer_latch_reset;
+ }
+ base_c0_offset = rtn;
+ spin_unlock(&i8253_lock);
+ return rtn;
+}
+
+#ifdef _INCLUDED_FROM_TIME_C
+int base_c0 = 0;
+int base_c0_offset = 0;
+struct timer_conversion_bits timer_conversion_bits = {
+ _cycles_per_jiffies: (LATCH),
+ _nsec_to_arch: cf_nsec_to_arch,
+ _usec_to_arch: cf_usec_to_arch,
+ _arch_to_nsec: cf_arch_to_nsec,
+ _arch_to_usec: cf_arch_to_usec,
+ _arch_to_latch: 1
+};
+int _timer_latch_reset;
+
+#define set_last_timer_cc() (void)(1)
+
+/* This returns the correct cycles_per_sec from a calibrated one
+ */
+#define arch_hrtime_init(x) (CLOCK_TICK_RATE)
+
+/*
+ * The reload_timer_chip routine is called under the timerlist lock (irq off)
+ * and, in SMP, the xtime_lock. We also take the i8253_lock for the chip access
+ */
+
+extern inline void reload_timer_chip( int new_latch_value)
+{
+ int c1, c1new, delta;
+ unsigned char pit_status;
+ /*
+ * The input value is in timer units for the 386 platform.
+ * We must be called with irq disabled.
+ */
+ spin_lock(&i8253_lock);
+ /*
+ * We need to get the last value of the timer chip.
+ */
+ LATCH_CNT0_AND_CNT1();
+ READ_CNT0(delta);
+ READ_CNT1(c1);
+ base_c0 -= delta;
+
+ new_latch_value = arch_cycles_to_latch( new_latch_value );
+ if (new_latch_value < TIMER_DELTA){
+ new_latch_value = TIMER_DELTA;
+ }
+ IF_ALL_PERIODIC( put_timer_in_periodic_mode());
+ outb_p(new_latch_value & 0xff, PIT0); /* LSB */
+ outb(new_latch_value >> 8, PIT0); /* MSB */
+ do {
+ outb_p(PIT0_LATCH_STATUS,PIT_COMMAND);
+ pit_status = inb(PIT0);
+ }while (pit_status & PIT_NULL_COUNT);
+ do {
+ LATCH_CNT0_AND_CNT1();
+ READ_CNT0(delta);
+ READ_CNT1(c1new);
+ } while (!(((new_latch_value-delta)&0xffff) < 15));
+
+ IF_ALL_PERIODIC(
+ outb_p(LATCH & 0xff, PIT0); /* LSB */
+ outb(LATCH >> 8, PIT0); /* MSB */
+ )
+
+ /*
+ * This assumes that counter 1 is running with
+ * 18 as its latch value.
+ * Most BIOSes do this, I guess....
+ */
+ //IF_DEBUG(if (delta > 50000) BREAKPOINT);
+ c1 -= c1new;
+ base_c0 += ((c1 < 0) ? (c1 + 18) : (c1)) + delta;
+ if ( base_c0 < 0 ){
+ base_c0 += _timer_latch_reset;
+ }
+ spin_unlock(&i8253_lock);
+ return;
+}
+/*
+ * No run time conversion factors need to be set up as the PIT has a fixed
+ * speed.
+ */
+#define init_hrtimers()
+
+#endif /* _INCLUDED_FROM_TIME_C */
+#define final_clock_init()
+#endif /* __KERNEL__ */
+#endif /* _ASM_HRTIME_M386_H */
+
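The M386 conversion helpers above multiply a raw PIT count by a constant that was pre-scaled by `2**HR_SCALE_ARCH_USEC`, then shift the scale back off. The sketch below re-creates that arithmetic for `cf_arch_to_usec` so the numbers can be checked by hand. It is a standalone illustration, not the kernel code: it assumes `SC_n(n, x)` means `x << n` and `mpy_sc_n(n, a, b)` means `(a * b) >> n` (the `sc_math.h` definitions are not part of this patch), and `pit_counts_to_usec()` and `mpy_scaled()` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define CLOCK_TICK_RATE 1193180		/* i386 PIT input clock in Hz */
#define HR_SCALE_ARCH_USEC 30		/* scale used by hrtime-M386.h */

/* Assumed meaning of SC_n(): a constant pre-shifted left by n bits. */
#define SC_n(n, x) ((int64_t)(x) << (n))

/* Microseconds per PIT count, scaled by 2**30 (mirrors cf_arch_to_usec). */
#define CF_PIT_TO_USEC (SC_n(HR_SCALE_ARCH_USEC, 1000000) / (int64_t)CLOCK_TICK_RATE)

/* Assumed meaning of mpy_sc_n(): multiply, then drop the n scale bits. */
static int64_t mpy_scaled(int n, int64_t a, int64_t b)
{
	return (a * b) >> n;
}

/* Hypothetical stand-in for arch_cycles_to_usec() in hrtime-M386.h. */
static int pit_counts_to_usec(long counts)
{
	assert(counts >= 0);	/* this sketch only handles non-negative deltas */
	return (int)mpy_scaled(HR_SCALE_ARCH_USEC, counts, CF_PIT_TO_USEC);
}
```

With HZ=100, one tick is LATCH = (1193180 + 50)/100 = 11932 PIT counts, and `pit_counts_to_usec(11932)` returns 10000. Because the scaled factor keeps only its integer part, results truncate downward; a larger scale reduces that error but shrinks the headroom before `a * b` overflows.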
diff -urP -I \$Id:.*Exp \$ -X /usr/src/patch.exclude linux-2.5.41-bk2-core/include/asm-i386/hrtime-M586.h linux/include/asm-i386/hrtime-M586.h
--- linux-2.5.41-bk2-core/include/asm-i386/hrtime-M586.h Wed Dec 31 16:00:00 1969
+++ linux/include/asm-i386/hrtime-M586.h Wed Oct 9 14:08:47 2002
@@ -0,0 +1,165 @@
+/*
+ * UTIME: On-demand Microsecond Resolution Timers
+ * ----------------------------------------------
+ *
+ * File: include/asm-i386/hrtime-M586.h
+ * Copyright (C) 1999 by the University of Kansas Center for Research, Inc.
+ * Copyright (C) 2001 by MontaVista Software.
+ *
+ * This software was developed by the Information and
+ * Telecommunication Technology Center (ITTC) at the University of
+ * Kansas. Partial funding for this project was provided by Sprint. This
+ * software may be used and distributed according to the terms of the GNU
+ * Public License, incorporated herein by reference. Neither ITTC nor
+ * Sprint accept any liability whatsoever for this product.
+ *
+ * This project was developed under the direction of Dr. Douglas Niehaus.
+ *
+ * Authors: Balaji S., Raghavan Menon
+ * Furquan Ansari, Jason Keimig, Apurva Sheth
+ *
+ * Please send bug-reports/suggestions/comments to [email protected]
+ *
+ * Further details about this project can be obtained at
+ * http://hegel.ittc.ukans.edu/projects/utime/
+ * or in the file Documentation/utime.txt
+ */
+/*
+ * This code swiped from the utime project to support high res timers
+ * Principal thief George Anzinger [email protected]
+ */
+#include <asm/msr.h>
+#ifndef _ASM_HRTIME_M586_H
+#define _ASM_HRTIME_M586_H
+
+#ifdef __KERNEL__
+
+#ifdef _INCLUDED_FROM_TIME_C
+/*
+ * This gets redefined when we calibrate the TSC
+ */
+struct timer_conversion_bits timer_conversion_bits = {
+ _cycles_per_jiffies: LATCH
+};
+#endif
+
+/*
+ * This define avoids an ugly ifdef in time.c
+ */
+#define get_cpuctr_from_timer_interrupt()
+#define timer_latch_reset(s)
+
+/* NOTE: When porting this to other architectures, define
+ * this to be (void)(1) (i.e. #define set_last_timer_cc() (void)(1)),
+ * otherwise sched.c would give an undefined reference.
+ */
+
+// think this is old cruft... extern void set_last_timer_cc(void);
+/*
+ * These are specific to the pentium counters
+ */
+extern inline unsigned long get_cpuctr(void)
+{
+ /*
+ * We are interested only in deltas, so we just use the low 32 bits.
+ * At 1 GHz this should be good for 4.2 seconds, at 100 GHz for 42 ms.
+ */
+ unsigned long old = last_update;
+ rdtscl(last_update);
+ return last_update - old;
+}
+extern inline unsigned long quick_get_cpuctr(void)
+{
+ unsigned long value;
+ rdtscl(value);
+ return value - last_update;
+}
+#define arch_hrtime_init(x) (x)
+
+extern unsigned long long base_cpuctr;
+extern unsigned long base_jiffies;
+/*
+ * We use various scalings: sc32 scales by 2**32, sc_n by its first parameter.
+ * When working with constants, choose a scale such that (x/n)/2**(32-scale) < 1/2.
+ * For example, 1/3 < 1/2, so its scale is 32, whereas 3/1 must be shifted 3 times
+ * (3/8) to be less than 1/2, so its scale should be 29.
+ *
+ * The principal upper limit is reached when we can no longer keep 1/HZ worth of
+ * arch time (TSC counts) in an integer. This will happen somewhere between 40 GHz
+ * and 50 GHz with HZ set to 100. For now we are fine, and the scale of 24 works
+ * for nanosecond-to-arch conversion from 2 MHz to 40+ GHz.
+ */
+#define HR_TIME_SCALE_NSEC 22
+#define HR_TIME_SCALE_USEC 14
+extern inline int arch_cycles_to_usec(unsigned long update)
+{
+ return (mpy_sc32(update ,arch_to_usec));
+}
+/*
+ * We use the same scale for both the pit and the APIC
+ */
+extern inline int arch_cycles_to_latch(unsigned long update)
+{
+ return (mpy_sc32(update ,arch_to_latch));
+}
+#define compute_latch(APIC_clocks_jiffie) arch_to_latch = \
+ div_sc32(APIC_clocks_jiffie, \
+ cycles_per_jiffies);
+
+extern inline int arch_cycles_to_nsec(long update)
+{
+ return mpy_sc_n(HR_TIME_SCALE_NSEC, update, arch_to_nsec);
+}
+/*
+ * And the other way...
+ */
+extern inline int usec_to_arch_cycles(unsigned long usec)
+{
+ return mpy_sc_n(HR_TIME_SCALE_USEC,usec,usec_to_arch);
+}
+extern inline int nsec_to_arch_cycles(unsigned long nsec)
+{
+ return mpy_sc_n(HR_TIME_SCALE_NSEC,nsec,nsec_to_arch);
+}
+
+EXTERN int pit_pgm_correction;
+
+#ifdef _INCLUDED_FROM_TIME_C
+
+#include <asm/io.h>
+
+
+#ifndef USEC_PER_SEC
+#define USEC_PER_SEC 1000000
+#endif
+ /*
+ * Code for runtime calibration of high res timers
+ * Watch out, cycles_per_sec will overflow when we
+ * get a ~2.14 GHz machine...
+ * We are starting with tsc_cycles_per_5_jiffies set to
+ * 5 times the number of TSC cycles per jiffy (as set by
+ * calibrate_tsc()).
+ */
+#define init_hrtimers() \
+ arch_to_usec = fast_gettimeoffset_quotient; \
+ \
+ arch_to_latch = div_ll_X_l(mpy_l_X_l_ll(fast_gettimeoffset_quotient, \
+ CLOCK_TICK_RATE), \
+ (USEC_PER_SEC)); \
+\
+ arch_to_nsec = div_sc_n(HR_TIME_SCALE_NSEC, \
+ CALIBRATE_TIME * NSEC_PER_USEC, \
+ tsc_cycles_per_5_jiffies); \
+ \
+ nsec_to_arch = div_sc_n(HR_TIME_SCALE_NSEC, \
+ tsc_cycles_per_5_jiffies, \
+ CALIBRATE_TIME * NSEC_PER_USEC); \
+ usec_to_arch = div_sc_n(HR_TIME_SCALE_USEC, \
+ tsc_cycles_per_5_jiffies, \
+ CALIBRATE_TIME ); \
+ cycles_per_jiffies = tsc_cycles_per_5_jiffies / CAL_JIFS;
+
+
+#endif /* _INCLUDED_FROM_TIME_C */
+#endif /* __KERNEL__ */
+#endif /* _ASM_HRTIME_M586_H */
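The scaling rule quoted in the TSC and ACPI headers ("choose a scale such that (x/n)/2**(32-scale) < 1/2") can be turned into a small helper that returns the largest permitted scale for a conversion ratio num/den. This is one reading of that rule, written as a hypothetical `choose_scale()`; it is not part of the patch. Under this reading it reproduces the ACPI header's `HR_SCALE_ARCH_NSEC` (22) and `HR_SCALE_USEC_ARCH` (29), while hrtime-M386.h picks somewhat smaller scales, which leaves extra headroom.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return the largest scale s (0..32) such that (num/den) / 2**(32 - s) < 1/2.
 * num/den is the conversion ratio, e.g. nanoseconds per PM timer count is
 * 1000000000/3579545.  Assumes den << 32 fits in 64 bits (den < 2**32).
 */
static int choose_scale(uint64_t num, uint64_t den)
{
	int k = 0;		/* number of right shifts needed */

	assert(num > 0 && den > 0);
	/* (num/den) / 2**k < 1/2  <=>  2*num < den * 2**k */
	while (k < 32 && 2 * num >= (den << k))
		k++;
	return 32 - k;
}
```

For example, `choose_scale(1, 3)` is 32 and `choose_scale(3, 1)` is 29, matching the 1/3 and 3/1 cases in the comment. Any scale at or below the returned value also satisfies the rule.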
diff -urP -I \$Id:.*Exp \$ -X /usr/src/patch.exclude linux-2.5.41-bk2-core/include/asm-i386/hrtime-Macpi.h linux/include/asm-i386/hrtime-Macpi.h
--- linux-2.5.41-bk2-core/include/asm-i386/hrtime-Macpi.h Wed Dec 31 16:00:00 1969
+++ linux/include/asm-i386/hrtime-Macpi.h Wed Oct 9 14:08:47 2002
@@ -0,0 +1,214 @@
+/*
+ *
+ * File: include/asm-i386/hrtime-Macpi.h
+ * Copyright (C) 2001 by MontaVista Software.
+ *
+ * This software may be used and distributed according to the terms of
+ * the GNU Public License, incorporated herein by reference.
+ *
+ */
+#include <asm/msr.h>
+#include <asm/io.h>
+#ifndef _ASM_HRTIME_Macpi_H
+#define _ASM_HRTIME_Macpi_H
+
+#ifdef __KERNEL__
+
+/*
+ * This define avoids an ugly ifdef in time.c
+ */
+#define timer_latch_reset(s)
+
+/* NOTE: When porting this to other architectures, define
+ * this to be (void)(1) (i.e. #define set_last_timer_cc() (void)(1)),
+ * otherwise sched.c would give an undefined reference.
+ */
+
+extern void set_last_timer_cc(void);
+/*
+ * These are specific to the ACPI pm counter
+ * The spec says the counter can be either 32 or 24 bits wide. We treat it
+ * as 24 bits in both cases; that's faster than doing the test.
+ */
+#define SIZE_MASK 0xffffff
+
+extern int acpi_pm_tmr_address;
+
+extern inline unsigned long get_cpuctr(void)
+{
+ unsigned long old;
+
+ old = last_update;
+ last_update = inl(acpi_pm_tmr_address);
+ return (last_update - old) & SIZE_MASK;
+}
+extern inline unsigned long quick_get_cpuctr(void)
+{
+ return (inl(acpi_pm_tmr_address) - last_update) & SIZE_MASK;
+}
+#define arch_hrtime_init(x) (x)
+
+
+/*
+ * We use various scalings: sc32 scales by 2**32, sc_n by its first parameter.
+ * When working with constants, choose a scale such that (x/n)/2**(32-scale) < 1/2.
+ * For example, 1/3 < 1/2, so its scale is 32, whereas 3/1 must be shifted 3 times
+ * (3/8) to be less than 1/2, so its scale should be 29.
+ *
+ */
+#define HR_SCALE_ARCH_NSEC 22
+#define HR_SCALE_ARCH_USEC 32
+#define HR_SCALE_NSEC_ARCH 32
+#define HR_SCALE_USEC_ARCH 29
+
+#ifndef PM_TIMER_FREQUENCY
+#define PM_TIMER_FREQUENCY 3579545 /* counts per second (more precisely 3579545.45) */
+#endif
+#define PM_TIMER_FREQUENCY_x_100 357954545 /* counts per second * 100*/
+
+#define cf_arch_to_usec (SC_32(100000000)/(long long)PM_TIMER_FREQUENCY_x_100)
+extern inline int arch_cycles_to_usec(unsigned long update)
+{
+ return (mpy_sc32(update ,arch_to_usec));
+}
+#ifndef CONFIG_
+/*
+ * We need to take 1/3 of the presented value (or, more exactly,
+ * CLOCK_TICK_RATE/PM_TIMER_FREQUENCY of it). Note that these two timers
+ * run from the same crystal, so the ratio will be EXACTLY 1/3.
+ */
+#define cf_arch_to_latch (SC_32(CLOCK_TICK_RATE)/(long long)(CLOCK_TICK_RATE * 3))
+extern inline int arch_cycles_to_latch(unsigned long update)
+{
+ return (mpy_sc32(update ,arch_to_latch));
+}
+#else
+/*
+ * APIC clocks run from a low of 33 MHz to, say, 200 MHz. The PM timer
+ * runs at about 3.5 MHz. We want to scale so that (APIC << scale)/PM
+ * is less than 2^32. Let's use 2^19; that leaves plenty of room.
+ */
+#define HR_SCALE_ARCH_LATCH 19
+
+#define compute_latch(APIC_clocks_jiffie) arch_to_latch = div_sc_n( \
+ HR_SCALE_ARCH_LATCH, \
+ APIC_clocks_jiffie, \
+ cycles_per_jiffies);
+extern inline int arch_cycles_to_latch(unsigned long update)
+{
+ return (mpy_sc_n(HR_SCALE_ARCH_LATCH, update ,arch_to_latch));
+}
+
+#endif
+
+#define cf_arch_to_nsec (SC_n(HR_SCALE_ARCH_NSEC,100000000000LL)/ \
+ (long long)PM_TIMER_FREQUENCY_x_100)
+
+extern inline int arch_cycles_to_nsec(long update)
+{
+ return mpy_sc_n(HR_SCALE_ARCH_NSEC, update, arch_to_nsec);
+}
+/*
+ * And the other way...
+ */
+#define cf_usec_to_arch (SC_n( HR_SCALE_USEC_ARCH,PM_TIMER_FREQUENCY_x_100)/ \
+ (long long)100000000)
+extern inline int usec_to_arch_cycles(unsigned long usec)
+{
+ return mpy_sc_n(HR_SCALE_USEC_ARCH,usec,usec_to_arch);
+}
+#define cf_nsec_to_arch (SC_n( HR_SCALE_NSEC_ARCH,PM_TIMER_FREQUENCY)/ \
+ (long long)1000000000)
+extern inline int nsec_to_arch_cycles(unsigned long nsec)
+{
+ return mpy_sc32(nsec,nsec_to_arch);
+}
+
+//EXTERN int pit_pgm_correction;
+
+#ifdef _INCLUDED_FROM_TIME_C
+
+#include <asm/io.h>
+struct timer_conversion_bits timer_conversion_bits = {
+ _cycles_per_jiffies: ((PM_TIMER_FREQUENCY + HZ/2) / HZ),
+ _nsec_to_arch: cf_nsec_to_arch,
+ _usec_to_arch: cf_usec_to_arch,
+ _arch_to_nsec: cf_arch_to_nsec,
+ _arch_to_usec: cf_arch_to_usec,
+ _arch_to_latch: cf_arch_to_latch
+};
+int acpi_pm_tmr_address;
+
+
+/*
+ * No run time conversion factors need to be set up as the pm timer has a fixed
+ * speed.
+ */
+/*
+ * Here we have a local udelay for our init use only. The system delay
+ * has not yet been calibrated when we use this; however, we do know
+ * tsc_cycles_per_5_jiffies...
+ */
+extern unsigned long tsc_cycles_per_5_jiffies;
+
+static inline __init void hrt_udelay(int usec)
+{
+ long now,end;
+ rdtscl(end);
+ end += (usec * tsc_cycles_per_5_jiffies) / (USEC_PER_JIFFIES * 5);
+ do {rdtscl(now);} while((end - now) > 0);
+
+}
+extern int hrt_get_acpi_pm_ptr(void);
+
+#if defined( CONFIG_HIGH_RES_TIMER_ACPI_PM_ADD) && CONFIG_HIGH_RES_TIMER_ACPI_PM_ADD > 0
+#define default_pm_add CONFIG_HIGH_RES_TIMER_ACPI_PM_ADD
+#define message "High-res-timers: ACPI pm timer not found. Trying specified address %d\n"
+#else
+#define default_pm_add 0
+#define message \
+ "High-res-timers: ACPI pm timer not found(%d) and no backup."\
+ "\nCheck BIOS settings or supply a backup. See configure documentation.\n"
+#endif
+#define fail_message \
+"High-res-timers: >-<--><-->-<-->-<-->-<--><-->-<-->-<-->-<-->-<-->-<-->-<-->-<\n"\
+"High-res-timers: >Failed to find the ACPI pm timer <\n"\
+"High-res-timers: >-<--><-->-<-->-<-->-<-->Boot will fail in Calibrate Delay <\n"\
+"High-res-timers: >Supply a valid default pm timer address <\n"\
+"High-res-timers: >or get your BIOS to turn on ACPI support. <\n"\
+"High-res-timers: >See CONFIGURE help for more information. <\n"\
+"High-res-timers: >-<--><-->-<-->-<-->-<--><-->-<-->-<-->-<-->-<-->-<-->-<-->-<\n"
+/*
+ * After we get the address, we set last_update to the current timer value
+ */
+static inline __init void init_hrtimers(void)
+{
+ acpi_pm_tmr_address = hrt_get_acpi_pm_ptr();
+ if (!acpi_pm_tmr_address){
+ printk(message,default_pm_add);
+ if ( (acpi_pm_tmr_address = default_pm_add)){
+ last_update += quick_get_cpuctr();
+ hrt_udelay(4);
+ if (!quick_get_cpuctr()){
+ printk("High-res-timers: No ACPI pm timer found at %d.\n",
+ acpi_pm_tmr_address);
+ acpi_pm_tmr_address = 0;
+ }
+ }
+ }else{
+ if (default_pm_add && default_pm_add != acpi_pm_tmr_address) {
+ printk("High-res-timers: Ignoring supplied default ACPI pm timer address.\n");
+ }
+ last_update += quick_get_cpuctr();
+ }
+ if (!acpi_pm_tmr_address){
+ printk(fail_message);
+ }else{
+ printk("High-res-timers: Found ACPI pm timer at %d\n",
+ acpi_pm_tmr_address);
+ }
+}
+
+#endif /* _INCLUDED_FROM_TIME_C */
+#endif /* __KERNEL__ */
+#endif /* _ASM_HRTIME_Macpi_H */
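The ACPI `get_cpuctr()`/`quick_get_cpuctr()` pair relies on masking the difference of two raw reads with `SIZE_MASK`, so a 24-bit or a 32-bit counter that wraps between reads still yields the right delta, and `cf_arch_to_usec` turns that delta into microseconds with a 2**32-scaled factor. The sketch below reproduces both steps in plain user-space C under stated assumptions: `pm_delta()` and `pm_counts_to_usec()` are hypothetical names, the raw reads are passed in as arguments instead of coming from `inl()`, and the scaled multiply is assumed to behave like `mpy_sc32()`, i.e. `(a * b) >> 32`.

```c
#include <assert.h>
#include <stdint.h>

#define SIZE_MASK 0xffffffu			/* treat the PM timer as 24 bits */
#define PM_TIMER_FREQUENCY_x_100 357954545ULL	/* counts per second * 100 */

/* Counts elapsed between two raw PM timer reads, modulo 2**24. */
static uint32_t pm_delta(uint32_t old, uint32_t now)
{
	return (now - old) & SIZE_MASK;	/* unsigned wrap, then drop bits 24..31 */
}

/* Convert a delta to microseconds, like mpy_sc32(update, arch_to_usec). */
static uint32_t pm_counts_to_usec(uint32_t counts)
{
	/* usec per count, scaled by 2**32 (mirrors cf_arch_to_usec) */
	const uint64_t factor = (100000000ULL << 32) / PM_TIMER_FREQUENCY_x_100;

	assert(counts <= SIZE_MASK);
	return (uint32_t)(((uint64_t)counts * factor) >> 32);
}
```

For example, reads of 0xfffff0 and then 0x000010 give a delta of 0x20 counts, and 35795 counts (roughly one 10 ms jiffy) convert to 9999 microseconds because the scaled factor truncates downward.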
diff -urP -I \$Id:.*Exp \$ -X /usr/src/patch.exclude linux-2.5.41-bk2-core/include/asm-i386/hrtime.h linux/include/asm-i386/hrtime.h
--- linux-2.5.41-bk2-core/include/asm-i386/hrtime.h Wed Dec 31 16:00:00 1969
+++ linux/include/asm-i386/hrtime.h Wed Oct 9 14:08:47 2002
@@ -0,0 +1,482 @@
+/*
+ *
+ * File: include/asm-i386/hrtime.h
+ * Copyright (C) 1999 by the University of Kansas Center for Research, Inc.
+ * Copyright (C) 2001 by MontaVista Software.
+ *
+ * This software was developed by the Information and
+ * Telecommunication Technology Center (ITTC) at the University of
+ * Kansas. Partial funding for this project was provided by Sprint. This
+ * software may be used and distributed according to the terms of the GNU
+ * Public License, incorporated herein by reference. Neither ITTC nor
+ * Sprint accept any liability whatsoever for this product.
+ *
+ * This project was developed under the direction of Dr. Douglas Niehaus.
+ *
+ * Authors: Balaji S., Raghavan Menon
+ * Furquan Ansari, Jason Keimig, Apurva Sheth
+ *
+ * Please send bug-reports/suggestions/comments to [email protected]
+ *
+ * Further details about this project can be obtained at
+ * http://hegel.ittc.ukans.edu/projects/utime/
+ * or in the file Documentation/high-res-timers/
+ */
+/*
+ * This code purloined from the utime project for high res timers.
+ * Principal modifier George Anzinger [email protected]
+ */
+#ifndef _I386_HRTIME_H
+#define _I386_HRTIME_H
+#ifdef __KERNEL__
+
+#include <linux/config.h> /* for CONFIG_APM etc... */
+#include <asm/types.h> /* for u16s */
+#include <asm/io.h>
+#include <asm/sc_math.h> /* scaling math routines */
+#include <asm/delay.h>
+/*
+ * What "IF_ALL_PERIODIC" does is to set up the PIT so that, if we don't
+ * touch it again, it will always tick at a 1/HZ rate. This is done
+ * by programming the interrupt we want and, once it is loaded, dropping
+ * a 1/HZ program on top of it. The PIT will give us the desired interrupt
+ * and, at interrupt time, load the 1/HZ program. So...
+
+ * If no sub-1/HZ ticks are needed AND we are aligned with the 1/HZ
+ * boundary, we don't need to touch the PIT. Otherwise we do the above.
+
+ * In theory you could turn this off, but it has been so long....
+
+ * There are two reasons to keep this:
+ * 1. The NMI watchdog uses the timer interrupt to generate the NMI interrupts.
+ * 2. We don't have to touch the PIT unless we have a sub-jiffie event in
+ * the next 1/HZ interval (unless we drift away from the 1/HZ boundary).
+ */
+#if 1
+#define IF_ALL_PERIODIC(a) a
+#else
+#define IF_ALL_PERIODIC(a)
+#endif
+
+
+/*
+ * The high-res-timers option is set up to self configure with different
+ * platforms. It is up to the platform to provide certain macros which
+ * override the default macros defined in systems without (or with disabled)
+ * high-res-timers.
+ *
+ * To do high-res-timers at some fundamental level the timer interrupt must
+ * be separated from the time-keeping tick. A tick can still be generated
+ * by the timer interrupt, but it may be surrounded by non-tick interrupts.
+ * It is up to the platform to determine if a particular interrupt is a tick,
+ * and up to the timer code (in timer.c) to determine what time events have
+ * expired.
+ *
+ * Macros:
+ * update_jiffies() This macro is to compute the new value of jiffie and
+ * sub_jiffie. If high-res-timers are not available it
+ * may be assumed that this macro will be called once
+ * every 1/HZ and so should reduce to:
+ *
+ * (*(u64 *)&jiffies_64)++;
+ *
+ * sub_jiffie, in this case will always be zero, and need not be addressed.
+ * It is assumed that the sub_jiffie is in platform defined units and runs
+ * from 0 to a value which represents 1/HZ on that platform. (See conversion
+ * macro requirements below.)
+ * If high-res-timers are available, this macro will be called each timer
+ * interrupt which may be more often than 1/HZ. It is up to the code to
+ * determine if a new jiffie has just started and pass this info to:
+ *
+ * new_jiffie() which should return true if the last call to update_jiffies()
+ * moved the jiffie count (as opposed to just the sub_jiffie).
+ * For systems without high-res-timers the kernel will predefine
+ * this to be 0 which will allow the compiler to optimize the code
+ * for this case. In SMP systems this should be set to all 1's
+ * as it is used in a per-cpu fashion to indicate that a particular
+ * cpu needs to run the accounting code. It should result
+ * in a variable that can be cast to a volatile long and whose
+ * address can be taken.
+ *
+ * schedule_next_int(jiffie_f,sub_jiffie_v,always) is a macro that the
+ * platform should
+ * provide that will set up the timer interrupt
+ * hardware to interrupt at the absolute time
+ * defined by jiffie_f,sub_jiffie_v where the
+ * units are 1/HZ and the platform defined
+ * sub_jiffie unit. This function must
+ * determine the actual current time and the
+ * requested offset and act accordingly. A
+ * sub_jiffie_v value of -1 should be
+ * understood to mean the next even jiffie
+ * regardless of the jiffie_f value. If
+ * the current jiffie is not jiffie_f, it
+ * may be assumed that the requested time
+ * has passed and an immediate interrupt
+ * should be taken. If high-res-timers are
+ * not available, this macro should evaluate
+ * to nil. This macro may return 1 if always
+ * is false AND the requested time has passed.
+ * "Always" indicates that an interrupt is
+ * required even if the time has already passed.
+ */
+
+
+/*
+ * Number of usecs below which events cannot be scheduled
+ */
+#define TIMER_DELTA 5
+
+#ifdef _INCLUDED_FROM_TIME_C
+#define EXTERN
+int timer_delta = TIMER_DELTA;
+#else
+#define EXTERN extern
+extern int timer_delta;
+#endif
+
+#define CONFIG_HIGH_RES_RESOLUTION 1000 // nanosecond resolution
+ // we will use for high res.
+
+#define USEC_PER_JIFFIES (1000000/HZ)
+/*
+ * This is really: x*(CLOCK_TICK_RATE+HZ/2)/1000000
+ * Note that we cannot figure the constant part at
+ * compile time because we would lose precision.
+ */
+#define PIT0_LATCH_STATUS 0xc2
+#define PIT0 0x40
+#define PIT1 0x41
+#define PIT_COMMAND 0x43
+#define PIT0_ONE_SHOT 0x38
+#define PIT0_PERIODIC 0x34
+#define PIT0_LATCH_COUNT 0xd2
+#define PIT01_LATCH_COUNT 0xd6
+#define PIT_NULL_COUNT 0x40
+#define READ_CNT0(varr) {varr = inb(PIT0);varr += (inb(PIT0))<<8;}
+#define READ_CNT1(var) { var = inb(PIT1); }
+#define LATCH_CNT0() { outb(PIT0_LATCH_COUNT,PIT_COMMAND); }
+#define LATCH_CNT0_AND_CNT1() { outb(PIT01_LATCH_COUNT,PIT_COMMAND); }
+
+#define TO_LATCH(x) (((x)*LATCH)/USEC_PER_JIFFIES)
+
+#define sub_jiffie() _sub_jiffie
+#define schedule_next_int(a,b,c) _schedule_next_int(a,b,c)
+
+#define update_jiffies() update_jiffies_sub()
+#define new_jiffie() _new_jiffie
+#define high_res_test() high_res_test_val = - cycles_per_jiffies;
+#define high_res_end_test() high_res_test_val = 0;
+
+extern unsigned long next_intr;
+extern spinlock_t i8253_lock;
+extern rwlock_t xtime_lock;
+
+extern int _schedule_next_int(unsigned long jiffie_f,long sub_jiffie_in, int always);
+
+extern unsigned int volatile latch_reload;
+
+EXTERN int jiffies_intr;
+EXTERN long volatile _new_jiffie;
+EXTERN int _sub_jiffie;
+EXTERN unsigned long volatile last_update;
+EXTERN int high_res_test_val;
+
+#ifndef CONFIG_HIGH_RES_TIMER_PIT
+IF_ALL_PERIODIC(
+ EXTERN int min_hz_sub_jiffie;
+ EXTERN int max_hz_sub_jiffie;
+ EXTERN int _last_was_long[NR_CPUS];
+ )
+#endif
+
+extern inline void start_PIT(void)
+{
+ spin_lock(&i8253_lock);
+ outb_p(PIT0_PERIODIC, PIT_COMMAND);
+ outb_p(LATCH & 0xff, PIT0); /* LSB */
+ outb(LATCH >> 8, PIT0); /* MSB */
+ spin_unlock(&i8253_lock);
+}
+/*
+ * Now go ahead and include the clock specific file 586/386/acpi
+ * These asm files have extern inline functions to do a lot of
+ * stuff as well as the conversion routines.
+ */
+#ifdef CONFIG_HIGH_RES_TIMER_ACPI_PM
+#include <asm/hrtime-Macpi.h>
+#elif defined(CONFIG_HIGH_RES_TIMER_PIT)
+#include <asm/hrtime-M386.h>
+#elif defined(CONFIG_HIGH_RES_TIMER_TSC)
+#include <asm/hrtime-M586.h>
+#else
+#error "Need one of: CONFIG_HIGH_RES_TIMER_ACPI_PM CONFIG_HIGH_RES_TIMER_TSC CONFIG_HIGH_RES_TIMER_PIT"
+#endif
+
+extern unsigned long long jiffiesll;
+
+/*
+ * We stole this routine from the Utime code, but there it
+ * calculated microseconds and here we calculate sub_jiffies
+ * which have (in this case) units of TSC counts. (If there
+ * is no TSC, see hrtime-M386.h, where a different unit
+ * is used.) This allows the more expensive math (to get
+ * standard units) to be done only when needed. This also
+ * makes it as easy (and as efficient) to calculate nano-
+ * as well as microseconds.
+ */
+
+extern inline void arch_update_jiffies (unsigned long update)
+{
+ /*
+ * update is the delta in sub_jiffies
+ */
+ _sub_jiffie += update;
+ while ((unsigned long)_sub_jiffie > cycles_per_jiffies){
+ _sub_jiffie -= cycles_per_jiffies;
+ _new_jiffie = ~0;
+ jiffies_intr++;
+ jiffies_64++;
+ }
+}
+#define SC_32_TO_USEC (SC_32(1000000)/ (long long)CLOCK_TICK_RATE)
+
+
+
+/*
+ * This routine is always called under the write_lock_irq(&xtime_lock)
+ */
+extern inline void update_jiffies_sub(void)
+{
+ unsigned long cycles_update;
+
+ cycles_update = get_cpuctr();
+
+
+ arch_update_jiffies(cycles_update);
+ /*
+ * In the ALL_PERIODIC mode we program the PIT to give periodic
+ * interrupts and, if no sub_jiffie timers are due, leave it alone.
+ * This means that it can drift WRT the clock (TSC or pm timer).
+ * What we are trying to do is to program the next interrupt to
+ * occur at exactly the requested time. If we are not doing
+ * sub HZ interrupts we expect to find a small excess of time
+ * beyond the 1/HZ, i.e. _sub_jiffie will have some small value.
+ * This value will drift AND may jump upward from time to time.
+ * The drift is due to not having precise tracking between the
+ * two timers (the PIT and either the TSC or the PM timer) and
+ * the jump is caused by interrupt delays, cache misses etc.
+ * We need to correct for the drift. To correct it, all we need
+ * to do is set "last_was_long" to zero, and a new timer program
+ * will be started to "do the right thing".
+
+ * Detecting the need to do this correction is another issue.
+ * Here is what we do:
+ * Each interrupt where last_was_long is !=0 (indicates the
+ * interrupt should be on a 1/HZ boundary) we check the resulting
+ * _sub_jiffie. If it is smaller than some MIN value, we do
+ * the correction. (Note that drift that makes the value
+ * smaller is the easy one.) We also require that
+ * _sub_jiffie <= some max at least once over a period of 1 second.
+ * I.e. with HZ = 100, we will allow up to 99 "late" interrupts
+ * before we do a correction.
+
+ * The values we use for min_hz_sub_jiffie and max_hz_sub_jiffie
+ * depend on the units and we will start by, during boot,
+ * observing what MIN appears to be. We will set max_hz_sub_jiffie
+ * to be about 100 machine cycles more than this.
+
+ * Note that with min_hz_sub_jiffie and max_hz_sub_jiffie
+ * set to 0, this code will reset the PIT every HZ.
+ */
+#ifndef CONFIG_HIGH_RES_TIMER_PIT
+ IF_ALL_PERIODIC(
+ {
+ int *last_was_long = &_last_was_long[smp_processor_id()];
+ if ( ! *last_was_long )
+ return;
+ if ( _sub_jiffie < min_hz_sub_jiffie ){
+ *last_was_long = 0;
+ return;
+ }
+ if (_sub_jiffie <= max_hz_sub_jiffie) {
+ *last_was_long = 1;
+ return;
+ }
+ if ( ++*last_was_long > HZ ){
+ *last_was_long = 0;
+ return;
+ }
+ }
+ )
+#endif
+}
+
+/*
+ * quick_update_jiffies_sub returns the sub_jiffie offset of
+ * current time from the "ref_jiff" jiffie value. We do this
+ * without updating any memory values and thus do not need to
+ * take any locks, if we are careful.
+ *
+ * I don't know how to eliminate the lock in the SMP case, so..
+ * Oh, and also the PIT case requires a lock anyway, so..
+ */
+#if defined (CONFIG_SMP) || defined(CONFIG_HIGH_RES_TIMER_PIT)
+static inline void get_rat_jiffies(unsigned long *jiffies_f,long * _sub_jiffie_f,unsigned long *update)
+{
+ unsigned long flags;
+
+ read_lock_irqsave(&xtime_lock, flags);
+ *jiffies_f = jiffies;
+ *_sub_jiffie_f = _sub_jiffie;
+ *update = quick_get_cpuctr();
+ read_unlock_irqrestore(&xtime_lock, flags);
+}
+#else
+static inline void get_rat_jiffies(unsigned long *jiffies_f,long *_sub_jiffie_f,unsigned long *update)
+{
+ unsigned long last_update_f;
+ do {
+ *jiffies_f = jiffies;
+ last_update_f = last_update;
+ barrier();
+ *_sub_jiffie_f = _sub_jiffie;
+ *update = quick_get_cpuctr();
+ barrier();
+ }while (*jiffies_f != jiffies || last_update_f != last_update);
+}
+#endif /* CONFIG_SMP || CONFIG_HIGH_RES_TIMER_PIT */
+
+/*
+ * If smp, this must be called with the read_lockirq(&xtime_lock) held.
+ * No lock is needed if not SMP.
+ */
+
+extern inline long quick_update_jiffies_sub(unsigned long ref_jiff)
+{
+ unsigned long update;
+ unsigned long rtn;
+ unsigned long jiffies_f;
+ long _sub_jiffie_f;
+
+
+ get_rat_jiffies( &jiffies_f,&_sub_jiffie_f,&update);
+
+ rtn = _sub_jiffie_f + (unsigned long) update;
+ rtn += (jiffies_f - ref_jiff) * cycles_per_jiffies;
+ return rtn;
+
+}
+#ifdef CONFIG_X86_LOCAL_APIC
+/*
+ * If we have a local APIC, we will use its counter to get the needed
+ * interrupts. Here is where we program it.
+ */
+
+extern void __setup_APIC_LVTT( unsigned int );
+
+extern inline void reload_timer_chip( int new_latch_value)
+{
+ int new_latch = arch_cycles_to_latch( new_latch_value );
+ /*
+ * We may want to do more in line code for speed here.
+ * For now, however...
+
+ * Note: The interrupt routine presets the counter for 1/HZ
+ * each interrupt so we only deal with requested shorter times
+ * either due to timer requests or drift.
+ */
+ if ( new_latch < timer_delta) new_latch = timer_delta;
+ __setup_APIC_LVTT(new_latch);
+}
+
+#endif
+#ifndef CONFIG_HIGH_RES_TIMER_PIT
+#ifndef CONFIG_X86_LOCAL_APIC
+extern inline void reload_timer_chip( int new_latch_value)
+{
+ IF_ALL_PERIODIC( unsigned char pit_status);
+ /*
+ * The input value is in arch cycles
+ * We must be called with irq disabled.
+ */
+
+ new_latch_value = arch_cycles_to_latch( new_latch_value );
+ if (new_latch_value < TIMER_DELTA){
+ new_latch_value = TIMER_DELTA;
+ }
+ spin_lock(&i8253_lock);
+ IF_ALL_PERIODIC(outb_p(PIT0_PERIODIC, PIT_COMMAND););
+ outb_p(new_latch_value & 0xff, PIT0); /* LSB */
+ outb(new_latch_value >> 8, PIT0); /* MSB */
+ IF_ALL_PERIODIC(
+ do {
+ outb_p(PIT0_LATCH_STATUS,PIT_COMMAND);
+ pit_status = inb(PIT0);
+ }while (pit_status & PIT_NULL_COUNT);
+ outb_p(LATCH & 0xff, PIT0); /* LSB */
+ outb(LATCH >> 8, PIT0); /* MSB */
+ )
+ spin_unlock(&i8253_lock);
+ return;
+}
+#endif // ! CONFIG_X86_LOCAL_APIC
+/*
+ * Time out for a discussion. Because the PIT and TSC (or the PIT and
+ * pm timer) may drift WRT each other, we need a way to get the jiffie
+ * interrupt to happen as near to the jiffie roll as possible. This
+ * ensures that we will get the interrupt when the timer is to be
+ * delivered, not before (we would not deliver) or later, making the
+ * jiffie timers different from the sub_jiffie deliveries. We would
+ * also like any latency between a "requested" interrupt and the
+ * automatic jiffie interrupts from the PIT to be the same. Since it
+ * takes some time to set up the PIT, we assume that requested
+ * interrupts may be a bit late when compared to the automatic
+ * interrupts. When we request a jiffie interrupt, we want the
+ * interrupt to happen at the requested time, which will be a bit before
+ * we get to the jiffies update code.
+ *
+ * What we want to determine here is a.) how long it takes (min) to get
+ * from a requested interrupt to the jiffies update code and b.) how
+ * long it takes when the interrupt is automatic (i.e. from the PIT
+ * reset logic). When we set "last_was_long" to zero, the next tick
+ * setup code will "request" a jiffies interrupt (as long as we do not
+ * have any sub jiffie timers pending). The interrupt after the
+ * requested one will be automatic. Ignoring drift over this 2/HZ time
+ * we then get two latency values, the requested latency and the
+ * automatic latency. We set up the difference to correct the requested
+ * time and the second one as the center of a window which we will use
+ * to detect the need to resync the PIT. We do this for HZ ticks and
+ * take the min.
+ */
+#define NANOSEC_SYNC_LIMIT 2000 // Try for 2 usec. max drift
+#define final_clock_init() \
+ { unsigned long end = jiffies + HZ + HZ; \
+ int min_a = cycles_per_jiffies, min_b = cycles_per_jiffies; \
+ long flags; \
+ int * last_was_long = &_last_was_long[smp_processor_id()]; \
+ while (time_before(jiffies,end)){ \
+ unsigned long f_jiffies = jiffies; \
+ while (jiffies == f_jiffies); \
+ *last_was_long = 0; \
+ while (jiffies == f_jiffies + 1); \
+ read_lock_irqsave(&xtime_lock, flags); \
+ if ( _sub_jiffie < min_a) \
+ min_a = _sub_jiffie; \
+ read_unlock_irqrestore(&xtime_lock, flags); \
+ while (jiffies == f_jiffies + 2); \
+ read_lock_irqsave(&xtime_lock, flags); \
+ if ( _sub_jiffie < min_b) \
+ min_b = _sub_jiffie; \
+ read_unlock_irqrestore(&xtime_lock, flags); \
+ } \
+ min_hz_sub_jiffie = min_b - nsec_to_arch_cycles(NANOSEC_SYNC_LIMIT);\
+ if( min_hz_sub_jiffie < 0) min_hz_sub_jiffie = 0; \
+ max_hz_sub_jiffie = min_b + nsec_to_arch_cycles(NANOSEC_SYNC_LIMIT);\
+ timer_delta = arch_cycles_to_latch(usec_to_arch_cycles(TIMER_DELTA)); \
+ }
+
+
+#endif /* not CONFIG_HIGH_RES_TIMER_PIT */
+#endif /* __KERNEL__ */
+#endif /* _I386_HRTIME_H */
diff -urP -I \$Id:.*Exp \$ -X /usr/src/patch.exclude linux-2.5.41-bk2-core/include/asm-i386/sc_math.h linux/include/asm-i386/sc_math.h
--- linux-2.5.41-bk2-core/include/asm-i386/sc_math.h Wed Dec 31 16:00:00 1969
+++ linux/include/asm-i386/sc_math.h Wed Oct 9 14:08:47 2002
@@ -0,0 +1,143 @@
+#ifndef SC_MATH
+#define SC_MATH
+#define MATH_STR(X) #X
+#define MATH_NAME(X) X
+
+/*
+ * Pre scaling defines
+ */
+#define SC_32(x) ((long long)x<<32)
+#define SC_n(n,x) (((long long)x)<<n)
+/*
+ * This routine performs the following calculation:
+ *
+ * X = (a*b)>>32
+ * we could (but don't) also get the part shifted out.
+ */
+extern inline long mpy_sc32(long a,long b)
+{
+ long edx;
+ __asm__("imull %2"
+ :"=a" (a), "=d" (edx)
+ :"rm" (b),
+ "0" (a));
+ return edx;
+}
+/*
+ * X = (a/b)<<32 or more precisely x = (a<<32)/b
+ */
+
+extern inline long div_sc32(long a, long b)
+{
+ long dum;
+ __asm__("divl %2"
+ :"=a" (b), "=d" (dum)
+ :"r" (b), "0" (0), "1" (a));
+
+ return b;
+}
+/*
+ * X = (a*b)>>24
+ * we could (but don't) also get the part shifted out.
+ */
+
+#define mpy_ex24(a,b) mpy_sc_n(24,a,b)
+/*
+ * X = (a/b)<<24 or more precisely x = (a<<24)/b
+ */
+#define div_ex24(a,b) div_sc_n(24,a,b)
+
+/*
+ * The routines allow you to do x = (a/b) << N and
+ * x=(a*b)>>N for values of N from 1 to 32.
+ *
+ * These are handy to have to do scaled math.
+ * Scaled math has two nice features:
+ * A.) A great deal more precision can be maintained by
+ * keeping more significant bits.
+ * B.) Often an inline div can be replaced with a mpy
+ * which is a LOT faster.
+ */
+
+#define mpy_sc_n(N,aa,bb) ({long edx,a=aa,b=bb; \
+ __asm__("imull %2\n\t" \
+ "shldl $(32-"MATH_STR(N)"),%0,%1" \
+ :"=a" (a), "=d" (edx)\
+ :"rm" (b), \
+ "0" (a)); edx;})
+
+
+#define div_sc_n(N,aa,bb) ({long dum=aa,dum2,b=bb; \
+ __asm__("shrdl $(32-"MATH_STR(N)"),%4,%3\n\t" \
+ "sarl $(32-"MATH_STR(N)"),%4\n\t" \
+ "divl %2" \
+ :"=a" (dum2), "=d" (dum) \
+ :"rm" (b), "0" (0), "1" (dum)); dum2;})
+
+
+/*
+ * (long)X = ((long long)divs) / (long)div
+ * (long)rem = ((long long)divs) % (long)div
+ *
+ * Warning: this raises a divide-error exception if X overflows.
+ */
+#define div_long_long_rem(a,b,c) div_ll_X_l_rem(a,b,c)
+
+extern inline long div_ll_X_l_rem(long long divs, long div,long * rem)
+{
+ long dum2;
+ __asm__( "divl %2"
+ :"=a" (dum2), "=d" (*rem)
+ :"rm" (div), "A" (divs));
+
+ return dum2;
+
+}
+/*
+ * same as above, but no remainder
+ */
+extern inline long div_ll_X_l(long long divs, long div)
+{
+ long dum;
+ return div_ll_X_l_rem(divs,div,&dum);
+}
+/*
+ * (long)X = (((long long)divh<<32) | (long)divl) / (long)div
+ * (long)rem = (((long long)divh<<32) | (long)divl) % (long)div
+ *
+ * Warning: this raises a divide-error exception if X overflows.
+ */
+extern inline long div_h_or_l_X_l_rem(long divh,long divl, long div,long* rem)
+{
+ long dum2;
+ __asm__( "divl %2"
+ :"=a" (dum2), "=d" (*rem)
+ :"rm" (div), "0" (divl),"1" (divh));
+
+ return dum2;
+
+}
+extern inline long long mpy_l_X_l_ll(long mpy1,long mpy2)
+{
+ long long eax;
+ __asm__("imull %1\n\t"
+ :"=A" (eax)
+ :"rm" (mpy2),
+ "a" (mpy1));
+
+ return eax;
+
+}
+extern inline long mpy_1_X_1_h(long mpy1,long mpy2,long *hi)
+{
+ long eax;
+ __asm__("imull %2\n\t"
+ :"=a" (eax),"=d" (*hi)
+ :"rm" (mpy2),
+ "0" (mpy1));
+
+ return eax;
+
+}
+
+#endif
diff -urP -I \$Id:.*Exp \$ -X /usr/src/patch.exclude linux-2.5.41-bk2-core/include/asm-i386/signal.h linux/include/asm-i386/signal.h
--- linux-2.5.41-bk2-core/include/asm-i386/signal.h Mon Sep 9 10:35:04 2002
+++ linux/include/asm-i386/signal.h Wed Oct 9 14:08:47 2002
@@ -3,6 +3,8 @@
#include <linux/types.h>
#include <linux/linkage.h>
+#include <linux/time.h>
+#include <asm/ptrace.h>
/* Avoid too many header ordering problems. */
struct siginfo;
@@ -216,9 +218,82 @@
__asm__("bsfl %1,%0" : "=r"(word) : "rm"(word) : "cc");
return word;
}
+/*
+ * These macros are used by nanosleep() and clock_nanosleep().
+ * The issue is that these functions need the *regs pointer which is
+ * passed in different ways by the differing archs.
+
+ * Below we do things in two differing ways. In the long run we would
+ * like to see nano_sleep() go away (glibc should call clock_nanosleep
+ * much as we do). When that happens and the nano_sleep() system
+ * call entry is retired, there will no longer be any real need for
+ * sys_nanosleep() so the FOLD_NANO_SLEEP_INTO_CLOCK_NANO_SLEEP macro
+ * could be undefined, resulting in not needing to stack all the
+ * parms over again, i.e. better (faster AND smaller) code.
+
+ * And while we're at it, there needs to be a way to set the return code
+ * on the way to do_signal(). It (i.e. do_signal()) saves the regs on
+ * the caller's stack to call the user handler and then the return is
+ * done using those registers. This means that the error code MUST be
+ * set in the register PRIOR to calling do_signal(). See our answer
+ * below...thanks to Jim Houston.
+ */
+#define FOLD_NANO_SLEEP_INTO_CLOCK_NANO_SLEEP_NOT
+
+
+#ifdef FOLD_NANO_SLEEP_INTO_CLOCK_NANO_SLEEP
+extern long do_clock_nanosleep(struct pt_regs *regs,
+ clockid_t which_clock,
+ int flags,
+ const struct timespec *rqtp,
+ struct timespec *rmtp);
+
+#define NANOSLEEP_ENTRY(a) \
+ asmlinkage long sys_nanosleep( struct timespec* rqtp, \
+ struct timespec * rmtp) \
+{ struct pt_regs *regs = (struct pt_regs *)&rqtp; \
+ return do_clock_nanosleep(regs, CLOCK_REALTIME, 0, rqtp, rmtp); \
+}
+
+#define CLOCK_NANOSLEEP_ENTRY(a) asmlinkage long sys_clock_nanosleep( \
+ clockid_t which_clock, \
+ int flags, \
+ const struct timespec *rqtp, \
+ struct timespec *rmtp) \
+{ struct pt_regs *regs = (struct pt_regs *)&which_clock; \
+ return do_clock_nanosleep(regs, which_clock, flags, rqtp, rmtp); \
+} \
+long do_clock_nanosleep(struct pt_regs *regs, \
+ clockid_t which_clock, \
+ int flags, \
+ const struct timespec *rqtp, \
+ struct timespec *rmtp) \
+{ a
+
+#else
+#define NANOSLEEP_ENTRY(a) \
+ asmlinkage long sys_nanosleep( struct timespec* rqtp, \
+ struct timespec * rmtp) \
+{ struct pt_regs *regs = (struct pt_regs *)&rqtp; \
+ a
+#define CLOCK_NANOSLEEP_ENTRY(a) asmlinkage long sys_clock_nanosleep( \
+ clockid_t which_clock, \
+ int flags, \
+ const struct timespec *rqtp, \
+ struct timespec *rmtp) \
+{ struct pt_regs *regs = (struct pt_regs *)&which_clock; \
+ a
+#endif
struct pt_regs;
extern int FASTCALL(do_signal(struct pt_regs *regs, sigset_t *oldset));
+#define PT_REGS_ENTRY(type,name,p1_type,p1, p2_type,p2) \
+type name(p1_type p1,p2_type p2)\
+{ struct pt_regs *regs = (struct pt_regs *)&p1;
+
+#define _do_signal() (regs->eax = -EINTR, do_signal(regs, NULL))
+
+
#endif /* __KERNEL__ */
On Wed, 9 Oct 2002, george anzinger wrote:
>
> This patch, in conjunction with the "core" high-res-timers
> patch implements high resolution timers on the i386
> platforms.
I really don't get the notion of partial ticks, and quite frankly, this
isn't going into my tree until some major distribution kicks me in the
head and explains to me why the hell we have partial ticks instead of just
making the ticks shorter.
Linus
Linus Torvalds wrote:
>
> On Wed, 9 Oct 2002, george anzinger wrote:
> >
> > This patch, in conjunction with the "core" high-res-timers
> > patch implements high resolution timers on the i386
> > platforms.
>
> I really don't get the notion of partial ticks, and quite frankly, this
> isn't going into my tree until some major distribution kicks me in the
> head and explains to me why the hell we have partial ticks instead of just
> making the ticks shorter.
>
Well, the notion is to provide timers that have resolution
down into the microseconds. Since this takes a bit more
overhead, we just set up an interrupt on an as-needed
basis. This is why we define both a high-res and a low-res
clock. Timers on the low-res clock will always use the 1/HZ
tick to drive them and thus do not introduce any additional
overhead. If this is all that is needed, the configure
option can be left off and only these timers will be
available.

On the other hand, if a user requires better resolution,
s/he just turns on the high-res option and incurs the
overhead only when it is used, and then only at timer expiry
time. Note that the only way to access a high-res timer is
via the POSIX clocks and timers API. They are not available
to select or any other system call.

Making ticks shorter causes extra overhead ALL the time,
even when it is not needed. Higher resolution is not free
in any case, but it is much closer to free with this patch
than by increasing HZ (which, of course, can still be
done). Overhead-wise and resolution-wise, for timers, we
would be better off with a 1/HZ tick and the "on demand"
high-res interrupts this patch introduces.
--
George Anzinger
High-res-timers:
http://sourceforge.net/projects/high-res-timers/
Preemption patch:
http://www.kernel.org/pub/linux/kernel/people/rml
george anzinger wrote:
> Linus Torvalds wrote:
> > I really don't get the notion of partial ticks, and quite frankly, this
> > isn't going into my tree until some major distribution kicks me in the
> > head and explains to me why the hell we have partial ticks instead of just
> > making the ticks shorter.
> ...
>
> Making ticks shorter causes extra overhead ALL the time,
> even when it is not needed. Higher resolution is not free
> in any case, but it is much closer to free with this patch
> than by increasing HZ (which, of course, can still be
> done). Overhead wise and resolution wise, for timers, we
> would be better off with a 1/HZ tick and the "on demand"
> high-res interrupts this patch introduces.
Seems reasonable to me. Increasing HZ adds overhead -
it makes sense to incur the interrupt overhead only when it's
needed. In my case, we want to provide fairly precise
network delays (we're doing a WAN simulator), and still hit
line rate. Now, I'm way far from the code, but I suspect that
the interrupt overhead needed to get the precision the customer
is calling for would be totally prohibitive. I dunno if we'll
get the precision the customer wants with George's approach,
but we'll get a lot closer than we would setting HZ to 10000
on our wimpy little embedded platform.
George's approach would work a lot better when doing lots of UML VM's
on a single box, too, wouldn't it?
- Dan
> george anzinger wrote:
>
>>Linus Torvalds wrote:
>>
>>>I really don't get the notion of partial ticks, and quite frankly, this
>>>isn't going into my tree until some major distribution kicks me in the
>>>head and explains to me why the hell we have partial ticks instead of just
>>>making the ticks shorter.
>>
>>...
>>
>>Making ticks shorter causes extra overhead ALL the time,
>>even when it is not needed. Higher resolution is not free
>>in any case, but it is much closer to free with this patch
>>than by increasing HZ (which, of course, can still be
>>done). Overhead wise and resolution wise, for timers, we
>>would be better off with a 1/HZ tick and the "on demand"
>>high-res interrupts this patch introduces.
I would like to add my small vote for including the timers too.
I have not looked at the code, but the idea seems sound (let
those who need the timers pay the price at that time, don't make
the rest of the machine suffer otherwise)....
Enjoy,
Ben
--
Ben Greear <Ben_Greear AT excite.com>
President of Candela Technologies Inc http://www.candelatech.com
ScryMUD: http://scry.wanfear.com http://scry.wanfear.com/~greear
Dan Kegel said:
> George's approach would work a lot better when doing lots of UML VM's
> on a single box, too, wouldn't it?
My thinking on this is that I'll have UML do the on-demand ticks. So, on
a host with n UMLs, we will no longer have n * HZ timer deliveries/sec.
I haven't thought a lot about it, but this seems largely unconnected to how
the host does its timers.
The one connection I can think of is that any generic support for on-demand
ticks would be re-used by UML. And if UML required generic changes for this,
then that would obviously affect the other ports somehow.
Jeff
Jeff Dike wrote:
>
> Dan Kegel said:
> > George's approach would work a lot better when doing lots of UML VM's
> > on a single box, too, wouldn't it?
>
> My thinking on this is that I'll have UML do the on-demand ticks. ...
> any generic support for on-demand
> ticks would be re-used by UML. And if UML required generic changes for this,
> then that would obviously affect the other ports somehow.
Yes, exactly. UML wants on-demand ticks, which is exactly what George's
patch uses, too. I'm too far from the code to say, but there ought to be
some commonality there.
- Dan
On Wednesday 09 October 2002 20:50, Dan Kegel wrote:
> line rate. Now, I'm way far from the code, but I suspect that
> the interrupt overhead needed to get the precision the customer
> is calling for would be totally prohibitive. I dunno if we'll
only in a fixed interval tick system. Early in george's design process I
argued for a tickless system (which I had implemented in my company's
proprietary real-time OS), which has _no_ extra overhead and does away with
the 10ms tick entirely. the precision attained is whatever the highest
resolution interrupting counter on the system is capable of.
george did extensive benchmarking of candidate implementations of both
designs and came to the conclusion that the 10ms jiffie fixed interval tick
plus on demand higher resolution ticks was more suitable for general purpose
uses than the tickless system, particularly under high load when there are
many low resolution timed events in the system (as in a server situation).
it turned out that the tickless system was appropriate for embedded systems
(my focus) which tend to have small numbers of well coordinated tasks running
and not so good in environments with a lot of things going on, such as a
multimedia desktop or big honkin server.
with the hybrid system that george developed, you get the batching benefits
of low resolution fixed interval timers, which provide all the capability
most timer services customers need, while at the same time providing, for
minimal overhead, the high resolution timers that the embedded world needs.
--
/**************************************************
** Mark Salisbury **
**************************************************/
george anzinger writes:
> ...
>
> Making ticks shorter causes extra overhead ALL the time,
> even when it is not needed. Higher resolution is not free
> in any case, but it is much closer to free with this patch
> than by increasing HZ (which, of course, can still be
> done). Overhead wise and resolution wise, for timers, we
> would be better off with a 1/HZ tick and the "on demand"
> high-res interrupts this patch introduces.
??? The issue of ticks is separate from the issue of how often
timer interrupts fire. Ticks just become the maximum resolution
you can support/express.

If it makes sense to have two maximum tick resolutions (the normal
application maximum tick rate and the special-task maximum tick
rate), it is probably worth making this available only as a
capability or an rlimit.
Eric
"Eric W. Biederman" wrote:
> ...
>
> ??? The issue of ticks is separate from the issue of how often
> timer interrupts fire. Ticks just becomes the maximum resolution
> you can support/express.
>
> If it makes sense to have two maximum tick resolutions. The normal
> application maximum tick rate and the special task maximum tick
> rate it is probably worth making this only available as a capability
> or an rlimit.
>
I could support a notion that to use the high-res clock for
a timer the user would need a particular capability. After
all, we do the same for real-time priority.

Does this get us any closer to acceptance in 2.5?
On Wed, Oct 09, 2002 at 04:42:03PM -0700, george anzinger wrote:
> ...
>
> Making ticks shorter causes extra overhead ALL the time,
> even when it is not needed. Higher resolution is not free
> in any case, but it is much closer to free with this patch
> than by increasing HZ (which, of course, can still be
> done). Overhead wise and resolution wise, for timers, we
> would be better off with a 1/HZ tick and the "on demand"
> high-res interrupts this patch introduces.
I think what Linus is getting at is: why not make the units of jiffies
microseconds and give it larger increments on clock ticks? Now you
don't need any special logic to go to better than HZ resolution.
Unfortunately, this means identifying all the things that use HZ as a
measure of how often we check for rescheduling.
There's also an issue of dynamic range - if we some day soon decide we
want internal timestamps with nanosecond resolution (because units of
.1us are annoying, not because we'll actually have ns accuracy),
then we're seeing timer wraps every couple seconds on 32bit machines
and we're pretty much forced to break into seconds and nanoseconds.
This is arguably saner than jiffies and subjiffies, but it forces
people who are using long timeouts today to use a new interface.
I don't think he can seriously mean cranking HZ up to match whatever
timing requirements we might have - that obviously doesn't scale.
--
"Love the dolphins," she advised him. "Write by W.A.S.T.E.."
Oliver Xymoron wrote:
>
> On Wed, Oct 09, 2002 at 04:42:03PM -0700, george anzinger wrote:
> > ...
>
> I think what Linus is getting at is: why not make the units of jiffies
> microseconds and give it larger increments on clock ticks? Now you
> don't need any special logic to go to better than HZ resolution.
> Unfortunately, this means identifying all the things that use HZ as a
> measure of how often we check for rescheduling.
Well then you are still dealing with two measures, the HZ
and the tick rate. One might also argue that the subjiffie
should be some "normal" unit like a nanosecond or a
microsecond. I went round and round with this in the beginning.
What it comes down to is that the conversion back and forth is
much easier and faster (less overhead) when using the
natural units of the underlying clock. This way the
interrupt code, for example, does not even have to do a
conversion.
>
> There's also an issue of dynamic range - if we some day soon decide we
> want internal timestamps with nanosecond resolution (because units of
> .1us are annoying, not because we'll actually have ns accuracy),
> then we're seeing timer wraps every couple seconds on 32bit machines
> and we're pretty much forced to break into seconds and nanoseconds.
> This is arguably saner than jiffies and subjiffies, but it forces
> people who are using long timeouts today to use a new interface.
>
> I don't think he can seriously mean cranking HZ up to match whatever
> timing requirements we might have - that obviously doesn't scale.
This is at least the third "take" on what he means, each of
which sends me in a very different direction. Sure would
like to know what he really means.
I KNOW there is demand for the high-res timers, else I would
not have spent the last year + being funded to do it. It
also would not be in the OSDL Carrier Grade system if there
was no demand.
What I would really like to do is address the real issue.
On Thu, Oct 10, 2002 at 09:24:54AM -0700, george anzinger wrote:
> Oliver Xymoron wrote:
> >
> > On Wed, Oct 09, 2002 at 04:42:03PM -0700, george anzinger wrote:
> > > ...
> >
> > I think what Linus is getting at is: why not make the units of jiffies
> > microseconds and give it larger increments on clock ticks? Now you
> > don't need any special logic to go to better than HZ resolution.
> > Unfortunately, this means identifying all the things that use HZ as a
> > measure of how often we check for rescheduling.
>
> Well then you are still dealing with two measures, the HZ
> and the tick rate.
Yep, and separating the two breaks a few things. Granted.
> One might also argue that the subjiffie
> should be some "normal" thing like nanosecond or micro
> second. I went round and round with this in the beginning.
> What it comes down to it the conversion back and forth is
> much easier and faster (less overhead) when using the
> natural units of the underlying clock. This way the
> interrupt code, for example, does not have to even do a
> conversion.
Then the argument becomes: move jiffies to the most convenient unit
that encompasses what you want to do with subjiffies. Microseconds was
just an example. Most code doesn't really care when ticks happen,
except to the extent that they currently trigger timers, so
jiffies=tick HZ stops being a meaningful measure once timers are
untied from ticks, see?
> > I don't think he can seriously mean cranking HZ up to match whatever
> > timing requirements we might have - that obviously doesn't scale.
>
> This is at least the third "take" on what he means, each of
> which sends me in a very different direction. Sure would
> like to know what he really means.
Perhaps if you pose it as a multiple-choice question? I suppose he's
almost sure to answer with "none of the above".
Oliver Xymoron wrote:
>
> On Thu, Oct 10, 2002 at 09:24:54AM -0700, george anzinger wrote:
> > Oliver Xymoron wrote:
> > >
> > > On Wed, Oct 09, 2002 at 04:42:03PM -0700, george anzinger wrote:
> > > > Linus Torvalds wrote:
> > > > >
> > > > > On Wed, 9 Oct 2002, george anzinger wrote:
> > > > > >
> > > > > > This patch, in conjunction with the "core" high-res-timers
> > > > > > patch implements high resolution timers on the i386
> > > > > > platforms.
> > > > >
> > > > > I really don't get the notion of partial ticks, and quite frankly, this
> > > > > isn't going into my tree until some major distribution kicks me in the
> > > > > head and explains to me why the hell we have partial ticks instead of just
> > > > > making the ticks shorter.
> > > > >
> > > > Well, the notion is to provide timers that have resolution
> > > > down into the microseconds. Since this takes a bit more
> > > > overhead, we just set up an interrupt on an as needed
> > > > basis. This is why we define both a high res and a low res
> > > > clock. Timers on the low res clock will always use the 1/HZ
> > > > tick to drive them and thus do not introduce any additional
> > > > overhead. If this is all that is needed, the configure
> > > > option can be left off and only these timers will be
> > > > available.
> > > >
> > > > On the other hand, if a user requires better resolution,
> > > > s/he just turns on the high-res option and incurs the
> > > > overhead only when it is used and then only at timer expire
> > > > time. Note that the only way to access a high-res timer is
> > > > via the POSIX clocks and timers API. They are not available
> > > > to select or any other system call.
> > > >
> > > > Making ticks shorter causes extra overhead ALL the time,
> > > > even when it is not needed. Higher resolution is not free
> > > > in any case, but it is much closer to free with this patch
> > > > than by increasing HZ (which, of course, can still be
> > > > done). Overhead wise and resolution wise, for timers, we
> > > > would be better off with a 1/HZ tick and the "on demand"
> > > > high-res interrupts this patch introduces.
> > >
> > > I think what Linus is getting at is: why not make the units of jiffies
> > > microseconds and give it larger increments on clock ticks? Now you
> > > don't need any special logic to go to better than HZ resolution.
> > > Unfortunately, this means identifying all the things that use HZ as a
> > > measure of how often we check for rescheduling.
> >
> > Well then you are still dealing with two measures, the HZ
> > and the tick rate.
>
> Yep, and separating the two breaks a few things. Granted.
>
> > One might also argue that the subjiffie
> > should be some "normal" unit like a nanosecond or a
> > microsecond. I went round and round with this in the beginning.
> > What it comes down to is that the conversion back and forth is
> > much easier and faster (less overhead) when using the
> > natural units of the underlying clock. This way the
> > interrupt code, for example, does not even have to do a
> > conversion.
>
> Then the argument becomes: move jiffies to the most convenient unit
> that encompasses what you want to do with subjiffies. Microseconds were
> just an example. Most code doesn't really care when ticks happen,
> except to the extent that they currently trigger timers, so
> equating jiffies with ticks at HZ stops being meaningful once timers are
> untied from ticks, see?
Hm? Not really sure what this leads to. Right now the
timers are organized by "tick". I think this is VERY
useful. It makes the timer insert VERY fast and the tick
processing equally fast. A regular "tick" also makes the
accounting overhead flat WRT load, also a GOOD thing.
One thought I had was to separate out the sub tick events
into a different list and come up with a different interrupt
source for them. Problem is they MUST stay in sync. This
is most easily done when they are in the same list.
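A minimal sketch of why keeping both kinds of timer in one list preserves ordering, assuming a hypothetical `struct hr_timer` with a jiffy `expires` field and a `sub_jiffie` offset in clock units (neither is the patch's actual definition). A single sorted list stands in for one bucket of the kernel's timer wheel; ordering by `(expires, sub_jiffie)` means a walk from the head never runs a later sub-tick timer ahead of an earlier tick-aligned one.

```c
#include <stddef.h>

/* Hypothetical simplified timer (not the patch's struct timer_list):
 * 'expires' is in jiffies, 'sub_jiffie' is an offset inside that jiffy
 * in units of the underlying clock; 0 means "fire on the tick itself". */
struct hr_timer {
    unsigned long expires;
    unsigned long sub_jiffie;
    struct hr_timer *next;
};

/* Wrap-safe "a expires before b", in the spirit of the kernel's time_before(). */
static int timer_before(const struct hr_timer *a, const struct hr_timer *b)
{
    if (a->expires != b->expires)
        return (long)(a->expires - b->expires) < 0;
    return a->sub_jiffie < b->sub_jiffie;
}

/* Insert into a singly linked list kept in expiry order. Because low-res
 * (sub_jiffie == 0) and high-res timers share one list, walking it from
 * the head always yields them in the correct relative order. */
static void timer_list_insert(struct hr_timer **head, struct hr_timer *t)
{
    while (*head && !timer_before(t, *head))
        head = &(*head)->next;
    t->next = *head;
    *head = t;
}
```

Timers with identical keys stay in insertion (FIFO) order, because the walk stops only at the first strictly later entry.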
What you haven't touched on is the separation of the "tick"
from the clock or time. The patch implements a separation
here. Time is taken from a reliable source (in this patch
either the TSC or the ACPI pm timer, but others are possible)
and the "tick" is just a reminder to look at the clock and
update accordingly. This eliminates the issue of choosing a
HZ value whose tick period is within so many PPM of real time, and
the NTP issues that causes, such as the current early expiration of
timers. Try this:
time sleep 60
on a 2.5.40 system. It will come back with 59.xxx seconds.
Clearly the sleep was for less than 60.
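The "tick is only a reminder" idea can be sketched as follows, assuming a free-running counter such as the ACPI pm timer (about 3.579545 MHz, 24 bits wide). `struct clocksrc`, `clocksrc_init()` and `clocksrc_tick()` are hypothetical names, not the patch's code. Each tick reads the counter, takes the masked difference (so wrap-around is harmless), and converts cycles to nanoseconds with a precomputed multiply and shift, so elapsed time comes from the counter rather than from an assumed tick length.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Hypothetical clock-source descriptor; names are illustrative only. */
struct clocksrc {
    uint64_t freq_hz;   /* counter frequency, e.g. ~3579545 Hz for the ACPI pm timer */
    uint64_t mask;      /* counter width mask, e.g. 24 bits for the pm timer */
    uint32_t shift;     /* fixed-point shift for the cycles -> ns conversion */
    uint64_t mult;      /* (NSEC_PER_SEC << shift) / freq_hz */
    uint64_t last;      /* counter value seen at the previous tick */
    uint64_t now_ns;    /* accumulated time in nanoseconds */
};

static void clocksrc_init(struct clocksrc *cs, uint64_t freq_hz, unsigned bits,
                          uint32_t shift, uint64_t initial_count)
{
    cs->freq_hz = freq_hz;
    cs->mask = (bits >= 64) ? UINT64_MAX : ((1ULL << bits) - 1);
    cs->shift = shift;
    cs->mult = (NSEC_PER_SEC << shift) / freq_hz;
    cs->last = initial_count & cs->mask;
    cs->now_ns = 0;
}

/* Called from the periodic tick with the current counter value: the tick
 * only says "go look at the clock"; the counter says how much time passed. */
static void clocksrc_tick(struct clocksrc *cs, uint64_t counter)
{
    uint64_t delta = (counter - cs->last) & cs->mask;  /* survives wrap-around */
    cs->last = counter & cs->mask;
    cs->now_ns += (delta * cs->mult) >> cs->shift;
}
```

This sketch drops a fraction of a nanosecond on every tick; a real implementation would carry the remainder forward instead.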
>
> > > I don't think he can seriously mean cranking HZ up to match whatever
> > > timing requirements we might have - that obviously doesn't scale.
> >
> > This is at least the third "take" on what he means, each of
> > which sends me in a very different direction. Sure would
> > like to know what he really means.
>
> Perhaps if you pose it as a multiple-choice question? I suppose he's
> almost sure to answer with "none of the above".
--
George Anzinger
High-res-timers:
http://sourceforge.net/projects/high-res-timers/
Preemption patch:
http://www.kernel.org/pub/linux/kernel/people/rml
>> This patch, in conjunction with the "core" high-res-timers
>> patch implements high resolution timers on the i386
>> platforms.
>
> I really don't get the notion of partial ticks, and quite frankly, this
> isn't going into my tree until some major distribution kicks me in the
> head and explains to me why the hell we have partial ticks instead of just
> making the ticks shorter.
>
> Linus
Hi Linus,
Concurrent has been using previous versions of the Posix timers patch
in our 2.4.18-based kernel. I like this interface and would like to
see it included in your kernel.
What would make the patch more acceptable? Would it be acceptable
if it used a separate queue for the Posix timers and minimized changes
to timer.c?
To answer the partial tick question, it's a trade-off. If all you need
is 1 millisecond resolution, it might not be worth splitting the tick.
It's a question of how the overhead to set up a timer compares to the
overhead of the higher frequency tick interrupts. If you want
microsecond resolution, you need to split the tick.
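Jim's trade-off can be put into rough numbers. The sketch below compares per-second interrupt overhead for a fast periodic tick against a slow tick plus on-demand high-res interrupts; the function names are hypothetical and every cost is an assumed input in microseconds, not a measurement of any real system.

```c
/* Rough model of interrupt overhead per second of wall-clock time.
 * All costs are hypothetical inputs in microseconds, not measurements. */

/* Fast periodic tick: every tick costs tick_cost_us, needed or not. */
static double periodic_overhead_us(double hz, double tick_cost_us)
{
    return hz * tick_cost_us;
}

/* Slow periodic tick plus one extra interrupt per high-res event, each of
 * which also pays to program the timer hardware. */
static double on_demand_overhead_us(double base_hz, double tick_cost_us,
                                    double hr_events_per_sec,
                                    double setup_cost_us, double irq_cost_us)
{
    return base_hz * tick_cost_us
         + hr_events_per_sec * (setup_cost_us + irq_cost_us);
}

/* High-res event rate at which both schemes cost the same. */
static double break_even_events_per_sec(double fast_hz, double base_hz,
                                        double tick_cost_us,
                                        double setup_cost_us, double irq_cost_us)
{
    return (fast_hz - base_hz) * tick_cost_us / (setup_cost_us + irq_cost_us);
}
```

With a made-up 2 us tick cost, a 10 kHz tick costs 20000 us per second (2% of a CPU), while a 100 Hz tick plus 50 sub-tick events at 3 us each costs 350 us; in this example the on-demand scheme only loses above 6600 high-res events per second.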
This is important to folks doing control systems. They get excited
about timing jitter and resolution. It is also interesting to folks
doing games. It's nice to be able to do short delays by blocking rather
than having to spin in a delay loop.
I'd feel better about this being used for critical applications if
the games folks beat it up first.
Jim Houston - Concurrent Computer Corp.
Linus Torvalds wrote:
> On Wed, 9 Oct 2002, george anzinger wrote:
>
>>This patch, in conjunction with the "core" high-res-timers
>>patch implements high resolution timers on the i386
>>platforms.
>
>
> I really don't get the notion of partial ticks, and quite frankly, this
> isn't going into my tree until some major distribution kicks me in the
> head and explains to me why the hell we have partial ticks instead of just
> making the ticks shorter.
>
> Linus
In any kind of virtual environment you would much prefer a completely
tickless system altogether to increased tick rates. In an S/390
virtual machine running many hundreds of virtual Linux servers, the
100Hz timer pops are already considerably painful, and going to a higher
tick rate to achieve higher timer resolution is completely prohibitive.
The same is true in many embedded systems, because of the power consumed
by high frequency ticks.
However, George has shown that introducing the notion of a completely
tickless system is expensive overhead-wise on Intel, so partial ticks
seem to be a way to address the needs of embedded and virtual
environments, providing decent timer resolution as needed.
Ingo Adlung
On 12 Oct 2002, at 18:03, Jim Houston wrote:
>
> >> This patch, in conjunction with the "core" high-res-timers
> >> patch implements high resolution timers on the i386
> >> platforms.
> >
> > I really don't get the notion of partial ticks, and quite frankly, this
> > isn't going into my tree until some major distribution kicks me in the
> > head and explains to me why the hell we have partial ticks instead of just
> > making the ticks shorter.
> >
> > Linus
>
> Hi Linus,
>
> Concurrent has been using previous versions of the Posix timers patch
> in our 2.4.18 based kernel. I like this interface and would like to
> see it included in your kernel.
Hi,
I think nobody objects to seeing the interface implemented, maybe just
to how it's implemented. I did not have a close look, but the concept seems
odd at first sight.
Using an individual timer as an interrupt source may be a different idea
(if available for the particular hardware), but then there must be a
balance between busy-looping in the kernel and setting up such an
individual interrupt.
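Ulrich's busy-loop versus interrupt balance could be expressed as a simple threshold. This is a hypothetical policy, not code from the patch: the enum, the function name and the factor of two are all illustrative assumptions, and the costs would have to be measured on real hardware.

```c
#include <stdint.h>

enum delay_method { DELAY_SPIN, DELAY_ONE_SHOT_IRQ };

/* Hypothetical policy: if the delay is shorter than (a margin over) what it
 * costs to program a one-shot interrupt and take it, busy-wait; otherwise
 * arm the interrupt and let the CPU do other work in the meantime.
 * The factor of 2 is an arbitrary safety margin for illustration. */
static enum delay_method choose_delay_method(uint64_t delay_ns,
                                             uint64_t irq_setup_ns,
                                             uint64_t irq_latency_ns)
{
    uint64_t break_even_ns = 2 * (irq_setup_ns + irq_latency_ns);
    return delay_ns < break_even_ns ? DELAY_SPIN : DELAY_ONE_SHOT_IRQ;
}
```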
The other thing is how to correlate it with the wall clock.
Sorry for not giving answers, I just know the problems...
Regards,
Ulrich
On Sun, Oct 13, 2002 at 12:46:31PM +0200, Ingo Adlung wrote:
> Linus Torvalds wrote:
> > On Wed, 9 Oct 2002, george anzinger wrote:
> >
> >>This patch, in conjunction with the "core" high-res-timers
> >>patch implements high resolution timers on the i386
> >>platforms.
> >
> >
> > I really don't get the notion of partial ticks, and quite frankly, this
> > isn't going into my tree until some major distribution kicks me in the
> > head and explains to me why the hell we have partial ticks instead of just
> > making the ticks shorter.
Not speaking for a major distro, just for myself, as I'm writing HPET (High
Precision Event Timer) support for x86-64 (and it happens to exist
on ia64 as well, and possibly might be in new Intel P4 chipsets, too).
It's a very nice piece of hardware that allows very fine granularity
aperiodic interrupts (in each interrupt you set when the next one will
happen), without much overhead.
It'd be a shame to just set this timer to 1 kHz periodic and use that as
a base timer, when you can do much better resolution- and latency-wise.
HPET has a base clock > 10 MHz.
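With a one-shot event timer like HPET, the interrupt handler's last job is to program the next expiry. A minimal sketch, assuming a hypothetical `next_comparator()` helper and a counter frequency supplied by the caller (14.31818 MHz is a common HPET rate): it picks the earlier of the next periodic tick and the next high-res timer, and rounds the cycle count up so the interrupt never arrives early.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Hypothetical helper for an aperiodic (one-shot) event timer: given the
 * current counter value, return the comparator value at which the next
 * interrupt should fire. Rounding up means it never fires early. */
static uint64_t next_comparator(uint64_t counter_now, uint64_t freq_hz,
                                uint64_t ns_to_next_tick,
                                uint64_t ns_to_next_timer)
{
    uint64_t ns = ns_to_next_timer < ns_to_next_tick ? ns_to_next_timer
                                                     : ns_to_next_tick;
    uint64_t cycles = (ns * freq_hz + NSEC_PER_SEC - 1) / NSEC_PER_SEC;
    return counter_now + cycles;
}
```

For example, at 14318180 Hz a timer due in 350 us, ahead of a tick due in 1 ms, needs ceil(5011.36) = 5012 counter cycles.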
> > Linus
>
> In any kind of virtual environment you would much prefer a completely
> tickless system altogether to increased tick rates. In an S/390
> virtual machine running many hundreds of virtual Linux servers, the
> 100Hz timer pops are already considerably painful, and going to a higher
> tick rate to achieve higher timer resolution is completely prohibitive.
> The same is true in many embedded systems, because of the power consumed
> by high frequency ticks.
>
> However, George has shown that introducing the notion of a completely
> tickless system is expensive overhead-wise on Intel, so partial ticks
> seem to be a way to address the needs of embedded and virtual
> environments, providing decent timer resolution as needed.
When HPET becomes a standard (yes, it's an MS requirement for new PCs),
it won't be expensive on i386 anymore.
--
Vojtech Pavlik
SuSE Labs
Hi!
> > >>This patch, in conjunction with the "core" high-res-timers
> > >>patch implements high resolution timers on the i386
> > >>platforms.
> > >
> > >
> > > I really don't get the notion of partial ticks, and quite frankly, this
> > > isn't going into my tree until some major distribution kicks me in the
> > > head and explains to me why the hell we have partial ticks instead of just
> > > making the ticks shorter.
>
> Not speaking for a major distro, just for myself, as I'm writing HPET (High
> Precision Event Timer) support for x86-64 (and it happens to exist
> on ia64 as well, and possibly might be in new Intel P4 chipsets, too).
>
> It's a very nice piece of hardware that allows very fine granularity
> aperiodic interrupts (in each interrupt you set when the next one will
> happen), without much overhead.
I believe the problem is like this: assume you have three timers,
10msec polling of mouse, 30msec keyboard autorepeat and 50msec cursor
blinking. With current approach, you get
10msec userland runs
<enter kernel>
<process mouse>
<process keyboard>
<process cursor>
<exit kernel>
With hires timers, you get:
3msec userland runs
<enter kernel>
<process mouse>
<exit kernel>
2msec userland runs
<enter kernel>
<process keyboard>
<exit kernel>
...
which is not so efficient. I guess rounding could be implemented to
preserve this "do-all-together" ability?
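Pavel's concern can be quantified with a small simulation. `count_wakeups()` is a hypothetical helper that counts distinct kernel entries caused by periodic timers over a window, either at exact expiry times or after rounding each expiry up to a tick boundary. Units are whatever the caller uses (milliseconds below), and the timer phases are made up to match his example.

```c
#include <stdint.h>
#include <stdlib.h>

static int cmp_u32(const void *a, const void *b)
{
    uint32_t x = *(const uint32_t *)a, y = *(const uint32_t *)b;
    return (x > y) - (x < y);
}

/* Count distinct wakeups caused by ntimers periodic timers in [0, window).
 * If tick is nonzero, each expiry is first rounded up to the next multiple
 * of tick, so timers falling in the same tick share one kernel entry. */
static unsigned count_wakeups(const uint32_t *period, const uint32_t *phase,
                              unsigned ntimers, uint32_t window, uint32_t tick)
{
    uint32_t when[1024];
    unsigned n = 0, distinct = 0;

    for (unsigned k = 0; k < ntimers; k++)
        for (uint32_t e = phase[k]; e < window && n < 1024; e += period[k])
            when[n++] = tick ? ((e + tick - 1) / tick) * tick : e;

    qsort(when, n, sizeof when[0], cmp_u32);
    for (unsigned i = 0; i < n; i++)
        if (i == 0 || when[i] != when[i - 1])
            distinct++;
    return distinct;
}
```

With periods of 10, 30 and 50 ms starting at 3, 5 and 8 ms, a 150 ms window gives 23 separate entries at exact times, but only 15 once expiries are rounded up to a 10 ms tick, because the keyboard and cursor timers always land in a tick the mouse timer already needs.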
Pavel
--
When do you have heart between your knees?
On Tue, Oct 15, 2002 at 12:17:47AM +0200, Pavel Machek wrote:
> Hi!
>
> > > >>This patch, in conjunction with the "core" high-res-timers
> > > >>patch implements high resolution timers on the i386
> > > >>platforms.
> > > >
> > > >
> > > > I really don't get the notion of partial ticks, and quite frankly, this
> > > > isn't going into my tree until some major distribution kicks me in the
> > > > head and explains to me why the hell we have partial ticks instead of just
> > > > making the ticks shorter.
> >
> > Not speaking for a major distro, just for myself, as I'm writing HPET (High
> > Precision Event Timer) support for x86-64 (and it happens to exist
> > on ia64 as well, and possibly might be in new Intel P4 chipsets, too).
> >
> > It's a very nice piece of hardware that allows very fine granularity
> > aperiodic interrupts (in each interrupt you set when the next one will
> > happen), without much overhead.
>
> I believe the problem is like this: assume you have three timers,
> 10msec polling of mouse, 30msec keyboard autorepeat and 50msec cursor
> blinking. With current approach, you get
>
> 10msec userland runs
> <enter kernel>
> <process mouse>
> <process keyboard>
> <process cursor>
> <exit kernel>
>
> With hires timers, you get:
>
> 3msec userland runs
> <enter kernel>
> <process mouse>
> <exit kernel>
> 2msec userland runs
> <enter kernel>
> <process keyboard>
> <exit kernel>
> ...
>
> which is not so efficient. I guess rounding could be implemented to
> preserve this "do-all-together" ability?
Actually that's exactly why you'd want sub-tick timing. For timers where
you don't care too much about the timing ;) you could do the rounding,
and for those where you need exact timing (sound, video, ...) you could
call a different add_timer() which would disable the coalescing.
--
Vojtech Pavlik
SuSE Labs
Vojtech Pavlik wrote:
>
> On Tue, Oct 15, 2002 at 12:17:47AM +0200, Pavel Machek wrote:
> > Hi!
> >
> > > > >>This patch, in conjunction with the "core" high-res-timers
> > > > >>patch implements high resolution timers on the i386
> > > > >>platforms.
> > > > >
> > > > >
> > > > > I really don't get the notion of partial ticks, and quite frankly, this
> > > > > isn't going into my tree until some major distribution kicks me in the
> > > > > head and explains to me why the hell we have partial ticks instead of just
> > > > > making the ticks shorter.
> > >
> > > Not speaking for a major distro, just for myself, as I'm writing HPET (High
> > > Precision Event Timer) support for x86-64 (and it happens to exist
> > > on ia64 as well, and possibly might be in new Intel P4 chipsets, too).
> > >
> > > It's a very nice piece of hardware that allows very fine granularity
> > > aperiodic interrupts (in each interrupt you set when the next one will
> > > happen), without much overhead.
> >
> > I believe the problem is like this: assume you have three timers,
> > 10msec polling of mouse, 30msec keyboard autorepeat and 50msec cursor
> > blinking. With current approach, you get
> >
> > 10msec userland runs
> > <enter kernel>
> > <process mouse>
> > <process keyboard>
> > <process cursor>
> > <exit kernel>
> >
> > With hires timers, you get:
> >
> > 3msec userland runs
> > <enter kernel>
> > <process mouse>
> > <exit kernel>
> > 2msec userland runs
> > <enter kernel>
> > <process keyboard>
> > <exit kernel>
> > ...
> >
> > which is not so efficient. I guess rounding could be implemented to
> > preserve this "do-all-together" ability?
>
> Actually that's exactly why you'd want sub-tick timing. For timers where
> you don't care too much about the timing ;) you could do the rounding,
> and for those where you need exact timing (sound, video, ...) you could
> call a different add_timer() which would disable the coalescing.
The way you do this with the POSIX interface is to use the
low-res CLOCKs. Internally one would just set the
sub_jiffie in the struct timer_list to zero (as the
initialization code does). This way the timer would always be
handled on the tick interrupt and would never cause a
"special" sub-tick interrupt.
As the patch is currently written, it takes extra effort to
force a sub-tick event (as it should), so one has to
"request" it.
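A sketch of the low-res versus high-res split described above, under loudly hypothetical names (`struct hypo_timer`, `hypo_timer_set()`, `hypo_needs_subtick_irq()`), not the patch's code: an absolute expiry in clock units is divided into a jiffy and a `sub_jiffie` remainder, and a low-res clock rounds up to the next tick with `sub_jiffie` forced to zero, so only high-res requests can ever ask for a sub-tick interrupt.

```c
#include <stdbool.h>

enum hypo_clock { HYPO_CLOCK_LOWRES, HYPO_CLOCK_HIGHRES };

/* Hypothetical timer: jiffy it is due in, plus offset inside that jiffy. */
struct hypo_timer {
    unsigned long expires;     /* jiffy in which the timer is due */
    unsigned long sub_jiffie;  /* offset inside that jiffy; 0 = on the tick */
};

/* Split an absolute expiry 'when', in units of the underlying clock, into
 * (jiffy, sub_jiffie). Low-res timers are rounded up to the next tick and
 * keep sub_jiffie == 0, so they never cause an extra interrupt. */
static void hypo_timer_set(struct hypo_timer *t, enum hypo_clock clk,
                           unsigned long long when,
                           unsigned long cycles_per_jiffy)
{
    t->expires = (unsigned long)(when / cycles_per_jiffy);
    t->sub_jiffie = (unsigned long)(when % cycles_per_jiffy);
    if (clk == HYPO_CLOCK_LOWRES && t->sub_jiffie != 0) {
        t->expires += 1;       /* never expire early: round up to the next tick */
        t->sub_jiffie = 0;
    }
}

/* Only a nonzero sub_jiffie "requests" a special sub-tick interrupt. */
static bool hypo_needs_subtick_irq(const struct hypo_timer *t)
{
    return t->sub_jiffie != 0;
}
```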
--
George Anzinger
Ulrich Windl wrote:
>
> On 12 Oct 2002, at 18:03, Jim Houston wrote:
>
> >
> > >> This patch, in conjunction with the "core" high-res-timers
> > >> patch implements high resolution timers on the i386
> > >> platforms.
> > >
> > > I really don't get the notion of partial ticks, and quite frankly, this
> > > isn't going into my tree until some major distribution kicks me in the
> > > head and explains to me why the hell we have partial ticks instead of just
> > > making the ticks shorter.
> > >
> > > Linus
> >
> > Hi Linus,
> >
> > Concurrent has been using previous versions of the Posix timers patch
> > in our 2.4.18 based kernel. I like this interface and would like to
> > see it included in your kernel.
>
> Hi,
>
> I think nobody objects to seeing the interface implemented, maybe just
> to how it's implemented. I did not have a close look, but the concept seems
> odd at first sight.
>
> Using an individual timer as an interrupt source may be a different idea
> (if available for the particular hardware), but then there must be a
> balance between busy-looping in the kernel and setting up such an
> individual interrupt.
>
> The other thing is how to correlate it with the wall clock.
Ah, yes, that is it in a nutshell. If you try to put the
high-res timers in a "special" list and not in the same list
as the low-res stuff, you have ordering issues. It becomes
really easy to have the timers expire in the wrong order.
As to interrupt source and time, the biggest issue is that
we don't really have timers that interrupt in "nice" units
of time. The PIT, for example, has a tick time (i.e. each
count) of 0.838095239 microseconds. So how are we to
figure time from such a tick if we want to use an integer
value for HZ?
What my patch suggests is that we use the higher resolution
TSC or pm timer (or whatever is available) and just use the
PIT to remind us to look at the clock, AND that we keep time
in units of that clock. In some ways we already do this,
but we are not consistent. For example, we advance the time
by less than 1 ms each tick, but we still assume that a tick
is 1 ms when we set up timers. This leads to standards
failures such as that illustrated by:
time sleep 60
which on a 2.5 system will sleep for less than 60 seconds
because of this.
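The size of the early expiry can be worked out with a few lines of arithmetic. This illustrates the mechanism described above rather than reproducing the 2.5 source: it assumes HZ = 1000 and a PIT input clock of 1193180 Hz (the CLOCK_TICK_RATE value used on i386), with the PIT divisor rounded to the nearest integer count. `SKETCH_HZ`, `PIT_INPUT_HZ` and the function names are hypothetical.

```c
/* Assumptions (illustrative, not the 2.5 source): HZ = 1000 and a PIT
 * input clock of 1193180 Hz, the value i386 Linux uses for CLOCK_TICK_RATE. */
#define SKETCH_HZ     1000UL
#define PIT_INPUT_HZ  1193180UL

/* The PIT divisor has to be an integer, so round to the nearest count. */
static unsigned long pit_latch(void)
{
    return (PIT_INPUT_HZ + SKETCH_HZ / 2) / SKETCH_HZ;   /* 1193 counts */
}

/* Real length of one tick in seconds: LATCH counts of the PIT clock. */
static double real_tick_seconds(void)
{
    return (double)pit_latch() / (double)PIT_INPUT_HZ;
}

/* Wall-clock length of a sleep whose timer code assumes each tick is
 * exactly 1 ms, i.e. it waits one tick per requested millisecond. */
static double wall_seconds_for_sleep_ms(unsigned long ms)
{
    unsigned long ticks = ms * SKETCH_HZ / 1000UL;
    return (double)ticks * real_tick_seconds();
}
```

Under these assumptions a real tick is about 999.85 us, so a 60000-tick "60 second" sleep lasts roughly 59.991 seconds of wall-clock time, consistent with the "less than 60 seconds" result above.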
--
George Anzinger
On Wed, 9 Oct 2002, Linus Torvalds wrote:
| On Wed, 9 Oct 2002, george anzinger wrote:
| >
| > This patch, in conjunction with the "core" high-res-timers
| > patch implements high resolution timers on the i386
| > platforms.
|
| I really don't get the notion of partial ticks, and quite frankly, this
| isn't going into my tree until some major distribution kicks me in the
| head and explains to me why the hell we have partial ticks instead of just
| making the ticks shorter.
| -
Carrier Grade Linux is not a distro, but we do integrate these
patches into the CGL patches and will continue to use them.
Please consider adding it to 2.5.
Thanks,
--
~Randy
On Thu, 2002-10-17 at 17:54, Randy.Dunlap wrote:
> Carrier Grade Linux is not a distro, but we do integrate these
> patches into the CGL patches and will continue to use it.
>
> Please consider adding it to 2.5.
Indeed. Linus, please consider merging at least George's latest patch
set, which provides just the new system calls to support POSIX clocks and
timers. There is no dependence on the high-resolution bits, so at least
Linux can provide the missing POSIX.4 system calls.
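For reference, this is roughly what the POSIX clocks and timers calls look like from user space. `posix_timer_demo()` is a hypothetical helper: it arms a one-shot timer with SIGEV_NONE (so no signal is delivered), waits, and polls `timer_gettime()`, which reports zero remaining time once a one-shot timer has expired. It uses the standard CLOCK_MONOTONIC rather than any extra clock IDs the patch may add.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <time.h>

/* Arm a one-shot POSIX timer on clock clk that expires after expire_ns,
 * sleep for wait_ns, then poll it with timer_gettime().
 * Returns 0 if it has expired, 1 if it is still armed, -1 on error. */
static int posix_timer_demo(clockid_t clk, long expire_ns, long wait_ns)
{
    struct sigevent sev;
    struct itimerspec its, cur;
    struct timespec wait = { .tv_sec = wait_ns / 1000000000L,
                             .tv_nsec = wait_ns % 1000000000L };
    timer_t tid;
    int rc = -1;

    memset(&sev, 0, sizeof sev);
    sev.sigev_notify = SIGEV_NONE;      /* no signal: we poll instead */
    if (timer_create(clk, &sev, &tid) != 0)
        return -1;

    memset(&its, 0, sizeof its);        /* it_interval = 0: one-shot */
    its.it_value.tv_sec = expire_ns / 1000000000L;
    its.it_value.tv_nsec = expire_ns % 1000000000L;
    if (timer_settime(tid, 0, &its, NULL) == 0) {
        nanosleep(&wait, NULL);         /* sketch: no EINTR handling */
        if (timer_gettime(tid, &cur) == 0)
            rc = (cur.it_value.tv_sec == 0 && cur.it_value.tv_nsec == 0) ? 0 : 1;
    }
    timer_delete(tid);
    return rc;
}
```

On older glibc versions these calls live in librt, so linking may need `-lrt`.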
George can then provide the high resolution code separately which can be
debated and optionally merged.
Robert Love
On Thursday 17 October 2002 17:54, Randy.Dunlap wrote:
> On Wed, 9 Oct 2002, Linus Torvalds wrote:
> | On Wed, 9 Oct 2002, george anzinger wrote:
> | > This patch, in conjunction with the "core" high-res-timers
> | > patch implements high resolution timers on the i386
> | > platforms.
> |
> | I really don't get the notion of partial ticks, and quite frankly, this
> | isn't going into my tree until some major distribution kicks me in the
> | head and explains to me why the hell we have partial ticks instead of
> | just making the ticks shorter.
> | -
Because just making ticks shorter/more frequent increases timer overhead
all the time, whether or not you are actually doing anything requiring it.
This is a big waste of CPU cycles.
Using the partial tick method put forward by George, you only pay the price
for higher resolution timers WHEN YOU WANT TO.
Most things that want, say, 1 usec precision don't want to do something EVERY usec,
just something every now and then with 1 usec precision: things like programs
that want to block for 350 usec, where waiting 10 or even 1 msec would be too
long.
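On a system with POSIX high-resolution timers, the 350 usec case is a plain relative `clock_nanosleep()`. The sketch below (the function name `timed_block_ns()` is hypothetical) blocks on CLOCK_MONOTONIC and measures the actual delay on the same clock. How close the result comes to 350 us depends on the kernel: with only 1/HZ timers the sleep is typically rounded up to a tick boundary.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdint.h>
#include <time.h>

/* Block for ns nanoseconds on CLOCK_MONOTONIC and return the delay actually
 * observed on the same clock, or -1 if clock_nanosleep() fails. */
static int64_t timed_block_ns(long ns)
{
    struct timespec req = { .tv_sec = ns / 1000000000L,
                            .tv_nsec = ns % 1000000000L };
    struct timespec rem, t0, t1;
    int rc;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    /* clock_nanosleep() returns an error number instead of setting errno;
     * on EINTR it reports the unslept time in rem, so sleep for that. */
    while ((rc = clock_nanosleep(CLOCK_MONOTONIC, 0, &req, &rem)) == EINTR)
        req = rem;
    clock_gettime(CLOCK_MONOTONIC, &t1);
    if (rc != 0)
        return -1;

    return (int64_t)(t1.tv_sec - t0.tv_sec) * 1000000000LL
         + (t1.tv_nsec - t0.tv_nsec);
}
```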
The timer overhead of using fixed interval timers (as you suggest) to support
that occasional 350 usec block would eat too much CPU to be practical.
Increasing timer frequency penalizes ALL users/processes with increased timer
overhead all the time for the benefit of the small number of tasks that need
better resolution. The sub-jiffie/partial tick model only pays that price
when there is an actual timed event that needs to occur at that higher
resolution, and the rest of the time the timer overhead remains as it is today
(which to my mind is 10 times what it needs to be, but that is an argument
for another day).
Embedded systems in particular need higher resolution, and these types of
systems are precisely the systems that can't afford to multiply their timer
overhead by a factor of 10 or more (as increasing HZ from 100 to 1000 does).
I would like to add my vote to including George's high-res patches in
2.5... The advantages have been expounded by others, along with their few
downsides (compared to just bumping up HZ). Especially for embedded
systems, but in general also, these make sense. I'm nearly done with an
initial MIPS implementation, which we'll be using in my group at Cisco.
Thanks,
Brad