I've re-based my timekeeping patch queue ontop of Roman's recent NTP and
div64 patches. I've been a little short on time, so I've not been able
to get them all tested (ntp testing takes a number of hours), but they
do build and I wanted to get some feedback on the re-base.
My apologies to Ingo for sitting on his patches for this long.
I'll get a test setup going over the weekend to make sure all is well.
The total patch queue is large, so while folks probably won't try it
themselves, here's the full series I'm using to build:
#start w/ linus' -git
#roman's udiv patches
01-udiv.patchd
02-convert.patch
03-remove.patch
#roman's ntp changes
1-cleanup.patch
2-ntp4.patch
3-timefreq.patch
4-timeoffset.patch
5-tai.patch
6-scaleshift.patch
7-ticklength.patch
8-leap.patch
#my timekeeping changes
mult-adj-split.patch
monotonic-raw
#ntp cleanups from Ingo and me
ntp-cleanup-ingo1.patch
ntp-cleanup-ingo2.patch
ntp_cleanup-john.patch
The clocksource frequency is represented by
clocksource->mult/2^(clocksource->shift). Currently, when NTP makes
adjustments to the clock frequency, they are made directly to the mult
value.
This has the drawback that once changed, we cannot know what the orignal
mult value was, or how much adjustment has been applied.
This property causes problems in calculating proper ntp intervals when
switching back and forth between clocksources.
This patch separates the current mult value into a mult and mult_adj
pair. The mult value stays constant, while the ntp clocksource
adjustments are done only to the mult_adj value.
This allows for correct ntp interval calculation and additionally lays
the groundwork for a new notion of time, what I'm calling the
monotonic-raw time, which is introduced in a following patch.
Thoughts or comments would be appreciated.
thanks
-john
Signed-off-by: John Stultz <[email protected]>
diff --git a/arch/ia64/kernel/time.c b/arch/ia64/kernel/time.c
index 17fda52..37851e1 100644
--- a/arch/ia64/kernel/time.c
+++ b/arch/ia64/kernel/time.c
@@ -357,7 +357,7 @@ void update_vsyscall(struct timespec *wall, struct clocksource *c)
/* copy fsyscall clock data */
fsyscall_gtod_data.clk_mask = c->mask;
- fsyscall_gtod_data.clk_mult = c->mult;
+ fsyscall_gtod_data.clk_mult = c->mult + c->mult_adj;
fsyscall_gtod_data.clk_shift = c->shift;
fsyscall_gtod_data.clk_fsys_mmio = c->fsys_mmio;
fsyscall_gtod_data.clk_cycle_last = c->cycle_last;
diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index 3b26fbd..36aa8da 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -814,7 +814,7 @@ void update_vsyscall(struct timespec *wall_time, struct clocksource *clock)
/* XXX this assumes clock->shift == 22 */
/* 4611686018 ~= 2^(20+64-22) / 1e9 */
- t2x = (u64) clock->mult * 4611686018ULL;
+ t2x = (u64) (clock->mult + clock->mult_adj) * 4611686018ULL;
stamp_xsec = (u64) xtime.tv_nsec * XSEC_PER_SEC;
do_div(stamp_xsec, 1000000000);
stamp_xsec += (u64) xtime.tv_sec * XSEC_PER_SEC;
diff --git a/arch/x86/kernel/vsyscall_64.c b/arch/x86/kernel/vsyscall_64.c
index 3f82427..14afd48 100644
--- a/arch/x86/kernel/vsyscall_64.c
+++ b/arch/x86/kernel/vsyscall_64.c
@@ -83,7 +83,7 @@ void update_vsyscall(struct timespec *wall_time, struct clocksource *clock)
vsyscall_gtod_data.clock.vread = clock->vread;
vsyscall_gtod_data.clock.cycle_last = clock->cycle_last;
vsyscall_gtod_data.clock.mask = clock->mask;
- vsyscall_gtod_data.clock.mult = clock->mult;
+ vsyscall_gtod_data.clock.mult = clock->mult + clock->mult_adj;
vsyscall_gtod_data.clock.shift = clock->shift;
vsyscall_gtod_data.wall_time_sec = wall_time->tv_sec;
vsyscall_gtod_data.wall_time_nsec = wall_time->tv_nsec;
diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
index 85778a4..e917a30 100644
--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
@@ -45,7 +45,8 @@ struct clocksource;
* @read: returns a cycle value
* @mask: bitmask for two's complement
* subtraction of non 64 bit counters
- * @mult: cycle to nanosecond multiplier
+ * @mult: cycle to nanosecond multiplier (unadjusted)
+ * @mult_adj: NTP adjustment factor added to the multiplier
* @shift: cycle to nanosecond divisor (power of two)
* @flags: flags describing special properties
* @vread: vsyscall based read
@@ -63,6 +64,7 @@ struct clocksource {
cycle_t (*read)(void);
cycle_t mask;
u32 mult;
+ s32 mult_adj;
u32 shift;
unsigned long flags;
cycle_t (*vread)(void);
@@ -179,7 +181,7 @@ static inline cycle_t clocksource_read(struct clocksource *cs)
static inline s64 cyc2ns(struct clocksource *cs, cycle_t cycles)
{
u64 ret = (u64)cycles;
- ret = (ret * cs->mult) >> cs->shift;
+ ret = (ret * (cs->mult + cs->mult_adj)) >> cs->shift;
return ret;
}
@@ -199,7 +201,7 @@ static inline void clocksource_calculate_interval(struct clocksource *c,
{
u64 tmp;
- /* XXX - All of this could use a whole lot of optimization */
+ /* Do the ns -> cycle conversion first, ignoring mult_adj */
tmp = length_nsec;
tmp <<= c->shift;
tmp += c->mult/2;
@@ -209,7 +211,8 @@ static inline void clocksource_calculate_interval(struct clocksource *c,
if (c->cycle_interval == 0)
c->cycle_interval = 1;
- c->xtime_interval = (u64)c->cycle_interval * c->mult;
+ /* Go back from cycles -> shifted ns, this time include mult_adj */
+ c->xtime_interval = (u64)c->cycle_interval * (c->mult + c->mult_adj);
}
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 1af9fb0..5b11b0f 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -426,7 +426,7 @@ static void clocksource_adjust(s64 offset)
} else
return;
- clock->mult += adj;
+ clock->mult_adj += adj;
clock->xtime_interval += interval;
clock->xtime_nsec -= offset;
clock->error -= (interval - offset) <<
In talking with Josip Loncaric, and his work on clock synchronization
(see btime.sf.net), he mentioned that for really close synchronization,
it is useful to have access to "hardware time", that is a notion of time
that is not in any way adjusted by the clock slewing done to keep close
time sync.
Part of the issue is if we are using the kernel's ntp adjusted
representation of time in order to measure how we should correct time,
we can run into what Paul McKenney aptly described as "Painting a road
using the lines we're painting as the guide".
I had been thinking of a similar problem, and was trying to come up with
a way to give users access to a purely hardware based time
representation that avoided users having to know the underlying
frequency and mask values needed to deal with the wide variety of
possible underlying hardware counters.
My solution is to introduce CLOCK_MONOTONIC_RAW. This exposes a
nanosecond based time value, that increments starting at bootup and has
no frequency adjustments made to it what so ever.
The time is accessed from userspace via the posix_clock_gettime()
syscall, passing CLOCK_MONOTONIC_RAW as the clock_id.
This patch very much depends on the mult/mult_adj split patch in this
series.
Thoughts comments would be greatly appreciated!
thanks
-john
Signed-off-by: John Stultz <[email protected]>
diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
index e917a30..1460803 100644
--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
@@ -79,6 +79,7 @@ struct clocksource {
/* timekeeping specific data, ignore */
cycle_t cycle_interval;
u64 xtime_interval;
+ u64 raw_interval;
/*
* Second part is written at each timer interrupt
* Keep it in a different cache line to dirty no
diff --git a/include/linux/time.h b/include/linux/time.h
index 2091a19..9548eb4 100644
--- a/include/linux/time.h
+++ b/include/linux/time.h
@@ -116,6 +116,7 @@ extern int do_setitimer(int which, struct itimerval *value,
extern unsigned int alarm_setitimer(unsigned int seconds);
extern int do_getitimer(int which, struct itimerval *value);
extern void getnstimeofday(struct timespec *tv);
+extern void getrawmonotonic(struct timespec *ts);
extern void getboottime(struct timespec *ts);
extern void monotonic_to_bootbased(struct timespec *ts);
@@ -214,6 +215,7 @@ struct itimerval {
#define CLOCK_MONOTONIC 1
#define CLOCK_PROCESS_CPUTIME_ID 2
#define CLOCK_THREAD_CPUTIME_ID 3
+#define CLOCK_MONOTONIC_RAW 4
/*
* The IDs of various hardware clocks:
diff --git a/kernel/posix-timers.c b/kernel/posix-timers.c
index 022c9c3..91a6c19 100644
--- a/kernel/posix-timers.c
+++ b/kernel/posix-timers.c
@@ -224,6 +224,15 @@ static int posix_ktime_get_ts(clockid_t which_clock, struct timespec *tp)
}
/*
+ * Get monotonic time for posix timers
+ */
+static int posix_get_monotonic_raw(clockid_t which_clock, struct timespec *tp)
+{
+ getrawmonotonic(tp);
+ return 0;
+}
+
+/*
* Initialize everything, well, just everything in Posix clocks/timers ;)
*/
static __init int init_posix_timers(void)
@@ -236,9 +245,15 @@ static __init int init_posix_timers(void)
.clock_get = posix_ktime_get_ts,
.clock_set = do_posix_clock_nosettime,
};
+ struct k_clock clock_monotonic_raw = {
+ .clock_getres = hrtimer_get_res,
+ .clock_get = posix_get_monotonic_raw,
+ .clock_set = do_posix_clock_nosettime,
+ };
register_posix_clock(CLOCK_REALTIME, &clock_realtime);
register_posix_clock(CLOCK_MONOTONIC, &clock_monotonic);
+ register_posix_clock(CLOCK_MONOTONIC_RAW, &clock_monotonic_raw);
posix_timers_cache = kmem_cache_create("posix_timers_cache",
sizeof (struct k_itimer), 0, SLAB_PANIC,
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index cb17979..fc70529 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -44,6 +44,7 @@ __cacheline_aligned_in_smp DEFINE_SEQLOCK(xtime_lock);
*/
struct timespec xtime __attribute__ ((aligned (16)));
struct timespec wall_to_monotonic __attribute__ ((aligned (16)));
+struct timespec monotonic_raw;
static unsigned long total_sleep_time; /* seconds */
static struct timespec xtime_cache __attribute__ ((aligned (16)));
@@ -106,6 +107,39 @@ void getnstimeofday(struct timespec *ts)
EXPORT_SYMBOL(getnstimeofday);
/**
+ * getrawmonotonic - Returns the raw monotonic time in a timespec
+ * @ts: pointer to the timespec to be set
+ *
+ * Returns the raw monotonic time (completely un-modified by ntp)
+ */
+void getrawmonotonic(struct timespec *ts)
+{
+ unsigned long seq;
+ s64 nsecs;
+ cycle_t cycle_now, cycle_delta;
+
+ do {
+ seq = read_seqbegin(&xtime_lock);
+
+ /* read clocksource: */
+ cycle_now = clocksource_read(clock);
+
+ /* calculate the delta since the last update_wall_time: */
+ cycle_delta = (cycle_now - clock->cycle_last) & clock->mask;
+
+ /* convert to nanoseconds: */
+ nsecs = ((s64)cycle_delta * clock->mult) >> clock->shift;
+
+ *ts = monotonic_raw;
+
+ } while (read_seqretry(&xtime_lock, seq));
+
+ timespec_add_ns(ts, nsecs);
+}
+
+EXPORT_SYMBOL(getrawmonotonic);
+
+/**
* do_gettimeofday - Returns the time of day in a timeval
* @tv: pointer to the timeval to be set
*
@@ -251,6 +285,8 @@ void __init timekeeping_init(void)
xtime.tv_nsec = 0;
set_normalized_timespec(&wall_to_monotonic,
-xtime.tv_sec, -xtime.tv_nsec);
+ set_normalized_timespec(&monotonic_raw, 0,0);
+
update_xtime_cache(0);
total_sleep_time = 0;
write_sequnlock_irqrestore(&xtime_lock, flags);
@@ -439,6 +475,7 @@ static void clocksource_adjust(s64 offset)
void update_wall_time(void)
{
cycle_t offset;
+ static u64 raw_snsec; /* shifted raw nanosecnds */
/* Make sure we're fully resumed: */
if (unlikely(timekeeping_suspended))
@@ -457,6 +494,7 @@ void update_wall_time(void)
while (offset >= clock->cycle_interval) {
/* accumulate one interval */
clock->xtime_nsec += clock->xtime_interval;
+ raw_snsec += clock->raw_interval;
clock->cycle_last += clock->cycle_interval;
offset -= clock->cycle_interval;
@@ -466,6 +504,11 @@ void update_wall_time(void)
second_overflow();
}
+ if (raw_snsec >= (u64)NSEC_PER_SEC << clock->shift) {
+ raw_snsec -= (u64)NSEC_PER_SEC << clock->shift;
+ monotonic_raw.tv_sec++;
+ }
+
/* accumulate error between NTP and clock interval */
clock->error += current_tick_length();
clock->error -= clock->xtime_interval << (TICK_LENGTH_SHIFT - clock->shift);
@@ -478,6 +521,10 @@ void update_wall_time(void)
xtime.tv_nsec = (s64)clock->xtime_nsec >> clock->shift;
clock->xtime_nsec -= (s64)xtime.tv_nsec << clock->shift;
+ /* store full nanoseconds into raw_monotonic */
+ monotonic_raw.tv_nsec = (s64)raw_snsec >> clock->shift;
+
+
update_xtime_cache(cyc2ns(clock, offset));
/* check to see if there is a new clocksource to use */
Make this file more readable by applying a consistent coding style.
No code changed.
Signed-off-by: Ingo Molnar <[email protected]>
Merged on top of Roman's very similar cleanups.
(In other words: Blame John for the merge screwups).
Signed-off-by: John Stultz <[email protected]>
Index: linux-2.6/kernel/time/ntp.c
===================================================================
--- linux-2.6.orig/kernel/time/ntp.c 2008-03-14 19:21:02.000000000 -0700
+++ linux-2.6/kernel/time/ntp.c 2008-03-14 20:10:41.000000000 -0700
@@ -1,13 +1,10 @@
/*
- * linux/kernel/time/ntp.c
- *
* NTP state machine interfaces and logic.
*
* This code was mainly moved from kernel/timer.c and kernel/time.c
* Please see those files for relevant copyright info and historical
* changelogs.
*/
-
#include <linux/mm.h>
#include <linux/time.h>
#include <linux/timer.h>
@@ -22,32 +19,35 @@
/*
* Timekeeping variables
*/
-unsigned long tick_usec = TICK_USEC; /* USER_HZ period (usec) */
-unsigned long tick_nsec; /* ACTHZ period (nsec) */
-u64 tick_length;
-static u64 tick_length_base;
-
-static struct hrtimer leap_timer;
-
-#define MAX_TICKADJ 500 /* microsecs */
-#define MAX_TICKADJ_SCALED (((u64)(MAX_TICKADJ * NSEC_PER_USEC) << \
- NTP_SCALE_SHIFT) / NTP_INTERVAL_FREQ)
+unsigned long tick_usec = TICK_USEC; /* USER_HZ period (usec) */
+unsigned long tick_nsec; /* ACTHZ period (nsec) */
+u64 tick_length;
+static u64 tick_length_base;
+static const long max_tickadj = 500; /* (usec) */
+
+static struct hrtimer leap_timer;
/*
* phase-lock loop variables
*/
/* TIME_ERROR prevents overwriting the CMOS clock */
-static int time_state = TIME_OK; /* clock synchronization status */
-int time_status = STA_UNSYNC; /* clock status bits */
-static long time_tai; /* TAI offset (s) */
-static s64 time_offset; /* time adjustment (ns) */
-static long time_constant = 2; /* pll time constant */
-long time_maxerror = NTP_PHASE_LIMIT; /* maximum error (us) */
-long time_esterror = NTP_PHASE_LIMIT; /* estimated error (us) */
-static s64 time_freq; /* frequency offset (scaled ns/s)*/
-static long time_reftime; /* time at last adjustment (s) */
-long time_adjust;
-static long ntp_tick_adj;
+static int time_state = TIME_OK; /* clock synch status */
+int time_status = STA_UNSYNC; /* clock status bits */
+static long time_tai; /* TAI offset (s) */
+static s64 time_offset; /* time adjustment (ns) */
+static long time_constant = 2; /* pll time constant */
+long time_maxerror = NTP_PHASE_LIMIT;/* maximum error (us) */
+long time_esterror = NTP_PHASE_LIMIT;/* estimated error (us) */
+static s64 time_freq; /* frequency offset (scaled ns/s)*/
+static long time_reftime; /* time at last adjustment (s) */
+long time_adjust;
+static long ntp_tick_adj;
+
+
+static inline u64 scale_tickadj(u64 adj)
+{
+ return ((adj * NSEC_PER_USEC) << NTP_SCALE_SHIFT) / NTP_INTERVAL_FREQ;
+}
static void ntp_update_frequency(void)
{
@@ -111,15 +111,15 @@ static void ntp_update_offset(long offse
*/
void ntp_clear(void)
{
- time_adjust = 0; /* stop active adjtime() */
- time_status |= STA_UNSYNC;
- time_maxerror = NTP_PHASE_LIMIT;
- time_esterror = NTP_PHASE_LIMIT;
+ time_adjust = 0; /* stop active adjtime() */
+ time_status |= STA_UNSYNC;
+ time_maxerror = NTP_PHASE_LIMIT;
+ time_esterror = NTP_PHASE_LIMIT;
ntp_update_frequency();
- tick_length = tick_length_base;
- time_offset = 0;
+ tick_length = tick_length_base;
+ time_offset = 0;
}
/*
@@ -136,28 +136,32 @@ static enum hrtimer_restart ntp_leap_sec
switch (time_state) {
case TIME_OK:
break;
+
case TIME_INS:
xtime.tv_sec--;
wall_to_monotonic.tv_sec++;
time_state = TIME_OOP;
- printk(KERN_NOTICE "Clock: "
- "inserting leap second 23:59:60 UTC\n");
+ printk(KERN_NOTICE
+ "Clock: inserting leap second 23:59:60 UTC\n");
leap_timer.expires = ktime_add_ns(leap_timer.expires,
NSEC_PER_SEC);
res = HRTIMER_RESTART;
break;
+
case TIME_DEL:
xtime.tv_sec++;
time_tai--;
wall_to_monotonic.tv_sec--;
time_state = TIME_WAIT;
- printk(KERN_NOTICE "Clock: "
- "deleting leap second 23:59:59 UTC\n");
+ printk(KERN_NOTICE
+ "Clock: deleting leap second 23:59:59 UTC\n");
break;
+
case TIME_OOP:
time_tai++;
time_state = TIME_WAIT;
/* fall through */
+
case TIME_WAIT:
if (!(time_status & (STA_INS | STA_DEL)))
time_state = TIME_OK;
@@ -193,21 +197,20 @@ void second_overflow(void)
* Compute the phase adjustment for the next second. The offset is
* reduced by a fixed factor times the time constant.
*/
- tick_length = tick_length_base;
- time_adj = shift_right(time_offset, SHIFT_PLL + time_constant);
- time_offset -= time_adj;
- tick_length += time_adj;
+ tick_length = tick_length_base;
+ time_adj = shift_right(time_offset, SHIFT_PLL + time_constant);
+ time_offset -= time_adj;
+ tick_length += time_adj;
if (unlikely(time_adjust)) {
- if (time_adjust > MAX_TICKADJ) {
- time_adjust -= MAX_TICKADJ;
- tick_length += MAX_TICKADJ_SCALED;
- } else if (time_adjust < -MAX_TICKADJ) {
- time_adjust += MAX_TICKADJ;
- tick_length -= MAX_TICKADJ_SCALED;
+ if (time_adjust > max_tickadj) {
+ time_adjust -= max_tickadj;
+ tick_length += scale_tickadj(max_tickadj);
+ } else if (time_adjust < -max_tickadj) {
+ time_adjust += max_tickadj;
+ tick_length -= scale_tickadj(max_tickadj);
} else {
- tick_length += (s64)(time_adjust * NSEC_PER_USEC /
- NTP_INTERVAL_FREQ) << NTP_SCALE_SHIFT;
+ tick_length += scale_tickadj(time_adjust);
time_adjust = 0;
}
}
@@ -216,13 +219,13 @@ void second_overflow(void)
#ifdef CONFIG_GENERIC_CMOS_UPDATE
/* Disable the cmos update - used by virtualization and embedded */
-int no_sync_cmos_clock __read_mostly;
+int no_sync_cmos_clock __read_mostly;
-static void sync_cmos_clock(unsigned long dummy);
+static void sync_cmos_clock(unsigned long unused);
static DEFINE_TIMER(sync_cmos_timer, sync_cmos_clock, 0, 0);
-static void sync_cmos_clock(unsigned long dummy)
+static void sync_cmos_clock(unsigned long unused)
{
struct timespec now, next;
int fail = 1;
@@ -234,12 +237,13 @@ static void sync_cmos_clock(unsigned lon
* This code is run on a timer. If the clock is set, that timer
* may not expire at the correct time. Thus, we adjust...
*/
- if (!ntp_synced())
+ if (!ntp_synced()) {
/*
* Not synced, exit, do not restart a timer (if one is
* running, let it run out).
*/
return;
+ }
getnstimeofday(&now);
if (abs(now.tv_nsec - (NSEC_PER_SEC / 2)) <= tick_nsec / 2)
@@ -271,7 +275,8 @@ static void notify_cmos_timer(void)
static inline void notify_cmos_timer(void) { }
#endif
-/* adjtimex mainly allows reading (and writing, if superuser) of
+/*
+ * adjtimex mainly allows reading (and writing, if superuser) of
* kernel time-keeping variables. used by xntpd.
*/
int do_adjtimex(struct timex *txc)
@@ -293,10 +298,11 @@ int do_adjtimex(struct timex *txc)
}
/* if the quartz is off by more than 10% something is VERY wrong ! */
- if (txc->modes & ADJ_TICK)
+ if (txc->modes & ADJ_TICK) {
if (txc->tick < 900000/USER_HZ ||
txc->tick > 1100000/USER_HZ)
return -EINVAL;
+ }
if (time_state != TIME_OK && txc->modes & ADJ_STATUS)
hrtimer_cancel(&leap_timer);
From: Ingo Molnar <[email protected]>
Subject: [patch] ntp: clean up code flow
clean up the flow of code by splitting the way too large and way too
nested do_adjtimex() function.
no change in the logic.
Signed-off-by: Ingo Molnar <[email protected]>
Merged on top of Roman's very similar cleanups.
(In other words: Blame John for the merge screwups).
Signed-off-by: John Stultz <[email protected]>
Index: linux-2.6/kernel/time/ntp.c
===================================================================
--- linux-2.6.orig/kernel/time/ntp.c 2008-03-14 20:10:41.000000000 -0700
+++ linux-2.6/kernel/time/ntp.c 2008-03-14 20:43:48.000000000 -0700
@@ -275,6 +275,87 @@ static void notify_cmos_timer(void)
static inline void notify_cmos_timer(void) { }
#endif
+static inline void process_adjtimex_adj_offset(struct timex *txc, long sec)
+{
+ if (txc->modes & ADJ_STATUS) {
+ if ((time_status & STA_PLL) && !(txc->status & STA_PLL)) {
+ time_state = TIME_OK;
+ time_status = STA_UNSYNC;
+ }
+ /* only set allowed bits */
+ time_status &= STA_RONLY;
+ time_status |= txc->status & ~STA_RONLY;
+
+ switch (time_state) {
+ case TIME_OK:
+ start_timer:
+ if (time_status & STA_INS) {
+ time_state = TIME_INS;
+ sec += 86400 - sec % 86400;
+ hrtimer_start(&leap_timer, ktime_set(sec, 0), HRTIMER_MODE_ABS);
+ } else if (time_status & STA_DEL) {
+ time_state = TIME_DEL;
+ sec += 86400 - (sec + 1) % 86400;
+ hrtimer_start(&leap_timer, ktime_set(sec, 0), HRTIMER_MODE_ABS);
+ }
+ break;
+ case TIME_INS:
+ case TIME_DEL:
+ time_state = TIME_OK;
+ goto start_timer;
+ break;
+ case TIME_WAIT:
+ if (!(time_status & (STA_INS | STA_DEL)))
+ time_state = TIME_OK;
+ break;
+ case TIME_OOP:
+ hrtimer_restart(&leap_timer);
+ break;
+ }
+ }
+
+ if (txc->modes & ADJ_NANO)
+ time_status |= STA_NANO;
+ if (txc->modes & ADJ_MICRO)
+ time_status &= ~STA_NANO;
+
+ if (txc->modes & ADJ_FREQUENCY) {
+ time_freq = (s64)txc->freq * PPM_SCALE;
+ time_freq = min(time_freq, MAXFREQ_SCALED);
+ time_freq = max(time_freq, -MAXFREQ_SCALED);
+ }
+
+ if (txc->modes & ADJ_MAXERROR)
+ time_maxerror = txc->maxerror;
+ if (txc->modes & ADJ_ESTERROR)
+ time_esterror = txc->esterror;
+
+ if (txc->modes & ADJ_TIMECONST) {
+ time_constant = txc->constant;
+ if (!(time_status & STA_NANO))
+ time_constant += 4;
+ time_constant = min(time_constant, (long)MAXTC);
+ time_constant = max(time_constant, 0l);
+ }
+
+ if (txc->modes & ADJ_TAI && txc->constant > 0)
+ time_tai = txc->constant;
+
+ if (txc->modes & ADJ_OFFSET) {
+ if (txc->modes == ADJ_OFFSET_SINGLESHOT)
+ /* adjtime() is independent from ntp_adjtime() */
+ time_adjust = txc->offset;
+ else
+ ntp_update_offset(txc->offset);
+ }
+
+ if (txc->modes & ADJ_TICK)
+ tick_usec = txc->tick;
+
+ if (txc->modes & (ADJ_TICK|ADJ_FREQUENCY|ADJ_OFFSET))
+ ntp_update_frequency();
+}
+
/*
* adjtimex mainly allows reading (and writing, if superuser) of
* kernel time-keeping variables. used by xntpd.
@@ -282,7 +363,7 @@ static inline void notify_cmos_timer(voi
int do_adjtimex(struct timex *txc)
{
struct timespec ts;
- long save_adjust, sec;
+ long save_adjust;
int result;
/* In order to modify anything, you gotta be super-user! */
@@ -306,6 +387,7 @@ int do_adjtimex(struct timex *txc)
if (time_state != TIME_OK && txc->modes & ADJ_STATUS)
hrtimer_cancel(&leap_timer);
+
getnstimeofday(&ts);
write_seqlock_irq(&xtime_lock);
@@ -314,121 +396,42 @@ int do_adjtimex(struct timex *txc)
save_adjust = time_adjust;
/* If there are input parameters, then process them */
- if (txc->modes) {
- if (txc->modes & ADJ_STATUS) {
- if ((time_status & STA_PLL) &&
- !(txc->status & STA_PLL)) {
- time_state = TIME_OK;
- time_status = STA_UNSYNC;
- }
- /* only set allowed bits */
- time_status &= STA_RONLY;
- time_status |= txc->status & ~STA_RONLY;
-
- switch (time_state) {
- case TIME_OK:
- start_timer:
- sec = ts.tv_sec;
- if (time_status & STA_INS) {
- time_state = TIME_INS;
- sec += 86400 - sec % 86400;
- hrtimer_start(&leap_timer, ktime_set(sec, 0), HRTIMER_MODE_ABS);
- } else if (time_status & STA_DEL) {
- time_state = TIME_DEL;
- sec += 86400 - (sec + 1) % 86400;
- hrtimer_start(&leap_timer, ktime_set(sec, 0), HRTIMER_MODE_ABS);
- }
- break;
- case TIME_INS:
- case TIME_DEL:
- time_state = TIME_OK;
- goto start_timer;
- break;
- case TIME_WAIT:
- if (!(time_status & (STA_INS | STA_DEL)))
- time_state = TIME_OK;
- break;
- case TIME_OOP:
- hrtimer_restart(&leap_timer);
- break;
- }
- }
-
- if (txc->modes & ADJ_NANO)
- time_status |= STA_NANO;
- if (txc->modes & ADJ_MICRO)
- time_status &= ~STA_NANO;
-
- if (txc->modes & ADJ_FREQUENCY) {
- time_freq = (s64)txc->freq * PPM_SCALE;
- time_freq = min(time_freq, MAXFREQ_SCALED);
- time_freq = max(time_freq, -MAXFREQ_SCALED);
- }
-
- if (txc->modes & ADJ_MAXERROR)
- time_maxerror = txc->maxerror;
- if (txc->modes & ADJ_ESTERROR)
- time_esterror = txc->esterror;
-
- if (txc->modes & ADJ_TIMECONST) {
- time_constant = txc->constant;
- if (!(time_status & STA_NANO))
- time_constant += 4;
- time_constant = min(time_constant, (long)MAXTC);
- time_constant = max(time_constant, 0l);
- }
-
- if (txc->modes & ADJ_TAI && txc->constant > 0)
- time_tai = txc->constant;
-
- if (txc->modes & ADJ_OFFSET) {
- if (txc->modes == ADJ_OFFSET_SINGLESHOT)
- /* adjtime() is independent from ntp_adjtime() */
- time_adjust = txc->offset;
- else
- ntp_update_offset(txc->offset);
- }
- if (txc->modes & ADJ_TICK)
- tick_usec = txc->tick;
-
- if (txc->modes & (ADJ_TICK|ADJ_FREQUENCY|ADJ_OFFSET))
- ntp_update_frequency();
- }
+ if (txc->modes)
+ process_adjtimex_adj_offset(txc, ts.tv_sec);
result = time_state; /* mostly `TIME_OK' */
if (time_status & (STA_UNSYNC|STA_CLOCKERR))
result = TIME_ERROR;
if ((txc->modes == ADJ_OFFSET_SINGLESHOT) ||
- (txc->modes == ADJ_OFFSET_SS_READ))
+ (txc->modes == ADJ_OFFSET_SS_READ)) {
txc->offset = save_adjust;
- else {
+ } else {
txc->offset = shift_right(time_offset * NTP_INTERVAL_FREQ,
NTP_SCALE_SHIFT);
if (!(time_status & STA_NANO))
txc->offset /= NSEC_PER_USEC;
}
- txc->freq = shift_right((s32)(time_freq >> PPM_SCALE_INV_SHIFT) *
- (s64)PPM_SCALE_INV,
- NTP_SCALE_SHIFT);
- txc->maxerror = time_maxerror;
- txc->esterror = time_esterror;
- txc->status = time_status;
- txc->constant = time_constant;
- txc->precision = 1;
- txc->tolerance = MAXFREQ_SCALED / PPM_SCALE;
- txc->tick = tick_usec;
- txc->tai = time_tai;
+ txc->freq = shift_right((s32)(time_freq >> PPM_SCALE_INV_SHIFT) *
+ (s64)PPM_SCALE_INV, NTP_SCALE_SHIFT);
+ txc->maxerror = time_maxerror;
+ txc->esterror = time_esterror;
+ txc->status = time_status;
+ txc->constant = time_constant;
+ txc->precision = 1;
+ txc->tolerance = MAXFREQ_SCALED / PPM_SCALE;
+ txc->tick = tick_usec;
+ txc->tai = time_tai;
/* PPS is not implemented, so these are zero */
- txc->ppsfreq = 0;
- txc->jitter = 0;
- txc->shift = 0;
- txc->stabil = 0;
- txc->jitcnt = 0;
- txc->calcnt = 0;
- txc->errcnt = 0;
- txc->stbcnt = 0;
+ txc->ppsfreq = 0;
+ txc->jitter = 0;
+ txc->shift = 0;
+ txc->stabil = 0;
+ txc->jitcnt = 0;
+ txc->calcnt = 0;
+ txc->errcnt = 0;
+ txc->stbcnt = 0;
write_sequnlock_irq(&xtime_lock);
txc->time.tv_sec = ts.tv_sec;
Make time_status, time_maxerror and time_esterror static to ntp.c
Signed-off-by: John Stultz <[email protected]>
Index: linux-2.6/arch/cris/arch-v32/kernel/time.c
===================================================================
--- linux-2.6.orig/arch/cris/arch-v32/kernel/time.c 2008-02-13 12:07:00.000000000 -0800
+++ linux-2.6/arch/cris/arch-v32/kernel/time.c 2008-03-14 20:13:24.000000000 -0700
@@ -248,8 +248,7 @@ timer_interrupt(int irq, void *dev_id)
* The division here is not time critical since it will run once in
* 11 minutes
*/
- if ((time_status & STA_UNSYNC) == 0 &&
- xtime.tv_sec > last_rtc_update + 660 &&
+ if (ntp_synced() && xtime.tv_sec > last_rtc_update + 660 &&
(xtime.tv_nsec / 1000) >= 500000 - (tick_nsec / 1000) / 2 &&
(xtime.tv_nsec / 1000) <= 500000 + (tick_nsec / 1000) / 2) {
if (set_rtc_mmss(xtime.tv_sec) == 0)
Index: linux-2.6/arch/mn10300/kernel/rtc.c
===================================================================
--- linux-2.6.orig/arch/mn10300/kernel/rtc.c 2008-02-13 12:07:00.000000000 -0800
+++ linux-2.6/arch/mn10300/kernel/rtc.c 2008-03-14 20:13:24.000000000 -0700
@@ -117,8 +117,7 @@ void check_rtc_time(void)
* RTC clock accordingly every ~11 minutes. set_rtc_mmss() has to be
* called as close as possible to 500 ms before the new second starts.
*/
- if ((time_status & STA_UNSYNC) == 0 &&
- xtime.tv_sec > last_rtc_update + 660 &&
+ if (ntp_synced() && xtime.tv_sec > last_rtc_update + 660 &&
xtime.tv_nsec / 1000 >= 500000 - ((unsigned) TICK_SIZE) / 2 &&
xtime.tv_nsec / 1000 <= 500000 + ((unsigned) TICK_SIZE) / 2
) {
Index: linux-2.6/include/linux/timex.h
===================================================================
--- linux-2.6.orig/include/linux/timex.h 2008-03-14 19:21:02.000000000 -0700
+++ linux-2.6/include/linux/timex.h 2008-03-14 20:18:44.000000000 -0700
@@ -206,23 +206,8 @@ extern int tickadj; /* amount of adjus
/*
* phase-lock loop variables
*/
-extern int time_status; /* clock synchronization status bits */
-extern long time_maxerror; /* maximum error */
-extern long time_esterror; /* estimated error */
-
extern long time_adjust; /* The amount of adjtime left */
-extern void ntp_init(void);
-extern void ntp_clear(void);
-
-/**
- * ntp_synced - Returns 1 if the NTP status is not UNSYNC
- *
- */
-static inline int ntp_synced(void)
-{
- return !(time_status & STA_UNSYNC);
-}
/* Required to safely shift negative values */
#define shift_right(x, s) ({ \
@@ -243,6 +228,9 @@ static inline int ntp_synced(void)
/* Returns how long ticks are at present, in ns / 2^NTP_SCALE_SHIFT. */
extern u64 tick_length;
+extern void ntp_init(void);
+extern int ntp_synced(void);
+extern void ntp_clear(void);
extern void second_overflow(void);
extern void update_ntp_one_tick(void);
extern int do_adjtimex(struct timex *);
Index: linux-2.6/kernel/time/ntp.c
===================================================================
--- linux-2.6.orig/kernel/time/ntp.c 2008-03-14 20:13:06.000000000 -0700
+++ linux-2.6/kernel/time/ntp.c 2008-03-14 20:15:27.000000000 -0700
@@ -32,13 +32,13 @@ static struct hrtimer leap_timer;
*/
/* TIME_ERROR prevents overwriting the CMOS clock */
static int time_state = TIME_OK; /* clock synch status */
-int time_status = STA_UNSYNC; /* clock status bits */
+static int time_status = STA_UNSYNC; /* clock status bits */
static long time_tai; /* TAI offset (s) */
static s64 time_offset; /* time adjustment (ns) */
static long time_constant = 2; /* pll time constant */
-long time_maxerror = NTP_PHASE_LIMIT;/* maximum error (us) */
-long time_esterror = NTP_PHASE_LIMIT;/* estimated error (us) */
-static s64 time_freq; /* frequency offset (scaled ns/s)*/
+static long time_maxerror = NTP_PHASE_LIMIT;/* maximum error (us) */
+static long time_esterror = NTP_PHASE_LIMIT;/* estimated error (us) */
+static s64 time_freq; /* frequency offset (scaled ns/s)*/
static long time_reftime; /* time at last adjustment (s) */
long time_adjust;
static long ntp_tick_adj;
@@ -174,6 +174,15 @@ static enum hrtimer_restart ntp_leap_sec
return res;
}
+/**
+ * ntp_synced - Returns 1 if the NTP status is not UNSYNC
+ *
+ */
+int ntp_synced(void)
+{
+ return !(time_status & STA_UNSYNC);
+}
+
/*
* this routine handles the overflow of the microsecond field
*
Hi,
On Fri, 14 Mar 2008, john stultz wrote:
> @@ -179,7 +181,7 @@ static inline cycle_t clocksource_read(struct clocksource *cs)
> static inline s64 cyc2ns(struct clocksource *cs, cycle_t cycles)
> {
> u64 ret = (u64)cycles;
> - ret = (ret * cs->mult) >> cs->shift;
> + ret = (ret * (cs->mult + cs->mult_adj)) >> cs->shift;
> return ret;
> }
This add an extra memory access and extra instruction to a code path, we
should keep as short as possible.
I'd rather add a mult_raw or mult_org and use that for your next patch.
bye, Roman
Hi,
On Fri, 14 Mar 2008, john stultz wrote:
> My solution is to introduce CLOCK_MONOTONIC_RAW. This exposes a
> nanosecond based time value, that increments starting at bootup and has
> no frequency adjustments made to it what so ever.
A general comment: the raw clock doesn't need any adjustments, so updates
don't have to be done that frequently and you can move most of that logic
into second_overflow().
> @@ -439,6 +475,7 @@ static void clocksource_adjust(s64 offset)
> void update_wall_time(void)
> {
> cycle_t offset;
> + static u64 raw_snsec; /* shifted raw nanosecnds */
>
> /* Make sure we're fully resumed: */
> if (unlikely(timekeeping_suspended))
IMO that's really a clock property, so this belongs in the clock
structure.
(Some day we may want to have multiple active clocks for various purposes
and thus export multiple raw clocks.)
> @@ -466,6 +504,11 @@ void update_wall_time(void)
> second_overflow();
> }
>
> + if (raw_snsec >= (u64)NSEC_PER_SEC << clock->shift) {
> + raw_snsec -= (u64)NSEC_PER_SEC << clock->shift;
> + monotonic_raw.tv_sec++;
> + }
> +
You don't really have to use clock->shift as a scale, thus simplifying the
shifting here (e.g. by using NTP_SCALE_SHIFT).
I'm thinking about doing this for e.g. xtime_nsec as well.
bye, Roman
Hi,
On Fri, 14 Mar 2008, john stultz wrote:
> +static inline void process_adjtimex_adj_offset(struct timex *txc, long sec)
Somewhat ugly name...
>From an userspace perspective it's the core of ntp_adjtime(), so I'd
prefer something close to that.
bye, Roman
On Sat, 2008-03-15 at 05:52 +0100, Roman Zippel wrote:
> Hi,
>
> On Fri, 14 Mar 2008, john stultz wrote:
>
> > +static inline void process_adjtimex_adj_offset(struct timex *txc, long sec)
>
> Somewhat ugly name...
Oh, that's a mis-merge. Thanks for catching it. The name should have
been: process_adjtimex_modes().
process_adjtimex_adj_offset was basically the same as your
ntp_update_offset() function, so I killed it but grabbed the wrong
function name.
> From an userspace perspective it's the core of ntp_adjtime(), so I'd
> prefer something close to that.
does process_adjtimex_modes(), or ntp_process_adjtimex_modes work for
you?
thanks
-john
On Sat, 2008-03-15 at 05:28 +0100, Roman Zippel wrote:
> Hi,
>
> On Fri, 14 Mar 2008, john stultz wrote:
>
> > @@ -179,7 +181,7 @@ static inline cycle_t clocksource_read(struct clocksource *cs)
> > static inline s64 cyc2ns(struct clocksource *cs, cycle_t cycles)
> > {
> > u64 ret = (u64)cycles;
> > - ret = (ret * cs->mult) >> cs->shift;
> > + ret = (ret * (cs->mult + cs->mult_adj)) >> cs->shift;
> > return ret;
> > }
>
> This add an extra memory access and extra instruction to a code path, we
> should keep as short as possible.
> I'd rather add a mult_raw or mult_org and use that for your next patch.
Yea, I played with that at first, but the functoinal duplication (two
values, both storing close to the same info) was ugly.
I guess I could go back to mult_orig.
-john
Hi,
On Fri, 14 Mar 2008, john stultz wrote:
> Make time_status, time_maxerror and time_esterror static to ntp.c
time_maxerror and time_esterror are ok, but ntp_synced() is such a small
function, I'd rather keep it inlined.
bye, Roman
Hi,
On Fri, 14 Mar 2008, john stultz wrote:
> -#define MAX_TICKADJ 500 /* microsecs */
> +static const long max_tickadj = 500; /* (usec) */
That's a matter of taste, but MAX_TICKADJ just works as well, so why
change it?
> /* TIME_ERROR prevents overwriting the CMOS clock */
> -static int time_state = TIME_OK; /* clock synchronization status */
> -int time_status = STA_UNSYNC; /* clock status bits */
> -static long time_tai; /* TAI offset (s) */
> -static s64 time_offset; /* time adjustment (ns) */
> -static long time_constant = 2; /* pll time constant */
> -long time_maxerror = NTP_PHASE_LIMIT; /* maximum error (us) */
> -long time_esterror = NTP_PHASE_LIMIT; /* estimated error (us) */
> -static s64 time_freq; /* frequency offset (scaled ns/s)*/
> -static long time_reftime; /* time at last adjustment (s) */
> -long time_adjust;
> -static long ntp_tick_adj;
> +static int time_state = TIME_OK; /* clock synch status */
> +int time_status = STA_UNSYNC; /* clock status bits */
> +static long time_tai; /* TAI offset (s) */
> +static s64 time_offset; /* time adjustment (ns) */
> +static long time_constant = 2; /* pll time constant */
> +long time_maxerror = NTP_PHASE_LIMIT;/* maximum error (us) */
> +long time_esterror = NTP_PHASE_LIMIT;/* estimated error (us) */
> +static s64 time_freq; /* frequency offset (scaled ns/s)*/
> +static long time_reftime; /* time at last adjustment (s) */
> +long time_adjust;
> +static long ntp_tick_adj;
I must say I really hate this kind of formating, it either gets out of
sync or requires reformatting. I would very strongly prefer not to
introduce this sort of formatting here.
> +static inline u64 scale_tickadj(u64 adj)
> +{
> + return ((adj * NSEC_PER_USEC) << NTP_SCALE_SHIFT) / NTP_INTERVAL_FREQ;
> +}
The function argument is actually s32 and depending on NTP_INTERVAL_FREQ
it might produce a link error on 32bit archs.
bye, Roman
john stultz wrote:
> Make this file more readable by applying a consistent coding style.
>
> No code changed.
>
> Signed-off-by: Ingo Molnar <[email protected]>
>
> Merged on top of Roman's very similar cleanups.
> (In other words: Blame John for the merge screwups).
>
> Signed-off-by: John Stultz <[email protected]>
>
>
> Index: linux-2.6/kernel/time/ntp.c
> ===================================================================
<snip>
> @@ -193,21 +197,20 @@ void second_overflow(void)
> * Compute the phase adjustment for the next second. The offset is
> * reduced by a fixed factor times the time constant.
> */
> - tick_length = tick_length_base;
> - time_adj = shift_right(time_offset, SHIFT_PLL + time_constant);
> - time_offset -= time_adj;
> - tick_length += time_adj;
> + tick_length = tick_length_base;
> + time_adj = shift_right(time_offset, SHIFT_PLL + time_constant);
> + time_offset -= time_adj;
> + tick_length += time_adj;
>
> if (unlikely(time_adjust)) {
> - if (time_adjust > MAX_TICKADJ) {
> - time_adjust -= MAX_TICKADJ;
> - tick_length += MAX_TICKADJ_SCALED;
> - } else if (time_adjust < -MAX_TICKADJ) {
> - time_adjust += MAX_TICKADJ;
> - tick_length -= MAX_TICKADJ_SCALED;
> + if (time_adjust > max_tickadj) {
> + time_adjust -= max_tickadj;
> + tick_length += scale_tickadj(max_tickadj);
> + } else if (time_adjust < -max_tickadj) {
> + time_adjust += max_tickadj;
> + tick_length -= scale_tickadj(max_tickadj);
> } else {
> - tick_length += (s64)(time_adjust * NSEC_PER_USEC /
> - NTP_INTERVAL_FREQ) << NTP_SCALE_SHIFT;
> + tick_length += scale_tickadj(time_adjust);
> time_adjust = 0;
> }
> }
<snip>
Since you are at it, isn't "time_adj" a poor name since you also have a
"time_adjust"?
--
Regards,
J?rg-Volker.
Hi John,
On Saturday 15 March 2008, john stultz wrote:
> Index: linux-2.6/kernel/time/ntp.c
> ===================================================================
> --- linux-2.6.orig/kernel/time/ntp.c 2008-03-14 20:10:41.000000000 -0700
> +++ linux-2.6/kernel/time/ntp.c 2008-03-14 20:43:48.000000000 -0700
> @@ -275,6 +275,87 @@ static void notify_cmos_timer(void)
> static inline void notify_cmos_timer(void) { }
> #endif
>
> +static inline void process_adjtimex_adj_offset(struct timex *txc, long sec)
> +{
> + if (txc->modes & ADJ_STATUS) {
> + if ((time_status & STA_PLL) && !(txc->status & STA_PLL)) {
> + time_state = TIME_OK;
> + time_status = STA_UNSYNC;
> + }
> + /* only set allowed bits */
> + time_status &= STA_RONLY;
> + time_status |= txc->status & ~STA_RONLY;
> +
> + switch (time_state) {
> + case TIME_OK:
> + start_timer:
> + if (time_status & STA_INS) {
> + time_state = TIME_INS;
> + sec += 86400 - sec % 86400;
> + hrtimer_start(&leap_timer, ktime_set(sec, 0), HRTIMER_MODE_ABS);
> + } else if (time_status & STA_DEL) {
> + time_state = TIME_DEL;
> + sec += 86400 - (sec + 1) % 86400;
> + hrtimer_start(&leap_timer, ktime_set(sec, 0), HRTIMER_MODE_ABS);
> + }
> + break;
> + case TIME_INS:
> + case TIME_DEL:
> + time_state = TIME_OK;
> + goto start_timer;
> + break;
I don't understand, why the goto is required here.
If you move these two cases in front of case "TIME_OK" and
omit the break, you can remove the goto and the label "start_timer".
Don't forget the /* fall through */ comment then.
If you want to keep this patch a simple code movement,
maybe add a later patch to improve this on top of it.
Best Regards
Ingo Oeser
Hi,
On Sat, 15 Mar 2008, J?rg-Volker Peetz wrote:
> Since you are at it, isn't "time_adj" a poor name since you also have a
> "time_adjust"?
These are two different (and independent) mechanism to adjust time. The
names are a little confusing, but I'd be a little reluctant to rename
time_adj, as the same name is used in the reference implementation.
bye, Roman
Hi,
On Sat, 15 Mar 2008, Ingo Oeser wrote:
> I don't understand, why the goto is required here.
>
> If you move these two cases in front of case "TIME_OK" and
> omit the break, you can remove the goto and the label "start_timer".
> Don't forget the /* fall through */ comment then.
Nice catch, I'll do this in the original patch.
bye, Roman
On Sat, 2008-03-15 at 05:50 +0100, Roman Zippel wrote:
> Hi,
>
> On Fri, 14 Mar 2008, john stultz wrote:
>
> > My solution is to introduce CLOCK_MONOTONIC_RAW. This exposes a
> > nanosecond based time value, that increments starting at bootup and has
> > no frequency adjustments made to it what so ever.
>
> A general comment: the raw clock doesn't need any adjustments, so updates
> don't have to be done that frequently and you can move most of that logic
> into second_overflow().
You're suggesting adding MONTONIC_RAW code to ntp.c ? That doesn't make
much sense to me (the whole point is its not ntp adjusted). Even if you
mean just inside the "if (clock->xtime_nsec >= (u64)NSEC_PER_SEC ..."
conditional, then that means we don't get to leverage the
cycles_interval, and cycle_last, values, so all of that would have to be
duplicated and managed. Which I'm not sure it would save us much more
then the extra add and compare here.
> > @@ -439,6 +475,7 @@ static void clocksource_adjust(s64 offset)
> > void update_wall_time(void)
> > {
> > cycle_t offset;
> > + static u64 raw_snsec; /* shifted raw nanosecnds */
> >
> > /* Make sure we're fully resumed: */
> > if (unlikely(timekeeping_suspended))
>
> IMO that's really a clock property, so this belongs in the clock
> structure.
> (Some day we may want to have multiple active clocks for various purposes
> and thus export multiple raw clocks.)
I disagree. I think that crufts up the clocksource structure (which is
ideally just a simple hw counter abstraction), with timekeeping state.
I'm still not sold on the multiple clocks with multiple notions of time
idea you keep on bringing up. But if/when we cross that bridge, maybe it
would be better to add a timekeeping_clock mid-layer abstraction that
keeps the clocksource specific timekeeping state. That way we don't add
lots of complexity for the clocksource driver writers to deal with and
we allow the clocksources to be better re-purposed (for maybe more sane
things like performance counters) without getting too bloated.
> > @@ -466,6 +504,11 @@ void update_wall_time(void)
> > second_overflow();
> > }
> >
> > + if (raw_snsec >= (u64)NSEC_PER_SEC << clock->shift) {
> > + raw_snsec -= (u64)NSEC_PER_SEC << clock->shift;
> > + monotonic_raw.tv_sec++;
> > + }
> > +
>
> You don't really have to use clock->shift as a scale, thus simplifying the
> shifting here (e.g. by using NTP_SCALE_SHIFT).
> I'm thinking about doing this for e.g. xtime_nsec as well.
Could you explain this more?
Given the clocksources have different shift values, why should we not
keep the extra resolution?
How does adding the extra shifting to convert from the clocksource scale
to the NTP_SCALE_SHIFT value simplify the shifting?
thanks
-john
On Mon, 2008-03-17 at 19:03 -0700, john stultz wrote:
> On Sat, 2008-03-15 at 05:50 +0100, Roman Zippel wrote:
> > > @@ -439,6 +475,7 @@ static void clocksource_adjust(s64 offset)
> > > void update_wall_time(void)
> > > {
> > > cycle_t offset;
> > > + static u64 raw_snsec; /* shifted raw nanosecnds */
> > >
> > > /* Make sure we're fully resumed: */
> > > if (unlikely(timekeeping_suspended))
> >
> > IMO that's really a clock property, so this belongs in the clock
> > structure.
> > (Some day we may want to have multiple active clocks for various purposes
> > and thus export multiple raw clocks.)
>
> I disagree. I think that crufts up the clocksource structure (which is
> ideally just a simple hw counter abstraction), with timekeeping state.
Bah. Ok, I've talked myself out of this one.
I still think it crufts up the clocksource structure, but its more
consistent that we follow the established cruft (such as the
pre-calculated cycle_interval/xtime_interval/raw_interval combo) rather
then me trying to arbitrarily draw the line in the sand at this
variable.
> I'm still not sold on the multiple clocks with multiple notions of time
> idea you keep on bringing up. But if/when we cross that bridge, maybe it
> would be better to add a timekeeping_clock mid-layer abstraction that
> keeps the clocksource specific timekeeping state. That way we don't add
> lots of complexity for the clocksource driver writers to deal with and
> we allow the clocksources to be better re-purposed (for maybe more sane
> things like performance counters) without getting too bloated.
I still think pulling out all of the non-counter-abstraction bits out of
the clocksource and into a mid-level timekeeping_clock structure would
still be ideal here, but I'll save our time/energy on that one for
another day. :)
thanks
-john
Hi,
On Tuesday 18. March 2008, john stultz wrote:
> > A general comment: the raw clock doesn't need any adjustments, so updates
> > don't have to be done that frequently and you can move most of that logic
> > into second_overflow().
>
> You're suggesting adding MONTONIC_RAW code to ntp.c ? That doesn't make
> much sense to me (the whole point is its not ntp adjusted).
I didn't looked at the code, what I meant was moving it close to it, so it's
done at the same time.
> Even if you
> mean just inside the "if (clock->xtime_nsec >= (u64)NSEC_PER_SEC ..."
> conditional, then that means we don't get to leverage the
> cycles_interval, and cycle_last, values, so all of that would have to be
> duplicated and managed. Which I'm not sure it would save us much more
> then the extra add and compare here.
You could use a shift (e.g. SHIFT_HZ for non NO_HZ), to scale these value up a
little.
> > IMO that's really a clock property, so this belongs in the clock
> > structure.
> > (Some day we may want to have multiple active clocks for various purposes
> > and thus export multiple raw clocks.)
>
> I disagree. I think that crufts up the clocksource structure (which is
> ideally just a simple hw counter abstraction), with timekeeping state.
It's not cruft, littering the code with unnecessary static variables is IMO
worse. In OO terms this would be an abstract base class, the actual
implementation only needs to provide a few functions and otherwise doesn't
really have to worry about all the details.
Splitting the structure is certainly an option, but hiding related variables
as static is IMO not a good choice.
> I'm still not sold on the multiple clocks with multiple notions of time
> idea you keep on bringing up.
Well, here are some ideas where I think it might be useful, it would make it
possible to chain clocks with different frequencies:
- SMP systems with slighty different clock frequencies.
- instead of deactivating an unstable tsc clock, delay activation until we
know it's stable.
- deactivate a tsc (which stops during sleep) before sleep and resync when
needed.
- use different clocks for timer purposes (e.g. jiffies and a slow clock).
(The last point would make IMO more acceptable to use hrtimer for more trivial
purposes, if it were at least possible to select the used clock.)
> > > + if (raw_snsec >= (u64)NSEC_PER_SEC << clock->shift) {
> > > + raw_snsec -= (u64)NSEC_PER_SEC << clock->shift;
> > > + monotonic_raw.tv_sec++;
> > > + }
> > > +
> >
> > You don't really have to use clock->shift as a scale, thus simplifying
> > the shifting here (e.g. by using NTP_SCALE_SHIFT).
> > I'm thinking about doing this for e.g. xtime_nsec as well.
>
> Could you explain this more?
>
> Given the clocksources have different shift values, why should we not
> keep the extra resolution?
The extra resolution is still there, as it doesn't make any sense to use a
shift larger than 32.
> How does adding the extra shifting to convert from the clocksource scale
> to the NTP_SCALE_SHIFT value simplify the shifting?
NTP_SCALE_SHIFT is a constant and a very simple shift, which produces better
code.
There are two ways to use the nsec value here:
1. nsec_now = (((cycle_new - cycle_last) * mult) + nsec) >> shift
2. nsec_now = ((cycle_new - cycle_last) * mult) >> shift +
nsec >> NTP_SCALE_SHIFT
Both do the same under the condition that the gettime operation is not
significantly shorter than 1nsec, but the second version can generate
slightly better code. I intended to use the first for xtime_nsec, but instead
the value is still splitted at every timer interrupt.
For the raw time please at least get rid of the splitting of raw_nsec in
update_wall_time(). From the monotonic_raw value you only need the tv_sec
part and the tv_nsec part can be calculated as above.
bye, Roman