LinuxLists.cc - [RFC][PATCH] Experimental enhanced NTP error accounting patch

2006-03-04 04:57:32

Subject: [RFC][PATCH] Experimental enhanced NTP error accounting patch

Hey Roman,
I'm sure you've got a number of things going on, but since I didn't
hear anything back from you last time I posted this, I figured I'd try
again.

Here is my first pass implementation of your suggested enhanced NTP
error accounting for the generic timekeeping code.

If you could get a chance to review it and let me know if it addresses
the issues you are concerned about, I would really appreciate it.

Currently it is a bit ugly and I've noted in the comments what I'm
concerned with, but hopefully we can find a way to clean it up a bit.

The patch applies against 2.6.16-rc5-mm2.

Again, I really appreciate the time you've taken in reviewing my patches
and explaining your ideas.

thanks
-john

Signed-off-by: John Stultz <[email protected]>

Documentation/timekeeping.txt | 22 +---
include/asm-generic/timeofday.h | 5 -
include/linux/clocksource.h | 195 +++++++++++++++++++---------------------
kernel/time/timeofday.c | 140 ++++++++++++++--------------
4 files changed, 173 insertions(+), 189 deletions(-)

linux-2.6.16-rc5_timeofday-ntp-error-fix_B20.patch
============================================
diff --git a/Documentation/timekeeping.txt b/Documentation/timekeeping.txt
index 23a5d9d..4ac91b2 100644
--- a/Documentation/timekeeping.txt
+++ b/Documentation/timekeeping.txt
@@ -37,12 +37,6 @@ this value in cycle_last.
cycle_t cycle_last;

-Further since all clocks drift somewhat from each other, we use the adjustment
-values provided via adjtimex() to correct our clocksource frequency for each
-interval. This frequency adjustment value is stored in ntp_adj.
-
-long ntp_adj;
-
Now that we've covered the core global variables for timekeeping, lets look at
how we maintain these values.

@@ -54,7 +48,7 @@ presented as:
timeofday_periodic_hook():
cycle_now = read_clocksource(clock)
cycle_delta = (cycle_now - cycle_last) & clock->mask
- nsec = cyc2ns(clock, cycle_delta, ntp_adj)
+ nsec = cyc2ns(clock, cycle_delta)
system_time += nsec
cycle_last = cycle_now

@@ -62,10 +56,9 @@ timeofday_periodic_hook():

You can see we read the cycle value from the clocksource, calculate a cycle
delta for the interval since we last called timeofday_periodic_hook(), convert
-that cycle delta to a nanosecond interval (for now ignore ntp_adj), add it to
-the system time and finally set our cycle_last value to cycle_now for the next
-interval. Using this simple algorithm we can correctly measure and record the
-passing of time.
+that cycle delta to a nanosecond interval, add it to the system time and finally
+set our cycle_last value to cycle_now for the next interval. Using this simple
+algorithm we can correctly measure and record the passing of time.

But just storing this info isn't very useful, we also want to make it available
to be used elsewhere. So how do we provide a notion of how much time has passed
@@ -77,7 +70,7 @@ timeofday_peridoic_hook().
get_nsec_offset():
cycle_now = read_clocksource(clock)
cycle_delta = (cycle_now - cycle_last) & clock->mask
- nsec = cyc2ns(clock, cycle_delta, ntp_adj)
+ nsec = cyc2ns(clock, cycle_delta)
return nsec

Here you can see, we read the clocksource, calculate a cycle interval, and
@@ -119,14 +112,14 @@ at a safe point, we use the update_callb
see "How to write a clocksource driver" below), this too must be done
in-between intervals in the periodic_hook call. Finally, since the ntp
adjustment made in the cyc2ns conversion is not static, we need to update the
-ntp state machine and get a calculate a new adjustment value.
+ntp state machine and adjust the clocksource mult value.

This adds some extra pseudo code to the timeofday_periodic_hook function:

timeofday_periodic_hook():
cycle_now = read_clocksource(clock)
cycle_delta = (cycle_now - cycle_last) & clock->mask
- nsec = cyc2ns(clock, cycle_delta, ntp_adj)
+ nsec = cyc2ns(clock, cycle_delta)
system_time += nsec
cycle_last = cycle_now

@@ -140,7 +133,6 @@ timeofday_periodic_hook():

ntp_advance(nsec)
ppm = ntp_get_ppm_adjustment()
- ntp_adj = ppm_to_mult_adj(clock, ppm)

Unfortunately, the actual timeofday_periodic_hook code is not as simple as this
diff --git a/include/asm-generic/timeofday.h b/include/asm-generic/timeofday.h
index 79b952f..ddcae35 100644
--- a/include/asm-generic/timeofday.h
+++ b/include/asm-generic/timeofday.h
@@ -20,10 +20,9 @@ extern void sync_persistent_clock(struct

#ifdef CONFIG_GENERIC_TIME_VSYSCALL
extern void arch_update_vsyscall_gtod(struct timespec wall_time,
- cycle_t offset_base, struct clocksource* clock,
- int ntp_adj);
+ cycle_t offset_base, struct clocksource* clock);
#else
-# define arch_update_vsyscall_gtod(x,y,z,w) do { } while(0)
+# define arch_update_vsyscall_gtod(x,y,z) do { } while(0)
#endif /* CONFIG_GENERIC_TIME_VSYSCALL */

#endif /* CONFIG_GENERIC_TIME */
diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
index bfd61a2..0c5acf8 100644
--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
@@ -48,6 +48,8 @@ typedef u64 cycle_t;
* @is_continuous: defines if clocksource is free-running.
* @vread: vsyscall read function
* @vdata: vsyscall data value passed to read function
+ * @interval_cycles: Used internally by timekeeping core, please ignore.
+ * @interval_snsecs: Used internally by timekeeping core, please ignore.
*/
struct clocksource {
char *name;
@@ -61,6 +63,10 @@ struct clocksource {
int is_continuous;
cycle_t (*vread)(void *);
void *vdata;
+
+ /* timekeeping specific data, ignore */
+ cycle_t interval_cycles;
+ u64 interval_snsecs;
};

@@ -127,58 +133,18 @@ static inline cycle_t read_clocksource(s
}

/**
- * ppm_to_mult_adj - Converts shifted ppm values to mult adjustment
- * @cs: Pointer to clocksource
- * @ppm: Shifted PPM value
- *
- * Helper which converts a shifted ppm value to clocksource mult_adj value.
- *
- * XXX - this could use some optimization
- */
-static inline int ppm_to_mult_adj(struct clocksource *cs, int ppm)
-{
- u64 mult_adj;
- int ret_adj;
-
- /* The basic math is as follows:
- * cyc * mult/2^shift * (1 + ppm/MILL) = scaled ns
- * We want to precalculate the ppm factor so it can be added
- * to the multiplyer saving the extra multiplication step.
- * cyc * (mult/2^shift + (mult/2^shift) * (ppm/MILL)) =
- * cyc * (mult/2^shift + (mult*ppm/MILL)/2^shift) =
- * cyc * (mult + (mult*ppm/MILL))/2^shift =
- * Thus we want to calculate the value of:
- * mult*ppm/MILL
- */
- mult_adj = abs(ppm);
- mult_adj = (mult_adj * cs->mult)>>SHIFT_USEC;
- mult_adj += 1000000/2; /* round for div*/
- do_div(mult_adj, 1000000);
- if (ppm < 0)
- ret_adj = -(int)mult_adj;
- else
- ret_adj = (int)mult_adj;
-
- return ret_adj;
-}
-
-/**
* cyc2ns - converts clocksource cycles to nanoseconds
* @cs: Pointer to clocksource
- * @ntp_adj: Multiplier adjustment value
* @cycles: Cycles
*
* Uses the clocksource and ntp ajdustment to convert cycle_ts to nanoseconds.
*
* XXX - This could use some mult_lxl_ll() asm optimization
*/
-static inline s64 cyc2ns(struct clocksource *cs, int ntp_adj, cycle_t cycles)
+static inline s64 cyc2ns(struct clocksource *cs, cycle_t cycles)
{
- u64 ret = cycles;
-
- ret *= (cs->mult + ntp_adj);
- ret >>= cs->shift;
-
+ u64 ret = (u64)cycles;
+ ret = (ret * cs->mult) >> cs->shift;
return ret;
}

@@ -195,12 +161,10 @@ static inline s64 cyc2ns(struct clocksou
*
* XXX - This could use some mult_lxl_ll() asm optimization.
*/
-static inline s64 cyc2ns_rem(struct clocksource *cs, int ntp_adj,
- cycle_t cycles, u64* rem)
+static inline s64 cyc2ns_rem(struct clocksource *cs, cycle_t cycles,
+ u64* rem)
{
- u64 ret = cycles;
-
- ret *= (cs->mult + ntp_adj);
+ u64 ret = (u64)cycles * cs->mult;
if (rem) {
ret += *rem;
*rem = ret & ((1<<cs->shift)-1);
@@ -212,27 +176,6 @@ static inline s64 cyc2ns_rem(struct cloc

/**
- * struct clocksource_interval - Fixed interval conversion structure
- *
- * @cycles: A specified number of cycles
- * @nsecs: The number of nanoseconds equivalent to the cycles value
- * @remainder: Non-integer nanosecond remainder stored in (ns<<cs->shift) units
- * @remainder_ns_overflow: Value at which the remainder is equal to
- * one second
- *
- * This is a optimization structure used by cyc2ns_fixed_rem() to avoid the
- * multiply in cyc2ns().
- *
- * Unless you're the timeofday_periodic_hook, you should not be using this!
- */
-struct clocksource_interval {
- cycle_t cycles;
- s64 nsecs;
- u64 remainder;
- u64 remainder_ns_overflow;
-};
-
-/**
* calculate_clocksource_interval - Calculates a clocksource interval struct
*
* @c: Pointer to clocksource.
@@ -244,61 +187,113 @@ struct clocksource_interval {
*
* Unless you're the timeofday_periodic_hook, you should not be using this!
*/
-static inline struct clocksource_interval
-calculate_clocksource_interval(struct clocksource *c, long adj,
- unsigned long length_nsec)
+static inline void calculate_clocksource_interval(struct clocksource *c,
+ unsigned long length_nsec)
{
- struct clocksource_interval ret;
u64 tmp;

/* XXX - All of this could use a whole lot of optimization */
tmp = length_nsec;
tmp <<= c->shift;
- do_div(tmp, c->mult+adj);
+ tmp += c->mult/2;
+ do_div(tmp, c->mult);
+
+ c->interval_cycles = (cycle_t)tmp;
+ if(c->interval_cycles == 0)
+ c->interval_cycles = 1;

- ret.cycles = (cycle_t)tmp;
- if(ret.cycles == 0)
- ret.cycles = 1;
-
- ret.remainder = 0;
- ret.remainder_ns_overflow = 1 << c->shift;
- ret.nsecs = cyc2ns_rem(c, adj, ret.cycles, &ret.remainder);
+ c->interval_snsecs = (u64)c->interval_cycles * c->mult;

+ printk("requested: %lu calculated %llu ns %llu cyc error: %lld\n", length_nsec, c->interval_snsecs>>c->shift, c->interval_cycles, (s64)length_nsec - (c->interval_snsecs>>c->shift) );
+}
+
+
+static inline s64 snsec2nsec_rem(u64 snsec, u32 shift, u64* rem)
+{
+ s64 ret = snsec >> shift;
+ if(rem)
+ *rem = snsec & ((1<<shift)-1);
return ret;
}

/**
- * cyc2ns_fixed_rem -
- * converts clocksource cycles to nanoseconds using fixed intervals
+ * cyc2sns_fixed_error-
+ * converts clocksource cycles to shifted nanoseconds using fixed intervals
*
- * @interval: precalculated clocksource_interval structure
- * @cycles: Number of clocksource cycles
- * @rem: Remainder
+ * @clock: Current clocksource
+ * @cycles: Number of clocksource cycles to accumulate
+ * @ntp_interval: length of the NTP adjusted interval (in shifted nsecs)
+ * @error: pointer to error accumulation variable
*
* Uses a precalculated fixed cycle/nsec interval to convert cycles to
- * nanoseconds. Returns the unaccumulated cycles in the cycles pointer as
- * well as uses and updates the value at the remainder pointer
+ * shifted nanoseconds. Return the unaccumulated cycles in the cycles
+ * pointer, and the error accumulated in the error pointer.
*
* Unless you're the timeofday_periodic_hook, you should not be using this!
*/
-static inline s64 cyc2ns_fixed_rem(struct clocksource_interval interval,
- cycle_t *cycles, u64* rem)
+/* XXX - 4 values in and 3 values out? Terrible! */
+static inline u64 cyc2sns_fixed_error(struct clocksource *clock,
+ cycle_t *cycles, u64 ntp_interval, s64* error)
+{
+ u64 delta_snsec = 0;
+ s64 interval_error = (s64)ntp_interval - clock->interval_snsecs;
+
+ /* accumulate cycles*/
+ while (*cycles > clock->interval_cycles) {
+ *cycles -= clock->interval_cycles;
+ delta_snsec += clock->interval_snsecs;
+ *error += interval_error;
+ }
+
+ return delta_snsec;
+}
+
+/* XXX - This needs a comment */
+static inline int error_aproximation(u64 error, u64 unit)
{
- s64 delta_nsec = 0;
+ static int saved_adj = 0;
+ u64 adjusted_unit = unit << saved_adj;

- while (*cycles > interval.cycles) {
- delta_nsec += interval.nsecs;
- *cycles -= interval.cycles;
- *rem += interval.remainder;
- while(*rem > interval.remainder_ns_overflow) {
- *rem -= interval.remainder_ns_overflow;
- delta_nsec += 1;
- }
+ if (error > (adjusted_unit * 2)) {
+ /* large error, so increment the adjustment factor */
+ saved_adj++;
+ } else if (error > adjusted_unit) {
+ /* just right, don't touch it */
+ } else if (saved_adj) {
+ /* small error, so drop the adjustment factor */
+ saved_adj--;
+ return 0;
}

- return delta_nsec;
+ return saved_adj;
+}
+
+static inline s64 make_ntp_adj(struct clocksource *clock,
+ cycles_t cycles_delta, s64* error)
+{
+ s64 ret = 0;
+ /* check NTP error */
+ if (*error > (s64)clock->interval_cycles / 2 ) {
+ int adj_factor = error_aproximation(*error,
+ clock->interval_cycles);
+ clock->mult += 1 << adj_factor;
+ clock->interval_snsecs += clock->interval_cycles << adj_factor;
+ ret = -(cycles_delta << adj_factor);
+ *error -= (clock->interval_cycles << adj_factor);
+ /* XXX adj error for cycle_delta offset? */
+ } else if ((-(*error)) > (s64)clock->interval_cycles/2) {
+ int adj_factor = error_aproximation(-(*error),
+ clock->interval_cycles);
+ clock->mult -= 1 << adj_factor;
+ clock->interval_snsecs -= clock->interval_cycles << adj_factor;
+ ret = cycles_delta << adj_factor;
+ *error += (clock->interval_cycles << adj_factor);
+ /* XXX adj error for cycle_delta offset? */
+ }
+ return ret;
}

+
/* used to install a new clocksource */
int register_clocksource(struct clocksource*);
void reselect_clocksource(void);
diff --git a/kernel/time/timeofday.c b/kernel/time/timeofday.c
index 8085b86..9b960d8 100644
--- a/kernel/time/timeofday.c
+++ b/kernel/time/timeofday.c
@@ -39,6 +39,13 @@
/* Periodic hook interval */
#define PERIODIC_INTERVAL_MS 50

+/*
+ * INTERVAL_LEN:
+ * This constant is the requested fixed interval period
+ * in nanoseconds.
+ */
+#define INTERVAL_LEN ((PERIODIC_INTERVAL_MS)*1000000)
+
/* [ktime_t based variables]
* system_time:
* Monotonically increasing counter of the number of nanoseconds
@@ -72,30 +79,12 @@ static struct timespec monotonic_time_of
*/
static cycle_t cycle_last;

-/* [clocksource_interval variables]
- * ts_interval:
- * This clocksource_interval is used in the fixed interval
- * cycles to nanosecond calculation.
- * INTERVAL_LEN:
- * This constant is the requested fixed interval period
- * in nanoseconds.
- */
-static struct clocksource_interval ts_interval;
-#define INTERVAL_LEN ((PERIODIC_INTERVAL_MS-1)*1000000)
-
/* [clocksource data]
* clock:
* current clocksource pointer
*/
static struct clocksource *clock;

-/* [NTP adjustment]
- * ntp_adj:
- * value of the current ntp adjustment, stored in
- * clocksource multiplier units.
- */
-static int ntp_adj;
-
/* [locks]
* system_time_lock:
* generic lock for all locally scoped time values
@@ -145,7 +134,7 @@ static void update_legacy_time_values(vo
write_sequnlock_irqrestore(&xtime_lock, flags);

/* since time state has changed, notify vsyscall code */
- arch_update_vsyscall_gtod(wall_time_ts, cycle_last, clock, ntp_adj);
+ arch_update_vsyscall_gtod(wall_time_ts, cycle_last, clock);
}

/**
@@ -167,7 +156,7 @@ static inline s64 __get_nsec_offset(void
cycle_delta = (cycle_now - cycle_last) & clock->mask;

/* convert to nanoseconds: */
- ns_offset = cyc2ns(clock, ntp_adj, cycle_delta);
+ ns_offset = cyc2ns(clock, cycle_delta);

/*
* special case for jiffies tick/offset based systems,
@@ -499,6 +488,7 @@ static int timeofday_init_device(void)

device_initcall(timeofday_init_device);

+
/**
* timeofday_periodic_hook - Does periodic update of timekeeping values.
* @unused: unused value
@@ -514,30 +504,57 @@ static void timeofday_periodic_hook(unsi

cycle_t cycle_now, cycle_delta;
s64 delta_nsec;
- static u64 remainder;

- long leapsecond = 0;
- struct clocksource* next;
+ /* nanoseconds left-shifted by clock->shift */
+ static u64 shifted_nsecs; /* works as a remainder */
+ static s64 ntp_error; /* Error accumulator */
+ s64 ntp_interval; /* interval length from ntp */

+ static s64 second_check;
+ long leapsecond = 0;
int ppm;
- static int ppm_last;

- int something_changed = 0;
+ struct clocksource* next;
struct clocksource old_clock;
- static s64 second_check;

write_seqlock_irqsave(&system_time_lock, flags);
-
- /* read time source & calc time since last call: */
+/**** Accumulation chunk *****/
+ /* read time source & calc cycle delta since last call: */
cycle_now = read_clocksource(clock);
cycle_delta = (cycle_now - cycle_last) & clock->mask;

- delta_nsec = cyc2ns_fixed_rem(ts_interval, &cycle_delta, &remainder);
+
+ /* Calculate ntp_inteval in clock-shifted nanoseconds
+ * XXX - This should all go away w/ Roman's NTP bits
+ */
+ ppm = ntp_get_ppm_adjustment(); /* ppm = shifted usec per sec */
+ ntp_interval = ((s64)INTERVAL_LEN * ppm);
+ /* convert to clock-shifted nanoseconds */
+ ntp_interval = shift_right(ntp_interval,
+ (SHIFT_USEC + 20 - clock->shift));
+ ntp_interval = (((s64)INTERVAL_LEN)<<clock->shift) + ntp_interval;
+
+ /* convert cycle_delta to shifted nanoseconds using fixed intervals */
+ /* XXX - ugh, way too much state going in and out of that function! */
+ shifted_nsecs += cyc2sns_fixed_error(clock, &cycle_delta, ntp_interval,
+ &ntp_error);
+
+ /* add accumulated cycles to cycle_last */
cycle_last = (cycle_now - cycle_delta)&clock->mask;

+ /* check for large NTP error and adjust clock->mult if needed */
+ /* XXX - This has sideeffects, can we make that more clear? */
+ shifted_nsecs += make_ntp_adj(clock, cycle_delta, &ntp_error);
+
+ /* convert shifted nanoseconds to seconds & remainder */
+ delta_nsec = snsec2nsec_rem(shifted_nsecs, clock->shift,
+ &shifted_nsecs);
+
/* update system_time: */
__increment_system_time(delta_nsec);

+/**** NTP book keeping chunk *****/
+ /* XXX - This all might be possible to ditch w/ Roman's bits */
/* advance the ntp state machine by ns interval: */
ntp_advance(delta_nsec);

@@ -555,69 +572,52 @@ static void timeofday_periodic_hook(unsi
if (ntp_synced())
sync_persistent_clock(wall_time_ts);

+/**** Clocksource change chunk ****/
+ old_clock.mult = 0;
/* if necessary, switch clocksources: */
next = get_next_clocksource();
- if (next != clock) {
+ if (unlikely(next != clock)) {
/* immediately set new cycle_last: */
cycle_last = read_clocksource(next);
/* update cycle_now to avoid problems in accumulation later: */
cycle_now = cycle_last;
+
/* swap clocksources: */
- old_clock = *clock;
+ old_clock.mult = clock->mult;
+ old_clock.shift = clock->shift;
clock = next;
printk(KERN_INFO "Time: %s clocksource has been installed.\n",
clock->name);
- ntp_clear();
- ntp_adj = 0;
- remainder = 0;
- something_changed = 1;
- }
-
- /*
- * now is a safe time, so allow clocksource to adjust
- * itself (for example: to make cpufreq changes):
- */
- if (clock->update_callback) {
+ } else if (unlikely(clock->update_callback)) {
/*
- * since clocksource state might change,
- * keep a copy, but only if we've not
- * already changed timesources:
+ * now is a safe time, so allow clocksource to adjust
+ * itself (for example: to make cpufreq changes):
*/
- if (!something_changed)
- old_clock = *clock;
- if (clock->update_callback()) {
- remainder = 0;
- something_changed = 1;
- }
- }
-
- /* check for new PPM adjustment: */
- ppm = ntp_get_ppm_adjustment();
- if (ppm_last != ppm) {
- /* make sure old_clock is set: */
- if (!something_changed)
- old_clock = *clock;
- something_changed = 1;
+ old_clock.mult = clock->mult;
+ old_clock.shift = clock->shift;
+ if (!clock->update_callback())
+ old_clock.mult = 0;
}

/* if something changed, recalculate the ntp adjustment value: */
- if (something_changed) {
+ if (unlikely(old_clock.mult)) {
/* accumulate current leftover cycles using old_clock: */
if (cycle_delta) {
- delta_nsec = cyc2ns_rem(&old_clock, ntp_adj,
- cycle_delta, &remainder);
+ delta_nsec = cyc2ns_rem(&old_clock, cycle_delta,
+ &shifted_nsecs);
cycle_last = cycle_now;
__increment_system_time(delta_nsec);
ntp_advance(delta_nsec);
+ second_check += delta_nsec;
}

- /* recalculate the ntp adjustment and fixed interval values: */
- ppm_last = ppm;
- ntp_adj = ppm_to_mult_adj(clock, ppm);
- ts_interval = calculate_clocksource_interval(clock, ntp_adj,
- INTERVAL_LEN);
+ ntp_clear();
+ shifted_nsecs = 0;
+ ntp_error = 0;
+ calculate_clocksource_interval(clock, INTERVAL_LEN);
}

+/**** Final bits chunk ****/
update_legacy_time_values();

write_sequnlock_irqrestore(&system_time_lock, flags);
@@ -660,6 +660,7 @@ void __init timeofday_init(void)

/* initialize the clock variable: */
clock = get_next_clocksource();
+ calculate_clocksource_interval(clock, INTERVAL_LEN);

/* initialize cycle_last offset base: */
cycle_last = read_clocksource(clock);
@@ -670,10 +671,7 @@ void __init timeofday_init(void)
read_persistent_clock()));

/* clear NTP scaling factor & state machine: */
- ntp_adj = 0;
ntp_clear();
- ts_interval = calculate_clocksource_interval(clock, ntp_adj,
- INTERVAL_LEN);

/* initialize legacy time values: */
update_legacy_time_values();

2006-03-04 13:55:22

by Roman Zippel

[permalink] [raw]

Subject: Re: [RFC][PATCH] Experimental enhanced NTP error accounting patch

Hi,

On Fri, 3 Mar 2006, john stultz wrote:

> Hey Roman,
> I'm sure you've got a number of things going on, but since I didn't
> hear anything back from you last time I posted this, I figured I'd try
> again.

Sorry about that, I was working on other things. Below are some notes
which hopefully help to understand the basic parts.

> Here is my first pass implementation of your suggested enhanced NTP
> error accounting for the generic timekeeping code.

I don't think it's the right way to go. You do a lot of extra work and
adding pieces of my code on top of it is going to be error-prone. To help
understanding the basics of my patch let me outline the core math
of it. The ntp code provides the time update per tick:

ntp_update = 1sec + adjustments

That's all the clock code needs from the ntp code (the details of its
calculation is not really important here) and ntp updates are done in tick
intervals. For simplicity let's assume HZ=1 so that the clock is updated
every tick by:

xtime += freq * mult

Due to resolution differences this produces an error with every update,
which we accumulate as:

error += ntp_update - freq * mult

As soon as the error is large we can calculate a new multiplier:

freq * mult_new = freq * mult + error

which becomes:

mult_adj = mult_new - mult = error / freq

Implementation note: The adjustment calculation is a bit more complicated
than this for various reasons, but that's really just an implementation
detail.

To keep timeofday the same after changing the multiplier we also have to
change the base xtime:

timeofday = xtime + off * mult = xtime_new + off * mult_new

so simplified xtime is changed as:

xtime -= off * mult_adj

After changing the multiplier this will also reduce the error:

error -= freq * mult - (off * mult + (freq - off) * mult_new)

This is the difference between the old update and the new update (where
the new multiplier is only applied for the rest of the tick), simplified
this becomes:

error -= (freq - off) * mult_adj

Finally setting time and changing clock can be done with the same
mechanism:

xtime = timeofday - off * mult

timeofday is simply the new time or the current time of the old clock,
this means that a clock change can be done outside the periodic hook.

This is basically the core part of my prototype to implement a continous
timeofday, the rest is just an implementation detail (e.g. how mult_adj
becomes a shift). It's maybe not trivial, but I don't think it's that hard
to understand (at least using an example) and results in very simple code.

Maybe the main question I have about your code now is why you want to
calculate nsec intervals during the update, but as you can see above, it's
not really necessary, so what do you want to use it for?

Another problem is timeofday_periodic_hook(), while I can understand that
you want to make it independent of the timer tick, some kind of timer tick
is still needed at which ntp time and clock time is synchronized. You go
to great lengths to hide it, but it's needed nevertheless, which makes
your code harder to read and more complex than needed.
It would be a lot simpler to keep the updates at first at HZ frequency.
Later we can still change the update frequency, but ntp time has to be
updated in fixed intervals if we want precise time keeping. Note that
error adjustments can still be done asynchronously as shown above, but
dynamic interval ntp updates are just too complex and error-prone.

I could give it a try to remove all that extra complexity to get it into
something mergable, but that would mean dropping quite some bits and
I'd prefered if we could agree on something. I simply don't have the time
right now to work on patches, which are rejected again in the end.

bye, Roman

2006-03-07 03:17:09

by john stultz

[permalink] [raw]

Subject: Re: [RFC][PATCH] Experimental enhanced NTP error accounting patch

On Sat, 2006-03-04 at 14:55 +0100, Roman Zippel wrote:
> Hi,
>
> On Fri, 3 Mar 2006, john stultz wrote:
>
> > Hey Roman,
> > I'm sure you've got a number of things going on, but since I didn't
> > hear anything back from you last time I posted this, I figured I'd try
> > again.
>
> Sorry about that, I was working on other things. Below are some notes
> which hopefully help to understand the basic parts.

Quite understandable :) We've all got other things to do.

> > Here is my first pass implementation of your suggested enhanced NTP
> > error accounting for the generic timekeeping code.
>
> I don't think it's the right way to go. You do a lot of extra work and
> adding pieces of my code on top of it is going to be error-prone. To help
> understanding the basics of my patch let me outline the core math
> of it.

While I do appreciate the overview (it helps verify or correct the
similar proofs I had to do earlier in implementing this patch), I'm not
sure I agree that integrating your ideas with the TOD patches is a bad
thing. While I think the NTP bits will have to be reconciled, I'm not
opposed to your ideas there, so I don't know exactly where the problem
is.

The "extra work" I do should have some purpose (as I'm not doing this
for fun), so either it can be eliminated or I need to justify it. Being
more specific here, like quoting the parts in my patch you don't like
would help.

This is an iterative approach, and while I know its frustrating to walk
me through all of this, I do think we're making progress.

> The ntp code provides the time update per tick:
>
> ntp_update = 1sec + adjustments
>
> That's all the clock code needs from the ntp code (the details of its
> calculation is not really important here) and ntp updates are done in tick
> intervals. For simplicity let's assume HZ=1 so that the clock is updated
> every tick by:
>
> xtime += freq * mult

Yep, fundamentally this is the same, although I store
(freq/interval)*mult into clocksource.interval_snsecs and use the
current cycle value to sort out how many full intervals have passed
instead of assuming one-interval per tick. Just like your prototype.

> Due to resolution differences this produces an error with every update,
> which we accumulate as:
>
> error += ntp_update - freq * mult

Yep. Same calculation is done in my patch. And we accumulate the error
on a per-interval basis.

> As soon as the error is large we can calculate a new multiplier:
>
> freq * mult_new = freq * mult + error
>
> which becomes:
>
> mult_adj = mult_new - mult = error / freq
>
> Implementation note: The adjustment calculation is a bit more complicated
> than this for various reasons, but that's really just an implementation
> detail.

I think I have this covered in the make_ntp_adj and error_approximation
functions.

> To keep timeofday the same after changing the multiplier we also have to
> change the base xtime:
>
> timeofday = xtime + off * mult = xtime_new + off * mult_new
>
> so simplified xtime is changed as:
>
> xtime -= off * mult_adj

Again, I believe this is done correctly by the return value of
make_ntp_adj().

> After changing the multiplier this will also reduce the error:
>
> error -= freq * mult - (off * mult + (freq - off) * mult_new)
>
> This is the difference between the old update and the new update (where
> the new multiplier is only applied for the rest of the tick), simplified
> this becomes:
>
> error -= (freq - off) * mult_adj

This part I'll need to study a bit more as it is not so clear why the
error must be corrected, instead of letting it be corrected in the next
tick. I know it is required, as the error does not get dampened without
it, but the why is less clear. I suspect it has to do with the fact that
xtime was adjusted, and since we accumulate error as we accumulate
xtime, we have to compensate there as well.

> Finally setting time and changing clock can be done with the same
> mechanism:
>
> xtime = timeofday - off * mult
>
> timeofday is simply the new time or the current time of the old clock,
> this means that a clock change can be done outside the periodic hook.

This piques my interest, as I still do not quite see how the clocksource
change can occur outside the periodic hook. I'm open to changing this as
the clocksource changing code along with the per-interval accumulation
does make the periodic_hook code a bit long. Breaking it up sounds like
a nice idea, however I'm not following this part.

> This is basically the core part of my prototype to implement a continous
> timeofday, the rest is just an implementation detail (e.g. how mult_adj
> becomes a shift). It's maybe not trivial, but I don't think it's that hard
> to understand (at least using an example) and results in very simple code.

I do agree it is much computationally simpler. However more state must
be maintained and the bit about subtracting from xtime to correct for
the multiplier adjustment is not trivial to understand at first glance.

Now, after I sat down and proved that bit to myself, it is quite
efficient code and I do like the idea of using it. I just want to
maintain the readability while getting the efficiency.

> Maybe the main question I have about your code now is why you want to
> calculate nsec intervals during the update, but as you can see above, it's
> not really necessary, so what do you want to use it for?

Calculate nsec intervals during the update? I'm not sure specifically
which bit you're talking about. Since you didn't really comment
specifically on the code I posted, it makes it hard for me to respond.
Could you point to a line number or a specific variable name?

> Another problem is timeofday_periodic_hook(), while I can understand that
> you want to make it independent of the timer tick, some kind of timer tick
> is still needed at which ntp time and clock time is synchronized. You go
> to great lengths to hide it, but it's needed nevertheless, which makes
> your code harder to read and more complex than needed.

I'm not sure I understand what you're getting at here. The only actual
tick based code is in ntp_advance() where we calculate the singleshot
adjtime adjustment, which is just done to preserve the NTP behavior.
This isn't something I'm hiding. Instead it has the benefit of allowing
the timekeeping code to not need to run every interrupt, which is needed
to handle lost or irregular ticks.

> It would be a lot simpler to keep the updates at first at HZ frequency.
> Later we can still change the update frequency, but ntp time has to be
> updated in fixed intervals if we want precise time keeping. Note that
> error adjustments can still be done asynchronously as shown above, but
> dynamic interval ntp updates are just too complex and error-prone.

I'd be interested if you could be more specific here. What parts of the
NTP state machine has be updated a real fixed intervals and what parts
do not?

What I'm trying to do w/ ntp_advance() is inform the NTP state machine
that the last total_sppm value calculated was used for the last X nsecs.
So a value for the next period can be calculated. It may be that the
last total_sppm value was applied for too long, but at least it knows
that after the fact rather then not at all.

Really, I'm not so picky about the NTP adjustments as I am about time
not going backwards. So I'm very open to your ideas regarding the NTP
state machine as long as they don't conflict with correct timekeeping.
Running periodic_hook() at HZ frequency is ok by me. However I think
using 50ms intervals is a good default because ticks will be missed on
users machines and we should have correct behavior in those cases. With
the -rt tree, we may only run once a second! I think much like
INITIAL_JIFFIES, using 50ms intervals forces the issue rather then
ignoring it.

> I could give it a try to remove all that extra complexity to get it into
> something mergable, but that would mean dropping quite some bits and
> I'd prefered if we could agree on something. I simply don't have the time
> right now to work on patches, which are rejected again in the end.

I'm not asking you to drop your work, I just want to understand your
objections and try to integrate your ideas. Please also realize that my
time is not infinite as well, so the sooner I can understand your ideas
and we can agree and get this into mainline, the better. :)

thanks
-john

2006-03-10 02:16:59

by Roman Zippel

[permalink] [raw]

Subject: Re: [RFC][PATCH] Experimental enhanced NTP error accounting patch

Hi,

On Mon, 6 Mar 2006, john stultz wrote:

> While I do appreciate the overview (it helps verify or correct the
> similar proofs I had to do earlier in implementing this patch), I'm not
> sure I agree that integrating your ideas with the TOD patches is a bad
> thing. While I think the NTP bits will have to be reconciled, I'm not
> opposed to your ideas there, so I don't know exactly where the problem
> is.
>
> The "extra work" I do should have some purpose (as I'm not doing this
> for fun), so either it can be eliminated or I need to justify it. Being
> more specific here, like quoting the parts in my patch you don't like
> would help.

The point is that it practically completely changes the NTP specific
parts, so it should be part of the initial patches.

> This is an iterative approach, and while I know its frustrating to walk
> me through all of this, I do think we're making progress.

I'm not worried about making progress, I'm worried that this gets merged
before it's ready and the NTP bits are not ready yet.

> > After changing the multiplier this will also reduce the error:
> >
> > error -= freq * mult - (off * mult + (freq - off) * mult_new)
> >
> > This is the difference between the old update and the new update (where
> > the new multiplier is only applied for the rest of the tick), simplified
> > this becomes:
> >
> > error -= (freq - off) * mult_adj
>
> This part I'll need to study a bit more as it is not so clear why the
> error must be corrected, instead of letting it be corrected in the next
> tick. I know it is required, as the error does not get dampened without
> it, but the why is less clear. I suspect it has to do with the fact that
> xtime was adjusted, and since we accumulate error as we accumulate
> xtime, we have to compensate there as well.

This is the error of the next tick, we look ahead to the next tick, so we
have reduced the error at the time we reach it.
So if we update the error with (ntp_update - mult * freq) we use the
values for the current tick and if we change the multiplier we also have
to correct the error for the remainder of the current tick. This way we
keep the error within the limits and keep it from oscillating.

> > Finally setting time and changing clock can be done with the same
> > mechanism:
> >
> > xtime = timeofday - off * mult
> >
> > timeofday is simply the new time or the current time of the old clock,
> > this means that a clock change can be done outside the periodic hook.
>
> This piques my interest, as I still do not quite see how the clocksource
> change can occur outside the periodic hook. I'm open to changing this as
> the clocksource changing code along with the per-interval accumulation
> does make the periodic_hook code a bit long. Breaking it up sounds like
> a nice idea, however I'm not following this part.

The basic idea is really simple:

lock();
new_clock->set_time(current_clock->get_time());
current_clock = new_clock;
unlock();

I don't quite understand the problem you see with this, only some
currently static state has to become per clock state (which is actually a
more general problem - too much clock state is global right now).

> > This is basically the core part of my prototype to implement a continous
> > timeofday, the rest is just an implementation detail (e.g. how mult_adj
> > becomes a shift). It's maybe not trivial, but I don't think it's that hard
> > to understand (at least using an example) and results in very simple code.
>
> I do agree it is much computationally simpler. However more state must
> be maintained and the bit about subtracting from xtime to correct for
> the multiplier adjustment is not trivial to understand at first glance.

The only really new state is the error value, which you didn't maintain
before.

> Now, after I sat down and proved that bit to myself, it is quite
> efficient code and I do like the idea of using it. I just want to
> maintain the readability while getting the efficiency.

I guess it's in the eye of the beholder. :)
While I do agree, it's not immediately obvious what the code does, I
really would prefer to add the complementing information via
documentation.

> > Maybe the main question I have about your code now is why you want to
> > calculate nsec intervals during the update, but as you can see above, it's
> > not really necessary, so what do you want to use it for?
>
> Calculate nsec intervals during the update? I'm not sure specifically
> which bit you're talking about. Since you didn't really comment
> specifically on the code I posted, it makes it hard for me to respond.
> Could you point to a line number or a specific variable name?

For example you still calculate a delta_nsec and ppm values.
It also would really help to concentrate first on the current time
services and add new time services (which e.g. return ktime_t) in a second
patch. I think it's possible that the extra state information for it is
updated once a second, in any way a separate patch would be easier to
review and improve.

> > It would be a lot simpler to keep the updates at first at HZ frequency.
> > Later we can still change the update frequency, but ntp time has to be
> > updated in fixed intervals if we want precise time keeping. Note that
> > error adjustments can still be done asynchronously as shown above, but
> > dynamic interval ntp updates are just too complex and error-prone.
>
> I'd be interested if you could be more specific here. What parts of the
> NTP state machine has be updated a real fixed intervals and what parts
> do not?

I think it's better to give an example of what I have in mind.
A lot is already possible without my NTP patches, the periodic function
could look something like this:

off = cycles - cycles_offset;
while (off >= update_cycles) {
off -= update_cycles;
cycles_offset += update_cycles;
jiffies_64++;
xtime_nsec += xtime_update;
if (xtime_nsec >= (u64)NSEC_PER_SEC << 22) {
xtime_nsec -= (u64)NSEC_PER_SEC << 22;
xtime.tv_sec++;
second_overflow();
}
xtime.tv_nsec = xtime_nsec >> 22;
xtime_error += current_tick_length() - xtime_update;
time_adjust_step = adjtime_adjustment();
if (time_adjust_step)
time_adjust -= time_adjust_step;
}
[error adjustment]

My patches would move the time_adjust stuff into second_overflow() and
make it easier to change the tick length for current_tick_length().
Anyway, this loop does the fixed interval updates (and also basically
does the same work as update_times()) and the error adjustment uses "off"
for the asynchronous updates and please don't split the loop into multiple
loops all over the place.
Most of the NTP work is still done in second_overflow(), there is no need
to calculate any ppm values and no need to separate the leapsecond
handling (at least initially) - IOW the two NTP rework patches become
unnecessary.

> What I'm trying to do w/ ntp_advance() is inform the NTP state machine
> that the last total_sppm value calculated was used for the last X nsecs.
> So a value for the next period can be calculated. It may be that the
> last total_sppm value was applied for too long, but at least it knows
> that after the fact rather then not at all.

What should it do with this knowledge?

> Really, I'm not so picky about the NTP adjustments as I am about time
> not going backwards. So I'm very open to your ideas regarding the NTP
> state machine as long as they don't conflict with correct timekeeping.

Correct timekeeping has two aspects - short-term and long-term
correctness. The old code maintains long-term stability by simply jumping
to the correct time at every tick. Your code now lets the clock run
free loosely controlled via NTP. I would really to like see to keep both
aspects correct and preferably before it gets merged.

> Running periodic_hook() at HZ frequency is ok by me. However I think
> using 50ms intervals is a good default because ticks will be missed on
> users machines and we should have correct behavior in those cases. With
> the -rt tree, we may only run once a second! I think much like
> INITIAL_JIFFIES, using 50ms intervals forces the issue rather then
> ignoring it.

The point is that you try to fix too many things at once. The current NTP
code is very much HZ based and correct timekeeping is also possible at HZ
frequency, so let's concentrate first on this and I'm certain it will be
simpler than what you have right now.
The update interval can still be changed, but that requires the functional
cleanup my NTP patches provide.

> > I could give it a try to remove all that extra complexity to get it into
> > something mergable, but that would mean dropping quite some bits and
> > I'd prefered if we could agree on something. I simply don't have the time
> > right now to work on patches, which are rejected again in the end.
>
> I'm not asking you to drop your work, I just want to understand your
> objections and try to integrate your ideas. Please also realize that my
> time is not infinite as well, so the sooner I can understand your ideas
> and we can agree and get this into mainline, the better. :)

That wouldn't be a problem, if there wasn't the pressure to get this
merged as soon as possible. It seems lately it has been more important to
get new features merged, than taking the time to do the necessary
groundwork thoroughly.

bye, Roman