This provides an interface for arch code to find out how many
nanoseconds are going to be added on to xtime by the next call to
do_timer. The value returned is a fixed-point number in 52.12 format
in nanoseconds. The reason for this format is that it gives the
full precision that the timekeeping code is using internally.
The motivation for this is to fix a problem that has arisen on 32-bit
powerpc in that the value returned by do_gettimeofday drifts apart
from xtime if NTP is being used. PowerPC is now using a lockless
do_gettimeofday based on reading the timebase register and performing
some simple arithmetic. (This method of getting the time is also
exported to userspace via the VDSO.) However, the factor and offset
it uses were calculated based on the nominal tick length and weren't
being adjusted when NTP varied the tick length.
Note that 64-bit powerpc has had the lockless do_gettimeofday for a
long time now. It also had an extremely hairy routine that got called
from the 32-bit compat routine for adjtimex, which adjusted the
factor and offset according to what it thought the timekeeping code
was going to do. Not only was this only called if a 32-bit task did
adjtimex (i.e. not if a 64-bit task did adjtimex), it was also
duplicating computations from kernel/timer.c and it wasn't clear that
it was (still) correct.
The simple solution is to ask the timekeeping code how long the
current jiffy will be on each timer interrupt, after calling
do_timer. If this jiffy will be a different length from the last one,
we then need to compute new values for the factor and offset used in
the lockless do_gettimeofday. In this way we can keep xtime and
do_gettimeofday in sync, even when NTP is varying the tick length.
Note that when adjtimex varies the tick length, it almost always
introduces the variation from the next tick on. The only case I could
see where adjtimex would vary the length of the current tick is when
an old-style adjtime adjustment is being cancelled. (It's not clear
to me why the adjustment has to be cancelled immediately rather than
from the next tick on.) Thus I don't see any real need for a hook in
adjtimex; the rare case of an old-style adjustment being cancelled can
be fixed up at the next tick.
Signed-off-by: Paul Mackerras <[email protected]>
---
This version of the patch has the adjtime calculation factored out and
the assignments inside if statements removed, as requested.
diff --git a/include/linux/timex.h b/include/linux/timex.h
index 04a4a8c..b7ca120 100644
--- a/include/linux/timex.h
+++ b/include/linux/timex.h
@@ -345,6 +345,9 @@ time_interpolator_reset(void)
#endif /* !CONFIG_TIME_INTERPOLATION */
+/* Returns how long ticks are at present, in ns / 2^(SHIFT_SCALE-10). */
+extern u64 current_tick_length(void);
+
#endif /* KERNEL */
#endif /* LINUX_TIMEX_H */
diff --git a/kernel/timer.c b/kernel/timer.c
index b9dad39..fe3a9a9 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -717,12 +717,16 @@ static void second_overflow(void)
#endif
}
-/* in the NTP reference this is called "hardclock()" */
-static void update_wall_time_one_tick(void)
+/*
+ * Returns how many microseconds we need to add to xtime this tick
+ * in doing an adjustment requested with adjtime.
+ */
+static long adjtime_adjustment(void)
{
- long time_adjust_step, delta_nsec;
+ long time_adjust_step;
- if ((time_adjust_step = time_adjust) != 0 ) {
+ time_adjust_step = time_adjust;
+ if (time_adjust_step) {
/*
* We are doing an adjtime thing. Prepare time_adjust_step to
* be within bounds. Note that a positive time_adjust means we
@@ -733,10 +737,19 @@ static void update_wall_time_one_tick(vo
*/
time_adjust_step = min(time_adjust_step, (long)tickadj);
time_adjust_step = max(time_adjust_step, (long)-tickadj);
+ }
+ return time_adjust_step;
+}
+/* in the NTP reference this is called "hardclock()" */
+static void update_wall_time_one_tick(void)
+{
+ long time_adjust_step, delta_nsec;
+
+ time_adjust_step = adjtime_adjustment();
+ if (time_adjust_step)
/* Reduce by this step the amount of time left */
time_adjust -= time_adjust_step;
- }
delta_nsec = tick_nsec + time_adjust_step * 1000;
/*
* Advance the phase, once it gets to one microsecond, then
@@ -759,6 +772,22 @@ static void update_wall_time_one_tick(vo
}
/*
+ * Return how long ticks are at the moment, that is, how much time
+ * update_wall_time_one_tick will add to xtime next time we call it
+ * (assuming no calls to do_adjtimex in the meantime).
+ * The return value is in fixed-point nanoseconds with SHIFT_SCALE-10
+ * bits to the right of the binary point.
+ * This function has no side-effects.
+ */
+u64 current_tick_length(void)
+{
+ long delta_nsec;
+
+ delta_nsec = tick_nsec + adjtime_adjustment() * 1000;
+ return ((u64) delta_nsec << (SHIFT_SCALE - 10)) + time_adj;
+}
+
+/*
* Using a loop looks inefficient, but "ticks" is
* usually just one (we shouldn't be losing ticks,
* we're doing this this way mainly for interrupt
On Fri, 2006-02-17 at 10:30 +1100, Paul Mackerras wrote:
> This provides an interface for arch code to find out how many
> nanoseconds are going to be added on to xtime by the next call to
> do_timer. The value returned is a fixed-point number in 52.12 format
> in nanoseconds. The reason for this format is that it gives the
> full precision that the timekeeping code is using internally.
>
> The motivation for this is to fix a problem that has arisen on 32-bit
> powerpc in that the value returned by do_gettimeofday drifts apart
> from xtime if NTP is being used. PowerPC is now using a lockless
> do_gettimeofday based on reading the timebase register and performing
> some simple arithmetic. (This method of getting the time is also
> exported to userspace via the VDSO.) However, the factor and offset
> it uses were calculated based on the nominal tick length and weren't
> being adjusted when NTP varied the tick length.
>
> Note that 64-bit powerpc has had the lockless do_gettimeofday for a
> long time now. It also had an extremely hairy routine that got called
> from the 32-bit compat routine for adjtimex, which adjusted the
> factor and offset according to what it thought the timekeeping code
> was going to do. Not only was this only called if a 32-bit task did
> adjtimex (i.e. not if a 64-bit task did adjtimex), it was also
> duplicating computations from kernel/timer.c and it wasn't clear that
> it was (still) correct.
>
> The simple solution is to ask the timekeeping code how long the
> current jiffy will be on each timer interrupt, after calling
> do_timer. If this jiffy will be a different length from the last one,
> we then need to compute new values for the factor and offset used in
> the lockless do_gettimeofday. In this way we can keep xtime and
> do_gettimeofday in sync, even when NTP is varying the tick length.
>
> Note that when adjtimex varies the tick length, it almost always
> introduces the variation from the next tick on. The only case I could
> see where adjtimex would vary the length of the current tick is when
> an old-style adjtime adjustment is being cancelled. (It's not clear
> to me why the adjustment has to be cancelled immediately rather than
> from the next tick on.) Thus I don't see any real need for a hook in
> adjtimex; the rare case of an old-style adjustment being cancelled can
> be fixed up at the next tick.
>
> Signed-off-by: Paul Mackerras <[email protected]>
> ---
> This version of the patch has the adjtime calculation factored out and
> the assignments inside if statements removed, as requested.
>
> diff --git a/include/linux/timex.h b/include/linux/timex.h
> index 04a4a8c..b7ca120 100644
> --- a/include/linux/timex.h
> +++ b/include/linux/timex.h
> @@ -345,6 +345,9 @@ time_interpolator_reset(void)
>
> #endif /* !CONFIG_TIME_INTERPOLATION */
>
> +/* Returns how long ticks are at present, in ns / 2^(SHIFT_SCALE-10). */
> +extern u64 current_tick_length(void);
> +
> #endif /* KERNEL */
>
> #endif /* LINUX_TIMEX_H */
> diff --git a/kernel/timer.c b/kernel/timer.c
> index b9dad39..fe3a9a9 100644
> --- a/kernel/timer.c
> +++ b/kernel/timer.c
> @@ -717,12 +717,16 @@ static void second_overflow(void)
> #endif
> }
>
> -/* in the NTP reference this is called "hardclock()" */
> -static void update_wall_time_one_tick(void)
> +/*
> + * Returns how many microseconds we need to add to xtime this tick
> + * in doing an adjustment requested with adjtime.
> + */
> +static long adjtime_adjustment(void)
> {
> - long time_adjust_step, delta_nsec;
> + long time_adjust_step;
>
> - if ((time_adjust_step = time_adjust) != 0 ) {
> + time_adjust_step = time_adjust;
> + if (time_adjust_step) {
> /*
> * We are doing an adjtime thing. Prepare time_adjust_step to
> * be within bounds. Note that a positive time_adjust means we
> @@ -733,10 +737,19 @@ static void update_wall_time_one_tick(vo
> */
> time_adjust_step = min(time_adjust_step, (long)tickadj);
> time_adjust_step = max(time_adjust_step, (long)-tickadj);
> + }
> + return time_adjust_step;
> +}
>
> +/* in the NTP reference this is called "hardclock()" */
> +static void update_wall_time_one_tick(void)
> +{
> + long time_adjust_step, delta_nsec;
> +
> + time_adjust_step = adjtime_adjustment();
> + if (time_adjust_step)
> /* Reduce by this step the amount of time left */
> time_adjust -= time_adjust_step;
Does the if statement really buy you anything here?
> - }
> delta_nsec = tick_nsec + time_adjust_step * 1000;
> /*
> * Advance the phase, once it gets to one microsecond, then
> @@ -759,6 +772,22 @@ static void update_wall_time_one_tick(vo
> }
>
> /*
> + * Return how long ticks are at the moment, that is, how much time
> + * update_wall_time_one_tick will add to xtime next time we call it
> + * (assuming no calls to do_adjtimex in the meantime).
> + * The return value is in fixed-point nanoseconds with SHIFT_SCALE-10
> + * bits to the right of the binary point.
> + * This function has no side-effects.
> + */
> +u64 current_tick_length(void)
> +{
> + long delta_nsec;
> +
> + delta_nsec = tick_nsec + adjtime_adjustment() * 1000;
> + return ((u64) delta_nsec << (SHIFT_SCALE - 10)) + time_adj;
> +}
You've got time_adj here, but you're not using what's been accumulated
in time_phase, is that really ok?
Other then that it looks ok to me. I know Roman is working on related
code, so I CC'ed him on this.
thanks
-john
john stultz writes:
> > + if (time_adjust_step)
> > /* Reduce by this step the amount of time left */
> > time_adjust -= time_adjust_step;
>
> Does the if statement really buy you anything here?
Well, just avoiding dirtying a cache line in the common case where
time_adjust and time_adjust_step are zero, that's all. It's a very
minor thing. In fact if time_adjust is zero we could avoid the
procedure call, the subtraction and the multiplication by 1000, like
this:
delta_nsec = tick_nsec;
if (time_adjust) {
time_adjust_step = adjtime_adjustment();
/* Reduce by this step the amount of time left */
time_adjust -= time_adjust_step;
delta_nsec += time_adjust_step * 1000;
}
> > +u64 current_tick_length(void)
> > +{
> > + long delta_nsec;
> > +
> > + delta_nsec = tick_nsec + adjtime_adjustment() * 1000;
> > + return ((u64) delta_nsec << (SHIFT_SCALE - 10)) + time_adj;
> > +}
>
> You've got time_adj here, but you're not using what's been accumulated
> in time_phase, is that really ok?
Yes.
What's been accumulated in time_phase is always less than a
nanosecond's worth. I took the approach of delivering the full
precision of time_adj to the arch code (i.e. including the 12 bits to
the right of the binary point), rather than truncating it to whole
nanoseconds and then having to vary it by +/- 1 nanosecond each tick.
Basically time_phase is just accumulating the fraction of a nanosecond
that we haven't been able to add onto xtime yet. The effect of that
is that some ticks are 1 nanosecond longer or shorter than others,
depending on the fractional part of time_adj. I set the factor and
offset that I use in the lockless gettimeofday to take into account
the whole time_adj value, including the fractional part. That way I
don't have to change the factor and offset every tick, but only when
time_adj changes (or when the adjtime adjustment changes, of course).
The possible +/- 1 nanosecond difference is smaller than the
resolution of the timebase register, and way smaller than the time it
takes to do a gettimeofday and compare it with xtime, so it can be
ignored.
Paul.
On Fri, 2006-02-17 at 11:43 +1100, Paul Mackerras wrote:
> john stultz writes:
>
> > > + if (time_adjust_step)
> > > /* Reduce by this step the amount of time left */
> > > time_adjust -= time_adjust_step;
> >
> > Does the if statement really buy you anything here?
>
> Well, just avoiding dirtying a cache line in the common case where
> time_adjust and time_adjust_step are zero, that's all. It's a very
> minor thing. In fact if time_adjust is zero we could avoid the
> procedure call, the subtraction and the multiplication by 1000, like
> this:
>
> delta_nsec = tick_nsec;
> if (time_adjust) {
> time_adjust_step = adjtime_adjustment();
> /* Reduce by this step the amount of time left */
> time_adjust -= time_adjust_step;
> delta_nsec += time_adjust_step * 1000;
> }
Yea, I was thinking if you weren't going to do it for the mult, why
bother on the subtract. Either way, I'm not too picky, it should just be
consistent.
> > > +u64 current_tick_length(void)
> > > +{
> > > + long delta_nsec;
> > > +
> > > + delta_nsec = tick_nsec + adjtime_adjustment() * 1000;
> > > + return ((u64) delta_nsec << (SHIFT_SCALE - 10)) + time_adj;
> > > +}
> >
> > You've got time_adj here, but you're not using what's been accumulated
> > in time_phase, is that really ok?
>
> Yes.
>
> What's been accumulated in time_phase is always less than a
> nanosecond's worth. I took the approach of delivering the full
> precision of time_adj to the arch code (i.e. including the 12 bits to
> the right of the binary point), rather than truncating it to whole
> nanoseconds and then having to vary it by +/- 1 nanosecond each tick.
So you're accumulating it yourself in the arch code? That's fine, I just
wanted to be sure.
Acked-by: john stultz <[email protected]>
thanks
-john
Hi,
On Thu, 16 Feb 2006, john stultz wrote:
> > +u64 current_tick_length(void)
> > +{
> > + long delta_nsec;
> > +
> > + delta_nsec = tick_nsec + adjtime_adjustment() * 1000;
> > + return ((u64) delta_nsec << (SHIFT_SCALE - 10)) + time_adj;
> > +}
>
> You've got time_adj here, but you're not using what's been accumulated
> in time_phase, is that really ok?
>
>
> Other then that it looks ok to me. I know Roman is working on related
> code, so I CC'ed him on this.
I was only thinking about the possibility of increasing the precision
already, but that can still be done later.
Otherwise this function will only become simpler, so there's no real
problem. Actually it should even make NTP4 conversion easier, if ppc uses
this value instead of duplicating the calculation.
bye, Roman