2008-01-24 02:39:15

by john stultz

[permalink] [raw]
Subject: [PATCH] correct inconsistent ntp interval/tick_length usage

I recently noticed on one of my boxes that when synched with an NTP
server, the drift value reported for the system was ~283ppm. While in
some cases, clock hardware can be that bad, it struck me as unusual as
the system was using the acpi_pm clocksource, which is one of the more
trustworthy and accurate clocksources on x86 hardware.

I brought up another system and let it sync to the same NTP server, and
I noticed a similar 280some ppm drift.

In looking at the code, I found that the acpi_pm's constant frequency
was being computed correctly at boot-up, however once the system was up,
even without the ntp daemon running, the clocksource's frequency was
being modified by the clocksource_adjust() function.

Digging deeper, I realized that in the code that keeps track of how much
the clocksource is skewing from the ntp desired time, we were using
different lengths to establish how long an time interval was.

The clocksource was being setup with the following interval:
NTP_INTERVAL_LENGTH = NSEC_PER_SEC/NTP_INTERVAL_FREQ

While the ntp code was using the tick_length_base value:
tick_length_base ~= (tick_usec * NSEC_PER_USEC * USER_HZ)
/NTP_INTERVAL_FREQ

The subtle difference is:
(tick_usec * NSEC_PER_USEC * USER_HZ) != NSEC_PER_SEC

This difference in calculation was causing the clocksource correction
code to apply a correction factor to the clocksource so the two
intervals were the same, however this results in the actual frequency of
the clocksource to be made incorrect. I believe this difference would
affect all clocksources, although to differing degrees depending on the
clocksource resolution.

The issue was introduced when my HZ free ntp patch landed in 2.6.21-rc1,
so my apologies for the mistake, and for not noticing it until now.

The following patch, corrects the clocksource's initialization code so
it uses the same interval length as the code in ntp.c. After applying
this patch, the drift value for the same system went from ~283ppm to
only 2.635ppm.

I believe this patch to be good, however it does affect all arches and
I've only tested on x86, so some caution is advised. I do think it would
be a likely candidate for a stable 2.6.24.x release.

Any thoughts or feedback would be appreciated.

Signed-off-by: John Stultz <[email protected]>

Index: linux/kernel/time/timekeeping.c
===================================================================
--- linux.orig/kernel/time/timekeeping.c
+++ linux/kernel/time/timekeeping.c
@@ -208,7 +208,8 @@ static void change_clocksource(void)

clock->error = 0;
clock->xtime_nsec = 0;
- clocksource_calculate_interval(clock, NTP_INTERVAL_LENGTH);
+ clocksource_calculate_interval(clock,
+ (unsigned long)(current_tick_length()>>TICK_LENGTH_SHIFT));

tick_clock_notify();

@@ -265,7 +266,8 @@ void __init timekeeping_init(void)
ntp_clear();

clock = clocksource_get_next();
- clocksource_calculate_interval(clock, NTP_INTERVAL_LENGTH);
+ clocksource_calculate_interval(clock,
+ (unsigned long)(current_tick_length()>>TICK_LENGTH_SHIFT));
clock->cycle_last = clocksource_read(clock);

xtime.tv_sec = sec;


2008-01-24 09:15:19

by Valdis Klētnieks

[permalink] [raw]
Subject: Re: [PATCH] correct inconsistent ntp interval/tick_length usage

On Wed, 23 Jan 2008 18:38:53 PST, john stultz said:
> I recently noticed on one of my boxes that when synched with an NTP
> server, the drift value reported for the system was ~283ppm. While in
> some cases, clock hardware can be that bad, it struck me as unusual as
> the system was using the acpi_pm clocksource, which is one of the more
> trustworthy and accurate clocksources on x86 hardware.
>
> I brought up another system and let it sync to the same NTP server, and
> I noticed a similar 280some ppm drift.

So I got curious, and dropped the patch onto my workstation - 24-rc8-mm1 x86_64
and it applied just fine with an offset reported.

Before the patch, the drift was reported as 140.67 or so.
After, it's sitting at -0.681.

Am running with hpet clocksource rather than acpi_pm, which probably explains
the 140 I see rather than the 280 John saw....


Attachments:
(No filename) (226.00 B)

2008-01-25 14:09:06

by Roman Zippel

[permalink] [raw]
Subject: Re: [PATCH] correct inconsistent ntp interval/tick_length usage

Hi,

On Wed, 23 Jan 2008, john stultz wrote:

> This difference in calculation was causing the clocksource correction
> code to apply a correction factor to the clocksource so the two
> intervals were the same, however this results in the actual frequency of
> the clocksource to be made incorrect. I believe this difference would
> affect all clocksources, although to differing degrees depending on the
> clocksource resolution.

Let's look at why the correction is done in first place. The update steps
don't add up precisely to 1sec (LATCH*HZ != CLOCK_TICK_RATE), so a small
addjustment is used to make up for it. The problem here is that if the
update frequency changes, the addjustment isn't correct anymore.
The simple fix is to just omit the addjustment in these cases in ntp.c:

#if NTP_INTERVAL_FREQ == HZ
...
#else
#define CLOCK_TICK_ADJUST 0
#endif

bye, Roman

2008-01-29 02:28:20

by john stultz

[permalink] [raw]
Subject: Re: [PATCH] correct inconsistent ntp interval/tick_length usage


On Fri, 2008-01-25 at 15:07 +0100, Roman Zippel wrote:
> Hi,
>
> On Wed, 23 Jan 2008, john stultz wrote:
>
> > This difference in calculation was causing the clocksource correction
> > code to apply a correction factor to the clocksource so the two
> > intervals were the same, however this results in the actual frequency of
> > the clocksource to be made incorrect. I believe this difference would
> > affect all clocksources, although to differing degrees depending on the
> > clocksource resolution.
>
> Let's look at why the correction is done in first place. The update steps
> don't add up precisely to 1sec (LATCH*HZ != CLOCK_TICK_RATE), so a small
> addjustment is used to make up for it. The problem here is that if the
> update frequency changes, the addjustment isn't correct anymore.
> The simple fix is to just omit the addjustment in these cases in ntp.c:
>
> #if NTP_INTERVAL_FREQ == HZ
> ...
> #else
> #define CLOCK_TICK_ADJUST 0
> #endif

Hmmm, although this doesn't explain why the issue is seen when
NTP_INTERVAL_FREQ == HZ (as it is in my system's case). Or am I missing
something?

Regardless, current_tick_length() really is the base interval we're
using in the error accumulation loop, so it seems the cleanest interface
to use (just to avoid redundancy at least) when establishing the
clocksource's interval length. Or do you not agree?

thanks
-john





2008-01-29 04:02:36

by Roman Zippel

[permalink] [raw]
Subject: Re: [PATCH] correct inconsistent ntp interval/tick_length usage

Hi,

On Mon, 28 Jan 2008, john stultz wrote:

> Regardless, current_tick_length() really is the base interval we're
> using in the error accumulation loop, so it seems the cleanest interface
> to use (just to avoid redundancy at least) when establishing the
> clocksource's interval length. Or do you not agree?

I see, what you need to use in timex.h for !CONFIG_NO_HZ is:

#define NTP_INTERVAL_LENGTH ((s64)LATCH * NSEC_PER_SEC) / (s64)CLOCK_TICK_RATE)

this calculates the base length of a clock tick in nsec.

current_tick_length() would only work during boot. If you switch clocks
later, it would include random adjustments specific to the old clock.

bye, Roman

2008-01-30 02:14:36

by john stultz

[permalink] [raw]
Subject: Re: [PATCH] correct inconsistent ntp interval/tick_length usage


On Tue, 2008-01-29 at 05:02 +0100, Roman Zippel wrote:
> Hi,
>
> On Mon, 28 Jan 2008, john stultz wrote:
>
> > Regardless, current_tick_length() really is the base interval we're
> > using in the error accumulation loop, so it seems the cleanest interface
> > to use (just to avoid redundancy at least) when establishing the
> > clocksource's interval length. Or do you not agree?
>
> I see, what you need to use in timex.h for !CONFIG_NO_HZ is:
>
> #define NTP_INTERVAL_LENGTH ((s64)LATCH * NSEC_PER_SEC) / (s64)CLOCK_TICK_RATE)
>
> this calculates the base length of a clock tick in nsec.
>
> current_tick_length() would only work during boot. If you switch clocks
> later, it would include random adjustments specific to the old clock.

Ah. Good point. How about the following, tested on x86_64 both with and
without CONFIG_NO_HZ?

Thanks for the review and feedback!
-john



Correct NTP drift caused by using inconsistent NTP_INTERVAL_LENGTHs in
clocksource initialization and error accumulation. This corrects a
280ppm drift seen on some systems using acpi_pm, and affects other
clocksources as well (likely to a lesser degree).


Signed-off-by: John Stultz <[email protected]>


Index: linux/include/linux/timex.h
===================================================================
--- linux.orig/include/linux/timex.h
+++ linux/include/linux/timex.h
@@ -231,7 +231,13 @@ static inline int ntp_synced(void)
#else
#define NTP_INTERVAL_FREQ (HZ)
#endif
-#define NTP_INTERVAL_LENGTH (NSEC_PER_SEC/NTP_INTERVAL_FREQ)
+
+#define CLOCK_TICK_OVERFLOW (LATCH * HZ - CLOCK_TICK_RATE)
+#define CLOCK_TICK_ADJUST (((s64)CLOCK_TICK_OVERFLOW * NSEC_PER_SEC) / \
+ (s64)CLOCK_TICK_RATE)
+
+/* Because using NSEC_PER_SEC would be too easy */
+#define NTP_INTERVAL_LENGTH ((((s64)TICK_USEC*NSEC_PER_USEC*USER_HZ)+CLOCK_TICK_ADJUST)/NTP_INTERVAL_FREQ)

/* Returns how long ticks are at present, in ns / 2^(SHIFT_SCALE-10). */
extern u64 current_tick_length(void);
Index: linux/kernel/time/ntp.c
===================================================================
--- linux.orig/kernel/time/ntp.c
+++ linux/kernel/time/ntp.c
@@ -43,10 +43,6 @@ long time_freq; /* frequency offset (
static long time_reftime; /* time at last adjustment (s) */
long time_adjust;

-#define CLOCK_TICK_OVERFLOW (LATCH * HZ - CLOCK_TICK_RATE)
-#define CLOCK_TICK_ADJUST (((s64)CLOCK_TICK_OVERFLOW * NSEC_PER_SEC) / \
- (s64)CLOCK_TICK_RATE)
-
static void ntp_update_frequency(void)
{
u64 second_length = (u64)(tick_usec * NSEC_PER_USEC * USER_HZ)

2008-01-31 01:55:57

by Roman Zippel

[permalink] [raw]
Subject: Re: [PATCH] correct inconsistent ntp interval/tick_length usage

Hi,

On Tue, 29 Jan 2008, john stultz wrote:

> +/* Because using NSEC_PER_SEC would be too easy */
> +#define NTP_INTERVAL_LENGTH ((((s64)TICK_USEC*NSEC_PER_USEC*USER_HZ)+CLOCK_TICK_ADJUST)/NTP_INTERVAL_FREQ)

Why are you using USER_HZ? Did you test this with HZ!=100?
Anyway, please don't make more complicated than it already is.
What I said previously about the update interval is still valid, so the
correct solution is to use the simpler NTP_INTERVAL_LENGTH calculation
from my last mail and to omit the correction for NO_HZ.

bye, Roman

2008-01-31 02:16:32

by john stultz

[permalink] [raw]
Subject: Re: [PATCH] correct inconsistent ntp interval/tick_length usage


On Thu, 2008-01-31 at 02:55 +0100, Roman Zippel wrote:
> Hi,
>
> On Tue, 29 Jan 2008, john stultz wrote:
>
> > +/* Because using NSEC_PER_SEC would be too easy */
> > +#define NTP_INTERVAL_LENGTH ((((s64)TICK_USEC*NSEC_PER_USEC*USER_HZ)+CLOCK_TICK_ADJUST)/NTP_INTERVAL_FREQ)
>
> Why are you using USER_HZ? Did you test this with HZ!=100?

I'm using USER_HZ, because ntp_update_frequency() uses USER_HZ.

And my tests were run at HZ=1000.

> Anyway, please don't make more complicated than it already is.
> What I said previously about the update interval is still valid, so the
> correct solution is to use the simpler NTP_INTERVAL_LENGTH calculation
> from my last mail and to omit the correction for NO_HZ.

I'm not sure I follow how having two separate and totally different
definitions of NTP_INTERVAL_LENGTH is less complicated then my last
patch.

Maybe I'm missing something from your suggestion? Or maybe could you
contribute your suggestion as a patch?

My concern is we state the accumulation interval is X ns long. Then
current_tick_length() is to return (X + ntp_adjustment), so each
accumulation interval we can keep track of the error and adjust our
interval length.

So if ntp_update_frequency() sets tick_length_base to be:

u64 second_length = (u64)(tick_usec * NSEC_PER_USEC * USER_HZ)
<< TICK_LENGTH_SHIFT;
second_length += (s64)CLOCK_TICK_ADJUST << TICK_LENGTH_SHIFT;
second_length += (s64)time_freq
<< (TICK_LENGTH_SHIFT - SHIFT_NSEC);

tick_length_base = second_length;
do_div(tick_length_base, NTP_INTERVAL_FREQ);


The above is basically (X + part of ntp_adjustment)

So I'm trying to calculate the X as:

#define NTP_INTERVAL_LENGTH ((((s64)TICK_USEC*NSEC_PER_USEC*USER_HZ)+CLOCK_TICK_ADJUST)/NTP_INTERVAL_FREQ)


thanks
-john

2008-01-31 05:03:19

by Roman Zippel

[permalink] [raw]
Subject: Re: [PATCH] correct inconsistent ntp interval/tick_length usage

Hi,

On Wed, 30 Jan 2008, john stultz wrote:

> My concern is we state the accumulation interval is X ns long. Then
> current_tick_length() is to return (X + ntp_adjustment), so each
> accumulation interval we can keep track of the error and adjust our
> interval length.
>
> So if ntp_update_frequency() sets tick_length_base to be:
>
> u64 second_length = (u64)(tick_usec * NSEC_PER_USEC * USER_HZ)
> << TICK_LENGTH_SHIFT;
> second_length += (s64)CLOCK_TICK_ADJUST << TICK_LENGTH_SHIFT;
> second_length += (s64)time_freq
> << (TICK_LENGTH_SHIFT - SHIFT_NSEC);
>
> tick_length_base = second_length;
> do_div(tick_length_base, NTP_INTERVAL_FREQ);
>
>
> The above is basically (X + part of ntp_adjustment)

CLOCK_TICK_ADJUST is based on LATCH and HZ, if the update frequency isn't
based on HZ, there is no point in using it!

Let's look at what actually needs to be done:

1. initializing clock interval:

clock_cycle_interval = timer_cycle_interval * clock_frequency / timer_frequency

It's simply about converting timer cycles into clock cycles, so they're
about the same time interval.
We already make it a bit more complicated than necessary as we go via
nsec:

ntp_interval = timer_cycle_interval * 10^9nsec / timer_frequency

and in clocksource_calculate_interval() basically:

clock_cycle_interval = ntp_interval * clock_frequency / 10^9nsec

Without a fixed timer tick it's actually even easier, then we use the same
frequency for clock and timer and the cycle interval is simply:

clock_cycle_interval = timer_cycle_interval = clock_frequency / NTP_INTERVAL_FREQ

There is no need to use the adjustment here, you'll only cause a mismatch
between the clock and timer cycle interval, which had to be corrected by
NTP.

2. initializing clock adjustment:

clock_adjust = timer_cycle_interval * NTP_INTERVAL_FREQ / timer_frequency - 1sec

This adjustment is used make up for the difference that the timer
frequency isn't evenly divisible by HZ, so that the clock is advanced by
1sec after timer_frequency cycles.

Like above the clock frequency is used for the timer frequency for this
calculation for CONFIG_NO_HZ, so it would be incorrect to use
CLOCK_TICK_RATE/LATCH/HZ here and since NTP_INTERVAL_FREQ is quite small
the resulting adjustment would be rather small, it's easier not to bother
in this case.

What you're basically trying is to add an error to the clock
initialization, so that we can later compensate for it. The correct
solution is really to not add the error in first place, so that there is
no need to compensate for it.

bye. Roman

2008-02-02 01:03:28

by john stultz

[permalink] [raw]
Subject: Re: [PATCH] correct inconsistent ntp interval/tick_length usage


On Thu, 2008-01-31 at 06:02 +0100, Roman Zippel wrote:
> Hi,
>
> On Wed, 30 Jan 2008, john stultz wrote:
>
> > My concern is we state the accumulation interval is X ns long. Then
> > current_tick_length() is to return (X + ntp_adjustment), so each
> > accumulation interval we can keep track of the error and adjust our
> > interval length.
> >
> > So if ntp_update_frequency() sets tick_length_base to be:
> >
> > u64 second_length = (u64)(tick_usec * NSEC_PER_USEC * USER_HZ)
> > << TICK_LENGTH_SHIFT;
> > second_length += (s64)CLOCK_TICK_ADJUST << TICK_LENGTH_SHIFT;
> > second_length += (s64)time_freq
> > << (TICK_LENGTH_SHIFT - SHIFT_NSEC);
> >
> > tick_length_base = second_length;
> > do_div(tick_length_base, NTP_INTERVAL_FREQ);
> >
> >
> > The above is basically (X + part of ntp_adjustment)
>
> CLOCK_TICK_ADJUST is based on LATCH and HZ, if the update frequency isn't
> based on HZ, there is no point in using it!

Hey Roman,

Again, I'm sorry I don't seem to be following your objections. If you
want to suggest a different patch to fix the issue, it might help.

The big issue for me, is that we have to initialize the clocksource
cycle interval so it matches the base tick_length that NTP uses.

To be clear, the issue I'm trying to address is only this:
Assuming there is no NTP adjustment yet to be made, if we initialize the
clocksource interval to X, then compare it with Y when we accumulate, we
introduce error if X and Y are not the same.

It really doesn't matter how long the length is, if we're including
CLOCK_TICK_ADJUST, or if it really matches the actual HZ tick length or
not. The issue is that we have to be consistent. If we're not, then we
introduce error that ntpd has to additionally correct for.

Now, once we're consistent, then CLOCK_TICK_ADJUST can be zero or not
and it won't really matter, other then making sure we accumulate at
exact tick length units. You can set it either way with my patch and
things will still work out to be correct.

My apologies for being so thick headed if I'm just missing something. I
do appreciate the feedback.
-john

2008-02-08 17:36:50

by Roman Zippel

[permalink] [raw]
Subject: Re: [PATCH] correct inconsistent ntp interval/tick_length usage

Hi,

On Fri, 1 Feb 2008, John Stultz wrote:

> > CLOCK_TICK_ADJUST is based on LATCH and HZ, if the update frequency isn't
> > based on HZ, there is no point in using it!
>
> Hey Roman,
>
> Again, I'm sorry I don't seem to be following your objections. If you
> want to suggest a different patch to fix the issue, it might help.

I already gave you the necessary details for how to set
NTP_INTERVAL_LENGTH and in the previous mail I explained the basis for it.
I really don't understand what's your problem with it. Why do you try to
make it more complex than necessary?

> The big issue for me, is that we have to initialize the clocksource
> cycle interval so it matches the base tick_length that NTP uses.
>
> To be clear, the issue I'm trying to address is only this:
> Assuming there is no NTP adjustment yet to be made, if we initialize the
> clocksource interval to X, then compare it with Y when we accumulate, we
> introduce error if X and Y are not the same.
>
> It really doesn't matter how long the length is, if we're including
> CLOCK_TICK_ADJUST, or if it really matches the actual HZ tick length or
> not. The issue is that we have to be consistent. If we're not, then we
> introduce error that ntpd has to additionally correct for.

You don't create consistency by adding corrections all over the place
until it adds up to the right sum.
The current correction is already somewhat of a hack and I'd rather get
rid of it than to let it spread all over the place (it's really only
needed so that people with weird HZ settings don't hit the 500ppm limit
and we're basically cheating to the ntpd by not telling it the real
frequency). Please keep the knowledge about this crutch at a single place
and don't spread it.
Anyway, for NO_HZ this correction is completely irrelevant, so again
there's no point in adding random values all over the place until you get
the correct result.

The only other alternative would be to calculate this correction
dynamically. For this you leave NTP_INTERVAL_LENGTH as is and when
changing clocks you check whether "abs(((cs->xtime_interval *
NTP_INTERVAL_FREQ) >> cs->shift) - NSEC_PER_SEC)" exceeds a certain limit
(e.g. 200usec) and in this case you print a warning message, that the
clock has large base drift value and is a bad ntp source and apply a
correction value. This way the correction only hits the very few system
which might need it and it would be the prefered solution, but it also
requires a few more changes.

bye, Roman

2008-02-09 02:17:52

by john stultz

[permalink] [raw]
Subject: Re: [PATCH] correct inconsistent ntp interval/tick_length usage


On Fri, 2008-02-08 at 18:33 +0100, Roman Zippel wrote:
> Hi,
>
> On Fri, 1 Feb 2008, John Stultz wrote:
>
> > > CLOCK_TICK_ADJUST is based on LATCH and HZ, if the update frequency isn't
> > > based on HZ, there is no point in using it!
> >
> > Hey Roman,
> >
> > Again, I'm sorry I don't seem to be following your objections. If you
> > want to suggest a different patch to fix the issue, it might help.
>
> I already gave you the necessary details for how to set
> NTP_INTERVAL_LENGTH and in the previous mail I explained the basis for it.

-ENOPATCH

We're taking weeks to critique fairly small bug fix. I'm sure we both
have better things to do then continue to misunderstand each other. I'll
do the testing and will happily ack it if it resolves the issue.

> I really don't understand what's your problem with it. Why do you try to
> make it more complex than necessary?

Again, I profusely apologize for not understanding your suggestions. I
do appreciate your feedback and I really respect your insistence on
getting this right, but I do not see exactly how your comments connect
to resolving the bug I'm trying to fix.

> > The big issue for me, is that we have to initialize the clocksource
> > cycle interval so it matches the base tick_length that NTP uses.
> >
> > To be clear, the issue I'm trying to address is only this:
> > Assuming there is no NTP adjustment yet to be made, if we initialize the
> > clocksource interval to X, then compare it with Y when we accumulate, we
> > introduce error if X and Y are not the same.
> >
> > It really doesn't matter how long the length is, if we're including
> > CLOCK_TICK_ADJUST, or if it really matches the actual HZ tick length or
> > not. The issue is that we have to be consistent. If we're not, then we
> > introduce error that ntpd has to additionally correct for.
>
> You don't create consistency by adding corrections all over the place
> until it adds up to the right sum.

Uh, I'd argue that you create consistency by using the same method on
both sides of the equation. That's what I'm trying to do.

Now, If you're disputing that I'm correcting the wrong side of the
equation, then we're getting somewhere. But its still not clear to me
how you're suggesting the other side (which is calculated in
ntp_update_frequency) be changed.

> The current correction is already somewhat of a hack and I'd rather get
> rid of it than to let it spread all over the place (it's really only
> needed so that people with weird HZ settings don't hit the 500ppm limit
> and we're basically cheating to the ntpd by not telling it the real
> frequency). Please keep the knowledge about this crutch at a single place
> and don't spread it.
> Anyway, for NO_HZ this correction is completely irrelevant, so again
> there's no point in adding random values all over the place until you get
> the correct result.

You keep on bringing up NO_HZ, and again, the bug I'm trying to fix
occurs with or without NO_HZ. The fix I proposed resolves the issue with
or without NO_HZ.

> The only other alternative would be to calculate this correction
> dynamically. For this you leave NTP_INTERVAL_LENGTH as is and when
> changing clocks you check whether "abs(((cs->xtime_interval *
> NTP_INTERVAL_FREQ) >> cs->shift) - NSEC_PER_SEC)" exceeds a certain limit
> (e.g. 200usec) and in this case you print a warning message, that the
> clock has large base drift value and is a bad ntp source and apply a
> correction value. This way the correction only hits the very few system
> which might need it and it would be the prefered solution, but it also
> requires a few more changes.

Uh, that seems to be just checking if the xtime_interval is off base, or
if the ntp correction has gone too far. I just don't see how this
connects to the issue at hand.

Just so we're current, I've refined the patch further and included it
below. It now addresses error caused by the intervals of adjusted
clocksource being recalculated on clocksource changes (and thus
including the adjustment in the base interval).

thanks
-john


Fix time skew caused by inconsistent use of NTP_INTERVAL_LENGTH and
current_tick_length(). Patch has three components:
1) Reverts the first version of this patch that made it upstream (while
not perfect, its still avoids the issue for most users).
2) Changes NTP_INTERVAL_LENGTH to be consistent with the calculations
made in ntp_update_frequency().
3) Splits the clocksource mult value into two parts: the original mult
component and the ntp adjusted mult_adj component. This allows us to
correctly recalculate the base xtime_intervals if users switch back and
forth between adjusted clocksources.

Signed-off-by: John Stultz <[email protected]>

diff --git a/arch/ia64/kernel/time.c b/arch/ia64/kernel/time.c
index 3ab0427..07cacc9 100644
--- a/arch/ia64/kernel/time.c
+++ b/arch/ia64/kernel/time.c
@@ -357,7 +357,7 @@ void update_vsyscall(struct timespec *wall, struct clocksource *c)

/* copy fsyscall clock data */
fsyscall_gtod_data.clk_mask = c->mask;
- fsyscall_gtod_data.clk_mult = c->mult;
+ fsyscall_gtod_data.clk_mult = c->mult + c->mult_adj;
fsyscall_gtod_data.clk_shift = c->shift;
fsyscall_gtod_data.clk_fsys_mmio = c->fsys_mmio;
fsyscall_gtod_data.clk_cycle_last = c->cycle_last;
diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index 3b26fbd..36aa8da 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -814,7 +814,7 @@ void update_vsyscall(struct timespec *wall_time, struct clocksource *clock)

/* XXX this assumes clock->shift == 22 */
/* 4611686018 ~= 2^(20+64-22) / 1e9 */
- t2x = (u64) clock->mult * 4611686018ULL;
+ t2x = (u64) (clock->mult + clock->mult_adj) * 4611686018ULL;
stamp_xsec = (u64) xtime.tv_nsec * XSEC_PER_SEC;
do_div(stamp_xsec, 1000000000);
stamp_xsec += (u64) xtime.tv_sec * XSEC_PER_SEC;
diff --git a/arch/x86/kernel/vsyscall_64.c b/arch/x86/kernel/vsyscall_64.c
index 3f82427..14afd48 100644
--- a/arch/x86/kernel/vsyscall_64.c
+++ b/arch/x86/kernel/vsyscall_64.c
@@ -83,7 +83,7 @@ void update_vsyscall(struct timespec *wall_time, struct clocksource *clock)
vsyscall_gtod_data.clock.vread = clock->vread;
vsyscall_gtod_data.clock.cycle_last = clock->cycle_last;
vsyscall_gtod_data.clock.mask = clock->mask;
- vsyscall_gtod_data.clock.mult = clock->mult;
+ vsyscall_gtod_data.clock.mult = clock->mult + clock->mult_adj;
vsyscall_gtod_data.clock.shift = clock->shift;
vsyscall_gtod_data.wall_time_sec = wall_time->tv_sec;
vsyscall_gtod_data.wall_time_nsec = wall_time->tv_nsec;
diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
index 85778a4..6719e50 100644
--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
@@ -63,6 +63,7 @@ struct clocksource {
cycle_t (*read)(void);
cycle_t mask;
u32 mult;
+ s32 mult_adj;
u32 shift;
unsigned long flags;
cycle_t (*vread)(void);
@@ -179,7 +180,7 @@ static inline cycle_t clocksource_read(struct clocksource *cs)
static inline s64 cyc2ns(struct clocksource *cs, cycle_t cycles)
{
u64 ret = (u64)cycles;
- ret = (ret * cs->mult) >> cs->shift;
+ ret = (ret * (cs->mult + cs->mult_adj)) >> cs->shift;
return ret;
}

diff --git a/include/linux/timex.h b/include/linux/timex.h
index 8ea3e71..66ae8dd 100644
--- a/include/linux/timex.h
+++ b/include/linux/timex.h
@@ -232,7 +232,13 @@ static inline int ntp_synced(void)
#else
#define NTP_INTERVAL_FREQ (HZ)
#endif
-#define NTP_INTERVAL_LENGTH (NSEC_PER_SEC/NTP_INTERVAL_FREQ)
+
+#define CLOCK_TICK_OVERFLOW (LATCH * HZ - CLOCK_TICK_RATE)
+#define CLOCK_TICK_ADJUST (((s64)CLOCK_TICK_OVERFLOW * NSEC_PER_SEC) / \
+ (s64)CLOCK_TICK_RATE)
+
+/* Becaues using NSEC_PER_SEC would be too easy */
+#define NTP_INTERVAL_LENGTH ((((s64)TICK_USEC*NSEC_PER_USEC*USER_HZ)+CLOCK_TICK_ADJUST)/NTP_INTERVAL_FREQ)

/* Returns how long ticks are at present, in ns / 2^(SHIFT_SCALE-10). */
extern u64 current_tick_length(void);
diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c
index e64efaf..c88b591 100644
--- a/kernel/time/ntp.c
+++ b/kernel/time/ntp.c
@@ -43,10 +43,6 @@ long time_freq; /* frequency offset (scaled ppm)*/
static long time_reftime; /* time at last adjustment (s) */
long time_adjust;

-#define CLOCK_TICK_OVERFLOW (LATCH * HZ - CLOCK_TICK_RATE)
-#define CLOCK_TICK_ADJUST (((s64)CLOCK_TICK_OVERFLOW * NSEC_PER_SEC) / \
- (s64)CLOCK_TICK_RATE)
-
static void ntp_update_frequency(void)
{
u64 second_length = (u64)(tick_usec * NSEC_PER_USEC * USER_HZ)
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index cd5dbc4..e319ecc 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -187,8 +187,7 @@ static void change_clocksource(void)

clock->error = 0;
clock->xtime_nsec = 0;
- clocksource_calculate_interval(clock,
- (unsigned long)(current_tick_length()>>TICK_LENGTH_SHIFT));
+ clocksource_calculate_interval(clock, NTP_INTERVAL_LENGTH);

tick_clock_notify();

@@ -245,8 +244,7 @@ void __init timekeeping_init(void)
ntp_clear();

clock = clocksource_get_next();
- clocksource_calculate_interval(clock,
- (unsigned long)(current_tick_length()>>TICK_LENGTH_SHIFT));
+ clocksource_calculate_interval(clock, NTP_INTERVAL_LENGTH);
clock->cycle_last = clocksource_read(clock);

xtime.tv_sec = sec;
@@ -426,7 +424,7 @@ static void clocksource_adjust(s64 offset)
} else
return;

- clock->mult += adj;
+ clock->mult_adj += adj;
clock->xtime_interval += interval;
clock->xtime_nsec -= offset;
clock->error -= (interval - offset) <<






2008-02-09 04:47:39

by Roman Zippel

[permalink] [raw]
Subject: Re: [PATCH] correct inconsistent ntp interval/tick_length usage

Hi,

On Fri, 8 Feb 2008, john stultz wrote:

>
> clock = clocksource_get_next();
> - clocksource_calculate_interval(clock,
> - (unsigned long)(current_tick_length()>>TICK_LENGTH_SHIFT));
> + clocksource_calculate_interval(clock, NTP_INTERVAL_LENGTH);
> clock->cycle_last = clocksource_read(clock);
>

Only now I noticed that the first patch had been merged without any
further question. :-(
What point is there in reviewing patches, if everything is merged anyway. :-(

bye, Roman

2008-02-09 05:12:59

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] correct inconsistent ntp interval/tick_length usage

On Sat, 9 Feb 2008 05:47:18 +0100 (CET) Roman Zippel <[email protected]> wrote:

> Hi,
>
> On Fri, 8 Feb 2008, john stultz wrote:
>
> >
> > clock = clocksource_get_next();
> > - clocksource_calculate_interval(clock,
> > - (unsigned long)(current_tick_length()>>TICK_LENGTH_SHIFT));
> > + clocksource_calculate_interval(clock, NTP_INTERVAL_LENGTH);
> > clock->cycle_last = clocksource_read(clock);
> >
>
> Only now I noticed that the first patch had been merged without any
> further question. :-(
> What point is there in reviewing patches, if everything is merged anyway. :-(
>

oops, mistake, sorry. There's plenty of time to fix it though.

2008-02-09 14:13:34

by Roman Zippel

[permalink] [raw]
Subject: Re: [PATCH] correct inconsistent ntp interval/tick_length usage

Hi,

On Fri, 8 Feb 2008, Andrew Morton wrote:

> > Only now I noticed that the first patch had been merged without any
> > further question. :-(
> > What point is there in reviewing patches, if everything is merged anyway. :-(
> >
>
> oops, mistake, sorry. There's plenty of time to fix it though.

It has been signed off by both Ingo and Thomas and neither noticed
anything? This makes me very afraid of the merging process...

bye, Roman

2008-02-10 18:52:18

by Roman Zippel

[permalink] [raw]
Subject: Re: [PATCH] correct inconsistent ntp interval/tick_length usage

Hi,

On Fri, 8 Feb 2008, john stultz wrote:

> -ENOPATCH
>
> We're taking weeks to critique fairly small bug fix. I'm sure we both
> have better things to do then continue to misunderstand each other. I'll
> do the testing and will happily ack it if it resolves the issue.

I don't want to just send a patch, I want you to understand why your
approach is wrong.

> Now, If you're disputing that I'm correcting the wrong side of the
> equation, then we're getting somewhere. But its still not clear to me
> how you're suggesting the other side (which is calculated in
> ntp_update_frequency) be changed.
> [..]
> You keep on bringing up NO_HZ, and again, the bug I'm trying to fix
> occurs with or without NO_HZ. The fix I proposed resolves the issue with
> or without NO_HZ.

The correction is incorrect for NO_HZ.
Let's try it the other way around, as my explanation seem to lack
something.
Please try to explain what this correction actually means and why it
should be correct for NO_HZ as well.

> > The only other alternative would be to calculate this correction
> > dynamically. For this you leave NTP_INTERVAL_LENGTH as is and when
> > changing clocks you check whether "abs(((cs->xtime_interval *
> > NTP_INTERVAL_FREQ) >> cs->shift) - NSEC_PER_SEC)" exceeds a certain limit
> > (e.g. 200usec) and in this case you print a warning message, that the
> > clock has large base drift value and is a bad ntp source and apply a
> > correction value. This way the correction only hits the very few system
> > which might need it and it would be the prefered solution, but it also
> > requires a few more changes.
>
> Uh, that seems to be just checking if the xtime_interval is off base, or
> if the ntp correction has gone too far. I just don't see how this
> connects to the issue at hand.

Above is the key to understanding the problem, if this difference is small
enough there is no need to correct anything.

This is the original patch which introduced the correction:

http://git.kernel.org/?p=linux/kernel/git/torvalds/old-2.6-bkcvs.git;a=commitdiff;h=69c1cd9218e4cf3016b0f435d6ef3dffb5a53860

Keep in mind that at that time PIT was used as the base clock (even if the
tsc was used, it was relative to PIT). So how much of those assumptions
are still valid today (especially with NO_HZ enabled)?

bye, Roman

2008-02-12 00:10:18

by john stultz

[permalink] [raw]
Subject: Re: [PATCH] correct inconsistent ntp interval/tick_length usage

On Sun, 2008-02-10 at 19:45 +0100, Roman Zippel wrote:
> Hi,
>
> On Fri, 8 Feb 2008, john stultz wrote:
>
> > -ENOPATCH
> >
> > We're taking weeks to critique fairly small bug fix. I'm sure we both
> > have better things to do then continue to misunderstand each other. I'll
> > do the testing and will happily ack it if it resolves the issue.
>
> I don't want to just send a patch, I want you to understand why your
> approach is wrong.

With all due respect, it also keeps the critique in one direction and
makes your review less collaborative and more confrontational then I
suspect (or maybe just hope) you intend.


> > Now, If you're disputing that I'm correcting the wrong side of the
> > equation, then we're getting somewhere. But its still not clear to me
> > how you're suggesting the other side (which is calculated in
> > ntp_update_frequency) be changed.
> > [..]
> > You keep on bringing up NO_HZ, and again, the bug I'm trying to fix
> > occurs with or without NO_HZ. The fix I proposed resolves the issue with
> > or without NO_HZ.
>
> The correction is incorrect for NO_HZ.
> Let's try it the other way around, as my explanation seem to lack
> something.
> Please try to explain what this correction actually means and why it
> should be correct for NO_HZ as well.


For the course of this discussion, lets assume we're talking about
2.6.24.

One of the goals of the timekeeping design is to let it be flexible
enough that accumulation interval can be irregular and it will still
behave correctly. This stems from the one of original issues with the
old tick based timekeeping: lost or irregular timer interrupts.

Now, we do used fixed interval accumulation in update_wall_time, but
that was for two reasons:
1) In the common case, its faster to do the compare and add/subtract
then the mult orignally used (*a)
2) It allows for the extra fine-grained NTP error accounting.


This fine grained error accounting is where the bug I'm trying to
address is cropping up from. In order to have the comparison we need to
have two values:
A: The clocksource's notion of how long the fixed interval is.
B: NTP's notion of how long the fixed interval is.

When no NTP adjustment is being made, these two values should be equal,
but currently they are not. This is what causes the 280ppm error seen on
my test system.

Part A is calculated in the following fashion:
#define NTP_INTERVAL_LENGTH (NSEC_PER_SEC/NTP_INTERVAL_FREQ)

Which is then functionally shifted up by TICK_LENGTH_SHIFT, but for the
course of this discussion, lets ignore that.

Part B is calculated in ntp_update_frequency() as:

u64 second_length = (u64)(tick_usec * NSEC_PER_USEC * USER_HZ)
<< TICK_LENGTH_SHIFT;
second_length += (s64)CLOCK_TICK_ADJUST << TICK_LENGTH_SHIFT;
second_length += (s64)time_freq << (TICK_LENGTH_SHIFT - SHIFT_NSEC);

tick_length_base = second_length;
do_div(tick_length_base, NTP_INTERVAL_FREQ);


If we're assuming there is no NTP adjustment, and avoiding the
TICK_LENGTH_SHIFT, this can be shorted to:
B = ((TICK_USEC * NSEC_PER_USEC * USER_HZ)
+ CLOCK_TICK_ADJUST)/NTP_ITNERVAL_FREQ


The A vs B comparison can be shortened further to:
NSEC_PER_SEC != (TICK_USEC * NSEC_PER_USEC * USER_HZ)
+ CLOCK_TICK_ADJUST

So now on to what seems to be the point of contention:
If A != B, which side do we fix?


My patches fix the A side so that it matches B, which on its face isn't
terribly complicated, but you seem to be suggesting we fix the B side
instead (Now I'm assuming here, because there's no patch. So I can only
speak to your emails, which were not clear to me).

Really it doesn't matter to me, as long as it works in all the cases
needed. You're argument of simplifying B so its closer to A does sound
initially compelling. Simpler is better, so lets follow the discussion
below to see if it will work, or if I'm just missing something.


*a: Note: this does backfire on us if the fixed interval is too small,
causing the loop to run too many times, which was seen when NO_HZ was
being merged. This was why we changed the NTP_INTERVAL_FREQ to be about
half a second w/ NO_HZ, which reduced the time spent in the accumulation
loop.


> This is the original patch which introduced the correction:
>
> http://git.kernel.org/?p=linux/kernel/git/torvalds/old-2.6-bkcvs.git;a=commitdiff;h=69c1cd9218e4cf3016b0f435d6ef3dffb5a53860
>
> Keep in mind that at that time PIT was used as the base clock (even if the
> tsc was used, it was relative to PIT). So how much of those assumptions
> are still valid today (especially with NO_HZ enabled)?

Indeed! When we use the TSC, acpi_pm, HPET or other fine grained
clocksource (regardless of NO_HZ), it does seem silly to take these odd
PIT based values into account.

However, what happens if a system uses the PIT or jiffies clocksource,
which do suffer from the HZ / ACTHZ error resolved by the patch linked
to above? Since those clocksources are so course, we still need to take
that error into account. So in that regard the assumptions are still
valid, no?

You've seemingly suggested we set CLOCK_TICK_ADJUST to zero in the
CONFIG_NO_HZ situation, however note that a CONFIG_NO_HZ kernel should
be able to boot and run on a box that only supports the jiffies or pit
clocksource (just not be able to switching into NO_HZ mode). Further,
how does that resolve the issue w/ the tsc/hpet/acpi_pm clocksources
that still exist when CONFIG_NO_HZ=n?

I just don't see how this will work.

Now, don't get me wrong, I'd love it if we didn't have to include
CLOCK_TICK_ADJUST, simpler is better, but I don't see how it will work
correctly.

So instead of trying to fix part B by removing the CLOCK_TICK_ADJUST
component, lets look at my solution, which adds the pit based
corrections to part A.

***This is the key part***: Since the hpet/acpi_pm/tsc are high res
clocksources, they can easily adapt to the request for a slightly odd
interval length, and it will not affect them. They are flexible like
that.

AGAIN: For high res clocksources, the interval length requested can be
almost(see note *a above, as well as hardware wrapping which puts some
limitations on us) arbitrary and the code should function correctly.

So CLOCK_TICK_ADJUST doesn't really mean anything in relation to
hpet/acpi_pm/tsc, but it also doesn't hurt us to include it.

But if, when using the same kernel, we switch to the jiffies
clocksource, we need to request a interval length that matches the
actual jiffy/pit tick length. This is why CLOCK_TICK_ADJUST is still
needed.

So by modifying part A, we achieve the consistency needed (that is
A==B), and we still function well with low-res clocksources. Further we
don't need any config time decisions to be made to function correctly.
It just all works out.

Now, I'm sure its quite likely I've somehow not understood your
suggestions. Maybe you are not suggesting adding lots of compile time
logic to try to even the B side out. Again, a patch would likely clarify
all of this.

thanks
-john

2008-02-12 14:45:22

by Roman Zippel

[permalink] [raw]
Subject: Re: [PATCH] correct inconsistent ntp interval/tick_length usage

Hi,

On Mon, 11 Feb 2008, john stultz wrote:

> > I don't want to just send a patch, I want you to understand why your
> > approach is wrong.
>
> With all due respect, it also keeps the critique in one direction and
> makes your review less collaborative and more confrontational then I
> suspect (or maybe just hope) you intend.

I don't think that's necessarily a contradiction, if we keep it to
confronting the problem. A simple patch wouldn't have provided any further
understanding of the problem compared to what I already said. You would
have seen what the patch does (which I described already differently), but
not really why it does that.

In this sense I prefer to force the confrontation of the problem. I'm
afraid a working patch would encourage to simply ignore the problem, as
your problem at hand would be solved without completely understanding it.
The point is that I'd really like you to understand the problem, so I'm
not the only one who understands this code :) and in the end it might
allow better collaboration to further improve this code.

To make it very clear this is just about understanding the problem, I
don't want to force a specific solution (which a patch would practically
do). If we both understand the problem, we can also discuss the solution
and maybe we find something better, but maybe I'm also totally wrong,
which would be a little embarrassing :), but that would be fine too. There
may be better ways to go about this problem, but IMO it would still be
better than just ignoring the problem and force it with a patch.

> This fine grained error accounting is where the bug I'm trying to
> address is cropping up from. In order to have the comparison we need to
> have two values:
> A: The clocksource's notion of how long the fixed interval is.
> B: NTP's notion of how long the fixed interval is.
>
> When no NTP adjustment is being made, these two values should be equal,
> but currently they are not. This is what causes the 280ppm error seen on
> my test system.
>
> Part A is calculated in the following fashion:
> #define NTP_INTERVAL_LENGTH (NSEC_PER_SEC/NTP_INTERVAL_FREQ)
>
> Which is then functionally shifted up by TICK_LENGTH_SHIFT, but for the
> course of this discussion, lets ignore that.
>
> Part B is calculated in ntp_update_frequency() as:
>
> u64 second_length = (u64)(tick_usec * NSEC_PER_USEC * USER_HZ)
> << TICK_LENGTH_SHIFT;
> second_length += (s64)CLOCK_TICK_ADJUST << TICK_LENGTH_SHIFT;
> second_length += (s64)time_freq << (TICK_LENGTH_SHIFT - SHIFT_NSEC);
>
> tick_length_base = second_length;
> do_div(tick_length_base, NTP_INTERVAL_FREQ);
>
>
> If we're assuming there is no NTP adjustment, and avoiding the
> TICK_LENGTH_SHIFT, this can be shorted to:
> B = ((TICK_USEC * NSEC_PER_USEC * USER_HZ)
> + CLOCK_TICK_ADJUST)/NTP_ITNERVAL_FREQ
>
>
> The A vs B comparison can be shortened further to:
> NSEC_PER_SEC != (TICK_USEC * NSEC_PER_USEC * USER_HZ)
> + CLOCK_TICK_ADJUST
>
> So now on to what seems to be the point of contention:
> If A != B, which side do we fix?
>
>
> My patches fix the A side so that it matches B, which on its face isn't
> terribly complicated, but you seem to be suggesting we fix the B side
> instead (Now I'm assuming here, because there's no patch. So I can only
> speak to your emails, which were not clear to me).

If we go from your base assumption above "there is no NTP adjustment", I
would actually agree with you and it wouldn't matter much on which side
to correct the equation.

The question is now what happens, if there are NTP adjustments, i.e. when
the time_freq value isn't zero. Based on this initialization we tell the
NTP daemon the base frequency, although not directly but it knows the
length freq1 == 1 sec. If the clock now needs adjustment, the NTP daemon
tells the kernel via time_freq how to change the speed so that freq1 ==
1 sec + time_freq.

The problem is now that by using CLOCK_TICK_ADJUST we are cheating and we
don't tell the NTP daemon the real frequency. We define 1 sec as freq2 +
tick_adjust and with the NTP adjustment we have freq2 + tick_adj == 1 sec
+ time_freq. Above initialization now calcalutes the base time length for
an update interval of freq2 / NTP_INTERVAL_FREQ, this means the requested
time_freq adjustment isn't applied to (freq2 + tick_adj) cycles but to
freq2 cycles, so this finally means any adjustment made by the NTP daemon
is slightly off.

To demonstrate this let's look at some real values and let's use the PIT
for it (as this is where this originated and on which CLOCK_TICK_ADJUST is
based on). With freq1=1193182 and HZ=1000 we program the timer with 1193
cycles and the actual update frequency is freq2=1193000. To adjust for
this difference we change the length of a timer tick:

(NSEC_PER_SEC + CLOCK_TICK_ADJUST) / NTP_INTERVAL_FREQ

or

(10^9 nsec - 152533 nsec) / 1000

This way every 1000 interrupts or 1193000 cycles the clock is advanced by
999847467 nsec, so that we advance time according to freq1, but any
further adjustments aren't applied to an interval of what the NTP daemon
thinks is 1 sec but 999847467 nsec instead.

In consequence this means, if we want to improve timekeeping, we first set
the (update_cycles * NTP_INTERVAL_FREQ) interval as close as possible to
the real frequency. It doesn't has to be perfect as we usually don't know
the real frequency with 100% certainty anyway. Second, we drop the tick
adjustment if possible and leave the adjustments to the NTP daemon and as
long as the drift is within the 500ppm limit it has no problem to manage
this.

> However, what happens if a system uses the PIT or jiffies clocksource,
> which do suffer from the HZ / ACTHZ error resolved by the patch linked
> to above? Since those clocksources are so course, we still need to take
> that error into account. So in that regard the assumptions are still
> valid, no?
>
> You've seemingly suggested we set CLOCK_TICK_ADJUST to zero in the
> CONFIG_NO_HZ situation, however note that a CONFIG_NO_HZ kernel should
> be able to boot and run on a box that only supports the jiffies or pit
> clocksource (just not be able to switching into NO_HZ mode). Further,
> how does that resolve the issue w/ the tsc/hpet/acpi_pm clocksources
> that still exist when CONFIG_NO_HZ=n?
>
> I just don't see how this will work.

Look at your Part A, with NO_HZ the update interval is much larger, so any
rounding error becomes practically irrelevant. In the above PIT example
the update interval wouldn't be 1193 cycles but 596591 cycles instead.

I hope this helps to understand what I suggested. For NO_HZ we can just
drop this correction without problems. In the other case it's a little
more complicated, but we basically have to check the update interval and
if (interval * NTP_INTERVAL_FREQ) is within the 500ppm limit there isn't
much we really have to do.
This correction was only introduced to accommodate clocks which exceed
this 500ppm limit for whatever reason, so the NTP daemon isn't confused.
The best solution would be really to limit this correction to these
clocks.

bye, Roman

2008-02-14 04:37:00

by john stultz

[permalink] [raw]
Subject: Re: [PATCH] correct inconsistent ntp interval/tick_length usage

On Tue, 2008-02-12 at 15:36 +0100, Roman Zippel wrote:
> On Mon, 11 Feb 2008, john stultz wrote:
> > This fine grained error accounting is where the bug I'm trying to
> > address is cropping up from. In order to have the comparison we need to
> > have two values:
> > A: The clocksource's notion of how long the fixed interval is.
> > B: NTP's notion of how long the fixed interval is.
> >
> > When no NTP adjustment is being made, these two values should be equal,
> > but currently they are not. This is what causes the 280ppm error seen on
> > my test system.
> >
> > Part A is calculated in the following fashion:
> > #define NTP_INTERVAL_LENGTH (NSEC_PER_SEC/NTP_INTERVAL_FREQ)
> >
> > Which is then functionally shifted up by TICK_LENGTH_SHIFT, but for the
> > course of this discussion, lets ignore that.
> >
> > Part B is calculated in ntp_update_frequency() as:
> >
> > u64 second_length = (u64)(tick_usec * NSEC_PER_USEC * USER_HZ)
> > << TICK_LENGTH_SHIFT;
> > second_length += (s64)CLOCK_TICK_ADJUST << TICK_LENGTH_SHIFT;
> > second_length += (s64)time_freq << (TICK_LENGTH_SHIFT - SHIFT_NSEC);
> >
> > tick_length_base = second_length;
> > do_div(tick_length_base, NTP_INTERVAL_FREQ);
> >
> >
> > If we're assuming there is no NTP adjustment, and avoiding the
> > TICK_LENGTH_SHIFT, this can be shorted to:
> > B = ((TICK_USEC * NSEC_PER_USEC * USER_HZ)
> > + CLOCK_TICK_ADJUST)/NTP_ITNERVAL_FREQ
> >
> >
> > The A vs B comparison can be shortened further to:
> > NSEC_PER_SEC != (TICK_USEC * NSEC_PER_USEC * USER_HZ)
> > + CLOCK_TICK_ADJUST
> >
> > So now on to what seems to be the point of contention:
> > If A != B, which side do we fix?
> >
> >
> > My patches fix the A side so that it matches B, which on its face isn't
> > terribly complicated, but you seem to be suggesting we fix the B side
> > instead (Now I'm assuming here, because there's no patch. So I can only
> > speak to your emails, which were not clear to me).
>
> If we go from your base assumption above "there is no NTP adjustment", I
> would actually agree with you and it wouldn't matter much on which side
> to correct the equation.
>
> The question is now what happens, if there are NTP adjustments, i.e. when
> the time_freq value isn't zero. Based on this initialization we tell the
> NTP daemon the base frequency, although not directly but it knows the
> length freq1 == 1 sec. If the clock now needs adjustment, the NTP daemon
> tells the kernel via time_freq how to change the speed so that freq1 ==
> 1 sec + time_freq.
>
> The problem is now that by using CLOCK_TICK_ADJUST we are cheating and we
> don't tell the NTP daemon the real frequency. We define 1 sec as freq2 +
> tick_adjust and with the NTP adjustment we have freq2 + tick_adj == 1 sec
> + time_freq. Above initialization now calcalutes the base time length for
> an update interval of freq2 / NTP_INTERVAL_FREQ, this means the requested
> time_freq adjustment isn't applied to (freq2 + tick_adj) cycles but to
> freq2 cycles, so this finally means any adjustment made by the NTP daemon
> is slightly off.

Oh! So your issue is that since time_freq is stored in ppm, or in effect
a usecs per sec offset, when we add it to something other then 1 second
we mis-apply what NTP tells us to. Is that right?


> To demonstrate this let's look at some real values and let's use the PIT
> for it (as this is where this originated and on which CLOCK_TICK_ADJUST is
> based on). With freq1=1193182 and HZ=1000 we program the timer with 1193
> cycles and the actual update frequency is freq2=1193000. To adjust for
> this difference we change the length of a timer tick:
>
> (NSEC_PER_SEC + CLOCK_TICK_ADJUST) / NTP_INTERVAL_FREQ
>
> or
>
> (10^9 nsec - 152533 nsec) / 1000
>
> This way every 1000 interrupts or 1193000 cycles the clock is advanced by
> 999847467 nsec, so that we advance time according to freq1, but any
> further adjustments aren't applied to an interval of what the NTP daemon
> thinks is 1 sec but 999847467 nsec instead.

Right, so if NTP has us apply a 10ppm adjustment, instead of doing:
NSEC_PER_SEC + 10,000

We're doing:
NSEC_PER_SEC + CLOCK_TICK_ADJUST + 10,000

Which, if I'm doing my math right, results in a 10.002ppm adjustment
(using the 999847467ns number above), instead of just a 10ppm
adjustment.

Now, true, this is an error, but it is a pretty small one. Even at the
maximum 500ppm value, it only results in an error of 76 parts per
billion. As you'll see below, that tends to be less error then what we
get from the clock granularity. Is there something else I'm missing here
or is this really the core issue you're concerned with?

If not, I'll agree that we may need to tune the "B" side of the
equation, but can we also agree that the very large error caused, even
without NTP adjustments, being involved by the A != B can be addressed
separately. That way if A is calculated in the same way as B, when we
fix B, we will at the same time fix A. Does that sound like a step
forward?

> In consequence this means, if we want to improve timekeeping, we first set
> the (update_cycles * NTP_INTERVAL_FREQ) interval as close as possible to
> the real frequency. It doesn't has to be perfect as we usually don't know
> the real frequency with 100% certainty anyway.

This might need some more explanation, as I'm not certain I know what
update_cycles refers to. Do you mean cycle_interval? I guess I'm not
completely sure how you're suggesting we change things here.

> Second, we drop the tick
> adjustment if possible and leave the adjustments to the NTP daemon and as
> long as the drift is within the 500ppm limit it has no problem to manage
> this.

Dropping the tick adjustment? By that do you mean the tick_usec value
set by adjtimex()? I don't quite see why we want that. Could you expand
here?


> > However, what happens if a system uses the PIT or jiffies clocksource,
> > which do suffer from the HZ / ACTHZ error resolved by the patch linked
> > to above? Since those clocksources are so course, we still need to take
> > that error into account. So in that regard the assumptions are still
> > valid, no?
> >
> > You've seemingly suggested we set CLOCK_TICK_ADJUST to zero in the
> > CONFIG_NO_HZ situation, however note that a CONFIG_NO_HZ kernel should
> > be able to boot and run on a box that only supports the jiffies or pit
> > clocksource (just not be able to switching into NO_HZ mode). Further,
> > how does that resolve the issue w/ the tsc/hpet/acpi_pm clocksources
> > that still exist when CONFIG_NO_HZ=n?
> >
> > I just don't see how this will work.
>
> Look at your Part A, with NO_HZ the update interval is much larger, so any
> rounding error becomes practically irrelevant. In the above PIT example
> the update interval wouldn't be 1193 cycles but 596591 cycles instead.

Ok, so sitting down and doing the math, I see for the pit clocksource
your point is correct. For relatively fine grained clocksources (the pit
having ~900ns resolution) the larger the cycle_interval gets, the
smaller the error becomes. So when the NTP_INTERVAL_FREQ is 2 instead of
HZ, the error does get quite small, even if we don't add in the
CLOCK_TICK_ADJUST factor.

I think here's where the contention has been, as I've been using the
jiffies clocksource in my head for the previous back-and-forths. The
jiffies clocksource is different, as due to its coarseness, the normal
cycles_interval is only 1. So dropping NTP_INTERVAL_FREQ from HZ to 2
doesn't improve the error, it basically stays constant.

I sat down and ran through some numbers, seeing how the error rates
changed as HZ changed, with and without CLOCK_TICK_ADJUST and how NO_HZ
effects it as well. The .c file is attached, but here is the output for
jiffies, pit, and acpi_pm:


HZ=1000 CLOCK_TICK_ADJUST=-152533
jiffies 467 ppb error
jiffies NOHZ 467 ppb error
pit 0 ppb error
pit NOHZ 0 ppb error
acpi_pm -280 ppb error
acpi_pm NOHZ 279 ppb error

HZ=100 CLOCK_TICK_ADJUST=15085
jiffies 15085 ppb error
jiffies NOHZ 15085 ppb error
pit -1 ppb error
pit NOHZ -1 ppb error
acpi_pm -281 ppb error
acpi_pm NOHZ 278 ppb error

HZ=250 CLOCK_TICK_ADJUST=56990
jiffies -5510 ppb error
jiffies NOHZ -5510 ppb error
pit 0 ppb error
pit NOHZ 0 ppb error
acpi_pm -281 ppb error
acpi_pm NOHZ 278 ppb error

HZ=300 CLOCK_TICK_ADJUST=-68723
jiffies -3523 ppb error
jiffies NOHZ -3523 ppb error
pit 1 ppb error
pit NOHZ 1 ppb error
acpi_pm -279 ppb error
acpi_pm NOHZ 279 ppb error

HZ=1000 CLOCK_TICK_ADJUST=0
jiffies 153000 ppb error
jiffies NOHZ 153000 ppb error
pit 152533 ppb error
pit NOHZ 0 ppb error
acpi_pm -127112 ppb error
acpi_pm NOHZ 279 ppb error

HZ=100 CLOCK_TICK_ADJUST=0
jiffies 0 ppb error
jiffies NOHZ 0 ppb error
pit -15086 ppb error
pit NOHZ 0 ppb error
acpi_pm 12571 ppb error
acpi_pm NOHZ 279 ppb error

HZ=250 CLOCK_TICK_ADJUST=0
jiffies -62500 ppb error
jiffies NOHZ -62500 ppb error
pit -56990 ppb error
pit NOHZ 0 ppb error
acpi_pm 12571 ppb error
acpi_pm NOHZ 279 ppb error

HZ=300 CLOCK_TICK_ADJUST=0
jiffies 65200 ppb error
jiffies NOHZ 65200 ppb error
pit 68724 ppb error
pit NOHZ 0 ppb error
acpi_pm -15366 ppb error
acpi_pm NOHZ 279 ppb error

So you are right, w/ pit & NO_HZ, the granularity error is always very
small both with or without CLOCK_TICK_ADJUST.

However, without CLOCK_TICK_ADJUST, the jiffies error increases for all
values of HZ except 100 (which at first seems odd, but seems to be due
to loss from rounding in the ACTHZ calculation).

One interesting surprise in the data: With CLOCK_TICK_ADJUST=0, the
acpi_pm's error frequency shot up in the !NO_HZ cases. This ends up
being due to the acpi_pm being a very close to a multiple (3x) of the
pit frequency, so CLOCK_TICK_ADJUST helps it as well.


> I hope this helps to understand what I suggested. For NO_HZ we can just
> drop this correction without problems. In the other case it's a little
> more complicated, but we basically have to check the update interval and
> if (interval * NTP_INTERVAL_FREQ) is within the 500ppm limit there isn't
> much we really have to do.

Well, if it weren't for the jiffies clocksource error, to correct for
the sub 100ppb error, I'd agree with you to #define CLOCK_TICK_ADJUST 0
for CONFIG_NO_HZ. But until that's worked out I'm not sure.

Further it seems to point that if we are going to be chasing down small
sub-100ppb errors (which I think would be great to do, but lets not make
users to endure 200+ppm errors while we debate the fine-tuning :) we
might want to consider a method where we let ntp_update_freq take into
account the current clocksource's interval length, so it becomes the
base value against which we apply adjustments (scaled appropriately).


Phew, this is turning into a long mail. So let's summarize:

There are 3 sources of error that we've discussed here:
1) The large (280ppm) error seen with no-NTP adjustment, caused by the
inconsistent (A!=B) interval comparisons which started this discussion,
which my patch does address.

2) The medium (153-0ppm)clocksource granularity error shown in the data
above, caused by the actual interval length not matching the requested
interval length (what CLOCK_TICK_ADJUST tries to compensate for when
using jiffies/PIT).

3) The smaller (0.076ppm max) NTP adjustment error caused by the
parts-per-billion == nsec-per-sec shortcut used in combination w/ a base
interval that is not actually a second (due to CLOCK_TICK_ADJUST being
added in) in ntp_update_frequency()

All of these are somewhat interdependent. One could consider that #1 is
in part due to #2, and #2 also causes #3, as there isn't proper scaling
being done in #3.

Fixing all of these in one swoop would be great, and I think we should
spend time continuing to refine the code. However, I'm just not sure
that we have a solution at this point for all the cases we have to
consider.

So given I do want to continue this discussion and work, would it be
possible that I fix up the #1 item above (including resolving the double
conflicting merges that have already gone in) as implemented in my last
patch, without your objection?

Again, I really appreciate you're continued deep observations and
feedback here.

thanks
-john


Attachments:
error-calc.c (2.26 kB)

2008-02-16 04:24:21

by Roman Zippel

[permalink] [raw]
Subject: Re: [PATCH] correct inconsistent ntp interval/tick_length usage

Hi,

On Wed, 13 Feb 2008, john stultz wrote:

> Oh! So your issue is that since time_freq is stored in ppm, or in effect
> a usecs per sec offset, when we add it to something other then 1 second
> we mis-apply what NTP tells us to. Is that right?

Pretty much everything is centered around that 1sec, so the closer the
update frequency is to it the better.

> Right, so if NTP has us apply a 10ppm adjustment, instead of doing:
> NSEC_PER_SEC + 10,000
>
> We're doing:
> NSEC_PER_SEC + CLOCK_TICK_ADJUST + 10,000
>
> Which, if I'm doing my math right, results in a 10.002ppm adjustment
> (using the 999847467ns number above), instead of just a 10ppm
> adjustment.
>
> Now, true, this is an error, but it is a pretty small one. Even at the
> maximum 500ppm value, it only results in an error of 76 parts per
> billion. As you'll see below, that tends to be less error then what we
> get from the clock granularity. Is there something else I'm missing here
> or is this really the core issue you're concerned with?

The error accumulates and there is no good reason to do this for the
common case.

> > In consequence this means, if we want to improve timekeeping, we first set
> > the (update_cycles * NTP_INTERVAL_FREQ) interval as close as possible to
> > the real frequency. It doesn't has to be perfect as we usually don't know
> > the real frequency with 100% certainty anyway.
>
> This might need some more explanation, as I'm not certain I know what
> update_cycles refers to. Do you mean cycle_interval? I guess I'm not
> completely sure how you're suggesting we change things here.

clock->cycle_interval

> > Second, we drop the tick
> > adjustment if possible and leave the adjustments to the NTP daemon and as
> > long as the drift is within the 500ppm limit it has no problem to manage
> > this.
>
> Dropping the tick adjustment? By that do you mean the tick_usec value
> set by adjtimex()? I don't quite see why we want that. Could you expand
> here?

CLOCK_TICK_ADJUST

> HZ=1000 CLOCK_TICK_ADJUST=-152533
> jiffies 467 ppb error
> jiffies NOHZ 467 ppb error
> pit 0 ppb error
> pit NOHZ 0 ppb error
> acpi_pm -280 ppb error
> acpi_pm NOHZ 279 ppb error
>
> HZ=1000 CLOCK_TICK_ADJUST=0
> jiffies 153000 ppb error
> jiffies NOHZ 153000 ppb error
> pit 152533 ppb error
> pit NOHZ 0 ppb error
> acpi_pm -127112 ppb error
> acpi_pm NOHZ 279 ppb error
>
> So you are right, w/ pit & NO_HZ, the granularity error is always very
> small both with or without CLOCK_TICK_ADJUST.

If you change the frequency of acpi_pm to 3579000 you'll get this:

HZ=1000 CLOCK_TICK_ADJUST=0
jiffies 153000 ppb error
jiffies NOHZ 153000 ppb error
pit 152533 ppb error
pit NOHZ 0 ppb error
acpi_pm 0 ppb error
acpi_pm NOHZ 0 ppb error

HZ=1000 CLOCK_TICK_ADJUST=-152533
jiffies 0 ppb error
jiffies NOHZ 466 ppb error
pit -467 ppb error
pit NOHZ -1 ppb error
acpi_pm 126407 ppb error
acpi_pm NOHZ 22 ppb error

CLOCK_TICK_ADJUST has only any meaning for PIT (and indirectly for
jiffies). For every other clock you just add some random value, where
it doesn't do _any_ good.
The worst case error there will always be (ntp_hz/freq/2*10^9nsec), all
you do with CLOCK_TICK_ADJUST is to do shift it around, but it doesn't
actually fix the error - it's still there.

> However, without CLOCK_TICK_ADJUST, the jiffies error increases for all
> values of HZ except 100 (which at first seems odd, but seems to be due
> to loss from rounding in the ACTHZ calculation).

jiffies depends on the timer resolution, so it will practically produce
the same results as PIT (assuming it's used to generate the timer tick).

> One interesting surprise in the data: With CLOCK_TICK_ADJUST=0, the
> acpi_pm's error frequency shot up in the !NO_HZ cases. This ends up
> being due to the acpi_pm being a very close to a multiple (3x) of the
> pit frequency, so CLOCK_TICK_ADJUST helps it as well.

What exactly does it help with?
All you are doing is number cosmetics, it has _no_ practically value and
only decreases the quality of timekeeping.

> Further it seems to point that if we are going to be chasing down small
> sub-100ppb errors (which I think would be great to do, but lets not make
> users to endure 200+ppm errors while we debate the fine-tuning :) we
> might want to consider a method where we let ntp_update_freq take into
> account the current clocksource's interval length, so it becomes the
> base value against which we apply adjustments (scaled appropriately).

The error at least is real, the use value of CLOCK_TICK_ADJUST for the
common case is not existent.

> There are 3 sources of error that we've discussed here:
> 1) The large (280ppm) error seen with no-NTP adjustment, caused by the
> inconsistent (A!=B) interval comparisons which started this discussion,
> which my patch does address.

Part of the error is caused by CLOCK_TICK_ADJUST, but the other part of
the error is real, all you do is hiding it.

> 2) The medium (153-0ppm)clocksource granularity error shown in the data
> above, caused by the actual interval length not matching the requested
> interval length (what CLOCK_TICK_ADJUST tries to compensate for when
> using jiffies/PIT).

To be precise: CLOCK_TICK_ADJUST doesn't compensate anything, it only
hides the error for jiffies/PIT. In any other case it's just some random
value added to it.

> 3) The smaller (0.076ppm max) NTP adjustment error caused by the
> parts-per-billion == nsec-per-sec shortcut used in combination w/ a base
> interval that is not actually a second (due to CLOCK_TICK_ADJUST being
> added in) in ntp_update_frequency()

The maximum is not that trivial, e.g. alpha has a CLOCK_TICK_RATE value of
32768 and has two possible HZ values 1024/1200. Everything is fine with
1024, but with 1200 you create an error of ~11230ppm.

In the alpha case you have the interesting problem, that the timer tick is
generated by the RTC (if I see it correctly). If the alpha used clock
sources, what value should CLOCK_TICK_RATE have? Which of the three
possible clocks do you want to adjust - the jiffies, the PIT or the cpu
cycle clock?

That's the other big problem with CLOCK_TICK_RATE, for many archs it's
just some random value and in some unlucky configuration it now may do
more harm than it does any good. It had some value when process timer were
jiffies based, but since hrtimer that reason is gone and in the NTP code
it does more harm than good.

bye, Roman

2008-02-19 01:02:33

by john stultz

[permalink] [raw]
Subject: Re: [PATCH] correct inconsistent ntp interval/tick_length usage


On Sat, 2008-02-16 at 05:24 +0100, Roman Zippel wrote:
> Hi,
>
> On Wed, 13 Feb 2008, john stultz wrote:
>
> > Oh! So your issue is that since time_freq is stored in ppm, or in effect
> > a usecs per sec offset, when we add it to something other then 1 second
> > we mis-apply what NTP tells us to. Is that right?
>
> Pretty much everything is centered around that 1sec, so the closer the
> update frequency is to it the better.
>
> > Right, so if NTP has us apply a 10ppm adjustment, instead of doing:
> > NSEC_PER_SEC + 10,000
> >
> > We're doing:
> > NSEC_PER_SEC + CLOCK_TICK_ADJUST + 10,000
> >
> > Which, if I'm doing my math right, results in a 10.002ppm adjustment
> > (using the 999847467ns number above), instead of just a 10ppm
> > adjustment.
> >
> > Now, true, this is an error, but it is a pretty small one. Even at the
> > maximum 500ppm value, it only results in an error of 76 parts per
> > billion. As you'll see below, that tends to be less error then what we
> > get from the clock granularity. Is there something else I'm missing here
> > or is this really the core issue you're concerned with?
>
> The error accumulates and there is no good reason to do this for the
> common case.

I agree, but we can also easily resolve this by scaling the ppm
adjustment to the interval length, rather then just applying it as
usec/sec. No?

So instead of:
adjusted_length = NSEC_PER_SEC + time_adj

We could do:
adjusted_length = ntp_interval_length +
(ntp_interval_length * time_adj)/MILLION


So this fixes the time_adj scaling error, while letting us be more
flexible w/ our interval length, so we can better map the requested
length to the clocksource granularity, lessening the granularity error.


> > HZ=1000 CLOCK_TICK_ADJUST=-152533
> > jiffies 467 ppb error
> > jiffies NOHZ 467 ppb error
> > pit 0 ppb error
> > pit NOHZ 0 ppb error
> > acpi_pm -280 ppb error
> > acpi_pm NOHZ 279 ppb error
> >
> > HZ=1000 CLOCK_TICK_ADJUST=0
> > jiffies 153000 ppb error
> > jiffies NOHZ 153000 ppb error
> > pit 152533 ppb error
> > pit NOHZ 0 ppb error
> > acpi_pm -127112 ppb error
> > acpi_pm NOHZ 279 ppb error
> >
> > So you are right, w/ pit & NO_HZ, the granularity error is always very
> > small both with or without CLOCK_TICK_ADJUST.
>
> If you change the frequency of acpi_pm to 3579000 you'll get this:
>
> HZ=1000 CLOCK_TICK_ADJUST=0
> jiffies 153000 ppb error
> jiffies NOHZ 153000 ppb error
> pit 152533 ppb error
> pit NOHZ 0 ppb error
> acpi_pm 0 ppb error
> acpi_pm NOHZ 0 ppb error
>
> HZ=1000 CLOCK_TICK_ADJUST=-152533
> jiffies 0 ppb error
> jiffies NOHZ 466 ppb error
> pit -467 ppb error
> pit NOHZ -1 ppb error
> acpi_pm 126407 ppb error
> acpi_pm NOHZ 22 ppb error

Right, but the acpi_pm's frequency isn't 3579000. It's supposed to be
3579545. That is injecting error, and I believe it to be different then
what I'm claiming is achieved by CLOCK_TICK_ADJUST.

Further, the specific example above was agreeing with you that that
CLOCK_TICK_ADJUST=0 looks ok for NOHZ, with the exception in the case of
the jiffies clocksource.

That one still has the 153ppm drift. What do we do about that?


> CLOCK_TICK_ADJUST has only any meaning for PIT (and indirectly for
> jiffies). For every other clock you just add some random value, where
> it doesn't do _any_ good.

While not the main point, the aside I included about the acpi_pm being
interesting, is that because the ACPI PM's frequency is a multiple of
the PIT's (and thus jiffies), the same granularity issues arise
(although to a lesser degree). That is why CLOCK_TICK_ADJUST helps it in
the !NOHZ case.

> The worst case error there will always be (ntp_hz/freq/2*10^9nsec), all
> you do with CLOCK_TICK_ADJUST is to do shift it around, but it doesn't
> actually fix the error - it's still there.
>
> > However, without CLOCK_TICK_ADJUST, the jiffies error increases for all
> > values of HZ except 100 (which at first seems odd, but seems to be due
> > to loss from rounding in the ACTHZ calculation).
>
> jiffies depends on the timer resolution, so it will practically produce
> the same results as PIT (assuming it's used to generate the timer tick).
>
> > One interesting surprise in the data: With CLOCK_TICK_ADJUST=0, the
> > acpi_pm's error frequency shot up in the !NO_HZ cases. This ends up
> > being due to the acpi_pm being a very close to a multiple (3x) of the
> > pit frequency, so CLOCK_TICK_ADJUST helps it as well.
>
> What exactly does it help with?
> All you are doing is number cosmetics, it has _no_ practically value and
> only decreases the quality of timekeeping.

Unless you want to clarify what aspect of "quality" you're talking
about, I'd very much disagree with your claim. This is not cosmetics,
this is about addressing granularity error.

By using a base interval that does not match the clocksource's natural
granularity, we effectively inject a per-interval error. Since the error
is tracked and compensated by the clocksource_adjust() function, we end
up with a incorrect clocksource frequency. This then has to be
discovered and compensated for by NTP, in addition to the actual
hardware error.

By using a base interval that does match the clocksource's natural
granularity, we avoid the per-interval error. Without this per-interval
error, we utilize the specified frequency of the hardware, and NTP only
has to correct for the actual hardware error.

If we are building a train station, and each train car is 60ft, it
doesn't make much sense to build 1000ft stations, right? Instead you'll
do better if you build a 1020ft station.

That's what CLOCK_TICK_ADJ is trying to address.

Now, I'm open to doing something other then just adding CLOCK_TICK_ADJ.

For instance, if we find a way to push the initial xtime_interval
(without clocksource_adjust()'s adjustments - see the mult-adj-split
patch from my patchset friday) back to the ntp code so it uses that as
its interval base, and then use the time_adj scaling pointed out above.
That I think would work.


> > Further it seems to point that if we are going to be chasing down small
> > sub-100ppb errors (which I think would be great to do, but lets not make
> > users to endure 200+ppm errors while we debate the fine-tuning :) we
> > might want to consider a method where we let ntp_update_freq take into
> > account the current clocksource's interval length, so it becomes the
> > base value against which we apply adjustments (scaled appropriately).
>
> The error at least is real, the use value of CLOCK_TICK_ADJUST for the
> common case is not existent.

Again, I disagree.

> > There are 3 sources of error that we've discussed here:
> > 1) The large (280ppm) error seen with no-NTP adjustment, caused by the
> > inconsistent (A!=B) interval comparisons which started this discussion,
> > which my patch does address.
>
> Part of the error is caused by CLOCK_TICK_ADJUST, but the other part of
> the error is real, all you do is hiding it.

Explain hiding it. I'm just making sure we do equal comparisons. If
ntp_update_frequency() is using CLOCK_TICK_ADJUST, so should the
clocksource_calculate_interval() functions.

And yes, if we remove CLOCK_TICK_ADJUST, that would also resolve the
(A!=B) issue, but it doesn't address the error from #2 below.


> > 2) The medium (153-0ppm)clocksource granularity error shown in the data
> > above, caused by the actual interval length not matching the requested
> > interval length (what CLOCK_TICK_ADJUST tries to compensate for when
> > using jiffies/PIT).
>
> To be precise: CLOCK_TICK_ADJUST doesn't compensate anything, it only
> hides the error for jiffies/PIT. In any other case it's just some random
> value added to it.

Wrong. We're using a base interval that maps to the granularity of the
PIT/jiffies clocksource. It doesn't matter too much for other
clocksources, as they're fine grained enough to not really matter, esp
if NTP_INTERVAL_FREQ is 2.


> > 3) The smaller (0.076ppm max) NTP adjustment error caused by the
> > parts-per-billion == nsec-per-sec shortcut used in combination w/ a base
> > interval that is not actually a second (due to CLOCK_TICK_ADJUST being
> > added in) in ntp_update_frequency()
>
> The maximum is not that trivial, e.g. alpha has a CLOCK_TICK_RATE value of
> 32768 and has two possible HZ values 1024/1200. Everything is fine with
> 1024, but with 1200 you create an error of ~11230ppm.

Uh. Check that again. I'm seeing the 11230ppm error only when looking at
the _granularity error_ when CLOCK_TICK_ADJUST is *not* used (where
CLOCK_TICK_ADJUST is used, we get 1.2ppm granularity error).

To see the maximum NTP adjustment error caused by the nsec-per-sec==ppb
shortcut, its:

(NSEC_PER_SEC + 500,000)/NSEC_PER_SEC - (NSEC_PER_SEC + CLOCK_TICK_ADJ +
500,000)/(NSEC_PER_SEC + CLOCK_TICK_ADJ) = -5689ppb error

So even here, in the case where a *very* poor HZ value is selected given
the CLOCK_TICK_RATE (its that train station analogy again), the error
from point #3 is fairly small.

> In the alpha case you have the interesting problem, that the timer tick is
> generated by the RTC (if I see it correctly). If the alpha used clock
> sources, what value should CLOCK_TICK_RATE have? Which of the three
> possible clocks do you want to adjust - the jiffies, the PIT or the cpu
> cycle clock?

On this I agree. If we're using fine grained clocksources,
CLOCK_TICK_RATE makes less and less sense. But it also should hurt less,
due to the granularity error being smaller on those clocksources. And
yes, one could imagine a system with multiple course clocks that have
different granularity errors and in that case CLOCK_TICK_RATE doesn't
address everything.

However right now CLOCK_TICK_RATE is important for very course
clocksources such as jiffies and the PIT. And many systems use those
clocksources (it should be noted, EVERY non GENERIC_TIME arch uses the
jiffies clocksource as its clocksource), so it effects many users. Until
systems do not use those as clocksources (when everyone has very fine
grained clocks) we need a solution here. I'm not seeing how your
suggestions solve this.

> That's the other big problem with CLOCK_TICK_RATE, for many archs it's
> just some random value and in some unlucky configuration it now may do
> more harm than it does any good. It had some value when process timer were
> jiffies based, but since hrtimer that reason is gone and in the NTP code
> it does more harm than good.

Oh yes. Setting it correctly is a responsibility of the arch maintainer.
If the arch has a incorrect CLOCK_TICK_RATE, it will very much do damage
to the accuracy of the jiffies clocksource and will hurt timekeeping.

However, if we just remove CLOCK_TICK_RATE, we will hurt the jiffies
clocksource as well.


Overall, I think I agree with your basic dislike for CLOCK_TICK_RATE,
but I do not see how we just get rid of it as you suggest and still have
good timekeeping for all systems.


So again I claim these are the issues, in severity order:

1) A!=B interval comparison needs to get solved. This is just a brain
dead source of large error, and is easily fixed one way or the other.

2) We need a solution that handles granularity error well, as this is a
moderate source of error for course clocksources such as jiffies.
CLOCK_TICK_ADJUST does cover this fairly well in most cases. I suspect
we could do even better, but that will take some deeper changes.

3) We should make sure the even smaller error done by not scaling the
ntp frequency adjustment to the interval length is corrected.


My approach solves only #1. #2 is addressed as CLOCK_TICK_ADJUST handles
it relatively well in most cases. I've proposed that we scale the
frequency adjustment properly to handle #3.

My understanding of your approach (removing CLOCK_TICK_ADJUST),
addresses issues #1 and #3, but hurts issue #2.

So have we talked this out enough? :)

If I'm still mis-characterizing your solution, feel free to send a patch
and I'll try to refine my critique.

thanks
-john

2008-02-19 04:04:18

by Roman Zippel

[permalink] [raw]
Subject: Re: [PATCH] correct inconsistent ntp interval/tick_length usage

Hi,

On Mon, 18 Feb 2008, john stultz wrote:

> If we are building a train station, and each train car is 60ft, it
> doesn't make much sense to build 1000ft stations, right? Instead you'll
> do better if you build a 1020ft station.

That would assume trains are always 60ft long, which is the basic error in
your assumption.

Since we're using analogies: What you're doing is to put you winter
clothes on your weight scale and reset the scale to zero to offset for the
weigth of the clothes. If you stand now with your bathing clothes on the
scale, does that mean you lost weight?
That's all you do - you change the scale and slightly screw the scale for
everyone else trying to use it.

To keep in mind what time adjusting is supposed to do:

freq = 1sec + time_freq

What we do instead is:

freq + tick_adj = 1sec + time_freq

Where exactly is now the problem to integrate tick_adj into time_freq? The
result is _exactly_ the same. The only visible difference is a slightly
higher time_freq value and as long as it is within the 500 ppm limit there
is simply no problem.

> And yes, if we remove CLOCK_TICK_ADJUST, that would also resolve the
> (A!=B) issue, but it doesn't address the error from #2 below.
> [..]
> 2) We need a solution that handles granularity error well, as this is a
> moderate source of error for course clocksources such as jiffies.
> CLOCK_TICK_ADJUST does cover this fairly well in most cases. I suspect
> we could do even better, but that will take some deeper changes.

How exactly does CLOCK_TICK_ADJUST address this problem? The error due to
insufficient resolution is still there, all it does is shifting the scale,
so it's not immediately visible.

> My understanding of your approach (removing CLOCK_TICK_ADJUST),
> addresses issues #1 and #3, but hurts issue #2.

What exactly is hurt?

bye, Roman

2008-02-20 01:51:29

by john stultz

[permalink] [raw]
Subject: Re: [PATCH] correct inconsistent ntp interval/tick_length usage


On Tue, 2008-02-19 at 05:04 +0100, Roman Zippel wrote:
> Hi,
>
> On Mon, 18 Feb 2008, john stultz wrote:
>
> > If we are building a train station, and each train car is 60ft, it
> > doesn't make much sense to build 1000ft stations, right? Instead you'll
> > do better if you build a 1020ft station.
>
> That would assume trains are always 60ft long, which is the basic error in
> your assumption.
>
> Since we're using analogies: What you're doing is to put you winter
> clothes on your weight scale and reset the scale to zero to offset for the
> weigth of the clothes. If you stand now with your bathing clothes on the
> scale, does that mean you lost weight?

Sheesh, now you're calling me deceiving *and* fat! ;)


> That's all you do - you change the scale and slightly screw the scale for
> everyone else trying to use it.

I don't think so. This all has to do with intervals, not scale offsets.
I'm having a hard time seeing how we're not connecting on this.

To better keep with your analogy, you'd have to imagine a scale that
only reads in X pound increments. As long as X is fairly small, it
should measure everyone's weight fairly well. However, if X is large,
like say 50kg, then it won't weigh a 70kg person very accurately (even
if he is a liar and he really weighs 77kgs).


This is the granularity error I'm talking about.

Again, this all has to do with our accumulation intervals.

Lets look at the code. I'm going to focus on jiffies as its the best
example, being both very common and very course.


Let's say we make our base interval 1,000,000ns.

Using the jiffies clocksource, the closest we can get to this interval
is 1 jiffy, which is 999,847ns.

Now every interval we will see 153ns error. The clocksource_adjust()
function will watch this error accumulate and adjust the jiffies's
frequency by 153ppm to try to compensate. Note, this compensation is not
adjusting for any hardware drift, it is only compensating for the error
we've injected by choosing an interval the clocksource cannot match.

Now, when NTP starts up, if we had perfect hardware and there was no
hardware drift, NTP would have to inject a -153ppm correction to offset
the systematic error we've introduced.

If we do not have perfect hardware, then NTP would have to correct for
both the 153ppm error and the hardware error. Sp if the hardware error
was in the same direction, we can now only compensate for up to a 347ppm
hardware drift, before we hit the 500ppm bound in NTP.

Using the test program I sent out before:
HZ=1000 CLOCK_TICK_ADJUST=0

jiffies
==========
1 cycles per 1000000 ns
999847 ns per cycle
999847 ns per interval
153000 ppb error

jiffies NOHZ
==========
500 cycles per 500000000 ns
999847 ns per cycle
499923500 ns per interval
153000 ppb error


Now, if we make our base interval match the clocksource granularity,
this can avoid the error. If the base interval we accumulate with is
999,847ns, we avoid needlessly accumulating error.

This means without NTP correction, the system clock drift will match the
hardware clock drift. So when NTP does kick in, it only has to correct
for that hardware drift.

HZ=1000 CLOCK_TICK_ADJUST=-152533
jiffies
==========
1 cycles per 999847 ns
999847 ns per cycle
999847 ns per interval
0 ppb error

jiffies NOHZ
==========
500 cycles per 499923733 ns
999847 ns per cycle
499923500 ns per interval
466 ppb error


> To keep in mind what time adjusting is supposed to do:
>
> freq = 1sec + time_freq

But it is this fixation on 1sec that is the cause of the granularity
error. I believe the following is also correct (assuming time_freq is in
ppm units):

adjusted_freq = interval_length + (interval_length * time_freq)/MILLION

This properly scales the adjustment to any interval length.


> What we do instead is:
>
> freq + tick_adj = 1sec + time_freq
>
> Where exactly is now the problem to integrate tick_adj into time_freq? The
> result is _exactly_ the same. The only visible difference is a slightly
> higher time_freq value and as long as it is within the 500 ppm limit there
> is simply no problem.

Well, it is a problem if its large. The 500ppm limit is supposed to be
for hardware frequency error correction, not hardware frequency +
software error correction. Now, if it were 1-10ppm, it wouldn't be that
big of an issue, but with the jiffies example above, 153ppm does cut
into the correctable space a good bit.

Show me how we cut that error down, and I'll be happy to kill
CLOCK_TICK_ADJUST.

> > And yes, if we remove CLOCK_TICK_ADJUST, that would also resolve the
> > (A!=B) issue, but it doesn't address the error from #2 below.
> > [..]
> > 2) We need a solution that handles granularity error well, as this is a
> > moderate source of error for course clocksources such as jiffies.
> > CLOCK_TICK_ADJUST does cover this fairly well in most cases. I suspect
> > we could do even better, but that will take some deeper changes.
>
> How exactly does CLOCK_TICK_ADJUST address this problem? The error due to
> insufficient resolution is still there, all it does is shifting the scale,
> so it's not immediately visible.

Again, I agree there is a small error (76ppb in my earlier example)
caused when we time_freq adjustment to what we assume is 1 second, when
CLOCK_TICK_ADJ is involved.

However, relatively this is small compared with the example 153ppm error
above caused by ignoring the granularity issue. Additionally we can fix
it using the scaling method I pointed out above.

> > My understanding of your approach (removing CLOCK_TICK_ADJUST),
> > addresses issues #1 and #3, but hurts issue #2.
>
> What exactly is hurt?

By injecting 153ppm of error, the ability for NTP to correct hardware
error within 500ppm is hurt.

Sigh. So at this point, if we're not closing the gap in our
understanding, I'm not sure how much its worth to continue on the
discussion in this manner. I'd welcome anyone to help clarify what I'm
missing, or maybe assistance in better communicating my point.

In the meantime, I'll add the scaling fix from above to my code and
re-submit it. If you want you can send your own patch and I'll gladly
test, review it and try to figure out what I'm missing.

Once again, thanks for the feedback.
-john

2008-02-20 17:08:55

by Roman Zippel

[permalink] [raw]
Subject: Re: [PATCH] correct inconsistent ntp interval/tick_length usage

Hi,

On Tue, 19 Feb 2008, john stultz wrote:

> To better keep with your analogy, you'd have to imagine a scale that
> only reads in X pound increments. As long as X is fairly small, it
> should measure everyone's weight fairly well. However, if X is large,
> like say 50kg, then it won't weigh a 70kg person very accurately (even
> if he is a liar and he really weighs 77kgs).
>
>
> This is the granularity error I'm talking about.

There is a big difference between accuracy and granularity. Even a coarse
grained scale can be accurate (just within its resolution) and a fine
grained scale can be very inaccurate if you shift around the scale.
Please keep this separate, otherwise this can go on forever...

> Now, when NTP starts up, if we had perfect hardware and there was no
> hardware drift, NTP would have to inject a -153ppm correction to offset
> the systematic error we've introduced.
>
> If we do not have perfect hardware, then NTP would have to correct for
> both the 153ppm error and the hardware error. Sp if the hardware error
> was in the same direction, we can now only compensate for up to a 347ppm
> hardware drift, before we hit the 500ppm bound in NTP.

Out of curiosity, what kind of hardware error do you expect?
If you have that crappy hardware you're better off initializing the clock
with the correct frequency.

> > To keep in mind what time adjusting is supposed to do:
> >
> > freq = 1sec + time_freq
>
> But it is this fixation on 1sec that is the cause of the granularity
> error.

You need some kind of fix point and everyone is using 1sec for as base
length, for some mysterious reason you're now trying to redefine it.
The PIT (and any clock based on it) has a certain resolution, so with an
update frequency of HZ=1000 it produces a certain error, but why on earth
are you trying to introduce this error as universal constant? This is a
property of the PIT clock, applying this error to every other clock makes
no sense.

> I believe the following is also correct (assuming time_freq is in ppm units):
>
> adjusted_freq = interval_length + (interval_length * time_freq)/MILLION
>
> This properly scales the adjustment to any interval length.

Actually it's not that simple, ntp_update_frequency() only calculates the
base length, time_offset had to be scaled as well (and back when requested
from user space). You would add a lot of complexity for a silly little
error, which the current mechanisms can handle just fine.

> > What we do instead is:
> >
> > freq + tick_adj = 1sec + time_freq
> >
> > Where exactly is now the problem to integrate tick_adj into time_freq? The
> > result is _exactly_ the same. The only visible difference is a slightly
> > higher time_freq value and as long as it is within the 500 ppm limit there
> > is simply no problem.
>
> Well, it is a problem if its large. The 500ppm limit is supposed to be
> for hardware frequency error correction, not hardware frequency +
> software error correction. Now, if it were 1-10ppm, it wouldn't be that
> big of an issue, but with the jiffies example above, 153ppm does cut
> into the correctable space a good bit.

Again, what kind of crappy hardware do you expect? Aren't clocks supposed
to get better and not worse?
Where do you get this idea that the 500ppm are exclusively for hardware
errors? If you have such bad hardware, there is another simple solution:
change HZ to 100 and the error is reduced to 15ppm.

I would see the point if this problem had actually any practically
relevance, but this error is not a problem for pretty much all existing
standard hardware. Why are you insisting on redesigning timekeeping for
broken hardware?

> > > My understanding of your approach (removing CLOCK_TICK_ADJUST),
> > > addresses issues #1 and #3, but hurts issue #2.
> >
> > What exactly is hurt?
>
> By injecting 153ppm of error, the ability for NTP to correct hardware
> error within 500ppm is hurt.

There's nothing 'injected', that resolution error is very real and the
500ppm limit is more than enough to deal with this. _Nobody_ is hurt by
this.

> Sigh. So at this point, if we're not closing the gap in our
> understanding, I'm not sure how much its worth to continue on the
> discussion in this manner. I'd welcome anyone to help clarify what I'm
> missing, or maybe assistance in better communicating my point.

The point is you are redefining a mechanism which has _never_ been
intendend for purpose you're trying to abuse it for now.
Reread the original patch, it was intended for kernel with HZ of 2000,
where the error would be 687ppm. Now we have other ways to increase the
resolution, timekeeping isn't solely based on the PIT anymore. The whole
reason for the original patch is pretty much gone by now.

If you really need some kind of adjustment for your extremely broken
hardware, below is the absolute maximum you need, which doesn't inflict
more insanity on all the sane hardware.

bye, Roman


Revert bbe4d18ac2e058c56adb0cd71f49d9ed3216a405 and
e13a2e61dd5152f5499d2003470acf9c838eab84 and remove CLOCK_TICK_ADJUST
completely. Add a optional kernel parameter ntp_tick_adj instead to allow
adjusting of a large base drift and thus keeping ntpd happy.
The CLOCK_TICK_ADJUST mechanism was introduced at a time PIT was the
primary clock, but we have a varity of clock sources now, so a global PIT
specific adjustment makes little sense anymore.

Signed-off-by: Roman Zippel <[email protected]>


---
include/linux/timex.h | 9 +--------
kernel/time/ntp.c | 11 ++++++++++-
kernel/time/timekeeping.c | 6 ++----
3 files changed, 13 insertions(+), 13 deletions(-)

Index: linux-2.6/include/linux/timex.h
===================================================================
--- linux-2.6.orig/include/linux/timex.h
+++ linux-2.6/include/linux/timex.h
@@ -232,14 +232,7 @@ static inline int ntp_synced(void)
#else
#define NTP_INTERVAL_FREQ (HZ)
#endif
-
-#define CLOCK_TICK_OVERFLOW (LATCH * HZ - CLOCK_TICK_RATE)
-#define CLOCK_TICK_ADJUST (((s64)CLOCK_TICK_OVERFLOW * NSEC_PER_SEC) / \
- (s64)CLOCK_TICK_RATE)
-
-/* Because using NSEC_PER_SEC would be too easy */
-#define NTP_INTERVAL_LENGTH ((((s64)TICK_USEC * NSEC_PER_USEC * USER_HZ) + \
- CLOCK_TICK_ADJUST) / NTP_INTERVAL_FREQ)
+#define NTP_INTERVAL_LENGTH (NSEC_PER_SEC/NTP_INTERVAL_FREQ)

/* Returns how long ticks are at present, in ns / 2^(SHIFT_SCALE-10). */
extern u64 current_tick_length(void);
Index: linux-2.6/kernel/time/ntp.c
===================================================================
--- linux-2.6.orig/kernel/time/ntp.c
+++ linux-2.6/kernel/time/ntp.c
@@ -42,12 +42,13 @@ long time_esterror = NTP_PHASE_LIMIT; /*
long time_freq; /* frequency offset (scaled ppm)*/
static long time_reftime; /* time at last adjustment (s) */
long time_adjust;
+long ntp_tick_adj;

static void ntp_update_frequency(void)
{
u64 second_length = (u64)(tick_usec * NSEC_PER_USEC * USER_HZ)
<< TICK_LENGTH_SHIFT;
- second_length += (s64)CLOCK_TICK_ADJUST << TICK_LENGTH_SHIFT;
+ second_length += (s64)ntp_tick_adj << TICK_LENGTH_SHIFT;
second_length += (s64)time_freq << (TICK_LENGTH_SHIFT - SHIFT_NSEC);

tick_length_base = second_length;
@@ -400,3 +401,11 @@ leave: if ((time_status & (STA_UNSYNC|ST
notify_cmos_timer();
return(result);
}
+
+static int __init ntp_tick_adj_setup(char *str)
+{
+ ntp_tick_adj = simple_strtol(str, NULL, 0);
+ return 1;
+}
+
+__setup("ntp_tick_adj=", ntp_tick_adj_setup);
Index: linux-2.6/kernel/time/timekeeping.c
===================================================================
--- linux-2.6.orig/kernel/time/timekeeping.c
+++ linux-2.6/kernel/time/timekeeping.c
@@ -187,8 +187,7 @@ static void change_clocksource(void)

clock->error = 0;
clock->xtime_nsec = 0;
- clocksource_calculate_interval(clock,
- (unsigned long)(current_tick_length()>>TICK_LENGTH_SHIFT));
+ clocksource_calculate_interval(clock, NTP_INTERVAL_LENGTH);

tick_clock_notify();

@@ -245,8 +244,7 @@ void __init timekeeping_init(void)
ntp_clear();

clock = clocksource_get_next();
- clocksource_calculate_interval(clock,
- (unsigned long)(current_tick_length()>>TICK_LENGTH_SHIFT));
+ clocksource_calculate_interval(clock, NTP_INTERVAL_LENGTH);
clock->cycle_last = clocksource_read(clock);

xtime.tv_sec = sec;

2008-02-22 02:39:38

by john stultz

[permalink] [raw]
Subject: Re: [PATCH] correct inconsistent ntp interval/tick_length usage


On Wed, 2008-02-20 at 18:08 +0100, Roman Zippel wrote:
> > Well, it is a problem if its large. The 500ppm limit is supposed to be
> > for hardware frequency error correction, not hardware frequency +
> > software error correction. Now, if it were 1-10ppm, it wouldn't be that
> > big of an issue, but with the jiffies example above, 153ppm does cut
> > into the correctable space a good bit.
>
> Again, what kind of crappy hardware do you expect? Aren't clocks supposed
> to get better and not worse?

Well, while I've seen much worse, I consider crappy hardware to be 100
+ppm error. So if the hardware is perfect and the system results in
153ppm error, I'd consider that pretty crappy, especially if its not the
hardware's fault.

> Where do you get this idea that the 500ppm are exclusively for hardware
> errors? If you have such bad hardware, there is another simple solution:
> change HZ to 100 and the error is reduced to 15ppm.

True its not exclusively for hardware errors, and if we were talking
about only 15ppm I wouldn't really worry about it. But when we're saying
the system is adding 30% of the maximum error, that's just not good.

> I would see the point if this problem had actually any practically
> relevance, but this error is not a problem for pretty much all existing
> standard hardware. Why are you insisting on redesigning timekeeping for
> broken hardware?

Remember my earlier data? Where I was talking about the acpi_pm being a
multiple of the PIT frequency? By removing CLOCK_TICK_ADJUST we got a
127ppm error when HZ=1000. NO_HZ drops that down to where we don't care,
but this _does_ effect current hardware, so I'd call it relevant.


> > > > My understanding of your approach (removing CLOCK_TICK_ADJUST),
> > > > addresses issues #1 and #3, but hurts issue #2.
> > >
> > > What exactly is hurt?
> >
> > By injecting 153ppm of error, the ability for NTP to correct hardware
> > error within 500ppm is hurt.
>
> There's nothing 'injected', that resolution error is very real and the
> 500ppm limit is more than enough to deal with this. _Nobody_ is hurt by
> this.

Sure, 500ppm is enough for most people with good hardware. But remember
the alpha example you brought up earlier? The HZ=1200 case, with the
CLOCK_TICK_RATE=32768? If we don't take CLOCK_TICK_ADJUST into account,
we end up with a **11230ppm** error from the granularity issue. NTP just
won't work on those systems.

Now granted, the three types of alpha systems that actually use that HZ
value is probably as close to "nobody" as you're going to get, but I
don't think we can just throw the granularity issue aside.


> Revert bbe4d18ac2e058c56adb0cd71f49d9ed3216a405 and
> e13a2e61dd5152f5499d2003470acf9c838eab84 and remove CLOCK_TICK_ADJUST
> completely. Add a optional kernel parameter ntp_tick_adj instead to allow
> adjusting of a large base drift and thus keeping ntpd happy.
> The CLOCK_TICK_ADJUST mechanism was introduced at a time PIT was the
> primary clock, but we have a varity of clock sources now, so a global PIT
> specific adjustment makes little sense anymore.
>
> Signed-off-by: Roman Zippel <[email protected]>

So thanks so much for sending the patch. It makes clear your solution.

My initial comments: Its simple and that really does have its merits. It
resolves the inconsistent comparison issue, and does not have the
smallish scaling error we talked about as well. As I've said before, I
do like the idea, I'm just worried about the corner cases (mainly
jiffies based systems).

The granularity issue is still present. Depending on the HZ settings and
the clocksource hardware, systems may see large errors added on to the
actual hardware error, and its possible the kernel error may dominate
the actual hardware error.

The ntp_tick_adjust option does give a way out if you have, for example,
one of those alpha systems where it would be necessary, but I do wish
there was a better way then forcing users to calculate for themselves
what the granularity adjustment should be (esp given that it is more a
function of the kernel compile options, so different kernels would need
different values for the same system).

So then yes, your patch is simple and corrects the issue that started
the discussion. I think we're closing the gaps. :)

I still think my claims hold that your patch as it stands may worsen the
drift error depending on HZ settings, especially on jiffies based
systems (which means every non-GENERIC_TIME arch). However, if folks
don't really care, then that may be acceptable.

As promised, here is my own patch, which takes the scaling error you
pointed out into account, as well as resolving the granularity issue in
a way similar to your ntp_tick_adjust option does, only the kernel will
calculate such a granularity correction on a per-clocksource base, so
users don't have to do the math.

Sadly I've not had the chance to really test and debug this (there's a
lot of shifting logic, so I may have flubed something there), but I
wanted to send it out for comments, as I think it communicates the gist
of the idea. I'll test and refine it tomorrow and send it out again if
needed.

Let me know what you think.

Again, thanks for sending the patch and your continued feedback!
-john

----------------------------------------->

My first version of the ntp_interval/tick_length inconsistent usage
patch was recently merged as bbe4d18ac2e058c56adb0cd71f49d9ed3216a405

http://git.kernel.org/gitweb.cgi?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=bbe4d18ac2e058c56adb0cd71f49d9ed3216a405

While the fix did greatly improve the situation, Roman correctly pointed
out that it does have a small bug: If the users change clocksources
after the system has been running and NTP has made corrections, the
correctoins made against the old clocksource will be applied against the
new clocksource, causing error.

My second attempt, which corrects the issue in the NTP_INTERVAL_LENGTH
definition has also made it up-stream as commit
e13a2e61dd5152f5499d2003470acf9c838eab84

http://git.kernel.org/gitweb.cgi?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=e13a2e61dd5152f5499d2003470acf9c838eab84

Roman also had objections to this patch, and I agree with some, but not
all of them.

This patch reverts both of those changes, and introduces a new solution:

Roman has correctly pointed out that CLOCK_TICK_ADJUST is calculated
based on the PIT's frequency, and isn't really relevant to non-PIT
driven clocksources (that is, clocksources other then jiffies and pit).

However, by removing CLOCK_TICK_ADJUST, we no longer take into account
the granularity error that the PIT driven clocksources have. This can
result in large errors in time accumulation for systems that use jiffies
or the pit.

This patch tries to rectify the issue by introducing the
ntp_set_granularity_error() function, which is called when we begin
using a clocksource. This function allows the actual clocksource
granularity error to be taken into consideration by ntp when it
calculates the current_tick_length() function.

Using this method, we should be able to have the system clock drift
match very closely to the actual hardware drift, regardless of the
clocksource granularity or the tick frequency.


UNTESTED! DO NOT MERGE!
Signed-off-by: John Stultz <[email protected]>
UNTESTED! DO NOT MERGE!

diff --git a/include/linux/timex.h b/include/linux/timex.h
index c3f3747..24fa43b 100644
--- a/include/linux/timex.h
+++ b/include/linux/timex.h
@@ -233,17 +233,11 @@ static inline int ntp_synced(void)
#define NTP_INTERVAL_FREQ (HZ)
#endif

-#define CLOCK_TICK_OVERFLOW (LATCH * HZ - CLOCK_TICK_RATE)
-#define CLOCK_TICK_ADJUST (((s64)CLOCK_TICK_OVERFLOW * NSEC_PER_SEC) / \
- (s64)CLOCK_TICK_RATE)
-
-/* Because using NSEC_PER_SEC would be too easy */
-#define NTP_INTERVAL_LENGTH ((((s64)TICK_USEC * NSEC_PER_USEC * USER_HZ) + \
- CLOCK_TICK_ADJUST) / NTP_INTERVAL_FREQ)
+#define NTP_INTERVAL_LENGTH (NSEC_PER_SEC / NTP_INTERVAL_FREQ)

/* Returns how long ticks are at present, in ns / 2^(SHIFT_SCALE-10). */
extern u64 current_tick_length(void);
-
+extern void ntp_set_granularity_error(s64 len);
extern void second_overflow(void);
extern void update_ntp_one_tick(void);
extern int do_adjtimex(struct timex *);
diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c
index c88b591..fe25c94 100644
--- a/kernel/time/ntp.c
+++ b/kernel/time/ntp.c
@@ -43,19 +43,32 @@ long time_freq; /* frequency offset (scaled ppm)*/
static long time_reftime; /* time at last adjustment (s) */
long time_adjust;

+static s64 granularity_error_adjust;
+
+void ntp_set_granularity_error(s64 len)
+{
+ granularity_error_adjust = len * NTP_INTERVAL_FREQ;
+}
+
static void ntp_update_frequency(void)
{
u64 second_length = (u64)(tick_usec * NSEC_PER_USEC * USER_HZ)
<< TICK_LENGTH_SHIFT;
- second_length += (s64)CLOCK_TICK_ADJUST << TICK_LENGTH_SHIFT;
- second_length += (s64)time_freq << (TICK_LENGTH_SHIFT - SHIFT_NSEC);
+ s64 adj;
+
+ /* Compensate for clocksource granularity error */
+ second_length += granularity_error_adjust;
+
+ /* Scale the base second length by the frequency adjustment */
+ adj = second_length * time_freq;
+ do_div(adj, 1000000);
+ second_length += adj>>SHIFT_NSEC;

tick_length_base = second_length;
+ do_div(tick_length_base, NTP_INTERVAL_FREQ);

do_div(second_length, HZ);
tick_nsec = second_length >> TICK_LENGTH_SHIFT;
-
- do_div(tick_length_base, NTP_INTERVAL_FREQ);
}

/**
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 1af9fb0..c91f3ec 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -172,6 +172,7 @@ static void change_clocksource(void)
struct clocksource *new;
cycle_t now;
u64 nsec;
+ s64 adj;

new = clocksource_get_next();

@@ -187,8 +188,15 @@ static void change_clocksource(void)

clock->error = 0;
clock->xtime_nsec = 0;
- clocksource_calculate_interval(clock,
- (unsigned long)(current_tick_length()>>TICK_LENGTH_SHIFT));
+ clocksource_calculate_interval(clock, NTP_INTERVAL_LENGTH);
+
+ /* Calculate the granularity error */
+ adj = clock->cycle_interval * clock->mult;
+ adj <<= TICK_LENGTH_SHIFT - clock->shift;
+ adj = (NTP_INTERVAL_LENGTH << TICK_LENGTH_SHIFT) - adj;
+ ntp_set_granularity_error(adj);
+
+ ntp_clear();

tick_clock_notify();

@@ -245,8 +253,7 @@ void __init timekeeping_init(void)
ntp_clear();

clock = clocksource_get_next();
- clocksource_calculate_interval(clock,
- (unsigned long)(current_tick_length()>>TICK_LENGTH_SHIFT));
+ clocksource_calculate_interval(clock, NTP_INTERVAL_LENGTH);
clock->cycle_last = clocksource_read(clock);

xtime.tv_sec = sec;

2008-02-25 14:44:16

by Roman Zippel

[permalink] [raw]
Subject: Re: [PATCH] correct inconsistent ntp interval/tick_length usage

Hi,

On Thu, 21 Feb 2008, john stultz wrote:

> > Again, what kind of crappy hardware do you expect? Aren't clocks supposed
> > to get better and not worse?
>
> Well, while I've seen much worse, I consider crappy hardware to be 100
> +ppm error. So if the hardware is perfect and the system results in
> 153ppm error, I'd consider that pretty crappy, especially if its not the
> hardware's fault.

Nevertheless this error is real, why are you trying to hide it?
This is isn't an error we can't handle, it's still perfectly within the
limit and except that NTP reports a somewhat larger drift than you'd like
to see, everything works fine.

> > Where do you get this idea that the 500ppm are exclusively for hardware
> > errors? If you have such bad hardware, there is another simple solution:
> > change HZ to 100 and the error is reduced to 15ppm.
>
> True its not exclusively for hardware errors, and if we were talking
> about only 15ppm I wouldn't really worry about it. But when we're saying
> the system is adding 30% of the maximum error, that's just not good.

Another 30% is required for normal to crappy hardware clocks and then
there is still enough room left.

> > I would see the point if this problem had actually any practically
> > relevance, but this error is not a problem for pretty much all existing
> > standard hardware. Why are you insisting on redesigning timekeeping for
> > broken hardware?
>
> Remember my earlier data? Where I was talking about the acpi_pm being a
> multiple of the PIT frequency? By removing CLOCK_TICK_ADJUST we got a
> 127ppm error when HZ=1000. NO_HZ drops that down to where we don't care,
> but this _does_ effect current hardware, so I'd call it relevant.

How exactly does it effect current hardware in a way that it breaks them?
Despite this error everything still works fine, the hardware doesn't care.

> > There's nothing 'injected', that resolution error is very real and the
> > 500ppm limit is more than enough to deal with this. _Nobody_ is hurt by
> > this.
>
> Sure, 500ppm is enough for most people with good hardware. But remember
> the alpha example you brought up earlier? The HZ=1200 case, with the
> CLOCK_TICK_RATE=32768? If we don't take CLOCK_TICK_ADJUST into account,
> we end up with a **11230ppm** error from the granularity issue. NTP just
> won't work on those systems.
>
> Now granted, the three types of alpha systems that actually use that HZ
> value is probably as close to "nobody" as you're going to get, but I
> don't think we can just throw the granularity issue aside.

That's actually a good example, why it's irrelevant. First it's using a
cycle based clock, thus the rounding error is irrelevant. Second in the
common case they already use 1024 as HZ to reduce this error, so something
similiar could be done for the HZ=1200 case and I suspect that it was
already done and only CLOCK_TICK_RATE is just wrong. This mail
http://consortiumlibrary.org/axp-list/archive/2002-11/0101.html suggest
that this is the right thing to do.

There is _no_ reason to artificially optimize this error value, there are
still enough other ways to improve timekeeping. The granularity error is
there no matter what you do and as long as it's within a reasonable limit
there is nothing that needs fixing.

bye, Roman

2008-02-29 04:49:54

by Roman Zippel

[permalink] [raw]
Subject: [PATCH] Remove obsolete CLOCK_TICK_ADJUST

Hi,

Can we please clean up the current mess and get the patch below merged?

bye, Roman


Revert bbe4d18ac2e058c56adb0cd71f49d9ed3216a405 and
e13a2e61dd5152f5499d2003470acf9c838eab84 and remove CLOCK_TICK_ADJUST
completely. Add a optional kernel parameter ntp_tick_adj instead to allow
adjusting of a large base drift and thus keeping ntpd happy.
The CLOCK_TICK_ADJUST mechanism was introduced at a time PIT was the
primary clock, but we have a varity of clock sources now, so a global PIT
specific adjustment makes little sense anymore.

Signed-off-by: Roman Zippel <[email protected]>

---
include/linux/timex.h | 9 +--------
kernel/time/ntp.c | 11 ++++++++++-
kernel/time/timekeeping.c | 6 ++----
3 files changed, 13 insertions(+), 13 deletions(-)

Index: linux-2.6/include/linux/timex.h
===================================================================
--- linux-2.6.orig/include/linux/timex.h
+++ linux-2.6/include/linux/timex.h
@@ -232,14 +232,7 @@ static inline int ntp_synced(void)
#else
#define NTP_INTERVAL_FREQ (HZ)
#endif
-
-#define CLOCK_TICK_OVERFLOW (LATCH * HZ - CLOCK_TICK_RATE)
-#define CLOCK_TICK_ADJUST (((s64)CLOCK_TICK_OVERFLOW * NSEC_PER_SEC) / \
- (s64)CLOCK_TICK_RATE)
-
-/* Because using NSEC_PER_SEC would be too easy */
-#define NTP_INTERVAL_LENGTH ((((s64)TICK_USEC * NSEC_PER_USEC * USER_HZ) + \
- CLOCK_TICK_ADJUST) / NTP_INTERVAL_FREQ)
+#define NTP_INTERVAL_LENGTH (NSEC_PER_SEC/NTP_INTERVAL_FREQ)

/* Returns how long ticks are at present, in ns / 2^(SHIFT_SCALE-10). */
extern u64 current_tick_length(void);
Index: linux-2.6/kernel/time/ntp.c
===================================================================
--- linux-2.6.orig/kernel/time/ntp.c
+++ linux-2.6/kernel/time/ntp.c
@@ -42,12 +42,13 @@ long time_esterror = NTP_PHASE_LIMIT; /*
long time_freq; /* frequency offset (scaled ppm)*/
static long time_reftime; /* time at last adjustment (s) */
long time_adjust;
+long ntp_tick_adj;

static void ntp_update_frequency(void)
{
u64 second_length = (u64)(tick_usec * NSEC_PER_USEC * USER_HZ)
<< TICK_LENGTH_SHIFT;
- second_length += (s64)CLOCK_TICK_ADJUST << TICK_LENGTH_SHIFT;
+ second_length += (s64)ntp_tick_adj << TICK_LENGTH_SHIFT;
second_length += (s64)time_freq << (TICK_LENGTH_SHIFT - SHIFT_NSEC);

tick_length_base = second_length;
@@ -400,3 +401,11 @@ leave: if ((time_status & (STA_UNSYNC|ST
notify_cmos_timer();
return(result);
}
+
+static int __init ntp_tick_adj_setup(char *str)
+{
+ ntp_tick_adj = simple_strtol(str, NULL, 0);
+ return 1;
+}
+
+__setup("ntp_tick_adj=", ntp_tick_adj_setup);
Index: linux-2.6/kernel/time/timekeeping.c
===================================================================
--- linux-2.6.orig/kernel/time/timekeeping.c
+++ linux-2.6/kernel/time/timekeeping.c
@@ -187,8 +187,7 @@ static void change_clocksource(void)

clock->error = 0;
clock->xtime_nsec = 0;
- clocksource_calculate_interval(clock,
- (unsigned long)(current_tick_length()>>TICK_LENGTH_SHIFT));
+ clocksource_calculate_interval(clock, NTP_INTERVAL_LENGTH);

tick_clock_notify();

@@ -245,8 +244,7 @@ void __init timekeeping_init(void)
ntp_clear();

clock = clocksource_get_next();
- clocksource_calculate_interval(clock,
- (unsigned long)(current_tick_length()>>TICK_LENGTH_SHIFT));
+ clocksource_calculate_interval(clock, NTP_INTERVAL_LENGTH);
clock->cycle_last = clocksource_read(clock);

xtime.tv_sec = sec;

2008-02-29 06:25:23

by Ray Lee

[permalink] [raw]
Subject: Re: [PATCH] Remove obsolete CLOCK_TICK_ADJUST

On Thu, Feb 28, 2008 at 8:49 PM, Roman Zippel <[email protected]> wrote:
> Revert bbe4d18ac2e058c56adb0cd71f49d9ed3216a405 and
> e13a2e61dd5152f5499d2003470acf9c838eab84 and remove CLOCK_TICK_ADJUST
> completely. Add a optional kernel parameter ntp_tick_adj instead to allow
> adjusting of a large base drift and thus keeping ntpd happy.

Wait, so you're saying that something that the kernel correctly
figures out on its own today should instead be a kernel parameter,
with users having to supply it explicitly? Am I confused?

If I'm not confused, would you'd be so kind to explain to the peanut
gallery over here why this is a good thing?

2008-02-29 13:31:32

by Roman Zippel

[permalink] [raw]
Subject: Re: [PATCH] Remove obsolete CLOCK_TICK_ADJUST

Hi,

On Thu, 28 Feb 2008, Ray Lee wrote:

> > Revert bbe4d18ac2e058c56adb0cd71f49d9ed3216a405 and
> > e13a2e61dd5152f5499d2003470acf9c838eab84 and remove CLOCK_TICK_ADJUST
> > completely. Add a optional kernel parameter ntp_tick_adj instead to allow
> > adjusting of a large base drift and thus keeping ntpd happy.
>
> Wait, so you're saying that something that the kernel correctly
> figures out on its own today should instead be a kernel parameter,
> with users having to supply it explicitly? Am I confused?
>
> If I'm not confused, would you'd be so kind to explain to the peanut
> gallery over here why this is a good thing?

Sorry, for causing this confusion.
Unless you run a computer with the PIT or jiffies clock, this value is
just a random value added to the calculation.
Even if you use the PIT clock, the drift isn't large enough to be worried
about it with standard configurations. You have to manually set HZ to 2000
and use the PIT clock, then you might have a need for this parameter.

bye, Roman

2008-02-29 19:00:23

by Jörg-Volker Peetz

[permalink] [raw]
Subject: Re: [PATCH] correct inconsistent ntp interval/tick_length usage

john stultz wrote:
<snip>
> diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c
> index c88b591..fe25c94 100644
> --- a/kernel/time/ntp.c
> +++ b/kernel/time/ntp.c
> @@ -43,19 +43,32 @@ long time_freq; /* frequency offset (scaled ppm)*/
> static long time_reftime; /* time at last adjustment (s) */
> long time_adjust;
>
> +static s64 granularity_error_adjust;
> +
> +void ntp_set_granularity_error(s64 len)
> +{
> + granularity_error_adjust = len * NTP_INTERVAL_FREQ;
> +}
> +
> static void ntp_update_frequency(void)
> {
> u64 second_length = (u64)(tick_usec * NSEC_PER_USEC * USER_HZ)
> << TICK_LENGTH_SHIFT;
> - second_length += (s64)CLOCK_TICK_ADJUST << TICK_LENGTH_SHIFT;
> - second_length += (s64)time_freq << (TICK_LENGTH_SHIFT - SHIFT_NSEC);
> + s64 adj;
> +
> + /* Compensate for clocksource granularity error */
> + second_length += granularity_error_adjust;
> +
> + /* Scale the base second length by the frequency adjustment */
> + adj = second_length * time_freq;
> + do_div(adj, 1000000);
> + second_length += adj>>SHIFT_NSEC;
>
> tick_length_base = second_length;
> + do_div(tick_length_base, NTP_INTERVAL_FREQ);
>
> do_div(second_length, HZ);
> tick_nsec = second_length >> TICK_LENGTH_SHIFT;
> -
> - do_div(tick_length_base, NTP_INTERVAL_FREQ);
> }
>
> /**
<snip>

Hi John,

out of curiosity and inspired by the patch you suggested, I did a test
with the following ntp_update_frequency function in kernel/time/ntp.c
of kernel 2.6.24.3 using NO_HZ and the hpet timer:

static void ntp_update_frequency(void)
{
s64 adj;
u64 second_length = (u64)(tick_usec * NSEC_PER_USEC * USER_HZ)
<< TICK_LENGTH_SHIFT;

printk(KERN_NOTICE "*ntp* timefreq = %lld\n", (s64)time_freq);
printk(KERN_NOTICE "*ntp* s len = %lld\n", second_length);

printk(KERN_NOTICE "*ntp* corr 1 = %lld\n",
(s64)time_freq << (TICK_LENGTH_SHIFT - SHIFT_NSEC));

/* Scale the base second length by the frequency adjustment */
adj = second_length * time_freq;
do_div(adj, 1000000);
printk(KERN_NOTICE "*ntp* corr 2 = %lld\n", adj>>SHIFT_NSEC);

second_length += (s64)time_freq << (TICK_LENGTH_SHIFT - SHIFT_NSEC);

tick_length_base = second_length;
do_div(tick_length_base, NTP_INTERVAL_FREQ);

do_div(second_length, HZ);
tick_nsec = second_length >> TICK_LENGTH_SHIFT;
}

Running this kernel and ntpd I get numbers like the following in my
syslog file:

Feb 29 12:28:16 skadi kernel: *ntp* timefreq = 305345
Feb 29 12:28:16 skadi kernel: *ntp* s len = 4294967296000000000
Feb 29 12:28:16 skadi kernel: *ntp* corr 1 = 320177438720
Feb 29 12:28:16 skadi kernel: *ntp* corr 2 = 3030411349
Feb 29 12:36:53 skadi kernel: *ntp* timefreq = 730456
Feb 29 12:36:53 skadi kernel: *ntp* s len = 4294967296000000000
Feb 29 12:36:53 skadi kernel: *ntp* corr 1 = 765938630656
Feb 29 12:36:53 skadi kernel: *ntp* corr 2 = 2434829845
Feb 29 12:39:03 skadi kernel: *ntp* timefreq = 868771
Feb 29 12:39:03 skadi kernel: *ntp* s len = 4294967296000000000
Feb 29 12:39:03 skadi kernel: *ntp* corr 1 = 910972420096
Feb 29 12:39:03 skadi kernel: *ntp* corr 2 = 2301870005

So the original correction and the correction suggested by you
differ significantly by a factor of approximately 1000.
Incidentally, both corrections are nearly neglectable compared to
second_length.
But I think the correction suggested by you is calculated wrong due to
an overflow of the multiplication

second_length * time_freq

Calculating the correction as

(second_length / 1000000) * time_freq

the correction would be 1311446788997120000 which is much bigger as
the original correction 320177438720 (by a factor of 10^7).

Is it normal to have a second_length of approx. 4 * 10^18 ?
In what units?
--
Regards,
J?rg-Volker.

2008-02-29 22:10:29

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] Remove obsolete CLOCK_TICK_ADJUST

On Fri, 29 Feb 2008 05:49:35 +0100 (CET)
Roman Zippel <[email protected]> wrote:

> Can we please clean up the current mess and get the patch below merged?
>
> bye, Roman
>
>
> Revert bbe4d18ac2e058c56adb0cd71f49d9ed3216a405 and
> e13a2e61dd5152f5499d2003470acf9c838eab84 and remove CLOCK_TICK_ADJUST
> completely. Add a optional kernel parameter ntp_tick_adj instead to allow
> adjusting of a large base drift and thus keeping ntpd happy.
> The CLOCK_TICK_ADJUST mechanism was introduced at a time PIT was the
> primary clock, but we have a varity of clock sources now, so a global PIT
> specific adjustment makes little sense anymore.
>

The changelog provides no reason for the revert of those two patches.

Look at it from the point of view of a person who hasn't been following the
discussion (whose initials might be LT). That person might get puzzled and
upset, no?

2008-02-29 22:39:29

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] Remove obsolete CLOCK_TICK_ADJUST

On Fri, 29 Feb 2008 23:27:01 +0100 (CET)
Roman Zippel <[email protected]> wrote:

> Hi,
>
> On Fri, 29 Feb 2008, Andrew Morton wrote:
>
> > The changelog provides no reason for the revert of those two patches.
> >
> > Look at it from the point of view of a person who hasn't been following the
> > discussion (whose initials might be LT). That person might get puzzled and
> > upset, no?
>
> I'm puzzled at how to explain this...

Please try.

> The whole details have been explained over and over during the discussion.

That's no use to someone who is trying to understand this patch.

> The simple answer is that CLOCK_TICK_ADJUST has been causing extra clock
> drift, John's attempt didn't fix the real cause.
> My patch doesn't just revert the patches, it also includes the _real_ fix,
> so why would the real fix require more justification? That person also
> didn't get puzzled why two patches claiming to fix the same problem got
> merged...

A revert patch's changelog should explain why the code is being reverted.
ie: what was wrong with it.

A bugfix patch's changelog should explain the problem which is being
solved, and how it solves it.


Look at it from the point of view of someone who isn't intimately involved
in that subsystem and who isn't following every email on whatever mailing
list. Your changelog des not provide sufficient detail for such a person
to understand your patch.

2008-02-29 22:50:21

by Roman Zippel

[permalink] [raw]
Subject: Re: [PATCH] Remove obsolete CLOCK_TICK_ADJUST

Hi,

On Fri, 29 Feb 2008, Andrew Morton wrote:

> The changelog provides no reason for the revert of those two patches.
>
> Look at it from the point of view of a person who hasn't been following the
> discussion (whose initials might be LT). That person might get puzzled and
> upset, no?

I'm puzzled at how to explain this...
The whole details have been explained over and over during the discussion.
The simple answer is that CLOCK_TICK_ADJUST has been causing extra clock
drift, John's attempt didn't fix the real cause.
My patch doesn't just revert the patches, it also includes the _real_ fix,
so why would the real fix require more justification? That person also
didn't get puzzled why two patches claiming to fix the same problem got
merged...

bye, Roman

2008-02-29 23:12:09

by john stultz

[permalink] [raw]
Subject: Re: [PATCH] Remove obsolete CLOCK_TICK_ADJUST


On Fri, 2008-02-29 at 05:49 +0100, Roman Zippel wrote:
> Hi,
>
> Can we please clean up the current mess and get the patch below merged?

Yea. Sorry for being slow this week, had some other distractions going
on.

I'm still not happy about the granularity issue (or non-issue in your
mind), but both my version and your version of the solution are pretty
close at this point and I can rework my difference on-top of this patch
without much trouble.


> Revert bbe4d18ac2e058c56adb0cd71f49d9ed3216a405 and
> e13a2e61dd5152f5499d2003470acf9c838eab84 and remove CLOCK_TICK_ADJUST
> completely. Add a optional kernel parameter ntp_tick_adj instead to allow
> adjusting of a large base drift and thus keeping ntpd happy.
> The CLOCK_TICK_ADJUST mechanism was introduced at a time PIT was the
> primary clock, but we have a varity of clock sources now, so a global PIT
> specific adjustment makes little sense anymore.
>
> Signed-off-by: Roman Zippel <[email protected]>

Acked-by: John Stultz <[email protected]>


> ---
> include/linux/timex.h | 9 +--------
> kernel/time/ntp.c | 11 ++++++++++-
> kernel/time/timekeeping.c | 6 ++----
> 3 files changed, 13 insertions(+), 13 deletions(-)
>
> Index: linux-2.6/include/linux/timex.h
> ===================================================================
> --- linux-2.6.orig/include/linux/timex.h
> +++ linux-2.6/include/linux/timex.h
> @@ -232,14 +232,7 @@ static inline int ntp_synced(void)
> #else
> #define NTP_INTERVAL_FREQ (HZ)
> #endif
> -
> -#define CLOCK_TICK_OVERFLOW (LATCH * HZ - CLOCK_TICK_RATE)
> -#define CLOCK_TICK_ADJUST (((s64)CLOCK_TICK_OVERFLOW * NSEC_PER_SEC) / \
> - (s64)CLOCK_TICK_RATE)
> -
> -/* Because using NSEC_PER_SEC would be too easy */
> -#define NTP_INTERVAL_LENGTH ((((s64)TICK_USEC * NSEC_PER_USEC * USER_HZ) + \
> - CLOCK_TICK_ADJUST) / NTP_INTERVAL_FREQ)
> +#define NTP_INTERVAL_LENGTH (NSEC_PER_SEC/NTP_INTERVAL_FREQ)
>
> /* Returns how long ticks are at present, in ns / 2^(SHIFT_SCALE-10). */
> extern u64 current_tick_length(void);
> Index: linux-2.6/kernel/time/ntp.c
> ===================================================================
> --- linux-2.6.orig/kernel/time/ntp.c
> +++ linux-2.6/kernel/time/ntp.c
> @@ -42,12 +42,13 @@ long time_esterror = NTP_PHASE_LIMIT; /*
> long time_freq; /* frequency offset (scaled ppm)*/
> static long time_reftime; /* time at last adjustment (s) */
> long time_adjust;
> +long ntp_tick_adj;
>
> static void ntp_update_frequency(void)
> {
> u64 second_length = (u64)(tick_usec * NSEC_PER_USEC * USER_HZ)
> << TICK_LENGTH_SHIFT;
> - second_length += (s64)CLOCK_TICK_ADJUST << TICK_LENGTH_SHIFT;
> + second_length += (s64)ntp_tick_adj << TICK_LENGTH_SHIFT;
> second_length += (s64)time_freq << (TICK_LENGTH_SHIFT - SHIFT_NSEC);
>
> tick_length_base = second_length;
> @@ -400,3 +401,11 @@ leave: if ((time_status & (STA_UNSYNC|ST
> notify_cmos_timer();
> return(result);
> }
> +
> +static int __init ntp_tick_adj_setup(char *str)
> +{
> + ntp_tick_adj = simple_strtol(str, NULL, 0);
> + return 1;
> +}
> +
> +__setup("ntp_tick_adj=", ntp_tick_adj_setup);
> Index: linux-2.6/kernel/time/timekeeping.c
> ===================================================================
> --- linux-2.6.orig/kernel/time/timekeeping.c
> +++ linux-2.6/kernel/time/timekeeping.c
> @@ -187,8 +187,7 @@ static void change_clocksource(void)
>
> clock->error = 0;
> clock->xtime_nsec = 0;
> - clocksource_calculate_interval(clock,
> - (unsigned long)(current_tick_length()>>TICK_LENGTH_SHIFT));
> + clocksource_calculate_interval(clock, NTP_INTERVAL_LENGTH);
>
> tick_clock_notify();
>
> @@ -245,8 +244,7 @@ void __init timekeeping_init(void)
> ntp_clear();
>
> clock = clocksource_get_next();
> - clocksource_calculate_interval(clock,
> - (unsigned long)(current_tick_length()>>TICK_LENGTH_SHIFT));
> + clocksource_calculate_interval(clock, NTP_INTERVAL_LENGTH);
> clock->cycle_last = clocksource_read(clock);
>
> xtime.tv_sec = sec;

2008-02-29 23:12:37

by john stultz

[permalink] [raw]
Subject: Re: [PATCH] Remove obsolete CLOCK_TICK_ADJUST


On Fri, 2008-02-29 at 23:27 +0100, Roman Zippel wrote:
> Hi,
>
> On Fri, 29 Feb 2008, Andrew Morton wrote:
>
> > The changelog provides no reason for the revert of those two patches.
> >
> > Look at it from the point of view of a person who hasn't been following the
> > discussion (whose initials might be LT). That person might get puzzled and
> > upset, no?
>
> I'm puzzled at how to explain this...
> The whole details have been explained over and over during the discussion.
> The simple answer is that CLOCK_TICK_ADJUST has been causing extra clock
> drift, John's attempt didn't fix the real cause.
> My patch doesn't just revert the patches, it also includes the _real_ fix,
> so why would the real fix require more justification? That person also
> didn't get puzzled why two patches claiming to fix the same problem got
> merged...

How about this tweaked version of what I was using:


The first version of the ntp_interval/tick_length inconsistent usage
patch was recently merged as bbe4d18ac2e058c56adb0cd71f49d9ed3216a405

http://git.kernel.org/gitweb.cgi?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=bbe4d18ac2e058c56adb0cd71f49d9ed3216a405

While the fix did greatly improve the situation, it was correctly
pointed out by Roman that it does have a small bug: If the users change
clocksources after the system has been running and NTP has made
corrections, the correctoins made against the old clocksource will be
applied against the new clocksource, causing error.

The second attempt, which corrects the issue in the NTP_INTERVAL_LENGTH
definition has also made it up-stream as commit
e13a2e61dd5152f5499d2003470acf9c838eab84

http://git.kernel.org/gitweb.cgi?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=e13a2e61dd5152f5499d2003470acf9c838eab84

Roman has correctly pointed out that CLOCK_TICK_ADJUST is calculated
based on the PIT's frequency, and isn't really relevant to non-PIT
driven clocksources (that is, clocksources other then jiffies and pit).

This patch reverts both of those changes, and simply removes
CLOCK_TICK_ADJUST.

This does remove the granularity error correction for users of PIT and
Jiffies clocksource users, but the granularity error but for the
majority of users, it should be within the 500ppm range NTP can
accommodate for.

For systems that have granularity errors greater then 500ppm, the
"ntp_tick_adj=" boot option can be used to compensate.




2008-03-02 04:04:06

by Roman Zippel

[permalink] [raw]
Subject: Re: [PATCH] Remove obsolete CLOCK_TICK_ADJUST

Hi,

On Fri, 29 Feb 2008, john stultz wrote:

> I'm still not happy about the granularity issue (or non-issue in your
> mind),

It's an issue, but it's not as big as you seem to think.
The error drift should be small and 150ppm isn't too good, but it isn't an
error to be too worried about.
As the alpha example shows, in the past we already tried to reduce this
error anyway, in this case by choosing an appropriate HZ value.

> but both my version and your version of the solution are pretty
> close at this point and I can rework my difference on-top of this patch
> without much trouble.

Practically it should be enough to just check the base drift and print a
warning message, as it suggest to be more likely a configuration error.

bye, Roman