Linus,
Please pull the latest timers-for-linus-ntp git tree from:
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git timers-for-linus-ntp
Thanks,
tglx
------------------>
John Stultz (2):
ntp: adjust SHIFT_PLL to improve NTP convergence
ntp: fix comment typos
include/linux/timex.h | 42 +++++++++++++++++++++++++++++++-----------
1 files changed, 31 insertions(+), 11 deletions(-)
diff --git a/include/linux/timex.h b/include/linux/timex.h
index aa3475f..9910e3b 100644
--- a/include/linux/timex.h
+++ b/include/linux/timex.h
@@ -170,17 +170,37 @@ struct timex {
#include <asm/timex.h>
/*
- * SHIFT_KG and SHIFT_KF establish the damping of the PLL and are chosen
- * for a slightly underdamped convergence characteristic. SHIFT_KH
- * establishes the damping of the FLL and is chosen by wisdom and black
- * art.
+ * SHIFT_PLL is used as a dampening factor to define how much we
+ * adjust the frequency correction for a given offset in PLL mode.
+ * It is also used in dampening the offset correction, to define how
+ * much of the current value in time_offset we correct for each
+ * second. Changing this value changes the stiffness of the ntp
+ * adjustment code. A lower value makes it more flexible, reducing
+ * NTP convergence time. A higher value makes it stiffer, increasing
+ * convergence time, but making the clock more stable.
*
- * MAXTC establishes the maximum time constant of the PLL. With the
- * SHIFT_KG and SHIFT_KF values given and a time constant range from
- * zero to MAXTC, the PLL will converge in 15 minutes to 16 hours,
- * respectively.
+ * In David Mills' nanokernel reference implementation SHIFT_PLL is 4.
+ * However, this seems to make convergence time much too long.
+ *
+ * https://lists.ntp.org/pipermail/hackers/2008-January/003487.html
+ *
+ * In the above mailing list discussion, it seems the value of 4
+ * was appropriate for other Unix systems with HZ=100, and that
+ * SHIFT_PLL should be decreased as HZ increases. However, Linux's
+ * clock steering implementation is HZ independent.
+ *
+ * Through experimentation, a SHIFT_PLL value of 2 was found to allow
+ * for fast convergence (very similar to the NTPv3 code used prior to
+ * v2.6.19), with good clock stability.
+ *
+ *
+ * SHIFT_FLL is used as a dampening factor to define how much we
+ * adjust the frequency correction for a given offset in FLL mode.
+ * In David Mills' nanokernel reference implementation SHIFT_FLL is 2.
+ *
+ * MAXTC establishes the maximum time constant of the PLL.
*/
-#define SHIFT_PLL 4 /* PLL frequency factor (shift) */
+#define SHIFT_PLL 2 /* PLL frequency factor (shift) */
#define SHIFT_FLL 2 /* FLL frequency factor (shift) */
#define MAXTC 10 /* maximum time constant (shift) */
@@ -192,10 +212,10 @@ struct timex {
#define SHIFT_USEC 16 /* frequency offset scale (shift) */
#define PPM_SCALE ((s64)NSEC_PER_USEC << (NTP_SCALE_SHIFT - SHIFT_USEC))
#define PPM_SCALE_INV_SHIFT 19
-#define PPM_SCALE_INV ((1ll << (PPM_SCALE_INV_SHIFT + NTP_SCALE_SHIFT)) / \
+#define PPM_SCALE_INV ((1LL << (PPM_SCALE_INV_SHIFT + NTP_SCALE_SHIFT)) / \
PPM_SCALE + 1)
-#define MAXPHASE 500000000l /* max phase error (ns) */
+#define MAXPHASE 500000000L /* max phase error (ns) */
#define MAXFREQ 500000 /* max frequency error (ns/s) */
#define MAXFREQ_SCALED ((s64)MAXFREQ << NTP_SCALE_SHIFT)
#define MINSEC 256 /* min interval between updates (s) */
On Mon, Jun 15, 2009 at 7:06 AM, Thomas Gleixner<[email protected]> wrote:
> Linus,
>
> Please pull the latest timers-for-linus-ntp git tree from:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git timers-for-linus-ntp
>
> Thanks,
>
> tglx
>
> ------------------>
> John Stultz (2):
> ntp: adjust SHIFT_PLL to improve NTP convergence
> ntp: fix comment typos
Thomas,
Could we hold off on pushing this? I'm still working with Miroslav
to try to work out a solution here from ntpd user-side.
thanks
-john
On Mon, Jun 15, 2009 at 1:16 PM, john stultz<[email protected]> wrote:
> On Mon, Jun 15, 2009 at 7:06 AM, Thomas Gleixner<[email protected]> wrote:
>> Linus,
>>
>> Please pull the latest timers-for-linus-ntp git tree from:
>>
>> git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git timers-for-linus-ntp
>>
>> Thanks,
>>
>> tglx
>>
>> ------------------>
>> John Stultz (2):
>> ntp: adjust SHIFT_PLL to improve NTP convergence
>> ntp: fix comment typos
>
> Thomas,
> Could we hold off on pushing this? I'm still working with Miroslav
> to try to work out a solution here from ntpd user-side.
Linus,
You probably didn't see this before merging. Could you yank the
above two patches? Miroslav (RH package maintainer for ntpd) has
voiced concerns that the SHIFT_PLL patch breaks the NTP design and is
worried it may negatively affect NTP networks of systems running with
different SHIFT_PLL values.
While the patch does greatly improve NTP convergence times, and so far
no negative results have been seen in tests, it's out of an abundance
of caution and a desire to keep the adjtimex behavior stable that I
requested Thomas and Ingo to hold off on merging this patch, while I
work with Miroslav to see if we can get the same benefit by
adjusting the userspace NTPd.
So if you could revert the two patches until we either sort things out
in userspace or I resubmit, I'd appreciate it.
Sorry for the mixup here.
thanks
-john
* john stultz <[email protected]> wrote:
> Linus,
> You probably didn't see this before merging. Could you yank the
> above two patches? Miroslav (RH package maintainer for ntpd) has
> voiced concerns that the SHIFT_PLL patch breaks the NTP design and is
> worried it may negatively affect NTP networks of systems running with
> different SHIFT_PLL values.
>
> While the patch does greatly improve NTP convergence times, and so
> far no negative results have been seen in tests, it's out of an
> abundance of caution and a desire to keep the adjtimex behavior
> stable that I requested Thomas and Ingo to hold off on merging
> this patch, while I work with Miroslav to see if we can get the
> same benefit by adjusting the userspace NTPd.
As I explained in previous threads, I disagree. The only
technically correct direction is to improve NTP stabilization and
convergence times as much as possible. [*]
( [*] Without getting into over-compensation and without starting to
oscillate instead of converging - that would be a bug, but
such a bug has not been reported so far. )
The 'concern' voiced was that: "what if other OSs converge slower in
a cluster and now we have a faster OS in the mix". This absolutely
ignores the other 99% of cases where people would have crappier
convergence after the revert and for no good reason.
And even regarding that 1% example, well, duh: different OSes have
different convergence times, fundamentally so - for example, Linux had
a very slow convergence time from about 2.6.18 up to recent kernels
due to a bug. Now it's converging even faster ...
So I don't think that "Linux is too good" is a good basis to
artificially make Linux's NTP code crappier. Really. We don't 'play
nice' by being equally crappy.
Each OS should converge back to the correct time _as fast as
physically possible_. If this is a problem and if someone wants
crappy time and longer periods of convergence for some odd reason
then that header file change can be edited by hand even. It's not
like it's that hard to change, if there's genuine interest.
So I'm against any revert on this basis. If another basis comes up
we can reconsider of course. What do you think?
Ingo
On Tue, 16 Jun 2009, Ingo Molnar wrote:
> Each OS should converge back to the correct time _as fast as
> physically possible_. If this is a problem and if someone wants
> crappy time and longer periods of convergence for some odd reason
> then that header file change can be edited by hand even. It's not
> like it's that hard to change, if there's genuine interest.
>
> So I'm against any revert on this basis. If another basis comes up
> we can reconsider of course. What do you think?
I completely agree.
Consistent convergence across different OSs is a wet dream.
We even see different behaviour across kernel versions :) Also I
recently looked at an embedded system running the same kernel version
as a PC on the same network, with the same version of the user space
tools. Apart from the arch, the main difference was HZ (100 vs. 1000).
The PC's convergence time was about 40% higher than the embedded system's.
Thanks,
tglx
On Tue, Jun 16, 2009 at 11:06:47AM +0200, Ingo Molnar wrote:
>
> * john stultz <[email protected]> wrote:
>
> > Linus,
> > You probably didn't see this before merging. Could you yank the
> > above two patches? Miroslav (RH package maintainer for ntpd) has
> > voiced concerns that the SHIFT_PLL patch breaks the NTP design and is
> > worried it may negatively affect NTP networks of systems running with
> > different SHIFT_PLL values.
> >
> > While the patch does greatly improve NTP convergence times, and so
> > far no negative results have been seen in tests, it's out of an
> > abundance of caution and a desire to keep the adjtimex behavior
> > stable that I requested Thomas and Ingo to hold off on merging
> > this patch, while I work with Miroslav to see if we can get the
> > same benefit by adjusting the userspace NTPd.
[..]
> Each OS should converge back to the correct time _as fast as
> physically possible_. If this is a problem and if someone wants
> crappy time and longer periods of convergence for some odd reason
> then that header file change can be edited by hand even. It's not
> like it's that hard to change, if there's genuine interest.
>
> So I'm against any revert on this basis. If another basis comes up
> we can reconsider of course. What do you think?
I think the most important one is following the NTP specification.
If Linux really needs to have the fastest PLL, could it be done by
modifying the time constant passed in adjtimex structure instead of
changing SHIFT_PLL? The PLL response will be exactly the same, but it
will allow the applications (and admins) to detect that it is
different than expected.
Something like:
--- a/kernel/time/ntp.c
+++ b/kernel/time/ntp.c
@@ -425,6 +425,8 @@
time_constant = txc->constant;
if (!(time_status & STA_NANO))
time_constant += 4;
+ /* We want faster PLL */
+ time_constant -= 2;
time_constant = min(time_constant, (long)MAXTC);
time_constant = max(time_constant, 0l);
}
Thanks,
--
Miroslav Lichvar
On Tue, 2009-06-16 at 14:52 +0200, Miroslav Lichvar wrote:
> On Tue, Jun 16, 2009 at 11:06:47AM +0200, Ingo Molnar wrote:
> >
> > * john stultz <[email protected]> wrote:
> >
> > > Linus,
> > > You probably didn't see this before merging. Could you yank the
> > > above two patches? Miroslav (RH package maintainer for ntpd) has
> > > voiced concerns that the SHIFT_PLL patch breaks the NTP design and is
> > > worried it may negatively affect NTP networks of systems running with
> > > different SHIFT_PLL values.
> > >
> > > While the patch does greatly improve NTP convergence times, and so
> > > far no negative results have been seen in tests, it's out of an
> > > abundance of caution and a desire to keep the adjtimex behavior
> > > stable that I requested Thomas and Ingo to hold off on merging
> > > this patch, while I work with Miroslav to see if we can get the
> > > same benefit by adjusting the userspace NTPd.
>
> [..]
>
> > Each OS should converge back to the correct time _as fast as
> > physically possible_. If this is a problem and if someone wants
> > crappy time and longer periods of convergence for some odd reason
> > then that header file change can be edited by hand even. It's not
> > like it's that hard to change, if there's genuine interest.
> >
> > So I'm against any revert on this basis. If another basis comes up
> > we can reconsider of course. What do you think?
>
> I think the most important one is following the NTP specification.
>
> If Linux really needs to have the fastest PLL, could it be done by
> modifying the time constant passed in adjtimex structure instead of
> changing SHIFT_PLL? The PLL response will be exactly the same, but it
> will allow the applications (and admins) to detect that it is
> different than expected.
>
> Something like:
>
> --- a/kernel/time/ntp.c
> +++ b/kernel/time/ntp.c
> @@ -425,6 +425,8 @@
> time_constant = txc->constant;
> if (!(time_status & STA_NANO))
> time_constant += 4;
> + /* We want faster PLL */
> + time_constant -= 2;
> time_constant = min(time_constant, (long)MAXTC);
> time_constant = max(time_constant, 0l);
> }
It looks mathematically equivalent, although I've not had time to test
it yet. Probably needs a bigger comment :)
The nice thing with this version is that we're able to expose that the
behavior would be different from other systems, but the other side of
that coin might be that when the user specifies a time_constant value,
the interface will show a different one being used. This might cause
some bug reports saying the interface isn't responding properly, or
something. Although this is already the case for !STA_NANO, and so far
few have noticed.
thanks
-john
* John Stultz <[email protected]> wrote:
> On Tue, 2009-06-16 at 14:52 +0200, Miroslav Lichvar wrote:
> > On Tue, Jun 16, 2009 at 11:06:47AM +0200, Ingo Molnar wrote:
> > >
> > > * john stultz <[email protected]> wrote:
> > >
> > > > Linus,
> > > > You probably didn't see this before merging. Could you yank the
> > > > above two patches? Miroslav (RH package maintainer for ntpd) has
> > > > voiced concerns that the SHIFT_PLL patch breaks the NTP design and is
> > > > worried it may negatively affect NTP networks of systems running with
> > > > different SHIFT_PLL values.
> > > >
> > > > While the patch does greatly improve NTP convergence times, and so
> > > > far no negative results have been seen in tests, it's out of an
> > > > abundance of caution and a desire to keep the adjtimex behavior
> > > > stable that I requested Thomas and Ingo to hold off on merging
> > > > this patch, while I work with Miroslav to see if we can get the
> > > > same benefit by adjusting the userspace NTPd.
> >
> > [..]
> >
> > > Each OS should converge back to the correct time _as fast as
> > > physically possible_. If this is a problem and if someone wants
> > > crappy time and longer periods of convergence for some odd reason
> > > then that header file change can be edited by hand even. It's not
> > > like it's that hard to change, if there's genuine interest.
> > >
> > > So I'm against any revert on this basis. If another basis comes up
> > > we can reconsider of course. What do you think?
> >
> > I think the most important one is following the NTP specification.
> >
> > If Linux really needs to have the fastest PLL, could it be done by
> > modifying the time constant passed in adjtimex structure instead of
> > changing SHIFT_PLL? The PLL response will be exactly the same, but it
> > will allow the applications (and admins) to detect that it is
> > different than expected.
> >
> > Something like:
> >
> > --- a/kernel/time/ntp.c
> > +++ b/kernel/time/ntp.c
> > @@ -425,6 +425,8 @@
> > time_constant = txc->constant;
> > if (!(time_status & STA_NANO))
> > time_constant += 4;
> > + /* We want faster PLL */
> > + time_constant -= 2;
> > time_constant = min(time_constant, (long)MAXTC);
> > time_constant = max(time_constant, 0l);
> > }
>
>
> It looks mathematically equivalent, although I've not had time to
> test it yet. Probably needs a bigger comment :)
>
> The nice thing with this version is that we're able to expose that
> the behavior would be different from other systems, but the other
> side of that coin might be that when the user specifies a
> time_constant value, the interface will show a different one being
> used. This might cause some bug reports saying the interface isn't
> responding properly, or something. Although this is already the
> case for !STA_NANO, and so far few have noticed.
Sounds good to me. It feels a bit quirky that we 'correct' the
user-space provided parameter by 2 ... Definitely needs a big
comment.
Ingo
On Wed, Jun 17, 2009 at 08:38:22AM -0700, John Stultz wrote:
> On Tue, 2009-06-16 at 14:52 +0200, Miroslav Lichvar wrote:
> > If Linux really needs to have the fastest PLL, could it be done by
> > modifying the time constant passed in adjtimex structure instead of
> > changing SHIFT_PLL? The PLL response will be exactly the same, but it
> > will allow the applications (and admins) to detect that it is
> > different than expected.
> >
> > Something like:
> >
> > --- a/kernel/time/ntp.c
> > +++ b/kernel/time/ntp.c
> > @@ -425,6 +425,8 @@
> > time_constant = txc->constant;
> > if (!(time_status & STA_NANO))
> > time_constant += 4;
> > + /* We want faster PLL */
> > + time_constant -= 2;
> > time_constant = min(time_constant, (long)MAXTC);
> > time_constant = max(time_constant, 0l);
> > }
>
>
> It looks mathematically equivalent, although I've not had time to test
> it yet. Probably needs a bigger comment :)
>
> The nice thing with this version is that we're able to expose that the
> behavior would be different from other systems, but the other side of
> that coin might be that when the user specifies a time_constant value,
> the interface will show a different one being used. This might cause
> some bug reports saying the interface isn't responding properly, or
> something. Although this is already the case for !STA_NANO, and so far
> few have noticed.
I have checked the NTP sources, and the returned time constant is used
only for reporting, so at least for NTP it shouldn't cause any problems.
Returning the correct time constant will be very useful if NTP developers
decide to use lower values or make it configurable, as decreasing the
constant by another two will make the PLL unstable.
Still, I'd really like to see the original behavior restored. Most of
the users complaining about slow convergence are probably just hitting
the calibration problem, which needs to be fixed by other means than
making PLL faster. Also, users of other systems seem to be happy with
their slow convergence. At least that's the impression I have from NTP
lists.
Thanks,
--
Miroslav Lichvar
* Miroslav Lichvar <[email protected]> wrote:
> Still, I'd really like to see the original behavior restored. Most
> of the users complaining about slow convergence are probably just
> hitting the calibration problem, which needs to be fixed by other
> means than making PLL faster. Also, users of other systems seem to
> be happy with their slow convergence. At least that's the
> impression I have from NTP lists.
Wouldn't the goal be to calibrate as fast as possible? (Without any
bad oscillation)
Ingo
On Wed, 2009-06-17 at 19:26 +0200, Ingo Molnar wrote:
> * Miroslav Lichvar <[email protected]> wrote:
>
> > Still, I'd really like to see the original behavior restored. Most
> > of the users complaining about slow convergence are probably just
> > hitting the calibration problem, which needs to be fixed by other
> > means than making PLL faster. Also, users of other systems seem to
> > be happy with their slow convergence. At least that's the
> > impression I have from NTP lists.
>
> Wouldn't the goal be to calibrate as fast as possible? (Without any
> bad oscillation)
I believe he means the TSC calibration error issue, where on every boot the
TSC calibration varies by 30-80 ppm. This makes it hard for systems to
stay in NTP sync after a reboot, because ntpd has to search for a new
freq (and the SHIFT_PLL & time_constant values control how fast that
happens).
While the TSC calibration is an issue, there is also the fact that NTP's
slow convergence model (which is "by design", for good or bad) doesn't
seem to handle thermal environment changes quickly enough to keep close
sync.
Now, whether we make the change by tweaking ntpd or the kernel is, I
think, still a question that we've not answered well. Even though I'm of the
opinion something needs to change, I'm not yet convinced of which side
is the right side to fix. And that is why I requested we hold off on
merging the SHIFT_PLL patch.
And really, if you look at Miroslav's patch, which is mathematically
equivalent to the SHIFT_PLL change, all we're doing is decreasing the
time_constant ntpd gave us by two. So the question is, why is
that fix best done in the kernel, instead of making ntpd reduce what it
passes in to the kernel?
thanks
-john
On Wed, Jun 17, 2009 at 07:26:01PM +0200, Ingo Molnar wrote:
> * Miroslav Lichvar <[email protected]> wrote:
>
> > Still, I'd really like to see the original behavior restored. Most
> > of the users complaining about slow convergence are probably just
> > hitting the calibration problem, which needs to be fixed by other
> > means than making PLL faster. Also, users of other systems seem to
> > be happy with their slow convergence. At least that's the
> > impression I have from NTP lists.
>
> Wouldn't the goal be to calibrate as fast as possible? (Without any
> bad oscillation)
Not really. It depends on how noisy the input signal is. On an idle
LAN the jitter is just a few microseconds, but over the internet it easily
reaches milliseconds. Beyond a certain point, a faster PLL will just make
things worse.
PLL is mainly about handling the signal noise, frequency adjusting is
secondary. When the noise is very low or the update interval is long
enough, the frequency variations caused by temperature changes will
dominate the signal noise and this is where FLL should kick in.
The PLL/FLL switching is controlled by update interval. Ideally it
would be adaptive, but NTP is not that sophisticated. By default, FLL
is enabled when the interval is longer than 2048 seconds. This is of
course not the optimal value for all systems.
Unfortunately, in the kernel it can be configured only to 2048 or 256,
and NTP never uses the shorter one. The NTP daemon has its own loop,
though, which can be used instead and which allows arbitrary values.
--
Miroslav Lichvar
* Miroslav Lichvar <[email protected]> wrote:
> On Wed, Jun 17, 2009 at 07:26:01PM +0200, Ingo Molnar wrote:
> > * Miroslav Lichvar <[email protected]> wrote:
> >
> > > Still, I'd really like to see the original behavior restored.
> > > Most of the users complaining about slow convergence are
> > > probably just hitting the calibration problem, which needs to
> > > be fixed by other means than making PLL faster. Also, users of
> > > other systems seem to be happy with their slow convergence. At
> > > least that's the impression I have from NTP lists.
> >
> > Wouldn't the goal be to calibrate as fast as possible? (Without
> > any bad oscillation)
>
> Not really. It depends on how noisy the input signal is. On an
> idle LAN the jitter is just a few microseconds, but over the internet
> it easily reaches milliseconds. Beyond a certain point, a faster PLL
> will just make things worse.
That is what I called 'bad oscillation' - a 'too fast' PLL that
over-compensates and does not converge well enough.
Is there a claim that this change causes that? (John's testing
suggested that there's no such effect)
> PLL is mainly about handling the signal noise, frequency adjusting
> is secondary. When the noise is very low or the update interval is
> long enough, the frequency variations caused by temperature
> changes will dominate the signal noise and this is where FLL
> should kick in.
>
> The PLL/FLL switching is controlled by update interval. Ideally it
> would be adaptive, but NTP is not that sophisticated. By default,
> FLL is enabled when the interval is longer than 2048 seconds. This
> is of course not the optimal value for all systems.
>
> Unfortunately, in the kernel it can be configured only to 2048 or
> 256, and NTP never uses the shorter one. The NTP daemon has its own
> loop, though, which can be used instead and which allows arbitrary
> values.
How about going towards the ideal, adaptive design, to which ntpd
passes in time samples and which observes noise and converges as
quickly as possible (given the noise level) and stays stable once
there? I guess we need extensions to the NTP syscall for that.
The NTP code in kernel/time/ntp.c is now reasonably clean for
efforts like that.
It would also pave the way to properly support PPS devices in the
kernel. Would you be interested in things like this?
Ingo
On Tue, Jun 23, 2009 at 11:57:45AM +0200, Ingo Molnar wrote:
> > > Wouldn't the goal be to calibrate as fast as possible? (Without
> > > any bad oscillation)
> >
> > Not really. It depends on how noisy the input signal is. On an
> > idle LAN the jitter is just a few microseconds, but over the internet
> > it easily reaches milliseconds. Beyond a certain point, a faster PLL
> > will just make things worse.
>
> That is what I called 'bad oscillation' - a 'too fast' PLL that
> over-compensates and does not converge well enough.
>
> Is there a claim that this change causes that? (John's testing
> suggested that there's no such effect)
I think John's tests were done on a LAN and in an environment with
sudden temperature changes. This is the case where frequency
variations strongly dominate the noise and faster PLL performs better.
On the opposite side is an idle machine in a room with stable
temperature syncing over wireless or dial-up. I don't have access to
such a machine, but in simulations (noise with exponential distribution)
I see that offset RMS doubles when the time constant is decreased by 2.
Maybe for most of the users the change would be an improvement. I
don't have any statistics to back it up or claim otherwise. However,
if the constant needs to be adjusted, it's better to do it in NTP.
> > PLL is mainly about handling the signal noise, frequency adjusting
> > is secondary. When the noise is very low or the update interval is
> > long enough, the frequency variations caused by temperature
> > changes will dominate the signal noise and this is where FLL
> > should kick in.
> >
> > The PLL/FLL switching is controlled by update interval. Ideally it
> > would be adaptive, but NTP is not that sophisticated. By default,
> > FLL is enabled when the interval is longer than 2048 seconds. This
> > is of course not the optimal value for all systems.
> >
> > Unfortunately, in the kernel it can be configured only to 2048 or
> > 256, and NTP never uses the shorter one. The NTP daemon has its own
> > loop, though, which can be used instead and which allows arbitrary
> > values.
>
> How about going towards the ideal, adaptive design, to which ntpd
> passes in time samples and which observes noise and converges as
> quickly as possible (given the noise level) and stays stable once
> there? I guess we need extensions to the NTP syscall for that.
Not sure how hard that would be. The ntp-hackers list is a better
place to discuss such modifications.
Other NTP clients don't have to use the PLL interface. For example,
chrony uses only the SINGLESHOT mode and sets the frequency directly.
It has an adaptive model using linear regression, it converges really
fast and in my tests performs better than NTP.
> The NTP code in kernel/time/ntp.c is now reasonably clean for
> efforts like that.
>
> It would also pave the way to properly support PPS devices in the
> kernel. Would you be interested in things like this?
I'm not very familiar with the PPS API, is there something wrong with
it?
--
Miroslav Lichvar
* Miroslav Lichvar <[email protected]> wrote:
> On Tue, Jun 23, 2009 at 11:57:45AM +0200, Ingo Molnar wrote:
> > > > Wouldn't the goal be to calibrate as fast as possible? (Without
> > > > any bad oscillation)
> > >
> > > Not really. It depends on how noisy the input signal is. On an
> > > idle LAN the jitter is just a few microseconds, but over the internet
> > > it easily reaches milliseconds. Beyond a certain point, a faster PLL
> > > will just make things worse.
> >
> > That is what I called 'bad oscillation' - a 'too fast' PLL that
> > over-compensates and does not converge well enough.
> >
> > Is there a claim that this change causes that? (John's testing
> > suggested that there's no such effect)
>
> I think John's tests were done on a LAN and in an environment with
> sudden temperature changes. This is the case where frequency
> variations strongly dominate the noise and faster PLL performs
> better.
I'd also expect this to be quite similar to most everyday Linux
uses.
> On the opposite side is an idle machine in a room with stable
> temperature syncing over wireless or dial-up. I don't have access
> to such a machine, but in simulations (noise with exponential
> distribution) I see that offset RMS doubles when the time constant
> is decreased by 2.
The thing is, an idle machine in a room with stable temperature is
in a good position anyway to have stable time, right? We should
rather care about the common-case of temperature variations,
reboots, etc.
That is where NTP _helps the most_ - as the physical environment is
very entropy laden to begin with.
> Maybe for most of the users the change would be an improvement. I
> don't have any statistics to back it up or claim otherwise.
> However, if the constant needs to be adjusted, it's better to do
> it in NTP.
>
> > > PLL is mainly about handling the signal noise, frequency adjusting
> > > is secondary. When the noise is very low or the update interval is
> > > long enough, the frequency variations caused by temperature
> > > changes will dominate the signal noise and this is where FLL
> > > should kick in.
> > >
> > > The PLL/FLL switching is controlled by update interval. Ideally it
> > > would be adaptive, but NTP is not that sophisticated. By default,
> > > FLL is enabled when the interval is longer than 2048 seconds. This
> > > is of course not the optimal value for all systems.
> > >
> > > Unfortunately, in the kernel it can be configured only to 2048 or
> > > 256, and NTP never uses the shorter one. The NTP daemon has its own
> > > loop, though, which can be used instead and which allows arbitrary
> > > values.
> >
> > How about going towards the ideal, adaptive design, to which ntpd
> > passes in time samples and which observes noise and converges as
> > quickly as possible (given the noise level) and stays stable once
> > there? I guess we need extensions to the NTP syscall for that.
>
> Not sure how hard that would be. The ntp-hackers list is a better
> place to discuss such modifications.
>
> Other NTP clients don't have to use the PLL interface. For
> example, chrony uses only the SINGLESHOT mode and sets the
> frequency directly. It has an adaptive model using linear
> regression, it converges really fast and in my tests performs
> better than NTP.
That's good. Could this be integrated into the kernel, for even
better results?
> > The NTP code in kernel/time/ntp.c is now reasonably clean for
> > efforts like that.
> >
> > It would also pave the way to properly support PPS devices in
> > the kernel. Would you be interested in things like this?
>
> I'm not very familiar with the PPS API, is there something wrong
> with it?
The PPS patches I've seen just export IRQ timestamps to user-space.
That is not very robust in my opinion when it comes to doing time
approximations - to get quick, low-latency action and precise
measurements it's best to keep the critical path as short as
possible, and within a single source code repository: i.e. within
the kernel.
There's little policy really, other than setting some general
parameters. NTPd can still provide the raw _network time_
timestamps, as that is probably best fetched by user-space and fed
to the kernel.
Ingo
On Tue, Jun 23, 2009 at 03:36:25PM +0200, Ingo Molnar wrote:
> > I think John's tests were done on a LAN and in an environment with
> > sudden temperature changes. This is the case where frequency
> > variations strongly dominate the noise and faster PLL performs
> > better.
>
> I'd also expect this to be quite similar to most everyday Linux
> uses.
I'd say that most users keep the default distribution configs, i.e.
syncing over the internet to servers from pool.ntp.org. The network jitter
is in the hundreds of microseconds or even milliseconds, and the temperature
changes are dominated by the noise.
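The trade-off between a faster and a stiffer PLL follows from how the remaining offset is slewed out. Here is a heavily simplified, hypothetical model, not the actual kernel/time/ntp.c code path. It assumes that each second removes `offset >> shift` of the remaining offset, where `shift` stands in for SHIFT_PLL plus the PLL time constant. Frequency correction and measurement noise are ignored entirely.

```c
#include <assert.h>
#include <stdint.h>

/* Right shift that rounds toward zero for negative values, so the
 * result does not depend on how the compiler shifts negative numbers.
 * Assumes v != INT64_MIN. */
static int64_t shift_right_signed(int64_t v, int s)
{
    return v < 0 ? -((-v) >> s) : v >> s;
}

/* Hypothetical model: count the simulated seconds until
 * |offset| <= threshold when each second removes offset >> shift.
 * Returns -1 if the correction truncates to zero (the loop stalls)
 * or if max_steps is exceeded. */
static int seconds_to_converge(int64_t offset, int shift,
                               int64_t threshold, int max_steps)
{
    int steps = 0;

    assert(shift >= 0 && shift < 63);
    while ((offset < 0 ? -offset : offset) > threshold) {
        int64_t adj = shift_right_signed(offset, shift);

        if (adj == 0 || steps >= max_steps)
            return -1;
        offset -= adj;
        steps++;
    }
    return steps;
}
```

With `shift = 2`, a 1 ms offset falls below 1 µs in far fewer simulated seconds than with `shift = 4`, which is the convergence-speed argument. The price, which this model does not capture, is that each noisy measurement also moves the clock further. That is the situation with the internet-level jitter described above.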
> > Other NTP clients don't have to use the PLL interface. For
> > example, chrony uses only the SINGLESHOT mode and sets the
> > frequency directly. It has an adaptive model using linear
> > regression, it converges really fast and in my tests performs
> > better than NTP.
>
> That's good. Could this be integrated into the kernel, for even
> better results?
The code is quite complex, with possibly a lot of room for improvement.
I think it's better to keep it in userspace. There are two things on
the kernel side that would help chrony, though: supporting nanosecond
offsets in SINGLESHOT mode, and updating the reported offset with every
adjtimex call rather than only once per second, so that chrony would
know exactly how much of the offset has already been applied.
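chrony's adaptive model is described in this thread as linear regression over offset samples. Below is a minimal sketch of the core idea: an ordinary least-squares fit whose slope estimates the frequency error. It is not chrony's code, which is considerably more elaborate, and the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Ordinary least-squares fit of offset = intercept + slope * t.
 * With t in seconds and offsets in microseconds, slope is the
 * estimated frequency error in ppm (microseconds per second). */
static void fit_drift(const double *t, const double *off, size_t n,
                      double *slope, double *intercept)
{
    double tm = 0.0, om = 0.0, sxy = 0.0, sxx = 0.0;

    assert(n >= 2);
    for (size_t i = 0; i < n; i++) {
        tm += t[i];
        om += off[i];
    }
    tm /= (double)n;
    om /= (double)n;

    for (size_t i = 0; i < n; i++) {
        sxy += (t[i] - tm) * (off[i] - om);
        sxx += (t[i] - tm) * (t[i] - tm);
    }
    assert(sxx > 0.0); /* needs at least two distinct sample times */

    *slope = sxy / sxx;
    *intercept = om - *slope * tm;
}
```

Offsets of 10, 12, 14 and 16 µs measured at t = 0, 1, 2 and 3 s give a slope of 2, i.e. a 2 ppm frequency error. A regression like this uses every sample in the window at once, which helps explain why such a model can converge quickly.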
> > I'm not very familiar with the PPS API, is there something wrong
> > with it?
>
> The PPS patches I've seen just export IRQ timestamps to user-space.
>
> That is not very robust in my opinion when it comes to doing time
> approximations - to get quick, low-latency action and precise
> measurements it's best to keep the critical path as short as
> possible, and within a single source code repository: i.e. within
> the kernel.
That's what the kernel PPS discipline does; it will probably be
included later. Its performance is one or two orders of magnitude
better than the PLL/FLL discipline.
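The following is not a reproduction of hardpps(). It is a hypothetical illustration of why PPS timestamps are so valuable. If each pulse marks the exact start of a second, the nanosecond part of the system timestamp taken at the pulse is the phase error. The change in phase between pulses is the frequency error. The helper names are invented for this sketch.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL

/* Phase error (ns) of a PPS edge whose system timestamp has the given
 * nanosecond field, assuming the pulse marks the top of a second.
 * Folded into [-0.5 s, +0.5 s): a clock running behind gives a
 * negative value. */
static int64_t pps_phase_ns(int64_t nsec)
{
    assert(nsec >= 0 && nsec < NSEC_PER_SEC);
    return nsec >= NSEC_PER_SEC / 2 ? nsec - NSEC_PER_SEC : nsec;
}

/* The second the pulse actually belongs to (timestamp rounded). */
static int64_t pps_nearest_sec(int64_t sec, int64_t nsec)
{
    return nsec >= NSEC_PER_SEC / 2 ? sec + 1 : sec;
}

/* Frequency error in parts per billion from two pulses: the change in
 * phase (ns) divided by the number of seconds between them. */
static int64_t pps_freq_ppb(int64_t sec1, int64_t nsec1,
                            int64_t sec2, int64_t nsec2)
{
    int64_t dt = pps_nearest_sec(sec2, nsec2) - pps_nearest_sec(sec1, nsec1);

    assert(dt > 0);
    return (pps_phase_ns(nsec2) - pps_phase_ns(nsec1)) / dt;
}
```

Pulses stamped at 100.000001000 and 101.000001050 give a phase of +1000 ns and a frequency error of 50 ppb. Timestamp jitter on the pulse directly limits this estimate, which is the motivation for keeping the path short.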
--
Miroslav Lichvar
* Miroslav Lichvar <[email protected]> wrote:
> > > I'm not very familiar with the PPS API, is there something
> > > wrong with it?
> >
> > The PPS patches I've seen just export IRQ timestamps to
> > user-space.
> >
> > That is not very robust in my opinion when it comes to doing time
> > approximations - to get quick, low-latency action and precise
> > measurements it's best to keep the critical path as short as
> > possible, and within a single source code repository: i.e.
> > within the kernel.
>
> That's what the kernel PPS discipline does; it will probably be
> included later. Its performance is one or two orders of magnitude
> better than the PLL/FLL discipline.
Is there some kernel patch I can look at?
Ingo
On Tue, Jun 23, 2009 at 09:18:38PM +0200, Ingo Molnar wrote:
>
> * Miroslav Lichvar <[email protected]> wrote:
>
> > > > I'm not very familiar with the PPS API, is there something
> > > > wrong with it?
> > >
> > > The PPS patches I've seen just export IRQ timestamps to
> > > user-space.
> > >
> > > That is not very robust in my opinion when it comes to doing time
> > > approximations - to get quick, low-latency action and precise
> > > measurements it's best to keep the critical path as short as
> > > possible, and within a single source code repository: i.e.
> > > within the kernel.
> >
> > That's what the kernel PPS discipline does; it will probably be
> > included later. Its performance is one or two orders of magnitude
> > better than the PLL/FLL discipline.
>
> Is there some kernel patch I can look at?
It's in the old PPSkit patches for 2.4 kernels, in the function hardpps().
--
Miroslav Lichvar
On Tue, 2009-06-23 at 15:36 +0200, Ingo Molnar wrote:
> The PPS patches I've seen just export IRQ timestamps to user-space.
>
> That is not very robust in my opinion when it comes to doing time
> approximations - to get quick, low-latency action and precise
> measurements it's best to keep the critical path as short as
> possible, and within a single source code repository: i.e. within
> the kernel.
>
> There's little policy really, other than setting some general
> parameters. NTPd can still provide the raw _network time_
> timestamps, as that is probably best fetched by user-space and fed
> to the kernel.
At some point that stops being NTP. NTP has quite a bit of userland
policy for filtering and managing a number of different network clocks
(other ntp servers, PPS sources, etc).
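One small piece of that userland policy is NTP's clock filter. Among the recent samples from a server, it prefers the one with the smallest round-trip delay, because that sample is least likely to have been distorted by queuing. The sketch below is much simplified: real ntpd also ages samples and tracks dispersion and jitter. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One offset/delay measurement from a single server (hypothetical). */
struct ntp_sample {
    int64_t offset_ns;
    int64_t delay_ns;
};

/* Return the offset of the sample with the lowest round-trip delay
 * among the n most recent samples. */
static int64_t min_delay_offset(const struct ntp_sample *s, size_t n)
{
    size_t best = 0;

    assert(n > 0);
    for (size_t i = 1; i < n; i++)
        if (s[i].delay_ns < s[best].delay_ns)
            best = i;
    return s[best].offset_ns;
}
```

Filtering of this kind is policy that depends on many samples and many sources, which is part of why it has traditionally lived in userspace.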
From what you're describing (direct offset from a hardware time device
used to steer the clock directly in kernel), you might want to look at
the STP code in s390 (stp_sync_clock).
thanks
-john
> On Tue, 2009-06-23 at 15:36 +0200, Ingo Molnar wrote:
> > The PPS patches I've seen just export IRQ timestamps to user-space.
Correct. They improve the quality of the sampled information and help cut
down on jitter between taking the IRQ sample and using the timestamp. The
jitter is what matters most here, and NTP can figure out constant latencies
rather well.
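The distinction between constant latency and jitter can be made concrete. Subtracting the mean latency removes the constant part entirely, and what remains is the jitter, which no calibration can take out. This is a minimal sketch. Mean absolute deviation is chosen as the jitter measure purely for simplicity, and the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Split measured IRQ-to-timestamp latencies (ns) into a constant part
 * (the mean, which a time daemon can calibrate away) and a jitter part
 * (here the mean absolute deviation from that mean). */
static void latency_stats(const double *lat, size_t n,
                          double *mean, double *jitter)
{
    double sum = 0.0, dev = 0.0;

    assert(n > 0);
    for (size_t i = 0; i < n; i++)
        sum += lat[i];
    *mean = sum / (double)n;

    for (size_t i = 0; i < n; i++) {
        double d = lat[i] - *mean;
        dev += d < 0 ? -d : d;
    }
    *jitter = dev / (double)n;
}
```

A perfectly constant 7 µs latency yields zero jitter, so it costs nothing once it is known. Latencies of 5.0, 5.2, 4.8 and 5.0 µs leave 100 ns of jitter however the mean is compensated.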
> At some point that stops being NTP. NTP has quite a bit of userland
> policy for filtering and managing a number of different network clocks
> (other ntp servers, PPS sources, etc).
>
> From what you're describing (direct offset from a hardware time device
> used to steer the clock directly in kernel), you might want to look at
> the STP code in s390 (stp_sync_clock).
And also hardware distributed timing systems like those that distribute a
clock with Ethernet signals.
Alan
On Wed, 24 Jun 2009 10:29:15 +0100
Alan Cox <[email protected]> wrote:
> > At some point that stops being NTP. NTP has quite a bit of userland
> > policy for filtering and managing a number of different network clocks
> > (other ntp servers, PPS sources, etc).
> >
> > From what you're describing (direct offset from a hardware time device
> > used to steer the clock directly in kernel), you might want to look at
> > the STP code in s390 (stp_sync_clock).
>
> And also hardware distributed timing systems like those that distribute a
> clock with Ethernet signals.
The STP clock synchronization works below the kernel. Usually we don't
notice the clock drift at all; the kernel has the illusion of a perfect
clock. Only if the delta exceeds the clock synchronization tolerance
does the hardware cause a machine check. Then it becomes the job of the
operating system to deal with the clock delta. The current Linux code
applies the offset to the hardware clock (which makes the TOD clock
non-monotonic) and applies the same offset to the base value
sched_clock_base_cc to even out the effect. Then a single-shot
adjustment is passed to NTP to get the system time in sync with the
hardware clock again.
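The compensation described above can be modelled in a few lines. This is a hypothetical sketch, not the s390 stp_sync_clock code. The struct and function names are invented, and only the idea of shifting sched_clock_base_cc by the same delta as the TOD clock comes from the description.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the TOD clock and sched_clock_base_cc. */
struct stp_model {
    uint64_t tod;                 /* hardware TOD clock value */
    uint64_t sched_clock_base_cc; /* base subtracted by sched_clock */
};

/* sched_clock-style reading: time elapsed since the base value. */
static uint64_t model_sched_clock(const struct stp_model *m)
{
    return m->tod - m->sched_clock_base_cc;
}

/* Step the TOD clock by delta (possibly negative, which makes the TOD
 * clock non-monotonic) and shift the base by the same delta so the
 * derived reading does not jump. Unsigned arithmetic wraps modulo
 * 2^64, so negative deltas work too. Returns the offset that, in this
 * model, would then be handed to NTP as a single-shot adjustment. */
static int64_t model_stp_step(struct stp_model *m, int64_t delta)
{
    uint64_t before = model_sched_clock(m);

    m->tod += (uint64_t)delta;
    m->sched_clock_base_cc += (uint64_t)delta;
    assert(model_sched_clock(m) == before); /* continuity preserved */
    return delta;
}
```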
--
blue skies,
Martin.
"Reality continues to ruin my life." - Calvin.